Forbes Global 2000 Companies


Introduction

The Forbes Global 2000 is an annual ranking of the top 2,000 public companies in the world based on a mix of four metrics: sales, profit, assets and market value. Methodology can be viewed here.


Technologies

Python, Pandas, Numpy, and Matplotlib.


Goal

There are 5 interesting questions that I wish to explore to satisfy my curiosity:

  1. The 5 most ranked countries.
  2. The 5 most frequent headquartered states in the US.
  3. Visualize industry percentage in a pie chart.
  4. The top 5 US Software & Programming companies with the highest ROA.
  5. The 5 most profitable companies in the world.

You can download the dataset here (original or what we will be using).
I hope you are as excited about this exploration as I am. Let's begin!

Exploration 1: The 5 most ranked countries

Let's first import pandas and then read in the CSV file.

import pandas as pd
forbes = pd.read_csv("forbes2000_2018.csv")

We will use the .value_counts() method to count how often each country appears in the "country" column; the result comes back ordered from the most frequent country to the least.
Use .head() to get the top 5 rows, then display the result using print(). Easy.
print(forbes["country"].value_counts().head())
The .head() method, by default, retrieves the top 5 rows.
For example, if we want the top 7 rows instead, we can do .head(7).
Here's the answer to our first question!

X1 output
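
If you want to see what .value_counts() and .head() do without downloading the dataset, here is a minimal sketch on a tiny made-up table. The name "toy" and its values are invented for illustration and are not taken from the Forbes data.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the Forbes dataset.
toy = pd.DataFrame({"country": ["A", "B", "A", "C", "A", "B"]})

counts = toy["country"].value_counts()   # one row per distinct value, most frequent first
print(counts.head(2))                    # keep only the top 2 rows
```

Here "A" appears three times and "B" twice, so those are the two rows that .head(2) keeps.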



Exploration 2: The 5 most frequent headquartered states in the US

First, we save all the rows that have "United States" as a value in the column "country" into a variable called "usa".

usa = forbes.loc[forbes["country"] == "United States"]
Then we simply sort and retrieve the top rows like we did in the first exploration!
print(usa["state"].value_counts().head(10))
But here we will retrieve the top 10 rows instead of just 5 because my curiosity urges me to :P.

X2 output
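
The filtering step relies on a boolean mask: comparing a column to a value yields a True/False Series, and .loc keeps the rows where it is True. Here is a small sketch on made-up rows (the names "toy", "mask" and "usa_toy" are hypothetical, chosen just for this example).

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the Forbes dataset.
toy = pd.DataFrame({
    "company": ["W", "X", "Y", "Z"],
    "country": ["United States", "Japan", "United States", "Germany"],
})

mask = toy["country"] == "United States"   # boolean Series, one entry per row
usa_toy = toy.loc[mask]                    # keeps only the rows where mask is True
print(usa_toy)
```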



Exploration 3: Visualize industry percentage in a pie chart

Let's first see how many unique industries there are in the dataset.

print(forbes["industry"].nunique())   # number of distinct industries (missing values are not counted)
X3 # of unique industries
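
A side note on counting distinct values: pandas offers more than one way, and they can disagree when a column has missing entries. The sketch below uses a made-up Series (not the Forbes data) to show the difference.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only; one entry is missing on purpose.
s = pd.Series(["Banking", "Retail", np.nan, "Banking"])

print(s.nunique())        # distinct values, skipping the missing one
print(len(s.unique()))    # unique() keeps the missing value as its own entry
```

Since .value_counts() also skips missing values by default, .nunique() is the count that matches the number of pie slices.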

Can you imagine how chaotic our pie chart will look
with 83 slices of pie and so many labels around a circle?
But hell, let's do it anyway ... coz ... why not? There's nothing to be ashamed of here :D.

We will use the matplotlib library to create the chart. Let's import it now.
import matplotlib.pyplot as plt
To create a pie chart, we need all the labels and sizes for each slice of pie.
labels = forbes["industry"].value_counts().index.tolist()   # ordered from most to least frequent
sizes = forbes["industry"].value_counts().tolist()          # raw counts; pie() scales them to percentages

I wanna "explode" the biggest slice of the pie to add some spice to the visual.
# explode only the biggest (first) slice: 0.1 to explode, 0 for all the others
explode = [0.1] + [0] * (len(labels) - 1)

Now that we have all we need, let's draw and display our pie chart!
fig1, ax1 = plt.subplots()
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%', startangle=90)
ax1.axis('equal')    # ensures that pie is drawn as a circle
plt.show()

I hope you are ready for what is about to come.
X3 output
Not too bad after all.
We also get to visualize the industries that top the list and roughly get a sense of the distribution of industry percentages. How nice!
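
If the 83-slice version ever feels too crowded, one common alternative (not what was done above) is to keep the largest few slices and lump the rest into "Other". This is only a sketch on made-up labels; "top_n" is a hypothetical cutoff you would tune by eye.

```python
import pandas as pd

# Made-up industry labels for illustration only.
toy = pd.Series(["Banking"] * 5 + ["Retail"] * 3 + ["Media", "Mining", "Airlines"])

counts = toy.value_counts()
top_n = 2                                   # hypothetical cutoff
top = counts.head(top_n)                    # the biggest slices, kept as they are
other = int(counts.iloc[top_n:].sum())      # everything past the cutoff, merged

labels = top.index.tolist() + ["Other"]
sizes = top.tolist() + [other]
print(labels, sizes)
```

The resulting labels and sizes can be passed to ax1.pie() the same way as before.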



Exploration 4: The top 5 US Software & Programming companies with the highest ROA

This one is also easy. Let's first define ROA.
ROA, which stands for "return on assets", is simply a ratio that tells how well a company can generate profits with the amount of assets it has.
ROA = profits / assets
To begin with, we save all the rows that represent US companies that are in the "Software & Programming" industry into "usa_software".

# .copy() gives us an independent DataFrame, so adding a column later does not touch "forbes"
usa_software = forbes.loc[(forbes["country"] == "United States") & (forbes["industry"] == "Software & Programming")].copy()
Then we compute ROA for each of these companies and add the ROAs as a new column in "usa_software".
usa_software["roa"] = usa_software["profits"] / usa_software["assets"]
Now we sort "usa_software" by the new ROA column in descending order, and retrieve the top 5 rows. That's all!
print(usa_software.sort_values(by=["roa"], ascending=False).head())
Our result:
X4 output
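
To check the ROA steps by hand, here is a minimal sketch with made-up numbers (the companies and figures are invented, not from the dataset). It takes a .copy() of the filtered rows before adding the new column; without it, many pandas versions raise a SettingWithCopyWarning because the filtered frame may still be tied to the original.

```python
import pandas as pd

# Made-up figures for illustration only.
toy = pd.DataFrame({
    "company": ["P", "Q", "R"],
    "industry": ["Software & Programming", "Banking", "Software & Programming"],
    "profits": [10.0, 5.0, 3.0],
    "assets": [100.0, 500.0, 20.0],
})

software = toy.loc[toy["industry"] == "Software & Programming"].copy()
software["roa"] = software["profits"] / software["assets"]   # row-by-row division
ranked = software.sort_values(by="roa", ascending=False)     # highest ROA first
print(ranked[["company", "roa"]])
```

Company "R" earns 3 on 20 of assets (0.15) and beats "P" with 10 on 100 (0.10), even though "P" has the larger profit.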



Exploration 5: The 5 most profitable companies in the world

ROA is about generating profits out of available assets.
Profitability, on the other hand, is about how efficiently a company makes its money.
Since profits are the revenue that remains after expenses, profitability is the share of revenue that a company keeps as profit:

profitability = profits / revenue

Like in the previous exploration, we simply calculate the profitabilities, sort in descending order and take the top 5 rows.

forbes["profitability"] = forbes["profits"] / forbes["revenue"]
print(forbes.sort_values(by=["profitability"], ascending=False).head())

and we'll get:
X5 output




The End.
I hope you enjoyed reading. <3