Number Crunching Tools
The number crunching process can be broken down into three parts:
Data collection: The first step is collecting the data. In some cases, this can be as simple as accessing a computerized database. In others, it will require running experiments or surveys.
While collecting the data, an analyst needs to make the following choices.
a. Public company vs private company data: It is far easier to obtain data for a public company than for a private one.
b. Accounting vs market data: Public companies are required by regulation to disclose their accounting data, and their market data, i.e., market prices, bid-ask spreads, trading volumes, etc., is easy to find. For a private company, accounting data is relatively difficult to gather and market data is harder still, because private equity transactions are done confidentially and there is no regulatory requirement for private companies to disclose them.
c. Domestic vs global data: Analysts need to decide whether the averages and the macro and micro numbers used in their analysis should be domestic or global.
d. Quantitative vs qualitative data: Databases tend to be mostly quantitative because quantitative data is easier to store and retrieve. However, with the surge in social media, more sophisticated techniques for reading, analyzing, and storing qualitative data have emerged.
Additionally, there are a few biases in data collection that analysts need to be aware of.
a. Selection bias: Since collecting data is not easy, the sample an analyst ends up with can introduce selection bias: the choice of a particular time period, for instance, can alter the averages and hence the decision that follows. For example, over the last twenty years, growth stocks (such as Amazon and Tesla) have outperformed the market, but over a fifty-year history, value stocks appear to have outperformed. Although this bias is often unavoidable, knowing it exists can improve decision making through introspection.
b. Survivorship bias: Databases tend to reflect only the survivors. For example, when we look at the returns of the Sensex, we look at the returns of the companies that are part of the Sensex today. Many companies have been dropped from the Sensex over the years, and the index's current returns do not reflect the losses suffered by investors who held those stocks (a sketch of this effect follows this list).
c. Noise and error: In the big data world, the sheer abundance of data means analysts get distracted by data that is actually irrelevant to their decision making. This is called noise.
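To make the survivorship effect concrete, the minimal Python sketch below simulates a hypothetical universe of 30 stocks and compares the average ten-year return of the full universe with the average of the "survivors" only. All figures are simulated and purely illustrative, not taken from any real index.

```python
import random

random.seed(7)

# Hypothetical universe: 30 stocks, each with 10 years of annual returns.
# Simulated data, used only to illustrate the mechanics of survivorship bias.
universe = [[random.gauss(0.08, 0.25) for _ in range(10)] for _ in range(30)]

def cumulative_return(annual_returns):
    """Compound a series of annual returns into one total return."""
    value = 1.0
    for r in annual_returns:
        value *= 1.0 + r
    return value - 1.0

totals = [cumulative_return(stock) for stock in universe]

# "Survivors": stocks that did not lose money over the decade.
# A survivor-only database (e.g., today's index constituents) drops the rest.
survivors = [t for t in totals if t > 0]

full_average = sum(totals) / len(totals)
survivor_average = sum(survivors) / len(survivors)

print(f"Average total return, full universe : {full_average:.1%}")
print(f"Average total return, survivors only: {survivor_average:.1%}")
# The survivor-only average overstates what an investor holding the
# whole universe would actually have earned.
```

The gap between the two averages is the survivorship bias: the losers are exactly the stocks that quietly disappear from the database.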
Data analysis: Once the data has been collected, it has to be not only summarized and described but also examined for relationships that you can use in your decision making. It is at this stage that statistical analysis comes into play.
Analysts and investors use statistical tools like averages, correlation, beta, and standard deviation to analyze the data. However, there are a few biases here as well:
a. Averages are cursed by outliers: a single extreme value can pull the mean far away from the typical result. For example, company results in the year 2020 were outliers, and most companies present their average performance with that year excluded.
b. Using standard deviation as a risk measure assumes that the data is normally distributed. In real life, data is seldom normally distributed, so relying on a normal distribution or on standard deviation alone to calculate risk can be dangerous at times (both points are illustrated in the sketch after this list).
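As a rough illustration of both points, the short sketch below uses made-up growth figures for a hypothetical company: the 2020 outlier drags the mean well below the median, and one standard deviation around that mean gives a misleading picture of a skewed distribution.

```python
import statistics

# Hypothetical earnings growth rates, 2016-2020. The 2020 figure is the
# outlier year discussed above; all numbers are invented for illustration.
growth = {2016: 0.09, 2017: 0.11, 2018: 0.08, 2019: 0.10, 2020: -0.55}

mean_all = statistics.mean(growth.values())
mean_ex_2020 = statistics.mean(v for y, v in growth.items() if y != 2020)
median_all = statistics.median(growth.values())

print(f"Mean (all years)     : {mean_all:.1%}")      # dragged down by the outlier
print(f"Mean (excluding 2020): {mean_ex_2020:.1%}")
print(f"Median (all years)   : {median_all:.1%}")    # a more robust central measure

# Standard deviation describes risk well only for roughly symmetric,
# normal-ish data. With one large negative outlier the distribution is
# skewed, so "mean minus one standard deviation" misstates the downside.
stdev = statistics.stdev(growth.values())
print(f"Standard deviation   : {stdev:.1%}")
print(f"Mean - 1 std dev     : {mean_all - stdev:.1%}")
```

Running the sketch shows the mean swinging sharply depending on whether the outlier year is included, while the median barely moves, which is why excluding or at least flagging outlier years matters before quoting an "average" performance.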
Presentation: Having analyzed the data, you have to present it, not only so that others can see and use the information you have gleaned from the data, but also so that you yourself have a sense of what that information is.
From the next chapter onwards, we will start by building narratives and then connecting them with numbers, drawing on the learnings from the previous chapters.