A Complete Guide to Dissertation Data Analysis

The analysis chapter is one of the most important parts of a dissertation: it is where you demonstrate your unique research abilities. That is why it often accounts for up to 40% of the total mark. Given the significance of this chapter, it is essential to build your skills in dissertation data analysis.

Typically, the analysis section provides the output of calculations, an interpretation of the results obtained and a discussion of these results in light of theories and previous empirical evidence. Oftentimes, the chapter presents qualitative data analysis that does not require any calculations. Since there are different types of research design, let’s look at each type individually.

1. Types of Research

The dissertation topic you have selected informs, to a considerable degree, the way you are going to collect and analyse data. Some topics imply the collection of primary data, while others can be explored using secondary data. Selecting an appropriate data type is not only vital for achieving the main aim and objectives of your dissertation but is also an important part of the dissertation writing process, since it is what your whole project will rest on.

Selecting the most appropriate data type for your dissertation may not be as straightforward as it seems. As you dive deeper into your research, you will discover more and more details and nuances associated with each type of data. At some point, you will need to decide whether to pursue a qualitative or a quantitative research design.

1.1. Qualitative vs Quantitative Research

1.1.1. Quantitative Research

Quantitative data is any numerical data which can be used for statistical analysis and mathematical manipulations. This type of data can be used to answer research questions such as ‘How often?’, ‘How much?’, and ‘How many?’. Studies that use this type of data also ask the ‘What’ questions (e.g. What are the determinants of economic growth? To what extent does marketing affect sales? etc.).

An advantage of quantitative data is that it can be verified and conveniently evaluated by researchers. This allows for replicating the research outcomes. In addition, even qualitative data can be quantified and converted to numbers. For example, the use of the Likert scale allows researchers not only to properly assess respondents’ perceptions of and attitudes towards certain phenomena but also to assign a code to each individual response and make it suitable for graphical and statistical analysis. It is also possible to convert the yes/no responses to dummy variables to present them in the form of numbers. Quantitative data is typically analysed using dissertation data analysis software such as Eviews, Matlab, Stata, R, and SPSS.
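
As a rough illustration of this kind of coding, here is a minimal sketch in Python using pandas (a tool not mentioned in this guide; the column names and responses are made up):

```python
import pandas as pd

# Hypothetical survey responses (illustrative only)
responses = pd.DataFrame({
    "uses_social_media": ["yes", "no", "yes", "yes"],
    "satisfaction": ["Agree", "Strongly agree", "Neutral", "Disagree"],  # Likert item
})

# Convert yes/no answers into a 0/1 dummy variable
responses["uses_social_media_dummy"] = (responses["uses_social_media"] == "yes").astype(int)

# Assign numerical codes to the Likert scale so it can be summarised statistically
likert_codes = {"Strongly disagree": 1, "Disagree": 2, "Neutral": 3,
                "Agree": 4, "Strongly agree": 5}
responses["satisfaction_code"] = responses["satisfaction"].map(likert_codes)

print(responses)
```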

On the other hand, a significant limitation of purely quantitative methods is that social phenomena explored in economic and behavioural sciences are often complex, so the use of quantitative data does not allow for thoroughly analysing these phenomena. That is, quantitative data can be limited in terms of breadth and depth as compared to qualitative data, which may allow for richer elaboration on the context of the study.

1.1.2. Qualitative Research

Studies that use this type of data usually ask the ‘Why’ and ‘How’ questions (e.g. Why is social media marketing more effective than traditional marketing? How do consumers make their purchase decisions?). This is non-numerical primary data represented mostly by the opinions of relevant persons.

Qualitative data also includes any textual or visual data (infographics) that have been gathered from reports, websites and other secondary sources that do not involve interactions between the researcher and human participants. Examples of the use of secondary qualitative data are texts, images and diagrams you can use in SWOT analysis, PEST analysis, 4Ps analysis, Porter’s Five Forces analysis, most types of Strategic Analysis, etc. Academic articles, journals, books, and conference papers are also examples of secondary qualitative data you can use in your study.

The analysis of qualitative data usually provides deep insights into the phenomenon or issue under study because respondents are not limited in their ability to give detailed answers. Unlike quantitative research, collecting and analysing qualitative data is more open-ended, eliciting the anecdotes, stories, and lengthy descriptions and evaluations people make of products, services, lifestyle attributes, or any other phenomenon. This approach is best suited to social sciences, including management and marketing.

It is not always possible to summarise qualitative data, as opinions expressed by individuals are multi-faceted. This limits dissertation data analysis to some extent because it is not always possible to establish cause-and-effect links between factors represented in a qualitative manner. For this reason, the results of qualitative analysis can hardly be generalised, which is why case studies that explore very narrow contexts are often conducted.

For qualitative data analysis, you can use tools such as NVivo and Tableau.

1.2. Primary vs Secondary Research

1.2.1. Primary Data

Primary data is data that did not exist prior to your research; you collect it by means of surveys or interviews for the dissertation data analysis chapter. Interviews provide you with the opportunity to collect detailed insights from industry participants about their company, customers, or competitors. Questionnaire surveys allow you to obtain a large amount of data from a sizeable population in a cost-efficient way. Primary data is usually cross-sectional (i.e., collected at one point in time from different respondents). Time series are rarely, if ever, found in primary data. Nonetheless, depending on the research aims and objectives, certain designs of data collection instruments allow researchers to conduct a longitudinal study.

1.2.2. Secondary Data

This data already exists before the research, as it has already been generated, refined, summarised and published in official sources for purposes other than those of your study. Secondary data often carries more legitimacy than primary data and can help the researcher verify primary data. This is data collected from databases or websites; it does not involve human participants. It can be either cross-sectional (e.g. an indicator for different countries/companies at one point in time) or time series (e.g. an indicator for one company/country over several years). A combination of cross-sectional and time-series data is panel data. Therefore, all a researcher needs to do is find the data that is most appropriate for attaining the research objectives.

Examples of secondary quantitative data are share prices; accounting information such as earnings, total assets, revenue, etc.; macroeconomic variables such as GDP, inflation, unemployment, interest rates, etc.; and microeconomic variables such as market share, concentration ratio, etc. Accordingly, dissertation topics that will most likely use secondary quantitative data are FDI dissertations, Mergers and Acquisitions dissertations, Event Studies, Economic Growth dissertations, International Trade dissertations, and Corporate Governance dissertations.

Two main limitations of secondary data are the following. First, freely available secondary data may not perfectly suit the purposes of your study, so you may have to collect primary data in addition or change the research objectives. Second, not all high-quality secondary data is freely available. Good sources of financial data such as WRDS, Thomson Bank Banker, Compustat and Bloomberg all require paid access, which may not be affordable for an individual researcher.

1.3. Quantitative or Qualitative Research… or Both?

Once you have formulated your research aim and objectives and reviewed the most relevant literature in your field, you should decide whether you need qualitative or quantitative data.

If you want to test relationships between variables or examine hypotheses and theories in practice, you should focus on collecting quantitative data. Methodologies based on this type of data provide cut-and-dried results and are highly effective when you need to obtain a large amount of data in a cost-effective manner. Alternatively, qualitative research will help you better understand meanings, experiences, beliefs, values and other non-numerical relationships.

While it is totally okay to use either a qualitative or quantitative methodology, using them together will allow you to back up one type of data with another type of data and research your topic in more depth. However, note that using qualitative and quantitative methodologies in combination can take much more time and effort than you originally planned.

2. Types of Analysis

2.1. Basic Statistical Analysis

The type of statistical analysis that you choose for the results and findings chapter depends on the extent to which you wish to analyse the data and summarise your findings. If you do not major in quantitative subjects but write a dissertation in social sciences, basic statistical analysis will be sufficient. Such an analysis would be based on descriptive statistics such as the mean, the median, standard deviation, and variance. Then, you can enhance the statistical analysis with visual information by showing the distribution of variables in the form of graphs and charts. However, if you major in a quantitative subject such as accounting, economics or finance, you may need to use more advanced statistical analysis.
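
For instance, a minimal sketch of such a basic analysis in Python using pandas (the data and column names are invented; the tools listed elsewhere in this guide, such as SPSS or Stata, would produce equivalent output) might look like this:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
# Hypothetical dataset: monthly sales figures for 100 firms (made-up numbers)
df = pd.DataFrame({"sales": rng.normal(loc=50, scale=10, size=100)})

# Basic descriptive statistics
print("Mean:", df["sales"].mean())
print("Median:", df["sales"].median())
print("Standard deviation:", df["sales"].std())
print("Variance:", df["sales"].var())

# A simple histogram shows the distribution of the variable
df["sales"].plot(kind="hist", bins=20, title="Distribution of sales")
plt.show()
```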

2.2. Advanced Statistical Analysis

In order to run an advanced analysis, you will most likely need access to statistical software such as Matlab, R or Stata. Whichever program you choose to proceed with, make sure that it is properly documented in your research. Further, using an advanced statistical technique ensures that you are analysing all possible aspects of your data. For example, a difference between basic regression analysis and analysis at an advanced level is that you will need to consider additional tests and deeper explorations of statistical problems with your model. Also, you need to keep the focus on your research question and objectives as getting deeper into statistical details may distract you from the main aim. Ultimately, the aim of your dissertation is to find answers to the research questions that you defined.

Another important aspect to consider here is that the results and findings section is not all about numbers. Apart from tables and graphs, it is also important to ensure that the interpretation of your statistical findings is accurate as well as engaging for the users. Such a combination of advanced statistical software along with a convincing textual discussion goes a long way in ensuring that your dissertation is well received. Although the use of such advanced statistical software may provide you with a variety of outputs, you need to make sure to present the analysis output properly so that the readers understand your conclusions.

3. Examples of Methods of Analysis

3.1. Event Study

If you are studying the effects of particular events on the prices of financial assets, for example, it is worth considering the event study methodology. Events such as mergers and acquisitions, new product launches, expansion into new markets, earnings announcements and public offerings can have a major impact on stock prices and the valuation of a firm. Event studies are methods used to measure the impact of a particular event or a series of events on market value. The concept behind this is to try to understand whether sudden and abnormal stock returns can be attributed to market information pertaining to an event.

Event studies are based on the efficient market hypothesis. According to this theory, in an efficient capital market, all new and relevant information is immediately reflected in the respective asset prices. Although the theory is not universally applicable, there are many instances in which it holds true. An event study implies a step-by-step analysis of the impact that a particular announcement has on a company’s valuation. In normal conditions, without the influence of the analysed event, it is assumed that the expected return on a stock is determined by the risk-free rate, the systematic risk of the stock and the risk premium required by investors. This expected return is typically estimated using the capital asset pricing model (CAPM).
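
As a quick illustration of the CAPM benchmark, here is a minimal sketch in Python (all numbers are made up for illustration):

```python
# CAPM: expected return = risk-free rate + beta * (market return - risk-free rate)
risk_free_rate = 0.02          # hypothetical annual risk-free rate
beta = 1.2                     # hypothetical systematic risk of the stock
expected_market_return = 0.08  # hypothetical expected market return

expected_return = risk_free_rate + beta * (expected_market_return - risk_free_rate)
print(f"Expected (normal) return under CAPM: {expected_return:.2%}")  # 9.20%
```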

There are primarily three types of announcements that can constitute event studies: corporate announcements, macroeconomic announcements and regulatory events. As the name suggests, corporate announcements include bankruptcies, asset sales, M&As, credit rating downgrades, earnings announcements and announcements of dividends. These events usually have a major impact on stock prices simply because they are directly linked to the company. Macroeconomic announcements include central bank announcements of changes in interest rates and announcements of inflation and economic growth projections. Finally, regulatory announcements such as policy changes and announcements of new laws can also impact the stock prices of companies, and therefore can be measured using the event study method.

A critical issue in event studies is choosing the right event window during which the analysed announcements are assumed to produce the strongest effect on share prices. According to the efficient market hypothesis, prices should adjust immediately upon an announcement, so no statistically significant abnormal returns would be expected before or long after the event. However, in reality, there could be rumours before official announcements, and some investors may act on such rumours. Moreover, investors may react at different times due to differences in the speed of information processing and reaction. In order to account for these factors, event windows usually capture a short period before the announcement to account for rumours and an asymmetrical period after the announcement.

In order to make event studies stronger and statistically meaningful, a large number of similar or related cases are analysed. Then, abnormal returns are cumulated, and their statistical significance is assessed. The t-statistic is often used to evaluate whether the average abnormal returns are different from zero. So, researchers who use event studies are concerned not only with the positive or negative effects of specific events but also with the generalisation of the results and measuring the statistical significance of abnormal returns.
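
Below is a minimal sketch of this final step in Python using scipy (the returns are randomly generated placeholders, not real data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: realised event-window returns for 50 comparable announcements
# and the corresponding CAPM-based expected ("normal") returns (made-up numbers)
realised_returns = rng.normal(loc=0.012, scale=0.03, size=50)
expected_returns = np.full(50, 0.004)

# Abnormal return = realised return - expected (normal) return
abnormal_returns = realised_returns - expected_returns

# One-sample t-test: are average abnormal returns different from zero?
t_stat, p_value = stats.ttest_1samp(abnormal_returns, popmean=0.0)
print(f"Mean abnormal return: {abnormal_returns.mean():.4f}")
print(f"t-statistic: {t_stat:.2f}, p-value: {p_value:.3f}")
```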

3.2. Regression Analysis

Regression analysis is a mathematical method applied to determine how explored variables are interconnected. In particular, the following questions can be answered. Which factors are the most influential ones? Which of them can be ignored? How do the factors interact with one another? And the main question, how significant are the findings?

The type most often applied in dissertation studies is ordinary least squares (OLS) regression, which estimates the parameters of linear relationships between the explored variables. Typically, three forms of OLS analysis are used.

Longitudinal analysis is applied when a single object with several characteristics is explored over a long period of time. In this case, observations represent changes in the same characteristics over time. Examples of longitudinal samples are macroeconomic parameters in a particular country, or the preferences and health characteristics of particular persons during their lives. Cross-sectional studies, on the contrary, explore the characteristics of many similar objects such as respondents, companies, countries, students or cities at a certain moment in time. The main similarity between longitudinal and cross-sectional studies is that the data vary over only one dimension, namely across periods of time (days, weeks, years) or across objects, respectively.

However, it is often the case that we need to explore data that change over two dimensions, both across objects and periods of time. In this case, we need to use a panel regression analysis. Its main distinction from the two mentioned above is that specifics of each object (person, company, country) are accounted for.

The common steps of regression analysis are the following (a code sketch illustrating them follows the list):

  • Start with descriptive statistics of the data. This is done to indicate the scope of the observations included in the sample and to identify potential outliers. A common practice is to remove the outliers to avoid distorting the analysis results.
  • Estimate potential multicollinearity. This phenomenon is connected with strong correlation between explanatory variables. Multicollinearity is an undesirable feature of the sample, as the regression results, in particular the significance of certain variables, may be distorted. Once multicollinearity is detected, the easiest way to eliminate it is to omit one of the correlated variables.
  • Run regressions. First, the overall significance of the model is assessed using the F-statistic. After that, the significance of individual variable coefficients is assessed using t-statistics.
  • Don’t forget about diagnostic tests. They are conducted to detect potential imperfections of the sample that could affect the regression outcomes.
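
Here is a minimal sketch of these steps in Python using statsmodels (a tool not named in the text; the dataset and variable names are invented for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)

# Hypothetical cross-sectional dataset (illustrative only)
n = 200
df = pd.DataFrame({
    "marketing_spend": rng.normal(100, 20, n),
    "firm_size": rng.normal(50, 10, n),
})
df["sales"] = 10 + 0.8 * df["marketing_spend"] + 0.3 * df["firm_size"] + rng.normal(0, 5, n)

# Step 1: descriptive statistics to gauge the scope of the data and spot outliers
print(df.describe())

# Step 2: variance inflation factors (VIFs) to check for multicollinearity
X = sm.add_constant(df[["marketing_spend", "firm_size"]])
vifs = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vifs)

# Step 3: run the OLS regression; the F-statistic tests overall significance,
# the t-statistics test individual coefficients
model = sm.OLS(df["sales"], X).fit()
print(model.summary())
```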

Some nuances should be mentioned. When a time-series OLS regression is conducted, it is feasible to conduct a full battery of diagnostic tests, including tests of linearity (the relationship between the independent and dependent variables should be linear); homoscedasticity (the regression residuals should have the same variance); independence of observations; normality of the residuals; and serial correlation (there should be no patterns in the residuals over time). These tests for longitudinal regression models are available in most software tools such as Eviews and Stata.
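
By way of illustration, here is a minimal sketch of such diagnostics in Python's statsmodels (synthetic data; the RESET test for linearity requires a reasonably recent statsmodels version, and the exact set of tests on offer differs between software tools):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, acorr_breusch_godfrey, linear_reset
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(2)

# Hypothetical time series (illustrative only)
n = 120
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.8, size=n)

X = sm.add_constant(x)
results = sm.OLS(y, X).fit()

# Linearity: Ramsey RESET test
print("RESET test:", linear_reset(results))

# Homoscedasticity: Breusch-Pagan test on the residuals
bp_stat, bp_pvalue, _, _ = het_breuschpagan(results.resid, results.model.exog)
print("Breusch-Pagan p-value:", bp_pvalue)

# Normality of residuals: Jarque-Bera test
jb_stat, jb_pvalue, _, _ = jarque_bera(results.resid)
print("Jarque-Bera p-value:", jb_pvalue)

# Serial correlation: Breusch-Godfrey test
bg_stat, bg_pvalue, _, _ = acorr_breusch_godfrey(results, nlags=4)
print("Breusch-Godfrey p-value:", bg_pvalue)
```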

3.3. Vector Autoregression

A vector autoregression model (VAR) is a model often used in statistical analysis, which explores interrelationships between several variables that are all treated as endogenous. So, a specific trait of this model is that it includes lagged values of the employed variables as regressors. This allows for estimating not only the instantaneous effects but also dynamic effects in the relationships up to n lags.

In fact, a VAR model consists of k OLS regression equations where k is the number of employed variables. Each equation has its own dependent variable while the explanatory variables are the lagged values of this variable and other variables.
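
In standard notation, a VAR with p lags can be written as

$$y_t = c + A_1 y_{t-1} + A_2 y_{t-2} + \dots + A_p y_{t-p} + \varepsilon_t,$$

where $y_t$ is the $k \times 1$ vector of variables, $c$ is a vector of intercepts, each $A_i$ is a $k \times k$ matrix of coefficients and $\varepsilon_t$ is a vector of error terms.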

  • Selection of the optimal lag length

Information criteria (IC) are employed to determine the optimal lag length. The most commonly used ones are the Akaike, Hannan-Quinn and Schwarz criteria.
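
A minimal sketch of lag selection in Python's statsmodels (synthetic data and made-up variable names; the reported criteria correspond to the AIC, BIC/Schwarz and Hannan-Quinn statistics):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(3)

# Hypothetical dataset with two stationary series (illustrative only)
df = pd.DataFrame(rng.normal(size=(200, 2)), columns=["gdp_growth", "inflation"])

# Compare information criteria for lag lengths up to 8
lag_selection = VAR(df).select_order(maxlags=8)
print(lag_selection.summary())
```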

  • Test for stationarity

Widely used methods for testing stationarity are the Augmented Dickey-Fuller test and the Phillips-Perron test. If a variable is non-stationary, the first difference should be taken and tested for stationarity in the same way.
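
A minimal sketch of the Augmented Dickey-Fuller test in Python's statsmodels (synthetic random-walk data; the Phillips-Perron test is not part of statsmodels itself but is available in, for example, the arch package):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(4)

# Hypothetical non-stationary series: a random walk (illustrative only)
series = pd.Series(np.cumsum(rng.normal(size=200)))

# Augmented Dickey-Fuller test: H0 = the series has a unit root (non-stationary)
adf_stat, p_value, *_ = adfuller(series)
print(f"ADF statistic: {adf_stat:.2f}, p-value: {p_value:.3f}")

# If non-stationary, take the first difference and test again
adf_stat_diff, p_value_diff, *_ = adfuller(series.diff().dropna())
print(f"ADF on first difference: {adf_stat_diff:.2f}, p-value: {p_value_diff:.3f}")
```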

  • Cointegration test

The variables may be non-stationary but integrated of the same order. In this case, they can be analysed with a Vector Error Correction Model (VECM) instead of a VAR. The Johansen cointegration test is conducted to check whether variables integrated of the same order share a common cointegrating vector (or vectors). If the variables are cointegrated, a VECM is applied in the subsequent analysis instead of a VAR model. A VECM is applied to non-transformed, non-stationary series, whereas a VAR is run with transformed or stationary inputs.
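
A minimal sketch of the Johansen test in Python's statsmodels (the two price series are simulated so that they share a common stochastic trend; the deterministic term and lag settings are illustrative):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(5)

# Hypothetical pair of I(1) series sharing a common trend (illustrative only)
n = 200
common_trend = np.cumsum(rng.normal(size=n))
df = pd.DataFrame({
    "price_a": common_trend + rng.normal(scale=0.5, size=n),
    "price_b": 0.8 * common_trend + rng.normal(scale=0.5, size=n),
})

# Johansen test: det_order=0 includes a constant, k_ar_diff is the number of lagged differences
result = coint_johansen(df, det_order=0, k_ar_diff=1)
print("Trace statistics:", result.lr1)
print("Critical values (90%, 95%, 99%):", result.cvt)
```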

  • Model Estimation

A VAR model is run with the chosen number of lags, and the coefficients, along with their standard errors and the respective t-statistics, are calculated to assess statistical significance.
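
A minimal sketch of the estimation step in Python's statsmodels (synthetic data; two lags are chosen purely for illustration):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(6)

# Hypothetical stationary dataset (illustrative only)
df = pd.DataFrame(rng.normal(size=(200, 2)), columns=["gdp_growth", "inflation"])

# Fit a VAR with the lag length chosen at the previous step (2 lags here for illustration)
results = VAR(df).fit(2)
print(results.summary())  # coefficients, standard errors and t-statistics per equation
```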

  • Diagnostic tests

Next, the model is tested for serial correlation using the Breusch-Godfrey test, for heteroscedasticity using the Breusch-Pagan test and for stability.
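
Note that the exact tests on offer differ between packages: statsmodels, for instance, does not expose Breusch-Godfrey or Breusch-Pagan tests for VAR models directly, but provides a Portmanteau (whiteness) test for residual autocorrelation, a residual normality test and a stability check, as sketched below (same synthetic data as above):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(6)
df = pd.DataFrame(rng.normal(size=(200, 2)), columns=["gdp_growth", "inflation"])
results = VAR(df).fit(2)

print(results.test_whiteness(nlags=10).summary())  # residual autocorrelation (Portmanteau)
print(results.test_normality().summary())          # residual normality
print(results.is_stable())                         # True if all roots lie inside the unit circle
```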

  • Impulse Response Functions (IRFs)

The IRFs are used to graphically represent the results of a VAR model and project the effects of variables on one another.
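
A minimal sketch of generating IRFs in Python's statsmodels (same synthetic data; the 10-period horizon is arbitrary):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(6)
df = pd.DataFrame(rng.normal(size=(200, 2)), columns=["gdp_growth", "inflation"])
results = VAR(df).fit(2)

# Impulse response functions over 10 periods: how a one-off shock to one variable
# propagates to the others over time
irf = results.irf(10)
irf.plot(orth=False)
```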

  • Granger causality test

The variables may be related but there may exist no causal relationships between them, or the effect may be bilateral. The Granger test indicates the causal associations between the variables and shows the direction of causality based on interaction of current and past values of a pair of variables in the VAR system.
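
A minimal sketch of a Granger causality test on a fitted VAR in Python's statsmodels (same synthetic data and made-up variable names):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(6)
df = pd.DataFrame(rng.normal(size=(200, 2)), columns=["gdp_growth", "inflation"])
results = VAR(df).fit(2)

# Does inflation Granger-cause gdp_growth? H0: lags of inflation do not help
# predict gdp_growth beyond the lags of gdp_growth itself.
test = results.test_causality("gdp_growth", ["inflation"], kind="f")
print(test.summary())
```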