An introduction to BigData in Finance: the econometric point of view


Ultra-high-frequency (UHF) data are probably the clearest example of what the advent of BigData means for financial markets.

It is well known that over the last decade the availability of high-frequency data has rapidly disrupted the financial industry as we knew it. What may be less well known is the impact from an econometric point of view. Because of their peculiar characteristics, UHF data have called into question most of the models traditionally used in econometric analysis. Despite the problems these data pose, academic interest in high-frequency data keeps growing. This attention is motivated by the great intrinsic potential of tick-by-tick observations. Studying these data makes it possible to understand the complexity of financial instruments in depth and to assess the financial risk involved precisely. They are also of direct interest for market microstructure research: they allow researchers to model real-time market dynamics and to analyse in depth the market infrastructure, the mechanisms behind the price discovery process, and the strategic behaviour of market participants. Finally, practitioners base their trading decisions on tick-by-tick data, since these are the original form of market prices. Hence, high-frequency data should be the primary object of research for the analysis of financial markets.

Yet what makes the analysis of tick data so problematic? The difficulties stem from a wide range of microstructure effects that are usually summarized under the label “microstructure noise”, denoted ε.

We can roughly divide the microstructure characteristics generating ε into three main groups, each encompassing a wide variety of phenomena:

  • The frictions inherent in the trading process (e.g. rounding errors, bid-ask bounces, trading occurring on different markets, etc.).
  • Informational effects, such as the gradual response of prices to a block trade, the strategic component of the order flow, the differences in trade sizes or in the informational content of price changes, etc.
  • Data recording errors or measurement errors, such as prices entered as zero, missing or wrong observations, observations recorded with a delay, etc.
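
To make the role of ε concrete, here is a minimal sketch in Python. It assumes, purely for illustration, an additive model in which the observed log-price equals an unobserved "efficient" log-price plus ε, with ε produced by a bid-ask bounce of fixed half-spread. The function names, the parameter values and the choice of realized variance as the quantity of interest are all assumptions of this sketch, not taken from any specific dataset or model.

```python
import random


def simulate_prices(n_ticks, sigma=0.0001, half_spread=0.0005, seed=0):
    """Simulate an efficient log-price and a noisy observed log-price.

    Assumption: observed = efficient + epsilon, where epsilon is
    +half_spread or -half_spread with equal probability (bid-ask bounce).
    """
    rng = random.Random(seed)
    efficient = [0.0]
    for _ in range(n_ticks):
        # Efficient price follows a Gaussian random walk (assumption).
        efficient.append(efficient[-1] + rng.gauss(0.0, sigma))
    observed = [x + half_spread * rng.choice((-1, 1)) for x in efficient]
    return efficient, observed


def realized_variance(log_prices, step=1):
    """Sum of squared log-returns, sampling every `step`-th observation."""
    sampled = log_prices[::step]
    return sum((b - a) ** 2 for a, b in zip(sampled, sampled[1:]))


efficient, observed = simulate_prices(10_000)

rv_efficient = realized_variance(efficient)
rv_observed = realized_variance(observed)
rv_observed_sparse = realized_variance(observed, step=100)

print("RV, efficient price, every tick:   ", rv_efficient)
print("RV, observed price, every tick:    ", rv_observed)
print("RV, observed price, every 100 ticks:", rv_observed_sparse)
```

Under these assumptions the realized variance computed on every observed tick is dominated by the noise rather than by the efficient price. Sampling more sparsely reduces the contamination, but at the cost of discarding most of the data.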

However, the most striking feature of high-frequency data (HFD) is the unequal spacing of observations, to the point that multiple transactions can be registered in the same second at different prices. While such cases can be due to misrecorded observations, the time scale used for HFD is rounded to the millisecond or, more frequently, to the second; it can therefore easily happen that several consecutive transactions occurring between two consecutive instants, say t_i and t_{i+1}, are recorded with the same timestamp.
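
As a rough illustration of this point, the sketch below collapses trades that share a timestamp into a single observation and then computes the irregular waiting times between the remaining observations. Using the median price to aggregate is only one possible convention, chosen here as an assumption. The tick values and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def aggregate_same_timestamp(ticks):
    """Collapse trades sharing a timestamp into one (timestamp, price) pair.

    `ticks` is an iterable of (timestamp_in_seconds, price) tuples.
    The median price is used as the representative price (assumption).
    """
    grouped = defaultdict(list)
    for timestamp, price in ticks:
        grouped[timestamp].append(price)
    return [(ts, median(prices)) for ts, prices in sorted(grouped.items())]


def durations(ticks):
    """Waiting times between consecutive observations."""
    times = [ts for ts, _ in ticks]
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical ticks: three trades are stamped with the same second.
ticks = [
    (34200, 100.00),
    (34200, 100.02),
    (34200, 100.01),
    (34203, 100.03),
    (34210, 99.99),
]

clean = aggregate_same_timestamp(ticks)
print(clean)             # one observation per timestamp
print(durations(clean))  # unequal spacing between observations
```

After aggregation each timestamp appears once, and the durations between observations are themselves a random quantity.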

The unequal spacing of tick-by-tick data is probably the most complex feature to deal with in an econometric context: because events occur at random times, the researcher must develop new statistical tools suited to this kind of data in order to avoid biased estimates.

Undoubtedly, despite the great advances made in the high-frequency field, studying financial markets at this level of detail will continue to pose new challenges to researchers, but it also promises findings that could strongly influence the financial industry.