Modeling the order book is certainly one of the major challenges of contemporary econometrics. The constant and fast worldwide growth of electronic trading platforms, together with the increasing number of traders, has led to a market of extreme complexity. Traditional trading by human traders has rapidly been replaced by algorithms that respond and adapt to changes in the market and trade accordingly at extraordinary speed.

As a consequence, a primary problem is the sheer volume of transactions, in the sense of the number of events that take place and the amount of information that needs to be processed. On trading platforms such as NYSE, the size of the compressed raw market data (the so-called ITCH feed) is likely to be around 25 GB for a single trading day. The complexity of order book modeling therefore goes beyond the theoretical financial-econometric issues and begins with the problem of feasible and efficient data management: data accessibility, information extraction, and fast computation of variables that may require looking up millions of records in the files. Consider that algorithms commonly trade at millisecond frequency and that we very likely observe several events per millisecond, throughout the market opening hours and the pre- and post-opening sessions. At first glance, order book data looks like a massive list of events with some corresponding features, of highly stochastic nature, i.e. chaos.
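To make the scale concrete, the number of events per millisecond can be counted in a single streaming pass, without holding a full trading day in memory. The following is only a minimal sketch: it assumes a hypothetical input of event timestamps expressed in nanoseconds since midnight, and the function name and sample values are illustrative, not taken from any actual feed.

```python
from collections import Counter

NANOS_PER_MILLI = 1_000_000


def events_per_millisecond(timestamps_ns):
    """Count events in each millisecond bucket in one streaming pass.

    `timestamps_ns` is any iterable of integer timestamps in nanoseconds
    (an assumed format, chosen for illustration only).
    """
    counts = Counter()
    for ts in timestamps_ns:
        # Integer division maps a nanosecond timestamp to its millisecond bucket.
        counts[ts // NANOS_PER_MILLI] += 1
    return counts


# Hypothetical timestamps around 09:30:00, not real market data.
sample = [
    34_200_000_000_100,
    34_200_000_400_000,
    34_200_000_900_000,
    34_200_001_200_000,
]
counts = events_per_millisecond(sample)
busiest_ms, n_events = counts.most_common(1)[0]
print(busiest_ms, n_events)  # 34200000 3
```

Because the function accepts any iterable, the same code could be fed by a generator that reads a large file line by line, so memory use grows with the number of distinct milliseconds rather than with the number of events.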

Traders (whether natural persons or not) access the market by submitting orders. An order expresses the willingness of the trader to buy or sell a given quantity of a share at a given price. Orders can be submitted against the best available price or not; we speak of market orders and limit orders, respectively. The purpose of a market order is to trade immediately (e.g. a buy market order gets matched with the outstanding best sell orders, leading to a trade). Limit orders, on the other hand, are not matched automatically, since they are submitted at an arbitrary price that is not the best one. The trader therefore holds the (rational) belief that the market will move until the submitted limit order reaches the current best price level and gets traded against a market order. Clearly, the mechanism of order submission is highly stochastic, since each trader places orders based on their own beliefs (e.g. different algorithms): prices, quantities, bid or ask side and frequency of submission are entirely up to the trader. In addition, a submitted order can be canceled, and the cancellation can be either total (all the submitted quantity is removed) or partial. The same applies to trades: the market order quantity does not necessarily equal the quantity available to match, so a trade occurs but the residual quantity remains in the queue.

Order submissions, trades and cancellations are the three types of events that affect the order book. The time-ordered list of these events is the so-called message book. A snapshot of the outstanding limit orders at a given moment instead depicts the so-called order book state. Algorithms are driven by sophisticated machine learning techniques to submit the three types of messages mentioned above, competing with each other to get a trade and submitting orders and cancellations based on some optimality criteria.
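The relation between the message book and the order book state can be illustrated by replaying a short list of messages. This is a toy sketch under stated assumptions, not the matching engine of any exchange: the `Message` fields, the `OrderBook` class, the price-time priority rule and the simplification that limit orders never cross the spread are all introduced here for illustration.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Message:
    """One event of the message book (field names are hypothetical)."""
    kind: str           # "limit", "market" or "cancel"
    side: str = ""      # "buy" or "sell" (unused for cancels)
    qty: int = 0        # quantity to submit, trade or cancel
    price: float = 0.0  # limit price (unused for market orders and cancels)
    order_id: int = 0   # id of the limit order submitted or canceled


class OrderBook:
    """Toy limit order book; price-time priority is an assumption."""

    def __init__(self):
        # order_id -> [side, price, remaining qty]; insertion order = arrival time
        self.orders = OrderedDict()
        self.trades = []  # executed (price, qty) pairs

    def apply(self, msg):
        if msg.kind == "limit":
            # Assumed non-marketable: the order simply joins the queue.
            self.orders[msg.order_id] = [msg.side, msg.price, msg.qty]
        elif msg.kind == "cancel":
            self._reduce(msg.order_id, msg.qty)  # total or partial cancel
        elif msg.kind == "market":
            self._match(msg.side, msg.qty)

    def _reduce(self, order_id, qty):
        order = self.orders.get(order_id)
        if order is None:
            return
        order[2] -= qty
        if order[2] <= 0:
            del self.orders[order_id]

    def _match(self, side, qty):
        opposite = "sell" if side == "buy" else "buy"
        sign = 1 if opposite == "sell" else -1  # lowest ask / highest bid first
        # sorted() is stable, so equal prices keep their arrival order.
        queue = sorted(
            ((oid, o) for oid, o in self.orders.items() if o[0] == opposite),
            key=lambda item: sign * item[1][1],
        )
        for oid, order in queue:
            if qty == 0:
                break
            traded = min(qty, order[2])
            self.trades.append((order[1], traded))
            qty -= traded
            self._reduce(oid, traded)  # any residual quantity stays in the queue

    def state(self):
        """Snapshot of outstanding limit orders: {side: {price: total qty}}."""
        snap = {"buy": {}, "sell": {}}
        for side, price, qty in self.orders.values():
            snap[side][price] = snap[side].get(price, 0) + qty
        return snap


# A hypothetical message book: three submissions, one trade, one partial cancel.
messages = [
    Message("limit", "sell", 100, 10.2, order_id=1),
    Message("limit", "sell", 50, 10.1, order_id=2),
    Message("limit", "buy", 80, 9.9, order_id=3),
    Message("market", "buy", 70),
    Message("cancel", qty=30, order_id=3),
]
book = OrderBook()
for m in messages:
    book.apply(m)

print(book.trades)   # [(10.1, 50), (10.2, 20)]
print(book.state())  # {'buy': {9.9: 50}, 'sell': {10.2: 80}}
```

The buy market order for 70 shares first consumes the 50 shares at the best ask (10.1) and then 20 of the 100 shares at 10.2, leaving 80 in the queue; the partial cancel then reduces the bid at 9.9 from 80 to 50. The final snapshot is the order book state implied by the message book.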

In this framework, it is clear that the mathematical modeling of the dynamics of order book states and features is a non-trivial problem, and there is no unique approach to tackling it. There are analytic models relying on market-based assumptions, as well as purely data-driven models that exploit machine learning techniques to predict the short-term dynamics of the book. Interestingly, there are also models that rely on chaos theory, using techniques commonly applied in the natural sciences to describe phenomena ruled by randomness only, such as particle dynamics in gases. These models try to characterize the chaotic and apparently illogical behavior of the book, giving up on finding logical patterns and on seeking financially justified hypotheses on which to build.

On the one hand, the number of agents trading in the order book implies a large number of algorithm developers who, almost exclusively with machine learning techniques, attempt to predict its future dynamics, more or less successfully. On the other hand, financial-econometric modeling of the order book is a non-saturated research field that gives researchers many directions to explore.

Being part of the BigData finance group and working specifically on such a contemporary and challenging topic, rather than making me feel alone in a desolate desert of high complexity and apparently irrational stochasticity, makes me feel excited about the research opportunities this area provides. Since I started this journey, I have had to develop a number of cross-disciplinary skills (e.g. cloud computing, big data analytics, machine learning techniques, econometric theory for point processes) that contemporary researchers must handle and be acquainted with to deal with this kind of problem. Although far from being an expert, I sense that this set of multidisciplinary skills will be required of future data analysts and scientists, and I am glad to have the opportunity to work in this direction within the BigData finance network.

*Martin Magris is based at Tampere University of Technology 2016-2019, and his research project is Order Books Dynamics and Announcement Effects during Financial Crisis (WP3)*