The main training objective of BigDataFinance is to meet the increasing commercial demand for well-trained researchers with experience in both Big Data techniques and Finance. The main research objective is to develop and implement new quantitative models and econometric methods for empirical financial research and risk management by bridging the gap between research methodologies in Finance and Data Science. The ITN has four Work Packages (WP) with the following sub-objectives:

  • provide sophisticated methods in data science for financial modelling and risk management and pay special attention to machine learning and knowledge extraction algorithms (WP 1);
  • use and develop methods and visualisation approaches with complex networks to increase understanding of the phenomena in financial markets arising from large and diverse data sets (WP 2);
  • provide new econometric methods augmented with textual data sources and (ultra-)high-frequency stock market data for advanced risk management (WP 3);
  • develop and publish an open financial sentiment index based on social media data (WP 4);
  • implement new Big Data applications with machine learning and knowledge extraction techniques for advanced financial risk and investment management in real business environments (WP 4).

These objectives are expected to encourage practitioners to adopt robust, data-driven risk management models and to abandon less realistic ones, leading to safer operation with financial instruments and a sharper focus on the real properties of financial and economic variables as they arise from real data. On the academic side, BigDataFinance promotes the use of massive integrated data sources to develop innovative financial models and methods, reassessing the unrealistic assumptions so far applied with homogeneous data sets. We anticipate that these objectives will be vitally important to the financial sector, solution providers, and researchers.

Research Programme and Work Packages

To achieve the objectives, a research programme structured into four scientific Work Packages (WP) has been set up.

WP1, Data Science in Finance. The first WP, “Data Science in Finance”, focuses on Big Data techniques and thereby lays the technical foundation that the other WPs need to process and analyse massive data sets. The goals of this WP are (i) to extract knowledge from data and to produce data samples, products, and features for further analysis, and (ii) to develop and implement techniques from machine learning, signal processing, and statistics for data analysis in finance. In particular, novel machine learning methods designed specifically for big data financial analysis will be developed and applied to learning the characteristics of financial markets and extracting knowledge from textual data on different computational platforms. This WP not only provides original methods of academic and practical importance for learning from massive data sets but is also crucial to the other WPs in the network that use large data sets in financial modelling and risk management.

WP2, Complex Networks in Finance. The second WP, “Complex Networks in Finance”, aims to understand the real properties and phenomena of financial markets by applying complex network analysis to exceptionally rich data sets. Importantly, visualisation algorithms, which are vital for using Big Data efficiently and effectively, will be developed and used. In WP2, analysis is model-free; the overall aim is to understand the features of the real financial system by letting the data speak. For example, one RP in WP2 studies how empirical investor networks and corporate ownership bases change around news arrivals during financial crises; another RP analyses systemic risk from a network perspective, combining theoretical modelling, empirical analysis, and practical (policy) applications to answer highly topical questions, such as whether there is a resilient architecture of the financial system and whether restrictions should be placed on institutions that are too big or too connected to fail. WP2 depends on WP1 to extract features from the original data sets.

WP3, Financial Econometrics with High-Frequency Data and News Announcements. WP3 represents model-driven research: it aims to develop and use econometric models with high-frequency stock prices and limit orders, together with massive data sets on macro and firm-specific news arrivals. Overall, this WP seeks to create empirically valid models augmented with news indicators to capture the effects of, and interactions between, news arrivals, volatility, correlations, and order book dynamics. These new models link the news arrival data to the high-frequency financial series, providing new approaches for risk management in WP4. WP3 directly employs knowledge extraction techniques from WP1. Moreover, the WP2 findings on the effects of news arrivals on investor behaviour are closely linked to WP3's development of realistic models for describing real phenomena in markets.

WP4, Big Data Applications in Finance. Finally, WP4 employs techniques in data science (WP1) and improved financial econometric methods (WP3) to develop Big Data applications in Finance, especially in Financial Risk Management, that are consistent with the empirical properties of financial markets investigated in WP2. In addition to new data-driven risk management methods and tools (RPs 11-13), WP4 provides a new open mood and confidence index (RP 10). This WP aims to improve open access to new data products, which is expected to have considerable societal impact. Since many WP4 research projects are hosted by private sector beneficiaries or partners, the results will not only be published in academic outlets but will also be both informed by and used in practice.

The scientific work packages are complemented with three WPs for training, management, and