RP3: Deep Knowledge Extraction from Financial, Business, and Social Text (WP1)


This research project aims to transform unstructured textual content in multiple languages and formats into a structured form suitable for traditional analytic techniques in financial decision-making. The challenge is to extract semantically annotated facts in the form of relationships between concepts and entities mentioned in a stream of documents and social media content. The task fits nicely into the framework of Machine Reading (or micro-reading) and represents a new direction at the intersection of natural language, semantics, and knowledge-driven reasoning. The knowledge extraction methods developed in this RP will be tested and employed in other WPs. In particular, the projects in WP3 (on financial econometrics with news announcements) depend directly on this research project to augment econometric models with news arrivals.

The final outcome will comprise both relational structures (for lightweight representation) and reasoning-capable logical relationships (for deep representation). Current practical work in this area relies heavily on curated event types and declarative extraction techniques, leading to a sharp trade-off between precision and recall, a lack of transparency in market behaviours, and an assumed independence between extracted structures. Instead, this project will propose new frameworks that incorporate deeper context to further the understanding of the relative value of extracted knowledge.

Early Stage Researcher working on the project: James Hodson

Supervisor: Doctor Marko Grobelnik, Jožef Stefan Institute / marko.grobelnik(at)ijs.si

