L3 limit order book (LOB) data contains 30% more information relevant to price discovery than L1 data. The problem is that this data is difficult to work with, and the information is hard to extract. While a small handful of very well-resourced leading market participants have capitalized on this data over the last decade, most major financial institutions are still unable to do so.
Some believe that L3 information is relevant neither to those trading through brokers nor to those trading at medium to low frequencies. To this we would make three observations. Firstly, while it is true that L3 data holds information on short-term patterns, lower-frequency data is obtained only by down-sampling this data. Pre-processing with a zero-one filter is a crude tool that throws away information whether it is relevant or not. The statistically correct approach is to allow the model to decide how to handle the data. Secondly, the “leading market participants” referred to above generally ply their trade through what many consider to be ‘predatory’ trading. This can mean providing liquidity when it suits but, as soon as a natural order is detected in the L3 data, leaning against that order and taking liquidity, the end result being a cost to the provider of the natural order. Only by using L3 data to understand the statistical dynamics of parent-child order submission can such predatory approaches be defeated. Thirdly, L3 data makes it possible to simulate the market accurately using agent-based models (ABMs) trained on that data. From trained ABMs, synthetic market data can be generated. The ability to simulate medium-to-low-frequency trading strategies not just against a one-time realization of market data but across many realizations enables a step change in the statistical significance of predictor design.
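The first observation can be illustrated with a minimal sketch, assuming pandas and an entirely hypothetical event schema (the column names `event`, `price` and `size`, and the five sample rows, are invented for illustration and are not a real feed format). It contrasts a zero-one filter, which keeps only the last trade price per interval, with a down-sampling that carries order-flow summaries forward so that a model can decide what matters.

```python
import pandas as pd

# Hypothetical L3-style event stream: one row per order-level message.
# Schema and values are illustrative assumptions, not a real feed.
events = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2024-01-02 09:00:00.100",
                "2024-01-02 09:00:00.250",
                "2024-01-02 09:00:00.900",
                "2024-01-02 09:00:01.300",
                "2024-01-02 09:00:01.700",
            ],
            utc=True,
        ),
        "event": ["add", "trade", "cancel", "trade", "add"],
        "price": [100.0, 100.0, 100.0, 100.5, 100.5],
        "size": [200, 50, 150, 30, 80],
    }
).set_index("timestamp")


def downsample_zero_one(events, freq):
    """Zero-one filter: keep the last trade price per interval, drop the rest."""
    trades = events[events["event"] == "trade"]
    return trades["price"].resample(freq).last()


def downsample_with_features(events, freq):
    """Down-sample to the same grid but keep summaries of the order flow."""
    is_trade = events["event"] == "trade"
    return pd.DataFrame(
        {
            # Non-trade rows become NaN, and last() skips NaN values.
            "last_trade_price": events["price"].where(is_trade).resample(freq).last(),
            # Non-trade rows contribute zero volume.
            "traded_volume": events["size"].where(is_trade, 0).resample(freq).sum(),
            "n_adds": (events["event"] == "add").resample(freq).sum(),
            "n_cancels": (events["event"] == "cancel").resample(freq).sum(),
        }
    )


print(downsample_zero_one(events, "1s"))
print(downsample_with_features(events, "1s"))
```

Both outputs share the same one-second grid, but only the second retains the add and cancel activity that the zero-one filter discards. Which summaries are worth keeping is itself a modelling choice; the four columns here are placeholders.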
The workflow for using L3 data is as follows. Data is collected by performing packet capture at the colo, with potentially every pcap location recording data from every matching engine. This captured data is then centralized and subject to a process of curation (eg book building, ticker mapping), normalization (eg transform to UTC, map fields to API dictionaries) and consolidation (eg enable my view of the European Consolidated Tape). An additional high-value step is combining the L3 public data with private order flow data to generate L4 data, enabling identification of beneficiaries' orders in the book, along with other information such as order types and max show values. This data pipeline will include metadata management and potentially derived data management (eg generate intraday volume curves). Once the data is present, it needs to be combined with cheap compute at scale. This is done either on an in-house farm or in the cloud, using either map-reduce or non-map-reduce systems. The workflow needs to allow for the range of L3 data use cases, and to determine where and how to implement fine-grained and coarse-grained parallelism to ensure sufficient speed. Finally, the workflow needs access to both open source analytics (eg pandas, TensorFlow) and closed source analytics (eg MOSEK). The end user needs to be able to make any arbitrary calculation on any amount of L3 data (eg Russell 3000 for the last three years) and have results returned in an appropriate amount of time. At BMLL we see our value as being at the interfaces of this value stack, with the interfaces being represented as APIs.
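Two of the steps above, book building and the L3-to-L4 join, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of any production pipeline: the message format (`type`, `order_id`, `side`, `price`, `size`), the `OrderBook` class and the `tag_own_orders` helper are all hypothetical, and real venue feeds carry many more message types and fields.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    side: str  # "B" for bid, "S" for ask
    price: float
    size: int


class OrderBook:
    """Toy book builder: replays order-level (L3) messages into resting orders."""

    def __init__(self):
        self.orders = {}  # order_id -> Order

    def apply(self, msg):
        kind = msg["type"]
        if kind == "add":
            self.orders[msg["order_id"]] = Order(
                msg["order_id"], msg["side"], msg["price"], msg["size"]
            )
        elif kind == "cancel":
            self.orders.pop(msg["order_id"], None)
        elif kind == "execute":
            order = self.orders.get(msg["order_id"])
            if order is not None:
                order.size -= msg["size"]
                if order.size <= 0:
                    del self.orders[msg["order_id"]]

    def depth(self, side):
        """Aggregate resting size by price level, best price first."""
        levels = defaultdict(int)
        for order in self.orders.values():
            if order.side == side:
                levels[order.price] += order.size
        return dict(sorted(levels.items(), reverse=(side == "B")))

    def best(self, side):
        """Top of book for one side as (price, size), or None if empty."""
        return next(iter(self.depth(side).items()), None)

    def tag_own_orders(self, private_ids):
        """L4-style join: flag resting orders that appear in private order flow."""
        return {oid: oid in private_ids for oid in self.orders}


# Hypothetical message stream for a single instrument.
messages = [
    {"type": "add", "order_id": "o1", "side": "B", "price": 99.5, "size": 100},
    {"type": "add", "order_id": "o2", "side": "B", "price": 99.5, "size": 50},
    {"type": "add", "order_id": "o3", "side": "S", "price": 100.0, "size": 70},
    {"type": "add", "order_id": "o4", "side": "B", "price": 99.0, "size": 200},
    {"type": "execute", "order_id": "o1", "size": 40},
    {"type": "cancel", "order_id": "o2"},
]

book = OrderBook()
for msg in messages:
    book.apply(msg)

print(book.best("B"), book.best("S"))  # top-of-book view
print(book.depth("B"))                 # price-level view
print(book.tag_own_orders({"o4"}))     # order-level view joined with private flow
```

The same replayed state yields the top-of-book, price-level and order-level views, which is why the book-building step sits early in the pipeline. Recomputing depth on every call keeps the sketch short; a book built for speed would maintain price levels incrementally.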
A recent industry survey of quants found that they felt they spent over 80% of their time performing menial duties around data and systems, as opposed to performing their value-add. Another recent survey found that employers felt they were severely under-resourced in the quant and engineering fields. In summary, managed services are powerful because they help solve both of these pain points. Managed services enable human resources to focus on their value-add, while also speeding up the OODA loop of complex problem solving.