Interview: Gene Ekster, Co-Founder, Alternative Data Group

Ask any alternative data specialist which part of their job they love the most, and "hand-mapping each row to a company name" is a response you will never hear. For the uninitiated, mapping (also known as ticker tagging or entity recognition) is the inane job of identifying official company information in an unlabeled dataset, such as a common credit card statement.

Here is a typical raw transaction: "THE APPLE COMPANY MOTT*S ONLINE", which should resolve to The Mott's Company, wholly owned by Keurig Dr Pepper (NYSE: KDP). It's easy to see how a keyword-based system (or even a human) might mistakenly map the transaction to Apple (NASDAQ: AAPL) because the keyword "APPLE" is present. Yet that was one of the easy cases; the harder ones are transactions like "MINOT BK 6545 MINOT AFB ND~~08888~~xxxxxxxx" or "More Saving.  More Doing store XXX".
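To make the failure mode concrete, here is a minimal sketch of a naive keyword matcher. The keyword table and the function name `naive_keyword_map` are hypothetical and exist only for illustration; they do not describe any real mapping product.

```python
# Hypothetical keyword table; a real one would hold thousands of entries.
KEYWORD_TO_TICKER = {
    "APPLE": "NASDAQ: AAPL",
    "MOTT": "NYSE: KDP",
}


def naive_keyword_map(descriptor):
    """Return the ticker of the first keyword found in the descriptor, else None."""
    text = descriptor.upper()
    for keyword, ticker in KEYWORD_TO_TICKER.items():
        if keyword in text:
            return ticker
    return None


# "APPLE" is checked first, so the Mott's transaction is tagged as Apple.
print(naive_keyword_map("THE APPLE COMPANY MOTT*S ONLINE"))  # NASDAQ: AAPL (wrong)

# No keyword fires at all on the abbreviated Burger King descriptor.
print(naive_keyword_map("MINOT BK 6545 MINOT AFB ND"))  # None
```

Reordering the table would fix this one row, but only by breaking another; the descriptor alone does not say which keyword is the brand and which is incidental.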

Even from these few examples, the hidden challenges of a mapping system begin to emerge:

  • It cannot be based on a list of keywords or regular expressions.
  • Only cutting-edge AI can hold a candle to human-level pattern recognition, and even humans can barely resolve some transactions, like the Burger King (MINOT BK...) example above.
  • It needs to track the new and shifting ownership structures of every product, brand, subsidiary, acquirer, joint venture, conglomerate, holding company, etc. in the world. This task would grind down even an army of analysts, thousands strong.
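The ownership-tracking requirement can be pictured as a roll-up over a graph of parent links. The sketch below is an assumption about how such a lookup might be structured, not a description of any particular system; `PARENT_OF`, `TICKER_OF`, and `ultimate_parent` are hypothetical names, and the only ownership fact used is the Mott's example from the text.

```python
# Hypothetical ownership edges: entity -> direct parent.
PARENT_OF = {
    "Mott's": "Keurig Dr Pepper",
}

# Hypothetical listing table for entities that trade publicly.
TICKER_OF = {
    "Keurig Dr Pepper": "NYSE: KDP",
}


def ultimate_parent(entity, parent_of):
    """Follow parent links until an entity with no recorded parent is reached."""
    seen = set()
    # The `seen` set guards against accidental cycles in the ownership data.
    while entity in parent_of and entity not in seen:
        seen.add(entity)
        entity = parent_of[entity]
    return entity


owner = ultimate_parent("Mott's", PARENT_OF)
print(owner, TICKER_OF.get(owner))  # Keurig Dr Pepper NYSE: KDP
```

The lookup itself is trivial; the hard part the bullet describes is keeping the edge table current as brands are bought, sold, and renamed.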

Mapping the transaction data to structured company information is more than just identifying the correct entity. Mapping is an enabler. If done upstream, it has a value multiplier effect, enabling the downstream ecosystems to swell with new, data-thirsty applications. If done downstream, the in-house IP opens up alpha generation opportunities that do not exist if the data isn't structured.

In fact, in the last three years, users of such datasets have come to demand accurate tagging in all parts of the supply chain. Anything less would be a nonstarter. Frankly, unmapped data is no longer competitive. The data’s additional dimensions increase its use cases, thus increasing its commercial value. Good mapping pays off.

At AltDG, we have spent the last three years on R&D, in partnership with the NYU Courant Institute, to create a universal mapping product aimed at helping the alternative data supply chain automatically tag a wide variety of datasets. We have followed a few fundamental principles when designing this system:

  • The mapping rules are created dynamically with machine learning, using human input only to spot check the quality.
  • Coverage must be global and must include every public and private firm except for the mom-and-pop variety.
  • The results must reflect daily updates about brands, company names and ownership structures.
  • There is no one silver bullet data source. AI agents must tap into many overlapping structured and unstructured data sources for each identification.
  • The agents must compete with each other to deliver the final answer and must be designed to be self-learning.
  • The system must respect clients' privacy and delete logs of incoming queries per request.
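The idea of agents competing for the final answer can be sketched as follows. This is an illustrative assumption only: the text does not say how the competition is scored, so the sketch simply lets each agent report a confidence and picks the highest. The agents, their confidence values, and the `resolve` function are all hypothetical, and the self-learning aspect is not shown.

```python
from typing import Callable, List, NamedTuple, Optional


class Candidate(NamedTuple):
    entity: str
    confidence: float  # 0.0 to 1.0, self-reported by the agent (assumed scale)


Agent = Callable[[str], Optional[Candidate]]


def brand_agent(descriptor: str) -> Optional[Candidate]:
    """Hypothetical agent backed by a brand database."""
    if "MOTT" in descriptor.upper():
        return Candidate("Keurig Dr Pepper", 0.9)
    return None


def keyword_agent(descriptor: str) -> Optional[Candidate]:
    """Hypothetical agent backed by a plain company-name keyword list."""
    if "APPLE" in descriptor.upper():
        return Candidate("Apple", 0.4)
    return None


def resolve(descriptor: str, agents: List[Agent]) -> Optional[Candidate]:
    """Ask every agent for a candidate and keep the most confident one."""
    candidates = []
    for agent in agents:
        candidate = agent(descriptor)
        if candidate is not None:
            candidates.append(candidate)
    return max(candidates, key=lambda c: c.confidence, default=None)


winner = resolve("THE APPLE COMPANY MOTT*S ONLINE", [keyword_agent, brand_agent])
print(winner.entity)  # Keurig Dr Pepper
```

Because each agent draws on a different source, a weak keyword hit can be outvoted by stronger evidence instead of deciding the mapping on its own.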

The AI-based system achieves 96% to 99% accuracy with beta customers, and it is very satisfying to see the system self-tune by several percentage points within the first few weeks of a deployment. A few percentage points matter. Why? Isn't a basic system of hand-coded rules that achieves an accuracy of ~80% sufficient? The simple answer is no, because once developed, an AI-based system is cheaper to deploy, easier to maintain, and performs better both on the spot and over time.

When the AI-based merchant mapper was tested against other mapping methods, across various out-of-sample datasets, the findings (summarized below) show it to be a clear winner. We hope that hand mapping transactions will recede into history as a practice from the early days of alternative data that no analyst will have to suffer through again.