
Announcing AI File Summarization

Lucas Chapin

Head of Data

Introduction

Taking in large amounts of information is a fact of life for compliance professionals. From customer due diligence requirements to AML and fraud investigations, data is at the heart of all financial operations, making the efficient separation of signal from noise an essential aspect of compliance work.

But manually making your way through stacks of documents is a difficult and time-consuming endeavor. Between onboarding new customers, conducting enhanced due diligence, and ongoing monitoring, an entire team of analysts can be kept busy simply reading through files and extracting key information. 

That’s why we at Hummingbird are proud to announce the release of our new AI File Summarization feature. We believe this new tool will ease the burden on investigators, taking some of the grunt work off of their plates and allowing them to do their jobs more efficiently.

Introducing AI File Summarization

AI File Summarization allows your team to request an AI-generated summary of your case files. Simply upload the file, click the “Summarize” button, and ta-da! The tool produces a concise encapsulation of all the key details.

Here’s how it looks in action, using a criminal background check as an example: 

A Game-Changer for Customer Due Diligence

We’re particularly excited to release AI File Summarization as part of our broader Customer Due Diligence (CDD) solution, where our primary objective is to save investigators time on the information-heavy work of manual due diligence. In KYC, information on an individual is gathered from many different sources, including social media sites, personal or business websites, criminal history reports, open web search, and federal data sources such as sanctions lists. KYB diligence can be equally demanding, with the investigator gathering large amounts of data from third-party (as well as open-source) tools and RFIs in order to build a picture of businesses and the people who own them.

But it’s not simply data volume that makes this work hard. Much CDD-related data is difficult to parse. Document types range from screenshots to PDFs, and files may span multiple languages even within the course of a single investigation. Complicating things further, documents are often awkwardly and inconsistently formatted. (Anyone well versed in background checks and criminal history records will tell you that finding a specific data point in some documents is like finding a needle in a haystack.)

Luckily, LLMs are well suited to help with all of these challenges. Our AI File Summarization feature adds immediate value by taking a first pass through all of this data, providing a succinct summary of what’s contained within the document along with the key points most relevant for an analyst. Notably, we designed this product specifically with compliance work in mind. Our development focused on calibrating our model to meet compliance-grade criteria: minimizing hallucinations, prioritizing data security and privacy, and providing analysts with the ability to copy summaries directly into their review workflows. To make this happen, we worked directly with our customers and design partners at Stripe. Their feedback helped ensure that the AI File Summarization feature meets the needs of large financial institutions with global compliance obligations.

"AI file summarization is a real time-saver for investigators. The summaries are easy to digest and seamlessly integrated into the case. There's even support for multiple languages, which is hugely beneficial to a team like ours that conducts investigations globally. We just see so much potential for automating the team's work!"

- Britta Reinan, Program Manager, Financial Crimes Controls, Stripe

Built for Compliance

For Hummingbird Labs fans interested in the technical details, our AI File Summarization feature first uses Optical Character Recognition (OCR) to extract and structure text from a source document. It then passes the result into a multi-call generative AI layer. The first pass homes in on the nature of the document, such as the domain and main topics. We then map the document to bespoke prompt templates suited to the document domain and the context of the broader investigation. This part of the process is designed to mirror the investigative mindset: the model has been trained to extract and organize exactly those details an investigator would want to know. Being deliberate in how prompts are mapped to the document domain is essential given how widely a subject’s LinkedIn page might differ from a criminal history report.
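To make the classify-then-map idea concrete, here is a minimal sketch in Python. It is not our production code: the template text, the domain names, and the `stub_llm` stand-in (a keyword check in place of a real model call) are all hypothetical and exist only to illustrate the flow.

```python
from typing import Callable

# An "LLM" here is any function that maps a prompt to a text response.
LLM = Callable[[str], str]

# Hypothetical prompt templates keyed by document domain (illustrative only).
PROMPT_TEMPLATES = {
    "criminal_history": "List charges, dispositions, and dates in this record:\n{text}",
    "social_profile": "Summarize employment history and stated affiliations:\n{text}",
    "generic": "Summarize the key facts in this document:\n{text}",
}


def classify_domain(text: str, llm: LLM) -> str:
    """First pass: ask the model which domain the document belongs to."""
    answer = llm(f"Classify this document as one of {sorted(PROMPT_TEMPLATES)}:\n{text}")
    domain = answer.strip().lower()
    # Fall back to a generic template when the answer is not a known domain.
    return domain if domain in PROMPT_TEMPLATES else "generic"


def build_summary_prompt(text: str, llm: LLM) -> str:
    """Second step: map the document to the template for its domain."""
    domain = classify_domain(text, llm)
    return PROMPT_TEMPLATES[domain].format(text=text)


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model: a keyword check on the document body."""
    body = prompt.split("\n", 1)[1]
    return "criminal_history" if "charge" in body.lower() else "unknown"


if __name__ == "__main__":
    print(build_summary_prompt("Charge: theft, 2019", stub_llm))
```

The fallback to a generic template is one possible design choice: an unrecognized document still gets summarized, just without domain-specific guidance.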


Further LLM calls use state-of-the-art practices such as chain of thought (CoT) prompting to again emulate an investigative approach to finding relevant information. Chain of verification (CoV), meanwhile, is used to reduce the risk of hallucinations and to check for internal consistency.
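For readers who want to picture how these calls might chain together, the sketch below walks through a draft, verify, and revise loop in Python. It is a simplified illustration under our own assumptions, not a description of the production pipeline: the prompt wording, the function name `summarize_with_verification`, and the canned `stub_llm` responses are all hypothetical.

```python
from typing import Callable

# An "LLM" here is any function that maps a prompt to a text response.
LLM = Callable[[str], str]


def summarize_with_verification(source: str, llm: LLM) -> str:
    # 1. Draft a summary, asking the model to reason step by step (CoT style).
    draft = llm(f"Think step by step, then summarize for an investigator.\nSOURCE:\n{source}")

    # 2. Plan verification: one fact-check question per line.
    plan = llm(f"List one fact-check question per line for this summary:\n{draft}")
    questions = [q.strip() for q in plan.splitlines() if q.strip()]

    # 3. Answer each question against the source only, not the draft.
    answers = [
        llm(f"Answer using only the source.\nQUESTION: {q}\nSOURCE:\n{source}")
        for q in questions
    ]

    # 4. Revise the draft so it agrees with the verified answers.
    checks = "\n".join(f"Q: {q}\nA: {a}" for q, a in zip(questions, answers))
    return llm(
        f"Revise the summary so it agrees with these checks.\nSUMMARY:\n{draft}\nCHECKS:\n{checks}"
    )


def stub_llm(prompt: str) -> str:
    """Canned responses standing in for a real model (illustrative only)."""
    if prompt.startswith("Think step by step"):
        return "Subject was convicted in 2020."  # deliberately wrong draft
    if prompt.startswith("List one fact-check"):
        return "What year was the conviction?"
    if prompt.startswith("Answer using only"):
        return "2019"
    return "Subject was convicted in 2019."


if __name__ == "__main__":
    print(summarize_with_verification("Conviction date: 2019", stub_llm))
```

Answering the verification questions against the source rather than the draft is the point of the exercise: it gives the revision step an independent check on anything the draft may have invented.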

The result of all of this careful methodology is that the AI File Summarization tool is able to take a large, intractable document and condense it down to a summary paired with investigative insights. There are, of course, many cases where investigators may want to comb through the source document in detail, but it’s our hope that using the AI File Summarization feature will allow them to begin from a clear, concise starting point, quickly arriving at a place of big-picture understanding and knowing exactly where to focus before diving into the weeds. 

A note on security:
Our approach to AI feature development as it relates to data security is discussed in our post “Developing a Framework for AI Model Selection,” but it’s worth reiterating that models are only hosted within our Virtual Private Clouds (VPCs). Our data is always encrypted at rest and in transit, and we never use customer data to train models. Finally, our solutions are designed to provide an audit trail of all activity, making it easy to share the full investigative history with regulators.

Looking Forward

The AI File Summarization feature is now generally available. It was beta tested with several of our customers as part of early-access design partnership conversations. If you’re interested in seeing it in action, please don’t hesitate to book a demo.

One of the most exciting parts about the release of AI File Summarization is that it feels like just the start of more advanced automation. The capabilities we’re able to provide today open the door to many more exciting projects. For example, once individual documents have been summarized, a natural next step might be to build a feature capable of comparing documents for consistency with one another (e.g. do a subject’s LinkedIn profile, business website, and transaction history all seem consistent?).

Another thing we’re excited about is that, having taken the time to build a strong AI foundation, all of our features can now improve quickly, adapting to the latest tech advancements with minimal effort. Our R&D team has developed a scoring system to evaluate how well different LLMs perform on compliance-specific tasks (something we hope to cover in another post), while our engineering team has built abstraction layers into our codebase, allowing us to easily swap out different components of our AI stack. The result is that we’re poised to take advantage of the latest advances in machine learning and AI regardless of where those advances come from. For the end user, this means AI compliance features that will all grow in speed, accuracy, and general utility as the industry matures.
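As a rough illustration of what an abstraction layer plus a scoring step can look like, here is a small Python sketch. Everything in it is an assumption made for the example: the `SummaryModel` interface, the two toy models, and the fact-recall metric are hypothetical and are not our actual scoring system.

```python
from typing import Dict, List, Protocol, Tuple


class SummaryModel(Protocol):
    """Hypothetical interface: any component that can summarize text."""

    def summarize(self, text: str) -> str: ...


class FirstSentenceModel:
    """Toy model: returns the first sentence of the document."""

    def summarize(self, text: str) -> str:
        return text.split(".")[0].strip() + "."


class TruncatingModel:
    """Toy model: returns the first 20 characters of the document."""

    def summarize(self, text: str) -> str:
        return text[:20]


# Each evaluation case pairs a document with the key facts a summary should keep.
Case = Tuple[str, List[str]]


def score(model: SummaryModel, cases: List[Case]) -> float:
    """Fraction of expected key facts that appear in the model's summaries."""
    hits = total = 0
    for text, facts in cases:
        summary = model.summarize(text).lower()
        hits += sum(fact.lower() in summary for fact in facts)
        total += len(facts)
    return hits / total if total else 0.0


def pick_best(models: Dict[str, SummaryModel], cases: List[Case]) -> str:
    """Return the name of the highest-scoring model."""
    return max(models, key=lambda name: score(models[name], cases))


if __name__ == "__main__":
    cases = [("Acme LLC was sanctioned in 2021. Other text.", ["Acme", "2021"])]
    models = {"first_sentence": FirstSentenceModel(), "truncating": TruncatingModel()}
    print(pick_best(models, cases))
```

Because callers depend only on the `summarize` method, a new model can be dropped into the `models` dictionary and evaluated without changes elsewhere, which is the general idea behind swappable components.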
