Jessica Zhang
Senior Machine Learning Engineer
In a recent Hummingbird Labs post, we discussed the challenges of using generative AI for SAR/STR narratives. In that post, written at the beginning of the year, we noted that while it was relatively easy to spin up an LLM to generate a SAR narrative, it was far harder to get it to produce one that was accurate and verifiable.
Since then, our Data team has poured a lot of blood, sweat, and tears into this question. We built a new architecture from the ground up and partnered with our in-house investigators to ensure that we can produce consistently high-quality narratives.
If you’ve experimented at all with publicly available AI models, you’ll have already seen the power of LLMs, which in seconds can produce full-length essays or solve complex math problems. So what’s stopping these models from being able to quickly generate a SAR narrative?
The answer, as it turns out, has to do with the type of output being requested. As written records, SAR narratives are complex. They incorporate potentially dozens of subjects and hundreds of transactions. The investigator must use their deep understanding of the context behind the case in order to compose a concise, coherent narrative. Expecting an LLM to produce a narrative of the same quality from investigative context and some light prompting simply isn’t practical.
That brings us to an important point: here at Hummingbird, we are not building narrative generation tools with the aim of replacing the investigator. We’re not even interested in building a model capable of producing a narrative so complete that all the investigator must do is simply “review and approve” the generated response. It’s our belief that this approach underestimates the complexity (and value!) of the human work involved, and would lead to more problems than it solves.
So what will our narrative generation tool look like? First of all, it is designed to provide a strong, accurate first draft on which the investigator and AI can work collaboratively. This, we believe, will provide compliance teams with an immense amount of value. If we can pull together the most relevant information from the case, provide an analysis to support a file (or no file) decision, ensure complete accuracy, and get to a first draft that’s 80% of the way complete, we can save the investigator a lot of the tedium that comes with narrative writing.
In order to do that, however, the model has to know what makes a strong narrative in the first place. Which is exactly the problem our Data team set out to solve.
Given a SAR narrative’s complex requirements, any LLM designed to work in conjunction with an investigator on the narrative must have highly specific knowledge of the reporting requirements across various FIUs. In order to accomplish this, we drew on our in-house experts, as well as resources provided by FinCEN and other FIUs we support, to compile a working list of characteristics that are universal to well-written SAR narratives. We then converted these characteristics into a set of evaluation metrics that we could measure each AI-generated narrative against.
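To make the idea of evaluation metrics concrete, here is a minimal sketch of what a scoring rubric like this might look like. The dimension names and weighting are our illustrative assumptions, not Hummingbird's actual metric set:

```python
from dataclasses import dataclass

# Hypothetical rubric: characteristics of a well-written SAR narrative,
# each scored 0-1 by a reviewer (or an automated check) and averaged
# into an overall quality score. The four dimensions are illustrative.
@dataclass
class NarrativeScores:
    completeness: float      # all key case elements are present
    accuracy: float          # facts match the underlying case data
    coherence: float         # reads as a single, logical story
    fiu_conformance: float   # meets the relevant FIU's expectations

    def overall(self) -> float:
        parts = (self.completeness, self.accuracy,
                 self.coherence, self.fiu_conformance)
        return sum(parts) / len(parts)

scores = NarrativeScores(completeness=0.9, accuracy=1.0,
                         coherence=0.8, fiu_conformance=0.7)
print(round(scores.overall(), 2))  # 0.85
```

In practice each dimension would be backed by its own checks rather than a single hand-assigned number, but a simple averaged rubric is enough to compare candidate narratives against one another.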
We also found that simply sending the data of an investigation directly to an LLM is insufficient. Results are better when the LLM is incorporated into a broader software architecture with data pre-processing steps and agent chaining with sub-models focused on highly complex components of an investigation, such as transactions.
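The shape of that broader architecture can be sketched as a chain of focused steps. Everything below (function names, the stand-in transaction summarizer, the stub draft) is a hypothetical illustration of the pattern, not Hummingbird's implementation:

```python
# Hypothetical pipeline sketch: pre-process the raw case data, hand
# complex components (e.g. transactions) to focused sub-models, then
# compose the intermediate outputs into a draft.

def preprocess(case: dict) -> dict:
    # Normalize and trim the raw case data before any model sees it.
    return {
        "subjects": [s.strip().title() for s in case["subjects"]],
        "transactions": case["transactions"],
    }

def transaction_agent(transactions: list) -> str:
    # A focused sub-model would summarize transaction activity; a
    # simple aggregate stands in for it here.
    total = sum(t["amount"] for t in transactions)
    return f"{len(transactions)} transactions totaling ${total:,.2f}"

def draft_narrative(case: dict) -> str:
    # The main LLM call would go here; we compose a stub draft from
    # the chained intermediate outputs instead.
    prepped = preprocess(case)
    tx_summary = transaction_agent(prepped["transactions"])
    subjects = ", ".join(prepped["subjects"])
    return f"Subjects: {subjects}. Activity: {tx_summary}."

case = {
    "subjects": ["jane doe "],
    "transactions": [{"amount": 9500.0}, {"amount": 9800.0}],
}
print(draft_narrative(case))
# Subjects: Jane Doe. Activity: 2 transactions totaling $19,300.00.
```

The design point is that each sub-model sees a small, well-scoped slice of the investigation, which keeps the final prompt focused and the intermediate outputs individually checkable.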
For those who think that developing safe, effective AI features doesn’t involve the manual taskwork that AI tools are often tasked with eliminating, consider this: there’s no shortcut to labeling AI-generated narratives to assess quality! Our team of in-house investigators individually assessed a large corpus of narratives, breaking each one down into its constituent parts and creating a schematic to demonstrate why it was effective. It was a monumental task, but one that was essential in providing the data and learnings required to calibrate and fine-tune our AI Narrative Generation tool.
With work developing our AI narrative generation capabilities underway, we turned to the other important problem we knew we’d need to solve – the AI narrative editor interface. Without a smooth and compelling user experience, our model would never achieve its full potential. We wanted to build a tool that would allow for natural human <> AI collaboration.
With that in mind, we’ve been working on a user interface to facilitate the following aspects of narrative work:
At the push of a button, the tool generates a draft SAR narrative, which the investigator can use to accelerate report development.
An obvious and all-important point: one of the major risks in a generated narrative is the potential for hallucinations and misrepresentations of the data provided. At Hummingbird, we only release tools and features when we are confident that they meet the rigorous standards of compliance work, so this was a question we looked to address in a comprehensive manner.
Our solution is two-fold. First, we have carefully designed the generation step to use only information provided in the case, grounding its output in those sources. This prevents the model from drawing on outside, potentially non-pertinent information when crafting its draft response.
Second, narrative generation is only one half of our narrative editor. The other half is a tool for narrative validation. This feature – which we have been developing in conjunction with the narrative generation tool – reviews all the information contained in the narrative. It validates all data points (such as names, places, payment methods, etc.) and ensures that they match the information contained within the case. It also checks that all crucial case elements are present within the narrative, that the narrative presents a cohesive story, and that all elements expected by the financial intelligence unit (FIU) of the relevant jurisdiction are captured. Narrative generation, narrative validation, and human investigation work together as a system of checks and balances, developing narratives more quickly and comprehensively while ensuring that they are as accurate and impactful as possible.
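A toy version of that validation pass can illustrate the two checks described above: required case elements that the narrative is missing, and names in the narrative that the case cannot support. The capitalized-bigram scan below is a deliberately naive stand-in for real entity extraction, and the field names are our assumptions:

```python
import re

# Hypothetical validation sketch: confirm that every required case
# element appears in the narrative, and flag names the narrative
# mentions that the case does not contain (possible hallucinations).
def validate_narrative(narrative: str, case: dict) -> dict:
    text = narrative.lower()
    missing = [e for e in case["required_elements"] if e.lower() not in text]
    known = {name.lower() for name in case["subjects"]}
    # Naive capitalized-bigram scan standing in for real entity extraction.
    candidates = {m.lower() for m in re.findall(r"[A-Z][a-z]+ [A-Z][a-z]+", narrative)}
    unsupported = sorted(candidates - known)
    return {"missing": missing, "unsupported_names": unsupported}

case = {
    "subjects": ["Jane Doe"],
    "required_elements": ["wire transfer", "Jane Doe"],
}
result = validate_narrative("Jane Doe and John Smith sent a wire transfer.", case)
print(result)  # {'missing': [], 'unsupported_names': ['john smith']}
```

Here "John Smith" is flagged because the case lists only Jane Doe as a subject; a production validator would use proper entity resolution and compare against every structured field in the case, not just subject names.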
What do we expect from these new features? Time savings for the investigator (improved efficiency is always a compliance goal), but also the assurance and peace of mind that comes from knowing that all necessary and critical information is included in the narrative. Our customers have told us that they routinely spend 30+ minutes writing the narrative portion of a filing, even in the case of no file decisions. We believe our new AI narrative generation and validation tools can save a substantial amount of this time.
As we continue to develop AI features, we’re constantly thinking about how the human <> computer interaction model is evolving in the age of AI, where the AI acts as a supercharger and collaborator rather than a replacement for investigators. By embracing this philosophy, we can build AI features that are effective and fit-for-purpose while also remaining safe, secure, and accurate.