Using AI to Improve Objective Coding
September 11, 2024
Summary
Large language models (LLMs) excel at general-purpose tasks, but for a specific, repeatable use case, a fine-tuned model will outperform them, delivering more accurate outputs, faster inference, and lower costs.
If you’re interested in a demonstration, feel free to reach out to me.
Background
Jocko Willink was once asked what exercise someone should do if they wanted to improve the number of pull-ups they could do. His response was, simply, “Pull-ups.” This direct answer emphasized doing the very thing the individual was avoiding. The same principle applies to diet and exercise — if you want to lose weight, you need to diet and exercise. There’s no way around that fact (though new medications may aid the diet aspect).
Carrying this principle forward, there is no free lunch when it comes to using AI in your processes. In the legal world, tasks like objective coding in eDiscovery processing can cost on the order of $4 per 1,000 pages. This may tempt you to simply wrap your own process around an LLM call, but doing so is not free and may well cost you more time and money.
What is Objective Coding?
Objective coding (according to Wikipedia) is the process of generating summary or keyword data from a document. Unlike subjective coding, objective coding deals with factual information, such as the document’s original date, author, and type. This is especially crucial in litigation, where each party must efficiently organize vast volumes of data. Objective coding is a critical step in a law firm’s organizational process during discovery.
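To make that concrete, the objective fields for a single document might be captured in a record like the sketch below. The field names and values are hypothetical, chosen for illustration rather than taken from any standard coding template.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ObjectiveCodingRecord:
    """Hypothetical set of objective (factual) fields captured per document."""
    doc_id: str                       # identifier assigned during processing
    doc_date: Optional[str] = None    # original date on the face of the document
    author: Optional[str] = None      # author or sender
    recipients: Optional[str] = None  # to / cc parties, if present
    doc_type: Optional[str] = None    # e.g. "email", "memo", "contract"
    title: Optional[str] = None       # subject line or document title

# Example record for a single (made-up) email
record = ObjectiveCodingRecord(
    doc_id="DOC-000001",
    doc_date="2001-10-25",
    author="jane.doe@example.com",
    doc_type="email",
    title="Q3 planning notes",
)
```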
Search (now AI) consultants have frequently performed this task, and with the rise of LLMs, it might seem like a solved problem. While LLMs offer powerful ad-hoc capabilities through zero-shot questions, achieving high accuracy requires fine-tuning with specific data.

Prompting your way to Objective Coding
Recently, we experimented with using Microsoft’s GraphRAG for this process. GraphRAG performs tasks like entity extraction on unstructured text and stores the results in a knowledge graph. While we’ll explore knowledge graphs in future posts, our focus here is on the challenges we encountered.
It quickly became evident that LLMs were not accurate enough for the task. Moreover, the prompts generated by GraphRAG totaled around 3,000 tokens before any document text was even included. To put this in perspective:
When you process a document, it is tokenized and split into chunks, typically 500 tokens per chunk. If 500 tokens of your document are wrapped in 3,000 tokens of prompt, that results in 3,500 input tokens sent to the LLM per call. The LLM’s response forms the output tokens; ideally it is concise, but you are paying for both input and output tokens.
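As a rough sketch of that overhead, the snippet below chunks a document with the same tokenizer family GPT-4 uses and counts the tokens billed per call. The 3,000-token prompt and 500-token chunk sizes are the figures from above, not universal constants.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by GPT-4-class models

PROMPT_TOKENS = 3_000   # fixed prompt/instructions wrapped around every chunk
CHUNK_TOKENS = 500      # typical chunk size when a document is split

def chunk_document(text: str, chunk_tokens: int = CHUNK_TOKENS) -> list[list[int]]:
    """Tokenize a document and split it into fixed-size token chunks."""
    tokens = enc.encode(text)
    return [tokens[i:i + chunk_tokens] for i in range(0, len(tokens), chunk_tokens)]

def input_tokens_per_call(chunk: list[int]) -> int:
    """Each API call pays for the prompt wrapper plus the chunk itself."""
    return PROMPT_TOKENS + len(chunk)

# A full 500-token chunk costs 3,500 input tokens per call, as described above.
```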
Let’s break down the potential costs using the Enron dataset as our example (using GPT-4):
- The Enron dataset contains approximately 500,000 emails, estimated at 500 million to 1 billion tokens
- 1,000,000 API calls * 3,500 input tokens / 1,000 tokens * $0.025 per 1K tokens = $87,500 inbound
- 1,000,000 API calls * 500 output tokens / 1,000 tokens * $0.01 per 1K tokens = $5,000 outbound
- Estimated Total Cost: $92,500
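If you want to sanity-check that estimate, the arithmetic is simple enough to script. The per-1,000-token rates below are the ones implied by the figures above, not necessarily current list prices.

```python
# Rates implied by the estimate above (dollars per 1,000 tokens)
INPUT_RATE = 0.025
OUTPUT_RATE = 0.01

calls = 1_000_000
input_tokens_per_call = 3_500   # 3,000-token prompt + 500-token chunk
output_tokens_per_call = 500

input_cost = calls * input_tokens_per_call / 1_000 * INPUT_RATE     # $87,500
output_cost = calls * output_tokens_per_call / 1_000 * OUTPUT_RATE  # $5,000
print(f"Estimated total: ${input_cost + output_cost:,.0f}")         # $92,500
```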
In our case, with GraphRAG, we tried multiple approaches, including Ollama with various models and Azure OpenAI. With Azure, we ran into a tokens-per-minute limit that slowed the integration. I’m sure the limit would be raised as your Azure usage grows, but it slowed our ingestion and undercut the advantages of the “on-demand cloud.”
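One common mitigation for tokens-per-minute throttling is retrying with exponential backoff. Here is a minimal sketch, assuming the openai Python client pointed at an Azure OpenAI deployment; the endpoint, key, and deployment name are placeholders.

```python
import time
from openai import AzureOpenAI, RateLimitError

# Placeholder endpoint, API version, and key for illustration only.
client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_version="2024-02-01",
    api_key="YOUR-KEY",
)

def call_with_backoff(messages, deployment="gpt-4", max_retries=5):
    """Retry a chat completion with exponential backoff when rate-limited."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=deployment, messages=messages)
        except RateLimitError:
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... before retrying
    raise RuntimeError("Exceeded retry budget while rate-limited")
```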
Regardless of how many tokens you can process per minute, with 500 million or more tokens at 500 tokens per call, you are looking at a very large volume of processing. That is a substantial cost for a single task that, in our tests, achieved only around 50% accuracy, and that is before accounting for the processing time of each call.
A Better Approach
Given the limitations of the native LLM approach, we began exploring vision models to improve performance. As Supreme Court Justice Potter Stewart famously said about obscenity, “I know it when I see it.” Reproducing that human intuition is the key to making AI effective in this process.
Many legal documents have structure that provides valuable context, and that structure is often lost when the document is converted to plain text and tokenized for an LLM. By switching to a vision model and sending both the text and an image of the document, we achieved remarkable improvements.
This approach leverages the visual cues and layout of documents, mimicking how humans process information and leading to more accurate and efficient coding.
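As a rough sketch of the idea, here is how you might send both the page image and its extracted text to a vision-capable model in a single call. The model name, prompt, and field list are illustrative, not our production setup.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def code_document(page_image_path: str, page_text: str) -> str:
    """Send both the page image and its extracted text to a vision-capable model."""
    with open(page_image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model; a placeholder here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract the document date, author, and document type "
                         "from this page. Extracted text for reference:\n" + page_text},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```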

Closing Thoughts
While general-purpose LLMs offer powerful capabilities, specialized tasks like objective coding benefit significantly from more tailored approaches. Our vision model-based solution demonstrates that by combining AI with domain-specific knowledge, we can achieve higher accuracy, faster processing times, and lower costs. If you’re facing similar challenges in document processing or other specialized tasks, we invite you to reach out and explore how our innovative AI solutions can benefit your organization.