Can GenAI Process Data Better than Humans?
Everyone makes mistakes. For individuals, they’re often positive learning experiences. For businesses, they can mean large quantities of cash down the drain.
Human error is neither rare nor avoidable.
According to Dr. Graham Edkins, the average person makes between three and six mistakes every hour. Considering that a single person creates about 1.7 MB of data every second, this raises the question: do you have any idea how much of your data is being entered in error?
A National Center for Biotechnology Information (NCBI) study on data evaluation found that humans have a typical error rate of 6.5%, or 650 errors per 10,000 data entries. It's important to note that this figure comes from a single-entry scenario with no distractions. As you can imagine, the rate increases dramatically when the person faces time constraints, multitasking, repetitive tasks, or large volumes of information to process.
This was exactly the problem our client was trying to solve: freeing their 30-person team from the burden of triple-checking every data entry when processing vendor data.
Manual Data Processing: A Flawed Approach
The Stakes Are High
Our client—a global telecommunications service provider—wanted to speed up their internal processes related to quoting network services from their partners. This was a major undertaking since the number of vendors across all global markets numbered in the hundreds.
That is where things got complicated. As a global leader, our client cooperates with a large network of localized telecom carriers that periodically update their prices in cost books. If there is a pricing mismatch, the price offered to end customers will lower the company's profits or even prevent it from generating any profit at all. This is a serious risk to the company's liquidity, which is why a dedicated internal team manages it on a daily basis.
A Slow and Resource-Heavy Process
To process cost books efficiently, our client maintains a 30-person department whose primary goal is to continuously update vendor prices, ensuring that customers buy the services they want at the correct prices.
The typical process starts with obtaining cost files from their vendors, storing them internally, and preparing them in the required format. This is a tedious and time-consuming process. It is also prone to errors because of the many manual tasks and points where data can be processed incorrectly.
Delays Have Real Consequences
Once prepared, the files are sent to a third-party technology provider with dedicated software for processing network services quoting data. The provider then returns the most current vendor quotes available in a given local market through a dedicated API.
As a result, the entire process for just one supplier typically takes at least a couple of days, and sometimes as long as a few weeks. This poses a severe risk: the company's margin can be reduced or even zeroed out if multiple cost files from different telecom vendors change suddenly.
Generative AI Changes the Narrative
Our customer wanted to find out whether a generative AI model could reliably automate the reading and processing of cost files. The technology is highly effective at reducing human error, especially in database work, which in turn reduces the amount and cost of manual labor.
Preparing a PoC
To prepare a training database, we requested a sample of real (anonymized) telco data to check whether a generative AI model could process it and output an acceptable result. Once that was confirmed, we added a dedicated Python interpreter tasked with analyzing any input file provided, exposing each successive step taken in processing the cost file.
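To give a concrete sense of what such an inspection step might look like, here is a minimal sketch of a file-analysis routine. The file name, the pandas dependency, and the inspect_cost_file helper are our illustrative assumptions, not the client's actual code:

```python
from pathlib import Path

import pandas as pd


def inspect_cost_file(path: str) -> dict:
    """Report the basic structure of an input cost file, step by step."""
    file = Path(path)
    report = {"name": file.name, "suffix": file.suffix.lower(), "steps": []}
    report["steps"].append(f"detected extension {file.suffix!r}")
    if file.suffix.lower() in {".csv", ".xlsx"}:
        # Tabular formats: load the file and summarize its shape and headers.
        reader = pd.read_csv if file.suffix.lower() == ".csv" else pd.read_excel
        df = reader(file)
        report["steps"].append(f"parsed {len(df)} rows, columns: {list(df.columns)}")
    else:
        # PDFs and email messages need format-specific readers (not shown here).
        report["steps"].append("non-tabular format; deferring to a format-specific reader")
    return report


print(inspect_cost_file("sample_cost_book.csv"))
```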
Taking a New Approach With GenAI
One of the biggest challenges we faced was supporting a wide variety of input file formats. We had assumed we could simply convert any input to text and use an LLM to extract the relevant information. However, we soon realized that such a generic conversion could lead to a loss of information, especially around the spatial arrangement of the text. In turn, this would make it impossible to analyze the data accurately.
To overcome this issue, we decided to leverage an agent-based approach.
This framework allows for more complex operations, incorporating concepts such as planning, storage, and tools (e.g., code interpreters). Instead of extracting information in a single API call, the task is broken down into smaller steps: the LLM can identify the file format, use the most appropriate method to read and present its contents, and finally extract the relevant information in the desired format.
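As a rough illustration of that loop (not the framework we used), a plan-then-execute agent can be sketched as follows. Here call_llm, the tool names, and the canned plan are stand-ins for a real model call and real tools:

```python
from typing import Callable


def call_llm(prompt: str) -> str:
    # Stand-in for a real model call; a production agent would query an LLM here.
    return "detect_format -> read_file -> extract_costs"


# Each tool takes the current working state and returns an updated one.
TOOLS: dict[str, Callable[[str], str]] = {
    "detect_format": lambda state: "format: csv",
    "read_file": lambda state: "contents: vendor, service, price rows",
    "extract_costs": lambda state: "extracted: [{'vendor': 'A', 'price': 10.0}]",
}


def run_agent(task: str) -> str:
    # Step 1: ask the model to break the task into an ordered list of tool calls.
    plan = call_llm(f"Plan the steps needed to: {task}")
    state = task
    # Step 2: execute each tool in order, feeding intermediate results forward.
    for step in (s.strip() for s in plan.split("->")):
        state = TOOLS[step](state)
        print(f"{step}: {state}")
    return state


run_agent("extract cost data from an uploaded vendor file")
```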
An Elegant Solution
The PoC frontend application consists of a simple screen with two buttons: one to upload a file and the other to start processing it. It works as follows (a simplified sketch of the accept/reject loop appears after the list):
- The user uploads one or more input files containing cost data
- The model analyzes the file structure and content
- Relevant data is extracted in the desired output format
- Once the outcome is displayed, the user either accepts or rejects the result
- If accepted, the user stores the processed data in a specified location
- If rejected, the user adjusts the prompts and re-runs the model; this step repeats until the output is accepted
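The accept/reject step boils down to a simple human-in-the-loop cycle. A minimal sketch, assuming a process_file stand-in for the pipeline above and console input in place of the PoC's buttons and text fields:

```python
def process_file(path: str, prompt: str) -> str:
    # Stand-in for the agent pipeline sketched earlier.
    return f"rows extracted from {path} using prompt: {prompt!r}"


def review_loop(path: str) -> str:
    prompt = "Extract vendor, service, and price into CSV columns."
    while True:
        result = process_file(path, prompt)
        print(result)
        if input("Accept result? [y/n] ").strip().lower() == "y":
            return result  # accepted output is then stored in the specified location
        # Rejected: the user adjusts the prompt and the model is re-run.
        prompt = input("Adjusted prompt: ").strip() or prompt


# review_loop("sample_cost_book.pdf")
```

In the PoC, the same cycle runs behind the two-button screen: the result display replaces print, and the adjusted prompt comes from a text field rather than the console.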
The output cost data was compared with real network service cost data provided by the client. The model parsed all file formats specified by the client (PDF, CSV/Excel, email messages), demonstrating that manual labor can be reduced or even eliminated in the long term.
Next Steps
The plan is to continue to improve the PoC by allowing users to change GPT prompts. We have also suggested further refinements to improve file format processing and output data visualization.
If fully implemented, the solution will let the client instantly process multiple cost files with minimal user assistance. The entire process will be nearly fully automated, eliminating most manual errors and significantly improving processing time. In the longer term, it will also be possible to stop using the third-party technology provider currently responsible for processing those files.
Once refined, the solution has the potential to save our client millions of dollars a year in overhead costs.
Drowning in Data?
Speak with our GenAI experts to make it manageable.