Can GenAI Process Data Better than Humans?

Piotr Wawryka · Business · October 15, 2024 · 7 min read

Everyone makes mistakes. For individuals, they’re often positive learning experiences. For businesses, they can mean large quantities of cash down the drain.

Human error is neither rare nor avoidable.

According to Dr. Graham Edkins, the average person typically makes between three and six mistakes every hour. Considering that a single person creates about 1.7 MB of data every second, this raises the question: do you have any idea how much of your data is being entered in error?

A National Center for Biotechnology Information (NCBI) study on data evaluation found that humans have a typical error rate of 6.5%, or 650 errors per 10,000 data entries. It's important to note that this figure comes from a single-entry scenario with no distractions. As you can imagine, the rate climbs dramatically when the person is working under time pressure, multitasking, performing repetitive tasks, or processing large volumes of information.

This was exactly the problem our client was trying to solve: freeing their 30-person team from the burden of triple-checking every data entry when processing vendor data.

Manual Data Processing: A Flawed Approach

The Stakes Are High

Our client—a global telecommunications service provider—wanted to speed up their internal processes related to quoting network services from their partners. This was a major undertaking since the number of vendors across all global markets numbered in the hundreds.

That is where things got complicated. As a global leader, our client cooperates with a large network of localized telecom carriers that periodically update their prices in cost books. If there is a pricing mismatch, the price offered to end customers cuts into the company's profits or even prevents it from generating any profit at all. This is a serious risk to the company's liquidity, which is why a dedicated internal team manages it on a daily basis.

A Slow And Resource-Heavy Process

To process cost books efficiently, our client has a 30-person department whose primary goal is to keep vendor prices continuously up to date, ensuring that customers buy the services they want at the correct prices.

The typical process starts with obtaining cost files from their vendors, storing them internally, and preparing them in the required format. This is a tedious and time-consuming process. It is also prone to errors because of the many manual tasks and points where data can be processed incorrectly.

Delays Have Real Consequences

Once prepared, the files are sent to a third-party technology provider with dedicated software to process network services quoting data. The provider then returns the most current vendor quotes available in a given local market through a dedicated API.

As a result, the entire process for just one supplier typically takes at least a couple of days and sometimes as long as a few weeks. This poses a severe risk, as the company's margin can be reduced or even zeroed out in the event of a sudden change in multiple cost files from different telecom vendors.

Generative AI Changes the Narrative

Our client wanted to find out whether a generative AI model could reliably automate the reading and processing of cost files. The technology has proven effective at reducing human error, especially in database work, which in turn cuts the amount and cost of manual labor.

Preparing a PoC

To prepare a training database, we requested a sample of real, anonymized telco data to check whether a generative AI model could process it properly and output an acceptable result. Once that was confirmed, we added a dedicated Python interpreter tasked with analyzing any input file provided, which exposed the consecutive steps taken while processing each cost file.
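
The interpreter itself is specific to the client's data, but a minimal sketch of the kind of file-profiling step it performs could look like the following. The helper name and the use of pandas are our own illustration, not the production code:

```python
import pandas as pd
from pathlib import Path

def profile_cost_file(path: str) -> dict:
    """Inspect an input cost file and report its basic structure,
    so the next processing step knows what it is dealing with."""
    file = Path(path)
    profile = {"name": file.name, "format": file.suffix.lower()}

    if file.suffix.lower() in {".csv", ".xlsx", ".xls"}:
        # Read only a handful of rows to keep the analysis step cheap.
        df = (pd.read_csv(file, nrows=20) if file.suffix.lower() == ".csv"
              else pd.read_excel(file, nrows=20))
        profile["columns"] = list(df.columns)
        profile["sample_rows"] = df.head(5).to_dict(orient="records")
    else:
        # Other formats (PDFs, e-mails) would be handled by dedicated
        # readers further down the pipeline; here we only record the size.
        profile["size_bytes"] = file.stat().st_size

    return profile
```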

Taking a New Approach With GenAI

One of the biggest challenges we faced was supporting a wide variety of different input file formats. We had assumed we could simply convert any input to text and use an LLM to extract the relevant information. However, we soon realized that such a generic conversion could lead to a loss of information, especially around the spatial arrangement of the text. In turn, this would make it impossible to analyze the data accurately.

To overcome this issue, we decided to leverage an agent-based approach.

This framework allows for more complex operations. It incorporates concepts such as planning (scheduling), memory (storage), and tools (e.g., code interpreters). Instead of extracting information in a single API call, tasks are broken down into smaller steps: the LLM can identify the file format, use the most appropriate method to read and present its contents, and finally extract the relevant information in the desired format.
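
For illustration only (the actual PoC relied on an agent framework and its own tooling), the decomposition can be sketched roughly like this. The `ask_llm` wrapper, the helper names, and the choice of pdfplumber for PDFs are assumptions, not the client's implementation:

```python
from pathlib import Path

def ask_llm(prompt: str) -> str:
    """Hypothetical wrapper around whatever LLM API the agent uses."""
    raise NotImplementedError

def read_file(path: Path) -> str:
    """Pick a reader that preserves the file's structure instead of
    flattening everything to unstructured text."""
    suffix = path.suffix.lower()
    if suffix in {".csv", ".xlsx", ".xls"}:
        import pandas as pd
        sheets = ({"data": pd.read_csv(path)} if suffix == ".csv"
                  else pd.read_excel(path, sheet_name=None))
        # Markdown tables keep the spatial arrangement of rows and columns.
        return "\n\n".join(df.to_markdown(index=False) for df in sheets.values())
    if suffix == ".pdf":
        import pdfplumber
        with pdfplumber.open(path) as pdf:
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if suffix == ".eml":
        from email import policy
        from email.parser import BytesParser
        msg = BytesParser(policy=policy.default).parsebytes(path.read_bytes())
        body = msg.get_body(preferencelist=("plain",))
        return body.get_content() if body else ""
    raise ValueError(f"Unsupported format: {suffix}")

def extract_costs(path: str) -> str:
    """Final step: ask the model to pull the relevant fields out of the
    already-structured file contents."""
    contents = read_file(Path(path))
    return ask_llm(
        "Extract vendor name, service and unit price from the cost data "
        "below and return them as CSV rows:\n\n" + contents
    )
```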

An Elegant Solution

The PoC frontend application consists of a simple screen with two buttons: one to upload a file and the other to start processing it. It works as follows:

  1. The user provides one or more input files with cost data
  2. The model analyzes the file structure and content
  3. Relevant data is extracted in a desired output format
  4. Once the outcome is displayed, the user either accepts or rejects the result
  5. If accepted, the user stores the processed data in a specified location
  6. If rejected, the user requests adjustments by re-running the model with changed prompts; this step repeats until the output is accepted (see the sketch below)
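
Here is a minimal sketch of that review loop, with a console prompt standing in for the PoC's two-button interface; `run_extraction` is a hypothetical helper that re-runs the model on the uploaded file with the current prompt:

```python
def run_extraction(path: str, prompt: str) -> str:
    """Hypothetical wrapper: re-runs the extraction agent with `prompt`."""
    raise NotImplementedError

def review_loop(path: str, base_prompt: str) -> str:
    """Show the output, then either accept it or retry with a new prompt."""
    prompt = base_prompt
    while True:
        result = run_extraction(path, prompt)
        print(result)                                        # display the outcome
        if input("Accept this output? [y/n] ").strip().lower() == "y":
            return result                                    # caller stores the data
        # On rejection, the user adjusts the prompt and the model is re-run.
        prompt = input("Adjusted prompt: ")
```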

The output cost data was compared with real network service cost data provided by the client. The model was able to parse all file formats specified by the client (PDF, CSV/Excel, email messages), proving that manual labor can be reduced or even eliminated in the long term.
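
The article does not spell out how that comparison was performed; assuming both the extracted output and the reference data are normalized to shared column names, a simple check with pandas could look like this:

```python
import pandas as pd

def find_price_mismatches(extracted_csv: str, reference_csv: str) -> pd.DataFrame:
    """Return rows where the extracted unit price differs from the reference."""
    extracted = pd.read_csv(extracted_csv)
    reference = pd.read_csv(reference_csv)
    # Assumes both files share 'vendor', 'service' and 'unit_price' columns.
    merged = extracted.merge(reference, on=["vendor", "service"],
                             suffixes=("_extracted", "_reference"))
    return merged[merged["unit_price_extracted"] != merged["unit_price_reference"]]
```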

Next Steps

The plan is to continue to improve the PoC by allowing users to change GPT prompts. We have also suggested further refinements to improve file format processing and output data visualization.

Once fully implemented, the solution will let the client instantly process multiple cost files with minimal user assistance. The entire process will be nearly fully automated, eliminating most manual errors and significantly improving processing time. In the longer term, it will also be possible to stop using the third-party technology provider currently responsible for processing those files.

Once refined, the solution has the potential to save our client millions of dollars a year in overhead costs.

Drowning in Data?

Speak with our GenAI experts to make it manageable.
