Quality vs Costs: Walking the Tightrope in AI Data

Every company that needs high-quality, complex data to train its AI apps and LLMs is asking its vendors to walk a tightrope and deliver against stringent quality metrics specific to its needs - while staying on a tight budget.

At e2f, we’re not fazed by this because we’ve been dancing on this tightrope - successfully - for the last twenty years, for some of the largest organizations across the world, including the Fortune 10.

Our journey has taught us invaluable lessons on striking the right balance, ensuring that clients receive the best possible outcomes while controlling costs and staying on budget. 

Here are some things we learned - ranging from the completely expected to what might have blindsided us (just the very first time), starting with the basics.

01

Project Management Do’s and Don'ts

Successful AI Data projects - like others - depend on effective and efficient project management. Here are some do’s and don’ts that are specific to AI Data projects. We suggest that you use these do’s and don’ts as a checklist.

Do:

  • Create a detailed plan: Understand every nuance of the project scope from the outset so that you can allocate the right resources from day one - and flex easily when you encounter challenges.

  • Establish clear milestones: Breaking down your AI Data project into clear, manageable stages with specific deadlines will keep the team on track and facilitate smoother execution.

  • Prioritize open, effective communication: While this might seem like a no-brainer, there’s no such thing as overcommunication when it comes to fast-paced AI Data projects. Establish regular touchpoints within your team - and with your clients - so that everyone’s completely aligned on progress and risks.

Don't:

  • Ignore flexibility: Go in with the expectation that if something can go wrong - it will. If you and your project managers go into your projects with this mindset, it will help you be flexible when you need to course correct - as and when the team encounters roadblocks or hiccups.

  • Overlook risk management: Identify potential risks early (timelines, schedules, quality checks, staffing, holidays, and other factors) and have a plan to address them to ensure the project stays on course, even when unexpected issues arise.

  • Sacrifice quality for cost: While managing costs is crucial, compromising on AI Data quality can have immediate negative impacts on the project - and long-term reputational impact on your organization. So don’t cut corners, and don’t compromise!

02

Tools and Communication

Clear and open communication is crucial in aligning internal and client expectations, especially regarding what constitutes "quality" — a highly subjective, company-specific metric in the area of AI Data.

Aligning on quality standards and project goals early on prevents miscommunications and ensures that you, your team, and the client are all working towards the same objectives. 

Use modern communication and collaboration tools like Slack, Chime, Microsoft Teams, JIRA, and others for your internal teams to collaborate with each other and to flag issues as and when they come up. These tools also make it easy to invite external (client) teams, communicate with them in real-time, share files, get instant feedback, and more. This approach not only prevents costly rework but also solidifies client-vendor relationships by providing clients with full transparency and earning their trust.

03

Achieving Quality with Quality Control and Calibration

There are several different types of metrics that are important in the AI Data industry. These could be dataset-specific, such as relevance and accuracy metrics, safety and ethical considerations, model functionality and robustness, user experience, and policy compliance. Or, they could be annotation-specific, such as Inter-Annotator Agreement (IAA), classification metrics (accuracy, precision, F1 score, etc.), ranking metrics, context utilization, significant changes, ChatGPT detection, BLEU, ROUGE, BERTScore, and others.
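To make one of these metrics concrete: IAA between two annotators is often measured with Cohen's kappa, which corrects raw agreement for the agreement you would expect by chance. A minimal pure-Python sketch (the labels below are invented for illustration):

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators labeling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected (chance) agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] / n * freq_b[label] / n for label in freq_a)
    return (p_o - p_e) / (1 - p_e)

a = ["safe", "safe", "unsafe", "safe", "unsafe", "safe"]
b = ["safe", "unsafe", "unsafe", "safe", "unsafe", "safe"]
print(round(cohen_kappa(a, b), 3))  # 0.667
```

A kappa near 1.0 signals strong agreement; values much lower than your agreed threshold are the trigger for the root-cause analysis and retraining discussed below.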

Once you collaborate and align with your clients on the project’s requirements, you can jointly determine which metrics are relevant for that project and which thresholds are acceptable to the client.

The takeaway here is that there is a wide array of quality metrics, and you must work with your clients to align on what success looks like before you kick off the project.

Meeting these requirements demands strict quality control, resource management, training and/or retraining as needed, calibrating the work of Data Annotators, and creating a ‘continuous loop of improvement’.

In the words of Nadya Rodionova, e2f’s Quality Assurance Manager, here’s how she and her team ensure quality on AI data projects:

We track the quality per metric and per annotator. For example, through IAA we can see what labels have a higher disagreement rate, in which case we need to identify the root cause. If it’s because of a misunderstanding of the GL [guidelines], we re-train annotators for that specific label, or if it’s because of a vague definition of the label in the GL, we need to change the wording, add examples, tighten the definition, etc. We also track the quality of each annotator and reviewer through the scores (a combination of manual and auto scoring) and we can stop and re-train or stop and reassign the annotator based on the score. We also keep track of high performers and promote them to reviewers/QAers and keep them in the loop for similar projects.
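The per-label disagreement tracking Nadya describes could be sketched like this, assuming each item carries one label from each of two annotators (the label values here are invented):

```python
from collections import defaultdict

def disagreement_by_label(pairs):
    """For each label either annotator used on an item, compute the
    fraction of those items where the two annotators disagreed."""
    seen = defaultdict(int)
    disagreed = defaultdict(int)
    for a, b in pairs:
        for label in {a, b}:          # credit the item to every label involved
            seen[label] += 1
            if a != b:
                disagreed[label] += 1
    return {label: disagreed[label] / seen[label] for label in seen}

pairs = [("toxic", "toxic"), ("toxic", "toxic"), ("clean", "clean"),
         ("borderline", "toxic"), ("borderline", "clean")]
rates = disagreement_by_label(pairs)
# "borderline" shows the highest disagreement rate here, flagging that
# label's guideline definition for tightening or annotator retraining.
```

The output pinpoints which label definitions need rework, rather than just reporting one overall agreement number.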

Quality Control: Beyond Checking Boxes

Quality control in AI data projects transcends the routine checklist. 

It has to be a dynamic, ongoing process designed to identify and rectify errors, inconsistencies, and any deviation from the project's quality benchmarks. However, QC isn't just about finding faults; it's also about preventive measures.

Implementing rigorous QC at various stages of the project allows for early detection of potential issues, significantly reducing the need for extensive rework, which can be both costly and time-consuming.

Calibration

Calibration, on the other hand, involves aligning the work of Data Annotators with the golden set data—a benchmark dataset that defines the project's quality standards. This step is crucial, especially when working in client tools, where your QA/QC teams engage in early preventive actions. By identifying discrepancies early, you can guide the annotators more effectively, ensuring their output consistently aligns with the expected quality. This not only enhances the quality but also streamlines the process, mitigating unnecessary costs associated with correction cycles.
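A minimal sketch of golden-set calibration scoring, with invented item IDs and a hypothetical 90% accuracy threshold for triggering retraining:

```python
# Golden set: benchmark items with the agreed-upon correct labels.
GOLDEN = {"item-1": "A", "item-2": "B", "item-3": "A", "item-4": "C"}

def calibration_score(annotations, golden=GOLDEN):
    """Fraction of golden-set items the annotator labeled correctly."""
    scored = [item for item in annotations if item in golden]
    if not scored:
        return 0.0
    correct = sum(annotations[item] == golden[item] for item in scored)
    return correct / len(scored)

def needs_retraining(annotations, threshold=0.9):
    """Flag annotators whose golden-set accuracy falls below threshold."""
    return calibration_score(annotations) < threshold

ann = {"item-1": "A", "item-2": "B", "item-3": "C", "item-4": "C"}
print(calibration_score(ann))  # 0.75
print(needs_retraining(ann))   # True
```

Running this check early and repeatedly is what turns calibration into a preventive measure rather than a post-hoc correction cycle.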

The Continuous Loop of Improvement

Central to this approach is the principle of continuous improvement. By regularly reviewing and refining our QC and calibration processes, you can adapt to new challenges and incorporate learnings from each project. This mindset of perpetual learning ensures the elevation of quality standards over time without proportionally increasing costs.

04

Managing Client Expectations

While everything we discussed so far is important to successfully delivering on AI Data projects, the final, critical piece – and some may even call this the most important one – is to manage client expectations. 

Pilot Projects

One of the best ways for you and your clients to ensure the success of your ‘production’ project is to scope out a pilot project first. This helps clients get a first-hand feel for your team’s skills, it helps you showcase your ability to orchestrate and deliver on complex, fast-moving AI data projects, and most importantly, it helps achieve early alignment with the client. As a bonus, it also minimizes rework for you and your team.

The Foundation: Clear Agreements and Open Channels

You must set and manage clear, realistic expectations not just from Day 1 - but from “Day 0”. As in, even before you kick off a project, you must make sure that clients know what to expect, when to expect it, and in what form to expect it. 

From the onset, establish a clear agreement on the project's scope, timelines, and costs to lay the groundwork for a smooth collaboration. This involves having a detailed discussion on the deliverables, the quality standards expected, and the timeline for the project's completion. Establish and maintain open channels of communication (we talked about some of the tactics and tools above) to keep the client informed and engaged, offering them a window into the project's progress and any challenges that may come up.

The Balancing Act: Prioritizing Client Needs

Understanding and prioritizing what's most important to the client—be it pricing, quality, timeline, or velocity—will enable you to tailor your approach accordingly. This balancing act is crucial in managing resources effectively while ensuring the client's primary objectives are met.

Your aim is to deliver content that not just meets but anticipates the client's needs, adapting to the industry's rapid pace where the deadline, often, is "yesterday."

Preventing Scope Creep

One of the most common pitfalls for any project - and especially AI Data projects – is scope creep: the gradual expansion of the project's scope beyond the original parameters, often without corresponding adjustments in budget or timelines.

If this hasn’t happened to you yet, mark our words, it will. But if you expect this, and you’re prepared, you can mitigate this risk. Rigorous documentation and change management processes will help ensure that any adjustments to the project scope are fully vetted and agreed upon by all stakeholders. This will not only prevent cost overruns but it will also ensure that your AI Data project remains aligned with the client's expectations and business goals.

05

Using Machine Learning to Control Costs

One of the most powerful tools in your arsenal as you seek to control costs - while meeting your clients’ quality standards - is machine learning (ML) automation. 

The Role of Machine Learning in Enhancing Efficiency

Pre-labeling traditionally requires significant human effort and time, which can increase costs. However, you can dramatically streamline content pre-labeling by deploying ML algorithms and automating the initial stages of data annotation. This automation serves as the foundation, setting the stage for high-precision work.

The Human Touch in Quality Assurance

However, automation does not mean sidelining the human element. After the ML-driven pre-labeling, your skilled human experts can step in to perform a meticulous review. This blend of machine efficiency and human precision ensures that the final output meets your clients’ stringent quality standards.
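This pre-label-then-review split might be sketched like this; `toy_predict` is a stand-in for a real trained model, and the 0.85 confidence cutoff is an arbitrary example:

```python
def route_items(items, predict, threshold=0.85):
    """Auto-accept confident ML pre-labels; queue the rest for human review."""
    auto_accepted, human_queue = [], []
    for item in items:
        label, confidence = predict(item)
        record = {"item": item, "prelabel": label, "confidence": confidence}
        if confidence >= threshold:
            auto_accepted.append(record)
        else:
            human_queue.append(record)  # reviewed (and corrected) by annotators
    return auto_accepted, human_queue

# Hypothetical model: in practice this wraps your trained classifier.
def toy_predict(text):
    confidence = 0.95 if "refund" in text else 0.6
    return ("billing", confidence)

auto, queue = route_items(["refund request", "misc note"], toy_predict)
print(len(auto), len(queue))  # 1 1
```

Tuning the threshold is the cost lever: a higher cutoff routes more items to humans (more quality, more cost), a lower one trusts the model more.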

Balancing Precision with Cost-Effectiveness

We’ve found that integrating machine learning into our workflow allows us to offer our services at cost-effective rates, ensuring that our clients receive the best value for their investment without stretching their budgets. In fact, our experience with implementing machine learning automation across various projects has been nothing short of transformative. If you’re an AI Data services provider, you should try it. If you’re looking for AI Data vendors, make sure to ask your vendor about this.

06

Conclusion

The delicate balance between managing costs and delivering quality in AI Data services is crucial to client satisfaction and the success of the project. Through strategic project management, clear communication, rigorous quality control, and effective client expectation management, this balance can be achieved. We’ve been doing this successfully at e2f for the last twenty years, and we hope that our ‘lessons learned’ will help you do this, too.
