Richard Batt
61% of Companies Say Their Data Is Not Ready for AI
Tags: AI Strategy, Operations
Why Data Readiness Is the Real Blocker
I was in a board meeting with a healthcare client last month when they admitted something: they had invested £2 million in AI initiatives, and almost none of them were working. The CTO looked frustrated. The CFO looked angry. But when we dug into it, the problem was not the AI. It was the data.
Key Takeaways
- Why data readiness, not model choice, is the real blocker.
- A seven-question data readiness checklist to work through before building anything.
- Why traditional ETL architectures create friction for AI.
- A phased modernization approach that does not require a multi-million pound migration.
- What an AI-first data architecture looks like if you are starting fresh.
Data was scattered across 14 different systems. Some systems had no API access. Data definitions were inconsistent (one system called it "customer_id," another called it "custID," and a third stored it as "Client Number"). Data quality was terrible. And there was no way to connect data across systems to train models.
This is not unique to this client. Seventy percent of companies struggle to scale AI on proprietary data. Sixty-one percent admit their data is not ready for GenAI. This is not a technology problem. It is a data problem.
And the fix is not what most people think it is.
The Data Readiness Checklist
I have worked through data readiness with enough organizations that I now have a framework. It is a checklist. Work through it. Be honest about where you are. Then figure out what to fix first.
Question 1: Can You Find Your Data?
Start with the obvious. Do you know where your data is? Not philosophically. Practically. If I asked you where customer contact information lives, could you tell me in 10 seconds?
Most organizations say no. Data is scattered. It is in Salesforce, Google Sheets, an old database from 2008, two different ERP systems, someone's laptop, a CSV file in Google Drive. There is no central map of where data lives.
Practical tip: Create a data inventory. It does not need to be sophisticated. Just a spreadsheet. For each dataset: where is it? Who owns it? How often is it updated? Who has access? This takes a few weeks. Do it.
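A spreadsheet is fine, but the same inventory can live in code from day one. Here is a minimal sketch in Python; the field names and example entries are illustrative, not a standard schema.

```python
from dataclasses import dataclass

# A minimal sketch of the data inventory described above.
# Field names and example rows are illustrative, not a standard.
@dataclass
class DatasetEntry:
    name: str
    location: str       # system or path where the data lives
    owner: str          # person or team responsible
    refresh_cycle: str  # e.g. "daily", "weekly", "manual"
    access: str         # how it can be read

inventory = [
    DatasetEntry("customers", "Salesforce", "Sales Ops", "daily", "API"),
    DatasetEntry("invoices", "legacy ERP", "Finance", "weekly", "manual export"),
]

# Even this much lets you query your data estate:
manual_only = [d.name for d in inventory if d.access == "manual export"]
print(manual_only)  # ['invoices'] — a blocker for automation
```

The payoff is that later questions on the checklist (programmatic access, refresh cycles) become one-line queries over this list instead of a new investigation.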
Question 2: Is Your Data Consistent?
This is where most data readiness projects fail. You have data, but it is a mess.
Customer data lives in three systems and they do not agree. One system says the customer is in "London." Another says "GB-LON." A third stores latitude and longitude. Are these three records the same customer? You cannot tell.
Product data is the same. One system calls it SKU. Another calls it Product Code. They use different numbering schemes. Do they reference the same products? No idea.
This is not a small problem. You cannot train AI models on inconsistent data. The model learns noise instead of patterns.
Practical tip: Pick one dataset. Audit it for consistency. What variations exist? Create a data dictionary. Define what "customer" means. Define what "address" means. Define what "product" means. Make it the standard.
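Once the data dictionary exists, enforcing it can be mechanical. This is a sketch of normalizing the location variations mentioned earlier to one canonical code; the mapping is a hypothetical example, since a real one would come out of your own audit.

```python
# A sketch of enforcing one standard from a data dictionary.
# The mapping is illustrative; a real one comes from your own audit.
LOCATION_STANDARD = {
    "london": "GB-LON",
    "gb-lon": "GB-LON",
    "greater london": "GB-LON",
}

def normalise_location(raw: str) -> str:
    key = raw.strip().lower()
    # Unknown values are surfaced, not silently passed through,
    # so the dictionary grows with each audit pass.
    if key not in LOCATION_STANDARD:
        raise ValueError(f"unmapped location: {raw!r}")
    return LOCATION_STANDARD[key]

assert normalise_location(" London ") == "GB-LON"
```

Failing loudly on unmapped values matters: silently passing them through is how the inconsistency crept in originally.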
Question 3: What Is Your Data Quality?
Consistency is necessary but not sufficient. You also need quality. Is the data accurate? Is it complete?
Most datasets I see have massive quality problems. Customer records with no email address. Product records with prices that are obviously wrong. Dates that are impossible. Duplicates that are not marked as duplicates.
Here is the problem: bad data is invisible until you try to use it. You might have 10 million customer records, but if 30 percent are duplicates or invalid, your actual usable records are 7 million. AI models trained on bad data produce bad results.
Practical tip: Audit one dataset for quality. Sample 1,000 records. Manually review them. How many are missing critical fields? How many are duplicates? How many have obviously wrong values? Do the math. This tells you your quality percentage. Most companies are shocked. Most are below 80 percent.
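The math in that audit can be scripted once you have exported your sample. A rough sketch, assuming records arrive as dicts and one field acts as the key; what counts as "required" is your call, not a standard.

```python
def quality_pct(records, required_fields, key_field):
    # Count a record as bad if it misses a required field or
    # repeats a key we have already seen. One pass, no double counting.
    seen, bad = set(), 0
    for r in records:
        incomplete = any(not r.get(f) for f in required_fields)
        duplicate = r.get(key_field) in seen
        seen.add(r.get(key_field))
        if incomplete or duplicate:
            bad += 1
    return round(100 * (len(records) - bad) / len(records), 1)

sample = [
    {"id": 1, "email": "a@x.com"},
    {"id": 1, "email": "b@x.com"},   # duplicate id
    {"id": 2, "email": ""},          # missing email
    {"id": 3, "email": "c@x.com"},
]
print(quality_pct(sample, ["email"], "id"))  # 50.0
```

Run it on a 1,000-record sample and you have the quality percentage the tip asks for, plus a script you can rerun after each cleanup pass.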
Question 4: Can You Access Your Data Programmatically?
This is technical but critical. Can your AI systems actually read the data they need?
Some data is locked in databases that have no API. You have to export it manually. Some data is in systems that do have APIs, but access is restricted. Some data is in spreadsheets. Some data is in PDFs.
If you cannot programmatically access data, you cannot automate the flow of data to AI systems. You are stuck with manual workflows.
Practical tip: For each critical dataset, answer this question: can our AI systems read this data automatically? If the answer is no, it is a blocker. You need either to get API access or to accept manual data handling.
Question 5: Are Your Data Silos Actually Silos?
Most organizations have data silos. Marketing has their own data. Sales has theirs. Operations has theirs. They do not talk to each other.
Sometimes this is intentional. Sometimes it is just how things grew. Either way, it is a problem for AI. AI often needs to connect data across silos. Marketing data plus sales data plus customer data plus finance data. You cannot answer interesting questions from single silos.
But breaking silos is hard. It requires governance. It requires people to agree on definitions. It requires sharing data across teams that are sometimes competitive.
Practical tip: Identify the top three business questions that require cross-silo data. Prioritize solving those. Do not try to break all silos at once.
Question 6: What Is Your Data Refresh Cycle?
AI needs fresh data. If your customer data is refreshed every month but your sales data is refreshed every week, they drift out of sync. A model trained on that mix learns relationships between a current snapshot and a stale one, and its outputs reflect the mismatch rather than reality.
Worse: some data might be a year old. Historical data is fine for analysis, but if you are feeding it into real-time decisions, you have a problem.
Practical tip: For each critical dataset, document the refresh cycle. Then think about what AI use cases need what data freshness. Real-time decisions need near-real-time data. Historical analysis can use older data. Match your data refresh to your use case.
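Matching refresh cycles to use cases can be made explicit in code. A sketch, assuming you record each dataset's last update timestamp; the use cases and freshness thresholds here are hypothetical examples.

```python
from datetime import datetime, timedelta

# Hypothetical freshness requirements per use case, in hours.
REQUIRED_FRESHNESS_HOURS = {
    "real_time_pricing": 1,
    "quarterly_review": 24 * 90,
}

def is_fresh_enough(last_updated: datetime, use_case: str,
                    now: datetime) -> bool:
    max_age = timedelta(hours=REQUIRED_FRESHNESS_HOURS[use_case])
    return now - last_updated <= max_age

now = datetime(2026, 1, 15)
week_old = now - timedelta(days=7)
# The same week-old dataset passes one use case and fails another.
assert not is_fresh_enough(week_old, "real_time_pricing", now)
assert is_fresh_enough(week_old, "quarterly_review", now)
```

The point of the assertion pair: freshness is not a property of the dataset alone, it is a property of the dataset paired with the decision it feeds.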
Question 7: Do You Have Data Lineage?
This is the fancy term for: do you know where your data came from and what it has been through?
You have a number in a dashboard. Where did it come from? What transformations has it been through? If it is wrong, where is the error? If you cannot answer these questions, you have a problem.
Data lineage is essential for debugging. An AI model produces a weird result. Is it because the input data is wrong? Or is the model broken? You cannot tell without lineage.
Practical tip: Start documenting data lineage for your most critical datasets. It does not need to be sophisticated. A diagram showing where data comes from and what happens to it is a start.
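That diagram can also start life as a plain data structure, which has the advantage of being queryable. A minimal sketch; the dataset names and transformations are invented for illustration.

```python
# A minimal lineage record: each derived dataset points at its
# inputs and the transformation applied. Names are illustrative.
LINEAGE = {
    "dashboard.revenue": {"from": ["warehouse.orders"], "via": "sum by month"},
    "warehouse.orders": {"from": ["erp.orders_raw"], "via": "dedupe + currency fix"},
    "erp.orders_raw": {"from": [], "via": "source system"},
}

def trace(dataset: str) -> list[str]:
    """Walk upstream so a wrong number can be traced to its source."""
    path = [dataset]
    for parent in LINEAGE[dataset]["from"]:
        path += trace(parent)
    return path

print(trace("dashboard.revenue"))
# ['dashboard.revenue', 'warehouse.orders', 'erp.orders_raw']
```

When the dashboard number looks wrong, `trace` gives you the short list of places the error can live, which is exactly the debugging property the section argues for.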
Why Traditional ETL Architectures Create Friction
Here is something that frustrates me. Most organizations use traditional ETL (Extract, Transform, Load) architecture. You extract data from source systems. You transform it in a data warehouse. You load it into applications.
This works fine for traditional applications. But it creates friction for AI. AI wants real-time access to current data. ETL gives you batch data from yesterday. AI wants to combine data across sources. ETL puts everything in one place and flattens it. AI wants to trace lineage. ETL obscures it.
Modern data architectures (data lakes, data meshes) are better for AI. They keep data closer to source. They maintain relationships. They provide real-time access. But they require rethinking how you build data infrastructure.
Practical tip: If you are building new data infrastructure for AI, skip traditional ETL. Look at modern approaches: event streaming (Kafka), data lakes (S3), or data mesh approaches. They play better with AI.
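To make the batch-versus-streaming contrast concrete without any infrastructure, here is a pure-Python sketch of the core idea behind an event log like a Kafka topic: an ordered, replayable sequence of changes rather than a nightly snapshot. This is a conceptual toy, not Kafka's actual API.

```python
# Conceptual sketch, not a Kafka tutorial: with batch ETL a consumer
# sees yesterday's snapshot; with an event log it sees each change,
# in order, and can replay from any point.
events = []  # stand-in for a topic

def publish(event: dict) -> None:
    event["offset"] = len(events)  # ordering gives replay and lineage
    events.append(event)

publish({"type": "customer_updated", "customer": 42, "city": "GB-LON"})
publish({"type": "order_placed", "customer": 42, "total": 99.0})

# Any consumer, including an AI system, can replay the full history
# for one customer rather than reading a flattened warehouse row.
history = [e for e in events if e["customer"] == 42]
assert [e["offset"] for e in history] == [0, 1]
```

The ordering and replayability are what a flattened warehouse table loses, and what AI systems that need lineage and freshness benefit from.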
The Phased Approach That Does Not Require a Multi-Million Pound Migration
I hear horror stories about data modernization projects. Companies spend millions. Take three years. Then the CTO leaves and nobody knows what they built.
You do not need that. You can modernize in phases. You do not have to replace everything at once.
Phase 1: Data Inventory and Quality Audit (Weeks 1-4)
Figure out what you have. Create an inventory. Audit quality. Cost: basically free. Value: high. You now know what you are dealing with.
Phase 2: Fix the Highest-Impact Data (Weeks 5-12)
Identify the dataset that would unlock the most AI value if it were clean and accessible. Fix that one. Deduplicate. Standardize definitions. Add API access if needed. Cost: £10K to £50K depending on complexity. Value: high. Now you have one dataset that works.
Phase 3: Connect It to Your Second-Most-Important Dataset (Weeks 13-20)
Now you have two datasets that talk to each other. You can start doing real analysis. You can train models. Cost: another £10K to £50K. Value: multiplied. Now you have connected data.
Phase 4: Rinse and Repeat (Ongoing)
Keep fixing and connecting datasets. Do not try to fix everything. Just keep improving. After 12 months of this approach, you are in much better shape and you have spent £50K to £150K instead of £2 million.
The AI-First Data Architecture
If you are starting fresh (building a new company, new business unit), think about data differently. Instead of starting with ETL and data warehouses, start with APIs and event streams.
Design your data architecture assuming AI will consume it. That means: real-time access, rich relationships between datasets, lineage tracking, quality monitoring. It is actually simpler architecture than traditional ETL, and it works better for both traditional applications and AI.
The Measurement That Matters
Here is what I track to know if a company is data-ready: what percentage of their data can be accessed programmatically by an AI system right now?
Under 20 percent: you are not ready; there is too much manual work. Between 20 and 50 percent: you have a foundation. Between 50 and 80 percent: you are pretty ready. Over 80 percent: you are in good shape.
Most companies I see are at 15 to 25 percent. That is why AI projects struggle.
The Tough Conversation
Here is the conversation I have with every client: your data is not ready, and that is not your fault. It is not anyone's fault. It is just how companies grow. Data accumulates. Systems accumulate. You do not have a grand plan for data from day one.
But you do have a choice now. Invest in data readiness before you invest heavily in AI. Or keep trying to build AI on a broken foundation. Most companies choose the first option once they see the cost of the second.
Richard Batt has delivered 120+ AI and automation projects across 15+ industries. He helps businesses deploy AI that actually works, with battle-tested tools, templates, and implementation roadmaps. Featured in InfoWorld and WSJ.
Frequently Asked Questions
How long does it take to build AI automation in a small business?
Most single-process automations take 1-5 days to build and start delivering ROI within 30-90 days. Complex multi-system integrations take 2-8 weeks. The key is starting with one well-defined process, proving the value, then expanding.
Do I need technical skills to automate business processes?
Not for most automations. Tools like Zapier, Make.com, and N8N use visual builders that require no coding. About 80% of small business automation can be done without a developer. For the remaining 20%, you need someone comfortable with APIs and basic scripting.
Where should a business start with AI implementation?
Start with a process audit. Identify tasks that are high-volume, rule-based, and time-consuming. The best first automation is one that saves measurable time within 30 days. Across 120+ projects, the highest-ROI starting points are usually customer onboarding, invoice processing, and report generation.
How do I calculate ROI on an AI investment?
Measure the hours spent on the process before automation, multiply by fully loaded hourly cost, then subtract the tool cost. Most small business automations cost £50-500/month and save 5-20 hours per week. That typically means 300-1000% ROI in year one.
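That calculation is simple enough to sketch directly. The figures below are an illustrative example inside the ranges given above, not a benchmark.

```python
def yearly_roi_pct(hours_saved_per_week: float, hourly_cost: float,
                   tool_cost_per_month: float) -> float:
    # ROI = (savings - cost) / cost, per the rule of thumb above.
    yearly_savings = hours_saved_per_week * 52 * hourly_cost
    yearly_cost = tool_cost_per_month * 12
    return round(100 * (yearly_savings - yearly_cost) / yearly_cost)

# Example: 10 hours/week saved at a £30 fully loaded hourly cost,
# with a £200/month tool. Savings £15,600/year against £2,400 cost.
print(yearly_roi_pct(10, 30, 200))  # 550 (i.e. 550% ROI in year one)
```

Note the "fully loaded" hourly cost: using bare salary instead of salary plus overheads understates the savings and the ROI.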
Which AI tools are best for business use in 2026?
It depends on the use case. For content and communication, Claude and ChatGPT lead. For data analysis, Gemini and GPT work well with spreadsheets. For automation, Zapier, Make.com, and N8N connect AI to your existing tools. The best tool is the one your team will actually use and maintain.
Put This Into Practice
I use versions of these approaches with my clients every week. The full templates, prompts, and implementation guides, covering the edge cases and variations you will hit in practice, are available inside the AI Ops Vault. It is your AI department for $97/month.
Want a personalised implementation plan first? Book your AI Roadmap session and I will map the fastest path from where you are now to working AI automation.