Richard Batt
Building AI Workflows That Handle Edge Cases Without Breaking
Tags: AI, Architecture
A financial services firm's loan processing workflow. Beautiful in testing. Catastrophic in production. Why? Nobody planned for edge cases. Handwritten notes. Unusual formatting. Scanned images. The AI failed silently, and after two weeks, 800+ unprocessed applications had piled up.
Key Takeaways
- Why AI workflows break in production, and what to do about it.
- Core principles for robust AI workflows.
- Practical design patterns for edge cases.
- Human-in-the-loop design.
- The real cost of robustness, and why to budget for it before building anything.
The problem: they never planned for edge cases. What happens when the applicant's name is on the document in three different formats? What happens when the loan amount is written as "£500,000" on one page and "Five hundred thousand pounds" on another? What happens when the collateral section is missing entirely? What happens when the AI makes an error and extracts the wrong number?
None of these scenarios were handled. The workflow would either crash or produce nonsensical output. After two weeks in production processing 200 applications per day, they'd accumulated a backlog of 800+ documents that the system had failed to process. The real cost wasn't the development time. It was the operational disaster and the decision to temporarily shut down automated processing and go back to manual work.
This happens constantly with AI workflows. People build something impressive in a demo, put it in production, and watch it break the moment it encounters data that doesn't match the assumptions built into the system. The difference between a demo and a production system isn't the AI itself. It's the engineering around it.
Across 120+ projects I've consulted on, I've seen this failure pattern repeat. It's preventable, but it requires thinking about the entire system, not just the AI component.
Why AI Workflows Break in Production
The fundamental issue is that AI models are probabilistic, not deterministic. A traditional software system takes input, follows a defined algorithm, and produces output. If the output is wrong, you can trace back through the logic to find where the algorithm failed. An AI system takes input, passes it through a neural network, and produces output. If the output is wrong, you often can't trace back through the logic; you just know it failed.
This means the input has to be extremely well-behaved for the system to work. And in the real world, input is never that well-behaved.
I worked with a customer support team that implemented an AI system to categorise support tickets. In testing: 94 percent accuracy. In production: 60 percent accuracy. Why the difference? Because in testing, the tickets were well-formatted, clearly written, and used consistent language. In production, tickets were misspelled, incomplete, mixed multiple issues, and used regional slang. The AI model wasn't trained on real-world ticket variation.
The second issue is error propagation. An AI workflow often involves multiple steps: extract data with AI, validate it with rules, send it to another system, maybe process it with another AI model, and then take an action. If any step fails silently, downstream steps are working with corrupt data. I saw a reporting workflow where the first AI step occasionally extracted the wrong date. This error propagated through three more steps until it produced a report with completely wrong information. Because the error was silent, it took two weeks to discover.
The third issue is the assumption that the AI will always return something useful. If the AI is asked to extract information from a document and the information isn't there, what does it do? Some models will make something up. Some will return an empty value. Some will return an error message. If you don't plan for what actually happens, your workflow breaks.
Core Principles for Robust AI Workflows
Here are the engineering practices I now implement in every AI workflow I build. They're not novel; they're standard engineering practices applied to AI systems.
First principle: be explicit about assumptions. Before you build anything, document exactly what you're assuming about input. What format will it be in? What's the maximum size? What characters might it contain? What information will always be present? What might be missing? Write this down. Then, build validation that checks whether real-world input matches these assumptions.
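As a sketch of what that looks like in code, documented assumptions can be turned directly into a validation function that runs before anything reaches the AI. The field names and size limit below are hypothetical, not taken from any specific project:

```python
# Hypothetical input assumptions, written down as checkable rules.
MAX_SIZE_BYTES = 5_000_000
REQUIRED_FIELDS = {"invoice_number", "date", "total_amount"}

def validate_assumptions(record):
    """Return a list of violated assumptions (empty list = input looks OK)."""
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    text = record.get("raw_text", "")
    if len(text.encode("utf-8")) > MAX_SIZE_BYTES:
        problems.append("document exceeds maximum size")
    if not text.strip():
        problems.append("empty document text")
    return problems
```

The value isn't the checks themselves; it's that every assumption now lives in one place where production data can prove it wrong early.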
I worked with a team building an invoice processing system. The assumption was: invoices will have a clear invoice number, date, and total amount in a standard location. Reality: invoices are formatted differently by different vendors, invoice numbers are in different places, amounts are sometimes broken into multiple line items. The validation caught this, and they had to rethink their approach.
Second principle: always have a fallback path. Don't assume the AI will always produce valid output. Plan for what happens when it doesn't. The fallback might be: return an error and don't process this record. Or it might be: return the raw output without processing. Or it might be: send this to a human for manual review. But have a defined fallback, and make sure it's actually implemented.
I implemented a system that used AI to categorise customer requests. The fallback for any request the AI was uncertain about: put it in a queue for human review. This meant that 92 percent of requests were handled automatically, but the 8 percent where the AI wasn't confident went to a human. The overall system worked because we had a defined fallback.
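A defined fallback can be as simple as a wrapper around the model call. This is a minimal sketch; `call_model` is a stub standing in for whatever AI call the workflow actually makes:

```python
def call_model(request):
    # Stub: a real model call would go here. Returns None when uncertain.
    return None if "???" in request else "billing"

def categorise(request, review_queue):
    """Categorise a request, falling back to human review on any failure."""
    try:
        category = call_model(request)
    except Exception:
        category = None
    if category is None:              # model failed or wasn't confident
        review_queue.append(request)  # defined fallback: human review queue
        return "pending_human_review"
    return category
```

The point is that the uncertain path is explicit and implemented, not an afterthought.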
Third principle: validate at every step. If you have a multi-step workflow, validate output after each step. Check that the output actually makes sense. Check that it matches the schema you're expecting. Check for common errors. If validation fails, stop and route to a fallback.
I worked with a document processing team that was doing this correctly: extract text from document with AI, validate that text contains required fields, extract specific data points with a second AI model, validate that data points are in the right format, save to database, validate that data was saved correctly. Eight validation checkpoints for a four-step workflow. This seems excessive until you realise that without those checkpoints, errors propagate silently and destroy data integrity.
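A stripped-down version of that checkpointed pipeline might look like this. The `extract_*` functions are stubs standing in for AI calls; the shape to notice is that each step's output is validated before the next step runs, and the first failure stops the pipeline with a named reason:

```python
import re

def extract_text(document):
    return document  # stub for an OCR / AI text-extraction step

def extract_amount(text):
    match = re.search(r"£([\d,]+)", text)
    return float(match.group(1).replace(",", "")) if match else None

def run_pipeline(document):
    """Validate after every step; stop and report on the first failure."""
    text = extract_text(document)
    if "invoice" not in text.lower():          # checkpoint: required fields
        return {"status": "failed", "failed_at": "field_check"}
    amount = extract_amount(text)
    if amount is None or amount <= 0:          # checkpoint: data format
        return {"status": "failed", "failed_at": "amount_check"}
    return {"status": "ok", "amount": amount}
```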
Fourth principle: log everything. When an AI workflow fails, you need to understand why. Log the input, log the AI output, log validation results, log which path was taken through the fallback logic. When something goes wrong, these logs let you understand what happened and whether it was an input problem, an AI error, or a logic error.
I reviewed a data extraction workflow that was failing on 2 percent of records. Without logs, this was impossible to debug. With logs, we could see that the AI was failing on documents with specific characteristics: handwritten notes, unusual formatting, scanned images. Once we understood the pattern, we could build preprocessing to handle these cases better.
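A minimal structured-logging sketch, assuming nothing beyond the standard library. The fields are illustrative; the point is that input characteristics, raw model output, validation result, and routing decision are captured together per record:

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("workflow")

def log_step(record_id, raw_input, ai_output, validation_passed):
    """Log one record's journey and return the entry for inspection."""
    entry = {
        "record_id": record_id,
        "input_chars": len(raw_input),          # input characteristics
        "ai_output": ai_output,                 # raw model output
        "validation_passed": validation_passed,  # validation result
        "path": "auto" if validation_passed else "fallback",
    }
    log.info(json.dumps(entry))
    return entry
```

With entries like these, "failing on 2 percent of records" becomes a query, not a mystery.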
Fifth principle: build monitoring and alerting. Don't assume the workflow will keep working. Monitor the output quality continuously. Track: how many records are being processed, how many are failing, what's the error rate from the AI, how many are hitting fallback paths. If the error rate spikes, alert someone.
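As a sketch, a rolling error-rate monitor needs very little code. The window size and 10 percent threshold here are arbitrary examples; wiring `should_alert` to an actual pager or channel is left out:

```python
from collections import deque

class ErrorRateMonitor:
    def __init__(self, window=100, threshold=0.10):
        self.outcomes = deque(maxlen=window)  # rolling window of pass/fail
        self.threshold = threshold

    def record(self, success):
        self.outcomes.append(success)

    @property
    def error_rate(self):
        if not self.outcomes:
            return 0.0
        return 1 - sum(self.outcomes) / len(self.outcomes)

    def should_alert(self):
        # Require a minimum sample so one early failure doesn't page anyone.
        return len(self.outcomes) >= 20 and self.error_rate > self.threshold
```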
I worked with a team that deployed a workflow and forgot about it. Two months later, they realised that the AI's accuracy had degraded to 40 percent because nobody was monitoring it. With proper monitoring, they would have been alerted to the degradation within days. As it was, they'd already processed thousands of records with wrong output.
Practical Design Patterns for Edge Cases
Given those principles, here are specific design patterns that work well in practice.
Pattern one: validation with confidence scoring. Some AI models can return a confidence score alongside their output: a measure of how confident the model is in its answer. Use this. Set a threshold: if confidence is above 90 percent, process automatically. If it's between 70 and 90 percent, flag for human review. If it's below 70 percent, don't process. This is a simple way to route uncertain cases without requiring explicit edge case handling.
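Those thresholds translate into a few lines of routing logic. A minimal sketch, using the 90/70 bands from the text:

```python
def route(confidence):
    """Map a model confidence score to a processing path."""
    if confidence >= 0.90:
        return "auto_process"     # high confidence: fully automatic
    if confidence >= 0.70:
        return "human_review"     # medium confidence: flag for a person
    return "reject"               # low confidence: don't process at all
```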
Pattern two: input normalisation. Before feeding data to the AI, normalise it. Clean up formatting, standardise encoding, remove ambiguity. A financial team I worked with was extracting amounts from documents. Before using AI, they had a preprocessing step that identified all the different ways amounts could be formatted (£1,000, £1000, 1000 pounds, etc.) and normalised them to a standard format. This massively reduced the AI's job and reduced error rates.
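A sketch of that normalisation step for amounts, handling a few of the formats mentioned. Real documents will have more variants than this; the pattern is what matters:

```python
import re

def normalise_amount(raw):
    """Map '£1,000', '£1000', '1000 pounds', etc. to a plain float, or None."""
    cleaned = raw.lower().replace("£", "").replace(",", "")
    cleaned = re.sub(r"\s*(pounds|gbp)\s*$", "", cleaned).strip()
    try:
        return float(cleaned)
    except ValueError:
        return None  # unrecognised format: leave for a fallback path
```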
Pattern three: output constraints. When the AI produces output, force it into a defined schema. If it's supposed to return a date, parse it as a date and validate that it's actually a valid date. If it's supposed to return a number, try to parse it as a number and validate that it's in the expected range. If parsing fails, that's an error condition that triggers a fallback.
I implemented this for a system that was using AI to extract data from unstructured documents. The AI would return things like "approximately £500k" or "around 5 million quid". The output constraint step would try to parse this as a number. When parsing failed, the system flagged it for human review. This caught ambiguous or uncertain output.
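A sketch of output constraints along those lines: force the AI's output through a parser, and treat any parse failure as the signal to fall back to human review.

```python
from datetime import date

def constrain_date(ai_output):
    """Return a date, or None if the output isn't a valid ISO date."""
    try:
        return date.fromisoformat(ai_output.strip())
    except ValueError:
        return None  # e.g. "next Tuesday": route to review

def constrain_amount(ai_output, minimum, maximum):
    """Return a float within [minimum, maximum], or None to trigger fallback."""
    try:
        value = float(ai_output.replace("£", "").replace(",", "").strip())
    except ValueError:
        return None  # e.g. "approximately £500k": route to review
    return value if minimum <= value <= maximum else None
```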
Pattern four: redundant extraction. For critical information, don't rely on a single AI extraction. Run it through two models or ask the model twice with different prompts. If both return the same answer, you have high confidence. If they differ, flag for human review. This costs more, but for critical data, it's worth it.
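In sketch form, redundant extraction is just an agreement check. Here `extract_a` and `extract_b` stand in for two models, or the same model asked twice with different prompts:

```python
def redundant_extract(document, extract_a, extract_b):
    """Auto-accept only when two independent extractions agree."""
    first, second = extract_a(document), extract_b(document)
    if first is not None and first == second:
        return {"value": first, "needs_review": False}
    return {"value": None, "needs_review": True}  # disagreement: human review
```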
Pattern five: staged processing. Don't try to solve everything in one step. Break the workflow into stages. Stage one: extract broad categories. Stage two: extract specific details within those categories. If stage one fails, you don't need to run stage two. If stage two fails, you have stage one's output to fall back on. This limits error propagation.
I worked with a compliance team that was using AI to identify risks in contracts. They did this in stages: stage one, identify contract type. Stage two, identify key terms relevant to that type. Stage three, flag potential compliance issues. If stage one failed to identify the contract type, they didn't run stages two and three. This meant errors didn't propagate and the system stayed robust.
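A toy version of that staged structure. The `identify_type`, `extract_terms`, and `flag_issues` functions are stand-ins for AI calls; each later stage runs only if the earlier one succeeded:

```python
def identify_type(contract):
    return "lease" if "lease" in contract.lower() else None  # stage 1 stub

def extract_terms(contract, contract_type):
    return ["term dates", "rent review"] if contract_type == "lease" else None

def flag_issues(key_terms):
    return [t for t in key_terms if "review" in t]  # stage 3 stub

def staged_pipeline(contract):
    result = {"contract_type": None, "key_terms": None, "issues": None}
    result["contract_type"] = identify_type(contract)
    if result["contract_type"] is None:
        return result  # stop here: later stages never see bad data
    result["key_terms"] = extract_terms(contract, result["contract_type"])
    if result["key_terms"] is None:
        return result
    result["issues"] = flag_issues(result["key_terms"])
    return result
```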
Human-in-the-Loop Design
The most robust AI workflows aren't purely automated. They're designed with humans in the loop.
This doesn't mean slow. A well-designed human-in-the-loop system can be faster than a fully automated one because it focuses human effort only where it's needed. I built a system that processed 500 customer requests per day: 85 percent were handled entirely by AI, 12 percent were handled by AI with human review, and 3 percent went straight to humans. The humans spent about 4 hours per day on that 15 percent, which meant all 500 requests were handled with roughly 4 hours of human effort. Without the AI, processing 500 requests would have taken multiple people a full day.
The design pattern: the AI does what it's good at (fast, consistent processing), and humans handle exception cases. But crucially, you need a system for capturing human decisions and feeding them back. When a human reviews an AI decision and overrides it, that data should be captured. Over time, you can use that data to improve the AI or update the validation rules.
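Capturing that feedback loop can start as something this simple. A sketch, assuming a ticket-categorisation workflow like the ones described above:

```python
overrides = []  # disagreements accumulate here as future training data

def record_review(ticket_id, ai_category, human_category):
    """Log a human review of an AI decision; keep overrides for retraining."""
    entry = {
        "ticket_id": ticket_id,
        "ai_category": ai_category,
        "human_category": human_category,
        "overridden": ai_category != human_category,
    }
    if entry["overridden"]:
        overrides.append(entry)
    return entry
```

Six months of `overrides` entries is exactly the dataset you need to retrain the model or tighten the validation rules.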
I worked with a customer support team that implemented this. They had an AI system categorising tickets. When humans disagreed with the categorisation, it was logged. After six months of data, they retrained the AI model using the human corrections. Accuracy improved from 82 percent to 91 percent.
The Real Cost of Robustness
Here's what people often get wrong: they think building a robust AI workflow costs the same as building a basic one. It doesn't. Robustness adds work.
A basic AI workflow: feed data in, get answer out. Maybe 40 percent of development time goes to AI implementation, 60 percent to the basic infrastructure. A robust AI workflow: input validation, output validation, error handling, fallback logic, monitoring, alerting, human-in-the-loop review process, logging. Now, maybe 30 percent of development time goes to AI implementation, 70 percent to robustness.
This is why demos look so impressive and production systems struggle. Demos skip all the robustness work. Production systems can't.
The financial services firm I mentioned at the start? They had a demo that took three weeks to build. They had a production-ready version that took eight weeks to build. The extra five weeks was all robustness: handling edge cases, error paths, validation, monitoring, human fallback processes. That five weeks prevented a disaster.
I've learned to budget for robustness upfront. When I estimate the cost of building an AI workflow, I assume 60 percent of effort goes to robustness. It sounds like waste until the system encounters a real-world edge case and keeps working.
Common Mistakes to Avoid
Based on 120+ implementations, here are the patterns that consistently lead to failure.
Mistake one: assuming the demo will work in production. The demo works on clean, test data. Production gets garbage. Plan for that.
Mistake two: treating the AI model as the only thing that matters. The model is 30 percent of the problem. The engineering around it is 70 percent.
Mistake three: not having a rollback plan. If the AI workflow goes wrong, you need a way to revert to manual processing. If you haven't planned this, you're stuck.
Mistake four: not monitoring error rates. If you don't know the system is failing, you can't fix it.
Mistake five: not involving operations in the design. The people who'll actually operate the system need to be involved in building it. They'll catch robustness issues that engineers miss.
Getting This Right
The teams that build robust AI workflows do this: they treat it as an engineering problem, not just an AI problem. They plan for failure modes upfront. They build validation at every step. They have clear fallback paths. They monitor obsessively. And they iterate based on what they learn from production.
It's not glamorous. It doesn't sound impressive in a pitch. But it's the difference between a system that works and one that fails in production.
Frequently Asked Questions
How long does it take to implement AI automation in a small business?
Most single-process automations take 1-5 days to implement and start delivering ROI within 30-90 days. Complex multi-system integrations take 2-8 weeks. The key is starting with one well-defined process, proving the value, then expanding.
Do I need technical skills to automate business processes?
Not for most automations. Tools like Zapier, Make.com, and N8N use visual builders that require no coding. About 80% of small business automation can be done without a developer. For the remaining 20%, you need someone comfortable with APIs and basic scripting.
Where should a business start with AI implementation?
Start with a process audit. Identify tasks that are high-volume, rule-based, and time-consuming. The best first automation is one that saves measurable time within 30 days. Across 120+ projects, the highest-ROI starting points are usually customer onboarding, invoice processing, and report generation.
How do I calculate ROI on an AI investment?
Measure the hours spent on the process before automation, multiply by fully loaded hourly cost, then subtract the tool cost. Most small business automations cost £50-500/month and save 5-20 hours per week. That typically means 300-1000% ROI in year one.
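That arithmetic can be written out as a quick sketch. The figures in the usage note are examples, not quotes from any engagement:

```python
def first_year_roi_percent(hours_saved_per_week, hourly_cost, tool_cost_per_month):
    """ROI as described above: annual saving vs annual tool cost, as a percent."""
    annual_saving = hours_saved_per_week * 52 * hourly_cost
    annual_tool_cost = tool_cost_per_month * 12
    return (annual_saving - annual_tool_cost) / annual_tool_cost * 100
```

For example, saving 10 hours a week at a £30 fully loaded rate with a £200/month tool works out to a 550 percent first-year return.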
What are the main risks of implementing AI in my business?
The three biggest risks are: data quality issues (bad data in means bad decisions out), lack of oversight (automations running without monitoring), and vendor lock-in (building on a platform that changes pricing or features). All three are manageable with proper governance, documentation, and a multi-vendor strategy.
Put This Into Practice
I use versions of these approaches with my clients every week. The full templates, prompts, and implementation guides, covering the edge cases and variations you will hit in practice, are available inside the AI Ops Vault. It is your AI department for $97/month.
Want a personalised implementation plan first? Book your AI Roadmap session and I will map the fastest path from where you are now to working AI automation.