CwX 2026: How Caseware Verity closes the gap between AI capability and audit-grade reliability

Consistency, context and human judgment: the real requirements for AI in assurance

AI models can now outperform domain experts on graduate-level questions. Benchmarks that would have stumped a PhD a year ago are being cleared with room to spare. By almost any technical measure, the capability of these models has crossed a threshold that few people predicted this quickly.

And yet, for most accounting firms, that capability sits at arm's length, because passing a benchmark and being trustworthy enough to inform a professional conclusion are two entirely different things. The distance between those two points is where most AI in audit and assurance either succeeds or fails, and it was the subject of a candid session at CwX 2026 in Fort Lauderdale, led by the team behind Caseware Verity. Caseware Verity is engagement-aware AI embedded in Caseware Cloud that works inside engagements, drawing on permitted engagement data, firm methodology and audit context.

Reflecting on where AI models stood when Caseware built its AI-powered digital assistant, AiDA, in 2024, Quinn Daneyko, Senior Product Manager at Caseware, said, "The AI models evolved dramatically. But they weren’t transformative. They couldn’t reason deeply. They didn’t have deep technical knowledge. They weren’t at the level of a strong domain expert across different verticals."

A powerful AI model is just the starting point

AiDA worked the way most generative AI tools did: a user sends a message, context gets attached, the model responds. Verity is structurally different, Daneyko noted. "You send a message to a model and it doesn't just respond. It reasons over that question. It calls different resources, gets what it needs from a variety of different sources, and continues to work through that information until it can achieve a certain outcome."

In practice, that means Verity can draw on engagement context such as trial balance data, financial statements, risks, controls, materiality, engagement documents, checklists and firm knowledge bases, as well as authoritative standards where configured and available. It gathers that material through parallel sub-agents before producing a response grounded in the full context of the engagement.
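For readers who want a concrete picture of that pattern, here is a minimal Python sketch of a loop that fans out to parallel sub-agents and keeps gathering context until nothing more is needed. It is an illustration only, not Caseware's implementation: every name in it (`SUB_AGENTS`, `plan_next_sources`, `answer`) is hypothetical, the planner is a stub standing in for the model's reasoning, and the returned strings are placeholder data.

```python
import asyncio

# Hypothetical sub-agents. Each stands in for a lookup against one kind of
# engagement context; the returned strings are placeholder data.
async def fetch_trial_balance(question: str) -> str:
    await asyncio.sleep(0)  # stand-in for real I/O
    return "trial balance: inventory up 11%"

async def fetch_risks(question: str) -> str:
    await asyncio.sleep(0)
    return "risks: inventory valuation flagged in prior year"

async def fetch_firm_guidance(question: str) -> str:
    await asyncio.sleep(0)
    return "firm guidance: corroborate inventory movements"

SUB_AGENTS = {
    "trial_balance": fetch_trial_balance,
    "risks": fetch_risks,
    "firm_guidance": fetch_firm_guidance,
}

def plan_next_sources(question: str, gathered: dict) -> list:
    """Stub planner: ask for every source not gathered yet.

    In an agentic system this decision would come from the model itself.
    """
    return [name for name in SUB_AGENTS if name not in gathered]

async def answer(question: str, max_rounds: int = 3) -> dict:
    gathered = {}
    for _ in range(max_rounds):
        needed = plan_next_sources(question, gathered)
        if not needed:
            break  # enough context to respond
        # Run the selected sub-agents concurrently and collect their results.
        results = await asyncio.gather(*(SUB_AGENTS[n](question) for n in needed))
        gathered.update(zip(needed, results))
    return {"question": question, "sources": sorted(gathered), "context": gathered}

result = asyncio.run(answer("What are the key inventory risks?"))
```

The point of the design is that the response is assembled only after the loop has stopped asking for more context, rather than after a single prompt.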

But even that architecture, Daneyko said, is only the foundation. "With that core platform capability, we can now start adding more capabilities. We can start giving it access to more context, giving it the ability to start driving workflows within the engagement." The platform is the enabler. What gets built on top of it still requires something the platform alone can't provide.

The knowledge a model can’t train on

Jason Bradley, VP of AI and Methodology at Caseware, and a former standard-setter, regulator, and inspector, discussed the gap between what a general AI model knows and what a domain expert knows.

General models trained on publicly available data will contain some knowledge of audit and financial reporting. But what they contain reflects what was written down and accessible, not the interconnected professional judgment that makes standards actually function in practice.

"If you look at things like SAS 145," Bradley explained, "there’s a lot of requirements, but they’re all connected to each other. Very few standards are isolated. They’re hugely interconnected, and that interconnectivity may not be clear in a base model’s training."

His example of what that looks like in practice: give a general model a trial balance and ask for risks. You'll get something. Inventory is up 11%. Margins have compressed. The output is plausible. But it's missing the synthesis that a competent auditor performs almost automatically — connecting that movement to the board minutes, the prior year control environment, the specific risk context of this client, at this point in time. "Without that context," Bradley said, "you're missing the sophisticated part that a human would do, which is to connect the whole holistically."

Verity’s domain intelligence is delivered through structured methodology and instruction sets that encode professional judgment, standards interconnectivity and firm-specific methodology directly into the Caseware platform.

Consistency over brilliance

There's a dimension to trustworthiness that the AI capability debate usually misses.

"A baseline model without any tuning will sometimes do an OK job, sometimes do a great job, sometimes do a terrible job,” Bradley explained. “The problem is almost more the inconsistency than the quality of the output."

A profession trained from day one to be sceptical needs something it can calibrate for. Occasional brilliance isn't useful if it comes bundled with unpredictable failure. What firms need is outputs that are reliable enough to build a review process around. That requires not just good skills files, but a continuous evaluation framework that reruns assessments every time a new model is released, measures what changed, and adjusts accordingly. "In three months there'll be some new model," Bradley said. "We're setting ourselves up so that we can rerun this as a regression test, test it against whatever's emerged, and see how it changes."
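To make the regression-test idea concrete, here is a small Python sketch of how a rerun might compare a new model against the current one on both average quality and run-to-run consistency. It is an assumption-laden illustration, not the framework described in the session: the function names, the thresholds and the scores are all invented for the example.

```python
from statistics import mean, pstdev

def summarize(scores_by_case: dict) -> dict:
    """Average quality across cases, plus average per-case spread.

    Each case maps to scores (0 to 1) from repeated runs of the same prompt,
    so the spread measures how inconsistent the model is on that case.
    """
    case_means = [mean(scores) for scores in scores_by_case.values()]
    case_spreads = [pstdev(scores) for scores in scores_by_case.values()]
    return {"quality": mean(case_means), "spread": mean(case_spreads)}

def compare(baseline: dict, candidate: dict,
            max_quality_drop: float = 0.02,
            max_spread_rise: float = 0.02) -> list:
    """Return the dimensions on which the candidate regressed."""
    b, c = summarize(baseline), summarize(candidate)
    regressions = []
    if b["quality"] - c["quality"] > max_quality_drop:
        regressions.append("quality")
    if c["spread"] - b["spread"] > max_spread_rise:
        regressions.append("consistency")
    return regressions

# Invented scores: the new model matches the old one on average,
# but its results vary far more from run to run.
current_model = {"risk_identification": [0.8, 0.8, 0.8], "materiality": [0.7, 0.7, 0.7]}
new_model = {"risk_identification": [1.0, 0.4, 1.0], "materiality": [0.9, 0.5, 0.7]}

flags = compare(current_model, new_model)
```

In this made-up case the averages are the same, so only the consistency check fires, which mirrors Bradley's point that inconsistency can be the bigger problem than quality.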

Stacie Simmons, VP of AI at Caseware, connected this to governance. Transparency into the sources and context behind outputs, reviewable suggestions and firm-level controls over how agents behave are what make consistent quality operationally possible and what allow professional accountability to stay where it belongs.

The human judgment line

Throughout the session, one principle came up repeatedly: AI-assisted outputs must be reviewed and accepted by a human before they are incorporated into an engagement.

"Human judgment is sacrosanct,” Bradley said. “Before anything is written into the engagement, a human will always have to be involved in this process." The agent can surface supporting context, propose risks or next steps, and provide source references where applicable. The reviewer decides.

The goal isn't to replace the judgment call. It's to give the person making it better material to work with: more context synthesised more thoroughly, with supporting context and source references made available where applicable. The difference between a reviewer spending their time on genuine judgment and a reviewer spending their time on remedial correction is where the quality improvement lives.
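The review-before-write principle can be pictured as a simple gate. The Python sketch below is hypothetical throughout (the `Suggestion`, `Engagement` and `review` names are invented for illustration and do not describe Caseware's data model): a suggestion carries its source references, starts as pending, and cannot be written to the engagement until a named reviewer has accepted it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"

@dataclass
class Suggestion:
    text: str
    sources: List[str]                 # references shown to the reviewer
    status: Status = Status.PENDING
    reviewer: Optional[str] = None

@dataclass
class Engagement:
    entries: List[str] = field(default_factory=list)

    def write(self, suggestion: Suggestion) -> None:
        # The gate: nothing is written without an accepting human reviewer.
        if suggestion.status is not Status.ACCEPTED or not suggestion.reviewer:
            raise PermissionError("suggestion must be accepted by a human reviewer first")
        self.entries.append(suggestion.text)

def review(suggestion: Suggestion, reviewer: str, accept: bool) -> Suggestion:
    """Record the reviewer's decision on a suggestion."""
    suggestion.reviewer = reviewer
    suggestion.status = Status.ACCEPTED if accept else Status.REJECTED
    return suggestion

engagement = Engagement()
suggestion = Suggestion("Possible inventory valuation risk", ["trial balance"])

try:
    engagement.write(suggestion)       # still pending, so this is refused
    blocked = False
except PermissionError:
    blocked = True

engagement.write(review(suggestion, reviewer="senior auditor", accept=True))
```

Putting the check inside `write` rather than in the calling code means there is no path into the engagement that skips the reviewer.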

The gap between an AI model that clears a benchmark and one that a senior auditor would rely on is real. Domain knowledge embedded at the platform level, continuous evaluation, human judgment kept genuinely in the loop, firm-specific context that makes outputs fit for purpose — none of that comes ready-made.

But the direction is clear enough. And for firms still waiting for AI that feels trustworthy rather than just impressive, the session offered an honest account of what closing that gap actually requires.

Learn more about Caseware Verity and Caseware Verity Agentic Suites.

Frequently Asked Questions

What is Caseware Verity?

Caseware Verity is engagement-aware AI embedded directly in Caseware Cloud that works inside audit engagements, drawing on permitted engagement data, firm methodology, and audit context to provide grounded, workflow-specific insights.

How is Verity different from a general AI tool like ChatGPT?

Unlike general AI models that respond based on publicly available training data, Verity reasons across the full context of an engagement, including trial balances, financial statements, risks, controls, and firm knowledge bases, before producing a response.

Does Verity replace professional judgment?

No. Human judgment remains central to the process. Before any AI-generated output is incorporated into an engagement, a human reviewer must assess and accept it. As Caseware's VP of AI and Methodology put it, "Human judgment is sacrosanct."

Why is consistency important for AI in audit?

Inconsistent AI outputs are a core reliability problem for audit professionals. Verity is supported by a continuous evaluation framework that reruns assessments each time a new model is released, so firms can build a dependable review process around its outputs.

How does Verity handle the interconnectivity of auditing standards?

Verity's domain intelligence is delivered through structured methodology and instruction sets that encode standards interconnectivity and professional judgment directly into the platform, addressing a gap that general AI models trained on public data typically cannot fill.

Can Verity incorporate my firm's own methodology?

Yes. Verity draws on firm-specific knowledge bases and methodology, and operates within firm-configured permissions and controls, so guidance reflects your firm's standards rather than generic best practices.

How does Verity support transparency in AI-generated outputs?

Verity provides source references alongside its suggestions, allowing reviewers to verify the engagement data, standards, or firm guidance that informed each output before accepting it into the engagement file.

How does Verity differ from Caseware's earlier AI tool, AiDA?

AiDA followed a standard prompt-and-response model, while Verity reasons over a question by calling multiple resources and working through information from various sources until it can achieve a specific outcome within the engagement workflow.
