Third-Party AI Audits

Third-party audit practice addresses the broader landscape of external assessment for AI beyond the specific EU Notified Body framework. The discipline operates across voluntary audits, sector-specific audits, AI Safety Institute evaluation, academic and research evaluation, journalistic investigation, civil society audits, and the developing AI-specific commercial audit market. The combined infrastructure provides external accountability that an operator's internal assessment alone cannot produce.

This page pairs with adjacent work covered separately. Notified Bodies addresses the conformity assessment bodies designated by public authorities under EU regulations, including the AI Act. Accountability addresses the broader integration discipline of responsibility allocation. Red Teaming addresses the adversarial evaluation discipline. This page covers third-party audit as the broader category, including all the forms that third-party assessment takes across the AI ecosystem.

Why Third-Party Audit Is a Distinct Discipline

Third-party audit provides specific properties that other forms of assessment do not. The distinction matters because the audit framework operates differently from internal review, customer assessment, or regulatory examination.

The third-party position supports independence that other positions do not produce. The auditor is independent from the operator and from the customer; that position supports credibility that purely internal or purely customer-driven assessment cannot match.

Third-party audit provides documented findings that support specific purposes. Audit reports, certifications, attestations, and similar audit products provide structured documentation that operators can present to customers, regulators, investors, and other stakeholders.

Third-party audit involves substantive methodology. The audit follows specific methodology, examines specific evidence, and produces conclusions based on the methodology and evidence. The structured approach differs from informal assessment in ways that affect what conclusions audit supports.

Third-party audit operates across multiple contexts. Voluntary audits address operator-chosen scope; required audits address regulatory or contractual requirements; investigative audits address specific concerns. The diverse contexts produce a diverse audit landscape.

Third-party audit infrastructure continues to develop for AI. The discipline is at an earlier stage of maturity for AI than for financial audit, cybersecurity audit, or quality management audit. The development continues alongside broader AI compliance infrastructure development.

Audit Methodology Categories

Multiple distinct audit methodologies have emerged for AI with different scopes, methodologies, and applications. Mature operators may engage multiple audit types rather than relying on any single methodology.

Audit Type	What It Addresses	Typical Methodology
Algorithmic audit	Bias, fairness, accuracy, and broader performance properties of AI systems	Statistical analysis of model behavior across populations, benchmark comparison, counterfactual analysis, disparate impact assessment
Compliance audit	Conformance with specific regulatory or standard requirements	Evidence review against specific requirements, gap analysis, certification against frameworks
Security audit	AI-specific cybersecurity practices and vulnerabilities	Cybersecurity assessment adapted to AI including model security, training pipeline, inference infrastructure
Safety audit	Operational safety practice and safety case validation	Safety case review, evidence assessment, methodology evaluation, often aligned with UL 4600 or equivalent frameworks
Process audit	AI development and deployment processes	Process documentation review, process execution observation, evidence of process compliance
Outcome audit	Downstream effects of AI deployment in production	Population-level analysis of AI deployment outcomes, longitudinal studies, affected party engagement
Data audit	Training data quality, provenance, and characteristics	Dataset analysis, provenance verification, representativeness assessment, label quality evaluation
Governance audit	Organizational governance of AI development and deployment	Governance structure review, decision-making process evaluation, accountability assignment assessment
Impact audit	Broader societal, environmental, and stakeholder impacts of AI	Impact assessment methodology, stakeholder engagement, comparison against established practice

The methodologies are not mutually exclusive. Comprehensive third-party assessment may engage multiple methodologies; specific assessments often combine elements from multiple categories. The categorization supports understanding what specific assessments accomplish.

Standards and Certifications for AI Audit

Several standards and certification programs provide infrastructure for third-party AI audit. The infrastructure continues to develop.

ISO/IEC 17000-series standards provide foundational audit infrastructure. ISO/IEC 17020 addresses inspection bodies; ISO/IEC 17021 addresses management system certification bodies; ISO/IEC 17025 addresses testing and calibration laboratories; ISO/IEC 17065 addresses product certification bodies; ISO/IEC 17029 addresses validation and verification bodies. The standards specify requirements for audit body competence, impartiality, and methodology that apply across audit types including AI.

ISO/IEC 42006 provides specific requirements for audit and certification of AI management systems. The standard addresses certification of organizations against ISO/IEC 42001 and provides the auditor-side infrastructure that supports the AI management system certification market.

SOC 2 (System and Organization Controls 2) reports provide an attestation framework that AI vendors increasingly use and that is being adapted for AI-specific concerns. The AICPA framework provides Type 1 reports (design of controls at a point in time) and Type 2 reports (operating effectiveness of controls over a period), which AI vendors increasingly provide to customers.

The AICPA Statements on Standards for Attestation Engagements (SSAE) provide the foundational framework for SOC reports and broader attestation work. The framework operates across industries, with AI-specific application still developing.

Sector-specific audit standards including healthcare audit standards (HITRUST, HIPAA Security Rule audit), financial services audit standards (SOC for Cybersecurity, financial sector specific frameworks), and others apply with AI-specific extensions.

The NYC Local Law 144 framework for bias audits of automated employment decision tools provides one of the first specific AI audit regulatory frameworks. The detailed treatment appears in Bias & Fairness. The framework continues to inform broader AI bias audit practice development.

The aggregate audit standards landscape supports varied audit practice with the specific framework selected based on what the audit addresses.

Who Performs AI Audits

Multiple organizational types perform AI audits with different positioning, capabilities, and methodologies.

Major audit firms including Deloitte, EY, KPMG, and PwC (the Big Four) have substantial AI audit practices. The firms leverage broad audit infrastructure, deep client relationships, and substantial methodology development to perform AI audits across many client contexts. The Big Four's position produces both substantial market presence and specific limitations, including potential conflicts with audit clients.

Mid-tier audit firms including BDO, Grant Thornton, RSM, and others provide AI audit services with somewhat different positioning. The mid-tier firms may serve different client segments and offer different methodology approaches than the Big Four.

Specialized AI audit firms have emerged with focus specifically on AI assessment. Firms including BABL AI, Holistic AI, Trail of Bits, and others offer AI-specific audit services with deep methodology focus. The specialized firms provide depth in AI methodology that general audit firms may not match.

Cybersecurity firms with AI capability address AI-specific security audit. The cybersecurity audit infrastructure has been extending to AI with substantial activity across major cybersecurity firms.

Academic researchers perform audits through research methodology. Academic audits produce both specific findings and methodology development that broader audit practice draws on. The work has a different incentive structure than commercial audit and produces different output.

Civil society organizations including the Algorithmic Justice League, the AI Now Institute, and others perform audits through public-interest methodology. The work has had substantial influence on broader audit practice and public understanding of AI deployment.

AI Safety Institutes including UK AISI, US AISI, and equivalent institutes perform evaluation work that operates in the audit landscape though framed as evaluation rather than audit. The institutional position is distinctive and the work continues to develop.

Internal audit functions extending to AI provide first-party assessment that operates alongside third-party work. The detailed treatment of internal audit appears in Accountability. Internal audit is not third-party but pairs with third-party work in mature operator practice.

Bug bounty programs and similar adversarial assessment infrastructure operate in audit-adjacent space. The detailed treatment appears in Red Teaming.

Journalistic investigation has produced substantial audit-like work that operates outside the formal audit framework. Major investigative journalism on AI, including ProPublica's COMPAS reporting, Reuters' Amazon hiring algorithm coverage, and equivalent work, shapes the broader audit landscape through specific findings and methodology examples.

Specific Notable AI Audit Work

Several specific audit projects have shaped AI audit practice and warrant direct reference.

The NYC Local Law 144 bias audit requirements for automated employment decision tools have produced substantial audit activity since enforcement began in 2023. The requirements specify what audits must include, who can perform them, and what must be publicly disclosed. The framework provides one of the first concrete regulatory audit requirements specific to AI, with operational learning extending beyond NYC.
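
One quantity commonly reported in bias audits of this kind is the impact ratio: each category's selection rate divided by the selection rate of the most-selected category. The following is a minimal Python sketch under stated assumptions, not a compliant audit procedure. The function name impact_ratios, the toy records, and the 0.8 review threshold (borrowed from the four-fifths rule of thumb used in US employment contexts) are illustrative assumptions; the calculation a given audit must follow depends on the applicable rules.

```python
from collections import defaultdict

# Assumption: illustrative review threshold from the four-fifths rule of thumb.
REVIEW_THRESHOLD = 0.8


def impact_ratios(records):
    """Compute selection rate and impact ratio per category.

    records: iterable of (category, selected) pairs, where selected is a bool.
    Returns {category: (selection_rate, impact_ratio)}.
    """
    totals = defaultdict(int)
    chosen = defaultdict(int)
    for category, selected in records:
        totals[category] += 1
        if selected:
            chosen[category] += 1

    rates = {c: chosen[c] / totals[c] for c in totals}
    best = max(rates.values())
    if best == 0:
        raise ValueError("no category has any selections; impact ratio is undefined")
    # Each category is compared against the most-selected category.
    return {c: (rate, rate / best) for c, rate in rates.items()}


# Hypothetical toy data: 100 candidates per group.
sample = (
    [("group_a", True)] * 40 + [("group_a", False)] * 60
    + [("group_b", True)] * 24 + [("group_b", False)] * 76
)

result = impact_ratios(sample)
flagged = [c for c, (_, ratio) in sorted(result.items()) if ratio < REVIEW_THRESHOLD]

for category, (rate, ratio) in sorted(result.items()):
    print(f"{category}: selection rate {rate:.2f}, impact ratio {ratio:.2f}")
print("categories below review threshold:", flagged)
```

Rates computed on small categories are unstable, so a real audit would also need to report sample sizes and decide how to treat sparse categories; the sketch does neither.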

The audit of a widely deployed commercial healthcare risk-prediction algorithm documented in Obermeyer et al. (2019), identified in press coverage as an Optum product, provides a foundational reference for AI bias audit methodology. The work identified substantial racial bias in healthcare AI affecting tens of millions of patients; the methodology has substantially informed subsequent healthcare AI audit work.

The Apple Card credit decision investigation conducted by NY DFS provides a reference for regulatory audit of consumer AI. The investigation examined whether algorithmic credit decisions produced gender-based disparate impact; the department's 2021 report did not find unlawful discrimination, and the investigation itself involved substantial methodology development.

The ProPublica COMPAS audit published in 2016 provides a foundational reference for criminal justice AI audit. The work identified racial disparities in risk assessment outputs; the methodology and findings have substantially shaped subsequent criminal justice AI audit work.
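
Audits of risk assessment tools in this tradition often compare error rates across groups: how often people who did not reoffend were labeled high risk (false positive rate) and how often people who did reoffend were labeled low risk (false negative rate). The sketch below shows that comparison in Python. It is an illustration only: the function names, group labels, and counts are hypothetical and are not drawn from the COMPAS data or from ProPublica's code.

```python
def error_rates_by_group(rows):
    """Compute false positive and false negative rates per group.

    rows: iterable of (group, predicted_high_risk, reoffended), both bools.
    Returns {group: {"fpr": float or None, "fnr": float or None}};
    a rate is None when its denominator is zero.
    """
    counts = {}
    for group, predicted, actual in rows:
        c = counts.setdefault(group, {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
        if predicted and actual:
            c["tp"] += 1
        elif predicted and not actual:
            c["fp"] += 1
        elif not predicted and actual:
            c["fn"] += 1
        else:
            c["tn"] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who did not reoffend
        positives = c["fn"] + c["tp"]  # people who did reoffend
        rates[group] = {
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return rates


def make_rows(group, tp, fp, fn, tn):
    """Expand confusion-matrix counts into individual rows (toy data helper)."""
    return (
        [(group, True, True)] * tp + [(group, True, False)] * fp
        + [(group, False, True)] * fn + [(group, False, False)] * tn
    )


# Hypothetical counts chosen only to make the disparity visible.
rows = make_rows("group_x", tp=80, fp=30, fn=20, tn=70) + make_rows(
    "group_y", tp=60, fp=15, fn=40, tn=85
)
rates = error_rates_by_group(rows)
for group in sorted(rates):
    print(group, {k: round(v, 2) for k, v in rates[group].items()})
```

Equal error rates and equal calibration across groups generally cannot both hold when base rates differ, so an audit report should state which fairness criterion it tests and why.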

Major AI vendor system cards, including the OpenAI GPT-4 system card, the Anthropic Claude system cards, and equivalent work from other vendors, include audit-like content, though it is framed as vendor self-disclosure. The work informs broader audit practice and supports third-party auditor methodology development.

The AI Safety Institute evaluations of frontier models including pre-deployment evaluations of major model releases provide reference for institutional evaluation work. The evaluations produce findings that inform both vendor practice and broader public understanding.

Academic auditing work across multiple universities has produced substantial methodology development. The work continues across many specific applications including facial recognition, content moderation, hiring algorithms, and other AI applications.

Civil society audit work including Algorithmic Justice League research on facial recognition bias, AI Now Institute work on multiple AI applications, and equivalent civil society research has substantively shaped audit practice and public understanding.

The Auditor Independence Question

Independence is foundational to third-party audit but is operationally complex. The independence question affects what audits can credibly establish.

Financial relationships affect independence. When operators pay auditors, the relationship creates incentive for auditors to produce findings operators favor. The pattern affects audit credibility regardless of auditor intent or capability.

Access dependencies affect independence. Auditors who need ongoing operator cooperation for future audits may face pressure to produce findings that preserve the relationship. The dependency affects what auditors will report or emphasize.

Methodology selection affects independence. Operators may select auditors based partly on methodology choices, with auditors using more conservative methodology more likely to receive engagements than auditors using more aggressive methodology. The selection produces market dynamics affecting what audit practice develops.

Conflicts of interest affect specific engagements. Auditors providing consulting services to operators face conflicts when subsequently auditing those operators. The conflicts are addressed in some audit frameworks through specific independence requirements but remain operationally complex.

Industry capture affects audit practice broadly. Auditors who depend on industry for engagements may develop methodology and norms that favor industry positioning. The pattern is well-documented across audit industries and applies to AI audit.

The independence challenges are not unique to AI audit. Financial audit, cybersecurity audit, and other established audit industries have addressed independence through specific frameworks including PCAOB oversight of public company auditors, AICPA independence rules, and equivalent infrastructure. AI audit infrastructure continues to develop equivalent frameworks.

External oversight of auditors addresses some independence concerns. National accreditation bodies that accredit auditors, regulatory oversight of audit firms operating in regulated sectors, and broader oversight infrastructure support independence beyond what auditor self-regulation alone produces. The oversight infrastructure for AI audit continues to develop.

Specific independence considerations for AI audit include the relationship between AI vendor and AI auditor in markets where major audit firms serve major AI vendors as both audit and consulting clients. The patterns warrant specific attention as the AI audit market develops.

AI Safety Institute Evaluation

AI Safety Institute evaluation operates in audit-adjacent space with specific institutional characteristics that distinguish it from commercial audit. The detailed treatment of AISI work appears in Red Teaming and International Coordination; the audit dimension warrants specific treatment here.

AISI evaluation is conducted by national government bodies with a public-interest mission rather than by commercial audit firms. The institutional position produces a different incentive structure than commercial audit, with substantive implications for what evaluations cover and how findings are handled.

AISI access to pre-deployment models provides substantive audit-like work. Multiple frontier model releases have included AISI evaluation as part of pre-deployment work. The access supports evaluation that purely external work could not perform.

AISI methodology development continues. The institutes have been developing methodology specifically for frontier AI evaluation including dangerous capability evaluation, broader safety evaluation, and emerging methodology for evaluating increasingly capable systems.

AISI findings disclosure operates through specific institutional channels. Findings may inform vendor practice, regulatory positioning, public understanding, and international coordination. The disclosure varies based on national security considerations, vendor relationship considerations, and broader policy considerations.

AISI coordination through bilateral arrangements and broader international engagement extends institutional infrastructure beyond any single jurisdiction. The MoU between US AISI and UK AISI provides a foundational coordination model that broader arrangements may follow.

AISI evaluation is voluntary from the operator's side. Major frontier labs have engaged in AISI evaluation through specific arrangements rather than under regulatory mandate. The voluntary nature affects what evaluation operators undertake, but it operates alongside developing regulatory frameworks that may produce mandatory evaluation requirements.

The AISI model has been substantively important for frontier AI evaluation but does not extend to all AI audit needs. Non-frontier AI applications, deployment-specific concerns, and the broader range of AI audit requirements operate through other infrastructure. AISI work supplements rather than replaces broader audit infrastructure.

Documentation and Disclosure

Audit produces documentation that supports specific purposes. The documentation and disclosure dimensions affect what audit accomplishes.

Audit reports document findings, methodology, evidence, and conclusions. The reports support operator-customer relationships, regulatory engagement, and broader stakeholder communication. Report content and format depend on the audit type and the specific framework involved.

Audit attestations provide structured statements about specific properties. SOC 2 reports, ISO certifications, and equivalent attestations provide formatted disclosure that customers and other stakeholders can engage with consistently.

Public disclosure of audit findings varies substantially across audit types. NYC LL144 bias audits require specific public disclosure; SOC 2 reports are typically shared only with specific customers under confidentiality; academic and civil society audits typically include public disclosure; regulatory audits may produce confidential or public findings depending on framework.

Embargo arrangements protect specific findings from premature disclosure. Sensitive findings affecting deployed AI, findings related to specific vulnerabilities, and findings whose public disclosure could produce harm may face embargo for some period.

Confidentiality arrangements limit what auditors can disclose. Commercial audit typically operates under substantial confidentiality, with reports flowing only to the engaging operator and specific authorized recipients. The confidentiality limits what audit practice contributes to broader field understanding.

Coordinated disclosure for security findings extends the cybersecurity model to AI audit findings. The pattern involves operator notification, remediation period, and structured public disclosure of findings affecting deployed AI.

The disclosure landscape continues to develop. The tension between operator confidentiality interest and public benefit from disclosure shapes the framework that audit practice continues to navigate.

The Developing AI Audit Market

The AI audit market has been developing rapidly with substantial activity across multiple dimensions.

Market growth has been substantial. AI audit demand from operators, customers, and regulators has produced substantial market expansion since 2023. The growth continues alongside broader AI deployment expansion.

Service offerings have expanded across audit firms. Major audit firms have developed AI audit practices; specialized firms have emerged; sector-specific audit firms have extended to AI. The aggregate service offering provides substantial market capacity.

Pricing for AI audit varies substantially. Different audit types, different operator scale, different scope, and different audit firm positioning all affect pricing. Operators evaluating AI audit benefit from market understanding rather than assuming consistent pricing.

Methodology development continues across the audit community. The methodology for specific audit types continues to mature with substantial activity in algorithmic audit, AI security audit, and broader AI audit categories.

Standardization efforts address methodology consistency. ISO/IEC, AICPA, and other standards bodies have been developing standards for AI audit methodology. The standardization supports more consistent practice across audit firms but takes time to develop.

Auditor competence development addresses the substantive technical work AI audit requires. Training programs, certification programs, and broader auditor education continue to develop. The infrastructure supports market growth.

Regulatory drivers shape market development. NYC LL144, EU AI Act, emerging state AI legislation, and other regulatory developments produce specific audit demand that shapes market growth.

Customer expectations drive audit beyond regulatory mandate. Many operators pursue AI audit to support customer relationships, investor relationships, and broader stakeholder positioning rather than strictly for regulatory compliance.

Limitations of Third-Party Audit

Third-party audit has substantial limits that operators and stakeholders should engage directly.

Audit findings are point-in-time. Audits assess what was observed at the time of the audit; subsequent operator changes may substantively affect whether the findings still hold. Ongoing surveillance audits address this partially but produce less detailed assessment than initial audits.
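
One way a stakeholder might operationalize the point-in-time limit is to check whether an audit still describes the deployed system. The Python sketch below is a hypothetical illustration: the AuditRecord fields, the audit_coverage_gaps function, the version strings, and the 365-day default are assumptions chosen for the example, not requirements drawn from any audit framework.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AuditRecord:
    system: str            # hypothetical system identifier
    audited_version: str   # version of the system the auditor examined
    audit_date: date       # date the audit findings were issued


def audit_coverage_gaps(record, deployed_version, today, max_age_days=365):
    """Return reasons why an audit may no longer describe the deployed system.

    An empty list means neither check fired; it does not mean the system
    is sound, only that the audit is not obviously stale.
    """
    gaps = []
    if deployed_version != record.audited_version:
        gaps.append(
            f"deployed version {deployed_version} differs from "
            f"audited version {record.audited_version}"
        )
    age_days = (today - record.audit_date).days
    if age_days > max_age_days:
        gaps.append(f"audit is {age_days} days old (limit {max_age_days})")
    return gaps


# Hypothetical example values.
record = AuditRecord("screening-model", "1.2.0", date(2024, 1, 15))
stale = audit_coverage_gaps(record, "1.3.0", date(2025, 3, 1))
fresh = audit_coverage_gaps(record, "1.2.0", date(2024, 6, 1))

for reason in stale:
    print("gap:", reason)
print("gaps for unchanged system:", fresh)
```

A version comparison only catches changes the operator records as versions; retraining on new data or changes to upstream components can alter behavior without a version bump, which is part of why the limit cannot be fully closed by tooling.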

Audit scope is bounded. Audits address what they specifically examine; matters outside scope are not assessed. Operators that pass audits may have substantive concerns outside the audit scope.

Methodology limits affect what audit can establish. AI audit methodology continues to develop; current methodology has known limits that affect what audit can credibly establish. The limits do not eliminate audit value but bound what specific audits accomplish.

Auditor capability varies. Different auditors apply different methodology rigor, technical depth, and broader competence. Audit value depends substantially on the specific auditor performing the work.

The audit fallacy is operationally significant. Operators that have been audited may be assumed to be operating well, but the audit may not establish what that assumption implies. Mature stakeholders evaluate the audit substantively rather than treating audit completion as an automatic positive signal.

Independence challenges discussed above affect audit credibility. The challenges are partially addressed through audit infrastructure but cannot be fully eliminated in market-driven audit.

Cost-benefit considerations bound what audit covers. Comprehensive audit is expensive; audit scope reflects cost constraints alongside substantive coverage decisions. Cost limits affect what specific audits accomplish.

The relationship between audit findings and operator improvement is uncertain. Audits may produce findings that operators address substantively; audits may produce findings that operators handle through documentation without substantive change. The improvement dimension depends on operator engagement beyond audit completion.

Practical Implications for Operators

For operators engaging third-party audit, the landscape produces several practical implications.

Audit purpose shapes audit selection. Operators benefit from clarity about what specific audit accomplishes before engagement. Customer-driven audits, regulatory audits, and investor-driven audits may have different specific requirements.

Auditor selection involves substantive considerations including methodology, competence, independence, cost, timeline, and relationship considerations. Operators benefit from systematic auditor selection rather than ad hoc engagement.
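
Systematic selection can be as simple as a weighted scoring matrix over the considerations listed above. The Python sketch below is one hypothetical way to do that. The weights, the 1 to 5 rating scale, the firm names, and the function names are all assumptions made for illustration; an operator would set its own criteria and weights.

```python
# Assumption: illustrative weights over the considerations named in the text.
CRITERIA_WEIGHTS = {
    "methodology": 0.25,
    "competence": 0.25,
    "independence": 0.25,
    "cost": 0.10,
    "timeline": 0.10,
    "relationship": 0.05,
}


def score_auditor(ratings, weights=CRITERIA_WEIGHTS):
    """Weighted sum of ratings (assumed 1-5 scale, higher is better)."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[name] * ratings[name] for name in weights)


def rank_auditors(candidates, weights=CRITERIA_WEIGHTS):
    """Return candidate names ordered from highest to lowest score."""
    return sorted(
        candidates,
        key=lambda name: score_auditor(candidates[name], weights),
        reverse=True,
    )


# Hypothetical candidates and ratings.
candidates = {
    "firm_a": {"methodology": 4, "competence": 5, "independence": 3,
               "cost": 2, "timeline": 3, "relationship": 4},
    "firm_b": {"methodology": 5, "competence": 4, "independence": 5,
               "cost": 3, "timeline": 2, "relationship": 1},
}

ranking = rank_auditors(candidates)
for name in ranking:
    print(f"{name}: {score_auditor(candidates[name]):.2f}")
```

The value of the exercise lies less in the final number than in forcing the weights to be written down, in particular how much independence is allowed to trade off against cost and an existing relationship.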

Documentation preparation supports efficient audit. Operators with mature documentation infrastructure enable more efficient and substantive audit than operators with minimal documentation.

Audit response addresses both substantive remediation and disclosure. Operators that engage findings substantively produce different operational outcomes than operators that treat findings as compliance documentation.

Audit relationship continuity supports ongoing engagement. Long-term auditor relationships produce both familiarity and independence challenges; mature operator practice balances these considerations deliberately.

Multiple audit relationships address different specific purposes. Operators may engage multiple auditors for different audit types rather than relying on a single auditor for all assessment. The pattern supports breadth and avoids excessive dependency on any single audit relationship.

Internal audit integration supports unified practice. Internal audit and third-party audit address different but related dimensions; operators that integrate the two produce more coherent assessment than operators that treat them separately.

Disclosure practice for audit findings affects stakeholder relationships. Operators that disclose audit findings substantively produce different stakeholder relationships than operators that minimize disclosure. The choice involves trade-offs that operators navigate deliberately.

The Reframe

Third-party audit practice addresses the broader landscape of external assessment for AI beyond the EU Notified Body framework. The discipline operates across voluntary audits, sector-specific audits, AI Safety Institute evaluation, academic and research evaluation, journalistic investigation, civil society audits, and the developing AI-specific commercial audit market. Audit methodology categories include algorithmic, compliance, security, safety, process, outcome, data, governance, and impact audits with different scopes and applications. Standards and certifications including ISO/IEC 17000-series, ISO/IEC 42006, SOC 2 framework, AICPA SSAE, sector-specific frameworks, and emerging AI-specific frameworks provide audit infrastructure.

Audit performers include Big Four audit firms, mid-tier firms, specialized AI audit firms, cybersecurity firms, academic researchers, civil society organizations, AI Safety Institutes, internal audit functions extending to AI, bug bounty programs in audit capacity, and investigative journalism producing audit-like work. Specific notable audit work including NYC LL144 bias audits, Optum healthcare AI audit, Apple Card audit, ProPublica COMPAS audit, vendor system cards, AISI frontier model evaluations, academic auditing work, and civil society audit work has shaped the field.

The auditor independence question involves financial relationships, access dependencies, methodology selection, conflicts of interest, industry capture, and developing oversight infrastructure. AI Safety Institute evaluation operates in audit-adjacent space with distinctive institutional positioning. Documentation and disclosure dimensions including audit reports, attestations, public disclosure variance, embargo arrangements, confidentiality, and coordinated disclosure shape what audit accomplishes.

The developing AI audit market involves substantial growth, expanded service offerings, varied pricing, methodology development, standardization efforts, auditor competence development, regulatory drivers, and customer expectations. Limitations including point-in-time nature, bounded scope, methodology limits, auditor capability variance, the audit fallacy, independence challenges, cost-benefit considerations, and uncertain improvement relationship warrant acknowledgment. For operators, the practical work involves audit purpose clarity, auditor selection, documentation preparation, substantive audit response, relationship continuity considerations, multiple audit relationship management, internal audit integration, and disclosure practice. The work of building adequate third-party audit infrastructure across the AI ecosystem is one of the substantive assessment projects the agentic AI era requires.