AI in M&A document review tooling: a failure analysis of keyword and semantic retrieval


Research note · Mergeroom · April 2026

Summary

  • Buyer Q&A in sell-side processes routinely surfaces material contract provisions that keyword search in incumbent virtual data rooms (Datasite, Intralinks, Firmex, iDeals) fails to retrieve.
  • We identify three categories of failure: terminology variance, cross-reference resolution, and absence detection.
  • Vector-based semantic search, shipped as "the AI feature" in most incumbent VDRs over the 2024-2026 cycle, addresses the first category, fails partially on the second, and fails almost entirely on the third, which vector retrieval cannot express.
  • Graph-based retrieval (GraphRAG) can handle all three. At the time of writing, no incumbent VDR has shipped it in production; the binding constraint is ontology design.
  • Economic impact is concentrated in $10M-$100M sell-side mandates, where seller-side associate capacity is insufficient to manually compensate for retrieval inadequacy.

1. Problem statement

A typical sell-side mandate generates several hundred buyer Q&A items across an 8-12 week diligence window. Within that volume, the questions that materially affect price (as opposed to disclosure cleanup) cluster into three categories that conventional search handles poorly.

1.1 Terminology variance

The same legal concept is drafted differently across counterparties. "Change of control" in one contract becomes "sale of substantially all assets" in another, "transfer of voting equity interests in excess of 50%" in a third, and a defined term ("Transaction") in capitalised recitals in a fourth. Drafting is intentionally non-standardised. Counterparties draft to their own forms, both for negotiating leverage on individual phrases and to keep judicial interpretation aligned with their preferred reading. Synonym dictionaries are not a workable solution because the variant set is effectively unbounded.
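A minimal sketch of why literal matching and synonym lists fall short here. The clause texts paraphrase the examples in the paragraph above; the contract labels, the helper `keyword_hits`, and the synonym list are hypothetical and exist only for illustration.

```python
# Toy illustration: literal phrase search versus drafting variance.
# Contract labels, clause wording, and the synonym list are hypothetical.

CLAUSES = {
    "contract_a": "Lender may terminate upon a change of control of the Borrower.",
    "contract_b": "Consent is required for any sale of substantially all assets.",
    "contract_c": "Termination right on transfer of voting equity interests in excess of 50%.",
    "contract_d": "This Agreement terminates automatically upon a Transaction.",
}


def keyword_hits(clauses: dict[str, str], phrases: list[str]) -> set[str]:
    """Return the contracts whose text contains any phrase (case-insensitive)."""
    lowered = [p.lower() for p in phrases]
    return {
        name
        for name, text in clauses.items()
        if any(p in text.lower() for p in lowered)
    }


# A literal search finds only the contract that uses the exact wording.
literal = keyword_hits(CLAUSES, ["change of control"])

# A synonym dictionary helps only for the variants someone anticipated.
with_synonyms = keyword_hits(
    CLAUSES, ["change of control", "sale of substantially all assets"]
)

# Everything the dictionary did not anticipate is silently missed.
missed = set(CLAUSES) - with_synonyms

print(sorted(literal))  # ['contract_a']
print(sorted(missed))   # ['contract_c', 'contract_d']
```

Each new counterparty form can add a variant to the missed set, which is why extending the dictionary does not converge.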

1.2 Cross-reference resolution

Material facts in a deal often emerge only when chains of cross-references resolve correctly across documents. Consider a representative example. A credit facility's definition of "Permitted Indebtedness" is contained in Schedule 5.2, which references the cap table as of facility close. The cap table has been amended twice since, once for an ESOP expansion and once at the Series C. Whether the buyer's proposed financing trips the indebtedness covenant depends on resolving four documents in the correct order. A keyword search returns Schedule 5.2 readily, but it will not signal that the operative definition in Schedule 5.2 has shifted because the document it references has been amended.
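The resolution step in this example can be sketched as follows. This is an illustrative model only: the `Document` type, the document identifiers, and all dates are assumptions rather than a description of any real data room, and each amendment is assumed to point directly at the base document it amends.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Document:
    doc_id: str
    effective: date
    amends: Optional[str] = None       # doc_id of the base document amended
    references: tuple[str, ...] = ()   # doc_ids this document relies on


# Hypothetical data-room contents mirroring the example above.
DOCS = [
    Document("cap_table", date(2021, 3, 1)),
    Document("schedule_5_2", date(2021, 3, 1), references=("cap_table",)),
    Document("cap_table_esop", date(2022, 6, 15), amends="cap_table"),
    Document("cap_table_series_c", date(2023, 9, 30), amends="cap_table"),
]


def operative_version(doc_id: str, as_of: date, docs: list[Document]) -> Document:
    """Latest version (base or amendment) of doc_id effective on or before as_of."""
    versions = [
        d for d in docs
        if (d.doc_id == doc_id or d.amends == doc_id) and d.effective <= as_of
    ]
    if not versions:
        raise KeyError(f"no version of {doc_id!r} effective by {as_of}")
    return max(versions, key=lambda d: d.effective)


def shifted_references(doc: Document, as_of: date, docs: list[Document]) -> dict[str, str]:
    """Referenced documents whose operative version changed after `doc` took effect."""
    shifted = {}
    for ref in doc.references:
        then = operative_version(ref, doc.effective, docs)
        now = operative_version(ref, as_of, docs)
        if now.doc_id != then.doc_id:
            shifted[ref] = now.doc_id
    return shifted


schedule = DOCS[1]
result = shifted_references(schedule, date(2026, 4, 1), DOCS)
print(result)  # {'cap_table': 'cap_table_series_c'}
```

The point of the sketch is that the answer depends on explicit amendment and reference links between documents, which a text match on "Schedule 5.2" never consults.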

1.3 Absence detection

A subset of buyer questions concerns documents that are not present in the data room. "Identify contracts where consent to assignment is required but no signed consent has been provided" defines a target set by absence. A standard search, whether keyword or semantic, will retrieve documents that mention consent and assignment. It cannot return the subset of contracts where consent is required but absent, because answering that question requires reasoning about what the data room should contain but does not.
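Framed as data, the absence question is a set difference over structured facts rather than a retrieval over text. The sketch below assumes those facts have already been extracted; the contract names, the `assignment_requires_consent` flag, and the file name are hypothetical.

```python
# Hypothetical extracted facts about the contracts in a data room.
CONTRACTS = {
    "msa_acme": {"assignment_requires_consent": True},
    "lease_hq": {"assignment_requires_consent": True},
    "nda_vendor": {"assignment_requires_consent": False},
}

# Signed consents present in the data room, keyed by the contract they cover.
SIGNED_CONSENTS = {"msa_acme": "consent_acme_signed.pdf"}


def missing_consents(contracts: dict[str, dict], consents: dict[str, str]) -> list[str]:
    """Contracts that require consent to assignment but have no signed consent on file."""
    required = {
        name
        for name, facts in contracts.items()
        if facts["assignment_requires_consent"]
    }
    return sorted(required - set(consents))


gaps = missing_consents(CONTRACTS, SIGNED_CONSENTS)
print(gaps)  # ['lease_hq']
```

The result contains a contract precisely because no matching document exists, so there is no passage for a retriever to rank.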

2. Failure mode incidence

Based on internal review of buyer Q&A patterns from publicly filed M&A materials and ongoing testing of the Mergeroom document pipeline, the three categories above account for the majority of buyer Q&A items requiring more than 24-hour seller turnaround. Qualitative breakdown:

| Failure category | Frequency | Severity (impact when missed) |
| --- | --- | --- |
| Terminology variance | High | Moderate |
| Cross-reference resolution | Moderate | High |
| Absence detection | Lower | High |
| Other (drafting ambiguity, document quality) | Residual | Variable |

Frequency and severity move inversely. Terminology-variance misses are common, but most are caught by buyer counsel before closing. The other two categories are rarer, and when undetected they tend to surface in late-stage diligence, where the cost translates directly into price.

3. Capability assessment by retrieval architecture

| Architecture | Terminology variance | Cross-reference | Absence | Status in incumbent VDRs |
| --- | --- | --- | --- | --- |
| Keyword (Boolean) | Fails | Fails | Fails | Default pre-2024 |
| Semantic (vector RAG) | Acceptable | Partial | Fails | Default 2024-2026 |
| Graph-based (GraphRAG) | Handles | Handles | Handles | Not shipped |

On terminology variance, vector retrieval performs adequately. The embedding for "transfer of voting equity interests" is sufficiently close in vector space to "change of control" that a competent retriever will surface the relevant passage on most queries. Performance deteriorates on cross-reference resolution. Vector embeddings encode semantic similarity between text passages, and the question of whether Schedule 5.2 in the credit agreement and Schedule 5.2 in the side letter refer to the same Schedule 5.2 is a question of document identity, which similarity embeddings do not represent. Absence detection is further removed from the capabilities of vector retrieval. The question "what is missing" has no similarity formulation, and layering an LLM over the retriever does not produce one.
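The document-identity point can be illustrated with a toy example. The bag-of-words `embed` function below is a deliberately crude stand-in for a learned embedding model, and the passage texts and document labels are hypothetical. The property it shows carries over to any deterministic embedding: identical wording maps to identical vectors, regardless of which document the wording comes from.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two passages with the same wording from different (hypothetical) documents.
credit_agreement = {"doc": "credit_agreement", "text": "as set out in Schedule 5.2"}
side_letter = {"doc": "side_letter", "text": "as set out in Schedule 5.2"}

similarity = cosine(embed(credit_agreement["text"]), embed(side_letter["text"]))
same_document = credit_agreement["doc"] == side_letter["doc"]

print(round(similarity, 6))  # 1.0
print(same_document)         # False
```

The similarity score is maximal while the two references may point at different schedules. Whether they do is recorded only in the document metadata, which the similarity computation never sees.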

4. Graph-based retrieval

GraphRAG (Microsoft Research, 2024) decomposes a document corpus into typed entities connected by typed relationships, and translates queries into graph traversals rather than text retrievals. The architecture has been deployed in domain-specific applications in healthcare informatics and intelligence analysis. M&A diligence remains, to our knowledge, an open application area.

Three query patterns illustrate the difference from search-based retrieval:

  • A change-of-control query traverses each contract to its trigger conditions and matches function rather than phrasing. Provisions that do not contain the literal words "change of control" are identified when their function aligns with the proposed transaction.
  • A definition-resolution query traces the chain of references and amendments and returns the operative definition of a term as of the current date, rather than the most recent document in which the term appears.
  • An absence query asks for contracts that satisfy a positive condition (assignment requires counterparty consent) and lack a corresponding companion node (signed consent document). The query returns the gap directly.

Keyword retrieval fails on all three of these patterns. Vector retrieval handles the first reasonably well, but the second and third are beyond what it can express.
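The first of these patterns, matching on function rather than phrasing, can be sketched as a traversal over typed edges. This is a minimal illustration under stated assumptions: the node names, the relation types (`HAS_PROVISION`, `TRIGGERED_BY`, `GRANTS`), and the event vocabulary are hypothetical, and a production ontology would be considerably richer.

```python
# Typed edges stored as (source, relation, target) triples.
# All node names and relation types are hypothetical.
EDGES = [
    ("contract_a", "HAS_PROVISION", "a_s12"),
    ("a_s12", "TRIGGERED_BY", "ownership_transfer_over_50pct"),
    ("a_s12", "GRANTS", "termination_right"),
    ("contract_b", "HAS_PROVISION", "b_s9"),
    ("b_s9", "TRIGGERED_BY", "sale_of_substantially_all_assets"),
    ("b_s9", "GRANTS", "consent_right"),
    ("contract_c", "HAS_PROVISION", "c_s4"),
    ("c_s4", "TRIGGERED_BY", "payment_default"),
    ("c_s4", "GRANTS", "termination_right"),
]


def targets(source: str, relation: str) -> list[str]:
    """Follow every edge of the given type out of `source`."""
    return [t for s, r, t in EDGES if s == source and r == relation]


def provisions_triggered_by(deal_events: set[str]) -> dict[str, list[str]]:
    """Map each contract to the provisions whose triggers match the proposed deal."""
    hits: dict[str, list[str]] = {}
    contracts = sorted({s for s, r, _ in EDGES if r == "HAS_PROVISION"})
    for contract in contracts:
        for provision in targets(contract, "HAS_PROVISION"):
            if deal_events & set(targets(provision, "TRIGGERED_BY")):
                hits.setdefault(contract, []).append(provision)
    return hits


# A purchase of 100% of the equity, expressed as the trigger events it causes.
deal = {"ownership_transfer_over_50pct"}
result = provisions_triggered_by(deal)
print(result)  # {'contract_a': ['a_s12']}
```

No node in this graph carries the words "change of control". The match comes from the typed `TRIGGERED_BY` edge, which is the part that has to be designed and extracted for the domain.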

5. Why no incumbent has shipped graph retrieval

The binding constraint is domain ontology design. The graph database technology has been production-grade for several years across Neo4j, TigerGraph, and Memgraph. Off-the-shelf entity extraction from spaCy, AWS Comprehend, or generic LLM-based extractors produces flat entity lists that do not map to the way M&A practitioners reason about a deal. The schema has to be designed around the questions that get asked in actual diligence. The work is primarily domain research, and the engineering follows from the schema.

Two factors explain why incumbent VDRs have not pursued this. The first is customer base concentration. Datasite, Intralinks, and the upper tier of Firmex and iDeals are sold predominantly into bulge-bracket and upper-mid-market deal teams, where existing associate capacity compensates for the retrieval inadequacies described in §3. The customer pull for graph-grade retrieval originates in boutique segments where associate capacity is unavailable, and incumbent revenue is not concentrated in those segments. Roadmap investment in enterprise SaaS reflects where the paying customer base is.

The second factor is the configuration of incumbent product organisations. They are built for quarterly enterprise feature shipping and annual contract renewal. The feature requests from their existing customer base concern integrations, compliance certifications, and administrative tooling. Allocating engineering capacity to multi-quarter domain-research work, with no shippable output for an extended period, is not a defensible investment within those constraints.

6. Implications by deal segment

Economic impact is concentrated in deal segments where sell-side associate capacity is insufficient to compensate manually.

| Deal segment | Typical seller bench | Manual compensation viable? |
| --- | --- | --- |
| Mega-cap ($1B+) | 4+ associates per mandate | Yes |
| Upper mid-market ($100M-$1B) | 2-3 associates per mandate | Marginal |
| Lower mid-market ($10M-$100M) | 0-1 associate plus paralegal | No |
| Sub-$10M | Partner only | No |

When buyer Q&A volume exceeds seller capacity, we typically observe one of two outcomes. Either the seller delivers incomplete responses and the buyer prices the uncertainty into the indicative offer, or the buyer conducts a parallel review of the seller's documents and the timeline extends by two to four weeks. The first outcome compresses valuation directly. The second compresses valuation more slowly, via the reputational signal that propagates through the buyer universe in concurrent and subsequent processes.

This is the economic case for AI-native diligence tooling at the boutique end of the market. The "AI makes diligence faster" framing used in incumbent marketing understates the actual change. Faster Q&A is incidental. The relevant question is whether deal sizes that previously could not sustain competent diligence support can sustain it now.

Bottom line

Incumbent VDR search architectures are sufficient for low-stakes data sharing of the fundraising-and-file-distribution variety. They are inadequate for diligence-heavy mandates with material cross-document complexity, which in our assessment describes most M&A processes. The architecture gap matters most at the boutique end of the market, where the seller lacks the associate capacity to compensate for tool inadequacy. Closing the gap is largely a domain-research problem, which makes it poorly suited to the shipping cadences and hiring profiles of incumbent VDR vendors. We expect entrants to define the category.


Mergeroom is building cross-document reasoning for boutique M&A advisors and law firms in the $10M-$100M sell-side segment. Design partner enquiries: contact@mergeroom.ai.