voyage-law-2

Turn Your Contract Library Into a Searchable Knowledge Base

Semantic search across legal documents, powered by a model trained on legal text

The Problem

Legal professionals spend 20 to 40% of their time searching for information. Contract review requires cross-referencing clauses across dozens of agreements. Compliance teams must verify that policies align with regulatory requirements, often across hundreds of pages of regulation. Due diligence involves reading rooms full of documents to find specific provisions.

Keyword search fails legal work because legal language is deliberately precise but wildly inconsistent across documents. One contract says "indemnification," another says "hold harmless," a third says "defense and indemnity," all meaning approximately the same thing. A search for any one term misses the others. Semantic search understands the meaning, not just the words.
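The gap is easy to reproduce with a literal substring match. The sketch below uses three made-up clauses (not taken from the sample documents) that each use one of the phrasings above; a search for any single term returns only one of them.

```python
# Toy clauses, one per phrasing. The contract names and wording are illustrative only.
clauses = {
    "contract_a": "Indemnification: Vendor shall indemnify Customer against third-party claims.",
    "contract_b": "Vendor shall hold harmless Customer from any third-party claims.",
    "contract_c": "Defense and indemnity: Vendor shall defend Customer against third-party claims.",
}


def keyword_search(term: str, docs: dict[str, str]) -> list[str]:
    """Return the names of documents that contain the literal term (case-insensitive)."""
    needle = term.lower()
    return [name for name, text in docs.items() if needle in text.lower()]


print(keyword_search("indemnification", clauses))  # ['contract_a']
print(keyword_search("hold harmless", clauses))    # ['contract_b']
```

All three clauses address the same obligation, yet each query surfaces a single contract.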

The Solution

vai turns your contract library into a searchable knowledge base in minutes. Point it at a folder of legal documents, and it handles chunking, embedding with Voyage AI's legal-domain model, and indexing in MongoDB Atlas Vector Search. The result is semantic search that understands a question like "What are our data deletion obligations?" and finds answers across your GDPR summary, CCPA policy, privacy policy, and data processing addendum, even when each uses different terminology.

Documents (your files) → Chunk (split text) → Embed (Voyage AI) → Index (MongoDB Atlas) → Search (semantic query)

Sample Document Set

15 synthetic but realistic documents, ~39KB total. Small enough to process in minutes, rich enough to produce meaningful search results.

File | Topic | Size
master-services-agreement.md | MSA template: scope, payment terms, IP provisions | ~4KB
saas-subscription-agreement.md | SaaS terms: uptime SLA, data handling, renewal/termination | ~3KB
data-processing-addendum.md | DPA with GDPR and CCPA provisions, sub-processor obligations | ~3KB
nda-mutual.md | Mutual NDA: definition of confidential info, exclusions, term | ~2KB
nda-unilateral.md | One-way NDA: receiving party obligations, return/destruction | ~2KB
employment-agreement.md | Employment terms: compensation, benefits, non-compete | ~3KB
independent-contractor.md | Contractor agreement: deliverables, IP assignment, indemnification | ~3KB
privacy-policy.md | Company privacy policy: data collection, retention, user rights | ~3KB
acceptable-use-policy.md | AUP for SaaS product: prohibited uses, enforcement, liability caps | ~2KB
ip-assignment-agreement.md | IP assignment: work product, prior inventions, moral rights | ~2KB
gdpr-compliance-summary.md | GDPR requirements: lawful basis, data subject rights, DPO | ~3KB
ccpa-compliance-summary.md | CCPA requirements: consumer rights, opt-out, service providers | ~3KB
soc2-policy-overview.md | SOC 2 Trust Services Criteria: security, availability, confidentiality | ~2KB
limitation-of-liability.md | Analysis of liability cap patterns across contract types | ~2KB
force-majeure-clauses.md | Force majeure provisions: triggering events, notice, remedies | ~2KB

Walkthrough

From zero to a searchable knowledge base. Follow these steps; each takes 1 to 3 minutes.

Step 1

Install vai

Install the vai CLI globally. If you already have it, skip to the next step.

$ npm install -g voyageai-cli
added 1 package in 3s

1 package is looking for funding
  run `npm fund` for details
Step 2

Configure credentials

Set your Voyage AI API key and MongoDB Atlas connection string. You can get a free Voyage AI key at dash.voyageai.com and a free MongoDB Atlas cluster at cloud.mongodb.com.

$ vai config set api-key YOUR_VOYAGE_API_KEY
$ vai config set mongodb-uri YOUR_MONGODB_URI
✓ api-key saved
✓ mongodb-uri saved

Your credentials are stored locally in ~/.vai/config.json and never shared.

Step 3

Download the sample documents

Grab the 15-file sample legal document set. These are synthetic but realistic contracts, policies, and regulatory summaries covering a fictional company's legal library.

$ curl -L https://vaicli.com/use-cases/legal/sample-docs/sample-docs.zip -o sample-docs.zip
$ unzip sample-docs.zip -d ./sample-docs
Archive:  sample-docs.zip
  inflating: ./sample-docs/master-services-agreement.md
  inflating: ./sample-docs/saas-subscription-agreement.md
  ...
  inflating: ./sample-docs/force-majeure-clauses.md
  15 files extracted
Step 4

Ingest and embed the documents

Run the vai pipeline to chunk, embed, and index all 15 documents. This uses voyage-law-2, a model specifically trained on legal text, and creates a vector search index in MongoDB Atlas.

$ vai pipeline ./sample-docs/ --model voyage-law-2 --db legal_demo --collection legal_knowledge --create-index
◼ Scanning ./sample-docs/ ...
  Found 15 files (39KB total)

◼ Chunking documents ...
  Created 142 chunks (avg 274 chars)

◼ Embedding with voyage-law-2 ...
  ████████████████████████████████ 142/142 chunks
  Embedded in 2.8s (51 chunks/sec)

◼ Storing in MongoDB Atlas ...
  Database: legal_demo
  Collection: legal_knowledge
  Inserted 142 documents

◼ Creating vector search index ...
  Index "vector_index" created on field "embedding"
  Dimensions: 1024 | Similarity: cosine

✓ Pipeline complete — 15 files → 142 indexed chunks
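The chunking stage is what turns 15 files into 142 searchable units. The page does not say which chunking strategy vai uses, so the following is only a minimal sketch of one common approach: fixed-size character windows with overlap. The function name and the default sizes are assumptions chosen for illustration.

```python
def chunk_text(text: str, max_chars: int = 300, overlap: int = 50) -> list[str]:
    """Split text into windows of at most max_chars characters.

    Consecutive windows share `overlap` characters so that a clause cut at a
    boundary still appears whole in at least one chunk.
    """
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    chunks = []
    start = 0
    step = max_chars - overlap
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


sample = "Data Deletion: Upon termination of the Agreement, Processor shall delete all Personal Data. " * 10
for i, chunk in enumerate(chunk_text(sample)):
    print(i, len(chunk))
```

Production chunkers often split on headings, sentences, or paragraphs instead, which tends to keep a legal clause intact within a single chunk.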
Step 5

Run your first search

Test the knowledge base with a query that spans multiple documents. Notice how the legal-domain model finds relevant clauses even when the terminology differs.

$ vai search "What are our obligations if a customer requests deletion of their data?" --db legal_demo --collection legal_knowledge
Query: "What are our obligations if a customer requests deletion of their data?"
Model: voyage-law-2 | Results: 5

1. gdpr-compliance-summary.md (score: 0.95)
   "Right to Erasure (Article 17): Data subjects have the right to
    obtain from the controller the erasure of personal data without
    undue delay. The controller shall erase personal data within
    30 days of receiving a verified request..."

2. ccpa-compliance-summary.md (score: 0.91)
   "Right to Delete (Section 1798.105): A consumer shall have the
    right to request that a business delete any personal information
    about the consumer which the business has collected..."

3. data-processing-addendum.md (score: 0.88)
   "Data Deletion: Upon termination of the Agreement or upon
    Controller's written request, Processor shall delete all
    Personal Data processed on behalf of the Controller..."
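The scores above reflect how close each chunk's embedding is to the query's embedding; the index created in Step 4 uses cosine similarity. Here is a minimal sketch of that ranking, with made-up three-dimensional vectors standing in for the 1024-dimensional embeddings. The vector values are illustrative and are not real model output. Atlas Vector Search normalizes cosine scores into the 0 to 1 range, so displayed scores may not equal the raw cosine value computed here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], indexed: dict[str, list[float]], k: int = 3):
    """Score every indexed vector against the query and return the k best (name, score) pairs."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in indexed.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy embeddings keyed by source file.
indexed = {
    "gdpr-compliance-summary.md": [0.9, 0.1, 0.0],
    "force-majeure-clauses.md": [0.0, 0.2, 0.9],
    "ccpa-compliance-summary.md": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

for name, score in top_k(query, indexed, k=2):
    print(f"{name}: {score:.2f}")
```

A real vector index avoids scoring every document by using approximate nearest-neighbor search, but the ranking criterion is the same.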
Step 6

Try cross-document queries

Run queries that require understanding legal concepts across different document types. This is where semantic search shines over keyword search.

$ vai search "Compare the indemnification provisions across our contracts" --db legal_demo --collection legal_knowledge
Query: "Compare the indemnification provisions across our contracts"
Model: voyage-law-2 | Results: 5

1. independent-contractor.md (score: 0.93)
   "Indemnification: Contractor shall indemnify, defend, and hold
    harmless Company from any claims, damages, or expenses arising
    from Contractor's breach of this Agreement or negligence..."

2. master-services-agreement.md (score: 0.90)
   "Mutual Indemnification: Each party shall indemnify the other
    against third-party claims arising from (a) breach of
    representations, (b) willful misconduct, or (c) violation
    of applicable law..."

3. saas-subscription-agreement.md (score: 0.85)
   "Provider Indemnification: Provider shall defend Customer against
    any claim that the Service infringes a third party's intellectual
    property rights..."
Step 7

Explore in the playground

Launch the vai playground for a visual interface. Browse your indexed legal documents, run queries interactively, and compare how different models handle legal terminology.

$ vai playground
◼ Starting vai playground ...
  Server running at http://localhost:1958

  Open your browser to explore:
  • Search your knowledge base
  • Compare embedding models
  • Visualize similarity scores

Try comparing voyage-law-2 results with voyage-4-large on the same legal query to see how the domain-specific model captures legal semantics.

Example Queries

See how semantic search handles real questions. Results are shown for the first query.

What are our obligations if a customer requests deletion of their data?

Spans four documents: GDPR summary, CCPA summary, privacy policy, and DPA. Tests cross-document retrieval on the same legal concept expressed differently in each.

gdpr-compliance-summary.md

95% match

Right to Erasure (Article 17): Data subjects have the right to obtain from the controller the erasure of personal data without undue delay. The controller shall erase personal data within 30 days of receiving a verified request.

ccpa-compliance-summary.md

91% match

Right to Delete (Section 1798.105): A consumer shall have the right to request that a business delete any personal information about the consumer which the business has collected.

data-processing-addendum.md

88% match

Upon termination of the Agreement or upon Controller's written request, Processor shall delete all Personal Data processed on behalf of the Controller within 30 calendar days.

Compare the indemnification provisions across our contracts

Tests retrieval across MSA, contractor agreement, and SaaS agreement. Each uses slightly different indemnification language ("hold harmless," "defend and indemnify," "mutual indemnification").

What happens if we cannot meet the SLA due to a natural disaster?

Tests the intersection of force majeure provisions and SLA commitments across the SaaS agreement and force majeure clauses document.

Do our NDAs allow sharing confidential information with sub-processors?

Tests NDA exception clauses against DPA sub-processor provisions. A nuanced legal question requiring cross-document reasoning.

What non-compete restrictions apply to former employees?

Tests precise retrieval from the employment agreement, specifically the restrictive covenants section.

Try the Knowledge Base Live

This is a real chatbot powered by the 15 legal sample docs you just explored. Ask it about contracts, GDPR compliance, indemnification clauses, NDAs, or any of the legal documentation.

Why This Model?

Model | Relevance | Notes
voyage-law-2 (recommended) | 95% | Purpose-built for legal text. Best at distinguishing legal terms that have different meanings in general English ("consideration," "party," "instrument").
voyage-4-large | 87% | Strong general-purpose model. Handles straightforward legal queries well, but misses nuance in cross-referencing clauses and legal term disambiguation.
voyage-4-lite | 78% | Fast and cost-effective. Adequate for simple keyword-like queries, but struggles with the semantic precision legal search demands.

For legal documents, voyage-law-2 consistently outperforms general-purpose models on queries that require understanding legal-specific semantics. The difference is most pronounced on queries like "Compare indemnification provisions" where the model needs to recognize that "hold harmless," "defend and indemnify," and "mutual indemnification" all refer to the same legal concept. For simple factual retrieval, the gap narrows, but the domain model is the clear choice for any serious legal search application.
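The page does not state how the relevance percentages were measured. If you want to compare models on your own corpus, one simple option is to label which documents should come back for each query and compute precision and recall over the top results. The sketch below assumes that approach; the function names are illustrative, and the example reuses the data-deletion query, its four expected documents, and its three displayed results from this page.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the labeled-relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in set(retrieved[:k]) if doc in relevant)
    return hits / len(relevant)


# Expected sources for the data-deletion query, per the Example Queries section.
relevant = {
    "gdpr-compliance-summary.md",
    "ccpa-compliance-summary.md",
    "privacy-policy.md",
    "data-processing-addendum.md",
}
# The three results displayed for that query.
retrieved = [
    "gdpr-compliance-summary.md",
    "ccpa-compliance-summary.md",
    "data-processing-addendum.md",
]

print(precision_at_k(retrieved, relevant, k=3))  # 1.0
print(recall_at_k(retrieved, relevant, k=3))     # 0.75
```

Running the same labeled queries against each model's index gives a like-for-like comparison on your own documents.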

Scaling to Production

You just built a working knowledge base from 15 sample docs. Here is what changes when you scale to thousands of real documents.

Privilege and confidentiality

Documents stay in your MongoDB Atlas cluster. Text is sent to Voyage AI for embedding (see their data handling policy). The resulting vectors do not contain readable text, but the stored chunks in MongoDB do. Plan your access controls accordingly.

Contract volume

A mid-size company might have 500 to 5,000 contracts. At this scale, initial embedding costs are modest with voyage-law-2, and queries cost fractions of a cent. Use vai estimate to project costs for your corpus size.
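For a rough sense of scale before running vai estimate, the arithmetic can be sketched by hand. Everything in this example is an assumption to replace with your own numbers: the four-characters-per-token ratio is a common rule of thumb for English text, and the per-million-token price is a placeholder, not a quoted Voyage AI rate.

```python
def estimate_embedding_cost(
    num_docs: int,
    avg_chars_per_doc: float,
    usd_per_million_tokens: float,
    chars_per_token: float = 4.0,  # rough heuristic, not a tokenizer measurement
) -> float:
    """Back-of-envelope cost in USD of embedding a corpus once."""
    total_tokens = num_docs * avg_chars_per_doc / chars_per_token
    return total_tokens / 1_000_000 * usd_per_million_tokens


# The sample set averages about 2,600 characters per file (39KB across 15 files).
# Real contracts are usually much longer, so treat this as a lower bound.
PLACEHOLDER_PRICE = 0.10  # hypothetical USD per 1M tokens; check current pricing

cost = estimate_embedding_cost(5_000, 2_600, PLACEHOLDER_PRICE)
print(f"~${cost:.2f} to embed 5,000 documents of sample size")
```

Re-embedding after a model change or a new chunking strategy repeats this cost, so it is worth budgeting for more than one pass.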

Metadata filtering

Legal search often needs filters by contract type, counterparty, or date range. vai supports metadata filters on search, so you can narrow results to "only NDAs signed in the last 2 years" before semantic ranking applies.
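How vai exposes these filters is not shown here. At the database level, Atlas Vector Search supports pre-filtering through the filter option of the $vectorSearch aggregation stage, provided the filtered fields are declared in the index. The sketch below builds such an index definition and query as plain Python dictionaries. The metadata field names (contract_type, signed_date) and the projected fields (text, source) are hypothetical; the index name, vector path, and dimensions follow the Step 4 output.

```python
from datetime import datetime, timedelta, timezone

# Index definition: the vector field plus the metadata fields used for filtering.
index_definition = {
    "fields": [
        {"type": "vector", "path": "embedding", "numDimensions": 1024, "similarity": "cosine"},
        {"type": "filter", "path": "contract_type"},  # hypothetical metadata field
        {"type": "filter", "path": "signed_date"},    # hypothetical metadata field
    ]
}


def build_filtered_search(query_vector, contract_type, signed_after, limit=5):
    """Build an aggregation pipeline that filters by metadata before vector ranking."""
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": limit * 20,  # candidates considered before the final top-k
                "limit": limit,
                "filter": {
                    "$and": [
                        {"contract_type": {"$eq": contract_type}},
                        {"signed_date": {"$gte": signed_after}},
                    ]
                },
            }
        },
        {"$project": {"_id": 0, "text": 1, "source": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


# "Only NDAs signed in the last 2 years"; the query vector would come from the embedding model.
two_years_ago = datetime.now(timezone.utc) - timedelta(days=730)
pipeline = build_filtered_search([0.0] * 1024, "nda", two_years_ago)
print(pipeline[0]["$vectorSearch"]["filter"])
```

With a MongoDB driver, a pipeline like this would be passed to the collection's aggregate method.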

Keeping documents current

Contracts get amended, policies get updated. Re-run vai pipeline on updated files and it will re-chunk, re-embed, and update only the changed documents. Automate this as part of your document management workflow.
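How vai decides which documents changed is not described on this page. If you automate the re-run yourself, one common way to find amended files is to compare content hashes against a saved manifest. The following is a minimal sketch of that idea; the function names and the manifest shape (file name mapped to SHA-256 digest) are assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each Markdown file name in the folder to its current content hash."""
    return {p.name: file_digest(p) for p in sorted(folder.glob("*.md"))}


def changed_files(folder: Path, manifest: dict[str, str]) -> list[Path]:
    """Return files that are new or whose content differs from the saved manifest."""
    return [p for p in sorted(folder.glob("*.md")) if manifest.get(p.name) != file_digest(p)]
```

A scheduled job could store the manifest after each run and pass only the files returned by changed_files to the next pipeline run.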

Conversational interface

The natural next step is vai chat: a compliance officer asking "Do we have any contracts expiring in the next 90 days with auto-renewal clauses?" and getting answers grounded in actual contract text.

Ready to build your knowledge base?

Install vai and go from documents to searchable knowledge in minutes.

$ npm install -g voyageai-cli

