voyage-finance-2

Semantic Search Across Financial Documents, In Minutes

Earnings calls, risk reports, and policy docs, searchable with a model trained on financial text


The Problem

Financial analysis requires synthesizing information across dozens of documents: earnings call transcripts, 10-K filings, risk committee reports, internal policy memos, and market research. An analyst asking "What did management say about margin pressure?" needs to find the relevant passage across hundreds of pages of transcripts, and keyword search returns far too many results for "margin" alone.

The financial domain also has its own vocabulary challenges. "Headwinds" means challenges. "Color" means additional detail. "Constructive" means cautiously optimistic. A semantic search system trained on financial text understands these conventions; a generic one doesn't.

The Solution

vai turns your financial document library into a searchable knowledge base in minutes. Point it at a folder of earnings transcripts, risk reports, and policy documents, and it handles chunking, embedding with Voyage AI's finance-domain model, and indexing in MongoDB Atlas Vector Search. The result is semantic search that understands a question like "What are our biggest risk exposures?" and finds answers across risk committee reports, market risk frameworks, and credit policies.

Documents (your files) → Chunk (split text) → Embed (Voyage AI) → Index (MongoDB Atlas) → Semantic query

Sample Document Set

15 synthetic but realistic documents, ~39KB total. Small enough to process in minutes, rich enough to produce meaningful search results.


File	Topic	Size
q3-2025-earnings-call.md	Acme Corp Q3 2025 earnings call: revenue beat, margin pressure	~4KB
q4-2025-earnings-call.md	Q4 2025 earnings call: full-year results, 2026 guidance	~4KB
q3-2025-10q-summary.md	10-Q highlights: revenue breakdown, operating expenses, risk factors	~3KB
annual-report-summary.md	Annual report executive summary: strategy, markets, competition	~3KB
risk-committee-report.md	Risk committee quarterly report: credit, market, operational risk	~3KB
credit-policy.md	Corporate credit policy: approval tiers, concentration limits	~3KB
market-risk-framework.md	Market risk management: VaR methodology, stress testing	~2KB
interest-rate-analysis.md	Interest rate sensitivity: duration gaps, hedging strategy	~2KB
liquidity-policy.md	Liquidity management: reserve requirements, stress scenarios	~2KB
compliance-aml-summary.md	AML/KYC compliance: CDD requirements, SAR filing triggers	~3KB
vendor-risk-assessment.md	Third-party vendor risk: tiering, due diligence framework	~2KB
capital-allocation-memo.md	Capital allocation strategy: dividends, buybacks, M&A criteria	~2KB
esg-report-summary.md	ESG report: carbon targets, diversity metrics, governance	~2KB
fintech-partnership-memo.md	Strategic memo: fintech partnership, embedded finance, API strategy	~2KB
regulatory-change-tracker.md	Regulatory changes: Basel IV, DORA, SEC climate disclosure	~2KB

Walkthrough

From zero to a searchable knowledge base. Follow these steps; each takes 1-3 minutes.

Step 1

Install vai

Install the vai CLI globally. If you already have it, skip to the next step.

$ npm install -g voyageai-cli

added 1 package in 3s

1 package is looking for funding
  run `npm fund` for details

Step 2

Configure credentials

Set your Voyage AI API key and MongoDB Atlas connection string. You can get a free Voyage AI key at dash.voyageai.com and a free MongoDB Atlas cluster at cloud.mongodb.com.

$ vai config set api-key YOUR_VOYAGE_API_KEY

$ vai config set mongodb-uri YOUR_MONGODB_URI

✓ api-key saved
✓ mongodb-uri saved

Your credentials are stored locally in ~/.vai/config.json and never shared.

Step 3

Download the sample documents

Grab the 15-file sample financial document set. These are synthetic but realistic documents for a fictional public company (Acme Corp), including earnings calls, risk reports, and policy memos.

$ curl -L https://vaicli.com/use-cases/finance/sample-docs/sample-docs.zip -o sample-docs.zip

$ unzip sample-docs.zip -d ./sample-docs

Archive:  sample-docs.zip
  inflating: ./sample-docs/q3-2025-earnings-call.md
  inflating: ./sample-docs/q4-2025-earnings-call.md
  ...
  inflating: ./sample-docs/regulatory-change-tracker.md
  15 files extracted

Step 4

Ingest and embed the documents

Run the vai pipeline to chunk, embed, and index all 15 documents. This uses voyage-finance-2, a model trained specifically on financial text, and creates a vector search index in MongoDB Atlas.

$ vai pipeline ./sample-docs/ --model voyage-finance-2 --db finance_demo --collection financial_knowledge --create-index

◼ Scanning ./sample-docs/ ...
  Found 15 files (39KB total)

◼ Chunking documents ...
  Created 156 chunks (avg 250 chars)

◼ Embedding with voyage-finance-2 ...
  ████████████████████████████████ 156/156 chunks
  Embedded in 3.1s (50 chunks/sec)

◼ Storing in MongoDB Atlas ...
  Database: finance_demo
  Collection: financial_knowledge
  Inserted 156 documents

◼ Creating vector search index ...
  Index "vector_index" created on field "embedding"
  Dimensions: 1024 | Similarity: cosine

✓ Pipeline complete — 15 files → 156 indexed chunks
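The chunking stage can be pictured with a short sketch. This is not vai's actual chunker, whose strategy and defaults the walkthrough does not state. The names `chunk_text`, `max_chars`, and `overlap` are hypothetical, and the 250-character window simply mirrors the average chunk size reported in the output above.

```python
def chunk_text(text: str, max_chars: int = 250, overlap: int = 50) -> list[str]:
    """Split text into windows of at most max_chars characters.

    Consecutive windows share `overlap` characters so a sentence that
    straddles a boundary still appears whole in at least one chunk.
    Hypothetical helper; vai's real chunking strategy may differ.
    """
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")

    step = max_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + max_chars]
        if chunk.strip():  # skip whitespace-only windows
            chunks.append(chunk)
        if start + max_chars >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

A fixed-size window is the simplest option. Splitting on headings or speaker turns would likely suit earnings transcripts better, at the cost of more parsing logic.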

Step 5

Run your first search

Test the knowledge base with a query that uses financial language. Notice how the finance-domain model understands terms like "margin compression" and "headwinds."

$ vai search "What did management say about margin compression and how are they addressing it?" --db finance_demo --collection financial_knowledge

Query: "What did management say about margin compression and how are they addressing it?"
Model: voyage-finance-2 | Results: 5

1. q3-2025-earnings-call.md (score: 0.94)
   "On margins, we saw about 80 basis points of compression this
    quarter, primarily driven by input cost headwinds and the
    product mix shift toward our enterprise tier. We expect to
    recover roughly half of that through pricing actions in Q1..."

2. q4-2025-earnings-call.md (score: 0.89)
   "For the full year, gross margins came in at 62.3%, down from
    64.1% in the prior year. The team has executed well on our
    cost optimization program, and we're guiding to margin
    expansion in the back half of 2026..."

3. annual-report-summary.md (score: 0.82)
   "Profitability: Gross margin declined 180 basis points
    year-over-year due to elevated cloud infrastructure costs
    and competitive pricing pressure in the mid-market segment..."
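The index in Step 4 was created with cosine similarity, so ranking comes down to comparing the query vector with each chunk vector. The sketch below shows that comparison on made-up three-dimensional vectors; real voyage-finance-2 embeddings have 1024 dimensions, and the scores vai prints may be normalized differently. `cosine_similarity`, `rank_chunks`, and the toy data are illustrative names and values only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunk_vecs, top_k=3):
    """Return the top_k (name, score) pairs, highest similarity first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors, invented for illustration (not real embeddings).
toy_chunks = {
    "q3-2025-earnings-call.md": [0.9, 0.1, 0.0],
    "credit-policy.md": [0.1, 0.9, 0.1],
    "esg-report-summary.md": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]

for name, score in rank_chunks(toy_query, toy_chunks):
    print(f"{name}: {score:.2f}")
```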

Step 6

Try cross-document queries

Run queries that span risk reports, policy documents, and earnings calls. This is where semantic search on financial text is most powerful.

$ vai search "What are our biggest risk exposures right now?" --db finance_demo --collection financial_knowledge

Query: "What are our biggest risk exposures right now?"
Model: voyage-finance-2 | Results: 5

1. risk-committee-report.md (score: 0.93)
   "Top Risks This Quarter: (1) Interest rate volatility and its
    impact on our fixed-income portfolio, (2) Concentration risk
    in our top-5 enterprise accounts representing 34% of ARR,
    (3) Regulatory uncertainty around pending SEC climate rules..."

2. market-risk-framework.md (score: 0.87)
   "Current VaR at the 99th percentile stands at $4.2M, up from
    $3.8M last quarter. The increase is primarily attributable to
    heightened equity market volatility and widening credit spreads
    in our corporate bond holdings..."

3. credit-policy.md (score: 0.81)
   "Concentration Limits: No single counterparty shall represent
    more than 10% of total credit exposure. The current top
    exposure is 8.7%, approaching the threshold..."
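Under the hood, a search like this presumably maps to an Atlas `$vectorSearch` aggregation stage. The sketch below builds such a pipeline as plain Python dictionaries. Whether vai issues exactly this pipeline is an assumption. The index name and vector field come from the Step 4 output, while the `text` and `source` field names and the `build_vector_search_pipeline` helper are hypothetical.

```python
def build_vector_search_pipeline(query_vector, limit=5, num_candidates=100,
                                 index_name="vector_index", path="embedding"):
    """Build an Atlas Vector Search aggregation pipeline as plain dicts.

    query_vector is the embedding of the question; producing it with the
    same model used at ingest time is outside the scope of this sketch.
    """
    if num_candidates < limit:
        raise ValueError("numCandidates must be at least as large as limit")

    return [
        {
            "$vectorSearch": {
                "index": index_name,          # from the Step 4 output
                "path": path,                 # field holding the chunk vectors
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "text": 1,    # hypothetical field name for the chunk text
                "source": 1,  # hypothetical field name for the file name
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With a pymongo collection handle, the result could be passed to `collection.aggregate(pipeline)`. Raising `num_candidates` generally trades latency for recall.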

Step 7

Explore in the playground

Launch the vai playground for a visual interface. Browse your indexed financial documents, run queries interactively, and use vai estimate to project costs for your actual document volume.

$ vai playground

◼ Starting vai playground ...
  Server running at http://localhost:1958

  Open your browser to explore:
  • Search your knowledge base
  • Compare embedding models
  • Visualize similarity scores

Try vai estimate to see what it would cost to embed your full document library before you commit to a model; for finance teams, unit economics matter.
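The arithmetic behind such an estimate is simple enough to sketch by hand. This is not how vai estimate computes its figures; it assumes roughly four characters per token (a common rule of thumb for English text), and the price passed in the example is a placeholder, not Voyage AI's actual rate. `estimate_tokens` and `estimate_embedding_cost` are hypothetical names.

```python
import math


def estimate_tokens(total_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token count from a character count (rule-of-thumb ratio)."""
    return math.ceil(total_chars / chars_per_token)


def estimate_embedding_cost(total_chars: int, price_per_million_tokens: float,
                            chars_per_token: float = 4.0) -> float:
    """Estimated one-time cost to embed a corpus, in the price's currency."""
    tokens = estimate_tokens(total_chars, chars_per_token)
    return tokens / 1_000_000 * price_per_million_tokens


# The ~39KB sample set, with a placeholder price of 0.10 per million tokens.
# Replace the price with the current figure from Voyage AI's pricing page.
sample_tokens = estimate_tokens(39_000)
sample_cost = estimate_embedding_cost(39_000, price_per_million_tokens=0.10)
print(sample_tokens, sample_cost)
```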

Example Queries

See how semantic search handles real questions.

“What did management say about margin compression and how are they addressing it?”

Tests earnings call retrieval with nuanced financial language. "Margin compression," "headwinds," and "pricing actions" are financial idioms the domain model understands.

q3-2025-earnings-call.md

94% match

“On margins, we saw about 80 basis points of compression this quarter, primarily driven by input cost headwinds and the product mix shift toward our enterprise tier.”

q4-2025-earnings-call.md

89% match

“For the full year, gross margins came in at 62.3%, down from 64.1% in the prior year. The team has executed well on our cost optimization program.”

“What are our biggest risk exposures right now?”

Spans the risk committee report, market risk framework, and credit policy. Tests the model's ability to connect "risk exposure" across different document contexts.

“How are we preparing for upcoming regulatory changes?”

Tests the regulatory change tracker and compliance documents. Financial firms face constant regulatory evolution.

“What's the capital return strategy for next year?”

Tests the capital allocation memo and earnings call guidance. Financial analysts frequently ask about capital return plans.

“Do we have concentration risk in our vendor relationships?”

Tests the vendor risk assessment against credit policy. A practical compliance question that spans two policy documents.

Try the Knowledge Base Live

This is a real chatbot powered by the 15 financial sample docs you just explored. Ask it about earnings calls, risk management, capital allocation, or regulatory compliance.

Why This Model?

Model	Relevance	Notes
voyage-finance-2 (recommended)	94%	Purpose-built for financial text. Best at understanding financial jargon, earnings call conventions, and risk terminology.
voyage-4-large	86%	Strong general-purpose model. Handles straightforward financial queries well, but misses nuance in domain-specific terminology like "headwinds" and "run-rate."
voyage-4-lite	77%	Fastest and cheapest. Adequate for simple lookups, but lacks the financial vocabulary understanding needed for analyst-quality retrieval.

For financial documents, voyage-finance-2 provides measurably better retrieval on queries that use domain-specific language. The difference is most apparent on earnings call queries where terms like "headwinds," "constructive," and "run-rate" carry specific financial meaning that general models may not capture. The cost difference between voyage-finance-2 and voyage-4-large is marginal for most document volumes, making the domain model the clear choice for financial applications.
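The page does not say how the relevance percentages were measured. One simple way to compare models on your own documents is a hit rate: the share of test queries for which at least one expected file appears in the top k results. The sketch below assumes you have collected each model's ranked file names yourself; `hit_rate_at_k` and the data shapes are hypothetical.

```python
def hit_rate_at_k(results: dict[str, list[str]],
                  expected: dict[str, set[str]], k: int = 3) -> float:
    """Fraction of queries with at least one expected file in the top k.

    results  maps each query to the ranked file names a model returned.
    expected maps each query to the files a human judged relevant.
    """
    if not expected:
        raise ValueError("expected must contain at least one query")

    hits = 0
    for query, relevant_files in expected.items():
        top_k = results.get(query, [])[:k]
        if any(name in relevant_files for name in top_k):
            hits += 1
    return hits / len(expected)
```

Running the same labeled queries against two collections, one per embedding model, gives a like-for-like comparison on your own corpus rather than on sample data.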

Scaling to Production

You just built a working knowledge base from 15 sample docs. Here is what changes when you scale to thousands of real documents.

Data sensitivity

Financial documents often contain material non-public information (MNPI). The resulting embedding vectors are not readable text, but the chunk text itself is sent to the Voyage AI API to be embedded and is stored in MongoDB alongside the vectors. Ensure both your Voyage AI usage and your Atlas cluster meet your compliance requirements.

Scale projections

An investment firm might ingest 10,000+ documents (transcripts, filings, research notes). At this scale, vai estimate shows embedding costs remain low with voyage-finance-2. Asymmetric retrieval (short queries against long documents) further reduces per-query costs.

Metadata filtering

Financial search often needs filters by date, company, or document type. vai supports metadata filters, so you can narrow results to "only Q3 2025 documents" or "only risk reports" before semantic ranking applies.
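In Atlas Vector Search, pre-filtering means declaring the filterable fields in the index definition and passing a `filter` clause in the `$vectorSearch` stage. The sketch below shows both as plain dictionaries. How vai exposes filters on the command line is not covered here, and the field names `doc_type` and `quarter`, like both helper functions, are hypothetical.

```python
def build_index_definition(filter_fields: list[str]) -> dict:
    """Vector index definition with extra fields declared as filterable."""
    fields = [{
        "type": "vector",
        "path": "embedding",
        "numDimensions": 1024,   # matches the Step 4 output
        "similarity": "cosine",
    }]
    fields += [{"type": "filter", "path": name} for name in filter_fields]
    return {"fields": fields}


def build_filtered_stage(query_vector: list[float], filters: dict,
                         limit: int = 5, num_candidates: int = 100) -> dict:
    """$vectorSearch stage that narrows candidates before semantic ranking."""
    stage = {
        "index": "vector_index",
        "path": "embedding",
        "queryVector": query_vector,
        "numCandidates": num_candidates,
        "limit": limit,
    }
    clauses = [{field: {"$eq": value}} for field, value in filters.items()]
    if len(clauses) == 1:
        stage["filter"] = clauses[0]
    elif clauses:
        stage["filter"] = {"$and": clauses}  # all conditions must hold
    return {"$vectorSearch": stage}


# Example: only Q3 2025 risk reports (hypothetical metadata values).
stage = build_filtered_stage(
    [0.0] * 1024, {"doc_type": "risk_report", "quarter": "2025-Q3"}
)
```

Each field used in `filter` has to be listed in the index definition, so metadata needs to be attached to the chunks at ingest time.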

Real-time ingestion

Earnings calls and filings arrive on a schedule. vai pipeline can be automated as part of an ingestion workflow, embedding new documents as they land and making them searchable within minutes.
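One way to automate this is a scheduled job that spots files it has not seen before and then invokes the same command used in Step 4. The sketch below covers only the detection and command-building parts. The helper names are hypothetical, the command reuses the flags shown in the walkthrough, and it omits --create-index on the assumption that the index already exists. Whether vai pipeline skips already-ingested files on its own is not something this page states.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, used to recognize unchanged files."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_new_files(folder: Path, seen_digests: set[str]) -> list[Path]:
    """Markdown files in folder whose contents have not been ingested yet."""
    return [path for path in sorted(folder.glob("*.md"))
            if file_digest(path) not in seen_digests]


def build_pipeline_command(folder: Path) -> list[str]:
    """Argument list for the Step 4 command, minus --create-index (assumption)."""
    return [
        "vai", "pipeline", str(folder),
        "--model", "voyage-finance-2",
        "--db", "finance_demo",
        "--collection", "financial_knowledge",
    ]
```

A scheduler could copy the new files into a staging folder, run `subprocess.run(build_pipeline_command(staging), check=True)`, and then record their digests so the next run skips them.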

Conversational interface

The natural next step is vai chat: an analyst asking "Summarize what management said about cloud spending across the last four quarters" and getting answers grounded in actual transcript text.

Ready to build your knowledge base?

Install vai and go from documents to searchable knowledge in minutes.

$ npm install -g voyageai-cli

