Financial analysis requires synthesizing information across dozens of documents: earnings call transcripts, 10-K filings, risk committee reports, internal policy memos, and market research. An analyst asking "What did management say about margin pressure?" needs to find the relevant passage across hundreds of pages of transcripts, and keyword search returns far too many results for "margin" alone.
The financial domain also has its own vocabulary challenges. "Headwinds" means challenges. "Color" means additional detail. "Constructive" means cautiously optimistic. A semantic search system trained on financial text understands these conventions; a generic one doesn't.
vai turns your financial document library into a searchable knowledge base in minutes. Point it at a folder of earnings transcripts, risk reports, and policy documents, and it handles chunking, embedding with Voyage AI's finance-domain model, and indexing in MongoDB Atlas Vector Search. The result: semantic search that understands a question like "What are our biggest risk exposures?" and finds answers across risk committee reports, market risk frameworks, and credit policies.
Documents (your files) → Chunk (split text) → Embed (Voyage AI) → Index (MongoDB Atlas) → Search (semantic query)
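To make the chunking stage concrete, here is a minimal sketch of a fixed-size splitter with overlap. The page does not describe vai's actual chunking strategy, so the function name `chunk_text` and the `max_chars` and `overlap` parameters are illustrative assumptions, not the tool's implementation.

```python
def chunk_text(text: str, max_chars: int = 250, overlap: int = 50) -> list[str]:
    """Split text into windows of at most max_chars characters.

    Consecutive windows share `overlap` characters so that a sentence cut at
    a boundary still appears whole in at least one chunk.
    Hypothetical helper: not vai's real chunker.
    """
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")

    step = max_chars - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        # Stop once the current window has reached the end of the text.
        if start + max_chars >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "On margins, we saw about 80 basis points of compression. " * 10
    for i, chunk in enumerate(chunk_text(sample)):
        print(i, len(chunk))
```

Overlap trades a little extra embedding volume for better recall on passages that straddle a chunk boundary; production chunkers often split on sentences or headings instead of raw character counts.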
15 synthetic but realistic documents, ~39KB total. Small enough to process in minutes, rich enough to produce meaningful search results.
| File | Topic | Size |
|---|---|---|
| q3-2025-earnings-call.md | Acme Corp Q3 2025 earnings call: revenue beat, margin pressure | ~4KB |
| q4-2025-earnings-call.md | Q4 2025 earnings call: full-year results, 2026 guidance | ~4KB |
| q3-2025-10q-summary.md | 10-Q highlights: revenue breakdown, operating expenses, risk factors | ~3KB |
| annual-report-summary.md | Annual report executive summary: strategy, markets, competition | ~3KB |
| risk-committee-report.md | Risk committee quarterly report: credit, market, operational risk | ~3KB |
| credit-policy.md | Corporate credit policy: approval tiers, concentration limits | ~3KB |
| market-risk-framework.md | Market risk management: VaR methodology, stress testing | ~2KB |
| interest-rate-analysis.md | Interest rate sensitivity: duration gaps, hedging strategy | ~2KB |
| liquidity-policy.md | Liquidity management: reserve requirements, stress scenarios | ~2KB |
| compliance-aml-summary.md | AML/KYC compliance: CDD requirements, SAR filing triggers | ~3KB |
| vendor-risk-assessment.md | Third-party vendor risk: tiering, due diligence framework | ~2KB |
| capital-allocation-memo.md | Capital allocation strategy: dividends, buybacks, M&A criteria | ~2KB |
| esg-report-summary.md | ESG report: carbon targets, diversity metrics, governance | ~2KB |
| fintech-partnership-memo.md | Strategic memo: fintech partnership, embedded finance, API strategy | ~2KB |
| regulatory-change-tracker.md | Regulatory changes: Basel IV, DORA, SEC climate disclosure | ~2KB |
From zero to a searchable knowledge base. Follow these steps; each takes 1-3 minutes.
Install vai
Install the vai CLI globally. If you already have it, skip to the next step.
$ npm install -g voyageai-cli
added 1 package in 3s
1 package is looking for funding
  run `npm fund` for details
Configure credentials
Set your Voyage AI API key and MongoDB Atlas connection string. You can get a free Voyage AI key at dash.voyageai.com and a free MongoDB Atlas cluster at cloud.mongodb.com.
✓ api-key saved
✓ mongodb-uri saved
Your credentials are stored locally in ~/.vai/config.json and never shared.
Download the sample documents
Grab the 15-file sample financial document set. These are synthetic but realistic documents for a fictional public company (Acme Corp), including earnings calls, risk reports, and policy memos.
Archive: sample-docs.zip
  inflating: ./sample-docs/q3-2025-earnings-call.md
  inflating: ./sample-docs/q4-2025-earnings-call.md
  ...
  inflating: ./sample-docs/regulatory-change-tracker.md
15 files extracted
Ingest and embed the documents
Run the vai pipeline to chunk, embed, and index all 15 documents. This uses voyage-finance-2, a model trained specifically on financial text, and creates a vector search index in MongoDB Atlas.
◼ Scanning ./sample-docs/ ...
  Found 15 files (39KB total)
◼ Chunking documents ...
  Created 156 chunks (avg 250 chars)
◼ Embedding with voyage-finance-2 ...
  ████████████████████████████████ 156/156 chunks
  Embedded in 3.1s (50 chunks/sec)
◼ Storing in MongoDB Atlas ...
  Database: finance_demo
  Collection: financial_knowledge
  Inserted 156 documents
◼ Creating vector search index ...
  Index "vector_index" created on field "embedding"
  Dimensions: 1024 | Similarity: cosine
✓ Pipeline complete — 15 files → 156 indexed chunks
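For readers who want to see what the last step of that output amounts to, the sketch below builds an Atlas Vector Search index definition matching the reported settings (field `embedding`, 1024 dimensions, cosine similarity). The `fields` / `type: "vector"` / `numDimensions` / `similarity` layout is the standard Atlas Vector Search index format; that vai generates exactly this document is an assumption, and `vector_index_definition` is a hypothetical helper name.

```python
import json

# Similarity functions accepted by Atlas Vector Search indexes.
ALLOWED_SIMILARITIES = {"cosine", "euclidean", "dotProduct"}


def vector_index_definition(
    path: str = "embedding",
    dimensions: int = 1024,
    similarity: str = "cosine",
) -> dict:
    """Build an Atlas Vector Search index definition as a plain dict.

    Defaults mirror the pipeline output above; the helper itself is
    illustrative and not part of vai.
    """
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity}")
    if dimensions <= 0:
        raise ValueError("dimensions must be positive")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dimensions,
                "similarity": similarity,
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(vector_index_definition(), indent=2))
```

The dimension count has to match the embedding model's output size, so switching models generally means rebuilding the index as well as re-embedding the chunks.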
Run your first search
Test the knowledge base with a query that uses financial language. Notice how the finance-domain model understands terms like "margin compression" and "headwinds."
Query: "What did management say about margin compression and how are they addressing it?"
Model: voyage-finance-2 | Results: 5
1. q3-2025-earnings-call.md (score: 0.94)
"On margins, we saw about 80 basis points of compression this
quarter, primarily driven by input cost headwinds and the
product mix shift toward our enterprise tier. We expect to
recover roughly half of that through pricing actions in Q1..."
2. q4-2025-earnings-call.md (score: 0.89)
"For the full year, gross margins came in at 62.3%, down from
64.1% in the prior year. The team has executed well on our
cost optimization program, and we're guiding to margin
expansion in the back half of 2026..."
3. annual-report-summary.md (score: 0.82)
"Profitability: Gross margin declined 180 basis points
year-over-year due to elevated cloud infrastructure costs
and competitive pricing pressure in the mid-market segment..."
Try cross-document queries
Run queries that span risk reports, policy documents, and earnings calls. This is where semantic search on financial text is most powerful.
Query: "What are our biggest risk exposures right now?"
Model: voyage-finance-2 | Results: 5
1. risk-committee-report.md (score: 0.93)
"Top Risks This Quarter: (1) Interest rate volatility and its
impact on our fixed-income portfolio, (2) Concentration risk
in our top-5 enterprise accounts representing 34% of ARR,
(3) Regulatory uncertainty around pending SEC climate rules..."
2. market-risk-framework.md (score: 0.87)
"Current VaR at the 99th percentile stands at $4.2M, up from
$3.8M last quarter. The increase is primarily attributable to
heightened equity market volatility and widening credit spreads
in our corporate bond holdings..."
3. credit-policy.md (score: 0.81)
"Concentration Limits: No single counterparty shall represent
more than 10% of total credit exposure. The current top
exposure is 8.7%, approaching the threshold..."
Explore in the playground
Launch the vai playground for a visual interface. Browse your indexed financial documents, run queries interactively, and use vai estimate to project costs for your actual document volume.
◼ Starting vai playground ...
  Server running at http://localhost:1958
  Open your browser to explore:
  • Search your knowledge base
  • Compare embedding models
  • Visualize similarity scores
Try vai estimate to see what it would cost to embed your full document library, so you know the unit economics before scaling up.
See how semantic search handles real questions. Each query below is followed by a note on what it tests.
“What did management say about margin compression and how are they addressing it?”
Tests earnings call retrieval with nuanced financial language. "Margin compression," "headwinds," and "pricing actions" are financial idioms the domain model understands.
q3-2025-earnings-call.md
“On margins, we saw about 80 basis points of compression this quarter, primarily driven by input cost headwinds and the product mix shift toward our enterprise tier.”
q4-2025-earnings-call.md
“For the full year, gross margins came in at 62.3%, down from 64.1% in the prior year. The team has executed well on our cost optimization program.”
“What are our biggest risk exposures right now?”
Spans the risk committee report, market risk framework, and credit policy. Tests the model's ability to connect "risk exposure" across different document contexts.
“How are we preparing for upcoming regulatory changes?”
Tests the regulatory change tracker and compliance documents. Financial firms face constant regulatory evolution.
“What's the capital return strategy for next year?”
Tests capital allocation memo and earnings call guidance. Financial analysts frequently ask about capital return plans.
“Do we have concentration risk in our vendor relationships?”
Tests the vendor risk assessment against credit policy. A practical compliance question that spans two policy documents.
| Model | Relevance | Notes |
|---|---|---|
| voyage-finance-2 (recommended) | 94% | Purpose-built for financial text. Best at understanding financial jargon, earnings call conventions, and risk terminology. |
| voyage-4-large | 86% | Strong general-purpose model. Handles straightforward financial queries well, but misses nuance in domain-specific terminology like "headwinds" and "run-rate." |
| voyage-4-lite | 77% | Fastest and cheapest. Adequate for simple lookups, but lacks the financial vocabulary understanding needed for analyst-quality retrieval. |
For financial documents, voyage-finance-2 provides measurably better retrieval on queries that use domain-specific language. The difference is most apparent on earnings call queries where terms like "headwinds," "constructive," and "run-rate" carry specific financial meaning that general models may not capture. The cost difference between voyage-finance-2 and voyage-4-large is marginal for most document volumes, making the domain model the clear choice for financial applications.
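The table does not say how the relevance percentages were measured. If you want to run a comparable check on your own documents, one common approach is precision at k over a small set of hand-labeled queries. The sketch below is an assumption about how such a check could look, not the method behind the numbers above; the relevance labels in the example are made up for illustration.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved items that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k


def mean_precision_at_k(
    runs: list[tuple[list[str], set[str]]], k: int = 5
) -> float:
    """Average precision@k over (retrieved, relevant) pairs, one per query."""
    if not runs:
        raise ValueError("need at least one query")
    return sum(precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical labels for the risk-exposure query.
    retrieved = [
        "risk-committee-report.md",
        "market-risk-framework.md",
        "esg-report-summary.md",
    ]
    relevant = {
        "risk-committee-report.md",
        "market-risk-framework.md",
        "credit-policy.md",
    }
    print(round(precision_at_k(retrieved, relevant, k=3), 2))
```

Running the same labeled queries against each candidate model gives a like-for-like comparison on your own vocabulary rather than on a sample set.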
You just built a working knowledge base from 15 sample docs. Here is what changes when you scale to thousands of real documents.
Data sensitivity
Financial documents often contain material non-public information (MNPI). The embedding vectors themselves are not readable text, but chunk text is sent to the Voyage AI API to be embedded, and the same text chunks are stored in MongoDB alongside the vectors. Ensure that both your Voyage AI usage and your Atlas cluster meet your compliance requirements.
Scale projections
An investment firm might ingest 10,000+ documents (transcripts, filings, research notes). At this scale, vai estimate shows embedding costs remain low with voyage-finance-2. Asymmetric retrieval (short queries against long documents) further reduces per-query costs.
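As a rough back-of-the-envelope version of that projection, the sketch below estimates one-time embedding cost from document count and average size. It is not how vai estimate works internally; the four-characters-per-token ratio is a common rule of thumb, and the price used in the example is a placeholder to be replaced with the current rate from Voyage AI's pricing page.

```python
def estimate_embedding_cost(
    num_docs: int,
    avg_chars_per_doc: float,
    price_per_million_tokens: float,
    chars_per_token: float = 4.0,
) -> float:
    """Approximate one-time cost (in the currency of the price) to embed a corpus.

    chars_per_token is a rough heuristic for English text, not a measured
    tokenizer statistic.
    """
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    total_tokens = num_docs * avg_chars_per_doc / chars_per_token
    return total_tokens / 1_000_000 * price_per_million_tokens


if __name__ == "__main__":
    # 10,000 documents at the sample set's average size (39KB / 15 files,
    # about 2,600 characters each), with a PLACEHOLDER price of 0.10 per
    # million tokens.
    cost = estimate_embedding_cost(10_000, 2_600, price_per_million_tokens=0.10)
    print(f"{cost:.2f}")
```

Real filings and transcripts are usually far longer than the sample files, so measure your own average document size before relying on the result.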
Metadata filtering
Financial search often needs filters by date, company, or document type. vai supports metadata filters, so you can narrow results to "only Q3 2025 documents" or "only risk reports" before semantic ranking applies.
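In Atlas terms, filtering before semantic ranking corresponds to the `filter` option of the `$vectorSearch` aggregation stage. The sketch below builds such a pipeline as plain Python data. The index name and vector path come from the pipeline output earlier on this page; the metadata field names `doc_type` and `period`, the projected fields `text` and `source`, and the helper itself are hypothetical, and whether vai issues this exact pipeline is an assumption.

```python
from typing import Optional, Sequence


def build_filtered_search(
    query_vector: Sequence[float],
    doc_type: Optional[str] = None,
    period: Optional[str] = None,
    limit: int = 5,
    num_candidates: int = 100,
) -> list[dict]:
    """Return an aggregation pipeline that pre-filters, then ranks by similarity."""
    if num_candidates < limit:
        raise ValueError("num_candidates must be >= limit")

    # Hypothetical metadata fields; use whatever your documents actually store.
    clauses = []
    if doc_type is not None:
        clauses.append({"doc_type": {"$eq": doc_type}})
    if period is not None:
        clauses.append({"period": {"$eq": period}})

    stage = {
        "index": "vector_index",
        "path": "embedding",
        "queryVector": list(query_vector),
        "numCandidates": num_candidates,
        "limit": limit,
    }
    if len(clauses) == 1:
        stage["filter"] = clauses[0]
    elif clauses:
        stage["filter"] = {"$and": clauses}

    return [
        {"$vectorSearch": stage},
        # Expose the similarity score next to each returned chunk.
        {"$project": {"_id": 0, "text": 1, "source": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]


if __name__ == "__main__":
    pipeline = build_filtered_search([0.0] * 1024, doc_type="risk-report",
                                     period="Q3 2025")
    print(pipeline[0]["$vectorSearch"]["filter"])
```

With a driver such as PyMongo, the returned list would be passed to `collection.aggregate(...)`. Any field used in `filter` also has to be declared in the vector index definition with `"type": "filter"`.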
Real-time ingestion
Earnings calls and filings arrive on a schedule. vai pipeline can be automated as part of an ingestion workflow, embedding new documents as they land and making them searchable within minutes.
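One small piece of such a workflow is working out which files are new since the last run. The sketch below does only that; how the resulting files are handed to vai pipeline is left out because the command's arguments are not shown on this page. The function name and the idea of tracking already-indexed file names in a set are illustrative assumptions.

```python
from pathlib import Path


def find_new_documents(
    folder: str, already_indexed: set[str], suffix: str = ".md"
) -> list[Path]:
    """Return files in `folder` with the given suffix that are not yet indexed.

    `already_indexed` holds file names recorded by previous runs; where that
    record lives (a database, a manifest file) is up to the workflow.
    """
    return sorted(
        path
        for path in Path(folder).glob(f"*{suffix}")
        if path.is_file() and path.name not in already_indexed
    )


if __name__ == "__main__":
    seen = {"q3-2025-earnings-call.md"}
    for path in find_new_documents("./sample-docs", seen):
        print(path.name)
```

A scheduler such as cron can run a script like this after each filing window and pass only the new files on for embedding, which avoids re-embedding the whole library.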
Conversational interface
The natural next step is vai chat: an analyst asking "Summarize what management said about cloud spending across the last four quarters" and getting answers grounded in actual transcript text.
Install vai and go from documents to searchable knowledge in minutes.
$ npm install -g voyageai-cli