Build a Clinical Knowledge Base in 20 Minutes

From clinical guidelines to searchable AI, using your own infrastructure

The Problem

Healthcare teams drown in clinical documentation. Treatment guidelines update quarterly. Drug interaction databases span thousands of pages. Internal protocols live in scattered wikis, PDFs, and shared drives. When a clinician needs an answer to a question such as "What's the recommended first-line treatment for Type 2 diabetes with renal impairment?", they search through multiple systems and often settle for whatever Google returns rather than their organization's own vetted guidelines.

Standard search tools fail here because clinical questions are semantic, not keyword-based. A search for "diabetes kidney treatment" needs to find documents about "glycemic management in chronic kidney disease": same concept, almost entirely different words. This is exactly what embedding-based semantic search solves.
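To make the distinction concrete, here is a minimal Python sketch. The two phrases from the paragraph above share only the word "kidney", so a keyword match has very little to work with, while cosine similarity between embedding vectors can still rank the passage highly. The three-dimensional vectors are toy values invented for illustration; they are not output from any Voyage AI model.

```python
import math


def keyword_overlap(a: str, b: str) -> set[str]:
    """Return the words that appear in both strings (case-insensitive)."""
    return set(a.lower().split()) & set(b.lower().split())


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


query = "diabetes kidney treatment"
relevant = "glycemic management in chronic kidney disease"

# Toy 3-dimensional vectors, invented for illustration only.
query_vec = [0.9, 0.4, 0.1]
relevant_vec = [0.8, 0.5, 0.2]
unrelated_vec = [0.1, 0.2, 0.9]

shared = keyword_overlap(query, relevant)
sim_relevant = cosine_similarity(query_vec, relevant_vec)
sim_unrelated = cosine_similarity(query_vec, unrelated_vec)

print(f"shared words: {sorted(shared)}")
print(f"similarity to relevant passage:  {sim_relevant:.2f}")
print(f"similarity to unrelated passage: {sim_unrelated:.2f}")
```

Real embeddings have hundreds or thousands of dimensions, but the ranking logic is the same: the passage whose vector points in nearly the same direction as the query vector wins, regardless of shared vocabulary.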

The Solution

vai turns your clinical documentation into a searchable knowledge base in minutes. Point it at a folder of guidelines, drug references, and care protocols, and it handles chunking, embedding with Voyage AI's highest-accuracy model, and indexing in MongoDB Atlas Vector Search. The result is semantic search that understands a question like "What medications should I avoid in a patient with kidney problems?" and finds answers across your metformin reference, CKD staging guide, and ACE inhibitor docs, even when each uses different terminology.

Documents (your files) → Chunk (split text) → Embed (Voyage AI) → Index (MongoDB Atlas) → Semantic query
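The chunking stage can be pictured with a short Python sketch. This is one simple strategy (greedy paragraph packing with a character budget), shown only to illustrate the idea; the function name and the 300-character default are assumptions, and vai's actual chunking strategy may differ.

```python
def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Pack consecutive paragraphs into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            # The paragraph still fits in the chunk being built.
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single oversized paragraph is hard-split on character count.
        while len(paragraph) > max_chars:
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Keeping chunks aligned to paragraph boundaries where possible means each embedded unit tends to cover one idea, such as a single contraindication or monitoring rule, which is what a clinician's question usually targets.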

Sample Document Set

15 synthetic but realistic documents, ~34KB total. Small enough to process in minutes, rich enough to produce meaningful search results.

File	Topic	Size
diabetes-management.md	Type 2 diabetes treatment guidelines, HbA1c targets, medication ladder	~3KB
diabetes-renal.md	Glycemic management in patients with CKD stages 3 to 5	~2KB
metformin-reference.md	Metformin prescribing information, contraindications, renal dosing	~2KB
sglt2-inhibitors.md	SGLT2 inhibitor class overview, cardiovascular and renal benefits	~2KB
hypertension-guidelines.md	Blood pressure targets, first-line agents, resistant hypertension	~3KB
ace-inhibitor-reference.md	ACE inhibitor prescribing, renal protective effects, monitoring	~2KB
heart-failure-protocol.md	HFrEF and HFpEF management, GDMT optimization	~3KB
anticoagulation-guide.md	Anticoagulation selection, DOAC vs warfarin, bridging protocols	~3KB
sepsis-bundle.md	Sepsis recognition, hour-1 bundle, lactate-guided resuscitation	~2KB
pain-management.md	Acute and chronic pain protocols, opioid stewardship, multimodal approach	~2KB
drug-interactions-cardiac.md	Common drug interactions in cardiac patients, QTc prolongation risks	~2KB
ckd-staging.md	Chronic kidney disease staging, eGFR calculation, referral criteria	~2KB
insulin-protocols.md	Basal-bolus insulin, sliding scale, transition from IV to subcutaneous	~2KB
discharge-checklist.md	Hospital discharge protocol, medication reconciliation, follow-up	~2KB
falls-prevention.md	Fall risk assessment, prevention interventions, post-fall protocol	~2KB

Walkthrough

From zero to a searchable knowledge base. Follow these steps; each takes 1 to 3 minutes.

Step 1

Install vai

Install the vai CLI globally. If you already have it, skip to the next step.

$ npm install -g voyageai-cli

added 1 package in 3s

1 package is looking for funding
  run `npm fund` for details

Step 2

Configure credentials

Set your Voyage AI API key and MongoDB Atlas connection string. You can get a free Voyage AI key at dash.voyageai.com and a free MongoDB Atlas cluster at cloud.mongodb.com.

$ vai config set api-key YOUR_VOYAGE_API_KEY

$ vai config set mongodb-uri YOUR_MONGODB_URI

✓ api-key saved
✓ mongodb-uri saved

Your credentials are stored locally in ~/.vai/config.json and never shared.

Step 3

Download the sample documents

Grab the 15-file sample clinical document set. These are synthetic but realistic treatment guidelines, drug references, and care protocols for a fictional hospital system.

$ curl -L https://vaicli.com/use-cases/healthcare/sample-docs/sample-docs.zip -o sample-docs.zip

$ unzip sample-docs.zip -d ./sample-docs

Archive:  sample-docs.zip
  inflating: ./sample-docs/diabetes-management.md
  inflating: ./sample-docs/diabetes-renal.md
  ...
  inflating: ./sample-docs/falls-prevention.md
  15 files extracted

Step 4

Ingest and embed the documents

Run the vai pipeline to chunk, embed, and index all 15 documents. This uses voyage-4-large, the highest-accuracy general-purpose model, and creates a vector search index in MongoDB Atlas.

$ vai pipeline ./sample-docs/ --model voyage-4-large --db healthcare_demo --collection clinical_knowledge --create-index

◼ Scanning ./sample-docs/ ...
  Found 15 files (34KB total)

◼ Chunking documents ...
  Created 118 chunks (avg 288 chars)

◼ Embedding with voyage-4-large ...
  ████████████████████████████████ 118/118 chunks
  Embedded in 2.1s (56 chunks/sec)

◼ Storing in MongoDB Atlas ...
  Database: healthcare_demo
  Collection: clinical_knowledge
  Inserted 118 documents

◼ Creating vector search index ...
  Index "vector_index" created on field "embedding"
  Dimensions: 1024 | Similarity: cosine

✓ Pipeline complete — 15 files → 118 indexed chunks
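The index reported in the output above corresponds to an Atlas Vector Search index definition along the following lines. This sketch only builds the definition document. The field name, dimension count, and similarity metric are taken from the pipeline output, but whether vai emits exactly this definition is an assumption, and the helper function is hypothetical.

```python
import json


def vector_index_definition(path: str = "embedding",
                            dimensions: int = 1024,
                            similarity: str = "cosine") -> dict:
    """Build an Atlas Vector Search index definition for one vector field."""
    if similarity not in {"cosine", "dotProduct", "euclidean"}:
        raise ValueError(f"unsupported similarity: {similarity}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dimensions,
                "similarity": similarity,
            }
        ]
    }


definition = vector_index_definition()
print(json.dumps(definition, indent=2))
```

With PyMongo, a definition like this can be wrapped in `SearchIndexModel(definition=definition, name="vector_index", type="vectorSearch")` and passed to `Collection.create_search_index`. The dimension count must match the length of the vectors the embedding model returns, or indexing will fail.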

Step 5

Run your first clinical search

Test the knowledge base with a query that spans multiple clinical documents. Notice how semantic search finds relevant information even when the query uses different terminology than the source documents.

$ vai search "What medications should I avoid in a patient with kidney problems?" --db healthcare_demo --collection clinical_knowledge

Query: "What medications should I avoid in a patient with kidney problems?"
Model: voyage-4-large | Results: 5

1. metformin-reference.md (score: 0.94)
   "Contraindications: Metformin is contraindicated in patients with
    an eGFR below 30 mL/min/1.73m2. For patients with eGFR 30-45,
    initiation is not recommended but continuation at reduced dose
    may be considered with close monitoring..."

2. ckd-staging.md (score: 0.91)
   "Medication Adjustment by CKD Stage: Multiple medications require
    dose adjustment or discontinuation as renal function declines.
    Review all medications at each stage transition..."

3. ace-inhibitor-reference.md (score: 0.87)
   "Renal Monitoring: Check serum creatinine and potassium within
    1-2 weeks of initiation or dose increase. A rise in creatinine
    of up to 30% is acceptable and expected..."
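A search like the one above typically maps to an Atlas `$vectorSearch` aggregation stage run against the embedding of the query text. The sketch below builds such a pipeline as plain Python data. The index and field names follow the Step 4 output, while the `source` and `text` field names, the candidate multiplier, and the helper function are assumptions for illustration, not a description of vai's internals.

```python
def build_search_pipeline(query_vector: list[float],
                          limit: int = 5,
                          index: str = "vector_index",
                          path: str = "embedding") -> list[dict]:
    """Return an aggregation pipeline for an approximate nearest-neighbor search."""
    return [
        {
            "$vectorSearch": {
                "index": index,
                "path": path,
                "queryVector": query_vector,
                # Consider more candidates than results to improve recall.
                "numCandidates": limit * 20,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "source": 1,  # hypothetical field holding the file name
                "text": 1,    # hypothetical field holding the chunk text
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

The query vector has to come from the same model that embedded the documents (voyage-4-large here); vectors from different models are not comparable. The resulting list can be passed to PyMongo's `Collection.aggregate`.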

Step 6

Try cross-document clinical queries

Run queries that require understanding medical concepts across different document types. This is where semantic search delivers the most value over traditional keyword search.

$ vai search "How do I manage blood sugar in someone who cannot take metformin?" --db healthcare_demo --collection clinical_knowledge

Query: "How do I manage blood sugar in someone who cannot take metformin?"
Model: voyage-4-large | Results: 5

1. diabetes-management.md (score: 0.93)
   "Second-Line Agents: When metformin is contraindicated or not
    tolerated, consider SGLT2 inhibitors (preferred if cardiovascular
    or renal comorbidity) or GLP-1 receptor agonists..."

2. diabetes-renal.md (score: 0.90)
   "Glycemic Management in CKD: For patients with eGFR below 30
    where metformin is contraindicated, SGLT2 inhibitors with
    demonstrated renal benefit (dapagliflozin, empagliflozin)
    are preferred..."

3. sglt2-inhibitors.md (score: 0.86)
   "SGLT2 inhibitors have demonstrated cardiovascular and renal
    benefits independent of their glucose-lowering effect.
    Recommended as first-line add-on or metformin alternative..."

Step 7

Explore in the playground

Launch the vai playground for a visual interface. Browse your indexed clinical documents, run queries interactively, and compare how different models handle clinical terminology.

$ vai playground

◼ Starting vai playground ...
  Server running at http://localhost:1958

  Open your browser to explore:
  • Search your knowledge base
  • Compare embedding models
  • Visualize similarity scores

Try comparing voyage-4-large results with voyage-4-lite on the same clinical query to see how model quality affects retrieval accuracy for medical terminology.

Example Queries

See how semantic search handles real questions.

“What medications should I avoid in a patient with kidney problems?”

Tests cross-document retrieval spanning the metformin reference, CKD staging guide, and ACE inhibitor docs. The query uses "kidney problems" while the documents use "renal impairment," "CKD," and "eGFR."

metformin-reference.md

94% match

“Contraindications: Metformin is contraindicated in patients with an eGFR below 30 mL/min/1.73m2. For patients with eGFR 30-45, initiation is not recommended but continuation at reduced dose may be considered.”

ckd-staging.md

91% match

“Medication Adjustment by CKD Stage: Multiple medications require dose adjustment or discontinuation as renal function declines. NSAIDs should be avoided in Stage 3 and beyond.”

ace-inhibitor-reference.md

87% match

“Renal Monitoring: Check serum creatinine and potassium within 1-2 weeks of initiation or dose increase. Discontinue if creatinine rises more than 30% or potassium exceeds 5.5 mEq/L.”

“How do I manage blood sugar in someone who cannot take metformin?”

Tests the medication ladder and renal contraindication overlap across diabetes management, diabetes-renal, and SGLT2 inhibitor documents.

“What's the sepsis protocol for the first hour?”

Tests precise retrieval from the sepsis bundle document. The "hour-1 bundle" concept should be retrieved even when the query uses different phrasing.

“My patient is on warfarin and needs to start amiodarone. What do I watch for?”

Tests drug interaction document retrieval. A practical clinical scenario requiring cross-reference between the anticoagulation guide and drug interactions document.

“When should I refer a patient to nephrology?”

Tests CKD staging referral criteria. A straightforward clinical question that should retrieve specific threshold values from the CKD staging document.
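The example queries above double as a small regression suite: each one names the documents it should retrieve. A minimal sketch of how that could be scored follows, assuming a `search` callable that returns a ranked list of source file names. The callable, the stub, and the recall metric are illustrative choices, not part of vai.

```python
from typing import Callable

# Expected source documents per query, as described in the text above.
EXPECTED_SOURCES: dict[str, set[str]] = {
    "What medications should I avoid in a patient with kidney problems?": {
        "metformin-reference.md", "ckd-staging.md", "ace-inhibitor-reference.md",
    },
    "How do I manage blood sugar in someone who cannot take metformin?": {
        "diabetes-management.md", "diabetes-renal.md", "sglt2-inhibitors.md",
    },
    "What's the sepsis protocol for the first hour?": {"sepsis-bundle.md"},
    "My patient is on warfarin and needs to start amiodarone. What do I watch for?": {
        "anticoagulation-guide.md", "drug-interactions-cardiac.md",
    },
    "When should I refer a patient to nephrology?": {"ckd-staging.md"},
}


def recall_at_k(expected: set[str], retrieved: list[str], k: int = 5) -> float:
    """Fraction of expected sources that appear in the top-k results."""
    return len(expected & set(retrieved[:k])) / len(expected)


def evaluate(search: Callable[[str], list[str]], k: int = 5) -> dict[str, float]:
    """Score every example query with the supplied search function."""
    return {q: recall_at_k(exp, search(q), k) for q, exp in EXPECTED_SOURCES.items()}


def stub_search(query: str) -> list[str]:
    """Stand-in for a real search call; always returns the same ranking."""
    return ["ckd-staging.md", "metformin-reference.md", "sepsis-bundle.md"]


scores = evaluate(stub_search)
for question, score in scores.items():
    print(f"{score:.2f}  {question}")
```

Replacing `stub_search` with a function that calls your real index gives a quick check to re-run whenever you change the model, the chunking, or the document set.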

Why This Model?

Model	Relevance	Notes
voyage-4-large (recommended)	95%	Highest-accuracy general-purpose model. Best retrieval quality for clinical terminology where precision matters. Recommended for any healthcare application where accuracy is paramount.
voyage-4-lite	84%	Faster and more cost-effective. Handles straightforward clinical queries adequately, but struggles with nuanced cross-document retrieval involving specialized medical terminology.
voyage-code-3	72%	Optimized for code and technical docs, not clinical text. Included for comparison only. Medical vocabulary and clinical reasoning patterns are outside its training domain.

For clinical documents, voyage-4-large consistently outperforms lighter models on queries that require understanding medical terminology and cross-referencing clinical concepts. The difference is most pronounced on queries like "What medications should I avoid in a patient with kidney problems?" where the model needs to connect "kidney problems" with "renal impairment," "eGFR," and "CKD stages." For a healthcare application, the accuracy premium of voyage-4-large is worth the modest additional cost.

Scaling to Production

You just built a working knowledge base from 15 sample docs. Here is what changes when you scale to thousands of real documents.

HIPAA considerations

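vai reads and chunks documents locally, but the text of each chunk is sent to the Voyage AI API to be embedded, and the same chunks are stored in MongoDB next to their vectors. If your documents contain PHI, it therefore reaches both services: confirm that your agreement with Voyage AI covers PHI or de-identify documents before ingestion, and make sure your Atlas cluster is HIPAA-eligible. MongoDB Atlas offers HIPAA-eligible dedicated clusters with a BAA.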

Document volume

A typical hospital formulary plus guideline set might be 5,000 to 50,000 pages. At this scale, initial embedding costs are modest with voyage-4-large, and queries cost fractions of a cent. Use vai estimate to project costs for your corpus size.
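For a rough sense of scale before running vai estimate, the arithmetic can be sketched in a few lines of Python. Everything numeric here is an assumption: 500 tokens per page is a rule-of-thumb guess, and `PLACEHOLDER_PRICE` is a made-up value that must be replaced with the current published rate for the model you use.

```python
def estimate_embedding_tokens(pages: int, tokens_per_page: int = 500) -> int:
    """Rough token count for a corpus, given an assumed tokens-per-page figure."""
    return pages * tokens_per_page


def estimate_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """One-time embedding cost for the given number of tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens


# NOT a real price: substitute the current published per-million-token rate.
PLACEHOLDER_PRICE = 0.10

for pages in (5_000, 50_000):
    tokens = estimate_embedding_tokens(pages)
    cost = estimate_cost_usd(tokens, PLACEHOLDER_PRICE)
    print(f"{pages:>6} pages ~ {tokens:,} tokens ~ ${cost:.2f} at the placeholder rate")
```

The point of the sketch is the shape of the calculation: embedding cost grows linearly with corpus size and is paid once per document version, while each query embeds only a sentence or two.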

Keeping guidelines current

Clinical guidelines update regularly: drug formularies change, protocols evolve, new evidence emerges. Re-run vai pipeline on updated files and it will re-chunk, re-embed, and update only the changed documents. Automate this as part of your clinical documentation workflow.
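One common way to decide which documents have changed is to compare content hashes between runs. The Python sketch below illustrates that idea with SHA-256 digests. It is an assumption about how change detection can work in general, not a description of how vai implements it, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(folder: Path,
                  previous: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Return (names of new or modified .md files, current name-to-digest map)."""
    current = {p.name: file_digest(p) for p in sorted(folder.glob("*.md"))}
    changed = [name for name, digest in current.items()
               if previous.get(name) != digest]
    return changed, current
```

Persisting the returned digest map between runs (for example as a small JSON file) lets a scheduled job skip unchanged guidelines and spend embedding calls only on the documents that were actually revised.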

Metadata filtering

Clinical search often needs filters by department, document type, or date. vai supports metadata filters on search, so you can narrow results to "only cardiology guidelines updated in the last year" before semantic ranking applies.
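At the Atlas level, this kind of pre-filtering is expressed with the `filter` option of the `$vectorSearch` stage, and every filtered field must also be declared in the vector index with type `filter`. The sketch below shows both pieces as plain Python data. The `department` and `updated_at` field names are hypothetical, and how vai exposes filters on its command line is not shown here.

```python
from datetime import datetime, timezone


def filter_field_definitions(paths: list[str]) -> list[dict]:
    """Index entries that make the given fields usable in a $vectorSearch filter."""
    return [{"type": "filter", "path": p} for p in paths]


def filtered_vector_stage(query_vector: list[float],
                          department: str,
                          updated_since: datetime,
                          limit: int = 5) -> dict:
    """A $vectorSearch stage restricted to one department and recent documents."""
    return {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": limit * 20,
            "limit": limit,
            # The filter narrows the candidate set before similarity ranking.
            "filter": {
                "$and": [
                    {"department": {"$eq": department}},
                    {"updated_at": {"$gte": updated_since}},
                ]
            },
        }
    }


stage = filtered_vector_stage(
    query_vector=[0.0] * 1024,
    department="cardiology",
    updated_since=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

The cutoff date in the example is arbitrary. In practice it would be computed relative to the current date, and the metadata fields would be written onto each chunk at ingestion time.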

Conversational interface

The natural next step is vai chat: a clinician asking "What is the recommended anticoagulation for a patient with atrial fibrillation and moderate renal impairment?" and getting answers grounded in your organization's own vetted guidelines.
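Grounding usually means placing the retrieved chunks in the prompt and instructing the model to answer only from them. Here is a minimal Python sketch of that assembly step. The prompt wording, the `source` and `text` keys, and the function itself are illustrative assumptions rather than vai chat's actual implementation.

```python
def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    """Combine retrieved chunks and the user's question into one prompt string."""
    context = "\n\n".join(
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered excerpts below. "
        "Cite excerpt numbers, and say so if the excerpts do not contain the answer.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_grounded_prompt(
    "What medications should I avoid in a patient with kidney problems?",
    [
        {
            "source": "metformin-reference.md",
            "text": "Metformin is contraindicated in patients with an eGFR "
                    "below 30 mL/min/1.73m2.",
        },
    ],
)
print(prompt)
```

Numbering the excerpts and carrying the file name through gives the clinician a way to trace each statement in the answer back to the guideline it came from.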

Ready to build your knowledge base?

Install vai and go from documents to searchable knowledge in minutes.

$ npm install -g voyageai-cli
