Healthcare teams drown in clinical documentation. Treatment guidelines update quarterly. Drug interaction databases span thousands of pages. Internal protocols live in scattered wikis, PDFs, and shared drives. When a clinician needs an answer to a question like "What's the recommended first-line treatment for Type 2 diabetes with renal impairment?", they search through multiple systems, often settling for whatever Google returns rather than their organization's own vetted guidelines.
Standard search tools fail here because clinical questions are semantic, not keyword-based. A search for "diabetes kidney treatment" needs to find documents about "glycemic management in chronic kidney disease": the same concept, expressed in almost entirely different words. This is exactly the problem embedding-based semantic search solves.
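To make the keyword problem concrete, here is a toy Python sketch (not part of vai) that treats a query and a document title as bags of lowercase words, the way a naive keyword matcher would. The only term the two phrasings share is the generic word "kidney". Nothing at the keyword level connects "diabetes" to "glycemic" or "treatment" to "management".

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens: a crude stand-in for a keyword index."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def shared_terms(query: str, document: str) -> set[str]:
    """Terms a pure keyword matcher could use to link query and document."""
    return tokens(query) & tokens(document)


query = "diabetes kidney treatment"
doc_title = "glycemic management in chronic kidney disease"
print(shared_terms(query, doc_title))  # {'kidney'}
```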
vai turns your clinical documentation into a searchable knowledge base in minutes. Point it at a folder of guidelines, drug references, and care protocols, and it handles chunking, embedding with Voyage AI's highest-accuracy model, and indexing in MongoDB Atlas Vector Search. The result is semantic search in which a question like "What medications should I avoid in a patient with kidney problems?" finds answers across your metformin reference, CKD staging guide, and ACE inhibitor docs, even though each uses different terminology.
Pipeline: Documents (your files) → Chunk (split text) → Embed (Voyage AI) → Index (MongoDB Atlas) → Search (semantic query)
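The Chunk stage splits each file into passages small enough to embed individually. This page does not document vai's chunking strategy or sizes, so what follows is a hypothetical fixed-window chunker in Python. The values `max_chars=400` and `overlap=50` are illustrative only, and breaking at whitespace is just one common choice; sentence-aware or heading-aware splitting are alternatives.

```python
def chunk_text(text: str, max_chars: int = 400, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most max_chars characters (hypothetical sizes).

    Each window ends at its last space when possible. The next window starts
    `overlap` characters before the previous one ended, so text near a boundary
    appears in both neighboring chunks (a chunk may begin mid-word).
    """
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    chunks: list[str] = []
    start, n = 0, len(text)
    while start < n:
        end = min(start + max_chars, n)
        if end < n:
            # Prefer to cut at the last space inside the window.
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= n:
            break
        # Step back by `overlap`, but always move forward at least one character.
        start = max(end - overlap, start + 1)
    return chunks
```

Overlap trades a little extra storage and embedding cost for fewer answers that are split across a chunk boundary.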
15 synthetic but realistic documents, ~34KB total. Small enough to process in minutes, rich enough to produce meaningful search results.
| File | Topic | Size |
|---|---|---|
| diabetes-management.md | Type 2 diabetes treatment guidelines, HbA1c targets, medication ladder | ~3KB |
| diabetes-renal.md | Glycemic management in patients with CKD stages 3 to 5 | ~2KB |
| metformin-reference.md | Metformin prescribing information, contraindications, renal dosing | ~2KB |
| sglt2-inhibitors.md | SGLT2 inhibitor class overview, cardiovascular and renal benefits | ~2KB |
| hypertension-guidelines.md | Blood pressure targets, first-line agents, resistant hypertension | ~3KB |
| ace-inhibitor-reference.md | ACE inhibitor prescribing, renal protective effects, monitoring | ~2KB |
| heart-failure-protocol.md | HFrEF and HFpEF management, GDMT optimization | ~3KB |
| anticoagulation-guide.md | Anticoagulation selection, DOAC vs warfarin, bridging protocols | ~3KB |
| sepsis-bundle.md | Sepsis recognition, hour-1 bundle, lactate-guided resuscitation | ~2KB |
| pain-management.md | Acute and chronic pain protocols, opioid stewardship, multimodal approach | ~2KB |
| drug-interactions-cardiac.md | Common drug interactions in cardiac patients, QTc prolongation risks | ~2KB |
| ckd-staging.md | Chronic kidney disease staging, eGFR calculation, referral criteria | ~2KB |
| insulin-protocols.md | Basal-bolus insulin, sliding scale, transition from IV to subcutaneous | ~2KB |
| discharge-checklist.md | Hospital discharge protocol, medication reconciliation, follow-up | ~2KB |
| falls-prevention.md | Fall risk assessment, prevention interventions, post-fall protocol | ~2KB |
From zero to a searchable knowledge base. Follow these seven steps; each takes 1 to 3 minutes.
Install vai
Install the vai CLI globally. If you already have it, skip to the next step.
$ npm install -g voyageai-cli
added 1 package in 3s

1 package is looking for funding
  run `npm fund` for details
Configure credentials
Set your Voyage AI API key and MongoDB Atlas connection string. You can get a free Voyage AI key at dash.voyageai.com and a free MongoDB Atlas cluster at cloud.mongodb.com.
✓ api-key saved
✓ mongodb-uri saved
Your credentials are stored locally in ~/.vai/config.json and never shared.
Download the sample documents
Grab the 15-file sample clinical document set. These are synthetic but realistic treatment guidelines, drug references, and care protocols for a fictional hospital system.
Archive: sample-docs.zip
  inflating: ./sample-docs/diabetes-management.md
  inflating: ./sample-docs/diabetes-renal.md
  ...
  inflating: ./sample-docs/falls-prevention.md
15 files extracted
Ingest and embed the documents
Run the vai pipeline to chunk, embed, and index all 15 documents. This uses voyage-4-large, the highest-accuracy general-purpose model, and creates a vector search index in MongoDB Atlas.
◼ Scanning ./sample-docs/ ...
  Found 15 files (34KB total)
◼ Chunking documents ...
  Created 118 chunks (avg 288 chars)
◼ Embedding with voyage-4-large ...
  ████████████████████████████████ 118/118 chunks
  Embedded in 2.1s (56 chunks/sec)
◼ Storing in MongoDB Atlas ...
  Database: healthcare_demo
  Collection: clinical_knowledge
  Inserted 118 documents
◼ Creating vector search index ...
  Index "vector_index" created on field "embedding"
  Dimensions: 1024 | Similarity: cosine
✓ Pipeline complete — 15 files → 118 indexed chunks
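vai creates the vector index for you. If you want to see what that index amounts to at the Atlas level, here is a sketch of an equivalent definition built with PyMongo. It uses the names from the log above: database `healthcare_demo`, collection `clinical_knowledge`, index `vector_index`, field `embedding`, 1024 dimensions, and cosine similarity. It assumes a recent PyMongo release whose `SearchIndexModel` accepts `type="vectorSearch"`, and it is not vai's own code.

```python
def vector_index_definition(path="embedding", num_dimensions=1024,
                            similarity="cosine", filter_paths=()):
    """Atlas Vector Search index definition matching the pipeline log above."""
    fields = [{
        "type": "vector",
        "path": path,
        "numDimensions": num_dimensions,
        "similarity": similarity,
    }]
    # Optional metadata fields that $vectorSearch queries may pre-filter on.
    fields += [{"type": "filter", "path": p} for p in filter_paths]
    return {"fields": fields}


def create_vector_index(uri, db="healthcare_demo", coll="clinical_knowledge"):
    """Create the index with PyMongo (needs network access to your cluster)."""
    # Imported lazily so the definition helper above works without pymongo.
    from pymongo import MongoClient
    from pymongo.operations import SearchIndexModel

    collection = MongoClient(uri)[db][coll]
    model = SearchIndexModel(
        definition=vector_index_definition(),
        name="vector_index",
        type="vectorSearch",
    )
    return collection.create_search_index(model=model)
```

The dimension count must match the embedding model's output size. An index built for one model's vectors will not fit another model's output if that output has a different dimension count.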
Run your first clinical search
Test the knowledge base with a query that spans multiple clinical documents. Notice how semantic search finds relevant information even when the query uses different terminology than the source documents.
Query: "What medications should I avoid in a patient with kidney problems?"
Model: voyage-4-large | Results: 5
1. metformin-reference.md (score: 0.94)
"Contraindications: Metformin is contraindicated in patients with
an eGFR below 30 mL/min/1.73m2. For patients with eGFR 30-45,
initiation is not recommended but continuation at reduced dose
may be considered with close monitoring..."
2. ckd-staging.md (score: 0.91)
"Medication Adjustment by CKD Stage: Multiple medications require
dose adjustment or discontinuation as renal function declines.
Review all medications at each stage transition..."
3. ace-inhibitor-reference.md (score: 0.87)
"Renal Monitoring: Check serum creatinine and potassium within
1-2 weeks of initiation or dose increase. A rise in creatinine
of up to 30% is acceptable and expected..."
Try cross-document clinical queries
Run queries that require understanding medical concepts across different document types. This is where semantic search delivers the most value over traditional keyword search.
Query: "How do I manage blood sugar in someone who cannot take metformin?"
Model: voyage-4-large | Results: 5
1. diabetes-management.md (score: 0.93)
"Second-Line Agents: When metformin is contraindicated or not
tolerated, consider SGLT2 inhibitors (preferred if cardiovascular
or renal comorbidity) or GLP-1 receptor agonists..."
2. diabetes-renal.md (score: 0.90)
"Glycemic Management in CKD: For patients with eGFR below 30
where metformin is contraindicated, SGLT2 inhibitors with
demonstrated renal benefit (dapagliflozin, empagliflozin)
are preferred..."
3. sglt2-inhibitors.md (score: 0.86)
"SGLT2 inhibitors have demonstrated cardiovascular and renal
benefits independent of their glucose-lowering effect.
Recommended as first-line add-on or metformin alternative..."
Explore in the playground
Launch the vai playground for a visual interface. Browse your indexed clinical documents, run queries interactively, and compare how different models handle clinical terminology.
◼ Starting vai playground ...
  Server running at http://localhost:1958
  Open your browser to explore:
  • Search your knowledge base
  • Compare embedding models
  • Visualize similarity scores
Try comparing voyage-4-large results with voyage-4-lite on the same clinical query to see how model quality affects retrieval accuracy for medical terminology.
See how semantic search handles real questions.
“What medications should I avoid in a patient with kidney problems?”
Tests cross-document retrieval spanning the metformin reference, CKD staging guide, and ACE inhibitor docs. The query uses "kidney problems" while the documents use "renal impairment," "CKD," and "eGFR."
metformin-reference.md
“Contraindications: Metformin is contraindicated in patients with an eGFR below 30 mL/min/1.73m2. For patients with eGFR 30-45, initiation is not recommended but continuation at reduced dose may be considered.”
ckd-staging.md
“Medication Adjustment by CKD Stage: Multiple medications require dose adjustment or discontinuation as renal function declines. NSAIDs should be avoided in Stage 3 and beyond.”
ace-inhibitor-reference.md
“Renal Monitoring: Check serum creatinine and potassium within 1-2 weeks of initiation or dose increase. Discontinue if creatinine rises more than 30% or potassium exceeds 5.5 mEq/L.”
“How do I manage blood sugar in someone who cannot take metformin?”
Tests the medication ladder and renal contraindication overlap across diabetes management, diabetes-renal, and SGLT2 inhibitor documents.
“What's the sepsis protocol for the first hour?”
Tests precise retrieval from the sepsis bundle document. The "hour-1 bundle" concept should be retrieved even when the query uses different phrasing.
“My patient is on warfarin and needs to start amiodarone. What do I watch for?”
Tests drug interaction document retrieval. A practical clinical scenario requiring cross-reference between the anticoagulation guide and drug interactions document.
“When should I refer a patient to nephrology?”
Tests CKD staging referral criteria. A straightforward clinical question that should retrieve specific threshold values from the CKD staging document.
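Each of these queries follows the same two-step path: embed the question, then ask Atlas for the nearest chunk vectors. Here is that path as a Python sketch, with its assumptions stated. The Voyage AI client and the MongoDB collection are passed in. The question is embedded with `input_type="query"`. The result fields `source` and `text` are hypothetical names, because this page does not show how vai stores each chunk.

```python
def build_search_pipeline(query_vector, limit=5, num_candidates=100):
    """Aggregation pipeline for an Atlas $vectorSearch query."""
    return [
        {"$vectorSearch": {
            "index": "vector_index",          # index name from the pipeline log
            "path": "embedding",              # field holding the chunk vectors
            "queryVector": query_vector,
            "numCandidates": num_candidates,  # ANN candidates examined before keeping `limit`
            "limit": limit,
        }},
        # "source" and "text" are hypothetical field names for the chunk's file and text.
        {"$project": {"_id": 0, "source": 1, "text": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]


def search(collection, voyage_client, query, model="voyage-4-large", limit=5):
    """Embed the question with the same model used for the documents, then search."""
    result = voyage_client.embed([query], model=model, input_type="query")
    query_vector = result.embeddings[0]
    return list(collection.aggregate(build_search_pipeline(query_vector, limit=limit)))
```

With real clients, you would pass `voyageai.Client()` from the Voyage AI Python SDK, which reads `VOYAGE_API_KEY` from the environment, and `MongoClient(uri)["healthcare_demo"]["clinical_knowledge"]` from PyMongo. Embed the question with the same model as the documents, because vectors from different models are not comparable.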
| Model | Relevance | Notes |
|---|---|---|
| voyage-4-large (recommended) | 95% | Highest-accuracy general-purpose model. Best retrieval quality for clinical terminology where precision matters. Recommended for any healthcare application where accuracy is paramount. |
| voyage-4-lite | 84% | Faster and more cost-effective. Handles straightforward clinical queries adequately, but struggles with nuanced cross-document retrieval involving specialized medical terminology. |
| voyage-code-3 | 72% | Optimized for code and technical docs, not clinical text. Included for comparison only. Medical vocabulary and clinical reasoning patterns are outside its training domain. |
For clinical documents, voyage-4-large consistently outperforms lighter models on queries that require understanding medical terminology and cross-referencing clinical concepts. The difference is most pronounced on queries like "What medications should I avoid in a patient with kidney problems?" where the model needs to connect "kidney problems" with "renal impairment," "eGFR," and "CKD stages." For a healthcare application, the accuracy premium of voyage-4-large is worth the modest additional cost.
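The table does not say how the Relevance figures were measured, so treat them as indicative. One way to run a comparable check on your own corpus is recall@k over a small labeled query set. For each query, list the documents a good answer should draw on, then measure how many of them appear in each model's top results. The labels below come from the sample-query descriptions earlier on this page. Everything else is a hypothetical Python sketch.

```python
def recall_at_k(ranked_sources, relevant, k=5):
    """Share of the relevant documents that appear among the top-k results."""
    if not relevant:
        raise ValueError("each query needs at least one relevant document")
    return len(set(ranked_sources[:k]) & set(relevant)) / len(relevant)


def mean_recall_at_k(results_by_query, labels, k=5):
    """Average recall@k over every labeled query."""
    scores = [recall_at_k(results_by_query[q], labels[q], k) for q in labels]
    return sum(scores) / len(scores)


# Expected documents taken from the sample-query descriptions above.
LABELS = {
    "What medications should I avoid in a patient with kidney problems?": {
        "metformin-reference.md", "ckd-staging.md", "ace-inhibitor-reference.md",
    },
    "How do I manage blood sugar in someone who cannot take metformin?": {
        "diabetes-management.md", "diabetes-renal.md", "sglt2-inhibitors.md",
    },
}
```

To compare models, build an index with each one (for example voyage-4-large and voyage-4-lite) and run the same labeled queries against each index. Collect the source file of each hit in rank order, then compare the mean scores.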
You just built a working knowledge base from 15 sample docs. Here is what changes when you scale to thousands of real documents.
HIPAA considerations
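vai chunks documents locally, but the embedding step sends the text of each chunk to the Voyage AI API, and the same text chunks are stored in MongoDB next to their vectors. If your documents contain PHI, both services handle it in readable form. Your Atlas cluster must be HIPAA-eligible if storing real clinical data (MongoDB Atlas offers HIPAA-eligible dedicated clusters with a BAA), and you should confirm that your agreement with Voyage AI covers PHI before embedding it.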
Document volume
A typical hospital formulary plus guideline set might be 5,000 to 50,000 pages. At this scale, initial embedding costs are modest with voyage-4-large, and queries cost fractions of a cent. Use vai estimate to project costs for your corpus size.
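The projection is simple arithmetic. Pages times tokens per page gives corpus tokens, and tokens divided by one million, times the per-million-token price, gives cost. This page quotes no prices, so every constant in this Python sketch is a labeled placeholder. Substitute your own measurements and current Voyage AI pricing, or use vai estimate.

```python
def corpus_tokens(pages: int, tokens_per_page: int) -> int:
    """Total tokens to embed once for a corpus of the given size."""
    return pages * tokens_per_page


def embedding_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of embedding `tokens` at a per-million-token price."""
    return tokens / 1_000_000 * usd_per_million_tokens


# Placeholders only: measure your own documents and check current pricing.
TOKENS_PER_PAGE = 500     # hypothetical average for dense clinical text
USD_PER_M_TOKENS = 0.20   # hypothetical price, not a quoted rate
TOKENS_PER_QUERY = 30     # hypothetical length of a clinical question

for pages in (5_000, 50_000):
    tokens = corpus_tokens(pages, TOKENS_PER_PAGE)
    cost = embedding_cost_usd(tokens, USD_PER_M_TOKENS)
    print(f"{pages:>6} pages ~ {tokens:,} tokens ~ ${cost:.2f} to embed once")
print(f"one query ~ ${embedding_cost_usd(TOKENS_PER_QUERY, USD_PER_M_TOKENS):.6f}")
```

Chunk overlap and re-embedding updated files add to the one-time figure, and Atlas storage and compute are billed separately.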
Keeping guidelines current
Clinical guidelines update regularly: drug formularies change, protocols evolve, new evidence emerges. Re-run vai pipeline on updated files and it will re-chunk, re-embed, and update only the changed documents. Automate this as part of your clinical documentation workflow.
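This page says vai re-embeds only changed documents, but it does not say how it detects changes. Content hashing is one common approach, sketched here in Python as an assumption rather than a description of vai. After each run, store a SHA-256 digest per file. On the next run, re-process only files whose digest differs, and delete chunks for files that have disappeared. The `*.md` pattern matches the sample set and is an assumption about your corpus.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file's bytes; any edit changes the digest."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def diff_corpus(folder: Path, previous: dict[str, str]):
    """Compare the folder against digests saved from the last run.

    Returns (changed, removed, current): files to re-chunk and re-embed,
    files whose chunks should be deleted, and the digests to save for next time.
    """
    current = {
        p.relative_to(folder).as_posix(): file_digest(p)
        for p in sorted(folder.rglob("*.md"))
    }
    changed = [name for name, digest in current.items() if previous.get(name) != digest]
    removed = [name for name in previous if name not in current]
    return changed, removed, current
```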
Metadata filtering
Clinical search often needs filters by department, document type, or date. vai supports metadata filters on search, so you can narrow results to "only cardiology guidelines updated in the last year" before semantic ranking applies.
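vai's own filter syntax is not shown on this page, so this Python sketch works one layer down, at the Atlas `$vectorSearch` stage, which accepts a `filter` document. The metadata fields `department` and `updated_at` are hypothetical. In Atlas, each filtered field must also be declared as a `filter` field in the vector index definition, or the filter will not be accepted.

```python
from datetime import datetime, timedelta, timezone


def filtered_search_pipeline(query_vector, department, updated_within_days=365,
                             limit=5, now=None):
    """$vectorSearch that only ranks chunks from one department updated recently.

    `department` and `updated_at` are hypothetical metadata fields.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=updated_within_days)
    return [{
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 100,
            "limit": limit,
            # Pre-filter: only matching chunks are eligible for similarity ranking.
            "filter": {"$and": [
                {"department": {"$eq": department}},
                {"updated_at": {"$gte": cutoff}},
            ]},
        }
    }]
```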
Conversational interface
The natural next step is vai chat, where a clinician can ask "What is the recommended anticoagulation for a patient with atrial fibrillation and moderate renal impairment?" and get answers grounded in your organization's own vetted guidelines.
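This page does not describe how vai chat assembles its answers. The general pattern behind grounded answers is retrieval-augmented generation: retrieve the top chunks for the question, then hand them to a language model with instructions to answer only from them. This hypothetical Python sketch covers only the prompt-assembly step. The `source` and `text` fields, the 4,000-character budget, and the instruction wording are all assumptions, and the model call itself is left out.

```python
def build_grounded_prompt(question, hits, max_chars=4000):
    """Assemble a prompt from retrieved chunks (hypothetical 'source'/'text' keys).

    `hits` is assumed to be ordered best-first; excerpts are added until the
    character budget would be exceeded.
    """
    parts, used = [], 0
    for i, hit in enumerate(hits, start=1):
        block = f"[{i}] ({hit['source']})\n{hit['text']}"
        if used + len(block) > max_chars:
            break
        parts.append(block)
        used += len(block)
    context = "\n\n".join(parts)
    return (
        "Answer using only the guideline excerpts below. Cite excerpt numbers. "
        "If the excerpts do not contain the answer, say so.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )
```

Numbering the excerpts lets the answer cite specific guideline passages, which a clinician can then check against the source document.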
Install vai and go from documents to searchable knowledge in minutes.
$ npm install -g voyageai-cli