voyage-code-3

Make Your Engineering Docs Actually Searchable

Internal docs, API references, and runbooks: semantic search in minutes

The Problem

Every engineering team has a documentation problem. Internal docs live across Confluence, Notion, GitHub wikis, README files, and Slack threads. The official search is terrible. Developers ask the same questions in Slack because it's faster than searching the wiki. When someone asks "How do I set up the local development environment?" the answer exists somewhere, but nobody can find it.

The irony: developers build search systems for users but can't search their own documentation. Traditional keyword search fails because engineering docs use inconsistent terminology. "Deployment," "shipping," "releasing," and "going live" might all describe the same process in different documents.
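To make the terminology problem concrete, here is a minimal sketch of naive keyword matching. The query, the document text, and the stopword list are made up for illustration; real keyword engines add stemming and scoring, but they share the same blind spot for synonyms.

```python
import re

# Small illustrative stopword list; a real search engine would use a fuller one.
STOPWORDS = {"the", "a", "to", "is", "for", "how", "do", "we", "i", "our", "and", "of", "in"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text, split it into alphanumeric words, and drop stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def keyword_overlap(query: str, document: str) -> set[str]:
    """Return the words that appear in both the query and the document."""
    return tokenize(query) & tokenize(document)


# Hypothetical query and document that describe the same process in different words.
query = "How do we go live with a new release?"
document = "Deployment guide: the CI/CD pipeline promotes builds from staging to production."

shared = keyword_overlap(query, document)
print(shared)  # no shared keywords, so a keyword engine has nothing to match on
```

Both strings are about shipping code, yet they share no content words, which is the gap an embedding model is meant to close.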

The Solution

vai turns your scattered documentation into a searchable knowledge base in minutes. Point it at a folder of markdown files, and it handles chunking, embedding with Voyage AI's code-optimized model, and indexing in MongoDB Atlas Vector Search. The result: a semantic search for "How do I get the dev environment running?" finds the local-dev-setup doc, even if that doc never uses the word "running."

Documents (your files) → Chunk (split text) → Embed (Voyage AI) → Index (MongoDB Atlas) → Semantic query
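The chunking stage can be pictured with a short sketch. This is not vai's actual chunker, whose strategy is not described here; it assumes a simple greedy approach that packs blank-line-separated paragraphs into chunks up to a hypothetical `max_chars` limit.

```python
def chunk_text(text: str, max_chars: int = 400) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue  # skip empty paragraphs produced by extra blank lines
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate  # still fits, or it is the first paragraph of a chunk
        else:
            chunks.append(current)  # close the full chunk and start a new one
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


# Hypothetical markdown snippet in the style of local-dev-setup.md.
sample = "# Local dev setup\n\nInstall Docker Desktop.\n\nRun docker compose up from the project root."
chunks = chunk_text(sample, max_chars=45)
for i, chunk in enumerate(chunks, start=1):
    print(f"chunk {i}: {chunk!r}")
```

Each resulting chunk is what gets embedded and stored as one record, so chunk size trades off retrieval precision against the context each hit carries.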

Sample Document Set

16 synthetic but realistic documents, ~40KB total. Small enough to process in minutes, rich enough to produce meaningful search results.

File	Topic	Size
architecture-overview.md	System architecture: microservices, event bus, data stores	~3KB
api-authentication.md	API auth: OAuth 2.0 flow, JWT tokens, API keys, rate limiting	~3KB
api-endpoints-users.md	User API endpoints: CRUD, search, permissions, pagination	~3KB
api-endpoints-orders.md	Order API endpoints: create, status, webhooks, idempotency	~3KB
local-dev-setup.md	Local development environment: Docker Compose, seed data, env vars	~3KB
deployment-guide.md	Deployment process: CI/CD pipeline, staging, production, rollback	~3KB
database-schema.md	Database schema: tables, indexes, migrations, naming conventions	~3KB
monitoring-runbook.md	Monitoring and alerting: Datadog dashboards, PagerDuty escalation	~3KB
incident-response.md	Incident response: severity levels, communication, postmortem	~2KB
onboarding-checklist.md	New engineer onboarding: accounts, tooling, first PR, buddy system	~2KB
testing-strategy.md	Testing philosophy: unit, integration, e2e, coverage targets	~2KB
feature-flags.md	Feature flag system: LaunchDarkly setup, naming, lifecycle, cleanup	~2KB
error-handling.md	Error handling patterns: error codes, retry logic, circuit breakers	~2KB
caching-strategy.md	Caching architecture: Redis layers, TTLs, invalidation patterns	~2KB
adr-001-event-sourcing.md	ADR: Adopted event sourcing for order service	~2KB
adr-002-graphql.md	ADR: Chose GraphQL over REST for new client API	~2KB

Walkthrough

From zero to a searchable knowledge base. Follow these steps; each takes 1 to 3 minutes.

Step 1

Install vai

Install the vai CLI globally. If you already have it, skip to the next step.

$ npm install -g voyageai-cli

added 1 package in 3s

1 package is looking for funding
  run `npm fund` for details

Step 2

Configure credentials

Set your Voyage AI API key and MongoDB Atlas connection string. You can get a free Voyage AI key at dash.voyageai.com and a free MongoDB Atlas cluster at cloud.mongodb.com.

$ vai config set api-key YOUR_VOYAGE_API_KEY

$ vai config set mongodb-uri YOUR_MONGODB_URI

✓ api-key saved
✓ mongodb-uri saved

Your credentials are stored locally in ~/.vai/config.json and never shared.

Step 3

Download the sample documents

Grab the 16-file sample documentation set. These are synthetic but realistic engineering docs covering architecture, APIs, runbooks, and ADRs.

$ curl -L https://vaicli.com/use-cases/devdocs/sample-docs/sample-docs.zip -o sample-docs.zip

$ unzip sample-docs.zip -d ./sample-docs

Archive:  sample-docs.zip
  inflating: ./sample-docs/architecture-overview.md
  inflating: ./sample-docs/api-authentication.md
  ...
  inflating: ./sample-docs/adr-002-graphql.md
  16 files extracted

Step 4

Ingest and embed the documents

Run the vai pipeline to chunk, embed, and index all 16 documents. This uses voyage-code-3, a model optimized for technical content, and creates a vector search index in MongoDB Atlas automatically.

$ vai pipeline ./sample-docs/ --model voyage-code-3 --db devdocs_demo --collection engineering_knowledge --create-index

◼ Scanning ./sample-docs/ ...
  Found 16 files (40KB total)

◼ Chunking documents ...
  Created 127 chunks (avg 312 chars)

◼ Embedding with voyage-code-3 ...
  ████████████████████████████████ 127/127 chunks
  Embedded in 2.3s (55 chunks/sec)

◼ Storing in MongoDB Atlas ...
  Database: devdocs_demo
  Collection: engineering_knowledge
  Inserted 127 documents

◼ Creating vector search index ...
  Index "vector_index" created on field "embedding"
  Dimensions: 1024 | Similarity: cosine

✓ Pipeline complete — 16 files → 127 indexed chunks
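For reference, the index reported above corresponds roughly to an Atlas Vector Search index definition like the one sketched below. The field path, dimension count, and similarity function are taken from the pipeline output; the exact definition that `--create-index` submits is an assumption, and the helper name is hypothetical.

```python
import json


def vector_index_definition(path: str = "embedding",
                            dimensions: int = 1024,
                            similarity: str = "cosine") -> dict:
    """Build an Atlas Vector Search index definition for a single vector field."""
    return {
        "fields": [
            {
                "type": "vector",             # marks the field as an embedding to index
                "path": path,                 # document field that stores the embedding
                "numDimensions": dimensions,  # must match the embedding model's output size
                "similarity": similarity,     # cosine, euclidean, or dotProduct
            }
        ]
    }


definition = vector_index_definition()
print(json.dumps(definition, indent=2))
```

If you switch to a model or setting with a different output dimension, the index has to be recreated with a matching `numDimensions`.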

Step 5

Run your first search

Test the knowledge base with a simple query. Notice how semantic search finds the right document even when the query uses different words than the source.

$ vai search "How do I get the development environment running on my laptop?" --db devdocs_demo --collection engineering_knowledge

Query: "How do I get the development environment running on my laptop?"
Model: voyage-code-3 | Results: 5

1. local-dev-setup.md (score: 0.94)
   "Prerequisites: Docker Desktop 4.x+, Node.js 20 LTS, and access
    to the team 1Password vault for environment variables. Clone the
    monorepo and run docker compose up from the project root..."

2. onboarding-checklist.md (score: 0.87)
   "Day 1 Setup: Request access to GitHub org, Datadog, PagerDuty,
    and LaunchDarkly. Follow the local-dev-setup guide to get the
    development environment running on your machine..."

3. deployment-guide.md (score: 0.72)
   "Local Testing: Before pushing to staging, verify your changes
    work locally by running the full test suite against the Docker
    Compose environment..."
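The scores above come from comparing the query embedding with each chunk embedding. A minimal sketch of that ranking, using made-up 3-dimensional vectors in place of the real 1024-dimensional embeddings (the numbers are illustrative only, not model output):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot product over the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings, invented for this example.
query_vec = [0.9, 0.1, 0.2]
chunk_vecs = {
    "local-dev-setup.md": [0.8, 0.2, 0.1],
    "deployment-guide.md": [0.3, 0.9, 0.1],
    "caching-strategy.md": [0.1, 0.2, 0.9],
}

# Rank documents from most to least similar to the query.
ranked = sorted(
    chunk_vecs,
    key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
    reverse=True,
)
for name in ranked:
    print(f"{name}: {cosine_similarity(query_vec, chunk_vecs[name]):.2f}")
```

In production the vector index performs an approximate version of this comparison, so it does not need to scan every stored chunk.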

Step 6

Try domain-specific queries

Run a few more queries to see how semantic search handles cross-document retrieval, technical jargon, and questions that span multiple topics.

$ vai search "What happens when an API request fails?" --db devdocs_demo --collection engineering_knowledge

Query: "What happens when an API request fails?"
Model: voyage-code-3 | Results: 5

1. error-handling.md (score: 0.93)
   "All API errors return a structured JSON response with an error
    code, human-readable message, and request ID for tracing. Client
    errors (4xx) include validation details..."

2. api-authentication.md (score: 0.82)
   "Authentication failures return 401 with a WWW-Authenticate header.
    Expired JWT tokens should be refreshed using the /auth/refresh
    endpoint. Rate limit exceeded returns 429..."

3. monitoring-runbook.md (score: 0.76)
   "Alert: API Error Rate > 5%. Escalation: Check the error-rate
    Datadog dashboard. Common causes: upstream service degradation,
    database connection pool exhaustion..."
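If you want to query the same collection from your own code, a search like the ones above maps onto Atlas Vector Search's `$vectorSearch` aggregation stage. The sketch below only builds the pipeline; the index name and embedding path come from the Step 4 output, while the `source` and `text` field names and the helper itself are hypothetical, since the document schema vai writes is not shown here.

```python
def build_vector_search_pipeline(query_vector: list[float],
                                 limit: int = 5,
                                 num_candidates: int = 100) -> list[dict]:
    """Build an aggregation pipeline that returns the top `limit` chunks by similarity."""
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",          # index name from the pipeline output
                "path": "embedding",              # field that holds the stored vectors
                "queryVector": query_vector,      # embedding of the user's question
                "numCandidates": num_candidates,  # candidates considered before the final top-k
                "limit": limit,                   # number of results to return
            }
        },
        {
            "$project": {
                "_id": 0,
                "source": 1,  # hypothetical field: originating file name
                "text": 1,    # hypothetical field: chunk text
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Placeholder vector; a real query vector comes from embedding the question with the same model.
pipeline = build_vector_search_pipeline([0.0] * 1024)
print(pipeline[0]["$vectorSearch"]["limit"])
```

The resulting list can be passed to a driver's `aggregate` call on the `engineering_knowledge` collection once the placeholder is replaced with a real query embedding.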

Step 7

Explore in the playground

Launch the vai playground for a visual interface. Browse your indexed documents, run queries interactively, and see similarity scores visualized.

$ vai playground

◼ Starting vai playground ...
  Server running at http://localhost:1958

  Open your browser to explore:
  • Search your knowledge base
  • Compare embedding models
  • Visualize similarity scores

The playground connects to the same MongoDB collection, so your devdocs knowledge base is ready to query visually.

Example Queries

See how semantic search handles real questions.

“How do I get the development environment running on my laptop?”

The most common new-hire question. Tests retrieval from the local-dev-setup doc using natural language that differs from the document title.

local-dev-setup.md

94% match

“Prerequisites: Docker Desktop 4.x+, Node.js 20 LTS, and access to the team 1Password vault for environment variables. Clone the monorepo and run docker compose up from the project root.”

onboarding-checklist.md

87% match

“Day 1 Setup: Follow the local-dev-setup guide to get the development environment running. Your onboarding buddy can help if you hit issues with the VPN or Docker networking.”

“What happens when an API request fails and how do we handle errors?”

Tests cross-document retrieval: spans error-handling patterns, API endpoint docs, and the monitoring runbook.

“What's the process for deploying to production?”

Tests retrieval from the deployment guide, CI/CD pipeline docs, and feature flag lifecycle.

“Why did we choose event sourcing for the order service?”

Tests architectural decision record (ADR) retrieval. ADRs are a specific document type that captures rationale.

“How do I add a new API endpoint with authentication?”

A practical "how do I" question that spans API auth docs, endpoint patterns, and the testing strategy.

Try the Knowledge Base Live

This is a real chatbot powered by the 16 sample docs you just explored. Ask it anything about authentication, deployment, architecture, or any of the developer documentation.

Why This Model?

Model	Relevance	Notes
voyage-code-3 (recommended)	94%	Purpose-built for code and technical docs. Best at understanding mixed prose, code snippets, and CLI commands in the same document.
voyage-4-large	89%	Excellent general-purpose model. Slightly lower relevance on code-heavy docs, but strong on pure prose sections like ADRs and onboarding guides.
voyage-4-lite	82%	Fastest and cheapest option. Good enough for simple queries, but misses nuance in technical jargon and code-comment relationships.

For developer documentation, voyage-code-3 consistently outperforms general-purpose models on queries that mix natural language with technical concepts. The difference is most noticeable on queries like "How do I add a new endpoint with auth?" where the model needs to connect prose instructions with code patterns. For pure text documentation (policies, onboarding guides), the gap narrows, and voyage-4-large is a strong alternative.
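How the relevance figures in the table were measured is not stated. If you want to run a comparable check on your own corpus, one common approach is hit rate at k: the share of test questions whose expected document appears in the top k results. The sketch below assumes that metric, and its queries and result lists are invented for illustration.

```python
def hit_rate_at_k(results: dict[str, list[str]],
                  expected: dict[str, str],
                  k: int = 3) -> float:
    """Fraction of queries whose expected document appears in the top k results."""
    hits = sum(
        1 for query, doc in expected.items()
        if doc in results.get(query, [])[:k]
    )
    return hits / len(expected)


# Hypothetical evaluation set: each question mapped to the document that should answer it.
expected = {
    "How do I run the dev environment?": "local-dev-setup.md",
    "Why event sourcing for orders?": "adr-001-event-sourcing.md",
}

# Made-up ranked results, as one model might return them.
results = {
    "How do I run the dev environment?": ["local-dev-setup.md", "onboarding-checklist.md"],
    "Why event sourcing for orders?": ["architecture-overview.md", "database-schema.md",
                                       "api-endpoints-orders.md"],
}

hit_rate = hit_rate_at_k(results, expected, k=3)
print(f"hit rate@3: {hit_rate:.0%}")
```

Running the same question set against each candidate model gives a like-for-like comparison on your own documents rather than on a vendor benchmark.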

Scaling to Production

You just built a working knowledge base from 16 sample docs. Here is what changes when you scale to thousands of real documents.

Source diversity

Real engineering docs come from multiple sources: Git repos, Confluence, Notion, Google Docs. vai pipeline accepts any folder of text files. Export your docs to markdown, point vai at the folder, and the pipeline handles the rest.

Keeping docs current

Engineering docs change constantly. Re-run vai pipeline on updated files and it will re-chunk, re-embed, and update only the changed documents. Set up a cron job or CI hook to keep your knowledge base in sync.
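How vai decides which documents changed is not described here. A minimal sketch of one way to do it, assuming you store a content hash per file from the previous run (the file names and contents below are made up):

```python
import hashlib


def content_hash(text: str) -> str:
    """SHA-256 hex digest of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_files(current: dict[str, str], previous_hashes: dict[str, str]) -> list[str]:
    """Return the names of files that are new or whose content hash differs."""
    return sorted(
        name for name, text in current.items()
        if previous_hashes.get(name) != content_hash(text)
    )


# Hashes recorded during the last ingest (hypothetical contents).
previous = {
    "deployment-guide.md": content_hash("v1 rollback steps"),
    "feature-flags.md": content_hash("flag naming"),
}

# Current state of the docs folder: one edited file, one unchanged, one new.
current = {
    "deployment-guide.md": "v2 rollback steps",
    "feature-flags.md": "flag naming",
    "adr-003.md": "new decision record",
}

to_reembed = changed_files(current, previous)
print(to_reembed)
```

Only the files in `to_reembed` would need re-chunking and re-embedding, which keeps scheduled syncs cheap on large corpora.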

Cost at scale

A typical engineering org has 500 to 5,000 pages of documentation. At this scale, the initial embedding costs pennies with voyage-code-3, and queries cost fractions of a cent each. Use vai estimate to project costs for your corpus.
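As a rough cross-check on the "pennies" claim, the arithmetic can be sketched as follows. Every number here is an assumption: roughly 4 characters per token, about 3,000 characters per page (in line with the sample docs), and a placeholder price per million tokens that you should replace with the current published rate. For real projections, `vai estimate` is the tool to use.

```python
def estimate_embedding_cost(total_chars: int,
                            price_per_million_tokens: float,
                            chars_per_token: float = 4.0) -> float:
    """Estimate the one-time embedding cost in dollars from a character count."""
    tokens = total_chars / chars_per_token  # crude heuristic, not a real tokenizer
    return tokens / 1_000_000 * price_per_million_tokens


PAGES = 5_000           # upper end of the range quoted above
CHARS_PER_PAGE = 3_000  # assumption based on the ~3KB sample docs
ASSUMED_PRICE = 0.18    # placeholder USD per 1M tokens; check current pricing

cost = estimate_embedding_cost(PAGES * CHARS_PER_PAGE, ASSUMED_PRICE)
print(f"Estimated initial embedding cost: ${cost:.2f}")
```

Under these assumptions, the full 5,000-page corpus embeds for well under a dollar; the estimate scales linearly with corpus size and price.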

MCP server integration

vai mcp-server exposes your knowledge base to AI coding assistants (Cursor, Claude Code, Windsurf). Your team's docs become available to every developer's AI agent, so "How do I deploy?" gets answered from your actual runbook.

Conversational interface

The natural next step is vai chat, which adds a conversational layer on top of your knowledge base. New hires can ask questions in natural language and get answers grounded in your team's actual documentation.

Ready to build your knowledge base?

Install vai and go from documents to searchable knowledge in minutes.

$ npm install -g voyageai-cli
