Semantic Search

Find code by meaning, not just keywords

ultrasync provides semantic code search that matches on meaning rather than on the exact words you type. Search for a concept such as "authentication logic" or "error handling" and it returns relevant code from across your codebase.

How It Works

Just-In-Time Indexing

ultrasync uses a "Just-In-Time" (JIT) indexing strategy. Instead of requiring a full upfront index of your entire codebase, files are indexed as they're accessed:

  1. When you search or read a file, ultrasync checks if it's indexed
  2. If not, embeddings are generated immediately
  3. The index entry is cached for future searches

This avoids an upfront indexing pass, so there is no startup delay, and new files become searchable the first time they are accessed.
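The check, embed, and cache steps can be sketched as a small lazy cache. This is only an illustration of the pattern, not ultrasync's implementation: the `JitIndex` class, its `ensure_indexed` method, and the `toy_embed` stand-in are all hypothetical names.

```python
from typing import Callable, Dict, List

Vector = List[float]


class JitIndex:
    """Hypothetical lazy index: embeds a file the first time it is requested."""

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self._embed = embed                   # embedding function (assumed)
        self._cache: Dict[str, Vector] = {}   # path -> cached embedding
        self.embed_calls = 0                  # counts real embedding work

    def ensure_indexed(self, path: str, text: str) -> Vector:
        # Step 1: check whether the file is already indexed.
        if path not in self._cache:
            # Step 2: not indexed yet, so generate the embedding now.
            self._cache[path] = self._embed(text)
            self.embed_calls += 1
        # Step 3: the cached entry serves this and all later requests.
        return self._cache[path]


def toy_embed(text: str) -> Vector:
    # Stand-in for a real model: a 2-number "embedding" of the text.
    return [float(len(text)), float(text.count("\n"))]


index = JitIndex(toy_embed)
source = "def login():\n    pass\n"
first = index.ensure_indexed("src/auth.py", source)
second = index.ensure_indexed("src/auth.py", source)  # served from the cache
```

The second call does no embedding work. A real index would also have to invalidate an entry when its file changes, which this sketch leaves out.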

Embeddings

Code and queries are converted to 384-dimensional vectors using the all-MiniLM-L6-v2 model. These vectors capture semantic meaning, allowing the system to find code that's conceptually similar even if it uses different terminology.
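To make "conceptually similar" concrete, here is a minimal sketch of ranking files by vector similarity. It assumes cosine similarity, which is a common choice for sentence-embedding models; the source does not say which metric ultrasync uses. The 3-dimensional vectors are toy values standing in for real 384-dimensional embeddings, and `cosine_similarity` and `rank` are hypothetical helper names.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank(query_vec: Vector, index: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Score every indexed item against the query, best match first."""
    scored = [(path, cosine_similarity(query_vec, vec)) for path, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy vectors: imagine the first axis loosely means "authentication".
toy_index = {
    "auth/login.py": [0.9, 0.1, 0.0],
    "db/pool.py": [0.0, 0.2, 0.9],
}
results = rank([1.0, 0.0, 0.0], toy_index)
```

The query vector points almost the same way as `auth/login.py`, so that file ranks first even though no keywords were compared.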

Symbol Extraction

For supported languages, ultrasync extracts symbols (functions, classes, types) and indexes them separately. This enables precise symbol-level search results with source code included.
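As an illustration of what symbol extraction involves, the sketch below pulls functions and classes out of Python source using the standard library's `ast` module. How ultrasync actually parses code is not described here, so treat the `extract_symbols` helper and its output shape as assumptions.

```python
import ast
from typing import List, Tuple

Symbol = Tuple[str, str, int]  # (kind, name, line number)


def extract_symbols(source: str) -> List[Symbol]:
    """Return every function and class defined in the source, in line order."""
    symbols: List[Symbol] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append(("function", node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            symbols.append(("class", node.name, node.lineno))
    # ast.walk does not guarantee source order, so sort by line number.
    return sorted(symbols, key=lambda symbol: symbol[2])


sample = '''class UserAuthContext:
    def handle_submit(self):
        pass

def connect():
    pass
'''
symbols = extract_symbols(sample)
```

Each entry could then be embedded and indexed on its own, which is what makes symbol-level results possible.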

Search Modes

Semantic Search (Default)

Vector similarity search that understands concepts:

search("authentication logic")
search("database connection handling")
search("user input validation")

Best for conceptual queries where you're looking for code that does something, not code that contains specific text.

Lexical Search

BM25 keyword matching for exact symbol names:

search("handleSubmit", search_mode="lexical")
search("UserAuthContext", search_mode="lexical")

Best for finding specific function or class names.
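For readers unfamiliar with BM25, the following is a minimal sketch of the standard scoring formula over pre-tokenized documents. The parameter values `k1=1.5` and `b=0.75` are common defaults, not values confirmed for ultrasync, and the `bm25_scores` function and file names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_scores(
    query: List[str],
    docs: Dict[str, List[str]],
    k1: float = 1.5,
    b: float = 0.75,
) -> Dict[str, float]:
    """Score each document against the query terms with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq: Counter = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    scores: Dict[str, float] = {}
    for name, tokens in docs.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a higher weight (non-negative IDF variant).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Term frequency saturates and is normalized by document length.
            length_norm = k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1.0) / (tf + length_norm)
        scores[name] = score
    return scores


docs = {
    "form.tsx": ["handleSubmit", "form", "validate"],
    "auth.tsx": ["UserAuthContext", "login", "token"],
    "util.ts": ["format", "date"],
}
scores = bm25_scores(["handleSubmit"], docs)
best = max(scores, key=scores.get)
```

Matching here is exact and case-sensitive on whole tokens, which is why lexical mode suits known identifiers better than descriptive phrases.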

Hybrid Search

Combines semantic and lexical results using Reciprocal Rank Fusion:

search("auth middleware", search_mode="hybrid")

More thorough but slightly slower. Good when you're not sure which approach will work best.
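Reciprocal Rank Fusion merges ranked lists by summing 1 / (k + rank) for each item across the lists. The sketch below uses k = 60, a widely used default; the constant ultrasync uses is not stated here, and the `reciprocal_rank_fusion` helper and file names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several best-first rankings into a single best-first ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            # Items near the top of any list contribute the most.
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=lambda item: scores[item], reverse=True)


semantic = ["middleware/auth.ts", "routes/login.ts", "utils/jwt.ts"]
lexical = ["utils/jwt.ts", "middleware/auth.ts", "config/cors.ts"]
fused = reciprocal_rank_fusion([semantic, lexical])
```

Files that appear in both lists rise to the top, while files found by only one mode are kept but ranked lower. Only ranks are used, so the two modes' raw scores never need to be on the same scale.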

In Claude Code

The search() function is automatically available as an MCP tool:

search("login flow")

Returns ranked results with:

  • File paths
  • Symbol names and types
  • Relevance scores
  • Source code snippets

From the CLI

ultrasync query --query-text "authentication logic"

Options:

  • --top-k N - Number of results (default: 5)
  • --format json|tsv - Output format
  • --threshold N - Minimum score threshold
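If you want to call the CLI from a script, one option is to assemble the argument list programmatically. The sketch below uses only the subcommand and flags listed above; the `build_query_command` helper is hypothetical, and the command is built but not executed.

```python
import shlex
from typing import List, Optional


def build_query_command(
    text: str,
    top_k: Optional[int] = None,
    fmt: Optional[str] = None,
    threshold: Optional[float] = None,
) -> List[str]:
    """Build an argv list for `ultrasync query` from the documented options."""
    cmd = ["ultrasync", "query", "--query-text", text]
    if top_k is not None:
        cmd += ["--top-k", str(top_k)]
    if fmt is not None:
        if fmt not in ("json", "tsv"):
            raise ValueError("format must be 'json' or 'tsv'")
        cmd += ["--format", fmt]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    return cmd


cmd = build_query_command("authentication logic", top_k=10, fmt="json")
print(shlex.join(cmd))  # shell-safe rendering of the same command
```

Passing the list to `subprocess.run(cmd, capture_output=True, text=True)` avoids shell quoting problems when the query text contains spaces.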

Fallback Behavior

If semantic search returns no results, ultrasync falls back to:

  1. Git grep for pattern matching
  2. Ripgrep with common file extensions

Files discovered through fallback are automatically indexed for future searches.
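The fallback behavior can be pictured as a chain of strategies tried in order, with any fallback hits fed back into the index. This is a sketch of the control flow only: `search_with_fallback` and the stub functions are hypothetical, and the stubs return canned results instead of invoking git grep or ripgrep.

```python
from typing import Callable, List

SearchFn = Callable[[str], List[str]]


def search_with_fallback(
    query: str,
    primary: SearchFn,
    fallbacks: List[SearchFn],
    index_file: Callable[[str], None],
) -> List[str]:
    """Try the primary search, then each fallback until one returns hits."""
    hits = primary(query)
    if hits:
        return hits
    for fallback in fallbacks:
        hits = fallback(query)
        if hits:
            # Files found by a fallback get indexed for future searches.
            for path in hits:
                index_file(path)
            return hits
    return []


# Stubs standing in for the real strategies (assumed behavior).
def semantic_stub(query: str) -> List[str]:
    return []  # pretend nothing is indexed yet


def git_grep_stub(query: str) -> List[str]:
    return ["src/auth/login.py"] if "login" in query else []


def ripgrep_stub(query: str) -> List[str]:
    return ["docs/notes.md"]


indexed: List[str] = []
hits = search_with_fallback(
    "login", semantic_stub, [git_grep_stub, ripgrep_stub], indexed.append
)
```

Because the first fallback that finds anything wins, later strategies only run when earlier ones come back empty.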

Tips

Be descriptive, not literal:

  • "user authentication flow" instead of "login"
  • "error handling for API calls" instead of "try catch"

Use domain terminology:

  • "React component for user settings"
  • "database migration for adding email field"

Combine with filters:

  • Search within specific directories
  • Filter by file type or symbol kind
