Discovery at Scale

As the bullet ecosystem grows to an arbitrarily large scale, discovery must remain efficient, accurate, and quality-aware: the system has to surface the right capability from a vast pool of options in milliseconds.

Discovery Metadata

Every bullet includes a discovery section designed for large-scale search:

discovery:
  embedding_hints:      # For semantic search
    primary: "..."
    context: "..."
    keywords: [...]
  quality:             # For ranking
    execution_count: 15234
    success_rate: 0.98
    adoption_score: 0.85
    plan_references: 142

The Discovery Pipeline

Stage 1: Semantic Search

Vector embeddings find semantically relevant bullets. The embedding_hints structure provides rich text for embedding generation—primary description, additional context, and specific keywords.

The registry performs approximate nearest neighbor search across the entire collection. This finds bullets whose meaning aligns with your query, even without exact keyword matches. Searching "organize inbox automatically" finds bullets about "email triage," "message classification," and "priority sorting" because they're semantically similar.
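
As a rough sketch of this stage, the text fed to the embedding model can be assembled straight from the embedding_hints fields and then used for an approximate nearest neighbor query. The embed_fn callable and the registry.ann_search method below are hypothetical stand-ins, not the actual API:

def embedding_text(discovery: dict) -> str:
    # Concatenate the embedding_hints fields into one string for the embedding model.
    hints = discovery["embedding_hints"]
    parts = [hints.get("primary", ""), hints.get("context", "")]
    parts += hints.get("keywords", [])
    return "\n".join(p for p in parts if p)

def semantic_candidates(registry, embed_fn, query: str, limit: int = 1000) -> list[dict]:
    # ANN search returns bullets whose hint text is semantically close to the query,
    # even without exact keyword overlap ("email triage", "message classification", ...).
    query_vector = embed_fn(query)
    return registry.ann_search(vector=query_vector, index="global", top_k=limit)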

Stage 2: Quality Ranking

Semantic search might return thousands of candidates. The system ranks them using pre-computed quality scores:

quality:
  status: production              # production > candidate > draft
  test_pass_rate: 1.0             # 100% tests passing
  execution_count: 15234          # Proven in production
  success_rate: 0.98              # 98% success rate
  adoption_score: 0.85            # Weighted composite
  plan_references: 142            # Many plans reference this

This is the key innovation: quality metrics are pre-computed and highly queryable. You can rank 1000 bullets by adoption_score in microseconds because it's a simple numeric field, not a complex evaluation suite to run.
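
A minimal illustration of that point: with the quality block stored on each candidate, ranking is an ordinary sort on a numeric field. The dict shape below mirrors the metadata above but is otherwise assumed:

# Illustrative only: each candidate is a dict carrying its pre-computed quality block.
# Ranking is a plain sort on a stored numeric field; no evaluation suite runs here.
def rank_by_adoption(candidates: list[dict]) -> list[dict]:
    return sorted(candidates, key=lambda b: b["quality"]["adoption_score"], reverse=True)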

The ranking weighs the following factors (a scoring sketch follows this list):

  1. Status (production > candidate > draft)
  2. Test quality (high pass rates and coverage)
  3. Real-world usage (execution counts and success rates)
  4. Adoption (how many plans reference this bullet)
  5. Freshness (recently validated)
  6. Evaluations (detailed test results and evidence)
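
One way to picture how these factors combine is a single score computed from the pre-computed quality fields. The sketch below is an illustration, not the registry's actual formula: the weights, the log-dampened usage term, and the last_validated_at field are all assumptions.

import math
import time

# Assumed status weighting; the real registry may weight these differently.
STATUS_WEIGHT = {"production": 1.0, "candidate": 0.5, "draft": 0.1}

def ranking_score(quality: dict) -> float:
    # Illustrative composite of the factors above; the weights and the assumed
    # last_validated_at field (epoch seconds) are not the registry's real formula.
    status = STATUS_WEIGHT.get(quality.get("status", "draft"), 0.0)
    tests = quality.get("test_pass_rate", 0.0)
    usage = min(math.log1p(quality.get("execution_count", 0)) / 10.0, 1.0)
    success = quality.get("success_rate", 0.0)
    adoption = quality.get("adoption_score", 0.0)
    age_days = (time.time() - quality.get("last_validated_at", time.time())) / 86400
    freshness = math.exp(-age_days / 90.0)   # freshness decays over roughly three months
    return (0.30 * status + 0.15 * tests + 0.15 * usage
            + 0.15 * success + 0.15 * adoption + 0.10 * freshness)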

Stage 3: Final Selection

The top-ranked bullets (typically 10-20) are presented for selection. At this point, users or agents can inspect detailed evaluation metadata for evidence—which tests passed, what the environment was, real telemetry, when last tested.

This deep inspection happens only on finalists, not the entire candidate pool.
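
A sketch of that two-phase pattern in client code, assuming a hypothetical registry.get_evaluation method: rank the whole pool on cheap numeric fields, then fetch the heavyweight evaluation records only for the finalists.

def select_finalists(registry, candidates: list[dict], top_k: int = 20) -> list[dict]:
    ranked = sorted(candidates,
                    key=lambda b: b["quality"]["adoption_score"], reverse=True)
    finalists = ranked[:top_k]
    for bullet in finalists:
        # Detailed evidence (which tests passed, environment, telemetry, last-tested
        # date) is loaded only for this small set, never for the whole candidate pool.
        bullet["evaluation"] = registry.get_evaluation(bullet["id"])
    return finalists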

Index Architecture

The system maintains separate indexes:

Global index contains only curated bullets—implementations proven reliable through evaluation and production use. Most discovery queries search this by default.

Candidate index holds bullets on probation. They pass minimum quality bars but lack extensive track records. Because they are still unproven, the resolver only tries them in controlled experiments.

Private indexes are scoped to users or organizations. Your private bullets never appear in global search but are discoverable within your scope.

Runtime artifacts and drafts never enter any index—they're not sharable capabilities.
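
To make the separation concrete, a discovery query might be routed roughly as follows; the index names and the include_candidates and scope parameters are illustrative assumptions, not the actual interface.

# Illustrative routing only; index names and parameters are assumptions.
def indexes_for_query(scope: str | None = None, include_candidates: bool = False) -> list[str]:
    indexes = ["global"]                    # curated, production-proven bullets
    if include_candidates:
        indexes.append("candidate")         # unproven bullets, tried in controlled experiments
    if scope:
        indexes.append(f"private:{scope}")  # user- or org-scoped bullets
    return indexes                          # runtime artifacts and drafts are never indexed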