Self-Improvement

The Bullet Protocol enables continuous improvement through a self-improvement loop that learns from operational feedback. While individual bullets are self-evaluating (they contain their own tests), the ecosystem is self-improving (bullets are improved by external processes that analyze execution outcomes).

This architectural separation is intentional: bullets don't modify themselves, but the system learns which bullets are effective and evolves the collection over time.

The Three-Role Loop

The self-improvement loop consists of three specialized components inspired by Stanford's ACE framework. While the final implementation will likely be a variation adapted to the Bullet Protocol's specific needs, the core pattern of generate-reflect-curate provides a proven foundation for continuous learning systems.
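At a high level, one pass of the loop can be sketched as below. The three roles are modeled as plain callables; their real interfaces are not specified here, so the signatures are assumptions for illustration:

```python
def improvement_cycle(generator, reflector, curator, registry, tasks):
    """One pass of the generate-reflect-curate loop.

    `generator`, `reflector`, and `curator` are hypothetical callables
    standing in for the three roles described in this document.
    """
    manifests = [generator(task, registry) for task in tasks]  # attempt tasks
    insights = reflector(manifests)                            # distill lessons
    curator(registry, insights)                                # targeted updates
    return registry
```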

Generator

The generator attempts tasks using current bullets. It composes plans, applies guidance, and executes workflows. During execution, it records which bullets were helpful and which were harmful based on whether operations succeeded or failed.

The generator creates execution manifests that capture:

  • Which specific versions of bullets were used
  • What inputs and outputs occurred
  • Whether steps succeeded or failed
  • Performance metrics (latency, cost)
  • Guidance that was active

This rich execution data provides the raw material for learning.
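A manifest carrying these fields might look like the following sketch; the dataclass shape and field names are illustrative, not part of the protocol:

```python
from dataclasses import dataclass, field

@dataclass
class StepRecord:
    bullet: str          # pinned version, e.g. "bullet://example/script@2.1.0"
    inputs: dict
    outputs: dict
    succeeded: bool
    latency_ms: float    # performance metrics recorded per step
    cost_usd: float

@dataclass
class ExecutionManifest:
    task_id: str
    active_guidance: list[str] = field(default_factory=list)
    steps: list[StepRecord] = field(default_factory=list)

    def failed_bullets(self) -> list[str]:
        """Bullets whose steps failed -- raw material for the reflector."""
        return [s.bullet for s in self.steps if not s.succeeded]
```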

Reflector

The reflector analyzes execution outcomes to distill insights. It examines why particular attempts succeeded or failed, identifies patterns across multiple executions, and extracts lessons about what works and what doesn't.

The reflector discovers:

  • Guidance issues: guidance that is counterproductive in specific contexts
  • Script failures: scripts that fail on edge cases
  • Poor interactions: combinations of bullets that work badly together
  • Missing capabilities: tasks that require new bullets

The reflector also synthesizes new tests from edge cases encountered in production. When a script fails on unexpected input, that input becomes a test case preventing regression. When postconditions fail to catch an invalid output, stronger postconditions get suggested. The test suite grows more comprehensive through operational experience.
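As a sketch, test synthesis might capture a failing production input as a structured regression case that the curator attaches to the next version; the field names and shape here are hypothetical:

```python
def synthesize_regression_test(bullet_uri: str, failing_input: dict) -> dict:
    """Capture a production failure as a regression test case.

    Hypothetical shape: the input that broke the current version must
    succeed before a new version can be published.
    """
    return {
        "kind": "regression",
        "bullet": bullet_uri,
        "input": failing_input,
        "expect": "success",   # the next version must handle this input
    }
```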

Curator

The curator applies incremental updates based on reflector insights. Rather than rewriting everything, it makes targeted changes:

  • Creates new bullets capturing successful patterns
  • Edits existing bullets to fix problems or add capabilities
  • Deprecates obsolete bullets superseded by better alternatives
  • Deduplicates similar bullets by consolidating them

The curator operates through versioning. When a bullet needs improvement, the curator doesn't modify the existing version (bullets are immutable). Instead, it creates a new version with the improvements, and the quality metrics guide future resolution toward better implementations.

This incremental approach preserves the specificity and rationale behind each piece of knowledge, avoiding the context-collapse problem that plagues systems that repeatedly compress and rewrite their prompts.
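The versioning-based update the curator performs can be sketched as follows; the flat registry layout, bump rules, and field names are assumptions for illustration:

```python
import copy

def curate_new_version(registry: dict, uri: str, patch: dict,
                       bump: str = "patch") -> str:
    """Create a new immutable version instead of mutating the existing one.

    `registry` maps "name@version" -> bullet dict; `bump` is one of
    "major", "minor", "patch".
    """
    name, version = uri.split("@")
    major, minor, patch_n = (int(x) for x in version.split("."))
    if bump == "major":
        new_version = f"{major + 1}.0.0"
    elif bump == "minor":
        new_version = f"{major}.{minor + 1}.0"
    else:
        new_version = f"{major}.{minor}.{patch_n + 1}"
    new_bullet = copy.deepcopy(registry[uri])   # old version stays untouched
    new_bullet.update(patch)
    new_uri = f"{name}@{new_version}"
    registry[new_uri] = new_bullet
    return new_uri
```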

Versioning and Immutability

Self-improvement relies on a foundation of semantic versioning and immutability. All bullets follow semantic versioning (MAJOR.MINOR.PATCH):

  • MAJOR: Breaking changes to contracts, behavior, or guarantees
  • MINOR: Backward-compatible additions (new features, improved performance)
  • PATCH: Backward-compatible bug fixes

Published bullet versions are immutable. Once bullet://example/script@2.1.0 is published, the code, contracts, and metadata cannot change. New fixes or features require new versions. This ensures reproducibility (same version always behaves the same), auditability (historical executions remain verifiable), and trust (no silent mutations).

Plans use version ranges to balance stability with evolution. A range like ^2.0.0 accepts any 2.x.x version but not 3.0.0, allowing plans to automatically benefit from improvements while preventing breaking changes.
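A minimal check of that caret-range rule, under simplified semver (this sketch ignores pre-release tags and the special-cased 0.x behavior of real range matchers):

```python
def satisfies_caret(version: str, range_base: str) -> bool:
    """Return True if `version` satisfies a caret range like ^2.0.0.

    Simplified rule: ^X.Y.Z accepts any version >= X.Y.Z that keeps
    the same major number, so improvements flow in but breaking
    changes do not.
    """
    v = tuple(int(x) for x in version.split("."))
    base = tuple(int(x) for x in range_base.split("."))
    return v[0] == base[0] and v >= base
```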

Bullets can be marked as deprecated without breaking existing references. Deprecated bullets still resolve and execute, but discovery deprioritizes them and manifests flag their usage, providing graceful migration paths to better alternatives.

Continuous Improvement

The cumulative effect of this learning loop is continuous improvement without manual intervention. Execution outcomes are traced back to specific bullets, enabling the system to learn which components are actually effective.

Better implementations gradually displace worse ones through selection pressure. The resolver tracks which scripts succeed and which fail. Over time, successful implementations accumulate positive reputation and get chosen more often. Failing implementations fall in rankings and eventually get deprecated.

Guidance that helps becomes more prominent. When guidance is active during successful executions, it accumulates helpful signals. The system prioritizes applying highly-rated guidance in similar contexts.
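One way this selection pressure could work is a smoothed success-rate ranking over observed outcomes; the scoring formula below is an illustrative assumption, not the protocol's actual reputation model:

```python
def rank_candidates(stats: dict) -> list:
    """Rank bullet versions (or guidance) by observed success rate.

    `stats` maps an identifier -> (successes, failures). Laplace
    smoothing keeps bullets with little data from dominating either
    end of the ranking.
    """
    def score(uri: str) -> float:
        s, f = stats[uri]
        return (s + 1) / (s + f + 2)
    return sorted(stats, key=score, reverse=True)
```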

Patterns that work get captured and reused. When an agent accomplishes a task through improvisation, the system can mint a plan bullet representing what worked. This captures proven workflows automatically rather than requiring manual specification.

The shared learning benefits all users. When one organization discovers a better way to handle Gmail rate limits, that knowledge propagates through the ecosystem. Others benefit automatically as the guidance accumulates helpful evidence and rises in rankings.

This evolution happens within the bounds of compatibility constraints. Semantic versioning ensures improvements don't break existing users. Deprecated bullets remain available for systems that depend on them while new systems use better alternatives.