Data

Data bullets represent immutable information with clear provenance. Once created, their content never changes.

Purpose

Data bullets capture facts, snapshots, and structured information. They're standalone - they don't accept inputs. Plans reference them to make information available to scripts and other processing steps.

Common uses include API responses, uploaded files, cached computations, curated datasets, and derived transformations.

Size Strategies

Data bullets support different patterns based on size:

Small Data (< 10KB) - Inline in content field. The data is included directly in the bullet. Best for configuration, small datasets, API responses, or computed results that fit comfortably in context.

Medium-Large Data (10KB to 10MB) - External URL with summary and fetch instructions. The bullet contains a compact summary for context efficiency, instructions on how to fetch the full data, and a reference to external storage. This enables lazy loading: the full data is fetched only when it is actually needed.

Very Large Data (> 10MB) - Multiple data bullets with chunking hints. For datasets too large to process as a single unit, split across coordinated bullets with metadata about the collection structure.
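
As a rough illustration, the size guidance above can be expressed as a small selection function. This is a sketch, not part of any bullet tooling: `choose_strategy`, `SizeStrategy`, and the constant names are hypothetical, and it assumes 1KB means 1024 bytes and that a payload of exactly 10KB or exactly 10MB goes external, since the text does not specify either.

```python
from enum import Enum

# Thresholds come from the size guidance above. Treating 1KB as 1024 bytes
# is an assumption; the text does not say which unit convention applies.
INLINE_LIMIT_BYTES = 10 * 1024
CHUNK_LIMIT_BYTES = 10 * 1024 * 1024


class SizeStrategy(Enum):
    INLINE = "inline"      # data goes directly in the content field
    EXTERNAL = "external"  # summary + fetch instructions + external reference
    CHUNKED = "chunked"    # split across several coordinated data bullets


def choose_strategy(size_bytes: int) -> SizeStrategy:
    """Pick a storage pattern for a payload of the given size (hypothetical helper)."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < INLINE_LIMIT_BYTES:
        return SizeStrategy.INLINE
    # Assumption: the boundaries themselves (exactly 10KB, exactly 10MB) are external.
    if size_bytes <= CHUNK_LIMIT_BYTES:
        return SizeStrategy.EXTERNAL
    return SizeStrategy.CHUNKED
```

For example, a 2,458,624-byte dataset falls between the two thresholds, so `choose_strategy(2458624)` selects the external pattern.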

Inline Data Pattern

For small datasets, include the content directly:

id: bullet://example/data-customer-feedback@1.0.0
 
# ... standard bullet metadata (see bullets.mdx) ...
# type: data
 
payload:
  # Where this data came from
  provenance:
    origin: external
    source_system: survey_tool
    captured_at: "2025-11-12T10:30:00Z"
  
  # Structure definition
  # For structured data (tabular, JSON), include field schemas
  # For unstructured data (images, audio, docs), just describe format and metadata
  schema:
    fields:
      - name: customer_id
        type: string
      - name: feedback_text
        type: string
      - name: rating
        type: integer
      - name: timestamp
        type: datetime
  
  # The actual data - small enough to inline
  content:
    - customer_id: "cust_001"
      feedback_text: "Great product, very intuitive interface!"
      rating: 5
      timestamp: "2025-11-10T14:22:00Z"
    - customer_id: "cust_002"
      feedback_text: "Good but could use better documentation."
      rating: 4
      timestamp: "2025-11-11T09:15:00Z"
    - customer_id: "cust_003"
      feedback_text: "Excellent customer support team."
      rating: 5
      timestamp: "2025-11-12T10:05:00Z"
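
Because the inline pattern carries both a schema and the rows themselves, a consumer could check one against the other before using the data. The following is a minimal sketch under stated assumptions: `validate_rows` and `TYPE_CHECKS` are hypothetical names, only the three type names used in the example (`string`, `integer`, `datetime`) are handled, and datetimes are assumed to be ISO 8601 strings as shown above.

```python
from datetime import datetime


def _is_datetime(value) -> bool:
    """Accept ISO 8601 strings such as "2025-11-10T14:22:00Z"."""
    if not isinstance(value, str):
        return False
    try:
        # fromisoformat rejects a trailing "Z" before Python 3.11, so normalize it.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


# Maps schema type names to checks. Only the types from the example are covered.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "datetime": _is_datetime,
}


def validate_rows(fields, rows):
    """Return a list of problems; an empty list means every row matches the schema."""
    problems = []
    for index, row in enumerate(rows):
        for field in fields:
            name, type_name = field["name"], field["type"]
            if name not in row:
                problems.append(f"row {index}: missing field '{name}'")
                continue
            check = TYPE_CHECKS.get(type_name)
            if check is None:
                problems.append(f"row {index}: unsupported type '{type_name}'")
            elif not check(row[name]):
                problems.append(f"row {index}: '{name}' is not a valid {type_name}")
    return problems
```

Here `fields` corresponds to `payload.schema.fields` and `rows` to `payload.content` after the YAML has been parsed into Python lists and dictionaries.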

External Data Pattern

For larger datasets, use external storage with a summary and fetch instructions:

id: bullet://example/data-q4-customer-feedback@1.0.0
 
# ... standard bullet metadata (see bullets.mdx) ...
# type: data
 
payload:
  # Where this data came from
  provenance:
    origin: external
    source_system: survey_tool
    captured_at: "2025-11-12T10:30:00Z"
  
  # Structure definition
  # For structured data (tabular, JSON), include field schemas
  # For unstructured data (images, audio, docs), just describe format and metadata
  schema:
    type: tabular
    format: parquet
    mime_type: application/vnd.apache.parquet
    fields:
      - name: customer_id
        type: string
      - name: feedback_text
        type: string
      - name: rating
        type: integer
      - name: timestamp
        type: datetime
  
  # Compact summary - always injected into context
  summary:
    description: "Customer feedback from Q4 2025 survey campaign"
    size_bytes: 2458624
    stats:
      row_count: 10247
      date_range: ["2025-10-01", "2025-12-31"]
      avg_rating: 4.2
    
    # Representative sample
    sample:
      - customer_id: "cust_001"
        feedback_text: "Great product, very intuitive interface!"
        rating: 5
  
  # Instructions for fetching full data
  fetch_instructions: |
    To access the full dataset, call fetch with this bullet ID.
    Returns all rows as a JSON array. Size: 2.4MB compressed.
  
  # External storage reference - managed by centralized bullet repository
  external:
    url: https://bullets.io/data/q4-feedback-abc123def456.parquet
    size_bytes: 2458624
    sha256: abc123def456789abcdef0123456789abcdef0123456789abcdef0123456789
    format: parquet
    region: us-east-1

The external pattern enables lazy loading: scripts receive the summary in context and can decide, based on its description, stats, and sample, whether they need to fetch the full data. This keeps token usage low when working with large datasets.
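
One way a script might act on this is sketched below: answer from the summary when its stats are sufficient, and otherwise download the data and check it against the `size_bytes` and `sha256` recorded in the `external` block. This is an illustration only. The function names are hypothetical, and `download` is a caller-supplied stand-in that takes a URL and returns bytes; it is not the `fetch` operation mentioned in the fetch instructions, whose real interface is not described here.

```python
import hashlib
from typing import Callable, Optional


def needs_full_data(summary: dict, required_stats: set) -> bool:
    """True when the summary's stats cannot answer the question on their own."""
    available = set(summary.get("stats", {}))
    return not required_stats.issubset(available)


def download_verified(external: dict, download: Callable[[str], bytes]) -> bytes:
    """Download the external data and verify it against the recorded size and hash."""
    data = download(external["url"])
    if len(data) != external["size_bytes"]:
        raise ValueError("size mismatch: data does not match external.size_bytes")
    if hashlib.sha256(data).hexdigest() != external["sha256"]:
        raise ValueError("checksum mismatch: data does not match external.sha256")
    return data


def load_if_needed(payload: dict, required_stats: set,
                   download: Callable[[str], bytes]) -> Optional[bytes]:
    """Return None if the summary suffices; otherwise return the verified full data."""
    if not needs_full_data(payload["summary"], required_stats):
        return None
    return download_verified(payload["external"], download)
```

With the example bullet above, a script that only needs `row_count` or `avg_rating` would get `None` back and never touch external storage, while one that needs a statistic missing from the summary would trigger the download.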

Script Plan