Data
Data bullets represent immutable information with clear provenance. Once created, their content never changes.
Purpose
Data bullets capture facts, snapshots, and structured information. They are standalone: they accept no inputs. Plans reference them to make information available to scripts and other processing steps.
Common uses include API responses, uploaded files, cached computations, curated datasets, and derived transformations.
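The immutability and provenance described above can be modeled in a script roughly as follows. This is a minimal sketch, not the bullet system's actual implementation: the `Provenance` and `DataBullet` classes and the `make_bullet` helper are hypothetical names introduced for illustration, and only the field names (`origin`, `source_system`, `captured_at`, `content`) come from the examples on this page.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Iterable, Mapping, Tuple


@dataclass(frozen=True)
class Provenance:
    """Where the data came from (field names follow the examples on this page)."""
    origin: str
    source_system: str
    captured_at: str


@dataclass(frozen=True)
class DataBullet:
    """Hypothetical in-memory model of a data bullet; not an official class."""
    id: str
    provenance: Provenance
    content: Tuple[Mapping[str, Any], ...]


def make_bullet(bullet_id: str, provenance: Provenance,
                rows: Iterable[Mapping[str, Any]]) -> DataBullet:
    # Copy each row and wrap it in a read-only view so that neither the
    # bullet's attributes nor its rows can be reassigned after creation.
    frozen_rows = tuple(MappingProxyType(dict(row)) for row in rows)
    return DataBullet(id=bullet_id, provenance=provenance, content=frozen_rows)
```

Freezing both the dataclass and the row mappings means an accidental write fails loudly instead of silently changing data that other plans may already reference.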
Size Strategies
Data bullets support different patterns based on size:
Small Data (< 10KB) - Inline in the content field. The data is included directly in the bullet. Best for configuration, small datasets, API responses, or computed results that fit comfortably in context.
Medium-Large Data (10KB to 10MB) - External URL with summary and fetch instructions. The bullet contains a compact summary for context efficiency, instructions for fetching the full data, and a reference to external storage. This enables lazy loading: the full data is fetched only when it is actually needed.
Very Large Data (> 10MB) - Multiple data bullets with chunking hints. Datasets too large to process as a single unit are split across coordinated bullets, with metadata about the collection structure.
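The size tiers above amount to a simple threshold check. The sketch below is illustrative only: `choose_strategy` and the two constants are hypothetical names, the thresholds are read as binary units (10 KiB and 10 MiB), and the handling of a payload that is exactly 10KB or exactly 10MB is an assumption, since the tiers do not say which side those values fall on.

```python
# Assumption: "10KB" and "10MB" are interpreted as binary units.
INLINE_LIMIT_BYTES = 10 * 1024
CHUNK_LIMIT_BYTES = 10 * 1024 * 1024


def choose_strategy(size_bytes: int) -> str:
    """Return 'inline', 'external', or 'chunked' for a payload size in bytes."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < INLINE_LIMIT_BYTES:
        return "inline"      # small: embed directly in the content field
    if size_bytes > CHUNK_LIMIT_BYTES:
        return "chunked"     # very large: split across coordinated bullets
    # Everything in between, including the exact boundary values (assumption),
    # uses an external URL with a summary and fetch instructions.
    return "external"
```

For instance, a 2,458,624-byte dataset falls between the two thresholds and would be stored externally.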
Inline Data Pattern
For small datasets, include the content directly:
id: bullet://example/data-customer-feedback@1.0.0
# ... standard bullet metadata (see bullets.mdx) ...
# type: data
payload:
  # Where this data came from
  provenance:
    origin: external
    source_system: survey_tool
    captured_at: "2025-11-12T10:30:00Z"
  # Structure definition
  # For structured data (tabular, JSON), include field schemas
  # For unstructured data (images, audio, docs), just describe format and metadata
  schema:
    fields:
      - name: customer_id
        type: string
      - name: feedback_text
        type: string
      - name: rating
        type: integer
      - name: timestamp
        type: datetime
  # The actual data - small enough to inline
  content:
    - customer_id: "cust_001"
      feedback_text: "Great product, very intuitive interface!"
      rating: 5
      timestamp: "2025-11-10T14:22:00Z"
    - customer_id: "cust_002"
      feedback_text: "Good but could use better documentation."
      rating: 4
      timestamp: "2025-11-11T09:15:00Z"
    - customer_id: "cust_003"
      feedback_text: "Excellent customer support team."
      rating: 5
      timestamp: "2025-11-12T10:05:00Z"

External Data Pattern
For larger datasets, use external storage with a summary and fetch instructions:
id: bullet://example/data-q4-customer-feedback@1.0.0
# ... standard bullet metadata (see bullets.mdx) ...
# type: data
payload:
  # Where this data came from
  provenance:
    origin: external
    source_system: survey_tool
    captured_at: "2025-11-12T10:30:00Z"
  # Structure definition
  # For structured data (tabular, JSON), include field schemas
  # For unstructured data (images, audio, docs), just describe format and metadata
  schema:
    type: tabular
    format: parquet
    mime_type: application/vnd.apache.parquet
    fields:
      - name: customer_id
        type: string
      - name: feedback_text
        type: string
      - name: rating
        type: integer
      - name: timestamp
        type: datetime
  # Compact summary - always injected to context
  summary:
    description: "Customer feedback from Q4 2025 survey campaign"
    size_bytes: 2458624
    stats:
      row_count: 10247
      date_range: ["2025-10-01", "2025-12-31"]
      avg_rating: 4.2
    # Representative sample
    sample:
      - customer_id: "cust_001"
        feedback_text: "Great product, very intuitive interface!"
        rating: 5
  # Instructions for fetching full data
  fetch_instructions: |
    To access the full dataset, call fetch with this bullet ID.
    Returns all rows as JSON array. Size: 2.4MB compressed.
  # External storage reference - managed by centralized bullet repository
  external:
    url: https://bullets.io/data/q4-feedback-abc123def456.parquet
    size_bytes: 2458624
    sha256: abc123def456789abcdef0123456789abcdef0123456789abcdef0123456789
    format: parquet
    region: us-east-1

The external pattern enables lazy loading: scripts receive the summary in context and can make intelligent decisions about whether to fetch the full data. This is essential for token efficiency when working with large datasets.
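A script might act on the summary along the following lines. This is a sketch under stated assumptions rather than the actual fetch API: `should_fetch` and `fetch_verified` are hypothetical helpers, the `download` callable stands in for whatever fetch mechanism the bullet repository provides, and it is assumed that `size_bytes` and `sha256` in the `external` block describe the raw bytes of the stored file.

```python
import hashlib
from typing import Callable


def should_fetch(summary: dict, min_rows: int, max_bytes: int) -> bool:
    """Decide from the summary alone whether fetching the full data is worthwhile."""
    stats = summary.get("stats", {})
    enough_rows = stats.get("row_count", 0) >= min_rows
    affordable = summary.get("size_bytes", 0) <= max_bytes
    return enough_rows and affordable


def fetch_verified(external: dict, download: Callable[[str], bytes]) -> bytes:
    """Download the external file and check it against the recorded size and hash.

    `download` is a hypothetical stand-in for the real fetch mechanism.
    """
    data = download(external["url"])
    if len(data) != external["size_bytes"]:
        raise ValueError("size mismatch for " + external["url"])
    # Assumption: the sha256 field is the hex digest of the file's raw bytes.
    if hashlib.sha256(data).hexdigest() != external["sha256"]:
        raise ValueError("checksum mismatch for " + external["url"])
    return data
```

Keeping the decision (`should_fetch`) separate from the transfer (`fetch_verified`) lets a script reject a dataset using only the summary already in context, without touching external storage.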