bebo CLI reference
Standalone decoder for BEBO archive files. Read, verify, and export without hitting the Ceradela API. MIT-licensed, source on GitHub.
Install
# Homebrew (macOS / Linux)
brew install ceradela/tap/bebo
# Direct download
curl -sSL https://install.ceradela.com/bebo | sh
# Verify
bebo --version
# bebo 0.1.0

Global flags
| Flag | Description |
|---|---|
| -o, --output | Output format: table, ndjson, json, csv, or tsv. Default: table on a TTY, ndjson when piping. |
| -c, --columns | Comma-separated column list; non-matching columns are skipped during decode. |
| -n, --limit | Row limit for streaming commands. |
| --where | Predicate like "id>100" or "total<1000" (numeric columns only in v0.1). |
| --dict | Path to a trained zstd dictionary, if the archive was compressed with one. |
| --no-color | Disable ANSI escape codes. |
| --version | Print version and exit. |
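A predicate this small can be handled with a single regex. The sketch below is a hypothetical reading of the v0.1 `--where` grammar (one column, one comparison operator, one integer literal); the `parse_where` helper and its operator set are illustrative assumptions, not bebo's actual parser:

```python
import operator
import re

# Hypothetical parser for v0.1-style predicates (<column><op><number>).
# The operator set is an assumption based on the documented examples;
# bebo's real grammar may be narrower.
OPS = {">=": operator.ge, "<=": operator.le, "!=": operator.ne,
       ">": operator.gt, "<": operator.lt, "=": operator.eq}

def parse_where(pred):
    m = re.fullmatch(r"(\w+)\s*(>=|<=|!=|>|<|=)\s*(-?\d+)", pred)
    if not m:
        raise ValueError(f"unsupported predicate: {pred!r}")
    col, op, val = m.groups()
    return lambda row: OPS[op](row[col], int(val))

rows = [{"id": 1, "total_cents": 12500},
        {"id": 2, "total_cents": 8900},
        {"id": 3, "total_cents": 45000}]
keep = parse_where("total_cents>10000")
print([r["id"] for r in rows if keep(r)])  # [1, 3]
```

Applying the predicate during decode (rather than after) is what lets the CLI skip decompressing pages whose min/max range can't match.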
Read / inspect commands
bebo head <file>
First N rows (default 10).
$ bebo head orders.cbebomth -n 3 -c id,total_cents
id total_cents
── ───────────
1 12500
2 8900
3 45000

bebo cat <file>
Stream all rows. Honors --where for filtering and -c for column pruning. Output is NDJSON when piped.
$ bebo cat orders.cbebomth --where "total_cents>10000" -c id,customer_id | jq -s length
1842

bebo schema <file>
Column schema with inferred Go types.
$ bebo schema orders.cbebomth
orders.cbebomth (monthly, 24812 rows, 7 cols)
id int64
customer_id int64
total_cents int64
status string
created_at time.Time
metadata jsonb
tags string[]

bebo meta <file>
File metadata: kind, bytes, row count, columns. JSON output by default.
bebo count <file>
Row count. O(1) for monthly archives (reads footer), O(N) for bundles (decodes each partition).
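The O(1) path works because the count is stored in the archive's footer, so `count` is a seek plus a fixed-size read. A minimal sketch of that approach, where the 8-byte little-endian trailer layout is purely an assumption for illustration (the real BEBO footer layout is not documented here):

```python
import struct
import tempfile

# Assumed layout for illustration: the final 8 bytes of a monthly archive
# hold the row count as a little-endian uint64, so counting is a seek
# plus one 8-byte read, regardless of archive size.
def read_row_count(path):
    with open(path, "rb") as f:
        f.seek(-8, 2)  # 8 bytes before EOF
        (rows,) = struct.unpack("<Q", f.read(8))
    return rows

# Demo against a throwaway file carrying a fake footer.
with tempfile.NamedTemporaryFile(suffix=".cbebomth", delete=False) as tmp:
    tmp.write(b"compressed pages..." + struct.pack("<Q", 24812))
    path = tmp.name

print(read_row_count(path))  # 24812
```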
bebo sizes <file>
Per-column uncompressed size — useful for debugging why an archive is bigger than expected.
bebo stats <file>
Per-column statistics: non-null, null, and distinct counts, plus min and max values.
bebo sample <file>
Random sample of N rows. --seed for reproducibility.
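Reproducible sampling of a stream is a natural fit for reservoir sampling: seed the RNG and two runs over the same rows pick the same sample. A sketch of one way `sample` could work (bebo's actual algorithm is unspecified):

```python
import random

def reservoir_sample(rows, n, seed=None):
    """Uniform sample of n items from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < n:
            sample.append(row)        # fill the reservoir first
        else:
            j = rng.randint(0, i)     # keep row with probability n/(i+1)
            if j < n:
                sample[j] = row
    return sample

a = reservoir_sample(range(24812), 5, seed=42)
b = reservoir_sample(range(24812), 5, seed=42)
print(a == b)  # True: same seed, same sample
```

The single pass matters here: the sampler never needs to know the row count up front, so it works on bundles where counting is O(N) anyway.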
DR / verify commands
bebo verify <file>
CRC32 per page + SHA-256 of the whole file. Quick integrity check.
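Both layers are cheap relative to a full decode: a CRC32 per page localizes bit rot to one page, while the whole-file SHA-256 catches truncation or substitution. A sketch of the two-layer check using stdlib primitives; the fixed 64 KiB page size is an assumption for illustration, since BEBO's real page boundaries come from the file itself:

```python
import hashlib
import tempfile
import zlib

PAGE = 64 * 1024  # assumed page size; real boundaries come from the archive

def page_crcs_and_sha(path):
    """One CRC32 per page plus a SHA-256 over the whole file."""
    sha = hashlib.sha256()
    crcs = []
    with open(path, "rb") as f:
        while chunk := f.read(PAGE):
            sha.update(chunk)
            crcs.append(zlib.crc32(chunk) & 0xFFFFFFFF)
    return crcs, sha.hexdigest()

# Demo file: one full page plus a partial trailing page.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * (PAGE + 100))
    path = tmp.name

crcs, digest = page_crcs_and_sha(path)
print(len(crcs))  # 2
```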
bebo verify --deep <file>
Re-decodes every page, proving the archive is restorable rather than merely intact on disk. Takes longer (equivalent to a full read), but this is the kind of check that catches GitLab-style silent corruption before a month of bad backups ships.
$ bebo verify --deep orders.cbebomth
{
"file": "orders.cbebomth",
"kind": "monthly",
"bytes": 104857,
"sha256": "bd77...",
"crc32": "2144df1c",
"decode_ok": true,
"columns": 7,
"rows_decoded": 24812
}

bebo diff <a> <b>
Schema + row-count diff between two archives. Useful for detecting schema drift between versions.
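A schema diff reduces to comparing two name-to-type maps. A sketch of the comparison; the dict-based schema representation is an assumption for illustration, not bebo's internal model:

```python
def schema_diff(a, b):
    """Compare two {column: type} maps; report added/removed/changed."""
    return {
        "added":   sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }

v1 = {"id": "int64", "total_cents": "int64", "status": "string"}
v2 = {"id": "int64", "total_cents": "int64", "status": "string",
      "tags": "string[]"}
print(schema_diff(v1, v2))
# {'added': ['tags'], 'removed': [], 'changed': []}
```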
Portability / exit commands
bebo export <file> --to=<format>
Convert to another format. Use --out <path> to write to a file; otherwise output goes to stdout.
| Format | Use case |
|---|---|
| ndjson | Streaming, typed numbers, jq-friendly. Default when piping. |
| json | One JSON array. Smaller files only. |
| csv | RFC 4180. Excel / Google Sheets. |
| tsv | Tab-separated. Unix pipes. |
| parquet | Via DuckDB passthrough (install DuckDB). Industry-standard columnar. |
# Parquet via the DuckDB passthrough path (v0.1 approach)
$ bebo export orders.cbebomth --to=ndjson | \
duckdb -c "COPY (SELECT * FROM read_ndjson_auto('/dev/stdin')) TO 'orders.parquet' (FORMAT PARQUET)"
# CSV direct
$ bebo export orders.cbebomth --to=csv --out orders.csv

bebo merge <files...> --out <path>
Concatenate compatible archives into one NDJSON stream. Writing back to .cbebomth isn't supported from the CLI — re-ingest via the archiver instead.
Bundle helpers
bebo list <bundle>
List the labels inside a .cbeboqtr (monthly labels) or .cbeboyr (quarterly labels).
bebo extract <bundle> --label <label>
Pull one inner file out of a bundle. Use --out <path> to write to a file.
$ bebo extract 2026-Q1_v1.cbeboqtr --label 2026-02 --out feb.cbebomth
extracted 2026-02 → feb.cbebomth (1147 bytes)
$ bebo count feb.cbebomth
32

Exit codes
| Code | Meaning |
|---|---|
| 0 | success |
| 1 | usage / argument error |
| 2 | decode or integrity failure |
| 3 | file not found / I/O error |
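Distinct codes make the CLI scriptable: an integrity failure (2) can page someone, while a missing file (3) might just mean the sync hasn't run yet. A sketch of dispatching on the return code, using `exit N` through the shell as a stand-in for a real `bebo verify` invocation; the action strings are illustrative:

```python
import subprocess

# Mapping from the documented exit codes to operator actions.
ACTIONS = {0: "ok",
           1: "fix the invocation",
           2: "page on-call: possible corruption",
           3: "requeue: file not synced yet"}

def check(cmd):
    rc = subprocess.run(cmd, shell=True).returncode
    return ACTIONS.get(rc, f"unexpected exit code {rc}")

print(check("exit 2"))  # page on-call: possible corruption
```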
Troubleshooting
zstd: Unknown frame descriptor - You're feeding a non-BEBO file, or the file was compressed with a trained dictionary you don't have locally. Pass --dict path/to/DICT.bin if your tenant uses one.

CRC32 verification failed - The file was modified after it was written. Check that your S3 sync didn't truncate it. If it's corrupt, fetch a fresh copy from the bucket (versioning retains old versions).

decode panic - Most likely an archive written before the array/JSONB/nullable-TS codec fix shipped (pre-2026-04-18). Re-archive the source data; the new codec handles those column types correctly.