mm wc
Count files, bytes, estimated lines, and estimated tokens across a directory — like wc scaled up for LLM context budgeting.
Synopsis
DIRECTORY defaults to `.` (the current directory).
Options
| Flag | Short | Type | Description |
|---|---|---|---|
| `--kind KINDS` | `-k` | string | Filter by kind. Comma-separated, e.g. `code,text` |
| `--by-kind` | | flag | Break down metrics by file kind |
| `--format FORMAT` | `-f` | enum | Output format: `rich`, `tsv`, `csv`, `json` |
Output metrics
| Metric | Description |
|---|---|
| `files` | Total file count |
| `size` | Total disk usage (formatted as KB / MB / GB) |
| `lines (est.)` | Estimated line count for text, code, and document files |
| `tokens (est.)` | Estimated token count (characters ÷ 4) |
| `tok_per_mb` | Token density: tokens per megabyte |
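The `size` and `tok_per_mb` columns reduce to simple arithmetic. Below is a minimal Python sketch, assuming 1024-based units and one decimal place of rounding (mm may use decimal units or different rounding); `format_size` and `tok_per_mb` are hypothetical helper names, not part of mm's API.

```python
from typing import Optional

# Assumption: binary (1024-based) units; mm may use decimal (1000-based) units.
UNITS = ["B", "KB", "MB", "GB", "TB"]
BYTES_PER_MB = 1024 * 1024


def format_size(num_bytes: int) -> str:
    """Render a byte count as a human-readable size string."""
    size = float(num_bytes)
    for unit in UNITS[:-1]:
        if size < 1024:
            # Whole bytes need no decimals; larger units get one decimal place.
            return f"{size:.0f} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} {UNITS[-1]}"


def tok_per_mb(tokens: int, num_bytes: int) -> Optional[float]:
    """Token density; None (rendered as a dash) when there are zero bytes."""
    if num_bytes == 0:
        return None
    return tokens / (num_bytes / BYTES_PER_MB)
```

For example, 1,000 estimated tokens in a 512 KB file give a density of 2,000 tokens per MB under these assumptions.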
When `--by-kind` is set, or when it is enabled automatically because multiple kinds are present, an additional per-kind breakdown table is shown, ending with a totals row.
For image files, the per-kind breakdown also reports `tok_per_img` (tokens per image).
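The per-kind table is essentially a group-by over per-file results plus a grand total. A sketch under stated assumptions: each file yields a hypothetical `(kind, size, lines, tokens)` tuple, and `Totals` and `breakdown` are illustrative names, not mm internals.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Totals:
    files: int = 0
    size: int = 0    # bytes on disk
    lines: int = 0
    tokens: int = 0

    def add(self, size: int, lines: int, tokens: int) -> None:
        """Fold one file's metrics into this row."""
        self.files += 1
        self.size += size
        self.lines += lines
        self.tokens += tokens


def breakdown(
    results: Iterable[Tuple[str, int, int, int]],
) -> Tuple[Dict[str, Totals], Totals]:
    """Group per-file (kind, size, lines, tokens) tuples by kind, plus a totals row."""
    per_kind: Dict[str, Totals] = defaultdict(Totals)
    grand = Totals()
    for kind, size, lines, tokens in results:
        per_kind[kind].add(size, lines, tokens)
        grand.add(size, lines, tokens)
    return dict(per_kind), grand
```

The totals row is accumulated in the same pass rather than summed from the per-kind rows afterwards; both approaches give the same result.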
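Token estimation
Tokens are estimated at a ratio of 4 characters per token, a common rule of thumb for English text. Actual counts vary with the tokenizer, the language, and the content (source code and non-English text often tokenize differently), so treat the figures as budgeting estimates.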
- Text / code files: full content read, character count ÷ 4
- PDF documents: text extracted via pypdfium2, then character count ÷ 4
- Binary files (image, video, audio): 0 lines, 0 tokens — binary content is not text
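The three rules above can be sketched in Python as follows. Assumptions: the kind names (`image`, `video`, `audio`, `document`) are taken from this page, lines are estimated by counting newlines, and tokens use floor division; mm's own line estimation and rounding may differ. The PDF branch uses pypdfium2's `PdfDocument` and `get_textpage().get_text_range()`, imported lazily so the text path runs without pypdfium2 installed.

```python
from pathlib import Path
from typing import Tuple

CHARS_PER_TOKEN = 4
# Assumption: kinds treated as binary, per the list above.
BINARY_KINDS = {"image", "video", "audio"}


def estimate_text(text: str) -> Tuple[int, int]:
    """Return (lines, tokens) for a string using the characters ÷ 4 rule."""
    lines = text.count("\n")
    if text and not text.endswith("\n"):
        lines += 1  # count a final line that lacks a trailing newline
    tokens = len(text) // CHARS_PER_TOKEN  # assumption: floor division
    return lines, tokens


def pdf_text(path: Path) -> str:
    """Extract text from every page of a PDF with pypdfium2."""
    import pypdfium2 as pdfium  # only needed for PDFs

    pdf = pdfium.PdfDocument(str(path))
    try:
        parts = []
        for i in range(len(pdf)):
            textpage = pdf[i].get_textpage()
            parts.append(textpage.get_text_range())
        return "\n".join(parts)
    finally:
        pdf.close()


def estimate_file(path: Path, kind: str) -> Tuple[int, int]:
    """Return (lines, tokens) for one file according to its kind."""
    if kind in BINARY_KINDS:
        return 0, 0  # binary content is not text
    if kind == "document" and path.suffix.lower() == ".pdf":
        return estimate_text(pdf_text(path))
    return estimate_text(path.read_text(encoding="utf-8", errors="replace"))
```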
Examples
```bash
# summary panel for the current directory
mm wc
# summary for a specific directory
mm wc ~/project
# code files only
mm wc ~/project --kind code
# per-kind breakdown of images and videos
mm wc ~/media --kind image,video --by-kind
# explicit breakdown by kind
mm wc ~/data --by-kind
# JSON output for scripting
mm wc ~/data --format json
# TSV for spreadsheet import
mm wc ~/data --by-kind --format tsv
```
Pipe support
`mm wc` reads file paths from stdin when piped and computes stats only for those files:
```bash
# count tokens in files found by find
mm find ~/project --kind code | mm wc
# count tokens in specific PDF files
mm find ~/data --ext .pdf | mm wc
# compare token counts before and after filtering
mm find ~/data | mm wc
mm find ~/data --kind code | mm wc
```
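Stdin handling of this kind can be sketched as follows, assuming mm treats a non-TTY stdin as a pipe and that `mm find` emits one path per line; `paths_from_stdin` is a hypothetical name, not mm's code.

```python
import sys
from typing import List, Optional, TextIO


def paths_from_stdin(stream: TextIO = sys.stdin) -> Optional[List[str]]:
    """Return piped paths, or None when stdin is an interactive terminal."""
    if stream.isatty():
        return None  # nothing piped: the caller falls back to scanning DIRECTORY
    paths = []
    for line in stream:
        # Strip only the line terminator so paths with spaces survive intact.
        path = line.rstrip("\r\n")
        if path:
            paths.append(path)
    return paths
```

A caller would then compute stats over the returned list when it is not `None`, and scan the directory argument otherwise.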
Auto-breakdown
When the scanned files span more than one kind, `--by-kind` is enabled automatically, so no flag is needed. Pass `--kind` with a single kind to filter the scan to that kind and suppress the per-kind table.
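The auto-breakdown rule, combined with the comma-separated `--kind` filter, fits in a small decision function. This sketches the behavior described above rather than mm's code; `kinds_after_filter` and `show_breakdown` are hypothetical names.

```python
from typing import Iterable, Optional, Set


def kinds_after_filter(file_kinds: Iterable[str], kind_filter: Optional[str]) -> Set[str]:
    """Kinds that remain after applying a comma-separated --kind value."""
    present = set(file_kinds)
    if kind_filter:
        wanted = {k.strip() for k in kind_filter.split(",") if k.strip()}
        present &= wanted
    return present


def show_breakdown(by_kind_flag: bool, kinds_present: Set[str]) -> bool:
    """Show the per-kind table if requested, or if more than one kind remains."""
    return by_kind_flag or len(kinds_present) > 1
```

Filtering a tree of code and text files with `--kind code` leaves a single kind, so the table is suppressed unless `--by-kind` is passed explicitly.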
Notes
- Document line counts come from the text extracted by pypdfium2, not from the page count.
- `tok_per_mb` is omitted (shown as `—`) when a kind has zero bytes on disk.
- The Rust scanner handles all kinds except documents; PDF text extraction runs in Python via pypdfium2, and its results are overlaid onto the Rust results.
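The overlay step in the last note amounts to replacing the document rows from the Rust scan with counts from the Python extraction. A sketch, assuming a hypothetical per-file `FileResult` record and a `pdf_counts` mapping from path to `(lines, tokens)`; neither is mm's actual data model.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Mapping, Tuple


@dataclass(frozen=True)
class FileResult:
    path: str
    kind: str
    size: int    # bytes on disk, as reported by the scanner
    lines: int
    tokens: int


def overlay_documents(
    rust_results: Iterable[FileResult],
    pdf_counts: Mapping[str, Tuple[int, int]],
) -> List[FileResult]:
    """Swap in Python-extracted (lines, tokens) for document rows."""
    merged = []
    for result in rust_results:
        if result.kind == "document" and result.path in pdf_counts:
            lines, tokens = pdf_counts[result.path]
            # Keep the scanner's size; only the text-derived counts change.
            result = replace(result, lines=lines, tokens=tokens)
        merged.append(result)
    return merged
```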