Skip to content

mm wc

Count files, bytes, estimated lines, and estimated tokens across a directory — like wc scaled up for LLM context budgeting.

Synopsis

mm wc [DIRECTORY] [OPTIONS]

DIRECTORY defaults to . (current directory).

Options

Flag Short Type Description
--kind KINDS -k string Filter by kind. Comma-separated. e.g. code,text
--by-kind flag Break down metrics by file kind
--format FORMAT -f enum Output format: rich, tsv, csv, json

Output metrics

Metric Description
files Total file count
size Total disk usage (formatted: KB / MB / GB)
lines (est.) Estimated line count for text/code/document files
tokens (est.) Estimated token count (characters ÷ 4)
tok_per_mb Token density: tokens per megabyte

When --by-kind is active (or automatically activated when multiple kinds are present), an additional per-kind breakdown table is shown with a totals row.

For image files, tok_per_img (tokens per image) is also computed per-kind.

Token estimation

Tokens are estimated using a character-to-token ratio of 4 characters per token — a standard approximation for English text with typical tokenizers.

  • Text / code files: full content read, character count ÷ 4
  • PDF documents: text extracted via pypdfium2, then character count ÷ 4
  • Binary files (image, video, audio): 0 lines, 0 tokens — binary content is not text

Examples

# summary panel for current directory
mm wc

# summary for a specific directory
mm wc ~/project

# code files only
mm wc ~/project --kind code

# images and video breakdown
mm wc ~/media --kind image,video --by-kind

# explicit breakdown by kind
mm wc ~/data --by-kind

# JSON output for scripting
mm wc ~/data --format json

# TSV for spreadsheet import
mm wc ~/data --by-kind --format tsv

Pipe support

mm wc reads file paths from stdin when piped, computing stats only for those files:

# count tokens in files found by find
mm find ~/project --kind code | mm wc

# count tokens in specific PDF files
mm find ~/data --ext .pdf | mm wc

# compare token counts before and after filtering
mm find ~/data | mm wc
mm find ~/data --kind code | mm wc

Auto-breakdown

When multiple file kinds are present in the scanned directory, --by-kind is enabled automatically — no flag needed. Pass --kind to filter to a single kind and suppress the per-kind table.

Notes

  • Document line counts are extracted from pypdfium2 text output, not page count.
  • tok_per_mb is omitted () when a kind has zero bytes on disk.
  • The Rust scanner handles all kinds except documents; PDF text extraction runs in Python via pypdfium2 and is overlaid onto the Rust results.