mm Python API¶
mm.Context is the main entry point for building a multimodal prompt
incrementally, then handing the whole thing to a VLM. This doc covers
the public Python surface; under the hood everything runs through the
Rust _mm.PyContext core so memory
is compact and insert/lookup/render is sub-millisecond at 10K items.
Looking for the directory-scan surface (Context("~/data") + to_polars / sql / show)? That mode is preserved unchanged — see USER_GUIDE.md. This doc is about the new incremental, role-aware mode.
TL;DR¶
import mm
from pathlib import Path
from PIL import Image
ctx = mm.Context(session_id=mm.uuid7()) # or omit; auto-mints a UUIDv7
sys: mm.Ref = ctx.add("You are a terse visual analyst.", role="system")
txt: mm.Ref = ctx.add("Summarize these assets.", role="user")
img: mm.Ref = ctx.add(Path("photo.jpg"), role="user")
img2: mm.Ref = ctx.add(Image.open("x.png"), role="user",
metadata={"note": "product hero shot"})
doc: mm.Ref = ctx.add(Path("paper.pdf"), role="user",
metadata={"summary": "Attention is all you need",
"tags": ["nlp", "transformer"]})
vid: mm.Ref = ctx.add(Path("clip.mp4"), role="user",
metadata={"scene": 3, "actor": "A"})
from openai.types.chat import ChatCompletionMessageParam
from google.genai import types as genai_types
messages_openai: list[ChatCompletionMessageParam] = ctx.to_messages(format="openai")
messages_gemini: list[genai_types.ContentDict] = ctx.to_messages(format="gemini")
obj = ctx.get(img) # str | Path | PIL.Image.Image
row = mm.Context.get(f"{ctx.session_id}/{img}") # cross-session DB lookup
ctx.print_tree() # T4 tree with metadata
print(ctx.to_md(mode="metadata")) # markdown table w/ cat content
print(repr(ctx)) # markdown __repr__
Core types¶
mm.Ref¶
A typed alias for ref id strings. At runtime it is just str; IDEs and mypy
see a distinct type thanks to typing.Annotated.
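A quick illustration of the runtime/typing split (assuming mm is importable; the ref value in the comment is just an example of the shape):

```python
import mm

ctx = mm.Context()
ref: mm.Ref = ctx.add("a short caption", role="user")

assert isinstance(ref, str)   # at runtime a ref is a plain str
print(ref)                    # e.g. a "txt_111222"-style <prefix>_<6 hex> id
```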
mm.uuid7() -> str¶
Canonical UUIDv7 (time-ordered) in the hyphenated form
xxxxxxxx-xxxx-7xxx-Nxxx-xxxxxxxxxxxx. Python 3.12's stdlib uuid
doesn't ship uuid7, so mm provides its own — implemented in Rust
(see crates/mm-core/src/refs.rs). It is the preferred default for new
session ids because UUIDv7s compared lexicographically sort in creation
order.
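A quick sketch of that ordering property (assuming ids minted later in the same process keep the documented creation-order sort):

```python
import mm

ids = [mm.uuid7() for _ in range(3)]
print(ids[0])              # hyphenated UUIDv7: xxxxxxxx-xxxx-7xxx-Nxxx-xxxxxxxxxxxx
assert ids == sorted(ids)  # lexicographic order == creation order
```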
mm.RefNotFoundError¶
KeyError subclass raised by ctx.get(ref) on miss. The message is
an agent-friendly markdown table: closest-match suggestion (Levenshtein
distance ≤ 4 within the same kind) followed by the full context's ref
listing.
mm.Context¶
Context(
root: str | Path | None = None,
*,
session_id: str | None = None,
# ...directory-scan-only kwargs (n_threads, no_ignore, …) elided
)
- Incremental mode (the one this doc covers): pass no root. A fresh session_id is minted via mm.uuid7() when omitted.
- Directory-scan mode: pass a root path to get the legacy Arrow-backed scan surface. Both modes share session_id + refs.
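A quick sketch of both constructions (the "~/data" path is illustrative):

```python
import mm

ctx = mm.Context()                          # incremental mode; session_id auto-minted
pinned = mm.Context(session_id=mm.uuid7())  # incremental mode with an explicit session id
scan = mm.Context("~/data")                 # directory-scan mode (legacy Arrow-backed surface)
```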
ctx.add(obj, *, role="user", metadata=None) -> mm.Ref¶
Attach an item. Accepted types:
| Input | Stored as | get() returns |
|---|---|---|
| str | ItemSource::InMemory | the same str |
| pathlib.Path | ItemSource::Path | a new pathlib.Path |
| PIL.Image.Image | ItemSource::InMemory | the exact object |
Strings are always treated as free-form text and inlined into
to_messages(). Path-like strings and URL-looking strings are not
resolved or fetched; use Path("file.ext") for on-disk files.
role is one of "system", "developer", or "user". Strings can use
any role. Path and PIL.Image.Image currently require role="user"
because multimodal system/developer messages are not portable across
providers.
metadata is a single optional JSON-serialisable dict holding any
extra context you want to ride along with the item. Common keys by
convention:
- note — short human-readable note.
- summary — longer summary / caption. Used as the "pre-extracted" content fallback in to_md(mode="metadata").
- tags — free-form list of strings.
- …plus anything else your pipeline needs ({"scene": 3, "actor": "A"}).
The dict is emitted as a leading text block per item in to_messages
so VLMs see it inline, and is also surfaced in __repr__, to_md,
and print_tree.
Returns the generated ref id (<prefix>_<6 hex>), typed as mm.Ref.
Example¶
text = ctx.add("Compare the image and document below.", role="user")
img = ctx.add(Path("photo.jpg"), role="user", metadata={"note": "hero shot"})
doc = ctx.add(Path("paper.pdf"), role="user",
metadata={"summary": "Attention is all you need",
"tags": ["nlp", "transformer"]})
vid = ctx.add(Path("clip.mp4"), role="user", metadata={"scene": 3})
ctx.get(ref) -> str | Path | PIL.Image.Image¶
Local lookup by ref. Accepts a bare ref ("img_a1b2c3") or a global
ref ("<session_id>/<ref_id>"); the session segment must match this
context's session_id.
- Free-form text returns the same str.
- Path-backed items return a freshly-constructed pathlib.Path.
- In-memory PIL images return the exact Python object that was added (no copy, no rehydrate — identity is preserved).
Raises RefNotFoundError on miss. The error message prints the full
ref table + a "did you mean" suggestion:
RefNotFoundError: ref 'img_a1b2cZ' not found in session 019da4…. Did you mean: img_a1b2c3?
Available refs:
Context(session=019da4…, items=3)
| ref | role | kind | source |
|------------|------|-------|-----------------------|
| img_a1b2c3 | user | image | /abs/path/photo.jpg |
| doc_d4e5f6 | user | doc | /abs/path/paper.pdf |
| vid_7890ab | user | video | /abs/path/clip.mp4 |
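A usage sketch, reusing the refs from the TL;DR above:

```python
photo = ctx.get(img)                        # freshly-constructed pathlib.Path
pil = ctx.get(img2)                         # the exact PIL object passed to add()
assert pil is ctx.get(img2)                 # identity preserved
also = ctx.get(f"{ctx.session_id}/{img}")   # global form; session must match

try:
    ctx.get("img_a1b2cZ")                   # typo'd ref
except mm.RefNotFoundError as exc:
    print(exc)                              # "did you mean" + full ref table
```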
ctx.remove(ref) -> None¶
Remove an item by bare ref ("img_a1b2c3") or matching global ref
("<session_id>/<ref_id>"). Raises RefNotFoundError on miss and
ValueError when the global ref belongs to a different session.
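For example, continuing the TL;DR refs:

```python
ctx.remove(vid)                         # bare ref
ctx.remove(f"{ctx.session_id}/{doc}")   # global ref; session segment must match
```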
Context.get(global_ref, *, session_id=None, db=None) (classmethod)¶
Cross-session resolver. Parses a "<session>/<ref>" global ref (or
accepts a bare ref + session_id=...) and returns the files row dict
from the global ~/.local/share/mm/mm.db, or None on miss.
Use this when you have a ref from a persisted context and no live
Context instance. Replaces the (still-supported) legacy
Context.resolve().
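A sketch of the cross-session lookup; it returns None unless the session has been persisted to the global DB:

```python
import mm

global_ref = f"{ctx.session_id}/{img}"            # e.g. captured from a log or another process
row = mm.Context.get(global_ref)                  # "<session>/<ref>" form
row_alt = mm.Context.get(img, session_id=ctx.session_id)  # bare ref + session_id=
print(row)  # files-table row dict from ~/.local/share/mm/mm.db, or None on miss
```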
ctx.to_messages(format="openai", *, encoders=None) -> list[dict]¶
Encode every item into a role-aware message list, ready to drop into the respective SDK call. The returned shape is a plain Python list of dicts, typed to match the target SDK:
from openai.types.chat import ChatCompletionMessageParam
from google.genai import types as genai_types
messages_openai: list[ChatCompletionMessageParam] = ctx.to_messages(format="openai")
messages_gemini: list[genai_types.ContentDict] = ctx.to_messages(format="gemini")
format="openai"→ one message per consecutive role run, e.g.[{"role": "system", ...}, {"role": "developer", ...}, {"role": "user", ...}].format="gemini"→[{"role": "user", "parts": [{"inline_data": …}, {"text": …}]}]— non-user roles are folded into labelled text parts because Gemini role semantics differ.
Per-kind encoder overrides:
messages: list[ChatCompletionMessageParam] = ctx.to_messages(
format="openai",
encoders={"image": "tile", "video": "mosaic"},
)
Unspecified kinds use sensible defaults (image-resize,
video-frames, document-rasterize). Encoder names come from
the mm.encoders registry — see --list-encoders.
User metadata is emitted as a leading text part per item
([ref=<id>] note: <text>), so VLMs see your context inline.
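As a rough illustration of the OpenAI-format output (part ordering and exact field names are indicative only, not the library's guaranteed shape):

```python
messages = ctx.to_messages(format="openai")
# Roughly:
# [
#   {"role": "system", "content": "You are a terse visual analyst."},
#   {"role": "user", "content": [
#       {"type": "text", "text": "[ref=img_9f0e12] note: product hero shot"},
#       {"type": "image_url", "image_url": {"url": "data:image/png;base64,…"}},
#       ...
#   ]},
# ]
```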
ctx.to_md(mode="metadata") -> str¶
Markdown table with one row per ref: ref | role | kind | source | content.
mode="metadata" populates each row with the metadata-tier content
(files.text_preview — produced by extract_meta; no LLM call) for
non-text kinds, and raw text for code/text files. (Mirrors what the CLI's
mm peek surfaces locally for binary kinds — same source data.)
mode="fast" and mode="accurate" are reserved for the LLM-backed
pipelines and currently raise NotImplementedError.
print(ctx.to_md())
# | ref | role | kind | source | content |
# |------------|------|-------|---------------------|--------------------------------------|
# | img_a1b2c3 | user | image | /abs/path/photo.jpg | 3024×4032, jpeg, EXIF: Canon EOS… |
# | doc_d4e5f6 | user | doc | /abs/path/paper.pdf | # Title…\n## Abstract… |
ctx.print_tree(layout="insertion") -> None¶
Print a rich.Tree
rendering of the context. The default "insertion" layout (T4) shows
items in insertion order with metadata on sub-branches — best for the
"build a prompt incrementally" workflow where metadata is the whole
point.
Context(session=019da4…, items=5)
├── [1] img_a1b2c3 user image /abs/path/photo.jpg
├── [2] img_9f0e12 user image PIL.Image(RGB, 1024×768)
│ └─ note: "product hero shot"
├── [3] doc_d4e5f6 user document /abs/path/paper.pdf
│ ├─ summary: "Attention is all you need"
│ └─ tags: [nlp, transformer]
├── [4] vid_7890ab user video /abs/path/clip.mp4
│ └─ metadata: {"scene": 3, "actor": "A"}
└── [5] txt_111222 system text You are concise.
Other layouts are declared in the docstring so they're discoverable,
but raise NotImplementedError for now:
"paths"— directory hierarchy with refs on the right. [TODO]"kind"— grouped by kind (images, documents, videos, …). [TODO]"flat"— ref-first flat list. [TODO; likely ships asprint_table()instead of a tree]"hybrid"— paths + per-item dim metadata line. [TODO]
__repr__ → markdown¶
repr(ctx) returns a markdown summary: session_id, item count, and
the ref | role | kind | source table. Works well in Jupyter / doc snippets
and doubles as the body of RefNotFoundError.
ctx.save() (deferred)¶
Not implemented for role-aware contexts. Planned behaviour:
- Write (session_id, ref_id, role, kind, uri, content_hash, metadata) to the files table in ~/.local/share/mm/mm.db.
- For in-memory objects, spool to a content-addressed cache directory ~/.local/share/mm/blobs/<xxh3>.<ext> and record the blob URI.
- Make Context.get("<session>/<ref>") resolve via the DB across processes.
- Idempotent on repeat calls for the same (session_id, ref_id).
Directory-scan Context(root) retains its existing save() (writes
the Arrow table to the global DB).
Performance architecture¶
The hot path is Rust. Python is a thin façade.
- crates/mm-core/src/refs.rs owns RefId, Kind, Item, ItemSource, Context (the Rust struct), make_ref_id, uuid7.
- RefId = CompactString — the canonical <prefix>_<6 hex> shape fits inside the 24-byte inline SSO buffer, so refs never heap-allocate on the hot path.
- Item = { ref_id, kind, source, metadata: Option<Box<MetaMap>> }. Items without user metadata pay only one pointer's worth of memory and zero allocations.
- by_ref: HashMap<RefId, u32> gives O(1) ref→index lookup.
- crates/mm-python/src/refs.rs exposes PyContext and keeps in-memory Python objects alive in a parallel Vec<Option<Py<PyAny>>> indexed by item position. That's why ctx.get(ref) returns the exact object the caller passed to ctx.add — no copy, no rehydrate.
- Rendering (__repr__, print_tree, to_md table assembly, the RefNotFoundError message, the "did you mean" Levenshtein search) all happens in Rust, so Python only pays one FFI boundary crossing to get a ready-to-print string.
Memory budget¶
Per item, excluding the user's stored object:
- path-backed: ~56 bytes;
- in-memory text/PIL: ~64 bytes + one Py<PyAny> refcount bump.
A 10K-item context without metadata fits in < 1 MB on the Rust side.
Benchmarks¶
Two complementary suites keep the hot path honest:
Rust / Criterion — crates/mm-core/benches/refs.rs¶
Pure-Rust, no PyO3 boundary. Targets the mm_core::refs::Context
primitives.
cargo bench -p mm-core --bench refs
# Or target a group:
cargo bench -p mm-core --bench refs -- refs/add_path
cargo bench -p mm-core --bench refs -- refs/get
cargo bench -p mm-core --bench refs -- refs/render
cargo bench -p mm-core --bench refs -- refs/ref_not_found
cargo bench -p mm-core --bench refs -- refs/mixed
Coverage:
| Group | Scales | What it measures |
|---|---|---|
| refs/make_ref_id/{kind} | per-kind | ID generation (OsRng + base36 encode) |
| refs/uuid7 | — | mm.uuid7() generation latency |
| refs/add_path | 100 / 1K / 10K / 100K | Path-backed add throughput |
| refs/add_inmem | 1K / 10K | In-memory (PIL) add throughput |
| refs/add_with_metadata | 1K / 10K | Same, with note+summary+tags populated |
| refs/get_hit | 1K / 10K / 100K | by_ref: HashMap lookup (realistic hit) |
| refs/get_miss | 1K / 10K | Miss — short-circuits before suggestion |
| refs/render_tree_insertion | 100 / 1K / 10K | Rust tree rendering (excludes Rich) |
| refs/render_tree_insertion_with_meta | 1K | Same, with 3 metadata branches per item |
| refs/repr_markdown | 100 / 1K / 10K | repr(ctx) table generation |
| refs/to_md_with_contents | 1K | to_md() rendering given pre-extracted text |
| refs/ref_not_found_message | 100 / 1K / 10K | Full RefNotFoundError body (typo shape) |
| refs/closest_ref_10k | 10K | Levenshtein across all prefix-matching refs |
| refs/mixed_add_get_render | 1K / 10K | Agent-loop shape (add→get→repr→tree) |
Python / pytest-benchmark — tests/python/test_refs_api_perf.py¶
Full PyO3 round-trip (Python → Rust → Python). Marked
pytest.mark.slow so the default make test-python stays fast; run
via make test-python-full or:
pytest tests/python/test_refs_api_perf.py -m slow
pytest tests/python/test_refs_api_perf.py -m slow --benchmark-only
pytest tests/python/test_refs_api_perf.py -m slow --benchmark-disable # budgets only
Two classes of tests live here:
- TestBench* — pytest-benchmark micro-benches for add (path / PIL / + metadata), get (hit / miss), print_tree, repr, to_md, to_messages (openai + gemini), uuid7, and RefNotFoundError construction.
- Latency-budget regression guards (test_*_under_budget) that fail fast if a change pushes the Python-bound path past its budget. Each budget is overridable via an env var (see the docstring at the top of the file).
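A minimal sketch of both shapes — the test names, the budget value, and the MM_BUDGET_GET_US env var below are illustrative, not the repo's actual identifiers:

```python
import os
import time

import pytest
import mm


@pytest.mark.slow
def test_bench_get_hit(benchmark):
    ctx = mm.Context()
    ref = ctx.add("hello", role="user")
    benchmark(ctx.get, ref)  # pytest-benchmark micro-bench of the PyO3 round-trip


@pytest.mark.slow
def test_get_hit_under_budget():
    ctx = mm.Context()
    ref = ctx.add("hello", role="user")
    budget_us = float(os.environ.get("MM_BUDGET_GET_US", "50"))  # hypothetical override
    start = time.perf_counter()
    for _ in range(10_000):
        ctx.get(ref)
    per_call_us = (time.perf_counter() - start) / 10_000 * 1e6
    assert per_call_us < budget_us, f"get() took {per_call_us:.2f} µs (> {budget_us} µs budget)"
```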
Indicative numbers (Apple M-series, release build)¶
| Operation | Median | Throughput |
|---|---|---|
| ctx.get(ref) hit, 10K-item context | ~800 ns | ~1.2 M ops/s |
| mm.uuid7() | ~1.3 µs | ~770 K ops/s |
| new_session_id() | ~1.3 µs | ~770 K ops/s |
| ctx.add(Path), amortised | ~32 µs | ~31 K adds/s |
| ctx.add(PIL.Image), amortised | ~7 µs | ~140 K adds/s |
| repr(ctx) @ 10K items | ~3 ms | — |
| ctx.print_tree() @ 10K items | ~930 ms* | — |
| RefNotFoundError msg @ 10K items | ~11 ms | — |
* print_tree is dominated by Rich's ANSI line printer — the Rust
tree-string generation itself is ~5ms at 10K. Strip to raw output with
print(ctx._pyctx.render_tree_insertion()) if you need the faster
path.
The ctx.add(Path) amortised cost includes Path.resolve() + a stat
to sniff the MIME; ctx.add(PIL.Image) skips those and lands inside
the ~7µs PyO3-boundary budget dominated by the Py<PyAny> clone + one
JSON metadata roundtrip.
Error types¶
- mm.RefNotFoundError — KeyError subclass. Raised by ctx.get(ref) on miss; the message is a markdown table + suggestion.
- ValueError — malformed global ref, mismatched session id, or metadata= containing non-JSON-serialisable keys.
- TypeError — add() received something other than str, Path, or PIL.Image.Image.
- FileNotFoundError — add() received a Path that doesn't exist.
- NotImplementedError — print_tree(layout="paths"|"kind"|…), to_md(mode="accurate"), or save() on an incremental context.
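A quick sketch of these failure modes in practice (continuing the ctx from earlier):

```python
from pathlib import Path

try:
    ctx.add(Path("does-not-exist.png"), role="user")
except FileNotFoundError:
    pass  # Path items must exist at add() time

try:
    ctx.add(42, role="user")          # not str / Path / PIL.Image.Image
except TypeError:
    pass

try:
    ctx.print_tree(layout="kind")     # declared but not yet implemented
except NotImplementedError:
    pass
```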
Recipe: OpenAI chat completion¶
import mm
from openai import OpenAI
from openai.types.chat import ChatCompletionMessageParam
from pathlib import Path
ctx = mm.Context()
ctx.add("Summarise the attached context.", role="system")
ctx.add(Path("whiteboard.jpg"), role="user", metadata={"note": "meeting notes"})
ctx.add(Path("slides.pdf"), role="user", metadata={"summary": "Q3 plan"})
ctx_messages: list[ChatCompletionMessageParam] = ctx.to_messages(format="openai")
client = OpenAI()
resp = client.chat.completions.create(
model="gpt-4o",
messages=[
*ctx_messages,
],
)
print(resp.choices[0].message.content)
Recipe: Gemini generate_content¶
import mm
import google.generativeai as genai
from google.genai import types as genai_types
from pathlib import Path
ctx = mm.Context()
ctx.add("Summarise this lecture.", role="user")
ctx.add(Path("clip.mp4"), role="user", metadata={"summary": "lecture on attention"})
contents: list[genai_types.ContentDict] = ctx.to_messages(format="gemini")
model = genai.GenerativeModel("gemini-2.0-pro")
resp = model.generate_content(contents=contents)
print(resp.text)