mm.Context API — Building and Inspecting VLM Context¶
This notebook focuses on the role-aware mm.Context API: adding items, inspecting refs, rendering context views, generating provider payloads, and removing items.
Use this alongside mm-context-viz.ipynb: the visualization notebook compares media encoders, while this notebook explains the context object itself.
Sections:
- Setup — imports and sample data
- Role-aware adds —
system,developer, anduseritems - Inspection views —
ctx,print(ctx),print_tree(),to_md(),items(),refs - Provider payloads — OpenAI role-preserving messages and Gemini adaptation
- Ref lookup and removal —
get(ref)andremove(ref)
Dataset: ~/data/mmbench-tiny.
1. Setup¶
Import mm, notebook display helpers, and point DATA at the tiny multimodal sample directory.
from pathlib import Path
from pprint import pprint
import mm
from mm.notebook import render_messages
from IPython.display import HTML, Markdown
DATA = Path.home() / "data/mmbench-tiny"
!ls -lha ~/data/mmbench-tiny/
2. Role-Aware Adds¶
Context.add() is the primary prompt-building API. It returns an mm.Ref, which is a typed string handle you can store, pass around, inspect, use with get(ref), or remove later with remove(ref).
Each item has a role:
system— high-level behavior or policy instructionsdeveloper— application or tool-specific instructionsuser— the user request and attached media
Free-form strings can use any role. Media objects (Path, PIL.Image.Image) should use role="user".
ctx: mm.Context = mm.Context()
system_ref: mm.Ref = ctx.add(
"You are a concise multimodal analyst.",
role="system",
)
developer_ref: mm.Ref = ctx.add(
"Return concrete observations first; mention uncertainty last.",
role="developer",
)
prompt_ref: mm.Ref = ctx.add(
"Compare the car photo and invoice. What facts can you extract?",
role="user",
)
image_ref: mm.Ref = ctx.add(
DATA / "car.jpg",
role="user",
metadata={"note": "VW beetle", "source": "sample dataset"},
)
document_ref: mm.Ref = ctx.add(
DATA / "invoice.pdf",
role="user",
metadata={"type": "invoice", "vendor": "ACME Corp"},
)
ctx
3. Inspection Views¶
A Context is both a prompt builder and an inspection object. Use different views depending on what you want to understand:
| API | Best for |
|---|---|
ctx / ctx.render_html() |
Rich notebook view with native media, metadata, roles, and encoded parts |
print(ctx) |
Compact markdown ref table for logs and debugging |
ctx.print_tree() |
Terminal-friendly tree with insertion order and metadata |
ctx.to_md() |
Markdown table with locally extracted content previews |
ctx.items(), ctx.ref_ids(), ctx.refs |
Programmatic inspection and ref plumbing |
ctx.to_messages() + render_messages() |
Inspecting the final provider payload sent to a VLM |
Rich notebook view: ctx¶
Evaluating the context object in Jupyter renders native media, metadata badges, roles, refs, and collapsible encoded parts.
ctx
Compact ref table: print(ctx)¶
print(ctx) uses the markdown __repr__ from the Rust core. It is the fastest high-signal view when you just need session id, refs, roles, kinds, and sources.
print(ctx)
Tree view: ctx.print_tree()¶
The tree view is useful in terminals and notebooks when insertion order and per-item metadata matter. It is especially readable for agent-built contexts where items are added step by step.
ctx.print_tree()
Markdown content table: ctx.to_md()¶
to_md() includes the same refs and roles, plus locally extracted metadata-tier content. It is good for quick text previews, copying into notes, or checking what mm can extract without an LLM call.
Markdown(ctx.to_md())
Raw programmatic views: items(), ref_ids(), refs¶
Use these when wiring refs into tools, storing handles externally, or debugging exactly what the Rust-backed context is holding.
print("ref_ids:")
pprint(ctx.ref_ids())
print("\nrefs:")
pprint(ctx.refs)
print("\nitems:")
pprint(ctx.items())
4. Provider Payloads¶
to_messages() converts the context into model-provider payloads. This is the best way to inspect what a VLM will actually receive after refs and media encoders are resolved.
OpenAI messages¶
to_messages(format="openai") preserves system, developer, and user roles as separate chat messages. Rendering that payload makes it easy to inspect the exact content blocks a VLM will receive.
openai_messages = ctx.to_messages(format="openai")
pprint(openai_messages)
HTML(render_messages(openai_messages, title="OpenAI message payload"))
Gemini payload adaptation¶
Gemini does not use the same chat-role semantics for arbitrary system / developer turns, so mm folds those roles into labelled text parts and keeps one user payload.
gemini_messages = ctx.to_messages(format="gemini")
pprint(gemini_messages[:1])
HTML(render_messages(gemini_messages, title="Gemini-adapted payload"))
5. Ref Lookup and Removal¶
add() returns an mm.Ref object. Use refs to retrieve original Python objects with get(ref) or remove items from the context with remove(ref).
print("prompt_ref:", prompt_ref)
print("prompt text:", ctx.get(prompt_ref))
print("image_ref:", image_ref)
print("image path:", ctx.get(image_ref))
scratch: mm.Context = mm.Context()
keep_ref: mm.Ref = scratch.add("Keep this item", role="user")
remove_me: mm.Ref = scratch.add("Remove this item", role="user")
print("Before remove:")
print(scratch)
scratch.remove(remove_me)
print("After remove:")
print(scratch)