`mm.Context` API — Building and Inspecting VLM Context¶

This notebook focuses on the role-aware mm.Context API: adding items, inspecting refs, rendering context views, generating provider payloads, and removing items.

Use this alongside mm-context-viz.ipynb: the visualization notebook compares media encoders, while this notebook explains the context object itself.

Sections:

Setup — imports and sample data
Role-aware adds — system, developer, and user items
Inspection views — ctx, print(ctx), print_tree(), to_md(), items(), refs
Provider payloads — OpenAI role-preserving messages and Gemini adaptation
Ref lookup and removal — get(ref) and remove(ref)

Dataset: ~/data/mmbench-tiny.

1. Setup¶

Import mm, notebook display helpers, and point DATA at the tiny multimodal sample directory.

In [ ]:

Copied!





from pathlib import Path
from pprint import pprint

import mm
from mm.notebook import render_messages
from IPython.display import HTML, Markdown

DATA = Path.home() / "data/mmbench-tiny"
from pathlib import Path
from pprint import pprint

import mm
from mm.notebook import render_messages
from IPython.display import HTML, Markdown

DATA = Path.home() / "data/mmbench-tiny"

In [ ]:

Copied!

!ls -lha ~/data/mmbench-tiny/
!ls -lha ~/data/mmbench-tiny/

2. Role-Aware Adds¶

Context.add() is the primary prompt-building API. It returns an mm.Ref, which is a typed string handle you can store, pass around, inspect, use with get(ref), or remove later with remove(ref).

Each item has a role:

system — high-level behavior or policy instructions
developer — application or tool-specific instructions
user — the user request and attached media

Free-form strings can use any role. Media objects (Path, PIL.Image.Image) should use role="user".

In [ ]:

Copied!





ctx: mm.Context = mm.Context()

system_ref: mm.Ref = ctx.add(
    "You are a concise multimodal analyst.",
    role="system",
)
developer_ref: mm.Ref = ctx.add(
    "Return concrete observations first; mention uncertainty last.",
    role="developer",
)
prompt_ref: mm.Ref = ctx.add(
    "Compare the car photo and invoice. What facts can you extract?",
    role="user",
)
image_ref: mm.Ref = ctx.add(
    DATA / "car.jpg",
    role="user",
    metadata={"note": "VW beetle", "source": "sample dataset"},
)
document_ref: mm.Ref = ctx.add(
    DATA / "invoice.pdf",
    role="user",
    metadata={"type": "invoice", "vendor": "ACME Corp"},
)

ctx
ctx: mm.Context = mm.Context()

system_ref: mm.Ref = ctx.add(
    "You are a concise multimodal analyst.",
    role="system",
)
developer_ref: mm.Ref = ctx.add(
    "Return concrete observations first; mention uncertainty last.",
    role="developer",
)
prompt_ref: mm.Ref = ctx.add(
    "Compare the car photo and invoice. What facts can you extract?",
    role="user",
)
image_ref: mm.Ref = ctx.add(
    DATA / "car.jpg",
    role="user",
    metadata={"note": "VW beetle", "source": "sample dataset"},
)
document_ref: mm.Ref = ctx.add(
    DATA / "invoice.pdf",
    role="user",
    metadata={"type": "invoice", "vendor": "ACME Corp"},
)

ctx

3. Inspection Views¶

A Context is both a prompt builder and an inspection object. Use different views depending on what you want to understand:

API	Best for
`ctx` / `ctx.render_html()`	Rich notebook view with native media, metadata, roles, and encoded parts
`print(ctx)`	Compact markdown ref table for logs and debugging
`ctx.print_tree()`	Terminal-friendly tree with insertion order and metadata
`ctx.to_md()`	Markdown table with locally extracted content previews
`ctx.items()`, `ctx.ref_ids()`, `ctx.refs`	Programmatic inspection and ref plumbing
`ctx.to_messages()` + `render_messages()`	Inspecting the final provider payload sent to a VLM

Rich notebook view: `ctx`¶

Evaluating the context object in Jupyter renders native media, metadata badges, roles, refs, and collapsible encoded parts.

In [ ]:

Copied!

ctx
ctx

Compact ref table: `print(ctx)`¶

print(ctx) uses the markdown __repr__ from the Rust core. It is the fastest high-signal view when you just need session id, refs, roles, kinds, and sources.

In [ ]:

Copied!

print(ctx)
print(ctx)

Tree view: `ctx.print_tree()`¶

The tree view is useful in terminals and notebooks when insertion order and per-item metadata matter. It is especially readable for agent-built contexts where items are added step by step.

In [ ]:

Copied!

ctx.print_tree()
ctx.print_tree()

Markdown content table: `ctx.to_md()`¶

to_md() includes the same refs and roles, plus locally extracted metadata-tier content. It is good for quick text previews, copying into notes, or checking what mm can extract without an LLM call.

In [ ]:

Copied!

Markdown(ctx.to_md())
Markdown(ctx.to_md())

Raw programmatic views: `items()`, `ref_ids()`, `refs`¶

Use these when wiring refs into tools, storing handles externally, or debugging exactly what the Rust-backed context is holding.

In [ ]:

Copied!





print("ref_ids:")
pprint(ctx.ref_ids())

print("\nrefs:")
pprint(ctx.refs)

print("\nitems:")
pprint(ctx.items())
print("ref_ids:")
pprint(ctx.ref_ids())

print("\nrefs:")
pprint(ctx.refs)

print("\nitems:")
pprint(ctx.items())

4. Provider Payloads¶

to_messages() converts the context into model-provider payloads. This is the best way to inspect what a VLM will actually receive after refs and media encoders are resolved.

OpenAI messages¶

to_messages(format="openai") preserves system, developer, and user roles as separate chat messages. Rendering that payload makes it easy to inspect the exact content blocks a VLM will receive.

In [ ]:

Copied!

openai_messages = ctx.to_messages(format="openai")
pprint(openai_messages)

HTML(render_messages(openai_messages, title="OpenAI message payload"))
openai_messages = ctx.to_messages(format="openai")
pprint(openai_messages)

HTML(render_messages(openai_messages, title="OpenAI message payload"))

Gemini payload adaptation¶

Gemini does not use the same chat-role semantics for arbitrary system / developer turns, so mm folds those roles into labelled text parts and keeps one user payload.

In [ ]:

Copied!

gemini_messages = ctx.to_messages(format="gemini")
pprint(gemini_messages[:1])

HTML(render_messages(gemini_messages, title="Gemini-adapted payload"))
gemini_messages = ctx.to_messages(format="gemini")
pprint(gemini_messages[:1])

HTML(render_messages(gemini_messages, title="Gemini-adapted payload"))

5. Ref Lookup and Removal¶

add() returns an mm.Ref object. Use refs to retrieve original Python objects with get(ref) or remove items from the context with remove(ref).

In [ ]:

Copied!

print("prompt_ref:", prompt_ref)
print("prompt text:", ctx.get(prompt_ref))

print("image_ref:", image_ref)
print("image path:", ctx.get(image_ref))
print("prompt_ref:", prompt_ref)
print("prompt text:", ctx.get(prompt_ref))

print("image_ref:", image_ref)
print("image path:", ctx.get(image_ref))

In [ ]:

Copied!





scratch: mm.Context = mm.Context()
keep_ref: mm.Ref = scratch.add("Keep this item", role="user")
remove_me: mm.Ref = scratch.add("Remove this item", role="user")

print("Before remove:")
print(scratch)

scratch.remove(remove_me)

print("After remove:")
print(scratch)
scratch: mm.Context = mm.Context()
keep_ref: mm.Ref = scratch.add("Keep this item", role="user")
remove_me: mm.Ref = scratch.add("Remove this item", role="user")

print("Before remove:")
print(scratch)

scratch.remove(remove_me)

print("After remove:")
print(scratch)

mm.Context API — Building and Inspecting VLM Context¶

1. Setup¶

2. Role-Aware Adds¶

3. Inspection Views¶

Rich notebook view: ctx¶

Compact ref table: print(ctx)¶

Tree view: ctx.print_tree()¶

Markdown content table: ctx.to_md()¶

Raw programmatic views: items(), ref_ids(), refs¶

4. Provider Payloads¶

OpenAI messages¶

Gemini payload adaptation¶

5. Ref Lookup and Removal¶

`mm.Context` API — Building and Inspecting VLM Context¶

Rich notebook view: `ctx`¶

Compact ref table: `print(ctx)`¶

Tree view: `ctx.print_tree()`¶

Markdown content table: `ctx.to_md()`¶

Raw programmatic views: `items()`, `ref_ids()`, `refs`¶