Using `mm` with Ollama and Local Models¶

A walkthrough of mm — fast, multimodal context for agents focused on getting text-based context out of images, videos, and audio so you can pipe them into any LLM workflow.

This notebook covers:

Setting up a local VLM (Gemma 4 via Ollama) and confirming it works end-to-end
Installing mm and pointing it at the local VLM
The core commands: mm find, mm wc, mm cat, mm grep
A few ways to compose Gemma 4 with mm-style prompting for your own pipelines

Dataset: we've provided a public dataset comprising a mixture of multimodal files - image, video and pdf. You can swap in your own files - any document will work and the commands would all still run.

Make sure the runtime is set to GPU (T4) — Runtime → Change runtime type.

0. GPU check¶

mm cat -m accurate sends images to a local VLM. On CPU that round-trip takes minutes per image instead of seconds, so verify the runtime has a GPU before going further.

If no GPU is detected: Runtime → Change runtime type → T4 GPU, then re-run from the top.

In [ ]:

Copied!





import subprocess

try:
    gpu = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    if gpu.returncode == 0:
        print("✅ GPU detected — Ollama will use it automatically.\n")
        print(gpu.stdout)
    else:
        print("⚠️  No GPU detected. You're on CPU — accurate-mode VLM calls will be SLOW.")
        print("   Fix: Runtime → Change runtime type → T4 GPU, then re-run this notebook.")
except Exception as e:
    print(f"Error occured: {e}")
import subprocess

try:
    gpu = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    if gpu.returncode == 0:
        print("✅ GPU detected — Ollama will use it automatically.\n")
        print(gpu.stdout)
    else:
        print("⚠️  No GPU detected. You're on CPU — accurate-mode VLM calls will be SLOW.")
        print("   Fix: Runtime → Change runtime type → T4 GPU, then re-run this notebook.")
except Exception as e:
    print(f"Error occured: {e}")

1. Point at your files and pick a model¶

Set the paths to the image and video you just uploaded, and pick a Gemma 4 tag to self-host.

Gemma 4 (released April 2, 2026) is Google DeepMind's latest open multimodal family, built from the same research as Gemini 3, with native text + image input and variable aspect ratio / resolution support.

Tag	Size on disk	Context	Fits on T4 (15 GB)?
`gemma4:e2b`	7.2 GB	128K	✅ yes (lightest)
`gemma4:e4b` (alias `gemma4:latest`)	9.6 GB	128K	✅ yes (default)
`gemma4:26b` (MoE, 4B active)	18 GB	256K	❌ no
`gemma4:31b` (dense)	20 GB	256K	❌ no

On a Colab T4, gemma4:e4b is the sweet spot — best quality that still fits. Drop to gemma4:e2b if you hit OOM.

In [ ]:

Copied!





import os
import tarfile
import urllib.request
from pathlib import Path

# The exact tag string Ollama uses (must match the NAME column from `ollama list`).
# MODEL = "gemma4:e4b"
MODEL = "gemma4:e2b"  # lighter fallback

# Preload sample files (image, video, audio, PDF) from the mmbench-tiny bundle.
# `mm` resizes images/videos internally using sensible defaults — no pre-processing
# needed. Override with `--encode.strategy_opts max_width=<val>` when you want more detail.
DATA_URL = "https://storage.googleapis.com/vlm-data-public-prod/mmbench/mmbench-tiny.tar.gz"
DATA_DIR = Path("~/.mm/notebooks/data").expanduser()
MMBENCH_DIR = DATA_DIR / "mmbench-tiny"

if MMBENCH_DIR.is_dir():
    print(f"Data already present at {MMBENCH_DIR}")
else:
    DATA_DIR.mkdir(parents=True, exist_ok=True)
    tar_path = DATA_DIR / "mmbench-tiny.tar.gz"
    print(f"Fetching {DATA_URL}")
    urllib.request.urlretrieve(DATA_URL, tar_path)
    with tarfile.open(tar_path) as tf:
        # Skip macOS AppleDouble resource-fork shadows (`._*`) shipped in the tarball —
        # they are tiny metadata stubs that break mm grep's indexer.
        members = [m for m in tf.getmembers() if not Path(m.name).name.startswith("._")]
        tf.extractall(DATA_DIR, members=members)
    tar_path.unlink()
    print(f"Extracted to {MMBENCH_DIR}")

os.chdir(DATA_DIR)

IMAGE_PATH = str(MMBENCH_DIR / "1-vqa-car.jpg")
VIDEO_PATH = str(MMBENCH_DIR / "bakery.mp4")
AUDIO_PATH = str(MMBENCH_DIR / "how_to_build_an_mvp.mp3")
PDF_PATH = str(MMBENCH_DIR / "BillDownload-8pg.pdf")
import os
import tarfile
import urllib.request
from pathlib import Path

# The exact tag string Ollama uses (must match the NAME column from `ollama list`).
# MODEL = "gemma4:e4b"
MODEL = "gemma4:e2b"  # lighter fallback

# Preload sample files (image, video, audio, PDF) from the mmbench-tiny bundle.
# `mm` resizes images/videos internally using sensible defaults — no pre-processing
# needed. Override with `--encode.strategy_opts max_width=<val>` when you want more detail.
DATA_URL = "https://storage.googleapis.com/vlm-data-public-prod/mmbench/mmbench-tiny.tar.gz"
DATA_DIR = Path("~/.mm/notebooks/data").expanduser()
MMBENCH_DIR = DATA_DIR / "mmbench-tiny"

if MMBENCH_DIR.is_dir():
    print(f"Data already present at {MMBENCH_DIR}")
else:
    DATA_DIR.mkdir(parents=True, exist_ok=True)
    tar_path = DATA_DIR / "mmbench-tiny.tar.gz"
    print(f"Fetching {DATA_URL}")
    urllib.request.urlretrieve(DATA_URL, tar_path)
    with tarfile.open(tar_path) as tf:
        # Skip macOS AppleDouble resource-fork shadows (`._*`) shipped in the tarball —
        # they are tiny metadata stubs that break mm grep's indexer.
        members = [m for m in tf.getmembers() if not Path(m.name).name.startswith("._")]
        tf.extractall(DATA_DIR, members=members)
    tar_path.unlink()
    print(f"Extracted to {MMBENCH_DIR}")

os.chdir(DATA_DIR)

IMAGE_PATH = str(MMBENCH_DIR / "1-vqa-car.jpg")
VIDEO_PATH = str(MMBENCH_DIR / "bakery.mp4")
AUDIO_PATH = str(MMBENCH_DIR / "how_to_build_an_mvp.mp3")
PDF_PATH = str(MMBENCH_DIR / "BillDownload-8pg.pdf")

2. Preview the image and video¶

In [ ]:

Copied!

from IPython.display import Image, display

display(Image(IMAGE_PATH))
from IPython.display import Image, display

display(Image(IMAGE_PATH))

In [ ]:

Copied!

from IPython.display import Video, display

display(Video(VIDEO_PATH, embed=True))
from IPython.display import Video, display

display(Video(VIDEO_PATH, embed=True))

3. Spin up Ollama and pull Gemma 4¶

mm's accurate-mode operations need a VLM on a live server. We'll self-host Gemma 4 with Ollama.

Ollama v0.20.0+ is required for Gemma 4 (landed April 3, 2026)
zstd is needed to extract Ollama's tarball; pciutils silences the GPU-detection warning from the installer
Colab has no systemd, so we start ollama serve with nohup so it keeps running across cells

In [ ]:

Copied!





# ─── Install Ollama + start server (idempotent; safe to re-run) ─────────────
!dpkg -s zstd pciutils >/dev/null 2>&1 || apt-get install -y zstd pciutils
!which ollama >/dev/null 2>&1 || curl -fsSL https://ollama.com/install.sh | sh
!pgrep -x ollama >/dev/null 2>&1 || (nohup ollama serve > /tmp/ollama.log 2>&1 &)
!sleep 3
!ollama --version
# ─── Install Ollama + start server (idempotent; safe to re-run) ─────────────
!dpkg -s zstd pciutils >/dev/null 2>&1 || apt-get install -y zstd pciutils
!which ollama >/dev/null 2>&1 || curl -fsSL https://ollama.com/install.sh | sh
!pgrep -x ollama >/dev/null 2>&1 || (nohup ollama serve > /tmp/ollama.log 2>&1 &)
!sleep 3
!ollama --version

In [ ]:

Copied!

# Pull Gemma 4 (~7.2 GB the first time; cached afterwards)
!ollama pull {MODEL}
!ollama list  # confirm the NAME column matches MODEL exactly
# Pull Gemma 4 (~7.2 GB the first time; cached afterwards)
!ollama pull {MODEL}
!ollama list  # confirm the NAME column matches MODEL exactly

4. Sanity check: Gemma 4 out of the box¶

Before plugging mm in, let's confirm Gemma 4 actually works on our image — just a plain VLM call against the Ollama server, image in, caption out. This is the simplest possible sanity check: if this works, mm's accurate mode will work too, because mm talks to the same endpoint.

In [ ]:

Copied!





# ─── Gemma 4 VQA → image + caption ──────────────────────────────────────────
# Ask Gemma an open-ended question about the image. Render the image on the
# left and the answer as a caption on the right.
import base64
import io
import requests
from IPython.display import HTML, display as ipy_display
from PIL import Image as _PILImage

OLLAMA_URL = "http://localhost:11434"

QUESTION = (
    "Describe this image in 2-4 sentences. What's the setup, what's in it, "
    "and what does it look like it was made for?"
)

_img = _PILImage.open(IMAGE_PATH).convert("RGB")
buf = io.BytesIO()
_img.save(buf, format="JPEG", quality=90)
img_b64 = base64.b64encode(buf.getvalue()).decode()

resp = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={
        "model": MODEL,
        "prompt": QUESTION,
        "images": [img_b64],
        "stream": False,
        "options": {"temperature": 0.2, "num_predict": 512},
    },
    timeout=180,
).json()
answer = resp["response"].strip()

# ─── Render: image left, caption right ──────────────────────────────────────
img_src = f"data:image/jpeg;base64,{img_b64}"

html = f"""
<div style='display:flex; flex-wrap:wrap; gap:16px; align-items:flex-start'>
  <div style='flex:2 1 520px; min-width:360px; text-align:center'>
    <img src='{img_src}' style='max-width:100%; border-radius:6px'>
    <div style='font-size:12px; color:#666; margin-top:6px'>Input image</div>
  </div>
  <div style='flex:1 1 320px; min-width:280px; max-width:480px;
              background:#f7f7f9; border-radius:8px; padding:14px 18px;
              font-family:-apple-system,BlinkMacSystemFont,sans-serif; font-size:14px;
              line-height:1.5; color:#2e3138'>
    <div style='font-size:11px; letter-spacing:0.06em; text-transform:uppercase;
                color:#888; margin-bottom:6px'>Question</div>
    <div style='margin-bottom:12px; font-style:italic'>{QUESTION}</div>
    <div style='font-size:11px; letter-spacing:0.06em; text-transform:uppercase;
                color:#888; margin-bottom:6px'>Gemma 4 ({MODEL})</div>
    <div>{answer}</div>
  </div>
</div>
"""
ipy_display(HTML(html))
# ─── Gemma 4 VQA → image + caption ──────────────────────────────────────────
# Ask Gemma an open-ended question about the image. Render the image on the
# left and the answer as a caption on the right.
import base64
import io
import requests
from IPython.display import HTML, display as ipy_display
from PIL import Image as _PILImage

OLLAMA_URL = "http://localhost:11434"

QUESTION = (
    "Describe this image in 2-4 sentences. What's the setup, what's in it, "
    "and what does it look like it was made for?"
)

_img = _PILImage.open(IMAGE_PATH).convert("RGB")
buf = io.BytesIO()
_img.save(buf, format="JPEG", quality=90)
img_b64 = base64.b64encode(buf.getvalue()).decode()

resp = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={
        "model": MODEL,
        "prompt": QUESTION,
        "images": [img_b64],
        "stream": False,
        "options": {"temperature": 0.2, "num_predict": 512},
    },
    timeout=180,
).json()
answer = resp["response"].strip()

# ─── Render: image left, caption right ──────────────────────────────────────
img_src = f"data:image/jpeg;base64,{img_b64}"

html = f"""
<div style='display:flex; flex-wrap:wrap; gap:16px; align-items:flex-start'>
  <div style='flex:2 1 520px; min-width:360px; text-align:center'>
    <img src='{img_src}' style='max-width:100%; border-radius:6px'>
    <div style='font-size:12px; color:#666; margin-top:6px'>Input image</div>
  </div>
  <div style='flex:1 1 320px; min-width:280px; max-width:480px;
              background:#f7f7f9; border-radius:8px; padding:14px 18px;
              font-family:-apple-system,BlinkMacSystemFont,sans-serif; font-size:14px;
              line-height:1.5; color:#2e3138'>
    <div style='font-size:11px; letter-spacing:0.06em; text-transform:uppercase;
                color:#888; margin-bottom:6px'>Question</div>
    <div style='margin-bottom:12px; font-style:italic'>{QUESTION}</div>
    <div style='font-size:11px; letter-spacing:0.06em; text-transform:uppercase;
                color:#888; margin-bottom:6px'>Gemma 4 ({MODEL})</div>
    <div>{answer}</div>
  </div>
</div>
"""
ipy_display(HTML(html))

5. Install `mm`¶

The official installer drops the binary in ~/.local/bin.

In [ ]:

Copied!

!pip install mm-ctx
!pip install mm-ctx

In [ ]:

Copied!

# Verify install and version
!which mm && mm --version
# Verify install and version
!which mm && mm --version

6. Point `mm` at the local Ollama server¶

mm ships with three reserved profiles: ollama, gemini, and vlmrun. We update the ollama profile to point at our local server and the Gemma 4 model we just pulled, then activate it.

In [ ]:

Copied!

!mm profile update ollama --base-url http://localhost:11434/v1 --model {MODEL}
!mm profile use ollama
!mm profile list  # active profile is marked with ●
!mm profile update ollama --base-url http://localhost:11434/v1 --model {MODEL}
!mm profile use ollama
!mm profile list  # active profile is marked with ●

7. `mm find` and `mm wc` — metadata, no VLM¶

These commands work purely on file metadata — no model call, no GPU use.

mm find — tabular listing: kind, size, extension, dimensions
mm wc — quick summary: file count, bytes, estimated lines/tokens

In [ ]:

Copied!

!mm cat {IMAGE_PATH} -m accurate --verbose --no-cache
!mm cat {IMAGE_PATH} -m accurate --verbose --no-cache

In [ ]:

Copied!

!mm cat {IMAGE_PATH} -m accurate -p resize --verbose --no-cache
!mm cat {IMAGE_PATH} -m accurate -p resize --verbose --no-cache

In [ ]:

Copied!

!mm cat --help
!mm cat --help

In [ ]:

Copied!

!mm cat {IMAGE_PATH} --encode.strategy resize --encode.strategy_opts max_width=800 -m accurate --verbose --no-cache
!mm cat {IMAGE_PATH} --encode.strategy resize --encode.strategy_opts max_width=800 -m accurate --verbose --no-cache

In [ ]:

Copied!

!mm cat --list-pipelines
!mm cat --list-pipelines

In [ ]:

Copied!

# Tabular listing: kind, size, ext, dimensions, etc.
!mm find {IMAGE_PATH}
# Tabular listing: kind, size, ext, dimensions, etc.
!mm find {IMAGE_PATH}

In [ ]:

Copied!

# Quick summary: file count, bytes, estimated lines/tokens
!mm wc {IMAGE_PATH}
# Quick summary: file count, bytes, estimated lines/tokens
!mm wc {IMAGE_PATH}

8. `mm cat` on an image¶

mm cat extracts text context from a file. It has two modes:

-m fast — heuristic-only, no VLM call (quick metadata summary)
-m accurate — sends the file to the configured VLM for rich description

Fast mode returns in milliseconds; accurate mode takes a few seconds per image on a T4.

In [ ]:

Copied!

# Fast mode: no VLM call, just metadata
!mm cat {IMAGE_PATH} -m fast --verbose --no-cache
# Fast mode: no VLM call, just metadata
!mm cat {IMAGE_PATH} -m fast --verbose --no-cache

In [ ]:

Copied!

# Accurate mode: sends the image to Gemma 4 via Ollama
!mm cat {IMAGE_PATH} -m accurate --verbose --no-cache
# Accurate mode: sends the image to Gemma 4 via Ollama
!mm cat {IMAGE_PATH} -m accurate --verbose --no-cache

9. `mm cat` on a video¶

For videos, mm samples frames, builds a mosaic, and feeds it to the VLM. Same two modes as images.

In [ ]:

Copied!

# Fast mode: no VLM call
!mm cat {VIDEO_PATH} -m fast --verbose --no-cache
# Fast mode: no VLM call
!mm cat {VIDEO_PATH} -m fast --verbose --no-cache

In [ ]:

Copied!

# Accurate mode: mosaic → Gemma 4
!mm cat {VIDEO_PATH} -m accurate --verbose --no-cache
# Accurate mode: mosaic → Gemma 4
!mm cat {VIDEO_PATH} -m accurate --verbose --no-cache

10. `mm grep` — semantic search across a folder¶

mm grep runs a natural-language query against every file in a directory, using the active VLM profile. This is the piece that's hardest to replicate with plain grep or find: matching meaning rather than substrings.

In [ ]:

Copied!

!mm grep "invoice" {MMBENCH_DIR} -s --pre-index
!mm grep "invoice" {MMBENCH_DIR} -s --pre-index

In [ ]:

Copied!

!ls {MMBENCH_DIR}
!ls {MMBENCH_DIR}

In [ ]:

Copied!

!mm grep --help
!mm grep --help

In [ ]:

Copied!





# Run 1 — includes cold model load
!time mm cat {IMAGE_PATH} -m accurate --no-cache

# Run 2 — model is warm, pure inference
!time mm cat {IMAGE_PATH} -m accurate --no-cache

# Check if the model is currently loaded and how much VRAM it's using
!curl -s http://localhost:11434/api/ps | python -m json.tool
# Run 1 — includes cold model load
!time mm cat {IMAGE_PATH} -m accurate --no-cache

# Run 2 — model is warm, pure inference
!time mm cat {IMAGE_PATH} -m accurate --no-cache

# Check if the model is currently loaded and how much VRAM it's using
!curl -s http://localhost:11434/api/ps | python -m json.tool

In [ ]:

Copied!





# ─── Benchmarking helper (with input dimensions) ────────────────────────────
import subprocess
import time
import os
from PIL import Image


def probe_dims(path):
    """Return (width, height, duration_s) for images/videos. Duration is None for images."""
    ext = os.path.splitext(path)[1].lower()
    if ext in (".jpg", ".jpeg", ".png", ".webp"):
        with Image.open(path) as im:
            return im.width, im.height, None
    else:
        probe = subprocess.run(
            [
                "ffprobe",
                "-v",
                "error",
                "-select_streams",
                "v:0",
                "-show_entries",
                "stream=width,height,duration",
                "-of",
                "default=noprint_wrappers=1:nokey=1",
                path,
            ],
            capture_output=True,
            text=True,
        )
        lines = probe.stdout.strip().splitlines()
        w, h = int(lines[0]), int(lines[1])
        dur = float(lines[2]) if len(lines) > 2 else None
        return w, h, dur


def benchmark_mm(path, label=None, model="gemma4:e4b"):
    label = label or os.path.basename(path)
    w, h, dur = probe_dims(path)
    size_mb = os.path.getsize(path) / 1e6
    rows = []

    for mode in ("fast", "accurate"):
        t0 = time.time()
        proc = subprocess.run(
            ["mm", "cat", path, "-m", mode, "--no-cache"],
            capture_output=True,
            text=True,
        )
        wall = time.time() - t0
        output = proc.stdout.strip()
        n_chars = len(output)
        est_tokens = n_chars / 4
        tok_per_s = est_tokens / wall if wall > 0 and mode == "accurate" else None

        rows.append(
            {
                "input": label,
                "dims": f"{w}x{h}",
                "duration_s": round(dur, 1) if dur is not None else None,
                "pixels_M": round(w * h / 1e6, 2),
                "size_MB": round(size_mb, 2),
                "mode": mode,
                "wall_s": round(wall, 2),
                "est_tokens": round(est_tokens),
                "tok_per_s": round(tok_per_s, 1) if tok_per_s else None,
            }
        )
        print(
            f"  {mode:9s} → {wall:5.1f}s  ({n_chars} chars out"
            + (f", ~{tok_per_s:.1f} tok/s)" if tok_per_s else ")")
        )

    return rows
# ─── Benchmarking helper (with input dimensions) ────────────────────────────
import subprocess
import time
import os
from PIL import Image


def probe_dims(path):
    """Return (width, height, duration_s) for images/videos. Duration is None for images."""
    ext = os.path.splitext(path)[1].lower()
    if ext in (".jpg", ".jpeg", ".png", ".webp"):
        with Image.open(path) as im:
            return im.width, im.height, None
    else:
        probe = subprocess.run(
            [
                "ffprobe",
                "-v",
                "error",
                "-select_streams",
                "v:0",
                "-show_entries",
                "stream=width,height,duration",
                "-of",
                "default=noprint_wrappers=1:nokey=1",
                path,
            ],
            capture_output=True,
            text=True,
        )
        lines = probe.stdout.strip().splitlines()
        w, h = int(lines[0]), int(lines[1])
        dur = float(lines[2]) if len(lines) > 2 else None
        return w, h, dur


def benchmark_mm(path, label=None, model="gemma4:e4b"):
    label = label or os.path.basename(path)
    w, h, dur = probe_dims(path)
    size_mb = os.path.getsize(path) / 1e6
    rows = []

    for mode in ("fast", "accurate"):
        t0 = time.time()
        proc = subprocess.run(
            ["mm", "cat", path, "-m", mode, "--no-cache"],
            capture_output=True,
            text=True,
        )
        wall = time.time() - t0
        output = proc.stdout.strip()
        n_chars = len(output)
        est_tokens = n_chars / 4
        tok_per_s = est_tokens / wall if wall > 0 and mode == "accurate" else None

        rows.append(
            {
                "input": label,
                "dims": f"{w}x{h}",
                "duration_s": round(dur, 1) if dur is not None else None,
                "pixels_M": round(w * h / 1e6, 2),
                "size_MB": round(size_mb, 2),
                "mode": mode,
                "wall_s": round(wall, 2),
                "est_tokens": round(est_tokens),
                "tok_per_s": round(tok_per_s, 1) if tok_per_s else None,
            }
        )
        print(
            f"  {mode:9s} → {wall:5.1f}s  ({n_chars} chars out"
            + (f", ~{tok_per_s:.1f} tok/s)" if tok_per_s else ")")
        )

    return rows

In [ ]:

Copied!





# ─── Run the benchmark ──────────────────────────────────────────────────────
# IMAGE_PATH and VIDEO_PATH resolve to the mmbench-tiny files downloaded above.
inputs = [
    (IMAGE_PATH, "car (image)"),
    (VIDEO_PATH, "bakery (video)"),
]

all_rows = []
for path, label in inputs:
    print(f"\n📊 {label}  ({path})")
    all_rows.extend(benchmark_mm(path, label))
# ─── Run the benchmark ──────────────────────────────────────────────────────
# IMAGE_PATH and VIDEO_PATH resolve to the mmbench-tiny files downloaded above.
inputs = [
    (IMAGE_PATH, "car (image)"),
    (VIDEO_PATH, "bakery (video)"),
]

all_rows = []
for path, label in inputs:
    print(f"\n📊 {label}  ({path})")
    all_rows.extend(benchmark_mm(path, label))

In [ ]:

Copied!





# ─── Summary table ──────────────────────────────────────────────────────────
import pandas as pd

df = pd.DataFrame(all_rows)

# Input-level metadata (same across fast/accurate rows, so just take first)
meta = df.groupby("input", sort=False)[["dims", "duration_s", "pixels_M", "size_MB"]].first()

# Per-mode metrics, pivoted so each input is a row
perf = df.pivot(index="input", columns="mode", values=["wall_s", "est_tokens", "tok_per_s"])
perf.columns = [f"{metric}_{mode}" for metric, mode in perf.columns]

summary = meta.join(perf)
summary = summary.reindex([label for _, label in inputs])  # preserve input order
print(summary.to_string())
# ─── Summary table ──────────────────────────────────────────────────────────
import pandas as pd

df = pd.DataFrame(all_rows)

# Input-level metadata (same across fast/accurate rows, so just take first)
meta = df.groupby("input", sort=False)[["dims", "duration_s", "pixels_M", "size_MB"]].first()

# Per-mode metrics, pivoted so each input is a row
perf = df.pivot(index="input", columns="mode", values=["wall_s", "est_tokens", "tok_per_s"])
perf.columns = [f"{metric}_{mode}" for metric, mode in perf.columns]

summary = meta.join(perf)
summary = summary.reindex([label for _, label in inputs])  # preserve input order
print(summary.to_string())

In [ ]:

Copied!





# ─── Accurate-mode throughput only ──────────────────────────────────────────
acc = df[df["mode"] == "accurate"][
    ["input", "dims", "duration_s", "pixels_M", "size_MB", "wall_s", "est_tokens", "tok_per_s"]
]
print(acc.to_string(index=False))
print(f"\nMedian accurate-mode throughput: {acc['tok_per_s'].median():.1f} tok/s")
print(f"Mean accurate-mode throughput:   {acc['tok_per_s'].mean():.1f} tok/s")
# ─── Accurate-mode throughput only ──────────────────────────────────────────
acc = df[df["mode"] == "accurate"][
    ["input", "dims", "duration_s", "pixels_M", "size_MB", "wall_s", "est_tokens", "tok_per_s"]
]
print(acc.to_string(index=False))
print(f"\nMedian accurate-mode throughput: {acc['tok_per_s'].median():.1f} tok/s")
print(f"Mean accurate-mode throughput:   {acc['tok_per_s'].mean():.1f} tok/s")

Using mm with Ollama and Local Models¶

0. GPU check¶

1. Point at your files and pick a model¶

2. Preview the image and video¶

3. Spin up Ollama and pull Gemma 4¶

4. Sanity check: Gemma 4 out of the box¶

5. Install mm¶

6. Point mm at the local Ollama server¶

7. mm find and mm wc — metadata, no VLM¶

8. mm cat on an image¶

9. mm cat on a video¶

10. mm grep — semantic search across a folder¶

Using `mm` with Ollama and Local Models¶

5. Install `mm`¶

6. Point `mm` at the local Ollama server¶

7. `mm find` and `mm wc` — metadata, no VLM¶

8. `mm cat` on an image¶

9. `mm cat` on a video¶

10. `mm grep` — semantic search across a folder¶