Using mm with vlm.run gateway API¶
A walkthrough of mm: a CLI that turns images, videos, PDFs, and audio into text-based context you can pipe into any LLM workflow.
This notebook covers:
- Installing mm and using the hosted gateway profile
- mm find: list files in a directory, and mm wc: count files, bytes, and estimated tokens
- mm cat in fast and accurate modes on image, video, and PDF inputs, and in fast mode on audio
- mm grep: semantic search across a folder
Dataset: four public sample files (one per modality). Swap in your own: every command works the same way.
No local GPU or model server required, as VLM calls go to the hosted gateway.
Links:
- 🤗 Try mm in Hugging Face Spaces: https://huggingface.co/spaces/vlm-run/mm-ctx
- 📚 mm docs: https://vlm-run.github.io/mm/
📥 1. Download sample files¶
One file per modality: image, video, audio, PDF.
import os
import urllib.request
from pathlib import Path
DATA_DIR = Path("~/.mm/notebooks/data/samples").expanduser()
DATA_DIR.mkdir(parents=True, exist_ok=True)
FILES = {
    "car.jpg": "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/image.caption/car.jpg",
    "Timelapse.mp4": "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/video/Timelapse.mp4",
    "how_to_build_an_mvp.mp3": "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/audio.transcription/how_to_build_an_mvp.mp3",
    "wordpress-pdf-invoice-plugin-sample.pdf": "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/wordpress-pdf-invoice-plugin-sample.pdf",
}
for fname, url in FILES.items():
    dest = DATA_DIR / fname
    if dest.exists():
        print(f"Already have {dest}")
    else:
        print(f"Fetching {url}")
        urllib.request.urlretrieve(url, dest)
        print(f" → {dest}")
os.chdir(DATA_DIR)
IMAGE_PATH = str(DATA_DIR / "car.jpg")
VIDEO_PATH = str(DATA_DIR / "Timelapse.mp4")
AUDIO_PATH = str(DATA_DIR / "how_to_build_an_mvp.mp3")
PDF_PATH = str(DATA_DIR / "wordpress-pdf-invoice-plugin-sample.pdf")
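An interrupted download can leave a missing or zero-byte file that only surfaces later as a confusing mm error. The check below is a minimal sketch of a sanity pass over the sample directory; `find_bad_downloads` is a hypothetical helper written for this walkthrough, not part of mm or vlmrun.

```python
from pathlib import Path


def find_bad_downloads(directory, expected_names):
    """Return the expected file names that are missing or zero bytes long."""
    directory = Path(directory)
    bad = []
    for name in expected_names:
        path = directory / name
        # A file counts as bad if it does not exist or has no content.
        if not path.is_file() or path.stat().st_size == 0:
            bad.append(name)
    return bad
```

In this notebook, `find_bad_downloads(DATA_DIR, FILES)` should return an empty list once all four samples are in place. Deleting any file it reports and re-running the download cell fetches that file again.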
📦 2. Install vlmrun¶
The vlmrun SDK is used later in this notebook only to rasterize the PDF for an inline preview.
!pip install vlmrun --upgrade --quiet
🛠️ 3. Install mm¶
mm-ctx is the pip package; the CLI is exposed as mm.
!pip install mm-ctx --quiet
# Verify install and version
!which mm && mm --version
🌐 4. Use the built-in gateway profile¶
mm ships with reserved profiles for ollama, openrouter, and gateway. The gateway profile is pre-pointed at the hosted gateway, so we just confirm it's the active one (marked with ●).
!mm profile list
🔍 5. mm find and mm wc¶
- mm find: list files with their kind, size, extension, and dimensions.
- mm wc: summarize file count, bytes, and estimated lines/tokens.
# Tabular listing: kind, size, ext, dimensions
!mm find {IMAGE_PATH}
# Summary: file count, bytes, estimated lines/tokens
!mm wc {IMAGE_PATH}
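To cross-check the numbers these commands report, a rough standard-library equivalent can help. The sketch below guesses a coarse kind from each file extension and totals the byte counts. It is not how mm classifies files or estimates tokens; `guess_kind` and `summarize` are hypothetical names, and the kind labels are an assumption made for illustration.

```python
import mimetypes
from pathlib import Path


def guess_kind(path):
    """Guess a coarse kind (image, video, audio, document, other) from the extension."""
    mime, _ = mimetypes.guess_type(str(path))
    if mime is None:
        return "other"
    if mime == "application/pdf":
        return "document"
    top_level = mime.split("/", 1)[0]
    return top_level if top_level in {"image", "video", "audio"} else "other"


def summarize(directory):
    """Return (file_count, total_bytes, files_per_kind) for files directly in a directory."""
    counts = {}
    total_bytes = 0
    file_count = 0
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # Skip subdirectories; this sketch does not recurse.
        file_count += 1
        total_bytes += path.stat().st_size
        kind = guess_kind(path)
        counts[kind] = counts.get(kind, 0) + 1
    return file_count, total_bytes, counts
```

Calling `summarize(DATA_DIR)` on the sample folder should report four files, one per kind. The byte total can be compared with what mm wc prints for the same folder.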
🖼️ 6. mm cat on an image¶
mm cat extracts text context from a file. It supports two modes, which apply to every file type below:
- -m fast: a quick pass, optimized for speed.
- -m accurate: a heavier pass, optimized for richer, more structured output.
# Preview the image inline
from IPython.display import Image, display
display(Image(IMAGE_PATH))
# Fast mode
!mm cat {IMAGE_PATH} -m fast --verbose --no-cache
# Accurate mode
!mm cat {IMAGE_PATH} -m accurate --verbose --no-cache
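Outside a notebook, the same calls can be made from Python so the extracted text can be passed on to an LLM prompt. The sketch below uses only the flags shown in the cells above (`-m` and `--no-cache`). It assumes that mm cat writes the extracted context to standard output and that omitting `--no-cache` allows cached results; `--verbose` is left out so that any diagnostic output does not mix into the captured text. `build_cat_command` and `mm_cat` are hypothetical helper names.

```python
import subprocess

VALID_MODES = ("fast", "accurate")


def build_cat_command(path, mode="fast", use_cache=False):
    """Build the argv list for `mm cat`, using only flags shown in this notebook."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    command = ["mm", "cat", str(path), "-m", mode]
    if not use_cache:
        command.append("--no-cache")
    return command


def mm_cat(path, mode="fast", use_cache=False):
    """Run `mm cat` and return its standard output as text.

    Assumes the extracted context is written to stdout; raises
    subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        build_cat_command(path, mode, use_cache),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Keeping command construction separate from execution makes the argument list easy to inspect or log before any gateway call is made. For example, `mm_cat(IMAGE_PATH, "accurate")` would run the same command as the accurate-mode cell, minus `--verbose`.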
🎬 7. mm cat on a video¶
Same mm cat interface, different underlying pipeline.
# Preview the video inline
from IPython.display import Video, display
display(Video(VIDEO_PATH, embed=True))
# Fast mode
!mm cat {VIDEO_PATH} -m fast --verbose --no-cache
# Accurate mode
!mm cat {VIDEO_PATH} -m accurate --verbose --no-cache
📄 8. mm cat on a PDF¶
# Preview the first page by rasterizing with the vlmrun SDK helper
from IPython.display import display
from vlmrun.common.pdf import pdf_images
pages = list(pdf_images(Path(PDF_PATH), dpi=100))
print(f"PDF has {len(pages)} page(s)")
# Display the first page
display(pages[0].image)
# Fast mode
!mm cat {PDF_PATH} -m fast --verbose --no-cache
# Accurate mode
!mm cat {PDF_PATH} -m accurate --verbose --no-cache
🎧 9. mm cat on audio¶
# Inline HTML5 audio player
from IPython.display import Audio, display
display(Audio(AUDIO_PATH))
# Fast mode
!mm cat {AUDIO_PATH} -m fast --verbose --no-cache
🧭 10. mm grep — semantic search across a folder¶
mm grep runs a natural-language query against every file in a directory, using the active profile. This is the piece that's hardest to replicate with plain grep or find: it matches meaning, not substrings.
# Semantic search — checks each file for relevance to the query
!mm grep "minimum viable product" {DATA_DIR} -s --pre-index