Vision API — How to use

Model in use: moondream (set via GEMMA_MODEL on the server).

Face matching disclaimer. Vision-language models guess from pixels; they are not biometric identity systems. Treat match_label (yes / no / unclear) as a triage hint only. For legal, access-control, or forensic use, rely on dedicated face-embedding software (e.g. ArcFace / InsightFace).

1. General vision (any prompt + images)

POST /v1/vision: JSON body with prompt and images_base64 (a list of base64-encoded images).
POST /v1/vision/upload: multipart form with prompt plus the images file field, repeated once per image.
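
As a rough illustration of the JSON variant, here is a minimal Python sketch using only the standard library. Several things in it are assumptions and not confirmed by this page: the server address in BASE_URL, the helper names (to_data_url, build_vision_request, send), the choice to send images as data URLs (mirroring the face-compare example further down; bare base64 may also be accepted), and the idea that the response body is JSON.

```python
import base64
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your server's address


def to_data_url(image_bytes, mime="image/jpeg"):
    """Encode raw image bytes as a base64 data URL (assumed input format)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_vision_request(prompt, images, base_url=BASE_URL):
    """Build (but do not send) a POST /v1/vision request."""
    payload = {
        "prompt": prompt,
        "images_base64": [to_data_url(img) for img in images],
    }
    return urllib.request.Request(
        base_url + "/v1/vision",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120):
    """Send the request; assumes the server answers with a JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call might look like send(build_vision_request("Describe this image.", [open("photo.jpg", "rb").read()])), where photo.jpg is a placeholder file name.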

2. Reference vs up to 10 candidates (face-style workflow)

Upload one reference photo of a person, then 1–10 candidate photos. The server compares each candidate to the reference in its own model call and returns, for each candidate index, a match_label (yes, no, or unclear) plus the raw model text.
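
Since match_label is meant for triage, one possible way to sort results on the client is sketched below. The function name triage is hypothetical, and it takes a plain list of labels ordered by candidate index; pulling those labels out of the response is left to the caller, because the exact response layout is not shown on this page. Treating any unexpected value as unclear is a design choice of this sketch, not documented server behaviour.

```python
def triage(labels):
    """Group candidate indices by match_label.

    `labels` is a list of strings ordered by candidate index.
    Anything other than yes / no / unclear is filed under "unclear"
    so that it still gets a manual look.
    """
    groups = {"yes": [], "no": [], "unclear": []}
    for index, label in enumerate(labels):
        key = str(label).strip().lower()
        groups[key if key in groups else "unclear"].append(index)
    return groups
```

Candidates in the yes and unclear groups would then go to a human reviewer or a dedicated face-embedding tool rather than being accepted as matches.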

JSON — `POST /v1/face/compare`

{
  "reference_image_base64": "data:image/jpeg;base64,...",
  "candidate_images_base64": ["data:image/jpeg;base64,...", "..."],
  "max_tokens": 120,
  "temperature": 0.5,
  "think": false
}
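
The body above could be assembled in Python roughly as follows. This is a sketch under assumptions: BASE_URL, the helper names, and the image/jpeg MIME type are illustrative, and the client-side check of the 1–10 candidate limit simply mirrors the limit stated on this page.

```python
import base64
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your server's address
MAX_CANDIDATES = 10  # limit stated for this endpoint


def to_data_url(image_bytes, mime="image/jpeg"):
    """Encode raw image bytes as a base64 data URL, as in the example body."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_compare_request(reference, candidates, base_url=BASE_URL,
                          max_tokens=120, temperature=0.5):
    """Build (but do not send) a POST /v1/face/compare request."""
    if not 1 <= len(candidates) <= MAX_CANDIDATES:
        raise ValueError(f"expected 1 to {MAX_CANDIDATES} candidates, got {len(candidates)}")
    payload = {
        "reference_image_base64": to_data_url(reference),
        "candidate_images_base64": [to_data_url(c) for c in candidates],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "think": False,
    }
    return urllib.request.Request(
        base_url + "/v1/face/compare",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen to send it.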

Multipart — `POST /v1/face/compare/upload`

Form fields:

reference — single image file.
candidates — repeat this field for each candidate file (1–10 files).
Optional: max_tokens, temperature, top_p, top_k, think (ignored for Moondream).
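
The form fields above can be sent from Python without third-party packages by encoding the multipart body by hand. The sketch below is illustrative only: BASE_URL, the function names, the placeholder file names (reference.jpg, candidate_0.jpg, ...), and the image/jpeg content type are all assumptions.

```python
import uuid
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your server's address


def encode_multipart(fields, files):
    """Encode text fields and file parts as multipart/form-data.

    fields: list of (name, value); files: list of (name, filename, bytes).
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields:
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        chunks.append(head.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    for name, filename, data in files:
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: image/jpeg\r\n\r\n"  # assumption: JPEG input
        )
        chunks.append(head.encode("utf-8") + data + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("ascii"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_compare_upload(reference, candidates, options=None, base_url=BASE_URL):
    """Build (but do not send) a POST /v1/face/compare/upload request."""
    if not 1 <= len(candidates) <= 10:
        raise ValueError(f"expected 1 to 10 candidates, got {len(candidates)}")
    files = [("reference", "reference.jpg", reference)]
    # The candidates field is repeated once per file.
    files += [("candidates", f"candidate_{i}.jpg", data)
              for i, data in enumerate(candidates)]
    fields = list((options or {}).items())
    body, content_type = encode_multipart(fields, files)
    return urllib.request.Request(
        base_url + "/v1/face/compare/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Optional settings go in the options dict, for example build_compare_upload(ref_bytes, [cand_bytes], {"max_tokens": 120, "temperature": 0.5}).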

3. Try in the browser

Open /playground and use the Reference vs candidates panel at the top, or open Scalar (/scalar) and use “Try it” for the JSON endpoints.