Vision API — How to use

Model in use: moondream (set via GEMMA_MODEL on the server).

Face matching disclaimer. Vision-language models guess from pixels; they are not biometric identity systems. Use match_label (yes / no / unclear) only for triage. For legal, access-control, or forensic use, use dedicated face-embedding software (e.g. ArcFace / InsightFace).

1. General vision (any prompt + images)

2. Reference vs up to 10 candidates (face-style workflow)

Upload one reference photo of a person, then 1–10 other photos. The server compares each candidate to the reference in a separate model call and returns match_label per index (yes, no, or unclear), plus the raw model text.
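The per-index labels described above lend themselves to simple triage. The sketch below groups candidate indexes by label. It assumes you have already pulled the match_label values out of the response into a plain list ordered by candidate index; how they are nested in the actual response is not shown here, and the function name group_by_label is hypothetical. Treating any unexpected value as unclear is a choice made for this sketch, not documented server behavior.

```python
def group_by_label(labels: list[str]) -> dict[str, list[int]]:
    """Group candidate indexes by match_label for manual triage.

    `labels[i]` is assumed to be the match_label for candidate i.
    """
    grouped: dict[str, list[int]] = {"yes": [], "no": [], "unclear": []}
    for index, label in enumerate(labels):
        normalized = label.strip().lower()
        if normalized not in grouped:
            # Assumption of this sketch: anything unexpected goes to review.
            normalized = "unclear"
        grouped[normalized].append(index)
    return grouped


if __name__ == "__main__":
    # Candidates 0 and 2 look like matches; candidate 3 needs a human look.
    print(group_by_label(["yes", "no", "yes", "unclear"]))
```

Keeping the unclear bucket separate fits the disclaimer: the labels are a first-pass filter, and anything ambiguous should go to a person or to dedicated face-embedding software.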

JSON — POST /v1/face/compare

{
  "reference_image_base64": "data:image/jpeg;base64,...",
  "candidate_images_base64": ["data:image/jpeg;base64,...", "..."],
  "max_tokens": 120,
  "temperature": 0.5,
  "think": false
}
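
As a minimal sketch, the request body above could be assembled and sent from Python using only the standard library. The helper names (to_data_url, build_compare_payload, post_compare), the base URL, and the timeout are assumptions for illustration; the field names and defaults are copied from the JSON example. The shape of the response is not specified here, so the sketch just returns the decoded JSON.

```python
import base64
import json
import urllib.request


def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL, as in the JSON example."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_compare_payload(reference: bytes, candidates: list[bytes],
                          max_tokens: int = 120, temperature: float = 0.5,
                          think: bool = False) -> dict:
    """Build the JSON body for POST /v1/face/compare."""
    # The workflow accepts one reference and 1-10 candidate images.
    if not 1 <= len(candidates) <= 10:
        raise ValueError("expected between 1 and 10 candidate images")
    return {
        "reference_image_base64": to_data_url(reference),
        "candidate_images_base64": [to_data_url(c) for c in candidates],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "think": think,
    }


def post_compare(base_url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST the payload and return the decoded JSON response (shape not assumed)."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/face/compare",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call might look like post_compare("http://localhost:8000", build_compare_payload(ref_bytes, [cand_bytes])), where the host and port are placeholders for wherever the server is running.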

Multipart — POST /v1/face/compare/upload

Form fields:

3. Try in the browser

Open /playground and use the Reference vs candidates panel at the top, or open Scalar (/scalar) and use “Try it” to call the JSON endpoints.