API Documentation

Prompt for AI

Paste this prompt into Claude, ChatGPT, or any AI assistant to have it implement the Vibe Earning API integration for you.

You are integrating with the Vibe Earning LLM Grid API — a distributed AI inference service.
The API lets you submit prompts to open-source LLMs (llama3, mistral, gemma, etc.) running on volunteer hardware.

## Authentication
All requests require a Bearer token in the Authorization header:
  Authorization: Bearer YOUR_API_KEY

## Base URL
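  https://www.llmondemand.com/api/v1
  Endpoint paths below already include the /v1 segment, so POST /v1/jobs resolves to https://www.llmondemand.com/api/v1/jobs.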

## Submit an Inference Job
POST /v1/jobs
Content-Type: application/json

Body fields:
  model        (string, required)  — Ollama model name, e.g. "llama3:8b", "mistral:7b", "gemma:2b"
  prompt       (string, required)  — The prompt to run
  model_match  (string, optional)  — "exact" (default) or "family" to allow compatible model variants
  tag          (string, optional)  — Label for grouping usage stats (max 64 chars)
  webhook_url  (string, optional)  — URL to POST the completed result to (async callback)
  priority     (integer, optional) — Higher = processed sooner; defaults to your subscription's priority boost when omitted
  timeout_seconds (integer, optional) — Auto-cancelled if still claimed/running this long after being claimed; defaults to 600 (10 minutes)

Success response (201):
{
  "job_id": "uuid",
  "status": "pending",
  "created_at": "2026-06-20T12:00:00Z"
}

Jobs are always accepted (201) as long as your API key is active — quota never blocks
submission, it only pauses processing (see "Token / Prompt Limits" below).

Error responses:
  422 — validation error (missing model/prompt, unsupported model, etc.)

## Poll Job Status
GET /v1/jobs/:id

Response when completed:
{
  "job_id": "uuid",
  "status": "completed",
  "output": "The model's response text...",
  "output_tokens": 142,
  "input_tokens": 34,
  "duration_ms": 8200,
  "completed_at": "2026-06-20T12:01:23Z"
}

Statuses: pending → claimed → running → completed | failed

## Webhook Callback (optional)
If you supply webhook_url, the API will POST to it on completion with the same payload as GET /v1/jobs/:id.

Every webhook request includes an X-Vibe-Signature header for verification:
  X-Vibe-Signature: sha256=<hex digest>

The signature is HMAC-SHA256 of the raw JSON body, keyed with your API key.
Verify it on your server before trusting the payload:

  Ruby:
    expected = "sha256=" + OpenSSL::HMAC.hexdigest("SHA256", YOUR_API_KEY, request.raw_post)
    halt 401 unless Rack::Utils.secure_compare(expected, request.env["HTTP_X_VIBE_SIGNATURE"].to_s)

  Python:
    import hmac, hashlib
    expected = "sha256=" + hmac.new(api_key.encode(), request.data, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, request.headers.get("X-Vibe-Signature", "")):
        abort(401)

  Node.js:
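    const crypto = require("crypto");
    const sig = Buffer.from(req.headers["x-vibe-signature"] || "");
    const expected = Buffer.from("sha256=" + crypto.createHmac("sha256", API_KEY).update(rawBody).digest("hex"));
    // timingSafeEqual throws if the buffers differ in length, so compare lengths first
    if (sig.length !== expected.length || !crypto.timingSafeEqual(expected, sig)) return res.sendStatus(401);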

Always use a constant-time comparison to prevent timing attacks.
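
To exercise a webhook handler locally before any real callback arrives, it can help to produce a correctly signed test request yourself. The sketch below follows the signing rule described above (HMAC-SHA256 of the raw body, keyed with the API key, prefixed with "sha256="). The function names, the sample payload, and the key "test-key" are illustrative assumptions, not part of the API.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign_payload(api_key: str, raw_body: bytes) -> str:
    """Return "sha256=" followed by the hex HMAC-SHA256 of the raw body."""
    digest = hmac.new(api_key.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def is_valid_signature(api_key: str, raw_body: bytes, header_value: Optional[str]) -> bool:
    """Constant-time check of a received X-Vibe-Signature value.

    Both sides are compared as bytes so that a missing or non-ASCII
    header is rejected instead of raising an exception.
    """
    expected = sign_payload(api_key, raw_body).encode("utf-8")
    received = (header_value or "").encode("utf-8")
    return hmac.compare_digest(expected, received)


# Hypothetical payload and key, for local testing only.
body = json.dumps({"job_id": "uuid", "status": "completed"}).encode("utf-8")
header = sign_payload("test-key", body)
```

Send `body` to your handler with `header` as the X-Vibe-Signature value. The signature covers the exact bytes, so re-serializing the JSON before verifying will usually make the check fail.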

## List Jobs
GET /v1/jobs
Query params: status=completed|failed|pending, tag=my-project, page=1
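
A minimal sketch of building the list URL from the three query parameters named above. The helper name `list_jobs_url` is an assumption, and the shape of the response is not specified here, so only the request side is shown.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.llmondemand.com/api/v1"


def list_jobs_url(status=None, tag=None, page=1):
    """Return the GET /v1/jobs URL with only the filters that were supplied."""
    params = {"status": status, "tag": tag, "page": page}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/jobs?{query}"
```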

## Cancel a Job
DELETE /v1/jobs/:id          — cancel one pending job
DELETE /v1/jobs/cancel_all   — cancel all pending jobs

## Retry a Failed Job
POST /v1/jobs/:id/retry

## Increase or Decrease Priority
PATCH /v1/jobs/:id/priority
Body: { "priority": 10 }    — positive to increase, negative to decrease
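
The cancel, retry, and priority endpoints above differ only in method, path, and body, so one small helper can cover them. This is a sketch under assumptions: the helper name and its `action` strings are invented for illustration, and sending Content-Type: application/json on the PATCH is assumed by analogy with POST /v1/jobs.

```python
import json
import urllib.request

BASE_URL = "https://www.llmondemand.com/api/v1"


def job_action_request(api_key, action, job_id=None, priority=None):
    """Build (but do not send) a request for cancel, cancel_all, retry or priority."""
    routes = {
        "cancel": ("DELETE", f"/jobs/{job_id}", None),
        "cancel_all": ("DELETE", "/jobs/cancel_all", None),
        "retry": ("POST", f"/jobs/{job_id}/retry", None),
        "priority": ("PATCH", f"/jobs/{job_id}/priority", {"priority": priority}),
    }
    if action not in routes:
        raise ValueError(f"unknown action: {action}")
    if action != "cancel_all" and not job_id:
        raise ValueError(f"{action} needs a job_id")
    method, path, body = routes[action]
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"  # assumed for PATCH
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)
```

Pass the returned object to `urllib.request.urlopen` to send it.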

## List Available Models
GET /v1/models
Returns array of models currently online in the grid with worker_count.
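
One way to use this list is to pick the first model from your own preference order that currently has workers. The sketch below assumes each array entry is an object with a "name" field alongside worker_count. Only worker_count is named above, so check the field name against a real response.

```python
def pick_online_model(models, preferred):
    """Return the first preferred model with at least one worker, else None.

    ASSUMPTION: each entry looks like {"name": "llama3:8b", "worker_count": 3}.
    The "name" key is a guess; only worker_count is documented.
    """
    online = {m["name"] for m in models if m.get("worker_count", 0) > 0}
    for candidate in preferred:
        if candidate in online:
            return candidate
    return None
```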

## Usage Statistics
GET /v1/usage
Returns token usage and job counts grouped by day/tag.

## Token / Prompt Limits
There are no per-model or per-request prompt length or output token limits.
Each model runs with its own baked-in context window on the worker's Ollama instance.
Rate limits are subscription-level quotas on total tokens used (daily/weekly/monthly).
Your current quota headroom is visible at GET /v1/usage.
If you exceed your quota, submission still succeeds — the job is accepted and stays
"pending" — but it will not be picked up by a worker until your quota resets.

## File Input (Blob Upload)
Blobs must be attached to an existing job. The required sequence is:

  1. Submit the job first:
     POST /v1/jobs   →  { "job_id": "uuid", ... }

  2. Create a blob slot for that job:
     POST /v1/blobs
     Body: { "job_id": "uuid", "blob_type": "image" }
     Response: { "blob_id": "uuid", "upload_url": "https://s3...", "expires_in": 900 }

  3. Upload the file directly to the presigned S3 URL (PUT, no auth header):
     PUT <upload_url>   (binary body; the URL expires in 15 minutes)

  4. Confirm the upload:
     POST /v1/blobs/:blob_id/confirm
     Response: { "status": "confirmed" }

After confirmation the blob is attached to the job and visible to the assigned worker.
Supported blob_type values: "image" (default). Do NOT pass blob_ids in the job creation
payload — blobs must reference an existing job_id at creation time.
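
Steps 2 to 4 of the sequence above can be sketched as a single function. To stay independent of any HTTP library, the two network operations are passed in as callables. `api_call` and `put_bytes` are hypothetical helpers you would supply, not part of the API.

```python
def attach_image(job_id, file_bytes, api_call, put_bytes):
    """Attach an image to a job that has already been submitted.

    api_call(method, path, body) -> dict   authenticated JSON call to the API
    put_bytes(url, data) -> None           plain PUT with no Authorization header
    Returns the blob_id once the upload has been confirmed.
    """
    # Step 2: create a blob slot tied to the existing job.
    slot = api_call("POST", "/v1/blobs", {"job_id": job_id, "blob_type": "image"})
    # Step 3: upload to the presigned URL before slot["expires_in"] seconds pass.
    put_bytes(slot["upload_url"], file_bytes)
    # Step 4: confirm, so the blob becomes visible to the assigned worker.
    result = api_call("POST", f"/v1/blobs/{slot['blob_id']}/confirm", None)
    if result.get("status") != "confirmed":
        raise RuntimeError(f"blob {slot['blob_id']} not confirmed: {result}")
    return slot["blob_id"]
```

Keeping the transport injectable also makes the ordering easy to check with fakes before running it against the live service.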

## Best Practices
- For fast responses use model_match "family" so the grid can route to any compatible variant.
- Supply a webhook_url for async flows instead of polling.
- Use the tag field to track usage per feature/project.
- Check GET /v1/models first to see which models are currently online before submitting.
- Prefer smaller models (7b–8b) for lower latency; use larger models only when quality demands it.
- Always verify X-Vibe-Signature on incoming webhooks using a constant-time comparison.

Base URL: https://www.llmondemand.com/api/v1

Authentication

Include your API key as a Bearer token on every request:

Authorization: Bearer YOUR_API_KEY

Submit a Job

curl -X POST https://www.llmondemand.com/api/v1/jobs \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:8b",
    "model_match": "family",
    "prompt": "Summarize this document",
    "tag": "my-project",
    "webhook_url": "https://yourapp.com/hooks/llm"
  }'
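
The same request can be sketched in Python using only the standard library. The helper names `build_submit_request` and `submit_job` are illustrative and not part of the API; the URL, headers, and field names are taken from the curl example above. Building the request separately from sending it lets you inspect it first.

```python
import json
import urllib.request

BASE_URL = "https://www.llmondemand.com/api/v1"


def build_submit_request(api_key, model, prompt, **optional):
    """Build (but do not send) a POST /v1/jobs request.

    `optional` may carry model_match, tag, webhook_url, priority or
    timeout_seconds; fields set to None are left out of the body.
    """
    body = {"model": model, "prompt": prompt}
    body.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        f"{BASE_URL}/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_job(api_key, model, prompt, **optional):
    """Send the request and return the decoded JSON response."""
    req = build_submit_request(api_key, model, prompt, **optional)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Note that `urlopen` raises `urllib.error.HTTPError` for non-2xx responses, so a validation failure surfaces as an exception rather than a return value.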

Poll for Results

Jobs are processed asynchronously. Poll GET /v1/jobs/:id until status is completed or failed, or supply a webhook_url to receive a POST callback.

curl https://www.llmondemand.com/api/v1/jobs/JOB_ID \
  -H "Authorization: Bearer YOUR_API_KEY"

# Response (completed):
{
  "id": "JOB_ID",
  "status": "completed",
  "output": "Here is the summary...",
  "output_tokens": 142,
  "completed_at": "2026-06-20T12:34:56Z"
}
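
A polling loop along these lines is one way to wait for a result. It is a sketch, not a prescribed client: `fetch_job` is a hypothetical callable that performs GET /v1/jobs/:id and returns the decoded JSON, and the 2-second interval and 600-second limit are arbitrary defaults chosen for illustration.

```python
import time


def wait_for_job(fetch_job, job_id, interval=2.0, max_wait=600.0, sleep=time.sleep):
    """Poll until the job is completed or failed, then return its JSON.

    Raises TimeoutError if no terminal status is seen after roughly
    `max_wait` seconds of sleeping between polls.
    """
    terminal = {"completed", "failed"}
    waited = 0.0
    while True:
        job = fetch_job(job_id)
        if job.get("status") in terminal:
            return job
        if waited >= max_wait:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

The overall limit matters because a job can remain pending for a long time. The `sleep` parameter exists only so the loop can be tested without real delays.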

Job Parameters

Field            Type     Required  Description
model            string   Yes       Model name, e.g. llama3:8b
prompt           string   Yes       The prompt text
model_match      string   No        exact (default) or family to allow compatible variants
tag              string   No        Label for grouping usage stats
webhook_url      string   No        URL to POST the result to when done
priority         integer  No        Higher = processed sooner (defaults to your subscription's priority boost)
timeout_seconds  integer  No        Auto-cancelled if still claimed/running this long after being claimed (default 600 = 10 minutes)

All API Endpoints

Method  Path                   Description
POST    /v1/jobs               Submit a new inference job
GET     /v1/jobs               List your jobs (filterable by status, tag)
GET     /v1/jobs/:id           Poll job status and result
DELETE  /v1/jobs/:id           Cancel a pending job
DELETE  /v1/jobs/cancel_all    Cancel all pending jobs
POST    /v1/jobs/:id/retry     Retry a failed job
PATCH   /v1/jobs/:id/priority  Increase or decrease job priority
GET     /v1/models             List available grid models
GET     /v1/usage              Usage statistics
POST    /v1/blobs              Get presigned S3 upload URL for file input
POST    /v1/blobs/:id/confirm  Confirm blob upload complete