Replicate MCP

Custom build under mcp-replicate. Python + FastMCP, HTTP transport on port 8769. Full coverage: Predictions, Models, Trainings, Deployments, Files.

Purpose

Replicate is a multi-model host for AI models. It puts Flux, SDXL, Whisper, Llama, ControlNet, SAM, Hunyuan-Video, MusicGen and more behind a single API. Billing is per second of hardware time (some official models are priced per output instead). For the VLOG pipeline it is an alternative or complement to runway: where Runway only offers Gen-4, Replicate covers the whole open-source model landscape (LoRA fine-tunes, style transfer, upscaling, stem splitting, TTS/STT).

Setup

# 1. Create a token: https://replicate.com/account/api-tokens
cp ~/source/mcps/mcp-replicate/.env.local.example ~/source/mcps/mcp-replicate/.env.local
# 2. Set REPLICATE_API_TOKEN=... in .env.local
uv tool install --force --editable ~/source/mcps/mcp-replicate
claude mcp add replicate "bash ~/source/mcps/mcp-replicate/start.sh"

Tools at a glance (35)

Group | Tools
--- | ---
Predictions | create_prediction, get_prediction, cancel_prediction, list_predictions, wait_for_prediction, download_output
Models | search_models, list_models, get_model, list_model_versions, get_model_version, create_model, delete_model, delete_model_version
Collections | list_collections, get_collection
Trainings | create_training, get_training, cancel_training, list_trainings, wait_for_training
Deployments | list_deployments, get_deployment, create_deployment, update_deployment, delete_deployment
Files | upload_file, list_files, get_file, delete_file
Meta | get_account, list_hardware, get_webhook_secret
Escape | raw_get, raw_post

Three prediction modes

create_prediction handles all three; always set exactly ONE reference:

  1. Version (version="db21e45a..."): community model by version hash. Uses POST /predictions.
  2. Model (model="black-forest-labs/flux-schnell"): official Replicate models, always the latest version. Uses POST /models/{owner}/{name}/predictions.
  3. Deployment (deployment="acme/my-flux"): your own deployment with a pinned version and dedicated hardware. Uses POST /deployments/{owner}/{name}/predictions.
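The one-reference rule maps each mode to its own endpoint. A minimal Python sketch of that routing; build_prediction_request is a hypothetical helper, not this MCP's actual code, and inputs stands in for the tool's input argument. Paths are relative to Replicate's API base https://api.replicate.com/v1.

```python
from typing import Any, Optional


def build_prediction_request(
    inputs: dict[str, Any],
    version: Optional[str] = None,
    model: Optional[str] = None,
    deployment: Optional[str] = None,
) -> tuple[str, dict[str, Any]]:
    """Return (path, body) for exactly one of version/model/deployment."""
    refs = {"version": version, "model": model, "deployment": deployment}
    chosen = [name for name, value in refs.items() if value]
    if len(chosen) != 1:
        raise ValueError(f"set exactly one of version/model/deployment, got {chosen or 'none'}")

    if version:
        # Community model pinned by version hash: the hash travels in the body.
        return "/predictions", {"version": version, "input": inputs}
    if model:
        # Official model: owner/name in the path, latest version implied.
        owner, name = model.split("/", 1)
        return f"/models/{owner}/{name}/predictions", {"input": inputs}
    # Deployment: owner/name in the path; version and hardware come from the deployment.
    owner, name = deployment.split("/", 1)
    return f"/deployments/{owner}/{name}/predictions", {"input": inputs}
```

Raising early on zero or multiple references keeps an ambiguous call from silently picking one endpoint.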

Sync vs Async

Sync with Prefer: wait (up to 60s)

For fast models (Flux-Schnell, SDXL-Lightning):

create_prediction(
  model="black-forest-labs/flux-schnell",
  input={"prompt": "a serene lake at dusk"},
  wait_seconds=30,
)

If the prediction finishes within 30 s, the response already contains output. Otherwise you get the prediction object in a running state (status starting or processing); continue with wait_for_prediction.
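wait_seconds presumably maps to Replicate's Prefer: wait=N request header; that mapping is an assumption about this MCP's internals. A sketch that builds, but does not send, such a request with the standard library. build_sync_request is a hypothetical name, and the clamp to 1..60 mirrors the 60 s ceiling above.

```python
import json
import urllib.request

API_BASE = "https://api.replicate.com/v1"


def build_sync_request(token: str, path: str, body: dict, wait_seconds: int) -> urllib.request.Request:
    """Build (without sending) a POST that asks Replicate to block up to wait_seconds."""
    wait = max(1, min(60, wait_seconds))  # clamp to the 1..60 s window
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": f"wait={wait}",
        },
        method="POST",
    )
```

Sending it with urllib.request.urlopen(req) returns the prediction JSON; if its status is not yet terminal, switch to polling.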

Async (long runs)

pred  = create_prediction(version="db21e45a...", input={...})
final = wait_for_prediction(pred["id"], timeout_seconds=1800)
files = download_output(pred["id"])  # default target dir: remotion/public/ai-broll/
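wait_for_prediction is essentially a poll loop. A minimal sketch under stated assumptions: a fixed polling interval (the real tool may back off differently), and fetch as an injected, hypothetical callable that returns the prediction JSON, e.g. from GET /predictions/{id}. The terminal statuses are succeeded, failed and canceled.

```python
import time
from typing import Any, Callable

TERMINAL = {"succeeded", "failed", "canceled"}


def poll_prediction(
    fetch: Callable[[str], dict[str, Any]],
    prediction_id: str,
    timeout_seconds: float = 1800,
    interval_seconds: float = 2.0,
) -> dict[str, Any]:
    """Poll until the prediction reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        pred = fetch(prediction_id)
        if pred["status"] in TERMINAL:
            return pred
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{prediction_id} still {pred['status']} after {timeout_seconds}s")
        time.sleep(interval_seconds)
```

Check final["status"] == "succeeded" before downloading; failed predictions carry an error field instead of output.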

File inputs

Model inputs that expect a file (image, video, audio) accept:

  • http(s) URL: must be publicly reachable
  • Data URI: data:image/png;base64,...
  • Replicate file URL: from upload_file (result.urls.get)

For local files: upload_file(file_path="..."), then pass the returned urls.get URL as the input.
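Building a data URI by hand takes only the standard library. to_data_uri is a hypothetical helper. Because the whole file is inlined as base64 (about a third larger than the raw bytes), this route suits small files; upload_file is the better choice for large ones.

```python
import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw bytes as a data URI, guessing the MIME type from the filename."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        mime = "application/octet-stream"  # fallback when the extension is unknown
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```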

Output shapes

output comes in several shapes:

  • String URL (single file)
  • Array of URLs (multiple files)
  • Dict with URL values (named outputs)
  • Non-URL values (text models: strings; classifiers: dicts)

download_output walks the output recursively and downloads every URL. Non-URL values end up in skipped.
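A minimal sketch of such a recursive walk, written independently of the actual download_output code: it collects every http(s) string and records everything else in skipped along with its position. The (path, value) format of skipped is an assumption of this sketch.

```python
from typing import Any


def collect_urls(output: Any, path: str = "output") -> tuple[list[str], list[tuple[str, Any]]]:
    """Walk a prediction output; return (urls, skipped) with skipped as (path, value) pairs."""
    urls: list[str] = []
    skipped: list[tuple[str, Any]] = []

    def walk(node: Any, where: str) -> None:
        if isinstance(node, str) and node.startswith(("http://", "https://")):
            urls.append(node)
        elif isinstance(node, list):
            for i, item in enumerate(node):
                walk(item, f"{where}[{i}]")
        elif isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{where}.{key}")
        else:
            skipped.append((where, node))  # plain text, numbers, None, ...

    walk(output, path)
    return urls, skipped
```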

Fine-Tuning Flow

file = upload_file("training-images.zip")
create_model(owner="marvin", name="flux-brand-lora", visibility="private", hardware="cpu")
training = create_training(
  model="ostris/flux-dev-lora-trainer",
  version_id="...",  # from list_model_versions
  destination="marvin/flux-brand-lora",
  input={"input_images": file["urls"]["get"], "trigger_word": "MARVBRAND", "steps": 1000},
)
final = wait_for_training(training["id"])  # 15-45 min
# final["output"]["version"] -> new version ID for predictions
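The trainer takes the training images as one zip via input_images. A minimal sketch for packing a local folder before calling upload_file; zip_training_images and the suffix list are assumptions, so check the trainer's model page for what it actually accepts.

```python
import zipfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}  # assumed accepted formats


def zip_training_images(folder: str, zip_path: str) -> int:
    """Pack the images directly inside folder into zip_path; return how many were added."""
    images = sorted(
        p for p in Path(folder).iterdir() if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for img in images:
            zf.write(img, arcname=img.name)  # flat layout, no subdirectories
    return len(images)
```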

Limits & Quirks

  • Rate limits: ~600 req/min per account for creating predictions. For more throughput, use deployments with higher concurrency.
  • Prediction retention: 7 days (verify against current docs). Output URLs stop working afterwards, so run download_output promptly.
  • Hardware SKUs change: list_hardware returns the current SKUs; prices are on the Replicate pricing page.
  • The QUERY HTTP method used by search_models is non-standard, and reverse proxies may block it. raw_get("/models", ...) is NOT a fallback (Replicate has no GET search endpoint).
  • Webhooks: webhook parameter on create_prediction/create_training. Get the signing secret via get_webhook_secret; the signature is an HMAC-SHA256 over {webhook-id}.{webhook-timestamp}.{body}, sent in the webhook-signature header.
  • Cost visibility: no organization endpoint like Runway's; check spend in the Replicate dashboard.
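A sketch of webhook signature verification, following Replicate's webhook docs as understood here (verify before relying on it): the secret has the form whsec_<base64>, the HMAC-SHA256 key is the decoded base64 part, the signed content is {webhook-id}.{webhook-timestamp}.{raw body}, and the webhook-signature header carries one or more space-separated v1,<base64 signature> entries. verify_webhook is a hypothetical helper.

```python
import base64
import hashlib
import hmac


def verify_webhook(secret: str, webhook_id: str, timestamp: str, body: bytes, signature_header: str) -> bool:
    """Check a webhook against the signing secret from get_webhook_secret."""
    # Secret looks like "whsec_<base64>"; the HMAC key is the decoded base64 part.
    key = base64.b64decode(secret.split("_", 1)[1])
    signed = f"{webhook_id}.{timestamp}.".encode() + body
    expected = base64.b64encode(hmac.new(key, signed, hashlib.sha256).digest()).decode()
    # The header may hold several space-separated "v1,<signature>" entries.
    for entry in signature_header.split():
        version, _, sig = entry.partition(",")
        if version == "v1" and hmac.compare_digest(sig, expected):
            return True
    return False
```

Pass the raw request body, not re-serialized JSON, or the bytes will not match. Rejecting stale webhook-timestamp values additionally guards against replays.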

Pricing reference values (April 2026, verify)

  • Flux-Schnell: ~$0.003/image
  • Flux-Dev: ~$0.03/image
  • Flux fine-tuning: ~$2-5 per training (1000 steps, A100)
  • SDXL: ~$0.002/image
  • Hunyuan-Video: ~$1-3 per 5 s clip
  • Whisper v3: ~$0.0015/min of audio
  • Hardware: CPU $0.000575/s, A100-80GB $0.001525/s
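Per-second billing makes the cost of a run a simple product. With the A100-80GB rate listed above, a 30-minute training costs 1800 s × $0.001525/s = $2.745, which fits the $2-5 range quoted for Flux fine-tuning. A minimal sketch for such estimates; the dictionary keys are labels for this sketch, not Replicate hardware SKUs, and models priced per output (e.g. per image) are not covered.

```python
# Per-second rates in USD, copied from the pricing list above; verify before relying on them.
RATES_USD_PER_SECOND = {
    "CPU": 0.000575,
    "A100-80GB": 0.001525,
}


def estimate_cost(runtime_seconds: float, hardware: str) -> float:
    """Estimate hardware-time cost: runtime in seconds times the per-second rate."""
    return runtime_seconds * RATES_USD_PER_SECOND[hardware]


# A 30-minute training run on A100-80GB.
training_cost = estimate_cost(30 * 60, "A100-80GB")
```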