Replicate MCP
Custom build in mcp-replicate. Python + FastMCP, HTTP transport on port 8769. Full coverage: Predictions, Models, Trainings, Deployments, Files.
Purpose
Replicate is a multi-model host for AI models. It is the only provider offering Flux, SDXL, Whisper, Llama, ControlNet, SAM, Hunyuan-Video, MusicGen and more under a single API. Billing is per GPU-second. For the VLOG pipeline it serves as an alternative or parallel option to runway: where Runway only offers Gen-4, Replicate covers the whole open-source model landscape (LoRA fine-tunes, style transfer, upscaling, stem splitting, TTS/STT).
Setup
# 1. Token: https://replicate.com/account/api-tokens
cp ~/source/mcps/mcp-replicate/.env.local.example ~/source/mcps/mcp-replicate/.env.local
# then set REPLICATE_API_TOKEN=... in .env.local
uv tool install --force --editable ~/source/mcps/mcp-replicate
claude mcp add replicate "bash ~/source/mcps/mcp-replicate/start.sh"
Tools overview (35)
| Group | Tools |
|---|---|
| Predictions | create_prediction, get_prediction, cancel_prediction, list_predictions, wait_for_prediction, download_output |
| Models | search_models, list_models, get_model, list_model_versions, get_model_version, create_model, delete_model, delete_model_version |
| Collections | list_collections, get_collection |
| Trainings | create_training, get_training, cancel_training, list_trainings, wait_for_training |
| Deployments | list_deployments, get_deployment, create_deployment, update_deployment, delete_deployment |
| Files | upload_file, list_files, get_file, delete_file |
| Meta | get_account, list_hardware, get_webhook_secret |
| Escape | raw_get, raw_post |
Three prediction modes
create_prediction handles all three. Always set exactly ONE reference:
- Version (version="db21e45a..."): community model addressed by version hash. Uses POST /predictions.
- Model (model="black-forest-labs/flux-schnell"): official Replicate models, always the latest version. Uses POST /models/{owner}/{name}/predictions.
- Deployment (deployment="acme/my-flux"): your own deployment with a pinned version and dedicated hardware.
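The "exactly one reference" rule can be sketched as a small resolver. This is an illustrative sketch, not the server's actual code: the helper name resolve_prediction_endpoint is hypothetical, and the /deployments/{owner}/{name}/predictions path is an assumption based on Replicate's public HTTP API rather than something stated above.

```python
from typing import Optional, Tuple


def resolve_prediction_endpoint(
    version: Optional[str] = None,
    model: Optional[str] = None,
    deployment: Optional[str] = None,
) -> Tuple[str, dict]:
    """Return (path, extra_body) for exactly one of version/model/deployment."""
    refs = {"version": version, "model": model, "deployment": deployment}
    given = sorted(name for name, value in refs.items() if value is not None)
    if len(given) != 1:
        raise ValueError(f"set exactly one of version/model/deployment, got {given}")

    if version is not None:
        # Community model: the version hash travels in the request body.
        return "/predictions", {"version": version}

    ref = model if model is not None else deployment
    owner, sep, name = ref.partition("/")
    if not (owner and sep and name):
        raise ValueError(f"expected 'owner/name', got {ref!r}")

    # Assumption: deployments mirror the models path layout.
    prefix = "models" if model is not None else "deployments"
    return f"/{prefix}/{owner}/{name}/predictions", {}
```

Rejecting zero or multiple references up front keeps an ambiguous call from silently hitting the wrong endpoint.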
Sync vs Async
Sync with Prefer: wait (up to 60s)
For fast models (Flux-Schnell, SDXL-Lightning):
create_prediction(
    model="black-forest-labs/flux-schnell",
    input={"prompt": "a serene lake at dusk"},
    wait_seconds=30,
)
The response contains output directly if the prediction finishes within 30s. Otherwise it returns the prediction object in its still-running state; continue with wait_for_prediction.
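How wait_seconds might translate into the HTTP header can be sketched as follows. The helper name prefer_header is hypothetical, and the wait=N form of the Prefer header is an assumption taken from Replicate's HTTP API documentation, not from this server's source. The 60-second cap is the limit named above.

```python
from typing import Optional


def prefer_header(wait_seconds: Optional[int] = None) -> dict:
    """Build the sync-mode header; an empty dict means plain async creation."""
    if wait_seconds is None:
        return {}
    # Clamp into the 1..60 second window the sync mode allows.
    clamped = max(1, min(60, int(wait_seconds)))
    return {"Prefer": f"wait={clamped}"}
```

With this shape, a caller that passes a value above 60 still gets a valid request instead of an API error.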
Async (long runs)
pred = create_prediction(version="db21e45a...", input={...})
final = wait_for_prediction(pred["id"], timeout_seconds=1800)
files = download_output(pred["id"]) # default: remotion/public/ai-broll/
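What a wait helper such as wait_for_prediction does internally is most likely a polling loop. The following is a minimal sketch under that assumption: poll_until_done and its fetch parameter are hypothetical names, and the terminal statuses (succeeded, failed, canceled) are the ones Replicate's API reports as far as I know.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def poll_until_done(
    fetch: Callable[[str], dict],
    prediction_id: str,
    timeout_seconds: float = 1800,
    interval_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch(prediction_id) until the status is terminal or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        prediction = fetch(prediction_id)
        if prediction.get("status") in TERMINAL_STATUSES:
            return prediction
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {prediction_id} not finished in time")
        sleep(interval_seconds)
```

Passing fetch and sleep in as parameters keeps the loop testable without network access or real delays.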
File inputs
Model inputs that expect a file (image, video, audio) accept:
- http(s) URL: must be publicly reachable
- Data URI: data:image/png;base64,...
- Replicate file URL: from upload_file (result.urls.get)
For local files: call upload_file(file_path="...") and pass the urls.get URL as the input.
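For the data URI variant, a small local file can be encoded with the standard library alone. This is a sketch; both helper names are hypothetical and not part of the server.

```python
import base64
import mimetypes
from pathlib import Path


def bytes_to_data_uri(data: bytes, mime: str) -> str:
    """Encode raw bytes as a base64 data URI with the given MIME type."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{payload}"


def file_to_data_uri(path: str) -> str:
    """Read a local file and guess its MIME type from the file extension."""
    p = Path(path)
    mime, _ = mimetypes.guess_type(p.name)
    return bytes_to_data_uri(p.read_bytes(), mime or "application/octet-stream")
```

Base64 inflates the payload by roughly a third, so data URIs tend to be practical only for small files; larger inputs are better served by upload_file.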
Output shapes
output comes in many shapes:
- String URL (one file)
- Array of URLs (multiple files)
- Dict with URL values (named outputs)
- Primitive (text models: string; classifiers: dict)
download_output traverses the value recursively and fetches every URL. Non-URL values end up in skipped.
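The recursive traversal can be sketched as below. This shows the idea, not the actual download_output implementation: the function name collect_urls and the path labels are hypothetical, and "URL" here simply means a string starting with http:// or https://.

```python
from typing import Any, List, Tuple


def collect_urls(output: Any) -> Tuple[List[Tuple[str, str]], List[Tuple[str, Any]]]:
    """Walk strings, lists and dicts; return (urls, skipped) as (path, value) pairs."""
    urls: List[Tuple[str, str]] = []
    skipped: List[Tuple[str, Any]] = []

    def walk(value: Any, where: str) -> None:
        if isinstance(value, str):
            is_url = value.startswith(("http://", "https://"))
            (urls if is_url else skipped).append((where, value))
        elif isinstance(value, dict):
            for key, item in value.items():
                walk(item, f"{where}.{key}")
        elif isinstance(value, (list, tuple)):
            for index, item in enumerate(value):
                walk(item, f"{where}[{index}]")
        else:
            # Numbers, booleans, None: nothing to download.
            skipped.append((where, value))

    walk(output, "output")
    return urls, skipped
```

Recording the path alongside each value makes it possible to name downloaded files after their position in a named-output dict.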
Fine-Tuning Flow
file = upload_file("training-images.zip")
create_model(owner="marvin", name="flux-brand-lora", visibility="private", hardware="cpu")
training = create_training(
    model="ostris/flux-dev-lora-trainer",
    version_id="...",  # from list_model_versions
    destination="marvin/flux-brand-lora",
    input={"input_images": file["urls"]["get"], "trigger_word": "MARVBRAND", "steps": 1000},
)
final = wait_for_training(training["id"])  # typically 15-45 min
# final["output"]["version"] -> new version ID for predictions
Limits & Quirks
- Rate limits: ~600 req/min per account. If that is not enough, use deployments with higher concurrency.
- Prediction retention: 7 days. Output URLs are dead after that, so run download_output promptly.
- Hardware SKUs change; call list_hardware for current prices.
- The QUERY HTTP method used by search_models is non-standard. Reverse proxies may block it, and raw_get("/models", ...) is NOT a fallback (Replicate has no GET search endpoint).
- Webhooks: webhook parameter in create_prediction/create_training. Signing secret via get_webhook_secret, HMAC-SHA256 over the body.
- Cost visibility: no organization endpoint like Runway's; track spend in the Replicate dashboard.
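Webhook verification on the receiving side could look roughly like this. The sketch rests on assumptions drawn from Replicate's webhook documentation as I understand it, none of which are stated above: the secret carries a whsec_ prefix followed by a base64 key, the signed content is "{id}.{timestamp}.{body}" rather than the bare body, and the webhook-signature header holds space-separated "v1,<base64>" entries. Verify these details against the current docs before relying on them. The function name verify_webhook is hypothetical.

```python
import base64
import hashlib
import hmac


def verify_webhook(secret: str, webhook_id: str, timestamp: str,
                   body: bytes, signature_header: str) -> bool:
    """Return True if any signature in the header matches our own HMAC-SHA256."""
    # Assumption: secret looks like "whsec_<base64 key>".
    raw = secret[len("whsec_"):] if secret.startswith("whsec_") else secret
    key = base64.b64decode(raw)

    # Assumption: id and timestamp are signed together with the raw body.
    signed = f"{webhook_id}.{timestamp}.".encode("utf-8") + body
    expected = base64.b64encode(hmac.new(key, signed, hashlib.sha256).digest())

    # Header entries look like "v1,<signature>"; several may be present.
    candidates = [part.split(",", 1)[1] for part in signature_header.split() if "," in part]
    return any(hmac.compare_digest(expected, c.encode("utf-8")) for c in candidates)
```

hmac.compare_digest is used instead of == so the comparison takes constant time. A production handler would additionally reject stale timestamps to limit replay attacks.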
Pricing ballpark figures (April 2026, verify before relying on them)
- Flux-Schnell: ~$0.003/image
- Flux-Dev: ~$0.03/image
- Flux fine-tuning: ~$2-5 per training run (1000 steps, A100)
- SDXL: ~$0.002/image
- Hunyuan-Video: ~$1-3 per 5s clip
- Whisper v3: ~$0.0015 per minute of audio
- Hardware: CPU $0.000575/s, A100-80GB $0.001525/s
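Per-second billing makes rough estimates easy: cost is runtime in seconds times the hardware rate. The sketch below uses the A100-80GB rate listed above and the 15-45 minute training window from the fine-tuning flow; the function name is hypothetical and the rates should be refreshed via list_hardware.

```python
A100_80GB_PER_SECOND = 0.001525  # rate from the list above, may be outdated


def estimate_cost(runtime_seconds: float, rate_per_second: float) -> float:
    """Estimated cost in dollars for a run billed per second of hardware time."""
    return runtime_seconds * rate_per_second


for minutes in (15, 30, 45):
    cost = estimate_cost(minutes * 60, A100_80GB_PER_SECOND)
    print(f"{minutes} min on A100-80GB: ~${cost:.2f}")
```

A 15 to 45 minute run comes out at about $1.37 to $4.12, which is in the same range as the ~$2-5 fine-tuning figure.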
Related
- mcp-replicate: source
- runway: alternative for VLOG B-roll (Gen-4)
- Replicate API docs