Hosted Replicate-MCP
Second hosted MCP after mcp-vf-hosted. It wraps replicate (the local mcp-replicate, run as a stdio subprocess) behind Scalekit OAuth (EU), the VF brand lock, and a model whitelist. It serves Open WebUI VF (vf-nova) and claude.ai Pro custom connectors as the central image and video generation endpoint for Vibe Factory.
Purpose
| Use case | Layer tool | Default model |
|---|---|---|
| Photorealistic hero, B-roll | create_image | Flux 2 Pro |
| Text in image (speaker card, poster) | create_text_image | Ideogram V3 Quality |
| Vector logo / icon (SVG output) | create_svg_logo | Recraft V4 SVG |
| Variant of an existing image | create_image_from_reference | Flux Kontext Pro |
| Video (5-10 s) | create_video | Kling 2.5 Turbo Pro / Veo 3 Fast / Wan 2.5 |
The VF brand lock is enabled by default: it injects neon yellow (#C8FF62), anthracite (#1a1a1a), and an anti-slop snippet into every prompt. This is consistent with the vf-nova system prompt v3.0.
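The brand lock can be pictured as a small prompt transformation. The following is a minimal sketch only: the function name `apply_brand_lock`, the `brand_lock` flag, and the wording of the anti-slop snippet are assumptions made for illustration, not the actual implementation. Only the two hex colors come from this page.

```python
# Hypothetical sketch of a brand-lock prompt injection.
# Only the two hex values are taken from this page; every name and the
# snippet wording below are illustrative assumptions.
BRAND_COLORS = {"neon yellow": "#C8FF62", "anthracite": "#1a1a1a"}
ANTI_SLOP_SNIPPET = "Avoid generic stock-photo looks and cliched AI aesthetics."  # placeholder text


def apply_brand_lock(prompt: str, brand_lock: bool = True) -> str:
    """Append the brand palette and the anti-slop snippet to a user prompt."""
    if not brand_lock:
        return prompt
    base = prompt.strip().rstrip(".")
    palette = ", ".join(f"{name} ({code})" for name, code in BRAND_COLORS.items())
    return f"{base}. Brand palette: {palette}. {ANTI_SLOP_SNIPPET}"


if __name__ == "__main__":
    print(apply_brand_lock("Speaker card for a keynote"))
```

Keeping the injection in one pure function would make it easy to unit-test and to switch off per call, assuming the layer tools expose such a flag.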
Architecture
claude.ai Pro / Open WebUI VF (vf-nova)
  | OAuth 2.1 (Scalekit EU) + JWT
  v
Cloudflare Edge (TLS, WAF, Frankfurt PoP)
  | replicate.agenticventures.de
  v
Cloudflare Tunnel (sidecar in the Fargate task, no public ingress at the origin)
  v
AWS Fargate Task (av-production, eu-central-1)
  +-- Container 1: cloudflared
  +-- Container 2: vf-replicate (FastMCP)
        +-- ScalekitProvider (JWT validation, JWKS cache)
        +-- GuardMiddleware (rate limit + model whitelist + audit + kill switch)
        +-- ToolWhitelistMiddleware (10 visible tools instead of 35)
        +-- Layer tools: create_image / _text_image / _svg_logo / _from_reference / _video
        +-- Slash prompts: /speaker_card, /save_the_date, /social_post
        +-- create_proxy(stdio) -> mcp-replicate subprocess -> api.replicate.com
Model whitelist
Source of truth: ~/source/mcps/mcp-replicate-hosted/src/mcp_replicate_hosted/config.py.
Currently whitelisted (May 2026):
| Category | Models |
|---|---|
| Photorealistic | flux-2-pro, flux-1.1-pro, flux-1.1-pro-ultra, flux-schnell |
| Brand asset + logo | nano-banana-2 |
| Illustration | recraft-v4 |
| Vector / SVG | recraft-v4-svg |
| Text in image | ideogram-v3-quality, ideogram-v3-turbo |
| Image variant | flux-kontext-pro |
| Volume mockup | seedream-4.5 |
| Video | kling-v2.5-turbo-pro, wan-2.5-i2v-fast, veo-3-fast, seedance-2.0-fast |
Extending the list: open a PR against config.py with a justification and a cost estimate, then run a CDK deploy. No silent loosening via env var without a trace in the Git history.
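A whitelist check of this kind could look roughly like the sketch below. The real structure of config.py is not shown on this page, so `ALLOWED_MODELS`, `ModelNotWhitelisted`, and `check_model` are hypothetical names. Accepting an `owner/name:version` reference and matching on the bare model name is also an assumption.

```python
# Hypothetical sketch of a model whitelist check.
# The model names are copied from the table above; the layout of the real
# config.py and all identifiers below are assumptions.
ALLOWED_MODELS = frozenset({
    "flux-2-pro", "flux-1.1-pro", "flux-1.1-pro-ultra", "flux-schnell",
    "nano-banana-2",
    "recraft-v4", "recraft-v4-svg",
    "ideogram-v3-quality", "ideogram-v3-turbo",
    "flux-kontext-pro",
    "seedream-4.5",
    "kling-v2.5-turbo-pro", "wan-2.5-i2v-fast", "veo-3-fast", "seedance-2.0-fast",
})


class ModelNotWhitelisted(Exception):
    """Raised when a requested model is not on the whitelist."""


def check_model(model: str) -> str:
    """Return the bare model name if whitelisted, otherwise raise."""
    # Assumption: references may arrive as "owner/name" or "owner/name:version".
    name = model.rsplit("/", 1)[-1].split(":", 1)[0]
    if name not in ALLOWED_MODELS:
        raise ModelNotWhitelisted(name)
    return name


if __name__ == "__main__":
    print(check_model("owner/flux-2-pro"))
```

Because the set lives in code rather than in an environment variable, every change would show up in a diff, which matches the PR-based extension process described above.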
Spending cap
Replicate has no API for setting a spending cap. Set it manually: https://replicate.com/account/billing → Spending limit, with a notification to Marvin at $50 (50% of the limit reached).
Additional layers:
- GuardMiddleware rate limit (60/min, 1000/h per subject hash)
- The model whitelist blocks models above $0.20 per image in the default set
- CloudWatch alarm on model_not_whitelisted audit events (>5/h triggers the alarm)
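The two rate limits (60 per minute, 1000 per hour, keyed by a subject hash) can be illustrated with a sliding-window limiter. This is a sketch under assumptions, not the GuardMiddleware code: the class name, the SHA-256 based `subject_hash`, and the in-memory storage are all hypothetical.

```python
# Hypothetical sliding-window rate limiter keyed by a hashed subject.
# The limits (60/min, 1000/h) come from the list above; everything else
# is an illustrative assumption.
import hashlib
import time
from collections import defaultdict, deque

LIMITS = ((60, 60.0), (1000, 3600.0))  # (max calls, window in seconds)


def subject_hash(subject: str) -> str:
    """Hash the subject so raw identities are not used as keys."""
    return hashlib.sha256(subject.encode("utf-8")).hexdigest()[:16]


class SlidingWindowLimiter:
    def __init__(self, limits=LIMITS, clock=time.monotonic):
        self._limits = limits
        self._clock = clock  # injectable for tests
        self._calls = defaultdict(deque)  # subject hash -> call timestamps
        self._horizon = max(window for _, window in limits)

    def allow(self, subject: str) -> bool:
        """Record the call and return True if every window has capacity."""
        now = self._clock()
        calls = self._calls[subject_hash(subject)]
        # Drop timestamps that are older than the longest window.
        while calls and now - calls[0] >= self._horizon:
            calls.popleft()
        for max_calls, window in self._limits:
            recent = sum(1 for t in calls if now - t < window)
            if recent >= max_calls:
                return False  # rejected calls are not recorded
        calls.append(now)
        return True
```

An in-memory limiter like this only holds per task. With more than one Fargate task, a shared store would be needed to enforce the limits globally.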
MCP tool whitelist for Open WebUI VF
Make the tools callable via Admin UI → Tools → Add MCP Server:
| Tool | Visible to Nova? |
|---|---|
| create_image | yes |
| create_text_image | yes |
| create_svg_logo | yes |
| create_image_from_reference | yes |
| create_video | yes |
| replicate_get_prediction | yes |
| replicate_wait_for_prediction | yes |
| replicate_cancel_prediction | yes |
| replicate_list_predictions | yes |
| search_tools | yes |
| everything else from mcp-replicate (~25 tools) | no, discoverable via search_tools |
10 visible tools in total, which stays below the Bedrock ListTools cap of 12.
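The filtering step can be sketched as a plain function over tool names. This is not the ToolWhitelistMiddleware itself: `visible_tools` and `LIST_TOOLS_CAP` are hypothetical names. The ten tool names and the cap of 12 are taken from this section.

```python
# Hypothetical sketch of filtering the upstream tool list down to the
# visible set. Tool names and the cap of 12 come from this section; the
# function and constant names are assumptions.
VISIBLE_TOOLS = (
    "create_image",
    "create_text_image",
    "create_svg_logo",
    "create_image_from_reference",
    "create_video",
    "replicate_get_prediction",
    "replicate_wait_for_prediction",
    "replicate_cancel_prediction",
    "replicate_list_predictions",
    "search_tools",
)
LIST_TOOLS_CAP = 12


def visible_tools(all_tools, whitelist=VISIBLE_TOOLS):
    """Keep only whitelisted tool names, preserving the upstream order."""
    allowed = set(whitelist)
    visible = [name for name in all_tools if name in allowed]
    if len(visible) > LIST_TOOLS_CAP:
        raise ValueError(
            f"{len(visible)} visible tools exceed the cap of {LIST_TOOLS_CAP}"
        )
    return visible
```

Failing loudly when the cap is exceeded, rather than truncating silently, would surface a misconfigured whitelist at startup instead of hiding tools from the client.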
Hosting + Deploy
| Component | Status (2026-05-19) |
|---|---|
| Local repo | done, 39 pytest tests green, ruff clean |
| ECR repo mcp-replicate-hosted in av-production | created 2026-05-19 |
| Secrets Manager mcp-replicate-hosted/upstream-tokens | pending (Marvin: JSON with REPLICATE_API_TOKEN) |
| Secrets Manager mcp-replicate-hosted/cloudflared-token | pending (after tunnel creation) |
| Cloudflare tunnel mcp-replicate-hosted | pending (CF dashboard or mcp-cloudflare API) |
| Scalekit resource (EU) https://replicate.agenticventures.de/mcp | pending (Marvin: Scalekit dashboard) |
| DNS record replicate.agenticventures.de → <tunnel-id>.cfargotunnel.com | pending |
| Docker image built + pushed to ECR | pending |
| CDK stack deployed | pending |
| Smoke tests live | pending |
| VF AVV (data processing agreement) update: Replicate as subprocessor | pending (Andre + Christoph) |
Deploy guide: ~/source/mcps/mcp-replicate-hosted/README.md. Pattern documentation: mcp-hosting-fargate-tunnel.
Incident-Response
| Symptom | First action |
|---|---|
| URL leaked | Revoke the Scalekit session |
| Cost spike | Set EMERGENCY_DISABLE=true via ECS force deploy and set the spending cap in the Replicate dashboard to $0 |
| 401 rate explodes | Cloudflare rule for IP/geo block |
| Tool bloat in Nova | Trim MCP_REPLICATE_HOSTED_TOOL_WHITELIST |
| Replicate token expired | Rotate the secret, then ECS force deploy |
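The EMERGENCY_DISABLE kill switch from the table could be checked along these lines before any tool call is forwarded. This is a minimal sketch: only the variable name EMERGENCY_DISABLE appears on this page. The accepted truthy values, `ServiceDisabled`, and `guard_call` are assumptions.

```python
# Hypothetical kill-switch check. Only the EMERGENCY_DISABLE variable name
# comes from this page; accepted values and all identifiers are assumptions.
import os

TRUTHY = {"1", "true", "yes", "on"}


class ServiceDisabled(RuntimeError):
    """Raised when the kill switch blocks a tool call."""


def kill_switch_active(env=None) -> bool:
    """Return True if EMERGENCY_DISABLE is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get("EMERGENCY_DISABLE", "").strip().lower() in TRUTHY


def guard_call(tool_name: str, env=None) -> None:
    """Raise before forwarding a call while the kill switch is on."""
    if kill_switch_active(env):
        raise ServiceDisabled(f"{tool_name} blocked: EMERGENCY_DISABLE is set")
```

Reading the flag from the environment fits the force-deploy step in the table: the new task definition carries the changed value, and tasks started from it would refuse calls from their first request.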
Expected cost
- Hosting: ~0 CF
- Replicate pass-through: ~$24 at a mix average of ~$0.05 per image, plus video as needed
- Cap: hard spending limit of $100/month in the dashboard
Related
- Pattern: mcp-hosting-fargate-tunnel
- Source repo (local): replicate
- Reference stack: mcp-vf-hosted
- ADR on video provider choice: vf-video-gen-provider
- Plan: 2026-05-19-design-stack-julian
- Sprint: sprint-2-replicate-hosted
- Run-Log: _index