Context
The recipe site already depends on Large Language Models for the photo-to-recipe ingestion pipeline
(ADR 029: DVC, ADR 030: Cooklang).
The pipeline has three distinct LLM stages — OCR extraction from page images (VLM), normalisation
of raw extraction into Cooklang (LLM), and ingredient disambiguation (cheaper LLM) — each with
different price/performance trade-offs. The current
params.yaml
already mixes Gemini 3 Flash Preview and Gemini 2.5 Flash across these stages.
Beyond ingestion, the recipe-site roadmap lists several AI-powered user-facing features in Phases 2–6: AI URL import, AI photo import for smooth onboarding, AI meal planning, and nutritional analysis. Each will benefit from being able to pick the right model for the job, swap providers as price/quality moves, and try new frontier models the moment they ship — without a code change, a new SDK, a new key, a new billing relationship, or a new platform to learn.
Today the ML pipeline talks to OpenRouter via the OpenAI SDK
(ml-pipelines/recipe-parsing/src/lib/openrouter.ts).
This ADR formalises that as the strategic
choice for all LLM/VLM access across the recipe site — pipeline, future Worker-based features,
and any client-side experimentation — and documents why the alternatives (direct provider SDKs,
Cloudflare AI Gateway, Vercel AI Gateway, custom abstraction layers) are not a better fit.
The recipe site is governed by a small set of strong constraints:
- Solo developer, fast iteration. Every platform, billing relationship, and SDK is friction. Less Is More and Short Feedback Loops push hard against adding per-provider integrations.
- Model churn is structural, not transitional. New frontier and open models ship monthly. Locking the codebase to one vendor's SDK has a high opportunity cost.
- Capacity matters more than brand. The Anthropic API has had repeated availability and rate-limit issues during peak demand windows for Claude releases. AWS Bedrock and Google Vertex AI typically have far higher available capacity for the same models, but require full IAM/SigV4 or service-account setup to access — overhead that doesn't pay back for a recipe site.
- Mixed-modality workloads. Photo-to-recipe needs a vision-capable model; normalisation needs a cheap, fast text model; disambiguation can run on a tiny one. The same gateway must serve all three.
- Cooklang is the canonical output. Whatever model is in the loop, the result lands in Cooklang (ADR 030). The model returns a JSON envelope ({ frontmatter, body }) that the gateway constrains via structured outputs; the Cooklang text in the body field is plain text, validated post-hoc by cooklang-rs. So structured-output support across providers is load-bearing for the envelope, even though the format inside it is not JSON.
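
The envelope constraint in the last bullet can be sketched as follows. This is a minimal illustration, not the code in openrouter.ts: the names `RecipeEnvelope`, `envelopeResponseFormat`, and `parseEnvelope` are hypothetical, the frontmatter keys `title` and `servings` are placeholders, and the nesting of `json_schema` assumes the OpenAI-compatible chat-completions shape.

```typescript
// Hypothetical envelope type. Only `frontmatter` and `body` come from the ADR;
// the frontmatter keys are illustrative placeholders.
interface RecipeEnvelope {
  frontmatter: { title: string; servings: string };
  body: string; // plain Cooklang text, validated afterwards by cooklang-rs
}

// Assumed request fragment in the OpenAI-compatible structured-outputs shape.
const envelopeResponseFormat = {
  type: "json_schema",
  json_schema: {
    name: "recipe_envelope", // assumed schema name
    strict: true,
    schema: {
      type: "object",
      properties: {
        frontmatter: {
          type: "object",
          properties: { title: { type: "string" }, servings: { type: "string" } },
          required: ["title", "servings"],
          additionalProperties: false,
        },
        body: { type: "string" },
      },
      required: ["frontmatter", "body"],
      additionalProperties: false,
    },
  },
};

// Defensive parse of the model's reply. The schema constrains the envelope;
// the Cooklang inside `body` is deliberately left unchecked here.
function parseEnvelope(raw: string): RecipeEnvelope {
  const value: unknown = JSON.parse(raw);
  if (typeof value !== "object" || value === null) {
    throw new Error("envelope must be a JSON object");
  }
  const { frontmatter, body } = value as Record<string, unknown>;
  if (typeof frontmatter !== "object" || frontmatter === null) {
    throw new Error("envelope.frontmatter must be an object");
  }
  if (typeof body !== "string") {
    throw new Error("envelope.body must be a string");
  }
  const fm = frontmatter as Record<string, unknown>;
  const title = fm["title"];
  const servings = fm["servings"];
  if (typeof title !== "string" || typeof servings !== "string") {
    throw new Error("envelope.frontmatter is missing required keys");
  }
  return { frontmatter: { title, servings }, body };
}
```

The parse step only checks the JSON wrapper, which is why structured-output support matters for the envelope while Cooklang validation stays a separate, later step.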
Decision
I propose using OpenRouter as the single LLM/VLM access point for all AI-powered features in the recipe site — the ML ingestion pipeline today, and any future user-facing AI features (recipe URL import, smooth-onboarding photo capture, meal planning, nutrition) as they ship.
OpenRouter is an OpenAI-compatible gateway that routes one API to 350+ models across 60+ providers, with one account, one key, one bill, and one SDK. A model swap is a string change:
    // before
    model: "google/gemini-2.5-flash"
    // after
    model: "anthropic/claude-sonnet-4.6"

Provider-level routing is configurable per request: fallbacks, sort-by-price, sort-by-throughput, sort-by-latency, ignore-list, only-list, zero-data-retention enforcement, and quantisation filtering are all set in the request body, not in platform config. This means:
- One commit, not one platform migration, to try a new model in production for any feature.
- Automatic failover if a provider rate-limits or has an outage — the pipeline doesn't see the error, just slightly different latency.
- Capacity from Bedrock/Vertex when the model author's first-party API is saturated, with no IAM or SigV4 setup on our side — OpenRouter handles provider authentication on behalf of all routes.
- Per-stage model choice: the pipeline already mixes a cheaper text model (google/gemini-2.5-flash) for disambiguation with a vision-capable model (google/gemini-3-flash-preview) for OCR via params.yaml. Swapping any stage is a one-line change with no SDK or platform work.
The existing openrouter.ts helper already uses the OpenAI SDK pointed at
https://openrouter.ai/api/v1. Future Workers can do the same with the same SDK.
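
As a rough sketch of what "a string change plus request-body routing" looks like, the snippet below builds a request body per stage. The `stageModels` map, `ProviderPreferences` type, and `buildRequest` helper are hypothetical names for illustration; only the model slugs and the `provider.*` field names are taken from this ADR, and how the body is handed to the OpenAI SDK is left out.

```typescript
// Local sketch of the provider preference fields named in this ADR.
// This is not an official SDK type.
interface ProviderPreferences {
  order?: string[];
  only?: string[];
  ignore?: string[];
  sort?: "price" | "throughput" | "latency";
  allow_fallbacks?: boolean;
  data_collection?: "deny";
  zdr?: boolean;
}

interface ChatRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  provider?: ProviderPreferences;
}

// Hypothetical in-code mirror of the per-stage models kept in params.yaml.
const stageModels = {
  ocr: "google/gemini-3-flash-preview",
  disambiguate: "google/gemini-2.5-flash",
};

type Stage = keyof typeof stageModels;

// Builds the JSON body for one chat-completions call. Swapping a model means
// editing one string in `stageModels`; routing is an optional extra field.
function buildRequest(
  stage: Stage,
  prompt: string,
  provider?: ProviderPreferences,
): ChatRequest {
  const request: ChatRequest = {
    model: stageModels[stage],
    messages: [{ role: "user", content: prompt }],
  };
  if (provider) {
    request.provider = provider;
  }
  return request;
}

// Example: cheapest capable provider for disambiguation, fallbacks allowed.
const cheapDisambiguation = buildRequest("disambiguate", "Which flour is meant?", {
  sort: "price",
  allow_fallbacks: true,
});
```

Keeping routing in the request body means a Worker and the pipeline can share the same helper without any gateway-side configuration.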
This aligns with:
- Less Is More — one account, one key, one bill, one SDK replaces N integrations per provider.
- Short Feedback Loops: trying a new model is a one-line change in params.yaml or a request body, not a platform onboarding exercise.
- The Goldilocks Zone: OpenRouter has been the de facto cross-provider LLM gateway since 2023, with a battle-tested API surface, a large catalogue, good docs, and personal production use in Bestomer's AI Shopping Assistant chatbot since 2023. Vercel AI Gateway went GA in 2025 (early-adopter cost), Eden AI and AI/ML API are smaller and less proven, and Cloudflare AI Gateway is a different category of product. OpenRouter sits squarely in the "old enough to be reliable, not so old it's stagnant" sweet spot.
What Underlying Features It Enables
| Feature | Phase | What OpenRouter Gives Us |
|---|---|---|
| Photo-to-recipe (VLM) | P1/P4 | Multimodal models from multiple providers — Gemini, GPT-4o, Claude vision, open VLMs via Together/DeepInfra — all reachable through the same chat.completions call. |
| AI URL import (LLM) | P4 | A cheap, fast text model to clean scraped HTML/JSON-LD into Cooklang; pick the cheapest capable model per request. |
| Smooth onboarding (mixed VLM + LLM) | P4 | One key for the photograph-then-normalise loop. Latency-sorted routing on the OCR step makes onboarding feel snappy. |
| Meal planning (LLM) | P5 | Quality vs. cost trade-off per user request — frontier model for "plan my week", cheap model for "swap one meal". |
| Nutritional analysis (LLM) | P6 | Structured output (response_format: json_schema) against our nutrition schema, with the freedom to pick the cheapest model that hits accuracy targets. |
| Cooking assistant chat (LLM) | Idea | If a Phase-3+ conversational layer is built, the same gateway serves it — including streaming. |
What Specifically OpenRouter Provides
- OpenAI-compatible REST API: drop-in replacement for the OpenAI SDK by changing baseURL. Already in use at ml-pipelines/recipe-parsing/src/lib/openrouter.ts (see getOrCreateOpenRouterClient). First-party SDKs also exist if we ever want them: an official Python SDK (currently beta) and an official Vercel AI SDK provider (@openrouter/ai-sdk-provider) for TypeScript apps that want the Vercel AI SDK's React hooks.
- 358 models across 60+ providers (live count, May 2026): Anthropic, OpenAI, Google (AI Studio + Vertex), Mistral, Meta, DeepSeek, Qwen, Together, DeepInfra, Fireworks, AWS Bedrock and more, behind one catalogue and one billing relationship. Crucially, deep coverage of the open-weight long tail (Together, DeepInfra, Fireworks, Groq, Cerebras, Nebius, AkashML, Parasail, Friendli, etc.) means popular models like Llama 3.3 70B Instruct route across ~15 providers, so load balancing and price arbitrage are real, not hypothetical.
- Better availability than first-party APIs — for models that ship on multiple clouds (Claude on Anthropic + Bedrock + Vertex, Llama on Together + DeepInfra + Fireworks + Bedrock), OpenRouter load-balances and falls back across providers. When the model author's own API is rate-limiting or down — a recurring pattern around frontier model launches — the request still succeeds via Bedrock/Vertex without any code or config change on our side.
- Per-request provider preferences: provider.order, provider.only, provider.ignore, provider.sort (price|throughput|latency), provider.allow_fallbacks, provider.data_collection: "deny", provider.zdr: true, and quantisation filters (int4, int8, fp8, bf16), all set in the request body.
- Sort shortcuts in the model slug: :nitro for highest throughput, :floor for lowest price. Syntactic sugar for the provider.sort parameter; useful as a one-line knob if experimentation later wants it, but not load-bearing.
- Structured outputs: response_format: { type: "json_schema", strict: true, ... } is honoured across providers that support it, with automatic filtering to capable providers. Already used in openrouter.ts for every stage.
- Multimodal in one API: text, images, and PDFs (URL or base64). The photo-to-recipe stage already passes images this way (see parseRecipeFromImages / runImagePrompt in openrouter.ts). Audio (STT + TTS) is also covered via dedicated /audio/transcriptions (Whisper, Whisper Large V3 Turbo, GPT-4o Transcribe, Google Chirp 3, Groq's fast Whisper) and /audio/speech (OpenAI, Google Gemini Flash, Mistral Voxtral, xAI Grok Voice) endpoints. That is relevant if cooking mode's voice navigation (see the recipe-site roadmap's Voice / Hands-Free Navigation section) ever outgrows the browser-native Web Speech API and needs server-side STT/TTS for accuracy or language coverage.
- Per-model latency and throughput visibility: the OpenRouter dashboard surfaces TTFT and tokens-per-second per provider per model, useful when picking models for latency-sensitive features like onboarding.
- Usage activity dashboard — historic usage filterable by model, provider, and API key. A light-touch observability layer that we'd otherwise need to build.
- Response caching and input/output logging out of the box: identical-request response caching (zero-cost cache hits, surfaced as cached_tokens / cache_write_tokens in the usage response), provider prompt caching pass-through with session-id sticky routing, and full request/response input/output logging on the Logs page when enabled, all without standing up a separate gateway.
- No platform-side per-provider setup: no Bedrock IAM/SigV4, no Vertex service account, no Anthropic console keys, no AWS billing relationship. The full BYOK escape hatch exists if we ever want it.
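
To make the "multimodal in one API" point concrete, here is a small sketch of a user message that carries page images as base64 data URLs in the OpenAI-compatible content-parts shape. The `imageUserMessage` helper and `PageImage` type are hypothetical and are not the actual parseRecipeFromImages / runImagePrompt implementation.

```typescript
// OpenAI-compatible content parts for a mixed text + image message.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Hypothetical input shape: one already-encoded page photo.
interface PageImage {
  mimeType: string; // e.g. "image/jpeg"
  base64: string;   // raw base64 payload without any prefix
}

// Builds a single user message: the instruction first, then one part per image.
function imageUserMessage(prompt: string, images: PageImage[]) {
  const parts: ContentPart[] = [{ type: "text", text: prompt }];
  for (const image of images) {
    parts.push({
      type: "image_url",
      image_url: { url: `data:${image.mimeType};base64,${image.base64}` },
    });
  }
  return { role: "user" as const, content: parts };
}

// Example with a placeholder payload standing in for a real photo.
const ocrMessage = imageUserMessage("Extract the recipe text from these pages.", [
  { mimeType: "image/jpeg", base64: "AAAA" },
]);
```

Because the message shape is the same for any vision-capable model, switching the OCR stage to a different VLM would not touch this code.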
Cost Model
Per-token pricing is pass-through — OpenRouter charges the same per-token rate as the underlying provider, with no markup on inference itself. The platform monetises via a 5.5% fee on credit top-up (5% for crypto, $0.80 minimum), so a $100 top-up funds ~$94.50 of inference. Effective markup works out to roughly 5.8% — the same as Eden AI, which uses an identical 5.5%-on-credit-top-up structure. Spend stays predictable and pay-as-you-go, with no minimum commitment per provider.
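The arithmetic behind the ~5.8% figure can be checked with a few lines. This is a sketch under one assumption: the fee is deducted from the top-up, as the "$100 funds ~$94.50" example implies. The `topUpBreakdown` helper and its field names are illustrative, and the crypto rate is ignored.

```typescript
// Figures quoted in this ADR: 5.5% fee on card top-ups, $0.80 minimum.
const TOP_UP_FEE_RATE = 0.055;
const MIN_FEE_USD = 0.8;

interface TopUpBreakdown {
  feeUsd: number;          // what the platform keeps
  inferenceUsd: number;    // what is left to spend on tokens
  effectiveMarkup: number; // fee as a fraction of the inference actually funded
}

// Assumes the fee comes out of the top-up amount rather than being added on top.
function topUpBreakdown(topUpUsd: number): TopUpBreakdown {
  const feeUsd = Math.max(topUpUsd * TOP_UP_FEE_RATE, MIN_FEE_USD);
  const inferenceUsd = topUpUsd - feeUsd;
  if (inferenceUsd <= 0) {
    throw new Error("top-up is too small to fund any inference");
  }
  return { feeUsd, inferenceUsd, effectiveMarkup: feeUsd / inferenceUsd };
}

const hundred = topUpBreakdown(100);
console.log(hundred.inferenceUsd.toFixed(2));
console.log((hundred.effectiveMarkup * 100).toFixed(1) + "%");
```

For a $100 top-up this reproduces the ~$94.50 and ~5.8% figures. For a $5 top-up the $0.80 minimum applies, which is 16% of the top-up amount.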
Vercel AI Gateway is the cheaper aggregator on paper — 0% Vercel markup on tokens, $5/month free credit, no paid Vercel plan required (Hobby team accounts work). Vercel monetises elsewhere: hosting/deployment is the core business and AI Gateway is a loss leader for it, plus optional add-on capabilities (Custom Reporting, team-wide allowlists, team-wide ZDR) carry per-request surcharges if turned on. Vercel's docs also disclaim "you're responsible for any payment processing fees that may apply" on credit top-ups, which in practice translates to Stripe-style card processing (~2.5%) passed through. The effective gap between the two is therefore ~3 percentage points at meaningful scale, not the full 5.8%:
| Monthly inference spend | OpenRouter overhead | Vercel overhead | Gap |
|---|---|---|---|
| $5 | ~16% ($0.80 minimum dominates) | 0% (free $5 credit) | ~16pp |
| $50 | 5.8% | ~0% (free credit absorbs most) | ~5.8pp |
| $500 | 5.8% | ~1.5% | ~4.3pp |
| $5,000 | 5.8% | ~2.5% | ~3.3pp |
| $50,000 | 5.8% | ~2.5% | ~3.3pp |
The Stripe-style passthrough is an estimate based on Vercel's "may apply" language — actual could be lower (Vercel's own merchant rate) or zero (absorbed). Even taking the worst case, the OpenRouter cost premium stabilises at roughly 3pp once spend grows past the $5 free credit.
Cost Over the Lifecycle
The 5.8% effective markup is a knob we can turn down over time rather than a fixed cost. OpenRouter's BYOK terms make the cost story phase-dependent:
- First 1M BYOK requests per month: free — 0% markup on top of whatever the provider charges you directly (announcement).
- Above 1M BYOK requests per month: 5% of the equivalent OpenRouter pay-as-you-go cost on the surplus.
- BYOK is per-provider, not all-or-nothing — you can BYOK Anthropic while staying on pay-as-you-go for Google, or vice versa, and switch any one of them independently.
- Fallback still works under BYOK — by default, if BYOK keys rate-limit or fail, OpenRouter falls back to its own shared endpoints. The availability story doesn't change.
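
The BYOK terms above reduce to a simple piecewise fee. The sketch below assumes a uniform average pay-as-you-go cost per request, which real traffic will not have; `byokMonthlyFee` and the constant names are hypothetical.

```typescript
// Terms as described in this ADR: the first 1M BYOK requests per month are
// free, and the surplus is charged at 5% of the equivalent pay-as-you-go cost.
const FREE_BYOK_REQUESTS_PER_MONTH = 1_000_000;
const BYOK_SURPLUS_RATE = 0.05;

// Simplification: every request is assumed to cost the same on pay-as-you-go.
function byokMonthlyFee(requests: number, avgPaygCostPerRequestUsd: number): number {
  const surplusRequests = Math.max(0, requests - FREE_BYOK_REQUESTS_PER_MONTH);
  return surplusRequests * avgPaygCostPerRequestUsd * BYOK_SURPLUS_RATE;
}

// Below the threshold the platform fee is zero; above it, only the surplus counts.
const underThreshold = byokMonthlyFee(800_000, 0.002);
const overThreshold = byokMonthlyFee(1_500_000, 0.002);
console.log(underThreshold, overThreshold.toFixed(2));
```

The provider's own bill is separate from this fee, so the function models only the OpenRouter overhead, not total spend.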
That enables a natural lifecycle:
- Today (broad experimentation, low absolute spend). Stay on pay-as-you-go across the board. Catalogue breadth matters more than 5.8% on pocket-change spend. New model experiments don't justify creating a provider account.
- As user-facing AI scales. Selectively BYOK the providers carrying the most volume and that are operationally cheap to onboard — Anthropic direct, OpenAI direct, Google AI Studio. Those drop to 0% markup (under 1M req/month) or 5% (above) while the rest of the catalogue stays one config flip away.
- Long tail stays pay-as-you-go. Painful providers (AWS Bedrock SigV4, Vertex service accounts) — keep them on OpenRouter's relationship; the 5.8% buys us out of IAM. Open-weight experimentation providers (Together, DeepInfra, Fireworks) — same story; no account for a one-week test.
Net: the effective markup converges toward 0% on the volume that matters as the product scales, without sacrificing catalogue breadth, routing controls, or the one-key/one-bill story for the long tail. Vercel AI Gateway runs at ~2.5% overhead in practice (Stripe passthrough), so the ~3pp cost gap to OpenRouter persists on whatever traffic stays pay-as-you-go; BYOK closes it only for the providers we onboard directly. That still narrows the gap to the point where the catalogue breadth and routing controls that drove the original decision comfortably outweigh the residual saving.
Alternatives Considered
The cross-provider LLM access market splits cleanly into two categories with very different operating models:
- LLM Aggregators own a model catalogue and the billing relationship with each upstream provider. You sign up to the aggregator, pay it, and it pays the providers. Examples: OpenRouter, Vercel AI Gateway (despite the "Gateway" branding), Eden AI, AI/ML API.
- LLM Gateways sit in front of providers as a proxy. They centralise observability, key management, guardrails, and routing — but you bring your own provider keys and own the provider billing relationships. Examples: Cloudflare AI Gateway, LiteLLM, Portkey, Kong AI Gateway.
These are not interchangeable. An aggregator eliminates per-provider account sprawl; a gateway adds policy and observability on top of that sprawl. The two can also compose — a gateway can sit in front of an aggregator. The decision framing below first considers the baseline (no intermediary), then the aggregator peers of OpenRouter, then the BYOK gateways (which solve a different half of the problem).
Baseline: Direct Per-Provider SDKs (@anthropic-ai/sdk, @google/genai, openai, AWS SDK for Bedrock, etc.)
- Pros: First-party, full feature coverage on the day of launch (e.g., Anthropic's prompt caching, OpenAI's Responses API), no third-party dependency in the inference path.
- Cons: Every new model means a new SDK, a new account, a new console, a new billing relationship, and a new set of keys/secrets to inject into CI, Workers, and local dev. Bedrock additionally requires AWS IAM and SigV4 signing, which is substantial setup overhead for a recipe site. Switching the disambiguation stage from Gemini Flash to a Together-hosted Qwen variant for a price experiment becomes a multi-day platform change instead of a one-line params.yaml edit. Fallback across providers when one is rate-limiting is hand-rolled retry/try-catch glue that we'd have to maintain.
- Decision: Rejected. The friction is incompatible with the iteration cadence the ingestion pipeline and future AI features need.
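
For a sense of the glue the direct-SDK route implies, here is a minimal sketch of hand-rolled fallback. The `withFallback` helper and `Attempt` type are hypothetical; a real version would also need per-provider request translation, retry budgets, and error classification, none of which is shown.

```typescript
// One candidate provider: a label plus a thunk that performs the call.
type Attempt<T> = { provider: string; call: () => Promise<T> };

// Tries each provider in order and returns the first success.
// Collects error messages so a total failure is diagnosable.
async function withFallback<T>(
  attempts: Attempt<T>[],
): Promise<{ provider: string; value: T }> {
  const errors: string[] = [];
  for (const { provider, call } of attempts) {
    try {
      return { provider, value: await call() };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${provider}: ${message}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```

Each `call` here would wrap a different SDK with its own auth and request shape, which is the maintenance burden this option was rejected for.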
LLM Aggregators (real peers to OpenRouter)
Aggregators own the upstream provider relationships, so a single account, key, and bill replace N per-provider integrations. The relevant axes to compare them on are catalogue breadth, routing controls, pricing model, platform coupling, and maturity.
Vercel AI Gateway
The closest peer to OpenRouter — a unified API to hundreds of models with no per-token markup, optional BYOK, OpenAI- and Anthropic-compatible endpoints, automatic cross-provider retry, and spend monitoring.
- Pros: Genuine OpenRouter-class product (catalogue + unified billing + retry). Cheaper on tokens than OpenRouter — Vercel charges 0% markup on per-token pricing (AI Gateway is a loss leader for hosting; optional add-on capabilities and Stripe-passthrough card processing carry the cost), where OpenRouter applies ~5.8% effective via its credit-top-up fee. See the Cost Model table above for the ~3pp effective gap at meaningful scale. Native integration with the Vercel AI SDK if we ever adopt that client library.
- Cons: Smaller catalogue (275 vs OpenRouter's 358 models, live counts May 2026) and meaningfully thinner provider depth on popular open-weight models. Hard examples from querying both APIs: Llama 3.3 70B Instruct is not in Vercel's catalogue at all (OpenRouter has 15 providers for it); Claude Sonnet 4.6 has 3 endpoints on Vercel vs 8 on OpenRouter; the long-tail open-weight providers (Groq, Cerebras, Nebius, AkashML, Parasail, Friendli) are largely missing. For some frontier-class models the depth is comparable (GPT-5.5: 2 vs 2; DeepSeek V3.1: 6 vs 7), and Vercel occasionally has more depth (Qwen 3 235B: 4 vs 1), but the open-weight long tail is where cost arbitrage lives and OpenRouter is clearly ahead. Fewer published granular routing knobs (no documented equivalent of provider.zdr, provider.data_collection: "deny", provider.quantizations, or provider.preferred_min_throughput). The product is Vercel-platform tilted: free credits and dashboard tooling are scoped to a Vercel account, and app attribution assumes a Vercel deployment. We are on Cloudflare Pages (ADR 011) with no Vercel account, so adopting it pulls in an entire platform relationship for the LLM layer alone. Newer product (GA 2025) with a thinner public track record than OpenRouter, which has been the de facto cross-provider LLM marketplace since 2023.
- Decision: Rejected. Real alternative, but adopting it would mean taking on Vercel as a platform purely for inference while losing routing controls we already use. If a future feature ever needs Vercel-only capabilities (a Next.js-on-Vercel deployment, Vercel's app attribution), this is worth revisiting.
Not an alternative, but composable. The Vercel AI SDK (the client/server library: useChat, useCompletion, useObject, streaming primitives) is a separate product from the Vercel AI Gateway. It is framework-agnostic, runs fine on Cloudflare, and has an official OpenRouter provider (@openrouter/ai-sdk-provider). If we build a streaming chat UI, the Vercel AI SDK on top of OpenRouter is the natural fit: the two compose, they don't compete.
Eden AI
A multimodal aggregator with ~30 provider integrations across text, OCR, document parsing, speech, translation, and image analysis — leaning on specialist providers per modality (Mindee for OCR, DeepL for translation, Deepgram for speech, etc.) rather than the LLM/VLM generalists. Unified billing on a credit balance with a 5.5% platform fee on credit top-up — effectively the same cost model as OpenRouter, so no price advantage either way. GDPR-native, EU data residency by default, headquartered in France.
- Decision: Rejected. The "multi-modality under one contract" angle would be compelling if OpenRouter only did text, but it doesn't. OpenRouter covers a superset of the modalities we need: LLMs, VLMs, embeddings, image generation, and (since the 2025 audio APIs launch) STT via Whisper, Chirp 3, GPT-4o Transcribe, and Groq Whisper, plus TTS from comparable providers. Eden AI's specialist providers (DeepL, Mindee, Deepgram) are higher-quality on their narrow domains but we don't need them: our document parsing is already VLM-based, voice navigation in cooking mode uses the browser-native Web Speech API (see the recipe-site roadmap's Voice / Hands-Free Navigation section), and translation isn't on the roadmap. The remaining real differences are a smaller provider roster (~30 integrations vs OpenRouter's 60+ providers) and less granular routing controls. Eden AI's EU data residency posture is nice but isn't a trigger for a B2C consumer product without that as an explicit positioning claim.
AI/ML API
OpenAI-compatible aggregator with its own catalogue and unified billing. Positioned as a lower-cost OpenRouter alternative.
- Decision: Rejected. Smaller catalogue and ecosystem than OpenRouter, with no clear feature advantage for our workload. OpenRouter's incumbency, integration in the existing pipeline, and routing-control surface area outweigh any marginal price difference.
LLM Gateways (BYOK — solve a different half of the problem)
Gateways do not remove per-provider account or billing sprawl. They add observability, policy (rate limits, budgets, guardrails), key management, and routing on top of provider relationships we still have to set up and own. Useful as a complement to an aggregator, not as a replacement.
Cloudflare AI Gateway
- Pros: Sits inline with the rest of the Cloudflare estate (ADR 011, ADR 029, ADR 039). Free tier. Recent Secrets Store integration (Aug 2025) centralises API key storage. Adds Guardrails (content-safety inspection on prompts and responses), which OpenRouter does not — the one capability that could realistically pull us toward layering AI Gateway later, if user-facing AI inputs become an abuse vector.
- Cons: To use Claude via Bedrock through AI Gateway, the project still needs an AWS account, IAM user, SigV4 setup, and a Bedrock billing relationship; AI Gateway signs requests with credentials we provide. Vertex AI similarly requires a Google service account JSON. Every provider needs to be onboarded individually. No first-party cross-provider model catalogue and no fallback to alternative providers for the same model. Caching overlaps heavily with what OpenRouter already provides (response caching, prompt-cache pass-through) — semantic caching, which would be a real differentiator, is on Cloudflare's roadmap but not yet shipped. And the Cloudflare dashboard is not a particularly strong general-purpose observability tool; if we wanted a single pane of glass across the whole product (Workers, D1, AI, business metrics) we'd more likely introduce a dedicated platform like Grafana Cloud — a separate ADR.
- Decision: Rejected as the primary access layer. Worth layering in front of OpenRouter later if guardrails become load-bearing for user-facing AI features — the one realistic trigger for a B2C consumer recipe product — and accepting the per-provider onboarding cost that comes with it. Pure caching is not on its own reason enough (OpenRouter already covers it), and general observability is not a reason to reach for AI Gateway either — that's a separate decision about observability tooling. The SDK doesn't change either way.
LiteLLM
Open-source (MIT) proxy. Self-hosted or via the managed LiteLLM Cloud offering — both BYOK at the inference layer.
- Pros: Same provider-abstraction surface as the aggregators. No third-party dependency on the inference path if self-hosted. Strong observability and routing primitives.
- Cons: Self-hosting is a platform we'd have to run, scale, and pay for; the managed cloud removes that but is still BYOK so the per-provider account/billing problem persists either way. It's a router, not a marketplace.
- Decision: Rejected. Buys flexibility we don't need at the cost of platforms and per-provider admin we explicitly want to avoid.
Portkey
Production-focused BYOK gateway with Virtual Keys / Model Catalog for centralised key management, budgets, guardrails, and routing. Markets "1,600+ models" — note this is the aggregate across upstream providers Portkey can proxy to (Together alone exposes hundreds of open-weight variants), not a Portkey-curated catalogue comparable to OpenRouter's 358 or Vercel's 275. Portkey is a pure router, not a marketplace. Pricing is per-log (~$49/mo for 100K logs at the time of writing).
In May 2026, Palo Alto Networks — one of the largest pure-play cybersecurity vendors, known for premium enterprise pricing — announced intent to acquire Portkey. PANW acquisitions typically de-prioritise self-serve / SMB tiers in favour of enterprise SKUs sold to security teams, push pricing up, and shift the roadmap toward audit / compliance / DLP features rather than developer ergonomics. The open-source Portkey gateway could stagnate or be re-licensed. This is a yellow flag for any project planning to use Portkey self-serve at SMB scale, even though existing customers usually keep service on grandfathered terms for 12–24 months post-acquisition.
- Decision: Rejected. Strong enterprise feature set we don't need, BYOK at the inference layer so doesn't remove provider sprawl, and log-based pricing scales unfavourably for a project where individual feature usage may be high-volume (every recipe ingestion is a request). The PANW acquisition reinforces the rejection — the product is heading further away from the self-serve developer use case, not toward it.
Kong AI Gateway
The AI extension of Kong's enterprise API gateway. Adds semantic caching, semantic routing, PII redaction, model lifecycle management, and governance on top of BYOK provider relationships. Enterprise licensing, typically $50K+/year.
- Decision: Rejected. Enterprise governance platform aimed at organisations standardising agentic AI across many teams — operationally and commercially disproportionate to a personal/portfolio project. BYOK at the inference layer; same provider-sprawl story as the other gateways.
Custom Internal Abstraction over Provider SDKs
A typed LLMClient interface in the domain layer with adapters per provider, switching by config.
- Pros: No third-party gateway; bespoke fit.
- Cons: Every new model and feature (prompt caching, vision, structured outputs, streaming) becomes another adapter PR. Reimplements OpenRouter at greater cost and worse coverage. Strong variant of Not Invented Here (NIH) syndrome.
- Decision: Rejected. This is the work to avoid.
Consequences
Positive
- One bill, one key, one SDK across the whole project. The ML pipeline and any future Cloudflare Worker handling AI features use the same OpenAI SDK pointed at OpenRouter, with the same OPENROUTER_API_KEY. Secrets management surface area stays flat as more AI features ship.
- Higher effective availability than first-party APIs. For widely hosted models (Claude on Anthropic + Bedrock + Vertex; Llama on Together + DeepInfra + Fireworks + Bedrock), OpenRouter falls back across providers transparently. The Anthropic API outages around major Claude releases stop being our problem.
- No Bedrock IAM, no Vertex service accounts, no per-provider consoles — capacity from those providers is reachable without taking on AWS/GCP as a platform.
- One-line model swaps in experiments. params.yaml already lists per-stage models; trying a new VLM or LLM is changing one string and rerunning dvc repro (ADR 029). Stage caching means only the affected stages re-execute.
- Per-stage cost/perf tuning via model choice. params.yaml already mixes Gemini 3 Flash Preview (vision-heavy stages) with Gemini 2.5 Flash (cheaper text stage). Future tuning can swap any stage's model on a one-line edit, or reach for provider.sort / :nitro / :floor if provider-level optimisation ever proves worth it.
- Structured outputs already wired in. response_format: json_schema, strict: true is already used for every pipeline stage, and Cooklang's frontmatter schema is a perfect fit for it.
- VLM and LLM in one place. Photo ingestion and text normalisation use the same client; future features (photo URL import, smooth onboarding) inherit this with zero new integration work.
- Built-in usage observability — historic activity dashboard filterable by model/provider/key removes the immediate need to roll our own LLM telemetry.
Negative
- Third-party dependency on the live inference path. Once user-facing AI features ship (recipe ingestion at onboarding, cooking-mode voice navigation, AI URL import, meal planning, nutritional analysis), OpenRouter availability becomes load-bearing for product UX. An OpenRouter outage degrades onboarding, blocks live transcription, and breaks meal planning. The BYOK fallback isn't an instant escape hatch: it requires us to have already onboarded the affected provider directly, which is itself a planning decision we'd need to have made in advance for the heavy-traffic paths. Mitigation: BYOK pre-configured for the top traffic providers as scale grows, status monitoring, and accepting that long-tail experimentation features will be down during an OpenRouter outage.
- Per-user quotas, metering, and abuse protection are our responsibility, not OpenRouter's. OpenRouter has account-level rate limits and the user parameter for activity tracking, but it isn't a per-user quota engine. At product scale, enforcing "free tier gets N ingestions/month, premium tier unlimited", detecting prompt injection / abuse, and per-user budget caps live in our Workers + D1 backend. OpenRouter is the inference layer; the metering and access-control layer is ours to build. If our backend logic isn't enough, Cloudflare AI Gateway specifically offers Guardrails (content-safety inspection) as a layered option, but per-user budgets, abuse heuristics, and product-shaped rate limits remain ours to own either way.
- Data flows through OpenRouter and the AI-side audit trail lives there, not on our infrastructure. For GDPR data subject access / deletion requests on AI-related data, we coordinate with OpenRouter as a sub-processor rather than self-serving from our own logs. provider.zdr: true and provider.data_collection: "deny" mitigate retention/training concerns where the provider supports them, but the request still transits OpenRouter's infrastructure. For a B2C consumer product this is a normal sub-processor relationship and the right trade-off; if the product ever takes on a partner or use case that demands in-house data flow, Cloudflare AI Gateway as a BYOK proxy is the documented escape hatch (with its own per-provider onboarding cost).
- Added latency on every request. OpenRouter is one extra hop between our backend and the upstream provider. Inconsequential for async features (batch ingestion, AI URL import, meal planning generation); potentially meaningful for low-latency interactive features, most notably server-side voice transcription in cooking mode if we move off the browser-native Web Speech API. Per-request TTFT is visible in OpenRouter's dashboard so we can measure rather than guess.
- Spend goes through OpenRouter, not directly to the model provider. We can't draw down provider startup credits (AWS Activate, Google Cloud credits, Anthropic / OpenAI startup programmes) unless we BYOK that specific provider, which reintroduces the per-provider account/billing setup we're trying to avoid for the long tail. At meaningful product spend the ~5.8% effective markup is a real cost; the Cost Over the Lifecycle BYOK path is the mitigation, but only closes the gap for providers we onboard directly.
- Feature-launch lag for provider-native capabilities. When Anthropic ships a new prompt caching mode, OpenAI ships the Responses API, or Google ships new safety controls, there can be a gap before OpenRouter exposes them through the unified API. BYOK direct or a temporary direct-SDK integration for one stage is the escape hatch if a feature is genuinely blocking a release.
- Commercial single point of failure. OpenRouter is a venture-backed startup with no acquisition or IPO history. Pricing changes, an unfavourable acquisition (rug-pull risk), or worst-case business failure are live risks for any AI feature in the critical path. The risk is bounded by the OpenAI-compatible thin-integration shape: migrating to another aggregator (Vercel AI Gateway, Eden AI), to a BYOK gateway with full provider sprawl (Cloudflare AI Gateway, Portkey), or to provider-direct is a baseURL change plus provider re-onboarding, not a code rewrite. Keep the integration on the OpenAI-compatible surface and treat vendor-specific OpenRouter features as nice-to-have, not load-bearing.
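
The per-user quota point above ("free tier gets N ingestions/month, premium tier unlimited") could look roughly like this on our side. Everything here is hypothetical: the `InMemoryQuota` class, the tier names, and the limit of 3 are placeholders, and a real Worker would keep the counters in D1 rather than in memory.

```typescript
type Tier = "free" | "premium";

// Illustrative limits only; `null` means unlimited.
const monthlyIngestionLimit: Record<Tier, number | null> = {
  free: 3,
  premium: null,
};

// In-memory stand-in for a D1-backed counter, keyed by user and calendar month.
class InMemoryQuota {
  private readonly used = new Map<string, number>();

  // Returns true and records the usage if the user still has quota this month.
  tryConsume(userId: string, tier: Tier, month: string): boolean {
    const limit = monthlyIngestionLimit[tier];
    const key = `${userId}:${month}`;
    const count = this.used.get(key) ?? 0;
    if (limit !== null && count >= limit) {
      return false;
    }
    this.used.set(key, count + 1);
    return true;
  }
}

// The check runs before any request is sent to the inference layer.
const quota = new InMemoryQuota();
const firstAllowed = quota.tryConsume("user-1", "free", "2026-05");
```

The gateway never sees a rejected request, so quota enforcement stays independent of which aggregator or provider sits behind it.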
When to Revisit
This decision should be revisited if any of the following become true:
- Spend on easy-to-BYOK providers grows large enough to justify the operational cost of setting up direct accounts. The first response is selective BYOK on OpenRouter (Anthropic / OpenAI / Google AI Studio direct keys, fallback unchanged), not switching access layer. Only if BYOK on OpenRouter doesn't close the gap should switching come into scope.
- Spend grows large on providers we can't BYOK cheaply (Bedrock, Vertex), and the ~5.8% markup on that subset becomes a meaningful absolute number. At that point a head-to-head with Vercel AI Gateway is warranted — they run at ~2.5% effective (Stripe passthrough) so the ~3pp gap on that traffic could justify accepting the catalogue/depth trade-offs.
- Prompt caching across providers (Anthropic-style cache control, OpenAI cached input pricing) becomes load-bearing for a feature and OpenRouter's pass-through coverage lags. At that point going direct for one specific stage via BYOK is reasonable.
- User-facing AI features become abuse vectors. If a Phase 3+ cooking-assistant chat, AI URL import, or AI photo import accepts freeform user input that ends up at a model, the surface area for prompt injection, jailbreak attempts, or generating harmful content grows. At that point Cloudflare AI Gateway's Guardrails layered in front of OpenRouter — without changing SDK code — is a reasonable response, accepting the per-provider onboarding cost that comes with it.
- An outage pattern on OpenRouter itself proves more disruptive than first-party APIs.
Until then, OpenRouter is the right point on the curve: maximum model agility, minimum platform sprawl, and a clean BYOK migration path as scale arrives.