Tenzai's January 2026 study audited 15 web apps generated by AI tools (Cursor, Claude Code, Replit's agent mode, Devin, OpenAI Codex). Result: 69 distinct vulnerabilities. 0 of 15 had CSRF protection. 0 had basic security headers. 100% had SSRF. 24.7% of all AI-generated code shipped with a flaw. We replicated the methodology on 8 client codebases and found the same patterns. Five recurring vulns walked through with code: SSRF against AWS IMDS (the canonical exploit), hardcoded service-role keys in NEXT_PUBLIC_ client bundles, missing Supabase RLS policies (tested with one pg_tables query), wide-open CORS reflecting any origin with credentials, and Clerk unsafe_metadata trusted as auth (the privilege-escalation one-liner: window.Clerk.user.update). Why the AI does this (RLHF rewards "the app works," not "the app is secure"). The 5-step pre-flight you can run in under 10 minutes before shipping. Plus a downloadable PDF checklist for the email list.
Founder of Valtik Studios. Penetration tester. Based in Connecticut, serving US mid-market.
# we read 15 vibe-coded apps so you don't have to: 69 vulnerabilities, 5 patterns, one playbook
in january 2026 tenzai dropped a study that should have scared every founder shipping with cursor or claude code. they audited 15 web apps generated by the major AI coding tools (cursor, claude code, replit's agent mode, devin, openai codex) and pulled out 69 distinct vulnerabilities. zero of the 15 had CSRF protection. zero had basic security headers. every single one had SSRF. 24.7% of all the AI-written code shipped with a security flaw of some kind.
we ran the same playbook on 8 client codebases that came to us after being "mostly built by claude" or "vibed in a weekend with cursor." we found the same patterns. plus a couple extra that tenzai didn't publish.
this matters because AI-generated code is now somewhere between 30% and 60% of new application codebases depending on which survey you trust (github octoverse, jetbrains state of dev, stack overflow). the apps are getting written faster than security review can keep up. the bottleneck used to be "can a junior dev write a working CRUD app in a week." now it's "can a security engineer audit five new repos a week." spoiler: they can't.
so we're going to walk through what we saw, why the models do it, and the five-step pre-flight you can run before you push to prod.
## what "vibe coding" actually means in 2026
for anyone reading this who isn't in the day-to-day: vibe coding is the workflow where you prompt an AI tool with "build me a SaaS that does X" and ship the output. cursor and claude code do this inside an editor. replit's agent mode, v0, bolt, lovable, and codex do it in the browser. the time from idea to deployed app is usually one to three days. the developer often never reads the code that ships.
the stack is predictable: next.js or remix on the front, supabase or convex for the database, clerk or auth0 for auth, vercel for hosting. the AI knows this stack cold. it also knows how to make it work. it does not know how to make it secure, because "secure" wasn't in the prompt.
the result is an app that runs, deploys, takes payments, and has a single curl-line account takeover sitting underneath it.
## the five recurring vulns, with code
### 1. SSRF, especially against the AWS instance metadata service
every single one of tenzai's 15 apps had SSRF. all of ours did too. the pattern is always the same: the AI generates a webhook handler, image proxy, URL preview endpoint, or "import from URL" feature and never filters the destination.
what gets generated:
```ts
// app/api/preview/route.ts
export async function POST(req: Request) {
  const { url } = await req.json()
  const res = await fetch(url)
  const html = await res.text()
  return Response.json({ preview: extractMeta(html) })
}
```
clean. works. ships. and on AWS EC2 / ECS / EKS without IMDSv2 enforced, you do this:
```bash
curl -X POST https://target.app/api/preview \
  -H 'content-type: application/json' \
  -d '{"url":"http://169.254.169.254/latest/meta-data/iam/security-credentials/"}'
```
you get the IAM role name back. one more request and you have temporary AWS credentials with whatever permissions that role has. usually that's S3 read, sometimes it's the whole account.
the fix is to validate the destination before fetching:
```ts
import { lookup } from 'node:dns/promises'
import ipaddr from 'ipaddr.js'

async function safeFetch(input: string) {
  const u = new URL(input)
  if (!['http:', 'https:'].includes(u.protocol)) throw new Error('bad protocol')
  const { address } = await lookup(u.hostname)
  const parsed = ipaddr.parse(address)
  if (parsed.range() !== 'unicast') throw new Error('private range blocked')
  return fetch(u, { redirect: 'error' })
}
```
note the `redirect: 'error'`. without that, an attacker hosts a 302 to `169.254.169.254` and bypasses the IP check on the first hop.
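the range check is the load-bearing part. as a minimal sketch of roughly what `parsed.range() !== 'unicast'` rejects (IPv4 only; function name and hardcoded boundaries are ours, for illustration):

```ts
// classify an IPv4 address as blocked if it falls in loopback, RFC 1918,
// link-local, or "this network" space. these are the ranges SSRF payloads aim for.
function isBlockedIPv4(ip: string): boolean {
  const [a, b] = ip.split('.').map(Number)
  if (a === 127) return true                       // loopback
  if (a === 10) return true                        // 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true // 172.16.0.0/12
  if (a === 192 && b === 168) return true          // 192.168.0.0/16
  if (a === 169 && b === 254) return true          // link-local, includes IMDS
  if (a === 0) return true                         // "this network"
  return false
}

console.log(isBlockedIPv4('169.254.169.254')) // true
console.log(isBlockedIPv4('93.184.216.34'))   // false
```

this is why you resolve first and classify the resulting address with a real parser like ipaddr.js instead of regexing the URL string: attackers also send decimal, hex, and IPv6-mapped encodings of the same addresses, and a string match misses all of them.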
### 2. hardcoded secrets in client bundles
the second most common finding: supabase service-role keys, stripe restricted keys, openai api keys, sometimes full database urls. all in the javascript that ships to the browser.
how it happens in next.js is well known and the AI does it constantly:
```bash
# .env.local
NEXT_PUBLIC_SUPABASE_URL=https://xyz.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=eyJ...
NEXT_PUBLIC_SUPABASE_SERVICE_ROLE_KEY=eyJ...   # the third one is fatal
```
the NEXT_PUBLIC_ prefix means the value gets inlined into the client bundle. the service-role key bypasses every RLS policy. any user who views the page now has root on the database.
we find these in seconds:
```bash
# pull the chunk URLs out of the page, then grep every bundle for JWT-shaped strings
for f in $(curl -s https://target.app | grep -oE '/_next/static/chunks/[^"]+\.js' | sort -u); do
  curl -s "https://target.app$f"
done | grep -oE 'eyJ[A-Za-z0-9_-]{20,}\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+'
```
decode the JWT, look at the role claim. if it says service_role, you own the database.
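reading the claim takes a few lines of node. a sketch with a fabricated token (a JWT payload is just base64url JSON; no verification is needed to read it):

```ts
// read the role claim out of a JWT without verifying it. fine here because
// we only want to know what the leaked key claims to be.
function jwtRole(token: string): string {
  const payload = token.split('.')[1]
  return JSON.parse(Buffer.from(payload, 'base64url').toString('utf8')).role
}

// fabricated token whose payload is {"role":"service_role"}
const token = 'eyJhbGciOiJIUzI1NiJ9.' +
  Buffer.from(JSON.stringify({ role: 'service_role' })).toString('base64url') +
  '.signature'

console.log(jwtRole(token)) // service_role
```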
the fix is server-only env vars (no NEXT_PUBLIC_ prefix) and using the anon key on the client. the service-role key never touches a browser. if you've ever leaked it, rotate it now, then pull the database logs to see what was done while it was exposed.
### 3. missing RLS on supabase / managed BaaS
every supabase-backed AI app we've audited had at least one table with RLS disabled or with a policy that read using (true). the AI generates the schema, generates the auth flow, sees that the auth flow works, and ships. it does not generate the row-level-security policies because the prompt didn't ask for them and the app "works" without them.
what this looks like in practice. user A logs in, hits their own dashboard, and runs:
```ts
const { data } = await supabase.from('invoices').select('*')
// returns every invoice in the database, not just user A's
```
because the policy on the invoices table is missing or trivially true. the anon key plus a valid JWT gets you everything.
the test is one query against the postgres catalog:
```sql
select schemaname, tablename, rowsecurity
from pg_tables
where schemaname = 'public' and rowsecurity = false;
```
every row in that result is a table where the AI forgot to do its job. fix is `alter table x enable row level security` plus a real `using (auth.uid() = user_id)` policy on every table.
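spelled out for the invoices table from the example above (table and column names are the illustrative ones, not a prescription):

```sql
-- turn RLS on, then scope reads to the row owner.
-- auth.uid() is supabase's helper for the caller's JWT subject.
alter table public.invoices enable row level security;

create policy "owners read their own invoices"
  on public.invoices for select
  using (auth.uid() = user_id);
```

a table with RLS enabled and no policy denies everything to anon and authenticated callers, which is the loud failure you want: you notice it in development instead of in an incident report.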
### 4. wide-open CORS with credentials
the AI's default CORS config is the most generous one possible because that's the one that doesn't break anything:
```ts
// next.config.js or middleware.ts
headers: [
  { key: 'Access-Control-Allow-Origin', value: '*' },
  { key: 'Access-Control-Allow-Credentials', value: 'true' },
  { key: 'Access-Control-Allow-Methods', value: 'GET,POST,PUT,DELETE' },
]
```
`*` plus credentials is invalid per spec, but a lot of frameworks reflect the request origin instead, which is functionally worse. test it:
```bash
curl -i -H 'origin: https://evil.com' https://target.app/api/me
```
if you see `access-control-allow-origin: https://evil.com` and `access-control-allow-credentials: true` come back, anyone with an XSS-able domain or a phishing page can read the authenticated user's data cross-origin. combine with cookie-based session auth and it's a full account takeover via a single JS fetch from the attacker's page.
fix: allowlist explicit origins, never reflect. and if you don't need credentials cross-origin, don't send them.
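a minimal sketch of the allowlist approach (origins are placeholders; adapt to wherever your framework sets response headers):

```ts
// reflect only origins we explicitly trust. everything else gets no CORS
// headers at all, which the browser treats as a cross-origin denial.
const ALLOWED_ORIGINS = new Set([
  'https://app.example.com',
  'https://admin.example.com',
])

function corsHeaders(origin: string | null): Record<string, string> {
  if (!origin || !ALLOWED_ORIGINS.has(origin)) return {}
  return {
    'Access-Control-Allow-Origin': origin, // exact allowlisted value, never '*'
    'Access-Control-Allow-Credentials': 'true',
    'Vary': 'Origin', // stop shared caches from serving one origin's answer to another
  }
}
```

the `Vary: Origin` line matters once you reflect per-origin values: without it a CDN can cache the response generated for one origin and serve it to another.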
### 5. trusting unsafe_metadata as auth
this one is clerk-specific but it shows up everywhere clerk shows up. clerk puts metadata fields on a user; the two that matter here are publicMetadata (server-write only) and unsafeMetadata (client-writable). the names are not subtle. the AI uses the wrong one constantly:
```ts
// server route
import { currentUser } from '@clerk/nextjs/server'

export async function POST(req: Request) {
  const user = await currentUser()
  if (user?.unsafeMetadata?.role !== 'admin') {
    return new Response('forbidden', { status: 403 })
  }
  // admin action
}
```
the user can write to unsafeMetadata from the client. so they call clerk's own SDK from devtools:
```ts
await window.Clerk.user.update({ unsafeMetadata: { role: 'admin' } })
```
refresh the page. they're admin. this is documented in clerk's own docs as the anti-pattern and the AI does it anyway because the field is shorter to type and the property name doesn't scream "danger" loud enough at the model.
fix: read publicMetadata on the server, or better, store roles in your own database keyed off user.id and check that.
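the shape of the fix, reduced to a pure function so the property being checked is unmissable (the type and helper names here are ours, not clerk's):

```ts
// only publicMetadata participates in the decision. unsafeMetadata is
// client-writable and therefore attacker-controlled, so it is ignored entirely.
type ClerkUserLike = {
  publicMetadata?: { role?: string }
  unsafeMetadata?: { role?: string }
}

function hasAdminRole(user: ClerkUserLike | null): boolean {
  return user?.publicMetadata?.role === 'admin'
}
```

with this in place the devtools one-liner does nothing: writing `{ role: 'admin' }` into unsafeMetadata never reaches the check.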
we grep for this on every audit:
```bash
egrep -rn "unsafeMetadata\.(role|admin|isAdmin|plan|tier)" .
```
usually finds something on the first run.
## why the AI does this
none of this is the AI being malicious. it's RLHF reward shaping. the model was trained on "did the user get a working app" as the success signal. a working app with a security hole gets a thumbs up. an app that returns "i'd recommend you consult a security engineer for the auth layer" gets a thumbs down and a "just write it." over millions of training rounds the model learns: write the working insecure path, never push back.
the same dynamic explains why the AI will happily write CORS origin: * but won't tell you it's dangerous. mentioning the danger doesn't help the immediate task. shipping the wide-open config does.
the corollary: if you don't put security requirements in the prompt, you don't get them. and most founders don't, because most founders don't know what to ask for.
## the tenzai numbers in context
24.7% flaw rate isn't an outlier. stanford's late-2024 codegen study put it at 33%. the github copilot security audit put it at 40% of generated snippets having at least one weakness. nyu's "asleep at the keyboard" study from 2022 had it at 40%. the trend is flat or worse over four years of model improvements.
the models are getting faster, more capable, better at multi-file refactors. they are not getting more secure. the rate at which insecure code ships is going up because the rate at which code ships is going up.
## what to do BEFORE you ship: the 5-step pre-flight
before you push the vibed app to prod, run these five things. each takes under two minutes:
- secret scan. run `trufflehog filesystem . --only-verified` or `gitleaks detect --source . -v` from the repo root. if anything verifies, rotate it before doing anything else.
- clerk unsafe_metadata grep. `egrep -rn "unsafeMetadata\.(role|admin|isAdmin|plan|tier)" .` if any result is on a server route, you have a privilege escalation.
- CORS reflection test. `curl -i -H 'origin: https://evil.com' https://your-api.com/api/me` and if `access-control-allow-origin` comes back as `https://evil.com` or `*` and credentials are allowed, fix it before launch.
- semgrep with the OWASP API ruleset. `semgrep --config p/owasp-top-ten --config p/javascript --config p/typescript .` then triage the high-severity findings. ignore the noise.
- lock down preview deployments. turn on vercel deployment protection (settings > deployment protection > vercel authentication) or move staging to a private subdomain with basic auth. an unauthenticated `*-git-main-*.vercel.app` URL with prod data is a leak waiting to happen.
## what valtik does in this space
we audit one cursor / claude-code / replit / lovable / v0 app per week. $1,500 flat, 48-hour turnaround, we write the patch PRs ourselves so your team merges instead of reads. one client per week because that's how many we can do without dropping quality, and we'd rather book out two months than ship sloppy reviews. if you've vibed something into production and you'd sleep better with a real pentester looking at it before your seed round closes or your first enterprise customer asks for a SOC 2, that's what we do. hit the contact form on valtikstudios.com.
stay paranoid.
Want us to check your AI-generated web app setup?
Our scanner detects this exact misconfiguration, plus dozens more across 38 platforms. Free website check available, no commitment required.
