Deepfake · 2026-03-11 · 10 min read


A single deepfake incident has already cost one company more than $25 million. Here is what actually works for detection in 2026, what fails, and the step-by-step verification playbook we use on executive protection engagements.

Tre Trebucchi · Founder, Valtik Studios. Penetration tester based in Connecticut, serving the US mid-market.

# Deepfake Detection in 2026: How to Spot AI-Generated Faces, Voices, and Video

A finance employee at Arup's Hong Kong office joined a video call with his CFO. Everyone on the call looked correct. Everyone sounded correct. He wired $25 million. The CFO was never on the call. Every face on that screen was synthesized in real time from public LinkedIn photos and earnings call audio.

That incident was 2024. The tools that pulled it off were custom at the time. In 2026 the same capability runs on a mid-range gaming laptop with an off-the-shelf model.

Deepfake detection is no longer a party trick. It's part of wire transfer authorization, executive communication, and identity verification. This guide covers what works, what has already been defeated, and the verification playbook we run during executive protection engagements at Valtik Studios.

## The detection arms race in one chart

We see this pattern show up on almost every engagement.

Nearly every detector approach published in academic papers between 2018 and 2022 has since been defeated by a specific countermeasure from the generation side:

| Detector approach | Year published | Year defeated | How |
|---|---|---|---|
| Eye blink pattern analysis | 2018 | 2019 | Adversarial training added natural blink cadence |
| Facial warping artifacts | 2019 | 2020 | Higher-resolution output + better upsampling |
| Physiological signals (pulse in skin) | 2020 | 2022 | Generators started modeling subsurface scattering |
| GAN fingerprint detection | 2020 | 2023 | Diffusion models replaced GANs; fingerprints changed |
| Frequency-domain artifacts | 2021 | 2024 | Frequency-matched training loss |
| Lip sync analysis | 2022 | 2024 | Audio-visual joint models trained end-to-end |
| Deep learning ensemble (FaceForensics++) | 2022 | 2025 | Adversarial examples target ensemble weights |

The uncomfortable truth: static detectors lose. The defense has to move to authenticated provenance (C2PA, cryptographic signing at capture time) or to behavioral challenge-response during live calls.

## Category 1: Image deepfakes

### What works in 2026

Reverse image search with context. Yandex reverse image search (still better than Google for faces), TinEye, and Google Lens run in parallel. If an image shows a person in a specific context (CEO at a conference) and no earlier version of the image exists anywhere on the internet, that's a signal. Not proof, but signal.
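
Running all three engines in one step is easier from a script. A minimal sketch, assuming the suspect image is already hosted at a public URL; the query-string formats below are the ones the engines accept at the time of writing and may change:

```python
import webbrowser
from urllib.parse import quote

def reverse_search(image_url: str) -> None:
    """Open Yandex, TinEye, and Google Lens reverse searches in browser tabs."""
    encoded = quote(image_url, safe="")
    for url in (
        # Yandex: still the strongest engine for faces
        f"https://yandex.com/images/search?rpt=imageview&url={encoded}",
        # TinEye: exact and near-exact matches, good for "earliest copy" checks
        f"https://tineye.com/search?url={encoded}",
        # Google Lens: broadest index
        f"https://lens.google.com/uploadbyurl?url={encoded}",
    ):
        webbrowser.open_new_tab(url)

reverse_search("https://example.com/suspect-photo.jpg")
```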

Manual artifact inspection on hands, teeth, ears, jewelry. Modern diffusion models (SDXL, Flux, Stable Diffusion 3.5) still fail on:

  • Hand anatomy (extra fingers, merged fingers, bent wrong direction)
  • Teeth count and spacing (often uneven count in upper vs lower jaw)
  • Earring symmetry (asymmetric earrings on matched pair, or earrings that merge into hair)
  • Glasses frames (asymmetric rims, bent around the face incorrectly)
  • Background geometry (tile grout lines bend, books have gibberish spines, windows warp)

Zoom to 200% on the corners of the image. Check reflections (eyes, glasses, water) against the scene direction.
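
If you do this often, script the crops. A minimal sketch with Pillow that pulls the four corners plus the center and upscales each 2x for side-by-side review; the 400 px crop size is an arbitrary choice:

```python
from PIL import Image  # pip install pillow

def corner_crops(path: str, box: int = 400) -> None:
    """Save 2x-upscaled crops of the corners and center for artifact inspection."""
    img = Image.open(path)
    w, h = img.size  # assumes the image is at least `box` px in each dimension
    regions = {
        "top_left": (0, 0, box, box),
        "top_right": (w - box, 0, w, box),
        "bottom_left": (0, h - box, box, h),
        "bottom_right": (w - box, h - box, w, h),
        "center": ((w - box) // 2, (h - box) // 2, (w + box) // 2, (h + box) // 2),
    }
    for name, rect in regions.items():
        # LANCZOS resampling keeps edges sharp enough to judge warped geometry
        img.crop(rect).resize((box * 2, box * 2), Image.LANCZOS).save(f"inspect_{name}.png")

corner_crops("suspect-photo.jpg")
```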

AI or Not (aiornot.com) and Hive Moderation. Commercial detectors with roughly 85-92% accuracy on out-of-distribution deepfakes. Both are defeated by targeted adversarial examples but work reasonably well against the casual fraud case.

Sightengine and DeepMedia. Used by news organizations and insurance fraud teams. Same caveat on adversarial resistance.
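
These vendors expose HTTP APIs. The endpoint URLs and response fields below are placeholders, not any vendor's real API (check their docs); the point is the pattern: query at least two independent detectors and treat disagreement as a trigger for manual review, as in Tier 3 of the playbook later in this post.

```python
import requests  # pip install requests

# Placeholder endpoints and response fields; substitute each vendor's real API
DETECTORS = {
    "detector_a": "https://api.detector-a.example/v1/image",
    "detector_b": "https://api.detector-b.example/v1/classify",
}

def score_image(path: str, api_keys: dict[str, str]) -> dict[str, float]:
    """Return each detector's synthetic-image probability for one file."""
    scores = {}
    for name, endpoint in DETECTORS.items():
        with open(path, "rb") as f:
            resp = requests.post(
                endpoint,
                headers={"Authorization": f"Bearer {api_keys[name]}"},
                files={"image": f},
                timeout=30,
            )
        resp.raise_for_status()
        scores[name] = resp.json()["ai_probability"]  # assumed response shape
    return scores

scores = score_image("suspect-photo.jpg", {"detector_a": "KEY_A", "detector_b": "KEY_B"})
# Disagreement between detectors is a reason for human review, not a verdict
print(scores)
```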

### What doesn't work

  • "Asking it if it's AI-generated". Chatbots guess. The model that made the image can be prompted to say it's real.
  • JPEG compression artifact analysis. Deepfakes are often saved as re-compressed JPEGs that destroy the frequency-domain tells.
  • Error Level Analysis (ELA) from FotoForensics. Made for traditional photo manipulation, mostly useless against end-to-end generated images.

### C2PA: the provenance play

C2PA (from the Coalition for Content Provenance and Authenticity) is a cryptographic signing standard backed by Adobe, Microsoft, Sony, Canon, Nikon, the BBC, and most of the major AI labs. Supported cameras sign at capture, and the embedded manifest records every subsequent edit. When a C2PA-signed image arrives on a news wire or on social media, the signature either verifies against the capture device and the edit chain, or it doesn't.

Practical state in 2026:

  • Most current Nikon, Sony, Leica, and Canon cameras support C2PA signing at capture
  • Adobe Photoshop, Premiere, and Lightroom preserve and update C2PA manifests
  • Facebook, Instagram, TikTok, YouTube, and LinkedIn display "AI-generated" labels when the C2PA manifest says so
  • Adobe Content Authenticity extension for Chrome shows the full manifest for any signed image

C2PA doesn't prove an image is real; it proves the chain of custody. An unsigned image isn't automatically fake, but a signed image with a valid chain of custody from a known camera is strong evidence of authenticity.
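
For scripted checks, the Content Authenticity Initiative ships an open-source CLI, c2patool, that prints a signed asset's manifest store as JSON. A minimal sketch that shells out to it, assuming c2patool is installed and on your PATH (output details vary by version):

```python
import json
import subprocess

def read_c2pa_manifest(path: str) -> dict | None:
    """Return the C2PA manifest store for a file, or None if unsigned/unreadable."""
    result = subprocess.run(["c2patool", path], capture_output=True, text=True)
    if result.returncode != 0:
        return None  # no manifest, or the tool rejected the file
    try:
        return json.loads(result.stdout)
    except json.JSONDecodeError:
        return None

manifest = read_c2pa_manifest("incoming-photo.jpg")
print("signed: inspect the edit chain" if manifest else "unsigned: fall back to other checks")
```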

## Category 2: Voice deepfakes

Voice cloning went from needing 30 seconds of audio in 2022 to 3 seconds in 2025. ElevenLabs, Play.ht, Resemble, and open-source XTTS-v2 and VALL-E reproductions can clone any voice from a podcast clip, a voicemail, or a Zoom recording.

### The 2026 voice deepfake attack pattern

  1. Attacker scrapes target executive audio (earnings calls, conference talks, podcast appearances)
  2. Trains a voice clone: 30 minutes on an RTX 4090, effectively zero on a paid ElevenLabs tier
  3. Calls the CFO or an EA from a spoofed caller ID
  4. Adds urgency: "I am on a plane, I need you to wire [amount] to [account] right now, the acquisition is about to fall through"
  5. The real voice, real speech patterns, real filler words. The victim complies.

### What works in 2026

Callback verification on a known number. The only consistently effective defense. Caller claims to be CEO? Hang up. Call the CEO's mobile number from your contact list. The attacker can't intercept that callback.

Challenge-response with pre-shared context. Ask a question only the real person would know and that's not in their public content. Not "what's your mother's maiden name" (public). Ask about a specific recent private conversation, a specific lunch, a specific inside joke. Voice clones don't have memory of private context.

Liveness detection via cognitive challenge. Ask the caller to count backwards from 73 by 7s, or to spell a specific rare word, or to repeat a specific random phrase. Not because AI can't do it. It can. But because real-time voice cloning with an LLM backing it has latency. The pause to process the challenge often gives the attacker away.

AI voice detectors. Pindrop, Reality Defender, Resemble Detect. Used by contact centers and banks. Roughly 80-95% accuracy depending on model and codec. Degraded phone audio (G.711, Opus at low bitrate) hurts detection accuracy significantly.

### What doesn't work

  • "I can hear that it sounds fake". You can't. The 2026 cloners pass blind listening tests against the real voice.
  • Waiting for the caller to stumble on slang or pronunciation. The models pick up accent and vocabulary from training audio.
  • Asking "are you really [name]". Clones say yes.

## Category 3: Video deepfakes

The Arup scenario: a real-time face swap on a live video call.

### The 2026 stack

  • DeepFaceLive (open source). Swap any face onto the operator's camera feed in real time, runs on consumer GPUs
  • LivePortrait and Runway Act-One. Drive a still image with real operator facial motion
  • HeyGen, Synthesia, D-ID. Prerecorded talking-head generation
  • Custom avatars. Fine-tuned from 5-10 minutes of target video

Real-time face swap is the one that shows up on Zoom calls. Prerecorded talking heads show up in phishing, fake news, and impersonation videos on social media.

### What works in 2026

Movement-based liveness challenges. Ask the person to:

  • Turn their head 90 degrees to the side (profile view). Face-swap models trained on frontal views break at extreme profile angles
  • Put their hand in front of their face. Occlusion recovery is still weak. The face often "leaks" through the hand or glitches around the edge
  • Hold an object up next to their face (a coffee mug, a random book from their shelf). Same occlusion issue
  • Make a weird facial expression (stick out tongue, puff cheeks, squint hard). Expressions outside training data often glitch
  • Speak while moving their face fast. Motion blur combined with lip sync stresses the model

Ask for a camera angle change. "Can you pan to show me the window" or "can you show me the book on your desk." The attacker running DeepFaceLive is locked to their own face in their own environment. The room behind them won't match their claimed location.

Frame-level analysis (for recorded video). Tools like the following (a frame-extraction sketch follows the list):

  • Intel FakeCatcher. Uses PPG (photoplethysmography) signals from skin pixels to detect blood flow, ~96% claimed accuracy on their benchmark
  • Microsoft Video Authenticator. Now retired, but the methodology moved into Azure Content Safety API
  • Sensity AI (formerly Deeptrace). Enterprise platform, used by news organizations
  • Deepware Scanner. Free consumer tool, decent against Sora and Veo outputs
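
Before feeding a recording to these tools (or to your own eyes), it helps to explode it into stills, since single frames are where occlusion glitches and edge artifacts are easiest to spot. A minimal sketch that shells out to ffmpeg, assumed to be installed:

```python
import pathlib
import subprocess

def extract_frames(video: str, out_dir: str = "frames", fps: int = 2) -> None:
    """Dump sampled frames from a suspect video for detector input or manual review."""
    pathlib.Path(out_dir).mkdir(exist_ok=True)
    subprocess.run(
        [
            "ffmpeg", "-i", video,
            "-vf", f"fps={fps}",           # keep this many frames per second
            f"{out_dir}/frame_%05d.png",   # lossless PNG so recompression adds no artifacts
        ],
        check=True,
    )

extract_frames("suspect-call-recording.mp4")
```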

C2PA for video. Same signing chain as images, supported by Sony FX and professional video cameras, preserved through Adobe Premiere exports.

### What doesn't work

  • Watching for blinking patterns. Fixed years ago
  • Watching for head-body mismatch. Fixed in 2024
  • "It looks off somehow". Trust this feeling as a trigger for challenge-response, but don't trust it as proof

## The Valtik verification playbook for executive communications

We run this exact playbook during executive protection engagements, especially for C-suite wire transfer authorization and acquisition communications.

### Tier 1: anything asking for money movement

  1. Callback on a known-good number from contact list (NOT the number that called in, NOT a number read out by the caller)
  2. Out-of-band confirmation via a second channel (Signal message, Slack DM from the real account, physical presence)
  3. Code word verification. A pre-shared rotating code word between executive and finance team, changed quarterly
  4. If any of the above fails, transaction is denied and security team investigates before retry

### Tier 2: live video calls with unusual requests

  1. Challenge the caller with a head turn to profile
  2. Ask them to hold up a random nearby object
  3. Ask about a private shared context (recent meeting, inside joke). Not anything that appears in public calendar or email
  4. If anything feels off, propose switching to a known scheduled meeting time and terminate

### Tier 3: suspicious recorded content (media, social, evidence)

  1. Reverse image search the distinctive frames
  2. Run through 2+ commercial detectors (Sensity, Hive, Reality Defender)
  3. Check C2PA manifest with Adobe CAI extension
  4. If the content matters (legal, news, investigation), send to a forensic lab (Amped Software, Medex Forensics, Truepic)

### Tier 4: high-risk identity verification (new hire onboarding, KYC, account recovery)

  1. Live video challenge-response (not a selfie, not a static liveness check)
  2. Document verification with an NFC chip read when possible, not photo capture
  3. Behavioral biometrics during onboarding (typing cadence, mouse movement baselined)
  4. Delayed funds release on new accounts (standard 24-72 hours for high-value)

## Organizational defenses that move the needle

Mandatory callback policy for wire transfers above a threshold. The single control that would have stopped the Arup scam. Doesn't matter if the voice is real, doesn't matter if the video is real. Callback to a known number is the authentication.

Pre-shared code word programs. Simple, low-tech, hard to defeat. Rotate quarterly. Everyone in finance and exec team has it. Any unusual request requires the code word. Any call without the code word triggers a challenge.
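
One logistical wrinkle is distributing the new word each quarter without putting it in email. A sketch of one approach, deriving the quarter's word pair deterministically from a pre-shared secret so nothing has to be transmitted at rotation time; the word list and secret handling here are illustrative, not a vetted design:

```python
import hashlib
import hmac
from datetime import date

# Illustrative word list; a real deployment would use a much larger one
WORDS = ["granite", "kestrel", "lantern", "juniper", "cobalt", "meridian",
         "saffron", "quarry", "vesper", "timber", "halcyon", "zephyr"]

def quarterly_code_word(shared_secret: bytes, today: date | None = None) -> str:
    """Derive this quarter's code word pair from a pre-shared secret."""
    today = today or date.today()
    quarter = f"{today.year}-Q{(today.month - 1) // 3 + 1}"
    digest = hmac.new(shared_secret, quarter.encode(), hashlib.sha256).digest()
    # Two words from independent digest bytes; rotates automatically each quarter
    return f"{WORDS[digest[0] % len(WORDS)]}-{WORDS[digest[1] % len(WORDS)]}"

print(quarterly_code_word(b"exchanged-in-person-not-by-email"))
```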

Executive digital footprint audits. Reduce the attack surface. How much public audio of the CEO is on YouTube, podcasts, earnings calls? Can you pull any of it down? Can the CFO stop doing interviews? In most cases, no. But at least you know what the attacker is working with.

Employee awareness training with live demos. Don't send a phishing simulation email. Run a live voice clone call against a sample of employees during an authorized tabletop exercise. The reaction is night and day compared to slide-based training.

Detection platform integration. Pindrop for contact centers, Reality Defender or Sensity for content review workflows, Persona or Incode for KYC.

## What we recommend for small businesses

Most of the above is aimed at the enterprise. For a small business or individual:

  1. Never authorize a payment based on a voice or video call alone. Always callback
  2. Set up a family code word. Use it when a "relative in trouble" calls asking for money
  3. Lock down LinkedIn and exec bios. Less public audio/video = harder clone
  4. Use Signal with safety numbers verified for internal exec communication
  5. When in doubt, hang up and call back. Cost of the delay is nothing compared to the cost of a successful scam

## Tools referenced

  • AI or Not: https://www.aiornot.com/
  • Hive Moderation: https://hivemoderation.com/ai-generated-content-detection
  • Sightengine: https://sightengine.com/
  • DeepMedia: https://www.deepmedia.ai/
  • Intel FakeCatcher: https://www.intel.com/content/www/us/en/newsroom/news/intel-introduces-real-time-deepfake-detector.html
  • Sensity AI: https://sensity.ai/
  • Deepware Scanner: https://deepware.ai/
  • Pindrop: https://www.pindrop.com/
  • Reality Defender: https://realitydefender.com/
  • Adobe Content Authenticity: https://contentauthenticity.adobe.com/
  • C2PA: https://c2pa.org/
  • Truepic: https://truepic.com/

## Hire Valtik Studios

We run deepfake readiness assessments for executive teams: red team engagements that test your wire transfer controls, your callback policy, and your staff response to voice and video impersonation. If you want to know whether your company would have approved the Arup wire, we'll show you.

Reach us at valtikstudios.com.

Tags: deepfake, AI security, voice cloning, video forensics, social engineering
