Every Person on the Video Call Was Fake: The $25.6 Million Deepfake Heist
In 2024, a Hong Kong finance worker wired $25.6 million after a deepfake video call with his company's CFO. Social engineering has entered a new era. Here is what incident response and security awareness training look like in the age of deepfakes.
The Arup Hong Kong heist
In late January 2024, a finance worker at Arup's Hong Kong office received a message that appeared to come from the company's UK-based Chief Financial Officer. The message requested a confidential financial transaction and invited the worker to join a video conference call to discuss the details [1].
The worker joined the call. On screen were the CFO and several other colleagues, all of whom the worker recognized. They discussed the transaction, the CFO confirmed the instructions, and the worker proceeded to execute 15 separate wire transfers totaling HK$200 million, approximately $25.6 million USD [2].
Every person on the call was a deepfake.
The attackers had obtained publicly available video and audio of the real executives from conference presentations, earnings calls, and social media. They used this material to generate real-time deepfake video and cloned voices that were convincing enough to fool a trained finance professional during a live, interactive video call.
The fraud was discovered a week later when the worker followed up with the real CFO's office about the transactions. By then, the money had been moved through multiple accounts and was largely unrecoverable.
How the deepfakes were generated
Creating a convincing real-time deepfake for a video call requires three components [3]:
Face synthesis. The attacker needs 3 to 10 minutes of video of the target's face from multiple angles. Conference recordings, YouTube talks, and LinkedIn videos provide ample material. Modern face-swap tools (SimSwap, FaceFusion, and proprietary equivalents) can generate photorealistic face replacements in real time on consumer-grade GPUs. An NVIDIA RTX 4090 can render deepfake video at 30 frames per second with sub-100ms latency.
Voice cloning. Current voice cloning technology requires as little as 3 to 10 seconds of clean audio to generate a usable voice model [4]. With 30 to 60 seconds of audio, the clone is nearly indistinguishable from the real person, capturing accent, cadence, pitch, and speech patterns. Services like ElevenLabs, Resemble.AI, and open-source tools like OpenVoice can produce real-time voice output with latency under 200ms.
Behavior modeling. The most sophisticated attacks incorporate behavioral mimicry. The deepfake puppeteer studies how the target speaks (filler words, pauses, hand gestures) and replicates these patterns. On a video call with typical compression artifacts and mediocre lighting, the result is nearly undetectable.
Deepfake-as-a-Service
The tools needed for this attack are now available as commercial services on both the open web and dark web marketplaces [5]:
- Consumer tools ($0 to $50/month): Apps like FaceFusion and DeepFaceLab are free and open source. Commercial tools like Synthesia and HeyGen are designed for legitimate video production but can be repurposed
- Underground services ($1,000 to $10,000): Full-service deepfake creation, including real-time puppeteering for video calls, document forgery with deepfake ID photos, and voice clone development
- Custom operations ($10,000+): Targeted attacks against specific individuals with rehearsed scenarios, multiple deepfake participants, and professional social engineering scripts
The Arup attack likely fell in the $5,000 to $15,000 range for the deepfake production. Even at the top of that range, the return on investment was roughly 1,700x ($25.6 million against $15,000).
Voice cloning: 3 seconds is enough
The speed at which voice cloning has advanced is staggering. In 2020, creating a convincing voice clone required hours of clean audio recordings. By 2023, that had dropped to 5 minutes. By 2025, state-of-the-art models can produce a functional clone from 3 to 10 seconds of audio [4].
Sources of target audio are everywhere:
- Voicemail greetings (call the target's phone, let it go to voicemail)
- Conference recordings (YouTube, Vimeo, corporate event archives)
- Podcast appearances
- Earnings calls (publicly available for executives of public companies)
- Social media videos (Instagram, TikTok, LinkedIn)
- Customer service recordings ("this call may be recorded for quality purposes" provides the attacker with material too)
The practical implication: if your voice has ever been recorded and is accessible online or over the phone, someone can clone it well enough to fool your colleagues, your family, and your bank.
Biometric fraud is exploding
The surge in deepfake capability has driven a corresponding surge in biometric fraud. According to Gartner, biometric fraud attacks involving deepfakes increased by 340% between 2023 and 2025 [6].
The most targeted biometric systems:
- Video-based identity verification (the "take a selfie to verify your identity" flow used by banks, crypto exchanges, and government services). Deepfake videos bypass these checks at an alarming rate
- Voice authentication (banking by phone, voice-activated smart assistants). Cloned voices pass voice biometric checks with increasing reliability
- Facial recognition access control (building entry, device unlock). Printed or screen-displayed deepfake images can defeat some systems, while real-time deepfake video defeats most
Gartner predicts that by 2026, 30% of enterprises will no longer trust facial or voice biometrics as standalone identity verification methods due to deepfake capabilities [6]. This is not a distant future prediction. It is a reflection of attacks happening right now.
The TAKE IT DOWN Act
In response to the surge in deepfake abuse (including non-consensual intimate imagery), the US Congress passed the TAKE IT DOWN Act, signed into law in 2025 [7]. The law:
- Criminalizes the knowing publication and distribution of non-consensual deepfake intimate imagery
- Requires social media platforms and hosting providers to remove reported deepfake content within 48 hours
- Establishes penalties of up to 2 years in prison (up to 3 years when the victim is a minor), plus fines
- Creates a reporting mechanism through the FTC for victims
The law is a step forward for protecting individuals from deepfake abuse, but it does not address the corporate fraud vector demonstrated in the Arup attack. Business email compromise (BEC) and video-call deepfakes fall under existing wire fraud statutes, which carry penalties of up to 20 years but depend on federal investigation (typically by the FBI) and prosecution by the Department of Justice.
How to verify identity on video calls
Given that real-time deepfakes are now practical, organizations need new verification protocols for high-value decisions [8]:
Callback verification. Before executing any financial transaction discussed on a video call, hang up and call the requester back on a known phone number (not one provided in the meeting invite or chat). This simple step would have prevented the Arup attack entirely.
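The core rule is mechanical enough to encode. Below is a minimal Python sketch, where KNOWN_NUMBERS is a hypothetical stand-in for a directory you maintain out of band (HR records, a vendor-management system); the names are illustrative, not a real API.

```python
# Callback verification: the number you dial back must come from your
# own records, never from the request itself.

KNOWN_NUMBERS = {
    "cfo@example.com": "+44 20 7946 0000",  # maintained out of band
}

def callback_number(requester_email: str, number_in_request: str) -> str:
    """Return the number to dial back, always from our own records."""
    # The number offered in the invite or chat is attacker-controlled,
    # so it is deliberately never used.
    trusted = KNOWN_NUMBERS.get(requester_email)
    if trusted is None:
        raise LookupError(f"no trusted number on file for {requester_email}")
    return trusted
```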
Shared secrets. Establish code words or phrases with key personnel that must be spoken during any call authorizing financial transactions. Change these periodically. A deepfake puppeteer cannot reproduce a secret they do not know.
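If the code phrase is ever checked by software in the approval workflow (a bot, a ticketing system), store only a salted hash and compare it in constant time. A minimal sketch; the salt, iteration count, and phrase are placeholders:

```python
import hashlib
import hmac

SALT = b"rotate-with-the-phrase"  # placeholder; use per-secret salts

def hash_phrase(phrase: str) -> bytes:
    # PBKDF2 makes brute-forcing a leaked hash expensive.
    return hashlib.pbkdf2_hmac("sha256", phrase.encode(), SALT, 600_000)

STORED = hash_phrase("correct horse battery staple")  # set at enrollment

def phrase_matches(candidate: str) -> bool:
    # compare_digest runs in constant time, so response timing does not
    # leak how many leading bytes of the hash matched.
    return hmac.compare_digest(hash_phrase(candidate), STORED)
```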
Multi-channel confirmation. Require authorization through a separate channel. If the request comes via video call, require confirmation via a signed email, an authenticated Slack message, or an in-person signature.
Challenge questions. Ask something only the real person would know. Not publicly available information (birthday, alma mater) but operational details ("What was the final number in the Q3 forecast we reviewed yesterday?").
Liveness testing. Ask the person on the call to do something unexpected. Turn sideways and show their profile. Hold up a specific number of fingers. Pick up a nearby object. Current real-time deepfakes struggle with abrupt pose changes, hand interactions, and object occlusion.
Transaction limits and delays. Implement mandatory cooling-off periods for large transactions. No transfer over a certain threshold should execute on the same day it is requested. This buys time for verification and blunts the pressure tactics that social engineering relies on.
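Two of these controls reduce naturally to code. The sketch below is a single policy gate with hypothetical limits: nothing executes without second-channel confirmation, and anything over the threshold waits out a cooling-off period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

SAME_DAY_LIMIT = 50_000          # USD; illustrative threshold
COOLING_OFF = timedelta(days=1)  # illustrative minimum delay

@dataclass
class TransferRequest:
    amount_usd: float
    requested_at: datetime
    confirmed_second_channel: bool  # e.g. signed email or in-person sign-off

def may_execute(req: TransferRequest, now: datetime) -> bool:
    """Apply the cooling-off policy; a video call alone never suffices."""
    if not req.confirmed_second_channel:
        return False
    if req.amount_usd <= SAME_DAY_LIMIT:
        return True
    # Large transfers wait, even if "the CFO" confirmed them on the call.
    return now - req.requested_at >= COOLING_OFF
```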
Detection tools and their limits
Several companies and research groups offer deepfake detection tools [9]:
- Microsoft Video Authenticator analyzes videos for subtle artifacts at the blending boundaries where the deepfake face meets the real background
- Intel FakeCatcher detects deepfakes by analyzing blood flow patterns in facial video (real faces show subtle color changes as blood pulses; deepfakes do not)
- Sensity AI offers a commercial deepfake detection API
- Deepware Scanner is a free tool for checking uploaded videos
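These products expose different interfaces, so the sketch below assumes a hypothetical REST endpoint that accepts an uploaded video and returns a confidence score; the URL, field names, and threshold are invented for illustration.

```python
import requests

DETECT_URL = "https://detector.example.com/v1/analyze"  # hypothetical
API_KEY = "..."  # load from a secrets manager, never from source

def deepfake_score(video_path: str) -> float:
    """Upload a recording and return the service's deepfake confidence."""
    with open(video_path, "rb") as f:
        resp = requests.post(
            DETECT_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"video": f},
            timeout=120,
        )
    resp.raise_for_status()
    return resp.json()["deepfake_probability"]  # illustrative field name

if deepfake_score("suspicious_call.mp4") > 0.8:
    print("High deepfake probability: escalate to the security team")
```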
However, detection has fundamental limitations:
- Adversarial arms race. Every detection technique can be countered. When detectors learn to spot blending artifacts, generators learn to eliminate them
- Real-time detection is hard. Analyzing a recorded video for deepfake artifacts is feasible. Detecting a deepfake in real time during a live video call, with compressed video and variable lighting, is significantly harder
- False positives. Poor lighting, low bandwidth, and bad webcams create artifacts that resemble deepfake artifacts. Detection tools tuned for high sensitivity will flag legitimate video calls
- Compression destroys evidence. Video calls compress video heavily (Zoom, Teams, and Google Meet all use lossy compression). This compression destroys many of the subtle artifacts that detection tools rely on
The uncomfortable truth is that detection will always lag behind generation. Prevention through verification protocols is more reliable than attempting to detect deepfakes in real time.
The trajectory
The cost of producing a convincing deepfake drops by roughly half every 12 months. The quality improves on a similar curve. Within two to three years, real-time deepfakes will be indistinguishable from real video even under expert analysis [10].
This means:
- Video evidence will become unreliable in legal proceedings without cryptographic provenance (a sketch of the idea follows this list)
- Video-based identity verification will require fundamental redesign
- Remote work introduces new trust challenges when you cannot physically verify who you are talking to
- Financial controls must evolve to assume that any remote communication could be fabricated
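Cryptographic provenance means binding a signature to the media at capture time, so any later modification is detectable. Standards like C2PA formalize this; the sketch below is a much-simplified illustration that signs a file's SHA-256 hash with an Ed25519 key via the Python cryptography package (the filename is a placeholder).

```python
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def file_digest(path: str) -> bytes:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.digest()

# At capture time, the recording device signs the video's hash.
signing_key = Ed25519PrivateKey.generate()
signature = signing_key.sign(file_digest("board_call.mp4"))

# Later, anyone holding the public key can verify the file is unmodified.
try:
    signing_key.public_key().verify(signature, file_digest("board_call.mp4"))
    print("Provenance intact: file matches the signed hash")
except InvalidSignature:
    print("File was altered after signing")
```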
The Arup attack was $25.6 million. It will not be the largest. The technology that made it possible is cheaper and better every month.
Sources
- CNN, "Finance Worker Pays Out $25 Million After Video Call With Deepfake CFO," February 2024
- South China Morning Post, "Hong Kong Police Report $25.6M Deepfake Video Call Fraud at Multinational Firm," February 2024
- MIT Technology Review, "The Technology Behind Real-Time Deepfake Video Calls," 2024
- ElevenLabs, "Voice Cloning: Technical Documentation," 2025; OpenVoice, "Instant Voice Cloning," arXiv:2312.01479
- Recorded Future, "Deepfake-as-a-Service: The Commoditization of Synthetic Media Fraud," 2025
- Gartner, "Predicts 2025: AI Will Disrupt Identity Verification and Biometric Security," 2024
- US Congress, "TAKE IT DOWN Act," Public Law, 2025
- FBI, "Public Service Announcement: Deepfake Audio and Video Used in Business Email Compromise," IC3, 2024
- IEEE, "A Survey of Deepfake Detection Methods," Transactions on Information Forensics and Security, 2024
- RAND Corporation, "The Future of Deepfakes: Implications for National Security," 2025
