Valtik Studios
X (Twitter) automation · Updated 2026-05-10 · 17 min read


Engineering writeup of the Valtik flash-scanner pipeline we run in production. 26 RSS feeds polled every 10 minutes, 7-template rule-based drafter (not LLM, for cost + latency + hallucination reasons), 100-point validator that catches em-dashes and title-case dumps, auto-approve gate (validator score >=95 + CVE present + known vendor + 14-day dedup) that pushes 60-70% of drafts straight to live broadcast, real-time poster via xdk SDK, scheduled-drip fallback at 5am/9am/12pm PT. Three war stories included: the CVE dedup bug that posted PAN-OS twice (fix: dedup on the canonical id not the source URL), the OAuth1 token-scope footgun (fix: regenerate access token after flipping app perms, because tokens are baked at issue time and the X dev portal UI implies otherwise), and the 15-hour stuck-cron incident (fix: socket.setdefaulttimeout(10) at the top of every cron'd script). Plus the reply path that targets the algorithm's 75x weight for author-replies-to-replies, the highest single positive signal in the engagement graph.

Phillip (Tre) Bucchi · Founder, Valtik Studios · Penetration tester. Based in Connecticut, serving US mid-market.

# building a real-time CVE detection-to-broadcast pipeline that hits X's 30-minute algorithmic velocity window

a couple of months ago we started measuring our own CVE tweets and realized something annoying. when a CISA "known exploited vulnerabilities" entry dropped at 14:02 UTC and we posted a thread about it at 14:47, the post died at ~3k impressions. when we caught the same class of story at 14:08 and posted at 14:11, the same content shape did 80k+ impressions and pulled 200 followers. nothing else changed. same vendor, same severity, same hashtag policy, same writing voice. the only delta was 36 minutes.

that's the whole problem this post is about. CVE news has a brutally short algorithmic shelf life on X, and a human in the loop cannot beat it consistently. so we built the pipeline. this is how it works, what we got wrong, and the gotchas worth knowing if you build the same thing.

## the 30-minute shelf life of a CVE tweet

every credible teardown of X's ranking algorithm (the source that leaked in 2023 plus the changes documented through 2026) lands on the same conclusion. engagement velocity in the first 30 to 60 minutes is the single most important non-content feature. the heavy-ranker score that decides whether a post escapes your follower graph into the broader for-you index is dominated by the rate of likes, replies, bookmarks, and reposts per minute during that window.

the half-life of a typical infosec post is roughly 18 minutes. after 6 hours the post is functionally dead from a discovery standpoint. after 24 hours it might as well not exist. you can confirm this yourself with the analytics export. find any post older than a day and you'll see the impression curve is a knife edge that flattens to zero by hour 4.
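the arithmetic is worth making concrete. a back-of-envelope decay model, assuming the ~18-minute half-life above holds as a clean exponential (a simplification of the real curve, which is spikier early on):

```python
# toy decay model: fraction of the initial impression rate left after
# t minutes, assuming a clean exponential with an 18-minute half-life
HALF_LIFE_MIN = 18

def impressions_remaining(t_minutes: float) -> float:
    """fraction of the initial impression rate left after t minutes."""
    return 0.5 ** (t_minutes / HALF_LIFE_MIN)

# the 36-minute delta from the anecdote above costs 75% of the rate;
# by hour 4 the post is effectively dead
print(impressions_remaining(36))         # 0.25
print(impressions_remaining(240) < 1e-4) # True
```

under this model, posting 36 minutes late means competing with a quarter of the velocity, which matches the impressions gap we measured.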

so when CISA adds a Palo Alto bug to the KEV catalog at 14:00, you have a window. if you can detect it, draft a defensible take, and broadcast inside 30 minutes, you're competing for the slot. if you're at 90 minutes, you're posting into a void. manual posting cannot reliably hit 30 minutes. you have to be at a keyboard, you have to be paying attention to the right feeds, you have to draft something that doesn't read like marketing slop, and you have to validate it before it goes out. that's a 45-minute job done by a focused human, and humans aren't focused at 02:00 PT when half of the Ivanti zero-days seem to drop.

the answer is a pipeline. not a chatbot, not an "AI agent." a pipeline. small components, each with one job, each fast enough to run end-to-end inside the window.

## architecture overview

cron (every 10m)
            |
            v
   +-----------------+
   |  flash scanner  |  valtik_news_flash.py
   |  26 RSS feeds   |  per-feed timeout 10s
   +--------+--------+
            |
            v
   +-----------------+
   | criticality     |  vendor regex
   | filter          |  severity regex
   |                 |  CVSS >= 7.0
   +--------+--------+
            |
            v
   +-----------------+
   |    drafter      |  valtik_tweet_drafter.py
   |  7 templates    |  slot fillers
   |                 |  bug_class detector
   +--------+--------+
            |
            v
   +-----------------+
   |   validator     |  valtik_tweet_validator.py
   |  score 0-100    |  length, em-dash, hashtags
   +--------+--------+
            |
            v
   +-----------------+
   | auto-approve    |  score >= 95
   | gate            |  CVE present
   |                 |  vendor present
   |                 |  not duped in 14d
   +----+--------+---+
        |        |
        v        v
   real-time   scheduled
   poster      drip queue
   (xdk)       (5a/9a/12p PT)

walk one CVE through. CISA's RSS feed publishes a KEV addition for CVE-2026-0042, a PAN-OS authentication bypass. the flash scanner's next 10-minute cron tick picks it up. criticality filter sees "Palo Alto" plus "actively exploited" and matches twice over. the drafter routes to the cve_breaking template, fills vendor=Palo Alto, cve=CVE-2026-0042, severity=critical, affects="PAN-OS firewalls," mitigation="patch 11.1.x to 11.1.5-h2," and a one-line operator take. validator gives it a 97. auto-approve gate confirms: score >= 95, CVE-YYYY-NNNNN present, "Palo Alto" matches the known-vendor list, and CVE-2026-0042 isn't in the queue or the last 14 days of posted history. approved=true. with --auto-post on, the next call into the flash scanner's post phase fires it via xdk. elapsed time from CISA publishing to our tweet hitting the timeline: 7 to 11 minutes depending on cron alignment.
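the criticality filter is small enough to sketch in full. this is an illustrative minimal version, not the production code; the vendor list and regexes in the real valtik_news_flash.py are longer and tuned against months of feed noise:

```python
import re

# illustrative vendor list and severity patterns; the production
# lists are longer and tuned against real feed traffic
KNOWN_VENDORS = ["palo alto", "ivanti", "cisco", "fortinet", "microsoft"]
SEVERITY_RE = re.compile(r"actively exploited|zero[- ]day|critical", re.I)
CVSS_RE = re.compile(r"CVSS[:\s]+(\d+\.\d)")

def is_critical(title: str, summary: str) -> bool:
    text = f"{title} {summary}"
    vendor_hit = any(v in text.lower() for v in KNOWN_VENDORS)
    severity_hit = bool(SEVERITY_RE.search(text))
    m = CVSS_RE.search(text)
    cvss_hit = bool(m) and float(m.group(1)) >= 7.0
    # a known vendor plus either a severity phrase or a high CVSS
    return vendor_hit and (severity_hit or cvss_hit)

print(is_critical("CISA adds Palo Alto PAN-OS bug to KEV",
                  "actively exploited, CVSS: 9.3"))  # True
```

the PAN-OS story above matches twice over: the vendor hit plus both the severity phrase and the CVSS threshold.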

## the drafter. rule-based, not LLM

obvious question. why not just pipe the story through an LLM and let it write the tweet?

three reasons. one, cost. at 3,744 feed fetches per day and roughly 40 to 60 stories per day passing the criticality filter, even at $0.001 per call you're paying for novelty you don't need. our drafter runs at zero marginal cost per draft.

two, latency. an LLM call is hundreds of milliseconds to multiple seconds. inside a cron job that needs to finish before the next tick fires, that matters less than you'd think, but when you multiply by retries and rate limits it stops being free.

three, and this is the real one. hallucinations on technical claims are credibility-fatal. if our drafter writes "unauthenticated RCE" when the advisory actually says "requires authenticated session," every infosec engineer reading the tweet immediately discounts everything else we post. the cost of one bad technical claim is a thousand correct ones.

so we have 7 templates. each is a python string with slot fillers:

  • cve_breaking for fresh CVE disclosures with a CVSS and a vendor
  • patch_tonight for "ship the patch now" urgency posts
  • post_mortem for ransomware / breach writeups after the fact
  • supply_chain for npm / pypi / dependency compromise stories
  • ai_gone_wrong for LLM jailbreaks, prompt injection, data leaks
  • privacy_outrage for surveillance / data-broker / regulator stories
  • generic_news_react as the fallback when nothing else matches

each template has slot fillers driven by regex extraction off the story title and summary. vendor, CVE, CVSS, who's affected ("PAN-OS firewalls," "Exchange 2019," "all Chrome stable"), the immediate mitigation if the advisory gives one, and the operator take. the operator take is a short, opinionated one-liner pulled from a per-template phrase bank ("patch tonight, not tuesday," "this is the third Ivanti bug this quarter," "if you run this in prod, assume breach until verified clean").
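mechanically, a template is nothing fancier than a format string. a minimal sketch with illustrative wording (the production cve_breaking copy and phrase bank differ):

```python
# illustrative template text; the production cve_breaking wording
# and the per-template phrase bank are different
CVE_BREAKING = (
    "{severity} {vendor} bug: {cve} hits {affects}. "
    "{mitigation}. {take}"
)

draft = CVE_BREAKING.format(
    severity="critical",
    vendor="Palo Alto",
    cve="CVE-2026-0042",
    affects="PAN-OS firewalls",
    mitigation="patch 11.1.x to 11.1.5-h2",
    take="patch tonight, not tuesday",
)
print(draft)
```

the point of the shape is that every claim in the output is traceable to a regex hit on the advisory text. nothing is generated.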

the bug_class detector is the credibility piece. it reads the story text for specific phrases. "authentication bypass" + "without credentials" goes to unauth. "post-authentication" or "requires valid session" goes to authenticated. "in default configuration" goes to exposed. "after social engineering" goes to user-interaction-required. it errs toward the less alarming claim. we'd rather underclaim a bug than overclaim it. one wrong "unauth RCE" call against a vendor's name shows up in their PR team's morning brief and that's a relationship we don't want to start.
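a sketch of the detector's shape, with illustrative phrase lists. the de-escalating checks run first, which is how "err toward the less alarming claim" falls out of the control flow:

```python
# sketch of the bug_class detector; phrase lists are illustrative.
# less-alarming classes are checked first so conflicts underclaim.
def bug_class(story_text: str) -> str:
    t = story_text.lower()
    if "after social engineering" in t or "user interaction" in t:
        return "user-interaction-required"
    if "post-authentication" in t or "requires valid session" in t:
        return "authenticated"
    if "in default configuration" in t:
        return "exposed"
    if "authentication bypass" in t and "without credentials" in t:
        return "unauth"
    return "unclassified"

print(bug_class("authentication bypass exploitable without credentials"))
# unauth
```

a story that says both "authentication bypass" and "requires valid session" classifies as authenticated, never unauth, which is exactly the failure direction we want.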

hashtag policy is harsh. max 3 hashtags, all specific. vendor + topic + CVE number. zero generic tags. #cybersecurity and #infosec are documented to trigger the spam classifier, especially in combination with link posts. we strip them out at the drafter stage and the validator catches any that slip through.
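the drafter-stage strip is a few lines. a sketch with an illustrative banned list:

```python
import re

# illustrative banned list; generic tags documented to hurt reach
GENERIC_TAGS = {"#cybersecurity", "#infosec", "#hacking", "#tech"}

def strip_generic_tags(text: str) -> str:
    for tag in re.findall(r"#\w+", text):
        if tag.lower() in GENERIC_TAGS:
            text = text.replace(tag, "").strip()
    # collapse any double spaces left by the removal
    return re.sub(r"\s{2,}", " ", text)

print(strip_generic_tags("patch now #PaloAlto #cybersecurity"))
# patch now #PaloAlto
```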

## validator. the gate that catches em-dashes and the title-case-dump bug

drafters fail in predictable ways. the validator catches them before they go out.

valtik_tweet_validator.py runs every draft through a fixed set of checks and returns a score from 0 to 100 plus an errors list and a warnings list. errors are hard fails. warnings reduce score but don't block. the checks:
  • length <= 280 chars (hard fail)
  • no em-dashes (hard fail. X strips them and you get a tweet that says "Palo Alto firewalls actively exploited patch tonight" which is incoherent)
  • hashtag count 0 to 5 (anything above 5 is a warning, anything above 8 is a hard fail)
  • no unfilled placeholder strings ({vendor}, [CVE], TODO, regex literals that leaked through)
  • no title case dumps (a recurring drafter bug where the headline got copied verbatim and we tweeted "Palo Alto Networks Discloses Critical Authentication Bypass In PAN-OS Firewalls" which reads like a press release)
  • CVE format check if a CVE-shaped string appears (must match CVE-\d{4}-\d{4,7}: a four-digit year plus a 4-to-7-digit sequence number, nothing truncated or malformed)

the score starts at 100 and each warning subtracts 5 to 15 points depending on severity. the auto-approve gate downstream wants 95 or higher, which means at most one minor warning is tolerated. anything below 95 drops to manual review.
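condensed, the validator's shape looks like this. the checks mirror the list above but the point weights are illustrative, not the production table:

```python
import re

# condensed validator sketch: hard fails zero the score, warnings
# subtract. weights are illustrative, not the production table.
def validate(text: str):
    errors, warnings, score = [], [], 100
    if len(text) > 280:
        errors.append("over 280 chars")
    if "\u2014" in text:  # em-dash, always a hard fail
        errors.append("em-dash present")
    n_tags = len(re.findall(r"#\w+", text))
    if n_tags > 8:
        errors.append("hashtag spam")
    elif n_tags > 5:
        warnings.append("too many hashtags")
        score -= 10
    if re.search(r"\{\w+\}|\[CVE\]|TODO", text):
        errors.append("unfilled placeholder")
    if errors:
        score = 0
    return score, errors, warnings

score, errors, warnings = validate(
    "CVE-2026-0042: PAN-OS auth bypass. patch tonight.")
print(score, errors)  # 100 []
```

any hard fail zeroes the score, so a draft with an error can never sneak past the 95 gate downstream regardless of how clean the rest of it is.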

the em-dash check is worth its own sentence. X's renderer historically mangled em-dashes (U+2014) in ways that broke tweets, and even now they're a tell that you used an LLM. the rule is. never let one through. our drafter doesn't emit them, but the validator catches the case where one snuck in via a quoted story title.

## auto-approve gate. the productivity unlock

manual review of every draft was the bottleneck. you can write a perfect drafter and a perfect validator and still post nothing useful at 03:00 PT because no human is looking at the queue.

the auto-approve gate lives inside push_to_tweet_queue. four conditions, all must be true:

if (score >= 95
    and re.search(r'CVE-\d{4}-\d{4,7}', tweet_text)
    and any(v.lower() in story_text.lower() for v in KNOWN_VENDORS)
    and not cve_already_seen(extracted_cve, window_days=14)):
    draft['approved'] = True
    draft['auto_approved_at'] = now_iso()

the score gate is the validator's vote. the CVE-present gate prevents a generic "ransomware hits hospital" story from auto-firing without a concrete technical hook. the vendor-present gate ensures the story has at least one name we recognize and the drafter wasn't routing on noise. the dedup gate is the topic of the next section.

in practice this means about 60 to 70 percent of drafts auto-approve. the failures sort cleanly. low validator scores usually mean the source article was sloppy and the slot fillers came out weird. missing CVE means it's a breach / policy story where we don't have a technical anchor and a human should weigh in. missing vendor means the drafter routed on a regex hit that wasn't really about a vendor (the word "apache" in a non-Apache-software context, for example). every failure mode is recoverable, and the manual queue acts as a training signal. if the same failure shape repeats, we tune the drafter or the validator.

## the CVE dedup bug that posted PAN-OS twice

we shipped without dedup. it took about four days to bite us.

CISA published a KEV addition for a PAN-OS CVE. BleepingComputer wrote about it 8 minutes later. TheHackerNews wrote about it 11 minutes after that with a slightly different headline. our flash scanner ran at the 10-minute mark, drafted from CISA. at the 20-minute mark it drafted again from TheHackerNews. both passed criticality. both passed the validator. both auto-approved (different source URLs meant different fingerprints, which is what our initial dedup keyed on). both posted. the timeline got two near-identical CVE tweets 13 minutes apart and we got a DM from a friendly account saying "dude."

the fix was to dedup on CVE id, not source url:

def cve_already_seen(cve_id, window_days=14):
    if not cve_id:
        return False
    queued = load_queue()
    posted = load_posted_history(days=window_days)
    haystack = [t.get('text', '') for t in queued + posted]
    return any(cve_id in text for text in haystack)

extract_cve_from_text runs a re.search(r'CVE-\d{4}-\d{4,7}', text). if we get a hit, we check it against everything queued and everything posted in the last 14 days. 14 days because some CVEs get a second wind (a new exploit drops, a new wave of mass exploitation, a vendor changes the CVSS) and we want a deliberate human decision to post a follow-up rather than the pipeline thinking it's fresh news.

the lesson is the one you've read a thousand times. dedup on the canonical identifier, not on the source.

## OAuth1 token scope. a footgun worth its own section

the X dev portal has a UI that pretends two things are the same when they aren't. app permissions and access token scope are different objects. setting your app to read+write does not update existing access tokens. the tokens keep whatever scope they had at the time they were generated.

we lost most of an afternoon to this. our app showed "read and write" in the dashboard. our access token was generated when the app was read-only. every posts.create call returned 403 with no useful body. the dashboard kept telling us the app could write. it could. our token couldn't.

the diagnostic is one curl:

curl -i --request GET \
  --url 'https://api.twitter.com/1.1/account/verify_credentials.json' \
  --header 'Authorization: OAuth oauth_consumer_key="...", oauth_token="...", ...'

look at the response headers. x-access-level: read means your token is read-only no matter what the dashboard says. x-access-level: read-write means you can write tweets. x-access-level: read-write-directmessages means full scope.
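if you're scripting the check rather than eyeballing curl output, the parse is trivial. a sketch where headers is a plain dict of response headers, however you fetched them (the helper name here is ours, not an SDK call):

```python
# sketch: interpret the x-access-level header from verify_credentials.
# 'headers' is a plain dict of response headers; token_can_write is an
# illustrative helper, not part of any SDK.
WRITE_LEVELS = ("read-write", "read-write-directmessages")

def token_can_write(headers: dict) -> bool:
    level = headers.get("x-access-level", "read")
    return level in WRITE_LEVELS

print(token_can_write({"x-access-level": "read"}))        # False
print(token_can_write({"x-access-level": "read-write"}))  # True
```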

the fix is mechanical. flip the app to read+write in the dashboard. then go to the "keys and tokens" tab. then regenerate the access token and access secret. the old token is now invalidated and the new one is minted with the current app permissions. update your env vars. test verify_credentials again. confirm the header now says read-write.

if you're using OAuth2 PKCE flow instead of OAuth1 user context, the equivalent gotcha is scope claims on the bearer token. the token was issued with whatever scopes you requested at auth time, and changing app permissions after the fact doesn't re-issue. you have to walk the user back through consent.

this is a chunk of writeup that didn't have to exist. but the footgun exists for everyone who builds on X's API, and there isn't a clear writeup of it anywhere in the dev docs as of this writing.

## the 15-hour stuck-cron incident

short story, important lesson.

feedparser does not set a socket timeout by default. one of our feeds (i won't name it, they fixed it) started returning a TCP connection that accepted the SYN, completed the handshake, and then sent nothing. feedparser sat on the socket waiting for bytes. our cron held its lockfile. the next cron tick saw the lockfile and exited cleanly. the next one too. and the next. for 15 hours.

we noticed because the dashboard's "last flash run" timestamp on the homepage stopped updating. 90 missed cron firings. zero auto-posts during a window that included a publicized Ivanti advisory.

the fix is one line at the top of valtik_news_flash.py:

import socket
socket.setdefaulttimeout(10)

every socket created in the process inherits a 10-second timeout. feedparser, urllib, anything. a stuck remote can no longer wedge us. a slow remote times out cleanly, gets logged as a failed feed for that tick, and we move on to the next feed.

the broader lesson, the one i'd write on the wall. every external IO call inside a cron'd path needs an explicit timeout. requests calls. socket creates. database queries. subprocess invocations. if there's any chance of indefinite blocking, set a wall clock. cron jobs without timeouts will eventually get bitten by something on the network that pretends to be alive and isn't.
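the socket timeout is the real fix, but the incident also exposed the lockfile itself as a single point of wedging. a complementary sketch, with an illustrative path and threshold, that treats any lock older than one expected run as stale and reclaims it:

```python
import os
import time

# complementary sketch: stale-lock guard for a cron'd script.
# path and threshold are illustrative, not our production values.
LOCK_PATH = "/tmp/valtik_flash.lock"
MAX_AGE_S = 15 * 60  # comfortably longer than one 10-minute cron tick

def acquire_lock() -> bool:
    if os.path.exists(LOCK_PATH):
        age = time.time() - os.path.getmtime(LOCK_PATH)
        if age < MAX_AGE_S:
            return False        # a recent run really is in flight
        os.remove(LOCK_PATH)    # stale: the holder wedged or died
    with open(LOCK_PATH, "w") as f:
        f.write(str(os.getpid()))
    return True
```

with this in place, even a wedged run can only cost you one threshold's worth of ticks instead of 15 hours.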

## what the reply path adds (and why it's the actual growth lever)

if you only read the algorithm-leak stuff for the velocity numbers, you miss the headline finding. the weighting table is the headline.

approximate per-action weights in the X 2026 algorithm, derived from multiple post-leak teardowns and confirmed via behavioral testing:

  • like: 1x
  • repost: ~5x
  • reply received on your post: 13.5x to 27x
  • bookmark: ~10x
  • you reply to a reply on your post: ~75x

that last one is the highest single positive signal in the entire engagement graph. an author replying to a reply is interpreted as deep engagement, the kind that signals "this conversation is generating real discussion," and the algorithm rewards it harder than anything else you can do organically.
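to make the asymmetry concrete, here's toy arithmetic with the approximate weights above (using a midpoint where the teardowns give a range). the numbers are illustrative, not a reimplementation of the ranker:

```python
# toy arithmetic with the approximate per-action weights above;
# reply_received uses the midpoint of the 13.5x-27x range
WEIGHTS = {
    "like": 1.0,
    "repost": 5.0,
    "reply_received": 20.0,
    "bookmark": 10.0,
    "author_reply": 75.0,
}

def engagement_score(counts: dict) -> float:
    return sum(WEIGHTS[k] * v for k, v in counts.items())

# 100 likes vs. the author answering 2 replies on a 10-like post:
# the second case wins despite an order of magnitude fewer actions
print(engagement_score({"like": 100}))                    # 100.0
print(engagement_score({"author_reply": 2, "like": 10}))  # 160.0
```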

so a broadcast-only strategy is leaving most of the score on the table. we built valtik_reply_watcher.py to poll 6 cybersec accounts on X (a mix of well-followed researchers and aggregator accounts) and valtik_reply_drafter.py to generate 1 to 3 candidate replies per detected post.

the rules for the reply drafter are stricter than for the broadcast drafter. no "great post." no "this." no "agreed." every candidate reply has to do one of three things. add a fact the original didn't have, ask a specific technical question that advances the conversation, or share a concrete operator anecdote ("we saw this hit a client's edge yesterday, the IOC was X"). if the drafter can't find at least one of those angles in the source post, it emits zero candidates and the watcher moves on.
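the gate is easy to sketch. the banned list and the "angle" heuristics below are illustrative stand-ins for valtik_reply_drafter.py's actual rules:

```python
# illustrative low-signal reply gate; the banned list and angle
# heuristics stand in for the real drafter's larger rule set
BANNED = {"great post", "this.", "agreed", "so true", "facts"}

def is_low_signal(reply: str) -> bool:
    r = reply.strip().lower()
    if r in BANNED or any(r.startswith(b) for b in BANNED):
        return True
    # crude angle check: a question, a concrete CVE, or an anecdote marker
    has_angle = ("?" in r) or ("cve-" in r) or ("we saw" in r)
    return not has_angle

print(is_low_signal("great post"))  # True
print(is_low_signal("we saw this hit a client edge; IOC was a DNS beacon"))
# False
```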

candidates stage to reply_queue.json for human review rather than auto-firing. the cost of a bad reply is much higher than the cost of a bad broadcast. a broadcast that misses the mark gets ignored. a reply that misses the mark gets seen by the original author and everyone who liked their post, and it tags us as a low-signal account. so this stays in the human loop for now.

## what's next

three things we're building.

cross-poster to bluesky and mastodon. per SC Media's 2026 data, ~74% of the active infosec audience has either left X entirely or significantly reduced their X engagement since 2022. our broadcast pipeline currently serves the X audience well and the rest of the audience not at all. the cross-poster will take the same auto-approved draft, strip X-specific formatting (we'll be permissive on em-dashes and hashtags for Bluesky and Mastodon where they don't tank reach), and post in parallel. same pipeline, three endpoints.

receipt-image generator. screenshots beat text by a 3x to 5x factor in our analytics. we want to auto-generate a clean "valtik studios advisory card" image for every CVE auto-post. vendor logo, CVE id, CVSS, the one-line operator take, our branding. the image becomes the post. the text becomes the alt. this is partly an aesthetic choice and partly an algorithmic one. media-attached posts hit a different ranker.

LLM-augmented reply drafter. the heuristic reply drafter is fine but it caps out at the conversational depth a regex can produce. for the next tier, where the goal is to genuinely advance a technical conversation in a reply, an LLM is the right tool. the trick is to use it only in the reply path, where the human is in the loop and a bad output is caught before it ships, not in the broadcast path where there's no second pair of eyes. the LLM drafter will run against the post text plus our internal CVE / exploit corpus and emit candidates the human can ship or discard.

## code is OSS-soon

we'll open-source the scrapers, the validator, and the queue tooling in Q3 2026. the auto-approve gate needs a few more months of corpus to tune against, and the templates have some valtik-specific phrase banks we want to factor out before publishing. but the architecture and the components are not load-bearing secrets. the value isn't in the code. the value is in running the code reliably for months while everyone else is still posting CVE takes by hand at 2pm pacific.

if you're building something similar, the three things to take away are. dedup on the canonical id and not on the source, set a socket timeout on every external IO in a cron path, and regenerate your access token after you change app permissions. the rest is just plumbing.

engineering · automation · x algorithm · rss · feedparser · cron · oauth1 · cve detection · pipeline · broadcast · meta

Want us to check your X (Twitter) automation setup?

Our scanner detects this exact misconfiguration, plus dozens more across 38 platforms. Free website check available, no commitment required.

Get new research in your inbox
No spam. No newsletter filler. Only new posts as they publish.