CVE-2026-33626: Critical SSRF Vulnerability Found in LMDeploy AI Inference Framework

A critical server-side request forgery vulnerability in LMDeploy, one of the most widely deployed open-source frameworks for running large language model inference at scale, was publicly disclosed on April 21, 2026. Tracked as CVE-2026-33626 and assigned a CVSS score of 9.1, the flaw allows an unauthenticated attacker to force the LMDeploy server to make arbitrary HTTP requests on the attacker’s behalf — a class of vulnerability that is deceptively simple to exploit and extraordinarily difficult to fully contain in complex infrastructure environments.

The vulnerability is particularly significant because of where LMDeploy sits in the AI infrastructure stack. Organizations deploying their own LLMs — for enterprise chatbots, coding assistants, document processing pipelines, or any workload that requires local model inference — frequently run LMDeploy on servers with substantial internal network access. An SSRF in that position is not a bounded web application problem. It is a pivot point into everything the inference server can reach.

What Is LMDeploy and Why Is It Everywhere?

LMDeploy is developed and maintained by Shanghai AI Laboratory, the same organization behind the InternLM model series. It is designed to solve a practical problem that every organization running its own language models faces: getting transformer-based models to serve inference requests at production throughput with acceptable latency and memory efficiency. LMDeploy does this through a combination of techniques — continuous batching, quantization, KV cache management, and an optimized attention kernel implementation — that collectively allow it to serve models much more efficiently than a naive implementation.

The framework has become a default choice for teams deploying models in the 7B to 70B parameter range, particularly on NVIDIA GPU clusters. It integrates with the OpenAI-compatible API format, which means it can serve as a drop-in backend for applications built against the OpenAI SDK. This compatibility has accelerated its adoption because organizations migrating from OpenAI’s hosted API to self-hosted infrastructure can swap in LMDeploy with minimal code changes. The GitHub repository has accumulated tens of thousands of stars, and the framework appears in a substantial fraction of enterprise LLM deployment stacks across Asia, Europe, and North America.

The Vulnerability Mechanics

CVE-2026-33626 exists in LMDeploy’s multimodal input handling pathway — specifically in the logic that processes image URLs submitted as part of vision-language model inference requests. When a user submits a request that includes an image URL (for models that support visual input such as InternVL or LLaVA variants), LMDeploy fetches that URL server-side in order to pass the image data to the model. The fetching logic does not perform adequate validation of the target URL, which means an attacker can submit a URL pointing to an internal service rather than a public image.

In practice, this means an attacker who can reach the LMDeploy API endpoint can instruct the inference server to make HTTP GET requests to any URL the server’s network stack can resolve. The most obvious targets are internal metadata services — cloud provider metadata endpoints at addresses like 169.254.169.254 in AWS environments, which can return instance credentials, IAM role configurations, and other sensitive data that the attacker can read from the SSRF response. But the attack surface extends well beyond cloud metadata. Internal Kubernetes API servers, Redis instances, Elasticsearch clusters, internal HTTP admin interfaces, other microservices that trust requests from within the network — all of these become reachable through the LMDeploy server.
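The shape of such a request can be illustrated with a short sketch. The payload below follows the OpenAI-compatible vision message format described above, with the image URL pointed at the AWS instance-metadata service instead of a public image; the model name and metadata path are illustrative placeholders, not details taken from the advisory.

```python
import json

# Illustrative only: an OpenAI-style chat completion request whose
# image_url targets the cloud metadata service rather than an image.
# The model name and metadata path are placeholders for illustration.
payload = {
    "model": "internvl-chat",  # placeholder model name
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {
                "type": "image_url",
                "image_url": {
                    # A vulnerable server fetches this URL server-side.
                    "url": "http://169.254.169.254/latest/meta-data/"
                },
            },
        ],
    }],
}

print(json.dumps(payload, indent=2))
```

The server-side fetch is triggered by a perfectly well-formed API request, which is why perimeter controls alone do not catch it; nothing about the request is malformed.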

The vulnerability requires no authentication to exploit if the LMDeploy API endpoint is accessible, which is a common configuration in private network deployments where the assumption is that internal services are trusted. This assumption is precisely what makes SSRF so effective: organizations often harden their external perimeter while leaving internal service-to-service communication relatively open, and SSRF turns a single externally accessible entry point into a general-purpose tunnel into the internal network.

Discovery and Disclosure Timeline

The vulnerability was discovered by a researcher at a European cybersecurity consultancy specializing in AI infrastructure security. The researcher reported the issue to Shanghai AI Laboratory through their security disclosure program on March 14, 2026. The LMDeploy team acknowledged the report within 48 hours and assigned an internal tracking identifier. A patch was developed and merged into the main branch on April 8, 2026, and version 0.7.3 was released on April 17 with the fix included. The CVE was assigned on April 19 and publicly disclosed on April 21, 2026, four days after the patched release.

The disclosure timeline is relatively tight by industry standards — approximately five weeks from report to public disclosure. The LMDeploy team’s decision to release the patch before public disclosure gave organizations a four-day window to update before the vulnerability details became public. Whether that window was sufficient for most deployments is debatable; LMDeploy instances embedded in larger platform stacks often require coordination across teams to update, and four days is a narrow margin for anything other than the most mature patch management programs.

Affected Versions and the Patch

All LMDeploy versions prior to 0.7.3 are affected. The fix introduces URL validation logic with three components: submitted image URLs are checked against an allowlist of permitted schemes (http and https only); requests to non-routable IP ranges are blocked, including loopback addresses, link-local ranges, and the private network ranges defined in RFC 1918; and the fetching logic gains a configurable timeout and redirect limit to prevent slow-read and redirect-chain attacks.
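The mitigation pattern can be sketched in a few lines of standard-library Python. This is an illustration of the general technique, not LMDeploy's actual patch code; note that a production implementation must also resolve hostnames and re-validate the resolved addresses (to defend against DNS rebinding) and enforce the timeout and redirect limits mentioned above.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}

def is_url_allowed(url: str) -> bool:
    """Reject URLs with disallowed schemes or literal non-routable IPs.

    Sketch only: real code must also resolve hostnames and re-check the
    resolved addresses, since "internal.corp" is as dangerous as 10.0.0.1.
    """
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    host = parts.hostname or ""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Hostname rather than a literal IP; a full implementation
        # resolves it and validates every resolved address.
        return True
    # Blocks loopback (127.0.0.0/8), link-local (169.254.0.0/16),
    # and RFC 1918 private ranges.
    return not (addr.is_loopback or addr.is_link_local or addr.is_private)

print(is_url_allowed("http://169.254.169.254/latest/meta-data/"))  # False
print(is_url_allowed("https://example.com/cat.png"))               # True
```

The literal-IP check alone is not sufficient, which is exactly why layered network controls (discussed below under defender guidance) matter even after patching.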

The patch also adds an optional strict mode that operators can enable to restrict image URL fetching to a user-defined allowlist of domains. For organizations that know exactly which image hosts their applications should be accessing, this mode eliminates the SSRF surface entirely rather than just reducing it. The LMDeploy maintainers recommend enabling strict mode for any production deployment where the API is accessible from untrusted networks or where the inference server has access to sensitive internal resources.
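The effect of a domain allowlist can be shown with a minimal check. The hostnames below are hypothetical, and the actual configuration mechanism for strict mode should be taken from the LMDeploy 0.7.3 release notes rather than this sketch.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would list its own image hosts.
ALLOWED_IMAGE_HOSTS = {"images.example.com", "cdn.example.net"}

def image_host_allowed(url: str) -> bool:
    """Accept only URLs whose exact hostname is on the allowlist."""
    return (urlsplit(url).hostname or "") in ALLOWED_IMAGE_HOSTS

print(image_host_allowed("https://images.example.com/cat.png"))  # True
print(image_host_allowed("https://169.254.169.254/meta-data/"))  # False
```

Exact hostname matching, rather than suffix matching, is the safer design: a suffix check would wave through `images.example.com.attacker.net`.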

The Broader AI Infrastructure Security Picture

CVE-2026-33626 is not an isolated incident. It follows a pattern that the security community has been documenting with increasing frequency as organizations move from experimenting with AI models to deploying them in production infrastructure. AI inference frameworks were largely developed by research teams under timelines that prioritized functionality and performance over security hardening. The result is a class of software that handles highly privileged server-side operations — HTTP fetching, file reading, subprocess execution in some cases — with validation logic that would not pass a basic security code review in a traditional web application context.

This is not a criticism of the developers. It reflects the environment in which these frameworks evolved. Research-oriented software written to explore what language models can do does not carry the same security expectations as software written to handle financial transactions or patient data. The problem arises when that research software gets deployed in production environments that do carry those expectations, often faster than the security hardening process can keep pace.

Ollama, another popular self-hosted LLM serving framework, had a series of security vulnerabilities disclosed in 2024 and 2025 that showed similar characteristics — inadequate input validation on server-side operations, missing authentication on administrative endpoints, and insufficient network isolation guidance in the documentation. The problems are structural, not coincidental, and they will continue to surface as long as the deployment of AI infrastructure outpaces the security maturation of the underlying frameworks.

What Defenders Should Do

The immediate action is to upgrade to LMDeploy 0.7.3. The release is available through PyPI and through the Docker Hub image repository maintained by Shanghai AI Laboratory. Organizations running LMDeploy inside Kubernetes should check their deployment manifests for the image tag and update to the patched version, then validate the update in a staging environment before rolling to production.
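Confirming which version is actually installed is a quick first step. A minimal sketch using standard-library package metadata, assuming a standard pip installation (the pre-release-suffix handling is deliberately crude):

```python
from importlib import metadata
from itertools import takewhile

PATCHED = (0, 7, 3)  # first fixed release per the advisory

def parse_version(v: str) -> tuple:
    """Crude numeric parse: '0.7.2' -> (0, 7, 2); suffixes are dropped."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def needs_upgrade(installed: str) -> bool:
    return parse_version(installed) < PATCHED

try:
    current = metadata.version("lmdeploy")
    print(f"lmdeploy {current}: upgrade needed = {needs_upgrade(current)}")
except metadata.PackageNotFoundError:
    print("lmdeploy is not installed in this environment")
```

For container deployments, the equivalent check is against the image tag in the deployment manifest rather than the Python environment.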

Beyond the immediate patch, organizations should evaluate the network position of their LMDeploy instances. Inference servers that have direct access to cloud metadata endpoints, internal databases, or administrative APIs should be placed behind a network policy that restricts outbound connections to only the destinations genuinely required for model operation. In Kubernetes environments, NetworkPolicy objects can enforce this restriction at the cluster level. In cloud environments, security group rules and VPC endpoint policies can achieve similar segmentation.

If strict mode is suitable for your use case, enable it and define an explicit allowlist of permitted image domains. For applications where users should not be submitting arbitrary image URLs at all — where all images come from known application-controlled sources — consider removing the multimodal URL-fetching capability entirely and pre-fetching images through application code before passing them to the inference server. This approach eliminates the SSRF surface at the application architecture level rather than relying on framework-level mitigations that could have additional bypasses.
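The pre-fetching approach can be sketched as follows. Whether a given LMDeploy version accepts `data:` URLs or a separate base64 field should be verified against its API documentation; the architectural point is that the application, not the inference server, performs the fetch.

```python
import base64

def image_to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Embed application-fetched image bytes as a data: URL, so the
    inference server never performs a network fetch of its own."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

# The application loads the image itself, from sources it already
# trusts, then embeds the bytes directly in the inference request.
fake_png = b"\x89PNG\r\n\x1a\n"  # placeholder bytes for illustration
url = image_to_data_url(fake_png)
print(url[:30])
```

Because the inference server now only decodes bytes it was handed, there is no URL for an attacker to redirect, regardless of any future bypass in the framework's validation logic.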

Finally, treat this as a prompt to audit the rest of your AI infrastructure stack. LMDeploy is one framework. Other inference engines, model management systems, embedding services, and vector database interfaces may carry similar classes of vulnerability. A systematic review of every server-side HTTP request made by AI infrastructure components — what triggers it, what validation exists, what network resources are reachable — is the kind of proactive work that prevents the next disclosure from being a surprise.

The patched release and full changelog are available on the LMDeploy official GitHub releases page. Organizations running LMDeploy in production should verify their installed version against the release history and upgrade to 0.7.3 or later immediately.

Related coverage: CISA Adds 8 Exploited CVEs to KEV Catalog — broader vulnerability landscape context. Also: Malicious Docker Hub Images Supply Chain Attack — another active threat targeting AI and container infrastructure.
