The Complete Guide to PDF Compression in 2026
How PDF compression actually works under the hood, what trade-offs each method makes, and how to pick the right level for your file.
Every week I get asked the same question by someone trying to email a scanned report or upload a portfolio: "Why is this PDF 80 MB and what do I do about it?" The honest answer is that PDF compression is not one technique. It is a stack of five or six different optimisations applied in sequence, and the "quality" setting in your tool of choice is really a preset that decides which ones run and how aggressively.
This guide walks through what is actually happening inside the file when you compress a PDF, why the same document can shrink to 5% of its original size in one tool and barely budge in another, and how to choose settings without ending up with a document that looks like a photocopy of a photocopy.
What is actually inside a PDF?
A PDF is a container. When you open one in a hex editor you find a sequence of objects: streams for images, streams for fonts, streams for the page descriptions themselves (vector instructions like "move pen to x,y, draw a rectangle, fill with this colour"), and a cross-reference table that lets the reader jump directly to any object without parsing the whole file.
For most documents that come out of a scanner or a phone camera, 80–95% of the bytes are images. The text on the page is usually only a few kilobytes; the photos and scans are megabytes each. That means PDF compression is, in practice, mostly image compression.
The five layers of PDF compression
1. Stream compression (lossless)
Any stream object inside a PDF can be wrapped in a filter. The most common is FlateDecode, which is the same DEFLATE algorithm zip files use. This is applied to text, vector instructions, and metadata. It is completely reversible: the decoder gets back exactly the bytes that went in. You almost always want this turned on. The savings are typically 40–60% on text-heavy pages and effectively free in terms of quality.
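To make the round trip concrete, here is a minimal Python sketch using the standard-library zlib module, which produces the same kind of data a FlateDecode stream holds. The content stream is invented for illustration and is far more repetitive than a real page, so its ratio will look better than the typical range quoted above.

```python
import zlib

# A made-up page content stream (assumption: 40 similar text-drawing lines).
content = b"".join(
    b"BT /F1 12 Tf 72 %d Td (Line %d of the report) Tj ET\n" % (720 - 14 * i, i)
    for i in range(40)
)

compressed = zlib.compress(content, level=9)  # roughly what the PDF stores
restored = zlib.decompress(compressed)        # what the reader does on open

# Lossless: the reader gets back exactly the bytes that went in.
assert restored == content

saving = 1 - len(compressed) / len(content)
print(f"{len(content)} -> {len(compressed)} bytes ({saving:.0%} smaller)")
```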
2. Image recompression (lossy)
Embedded JPEG photos are usually re-encoded at a lower quality factor. A modern compressor will analyse each image, decide whether it's a photograph (better as JPEG or JPEG 2000) or a screenshot with sharp edges (better stored losslessly with Flate, the way PNG does it), and pick the format that gives the smallest output for the visual quality target. Going from JPEG quality 95 to quality 75 can be as much as a 4× size reduction, depending on the image, with very little visible difference.
3. Image downsampling (lossy)
Phones today shoot 12-megapixel photos. If you scan a receipt with one and drop it into a PDF, you have a 4000×3000 pixel image describing a 4-inch piece of paper. Downsampling reduces the resolution to something the page can actually display, typically 150 DPI for screen viewing or 300 DPI for print. A 4000×3000 image placed on a letter-size page (8.5×11 inches) at 150 DPI needs only about 1275×1650 pixels, roughly 5.7× fewer. Combined with recompression, this is where the dramatic shrinks come from.
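The arithmetic behind downsampling is easy to check. The sketch below computes how many pixels a page actually needs at a given DPI and how that compares with a 12-megapixel source. The function names are illustrative, not taken from any tool.

```python
from __future__ import annotations


def target_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions needed to cover a page of the given size at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def reduction_factor(src: tuple[int, int], dst: tuple[int, int]) -> float:
    """How many times fewer pixels `dst` has than `src`."""
    return (src[0] * src[1]) / (dst[0] * dst[1])


source = (3000, 4000)                  # 12-megapixel phone photo, portrait
screen = target_pixels(8.5, 11, 150)   # (1275, 1650)
printer = target_pixels(8.5, 11, 300)  # (2550, 3300)

print(screen, round(reduction_factor(source, screen), 1))
print(printer, round(reduction_factor(source, printer), 1))
```

At 300 DPI the same photo loses far fewer pixels, which is why print presets shrink files much less than screen presets.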
4. Font subsetting
A typical TrueType font has glyphs for thousands of characters. If your document only uses 90 of them, embedding the entire font is wasteful. Font subsetting strips out everything except the glyphs that are actually referenced. This typically saves 200–500 KB per font in a multi-page document.
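As a rough illustration of the idea only, the sketch below models a font as a mapping from characters to outline data and keeps just the characters a document uses. This is a toy model: the glyph sizes are invented, and a real subsetter also has to rewrite the font's internal tables.

```python
# Toy "font": every character from U+0020 up to U+1FFF gets 200 bytes of
# placeholder outline data (an assumption made purely for illustration).
font = {chr(cp): bytes(200) for cp in range(0x20, 0x2000)}


def subset(glyphs: dict, text: str) -> dict:
    """Keep only the glyphs for characters that appear in `text`."""
    used = set(text)
    return {ch: data for ch, data in glyphs.items() if ch in used}


doc_text = "Quarterly report: revenue rose 12% year over year."
small = subset(font, doc_text)

full_size = sum(len(d) for d in font.values())
small_size = sum(len(d) for d in small.values())
print(f"{len(font)} glyphs -> {len(small)} glyphs")
print(f"{full_size} bytes -> {small_size} bytes")
```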
5. Object deduplication and pruning
Many PDFs accumulate cruft over their lifetime: orphaned objects from earlier edits, duplicate copies of the same logo embedded on every page, unused form fields, structural metadata from accessibility tags that are never used. A clean-up pass (often called garbage collection) walks the object graph from the document root, drops anything unreachable, and merges identical objects. On documents that have been through several rounds of editing, this alone can cut 10–20% of the file size.
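The prune-and-merge step can be sketched as a mark-and-sweep over the object graph followed by hashing. The object table below is a toy model, not real PDF syntax, and the function names are hypothetical. To keep it short, only leaf objects (those with no outgoing references, such as image streams) are merged.

```python
from __future__ import annotations

import hashlib

# Toy object table: id -> (payload bytes, ids this object references).
objects = {
    1: (b"catalog", [2, 3]),
    2: (b"page one", [4]),
    3: (b"page two", [5]),
    4: (b"<logo image bytes>", []),
    5: (b"<logo image bytes>", []),  # duplicate of object 4
    6: (b"old draft page", [4]),     # orphan: nothing references it
}


def reachable(objs: dict, root: int) -> set[int]:
    """Collect every object id reachable from `root`."""
    seen, stack = set(), [root]
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(objs[oid][1])
    return seen


def prune_and_dedupe(objs: dict, root: int) -> dict:
    """Drop unreachable objects and merge identical leaf objects."""
    live = reachable(objs, root)
    first_seen: dict[bytes, int] = {}
    remap: dict[int, int] = {}
    for oid in sorted(live):
        payload, refs = objs[oid]
        if refs:  # non-leaf objects are kept as they are in this sketch
            remap[oid] = oid
        else:     # leaf: point at the first object with the same content
            digest = hashlib.sha256(payload).digest()
            remap[oid] = first_seen.setdefault(digest, oid)
    return {
        oid: (objs[oid][0], [remap[r] for r in objs[oid][1]])
        for oid in sorted(live)
        if remap[oid] == oid
    }


cleaned = prune_and_dedupe(objects, root=1)
print(sorted(cleaned))  # the orphan and the duplicate logo are gone
```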
Lossless vs lossy: when to use which
Lossless compression should always be on. There is no downside. Lossy compression is the dial you actually have to think about, and the right setting depends on what the document is for.
For documents you are emailing or uploading once and never reopening, screen-quality compression (≈100–150 DPI, JPEG quality 70) is fine. The recipient is going to view it on a phone or laptop screen and will not notice. For documents you are archiving or printing, you want at least 300 DPI and JPEG quality 85+ for any photographs. For legal or medical scans where every detail might matter later, do not apply lossy compression at all — the storage cost is not worth the risk of compressing away an important detail.
Why two tools give different results on the same file
Two compressors with identical settings can produce wildly different outputs because they make different choices at each layer. One might use a more aggressive JPEG quantisation table, another might re-encode all images to JPEG 2000 (smaller but slower to decode), a third might rasterise vector graphics into images (terrible idea, but it happens). When evaluating a tool, run the same file through it twice at the same setting and check that the output is bit-for-bit identical. If it isn't, and the difference goes beyond embedded timestamps or document IDs that some tools regenerate on every run, the tool is doing something non-deterministic and you should be suspicious.
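One way to run that check is to compare cryptographic hashes of the two outputs. This is a minimal sketch using only the standard library. The helper names are illustrative and the file names in the usage note are placeholders.

```python
import hashlib
from pathlib import Path


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def same_output(first_run: str, second_run: str) -> bool:
    """True if the two files are byte-for-byte identical."""
    a = fingerprint(Path(first_run).read_bytes())
    b = fingerprint(Path(second_run).read_bytes())
    return a == b
```

Usage would look like `same_output("run1.pdf", "run2.pdf")`, where the two files are outputs of the same tool at the same setting.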
The quality settings, decoded
Most consumer tools expose three or four presets: low, medium, high, and printer. Here is what those usually map to internally:
- Low / screen: 72–96 DPI image downsampling, JPEG quality 50–60. Suitable for files that will only ever be viewed on a screen at fit-to-page zoom.
- Medium / web: 150 DPI, JPEG quality 70–75. Good general-purpose default.
- High / ebook: 200 DPI, JPEG quality 80–85. Looks identical to the original on a typical screen, allows light zooming.
- Printer / press: 300+ DPI, JPEG quality 90+, vector content left untouched. Use when the file will be physically printed.
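Expressed as configuration, the presets above might look like the sketch below. The exact numbers are assumptions picked from inside each range listed, and the `Preset` type and helper are hypothetical rather than any tool's real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    max_dpi: int       # images above this resolution get downsampled
    jpeg_quality: int  # quality factor used when re-encoding photos


# Representative values chosen from the ranges listed above (assumptions).
PRESETS = {
    "low": Preset(max_dpi=96, jpeg_quality=55),
    "medium": Preset(max_dpi=150, jpeg_quality=72),
    "high": Preset(max_dpi=200, jpeg_quality=82),
    "printer": Preset(max_dpi=300, jpeg_quality=90),
}


def should_downsample(pixel_width: int, placed_width_in: float, preset: Preset) -> bool:
    """True if the image's effective DPI on the page exceeds the preset's cap."""
    return pixel_width / placed_width_in > preset.max_dpi


# A 4000-pixel-wide photo placed 8.5 inches wide is about 470 DPI.
print(should_downsample(4000, 8.5, PRESETS["medium"]))
print(should_downsample(1200, 8.5, PRESETS["medium"]))
```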
Privacy considerations
Most online compressors upload your file to their server, run the compression there, and return the result. For documents that contain personal information (passports, bank statements, medical records) this is a meaningful risk. Server logs, backups, and any breach can expose the file. Toolkiya's PDF compressor runs entirely in your browser using WebAssembly. Your file is never uploaded; the compression happens locally on your device. For sensitive documents this is the only sane default.
When compression cannot help
If your PDF is already small (under 1 MB) and contains mostly text, there is very little compression can do. The text is already Flate-compressed and the fonts are already subsetted by whatever produced the file. Trying to squeeze further with aggressive image recompression on text will only make it look worse. Similarly, scanned documents that have already been compressed once will not compress meaningfully a second time: every additional pass through a lossy encoder degrades the image with little or no reduction in size.
The right move with already-compressed files is to check whether the original (pre-PDF) source still exists. Going back to a high-resolution master and compressing once with the right settings always beats running a previously-compressed file through another round.
A practical workflow
For everyday compression I recommend this routine. First, look at the file size and ask: do I actually need to compress this? Anything under 5 MB will go through email and most upload forms without complaint. Compression below this threshold is busy-work.
Second, identify what kind of content the PDF contains. A 50 MB scanned report is going to compress dramatically (image-heavy, downsampling does the work). A 10 MB text document with a few illustrations is going to compress modestly (most of the bytes are already efficient). A 100 MB technical manual with vector diagrams will compress very little — vectors are already efficient, and rasterising them to compress is destructive.
Third, pick a preset that matches the document's destination. For web sharing, medium. For email attachments, low or medium. For printing, high or printer. Do not blindly use the most aggressive setting — the saved kilobytes are rarely worth the visible quality loss.
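The routine above can be written down as a small decision function. This is a sketch under stated assumptions: where the text allows two presets for a destination, the code picks one of them (medium for email, printer for printing), and the function name and destination labels are hypothetical.

```python
from typing import Optional

SKIP_BELOW_MB = 5  # below this size, compressing is busy-work

# Where two presets are acceptable, one has been picked (an assumption).
PRESET_FOR = {
    "web": "medium",
    "email": "medium",   # "low" is also acceptable
    "print": "printer",  # "high" is also acceptable
}


def choose_preset(size_mb: float, destination: str) -> Optional[str]:
    """Return a preset name, or None if the file needs no compression."""
    if size_mb < SKIP_BELOW_MB:
        return None
    if destination not in PRESET_FOR:
        raise ValueError(f"unknown destination: {destination!r}")
    return PRESET_FOR[destination]


print(choose_preset(3, "email"))   # small file: leave it alone
print(choose_preset(50, "web"))
print(choose_preset(50, "print"))
```

The second step, judging what kind of content dominates the file, is left out because it needs an actual PDF parser.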
Closing thought
PDF compression is not magic. It is a series of well-understood operations on image data, font data, and stream data, each with predictable trade-offs. Once you understand which layer is doing the work for your particular file, you can make informed decisions instead of clicking through presets and hoping for the best. For most documents, medium-quality settings on a privacy-respecting tool will get you 80% of the size reduction with no visible quality loss — and that is usually exactly what you need.
Built & maintained by Mayank Rai
Solo developer based in Lucknow, India · Last updated May 4, 2026