Why extract attachments separately?
Every attachment that came with every email in an MBOX file is still there — encoded inside the archive. For an active email workflow, that's fine. For archival, legal review, or migration, you often want the attachments as standalone files in organized folders:
- Legal review: Reviewers need to open attachments individually, annotate them, and produce them separately from the email PDFs.
- Migration: Moving to a document management system that indexes attachments needs them as files, not embedded in email bodies.
- Archival sanity: Ten years from now, you want the photos your dad sent as photos in a folder, not encoded in a .mbox file.
- File size control: PDFs with embedded attachments balloon in size. Extracting them keeps the PDF archive lean.
How MBOX actually stores attachments
MBOX stores the full RFC 5322-formatted email, including the body. For attachments, the email uses MIME multipart encoding — the body is broken into parts (plain text, HTML, each attachment), each with its own content-type and content-transfer-encoding headers. Binary attachments are base64-encoded to survive as plain text.
Here's what a piece of an MBOX looks like when an email has an attachment:
Content-Type: multipart/mixed; boundary="--boundary-abc123"

----boundary-abc123
Content-Type: text/plain; charset="UTF-8"

Hi Bob, here's the quarterly report.

----boundary-abc123
Content-Type: application/pdf; name="Q4-2025-Report.pdf"
Content-Disposition: attachment; filename="Q4-2025-Report.pdf"
Content-Transfer-Encoding: base64

JVBERi0xLjQKJeLjz9MKMyAwIG9iago8PC9DcmVhdG9yIChBZG9iZSBBY3JvYmF0...
(hundreds of lines of base64)
----boundary-abc123--
Extraction = parse the MIME structure, base64-decode the attachment parts, and write each one as a file with the original filename. Tools that do this correctly preserve the attachment exactly as it was sent.
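To make the structure concrete, here is a minimal sketch that walks the MIME parts of one message with Python's standard email package. The sample message is built in code so the sketch runs without a real archive; list_parts and PDF_BYTES are illustrative names, not part of any tool mentioned here.

```python
from email import message_from_bytes
from email.message import EmailMessage

PDF_BYTES = b"%PDF-1.4 sample bytes"  # stand-in for a real PDF

def list_parts(raw: bytes):
    """Return (content_type, filename, decoded_size) for each leaf MIME part."""
    msg = message_from_bytes(raw)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers only hold other parts
        # decode=True reverses base64 / quoted-printable transfer encoding
        payload = part.get_payload(decode=True) or b""
        parts.append((part.get_content_type(), part.get_filename(), len(payload)))
    return parts

# Build a small message shaped like the example above
sample = EmailMessage()
sample["Subject"] = "Q4 Report"
sample.set_content("Hi Bob, here's the quarterly report.")
sample.add_attachment(PDF_BYTES, maintype="application",
                      subtype="pdf", filename="Q4-2025-Report.pdf")

for content_type, filename, size in list_parts(sample.as_bytes()):
    print(content_type, filename, size)
```

The text body comes back with no filename, and the PDF part comes back with its original name and its decoded size in bytes.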
The automated way — MBOX to PDF
The simplest path on Mac:
- Install MBOX to PDF from the Mac App Store.
- Drag your .mbox file into the app.
- In the export settings, toggle Extract attachments to folder.
- Pick an output folder for the PDFs and let the attachments save to the sibling folder.
- Run the conversion.
The app decodes every MIME part, preserves original filenames, and organizes attachments into per-email subfolders by default. If you're only interested in attachments (not PDFs), you can still run the export with a minimal PDF configuration — the attachment extraction happens regardless.
Organization strategies
Per-email subfolders (default)
Each email gets its own folder named with the date and subject. Attachments from that email go inside. Prevents filename collisions when multiple emails have attachments with the same name.
Attachments/
├── 2026-04-15 Q4 Report/
│   └── Q4-2025-Report.pdf
├── 2026-04-15 Q4 Report (reply)/
│   └── Q4-2025-Report.pdf
└── 2026-04-16 Photos from trip/
    ├── IMG_0001.jpg
    ├── IMG_0002.jpg
    └── IMG_0003.jpg
Flat folder with renamed files
If you want all attachments in one folder, rename each one with its email context — e.g. 2026-04-15_Q4-Report_Q4-2025-Report.pdf. This works for smaller archives but gets unwieldy with thousands of attachments.
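A naming scheme like that can be sketched in a few lines. flat_name is a hypothetical helper, and the underscore-separated pattern is simply the one from the example above.

```python
import re

def flat_name(day: str, subject: str, filename: str) -> str:
    """Prefix an attachment's filename with the email's date and subject."""
    # Collapse anything that is not a letter or digit into single hyphens
    slug = re.sub(r"[^A-Za-z0-9]+", "-", subject).strip("-")
    return f"{day}_{slug}_{filename}"

print(flat_name("2026-04-15", "Q4 Report", "Q4-2025-Report.pdf"))
```

This slug rule drops non-ASCII characters from the subject, so archives with non-English subjects would need a gentler rule.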
By attachment type
After extraction, you can reorganize by file extension — all PDFs together, all images together. Useful for photo archives but loses the email context.
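One way to soften the loss of context is to fold each file's original folder into its new name while grouping by extension. A minimal sketch, assuming the per-email layout shown earlier and a destination outside the source folder; group_by_extension and the "__" separator are illustrative choices.

```python
import shutil
from pathlib import Path

def group_by_extension(src: Path, dest: Path) -> None:
    """Copy files under src into dest/<extension>/, keeping the originals."""
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        ext = path.suffix.lower().lstrip(".") or "no-extension"
        # Fold the relative path into the name so the email folder survives
        new_name = "__".join(path.relative_to(src).parts)
        target_dir = dest / ext
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target_dir / new_name)
```

Copying rather than moving leaves the per-email folders intact, at the cost of double the disk space.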
The deduplication problem
Attachments often appear multiple times across a thread. A PDF sent by Alice, then carried along again in Bob's reply or forward, then once more in Alice's reply to Bob, shows up three times in the MBOX, once per email that carries it. Naive extraction gives you three identical files.
To deduplicate after extraction:
Using shasum in Terminal
cd Attachments/
find . -type f -exec shasum -a 256 {} \; | sort
This lists every file with its SHA-256 hash. Files with the same hash are identical; keep one copy and delete or link the rest.
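The same grouping can be done in Python if you would rather get the duplicate sets directly than read a sorted list. A minimal sketch using only the standard library; find_duplicates is a hypothetical helper, and it reads each file fully into memory, which is fine for typical attachments but not for multi-gigabyte files.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root: Path) -> dict:
    """Map each SHA-256 digest to the files sharing it, duplicates only."""
    by_hash = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    # Keep only digests that occur more than once
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

Each value in the result is a list of identical files; keep the first and delete or link the rest.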
Using a GUI deduplicator
Tools like dupeGuru (free) or Gemini 2 (paid) scan a folder for duplicates by content and offer to remove redundant copies. Works well after bulk extraction.
Keeping the mapping
For legal or forensic work, you typically don't deduplicate — you need to know every email that carried every attachment. Keep the duplicates and log the relationships.
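One way to log those relationships is a manifest with one row per attachment occurrence. The sketch below is an assumption about what such a log could contain (Message-ID, date, subject, filename, SHA-256); write_manifest and the column names are illustrative, not a legal-industry standard.

```python
import csv
import hashlib
import mailbox

def write_manifest(mbox_path: str, csv_path: str) -> int:
    """Write one CSV row per attachment occurrence; return the row count."""
    rows = 0
    box = mailbox.mbox(mbox_path)
    try:
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["message_id", "date", "subject", "filename", "sha256"])
            for msg in box:
                for part in msg.walk():
                    if part.get_content_disposition() != "attachment":
                        continue
                    data = part.get_payload(decode=True)
                    if data is None:  # e.g. an attached message, not a file
                        continue
                    writer.writerow([msg["Message-ID"], msg["Date"], msg["Subject"],
                                     part.get_filename(),
                                     hashlib.sha256(data).hexdigest()])
                    rows += 1
    finally:
        box.close()
    return rows
```

Rows that share a hash are the same file carried by different emails, which is exactly the mapping a reviewer needs.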
Alternative tools
Command line: munpack
Part of the mpack package (install via Homebrew: brew install mpack). Takes a MIME-encoded file and extracts all parts to the current directory. Works on individual emails, not MBOX files directly, but combined with a splitter it can process an archive.
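The splitter can be a few lines of Python. This is a sketch of one possible approach, not a companion to munpack: split_mbox is a hypothetical helper and the message-000001.eml naming is an arbitrary choice.

```python
import mailbox
from pathlib import Path

def split_mbox(mbox_path: str, out_dir: str) -> int:
    """Write each message in the MBOX to its own .eml file; return the count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    box = mailbox.mbox(mbox_path)
    count = 0
    try:
        for i, msg in enumerate(box, start=1):
            # as_bytes() serializes headers and body without the mbox "From " line
            (out / f"message-{i:06d}.eml").write_bytes(msg.as_bytes())
            count += 1
    finally:
        box.close()
    return count
```

Each resulting .eml file is a single MIME message that can then be handed to munpack.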
Python mailbox module
The standard library has everything you need. A short script to extract all attachments from an MBOX:
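import mailbox
import os

mbox = mailbox.mbox('archive.mbox')
out_dir = 'Attachments'
os.makedirs(out_dir, exist_ok=True)

for msg in mbox:
    # walk() visits every MIME part, including nested multiparts
    for part in msg.walk():
        if part.get_content_disposition() != 'attachment':
            continue
        filename = part.get_filename()
        data = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if not filename or data is None:
            continue
        # basename() stops a crafted filename from escaping out_dir
        path = os.path.join(out_dir, os.path.basename(filename))
        with open(path, 'wb') as f:
            f.write(data)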
Simple and effective for scripted workflows. Doesn't handle filename collisions — you'd need to add per-email subfolders or numbered suffixes.
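The numbered-suffix option could look like the sketch below. unique_path is a hypothetical helper, and the "name (1).ext" pattern is just one common convention.

```python
from pathlib import Path

def unique_path(directory: Path, filename: str) -> Path:
    """Return a path in directory that does not exist yet, adding ' (n)' if needed."""
    candidate = directory / filename
    stem, suffix = candidate.stem, candidate.suffix
    n = 1
    while candidate.exists():
        candidate = directory / f"{stem} ({n}){suffix}"
        n += 1
    return candidate
```

In a script like the one above, you would build the output path with unique_path(Path(out_dir), filename) instead of os.path.join before opening the file.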
Thunderbird + ImportExportTools NG
Import the MBOX, right-click a folder, use the add-on's attachment export dialog. Works but slower than dedicated converters for large archives.
Edge cases
Inline images
Images referenced inline in HTML email bodies (via Content-ID or cid: URLs) are technically attachments. Depending on the tool, they may or may not be included in extraction. MBOX to PDF includes them by default — they show up as small image files in the attachment folders.
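In a script, you can tell the two kinds apart from the part headers. The rule below is a heuristic sketch, not how any particular tool decides: classify_part is a hypothetical helper that treats a non-text part with a Content-ID, or an inline disposition with a filename, as an inline image.

```python
def classify_part(part) -> str:
    """Return 'attachment', 'inline', or 'body' for a leaf MIME part."""
    disposition = part.get_content_disposition()
    if disposition == "attachment":
        return "attachment"
    # A Content-ID on a non-text part usually means an image referenced via cid:
    if part.get("Content-ID") and part.get_content_maintype() != "text":
        return "inline"
    if disposition == "inline" and part.get_filename():
        return "inline"
    return "body"
```

Filtering on the "attachment" result skips logos and signature images; including "inline" picks them up as well.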
Encoded filenames
Filenames with non-ASCII characters (Chinese, emoji, etc.) are encoded in the headers, either as RFC 2231 parameters (filename*=UTF-8''...) or as RFC 2047 encoded-words (=?UTF-8?B?...?=). Good extractors decode both. If you see filenames like =?UTF-8?B?...?= after extraction, the tool isn't decoding RFC 2047 encoded-words.
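Names of the =?UTF-8?...?= form are RFC 2047 encoded-words. As far as I know, get_filename() under Python's default (compat32) parsing resolves RFC 2231 parameters but returns encoded-words untouched, so a script may need one extra step. A minimal sketch; decode_filename is a hypothetical helper.

```python
from email.header import decode_header, make_header

def decode_filename(raw_name: str) -> str:
    """Decode RFC 2047 encoded-words in a filename; plain names pass through."""
    return str(make_header(decode_header(raw_name)))

print(decode_filename("=?UTF-8?Q?r=C3=A9sum=C3=A9.pdf?="))
```

Applying this to the result of part.get_filename() before writing the file keeps the encoded form out of your folders.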
Corrupted MIME
If an email was manipulated after the fact or the MBOX has transport errors, MIME parsing can fail. Robust converters skip the problematic message and continue; fragile ones stop.
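The "skip and continue" behavior is easy to get in a script by isolating each message in its own try block. A sketch, assuming you supply your own per-message function; extract_resilient and handle_message are hypothetical names.

```python
import logging
import mailbox

def extract_resilient(mbox_path: str, handle_message) -> tuple:
    """Run handle_message on every message; log failures instead of stopping."""
    ok, failed = 0, 0
    box = mailbox.mbox(mbox_path)
    try:
        for key in box.keys():
            try:
                handle_message(box[key])
                ok += 1
            except Exception as exc:  # one bad message must not end the run
                logging.warning("Skipping message %s: %s", key, exc)
                failed += 1
    finally:
        box.close()
    return ok, failed
```

The returned counts give you a quick check on how much of the archive was skipped.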
Password-protected attachments
Extraction produces the encrypted file as-is. You still need the password to open the PDF or ZIP. Tools don't crack the encryption.
Verification
After extraction, verify a few things:
- File counts look reasonable (spot-check a few emails' worth against the original)
- Files open correctly (opening 3 random files confirms the decoding worked)
- Filenames preserved (no base64 gibberish in names)
- No zero-byte files (means decoding failed on those)
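Two of these checks are easy to automate. The sketch below counts the extracted files and flags zero-byte files and names that still contain an encoded-word marker; verify_extraction is a hypothetical helper, and opening files to confirm they render remains a manual step.

```python
from pathlib import Path

def verify_extraction(root: Path) -> dict:
    """Summarize an extraction folder: total files and two kinds of suspects."""
    files = [p for p in root.rglob("*") if p.is_file()]
    return {
        "total_files": len(files),
        # Zero bytes usually means the payload failed to decode
        "zero_byte": [p for p in files if p.stat().st_size == 0],
        # "=?...?=" in a name means the filename was never decoded
        "undecoded_names": [p for p in files if "=?" in p.name and "?=" in p.name],
    }
```

An empty zero_byte list and an empty undecoded_names list are what you want to see.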
Frequently asked questions
How are attachments stored inside an MBOX file?
Inline within each email using MIME base64 encoding. The MBOX contains the full as-transmitted message, so every attachment is there — just encoded as text.
Can I extract without converting emails to PDF?
Yes. MBOX to PDF's extraction is independent of PDF output. You can also use munpack, Python's mailbox module, or Thunderbird with ImportExportTools NG.
How does MBOX to PDF name extracted attachments?
Original filenames are preserved. Per-email subfolders prevent collisions when multiple emails use the same name.
Why are my extracted attachments duplicated?
The same file appears across a thread: the original plus any replies or forwards that carry it again. Each copy is stored separately in the MBOX. Deduplicate with shasum or a GUI tool like dupeGuru.
Can I extract from a Gmail Takeout MBOX?
Yes — Gmail Takeout is standard MBOX format.
Are inline images extracted too?
They're technically attachments, so yes by default. Skip them if you only want traditional file attachments.