
Extract attachments from MBOX files.

How MBOX stores attachments, how to pull every one of them out on Mac, and how to organize the result.

Published April 22, 2026 · Updated April 22, 2026

Why extract attachments separately?

Every attachment that came with every email in an MBOX file is still there, encoded inside the archive. For an active email workflow, that's fine. For archival, legal review, or migration, you often want the attachments as standalone files in organized folders.

How MBOX actually stores attachments

MBOX stores the full RFC 5322-formatted email, including the body. For attachments, the email uses MIME multipart encoding — the body is broken into parts (plain text, HTML, each attachment), each with its own content-type and content-transfer-encoding headers. Binary attachments are base64-encoded to survive as plain text.

Here's what a piece of an MBOX looks like when an email has an attachment:

Content-Type: multipart/mixed; boundary="--boundary-abc123"

----boundary-abc123
Content-Type: text/plain; charset="UTF-8"

Hi Bob, here's the quarterly report.

----boundary-abc123
Content-Type: application/pdf; name="Q4-2025-Report.pdf"
Content-Disposition: attachment; filename="Q4-2025-Report.pdf"
Content-Transfer-Encoding: base64

JVBERi0xLjQKJeLjz9MKMyAwIG9iago8PC9DcmVhdG9yIChBZG9iZSBBY3JvYmF0...
(hundreds of lines of base64)
----boundary-abc123--

Extraction means parsing the MIME structure, decoding each attachment part (base64 in the example above), and writing it to a file under its original filename. Tools that do this correctly preserve each attachment byte-for-byte as it was sent.
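As a minimal sketch of those three steps, Python's standard email package can parse a trimmed copy of the sample above and return each attachment's filename and decoded bytes. The base64 payload is shortened to one line that decodes to a PDF header, and `extract_attachments` is a hypothetical helper name.

```python
import email

# A trimmed copy of the sample message above. The single base64 line
# decodes to b"%PDF-1.4\n", the first line of a real PDF file.
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="--boundary-abc123"\n'
    "\n"
    "----boundary-abc123\n"
    'Content-Type: text/plain; charset="UTF-8"\n'
    "\n"
    "Hi Bob, here's the quarterly report.\n"
    "\n"
    "----boundary-abc123\n"
    'Content-Type: application/pdf; name="Q4-2025-Report.pdf"\n'
    'Content-Disposition: attachment; filename="Q4-2025-Report.pdf"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "JVBERi0xLjQK\n"
    "----boundary-abc123--\n"
)


def extract_attachments(raw: str) -> list[tuple[str, bytes]]:
    """Parse a MIME message and return (filename, decoded bytes) per attachment."""
    msg = email.message_from_string(raw)
    found = []
    for part in msg.walk():  # visits the multipart container and every part
        if part.get_content_disposition() != "attachment":
            continue
        # decode=True undoes the Content-Transfer-Encoding (base64 here)
        found.append((part.get_filename(), part.get_payload(decode=True)))
    return found


print(extract_attachments(RAW))  # [('Q4-2025-Report.pdf', b'%PDF-1.4\n')]
```

Writing the result to disk is then just opening each filename in binary mode and writing the bytes.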

The automated way — MBOX to PDF

The simplest path on Mac:

  1. Install MBOX to PDF from the Mac App Store.
  2. Drag your .mbox file into the app.
  3. In the export settings, toggle Extract attachments to folder.
  4. Pick an output folder for the PDFs. The attachments are saved to a sibling folder next to it.
  5. Run the conversion.

The app decodes every MIME part, preserves original filenames, and organizes attachments into per-email subfolders by default. If you're only interested in attachments (not PDFs), you can still run the export with a minimal PDF configuration — the attachment extraction happens regardless.

Organization strategies

Per-email subfolders (default)

Each email gets its own folder named with the date and subject, and that email's attachments go inside. This prevents filename collisions when multiple emails have attachments with the same name.

Attachments/
├── 2026-04-15 Q4 Report/
│   └── Q4-2025-Report.pdf
├── 2026-04-15 Q4 Report (reply)/
│   └── Q4-2025-Report.pdf
└── 2026-04-16 Photos from trip/
    ├── IMG_0001.jpg
    ├── IMG_0002.jpg
    └── IMG_0003.jpg
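If you're scripting your own extraction, a folder name like the ones above can be built from the Date and Subject headers. This is a sketch of one possible naming scheme, not necessarily the one MBOX to PDF uses; `folder_name` is a hypothetical helper, and the fallbacks `undated` and `(no subject)` are assumptions.

```python
from email.utils import parsedate_to_datetime


def folder_name(date_header, subject):
    """Build a '<YYYY-MM-DD> <subject>' folder name (hypothetical scheme)."""
    try:
        # The datetime keeps the header's own UTC offset, so the day shown
        # is the sender's calendar day.
        day = parsedate_to_datetime(date_header).strftime("%Y-%m-%d")
    except (TypeError, ValueError):  # missing or unparseable Date header
        day = "undated"
    subject = str(subject or "").strip() or "(no subject)"
    # "/" would create a nested path and Finder does not allow ":" in
    # file names, so swap both for "-".
    for ch in "/:":
        subject = subject.replace(ch, "-")
    return f"{day} {subject}"


print(folder_name("Wed, 15 Apr 2026 09:30:00 +0000", "Q4 Report"))
# 2026-04-15 Q4 Report
```

Two emails with the same date and subject would still map to the same folder, so a real tool also needs a tiebreaker such as the "(reply)" suffix in the tree above.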

Flat folder with renamed files

If you want all attachments in one folder, rename each one with its email context — e.g. 2026-04-15_Q4-Report_Q4-2025-Report.pdf. This works for smaller archives but gets unwieldy with thousands of attachments.

By attachment type

After extraction, you can reorganize by file extension — all PDFs together, all images together. Useful for photo archives but loses the email context.
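A sketch of that reorganization with pathlib, assuming the per-email layout above as input. The uppercase extension folders and the `NO_EXTENSION` bucket are arbitrary choices, and because files from different emails can share a name, it adds a numbered suffix instead of overwriting.

```python
import shutil
from pathlib import Path


def sort_by_extension(src: Path, dest: Path) -> None:
    """Move every file under src into dest/<EXT>/ (e.g. PDF/, JPG/)."""
    for path in list(src.rglob("*")):  # snapshot before moving anything
        if not path.is_file():
            continue
        ext = path.suffix.lstrip(".").upper() or "NO_EXTENSION"
        target_dir = dest / ext
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / path.name
        n = 1
        while target.exists():  # same name coming from different emails
            target = target_dir / f"{path.stem} ({n}){path.suffix}"
            n += 1
        shutil.move(str(path), str(target))
```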

The deduplication problem

Attachments often appear multiple times across a thread. A PDF sent by Alice, then included again in Bob's reply and once more in Alice's reply to Bob, shows up three times in the MBOX, once per email that carries it. Naive extraction gives you three identical files.

To deduplicate after extraction:

Using shasum in Terminal

cd Attachments/
find . -type f -exec shasum -a 256 {} \; | sort

This lists every file with its SHA-256 hash. Files with the same hash are identical; keep one copy and delete or link the rest.
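The same check in Python groups files by SHA-256 and keeps only groups with more than one member, so you see the duplicates directly instead of scanning a sorted list. `find_duplicates` is a hypothetical helper; it only reports, leaving the keep-or-delete decision to you.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root by SHA-256; return only groups with 2+ files."""
    by_hash = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        h = hashlib.sha256()
        with path.open("rb") as f:
            # Hash in 1 MiB chunks so large attachments don't fill memory
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        by_hash[h.hexdigest()].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Calling `find_duplicates(Path("Attachments"))` returns lists such as `[first_copy, second_copy, ...]`; keeping the first path of each group and reviewing the rest mirrors the manual process above.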

Using a GUI deduplicator

Tools like dupeGuru (free) or Gemini 2 (paid) scan a folder for duplicates by content and offer to remove redundant copies. Works well after bulk extraction.

Keeping the mapping

For legal or forensic work, you typically don't deduplicate — you need to know every email that carried every attachment. Keep the duplicates and log the relationships.
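One way to log those relationships is a CSV manifest with one row per attachment occurrence, duplicates included. The hash column lets you see which rows are the same file without discarding any of them. `write_manifest` and its column set are assumptions; adjust them to your review protocol.

```python
import csv
import hashlib
import mailbox


def write_manifest(mbox_path, csv_path):
    """Write one CSV row per attachment occurrence, duplicates included."""
    box = mailbox.mbox(mbox_path)
    try:
        with open(csv_path, "w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out)
            writer.writerow(["message_index", "message_id", "date", "subject",
                             "filename", "sha256"])
            for i, msg in enumerate(box):
                for part in msg.walk():
                    if part.get_content_disposition() != "attachment":
                        continue
                    data = part.get_payload(decode=True) or b""
                    writer.writerow([i, msg.get("Message-ID", ""),
                                     msg.get("Date", ""), msg.get("Subject", ""),
                                     part.get_filename() or "",
                                     hashlib.sha256(data).hexdigest()])
    finally:
        box.close()
```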

Alternative tools

Command line: munpack

Part of the mpack package (install via Homebrew: brew install mpack). Takes a MIME-encoded file and extracts all parts to the current directory. Works on individual emails, not MBOX files directly, but combined with a splitter it can process an archive.
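The splitter can be a few lines of Python's mailbox module. This sketch writes each message to its own numbered .eml file; `split_mbox` and the six-digit naming are hypothetical choices.

```python
import mailbox
from pathlib import Path


def split_mbox(mbox_path, out_dir):
    """Write each message in an MBOX to its own numbered .eml file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    box = mailbox.mbox(mbox_path)
    count = 0
    try:
        for count, msg in enumerate(box, start=1):
            # as_bytes() omits the mbox "From " separator line,
            # leaving a standalone RFC 5322 message.
            (out / f"{count:06d}.eml").write_bytes(msg.as_bytes())
    finally:
        box.close()
    return count
```

You can then run munpack on each .eml file; since it extracts into the current directory, run it from the folder where you want the attachments to land.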

Python mailbox module

The standard library has everything you need. A short script to extract all attachments from an MBOX:

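import mailbox, os

mbox = mailbox.mbox('archive.mbox')
out_dir = 'Attachments'
os.makedirs(out_dir, exist_ok=True)

for msg in mbox:
    for part in msg.walk():
        if part.get_content_disposition() != 'attachment':
            continue
        # basename() drops any directory components in a sender-supplied name
        filename = os.path.basename(part.get_filename() or '')
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if not filename or payload is None:
            continue  # skip unnamed parts and attached emails (no byte payload)
        path = os.path.join(out_dir, filename)
        with open(path, 'wb') as f:
            f.write(payload)

mbox.close()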

Simple and effective for scripted workflows. Doesn't handle filename collisions — you'd need to add per-email subfolders or numbered suffixes.
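Numbered suffixes take only a few lines. This hypothetical `unique_path` helper could stand in for the `os.path.join` call that builds `path` in the script above; the " (1)", " (2)" style is an arbitrary choice.

```python
import os


def unique_path(out_dir, filename):
    """Return a path in out_dir that doesn't exist yet, adding ' (1)', ' (2)', ..."""
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(out_dir, filename)
    n = 1
    while os.path.exists(candidate):
        candidate = os.path.join(out_dir, f"{stem} ({n}){ext}")
        n += 1
    return candidate
```

Three attachments all named `Q4-2025-Report.pdf` would be saved as `Q4-2025-Report.pdf`, `Q4-2025-Report (1).pdf`, and `Q4-2025-Report (2).pdf`, as long as each file is written before the next name is requested.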

Thunderbird + ImportExportTools NG

Import the MBOX, right-click a folder, use the add-on's attachment export dialog. Works but slower than dedicated converters for large archives.

Edge cases

Inline images

Images embedded in HTML email bodies are separate MIME parts that the HTML references through cid: URLs matching each part's Content-ID header, so they are attachments in everything but presentation. Depending on the tool, they may or may not be included in extraction. MBOX to PDF includes them by default; they show up as small image files in the attachment folders.
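If you're scripting extraction and want to tell the two kinds apart, a simple heuristic is to treat an image part as inline when it isn't marked as an attachment and either carries a Content-ID or is marked inline. `is_inline_image` is a hypothetical helper and the rule is an assumption; some mail clients label parts inconsistently.

```python
def is_inline_image(part):
    """Heuristic: an image part meant to be shown in the HTML body."""
    if part.get_content_maintype() != "image":
        return False
    disposition = part.get_content_disposition()  # None, "inline", or "attachment"
    if disposition == "attachment":
        return False
    return part.get("Content-ID") is not None or disposition == "inline"
```

Note that the mailbox script above only keeps parts whose disposition is `attachment`, so it already skips images explicitly marked `inline`.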

Encoded filenames

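Some emails encode non-ASCII filenames (Chinese, emoji, etc.), either with RFC 2231 parameters such as filename*=UTF-8''... or with RFC 2047 encoded-words such as =?UTF-8?B?...?=. Strictly, RFC 2047 forbids encoded-words inside MIME parameters, but many mail clients use them anyway, so good extractors decode both. If you see filenames like =?UTF-8?B?...?= after extraction, the tool isn't decoding RFC 2047 encoded-words.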
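A sketch of decoding both forms with Python's standard email package. With the compat32 parsing that the mailbox module uses, `Message.get_filename()` already decodes RFC 2231 parameters but returns encoded-words verbatim, so `decode_header` and `make_header` handle the second step. `attachment_filename` is a hypothetical helper name.

```python
import email
from email.header import decode_header, make_header


def attachment_filename(part):
    """Return a human-readable filename for a MIME part, or None."""
    raw = part.get_filename()  # RFC 2231 (filename*=...) is decoded here
    if raw is None:
        return None
    # Decode any RFC 2047 encoded-words (=?charset?B/Q?...?=) left in the name;
    # plain names pass through unchanged.
    return str(make_header(decode_header(raw)))


part = email.message_from_string(
    'Content-Type: application/pdf\n'
    'Content-Disposition: attachment; filename="=?UTF-8?B?Y2Fmw6kucGRm?="\n'
    '\n'
    'x\n'
)
print(attachment_filename(part))  # café.pdf
```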

Corrupted MIME

If an email was manipulated after the fact or the MBOX has transport errors, MIME parsing can fail. Robust converters skip the problematic message and continue; fragile ones stop.
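If you're scripting, Python's email parser leans toward the robust side: instead of raising on malformed structure, it usually records problems in a message's `defects` list. This sketch reports top-level defects and skips any message whose processing still fails, assuming a per-message `handle` callback of your own (hypothetical), such as the body of the attachment loop above.

```python
def process_archive(messages, handle):
    """Call handle(msg) for each message; log and skip failures instead of stopping."""
    done, skipped = 0, []
    for index, msg in enumerate(messages):
        if msg.defects:  # parser hit problems but still built a message
            names = ", ".join(type(d).__name__ for d in msg.defects)
            print(f"message {index}: MIME defects: {names}")
        try:
            handle(msg)
            done += 1
        except Exception as exc:  # keep going; remember which message failed
            skipped.append((index, repr(exc)))
    return done, skipped
```

Passing `mailbox.mbox("archive.mbox")` as `messages` works because the mailbox iterates over parsed messages; the returned `skipped` list tells you which ones to inspect by hand.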

Password-protected attachments

Extraction produces the encrypted file as-is. You still need the password to open the PDF or ZIP. Tools don't crack the encryption.

Verification

After extraction, verify the output before relying on it.
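One check that's easy to script is comparing the number of attachment parts in the MBOX with the number of files on disk. This sketch assumes extraction kept every occurrence (no deduplication) and counts parts the same way the Python script above selects them (Content-Disposition: attachment); both helper names are hypothetical.

```python
import mailbox
from pathlib import Path


def count_attachment_parts(mbox_path):
    """Count MIME parts marked Content-Disposition: attachment."""
    box = mailbox.mbox(mbox_path)
    try:
        return sum(1 for msg in box for part in msg.walk()
                   if part.get_content_disposition() == "attachment")
    finally:
        box.close()


def count_files(folder):
    """Count regular files anywhere under folder."""
    return sum(1 for p in Path(folder).rglob("*") if p.is_file())
```

If `count_attachment_parts("archive.mbox")` is larger than `count_files("Attachments")`, likely causes are overwritten name collisions, skipped messages, or parts without a filename.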

Frequently asked questions

How are attachments stored inside an MBOX file?

Inline within each email using MIME base64 encoding. The MBOX contains the full as-transmitted message, so every attachment is there — just encoded as text.

Can I extract without converting emails to PDF?

Yes. MBOX to PDF's extraction is independent of PDF output. You can also use munpack, Python's mailbox module, or Thunderbird with ImportExportTools NG.

How does MBOX to PDF name extracted attachments?

Original filenames are preserved. Per-email subfolders prevent collisions when multiple emails use the same name.

Why are my extracted attachments duplicated?

The same file appears across a thread — original + quoted replies. Each is separately stored in the MBOX. Deduplicate with shasum or a GUI tool like dupeGuru.

Can I extract from a Gmail Takeout MBOX?

Yes — Gmail Takeout is standard MBOX format.

Are inline images extracted too?

They're technically attachments, so yes by default. Skip them if you only want traditional file attachments.


Attachments included

Extract everything.

PDFs + attachments in one pass. $14.99 one-time. 100% offline.
