
Extract Attachments from MBOX Files on Mac (2026)

Published April 22, 2026 · Updated April 22, 2026

Why extract attachments separately?

Every attachment that came with every email in an MBOX file is still there — encoded inside the archive. For an active email workflow, that's fine. For archival, legal review, or migration, you often want the attachments as standalone files in organized folders:

Legal review: Reviewers need to open attachments individually, annotate them, and produce them separately from the email PDFs.
Migration: Moving to a document management system that indexes attachments needs them as files, not embedded in email bodies.
Archival sanity: Ten years from now, you want the photos your dad sent as photos in a folder — not encoded in a .mbox file.
File size control: PDFs with embedded attachments balloon in size. Extracting them keeps the PDF archive lean.

How MBOX actually stores attachments

MBOX stores the full RFC 5322-formatted email, including the body. For attachments, the email uses MIME multipart encoding — the body is broken into parts (plain text, HTML, each attachment), each with its own content-type and content-transfer-encoding headers. Binary attachments are base64-encoded to survive as plain text.

Here's what a piece of an MBOX looks like when an email has an attachment:

Content-Type: multipart/mixed; boundary="--boundary-abc123"

----boundary-abc123
Content-Type: text/plain; charset="UTF-8"

Hi Bob, here's the quarterly report.

----boundary-abc123
Content-Type: application/pdf; name="Q4-2025-Report.pdf"
Content-Disposition: attachment; filename="Q4-2025-Report.pdf"
Content-Transfer-Encoding: base64

JVBERi0xLjQKJeLjz9MKMyAwIG9iago8PC9DcmVhdG9yIChBZG9iZSBBY3JvYmF0...
(hundreds of lines of base64)
----boundary-abc123--

Extraction means parsing the MIME structure, decoding each attachment part according to its Content-Transfer-Encoding (base64 in the example above), and writing the result to a file under its original filename. Tools that do this correctly reproduce each attachment byte for byte as it was sent.
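To see those three steps on the sample above, here is a minimal sketch using Python's standard email package. The base64 body is cut down to its first line (JVBERi0xLjQK, which decodes to the %PDF-1.4 header), so the "attachment" it yields is a stand-in, not a real PDF. The helper name attachment_parts is illustrative.

```python
from email import message_from_string

# The sample MIME structure from above, with the base64 body shortened to one line.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="--boundary-abc123"

----boundary-abc123
Content-Type: text/plain; charset="UTF-8"

Hi Bob, here's the quarterly report.

----boundary-abc123
Content-Type: application/pdf; name="Q4-2025-Report.pdf"
Content-Disposition: attachment; filename="Q4-2025-Report.pdf"
Content-Transfer-Encoding: base64

JVBERi0xLjQK
----boundary-abc123--
"""


def attachment_parts(raw):
    """Return (filename, decoded bytes) for each part marked as an attachment."""
    msg = message_from_string(raw)  # step 1: parse the MIME tree
    found = []
    for part in msg.walk():  # visits the container and every sub-part
        if part.get_content_disposition() == "attachment":
            data = part.get_payload(decode=True)  # step 2: undo the base64
            found.append((part.get_filename(), data))
    return found


for name, data in attachment_parts(RAW):
    print(name, data)  # step 3 would write data to a file named name
```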

The automated way — MBOX to PDF

The simplest path on Mac:

Install MBOX to PDF from the Mac App Store.
Drag your .mbox file into the app.
In the export settings, toggle Extract attachments to folder.
Pick an output folder for the PDFs and let the attachments save to the sibling folder.
Run the conversion.

The app decodes every MIME part, preserves original filenames, and organizes attachments into per-email subfolders by default. If you're only interested in attachments (not PDFs), you can still run the export with a minimal PDF configuration — the attachment extraction happens regardless.

Organization strategies

Per-email subfolders (default)

Each email gets its own folder named with the date and subject. Attachments from that email go inside. Prevents filename collisions when multiple emails have attachments with the same name.

Attachments/
├── 2026-04-15 Q4 Report/
│   └── Q4-2025-Report.pdf
├── 2026-04-15 Q4 Report (reply)/
│   └── Q4-2025-Report.pdf
└── 2026-04-16 Photos from trip/
    ├── IMG_0001.jpg
    ├── IMG_0002.jpg
    └── IMG_0003.jpg
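The app's exact naming rules aren't spelled out here, so the following is only a hypothetical scheme that produces names in the same "date + subject" style as the tree above, for anyone scripting their own extraction. It is a Python sketch; subfolder_name is an illustrative helper, and it does not resolve two emails that share a date and subject (you would still need a suffix for those).

```python
import re
from email.header import decode_header, make_header
from email.utils import parsedate_to_datetime


def subfolder_name(msg, max_subject=80):
    """Hypothetical 'YYYY-MM-DD Subject' folder name for one message."""
    try:
        day = parsedate_to_datetime(msg["Date"]).strftime("%Y-%m-%d")
    except (TypeError, ValueError):
        day = "undated"  # missing or unparseable Date header
    # Decode RFC 2047 encoded-words (e.g. =?UTF-8?B?...?=) in the subject
    subject = str(make_header(decode_header(msg.get("Subject", "")))).strip()
    # Replace path separators and control characters that are unsafe in folder names
    subject = re.sub(r"[/:\\\x00-\x1f]", "_", subject) or "no subject"
    return f"{day} {subject[:max_subject]}".rstrip()
```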

Flat folder with renamed files

If you want all attachments in one folder, rename each one with its email context — e.g. 2026-04-15_Q4-Report_Q4-2025-Report.pdf. This works for smaller archives but gets unwieldy with thousands of attachments.

By attachment type

After extraction, you can reorganize by file extension — all PDFs together, all images together. Useful for photo archives but loses the email context.

The deduplication problem

Attachments often appear multiple times across a thread. A PDF sent by Alice, then forwarded by Bob, then re-attached in Alice's follow-up, shows up three times in the MBOX, once per email that carries it. Naive extraction gives you three identical files.

To deduplicate after extraction:

Using shasum in Terminal

cd Attachments/
find . -type f -exec shasum -a 256 {} \; | sort

This prints every file with its SHA-256 hash, sorted by hash so identical files land on adjacent lines. Files with the same hash are identical; keep one copy and delete or link the rest.
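If you'd rather get the duplicates grouped than scan a sorted list by eye, here is a minimal Python sketch of the same SHA-256 comparison. It only reports groups; deciding which copy to keep, and whether to delete or link the rest, is left to you. duplicate_groups is a hypothetical helper name.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large attachments don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_groups(root):
    """Map each SHA-256 digest that occurs more than once to its file paths."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return {digest: paths for digest, paths in by_hash.items() if len(paths) > 1}


root = Path("Attachments")
if root.is_dir():
    for digest, paths in duplicate_groups(root).items():
        print(digest[:12], *paths, sep="\n  ")
```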

Using a GUI deduplicator

Tools like dupeGuru (free) or Gemini 2 (paid) scan a folder for duplicates by content and offer to remove redundant copies. Works well after bulk extraction.

Keeping the mapping

For legal or forensic work, you typically don't deduplicate — you need to know every email that carried every attachment. Keep the duplicates and log the relationships.
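Logging the relationships can be as simple as one CSV row per attachment occurrence. A minimal Python sketch, assuming Message-ID, Date, and Subject are enough to identify each email for your purposes (write_manifest and the column names are hypothetical). Identical attachments share a sha256 value, so duplicates stay visible without being removed.

```python
import csv
import hashlib
import mailbox


def write_manifest(mbox_path, csv_path):
    """Write one CSV row per attachment occurrence; return the number of rows."""
    box = mailbox.mbox(mbox_path)
    rows = 0
    try:
        with open(csv_path, "w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out)
            writer.writerow(["mbox_key", "message_id", "date", "subject", "filename", "sha256"])
            for key, msg in box.iteritems():
                for part in msg.walk():
                    if part.get_content_disposition() != "attachment":
                        continue
                    payload = part.get_payload(decode=True) or b""
                    writer.writerow([
                        key,                          # position of the message in the MBOX
                        msg.get("Message-ID", ""),
                        msg.get("Date", ""),
                        msg.get("Subject", ""),
                        part.get_filename() or "",
                        hashlib.sha256(payload).hexdigest(),  # same value = same file
                    ])
                    rows += 1
    finally:
        box.close()
    return rows
```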

Alternative tools

Command line: munpack

Part of the mpack package (install via Homebrew: brew install mpack). munpack reads a MIME-encoded message and writes its attachments to the current directory (text parts without a filename are skipped unless you pass -t). It works on individual emails, not MBOX files directly, but combined with a splitter it can process an archive.
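For the splitter step, Python's mailbox module can write each message of an MBOX to its own file. A minimal sketch, assuming one numbered .eml file per message is acceptable (split_mbox and the numbering scheme are arbitrary choices):

```python
import mailbox
from pathlib import Path


def split_mbox(mbox_path, out_dir):
    """Write each message in an MBOX to its own numbered .eml file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    box = mailbox.mbox(mbox_path)
    written = []
    try:
        for i, msg in enumerate(box, start=1):
            path = out / f"{i:06d}.eml"
            # as_bytes() serialises the message without the mbox "From " separator line
            path.write_bytes(msg.as_bytes())
            written.append(path)
    finally:
        box.close()
    return written
```

Afterwards, run munpack on each .eml file from inside the folder where you want the attachments written.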

Python mailbox module

The standard library has everything you need. A short script to extract all attachments from an MBOX:

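import mailbox
import os

mbox = mailbox.mbox('archive.mbox')
out_dir = 'Attachments'
os.makedirs(out_dir, exist_ok=True)

for msg in mbox:
    # walk() visits every MIME part, including parts nested inside multiparts
    for part in msg.walk():
        if part.get_content_disposition() != 'attachment':
            continue
        # basename() drops any directory components a sender put in the filename
        filename = os.path.basename(part.get_filename() or '')
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if not filename or payload is None:
            continue
        with open(os.path.join(out_dir, filename), 'wb') as f:
            f.write(payload)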

Simple and effective for scripted workflows. It doesn't handle filename collisions, though: an attachment with the same name as an earlier one silently overwrites it, so you'd need to add per-email subfolders or numbered suffixes.
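One way to add numbered suffixes is to check for an existing file before writing. The sketch below keeps the script's logic but never overwrites: a second report.pdf becomes report-1.pdf. The suffix style and the unique_path / extract_all names are illustrative choices, not a standard.

```python
import mailbox
import os


def unique_path(directory, filename):
    """Return directory/filename, or name-1.ext, name-2.ext, ... if already taken."""
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    n = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{stem}-{n}{ext}")
        n += 1
    return candidate


def extract_all(mbox_path, out_dir):
    """Extract every attachment into out_dir without overwriting earlier files."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            for part in msg.walk():
                if part.get_content_disposition() != "attachment":
                    continue
                name = os.path.basename(part.get_filename() or "")
                payload = part.get_payload(decode=True)
                if not name or payload is None:
                    continue
                path = unique_path(out_dir, name)
                with open(path, "wb") as f:
                    f.write(payload)
                saved.append(path)
    finally:
        box.close()
    return saved
```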

Thunderbird + ImportExportTools NG

Import the MBOX, right-click a folder, use the add-on's attachment export dialog. Works but slower than dedicated converters for large archives.

Edge cases

Inline images

Images embedded in HTML email bodies (referenced through cid: URLs that point at a part's Content-ID) are separate MIME parts, typically marked Content-Disposition: inline (or carrying no disposition at all) rather than attachment. Depending on the tool, they may or may not be included in extraction. MBOX to PDF includes them by default; they show up as small image files in the attachment folders.
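If you want to handle inline images separately in a script, one approach is to look for image parts that carry a Content-ID header but are not marked as attachments. The Python sketch below (inline_images is a hypothetical helper name) yields each such image's Content-ID, type, and bytes, which you could save to a separate folder or skip.

```python
from email.message import Message


def inline_images(msg: Message):
    """Yield (content_id, content_type, bytes) for images embedded via cid: references."""
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        if part.get_content_disposition() == "attachment":
            continue  # a regular attachment, not an embedded image
        cid = part.get("Content-ID")
        payload = part.get_payload(decode=True)
        if cid and payload is not None:
            # HTML uses src="cid:logo@x" while the header reads <logo@x>
            yield str(cid).strip().strip("<>"), part.get_content_type(), payload
```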

Encoded filenames

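Non-ASCII filenames (Chinese, emoji, etc.) are encoded in one of two ways: as RFC 2231 parameters (filename*=UTF-8''...) or as RFC 2047 encoded-words (=?UTF-8?B?...?=), which many mail clients put in the filename even though RFC 2047 doesn't permit encoded-words inside parameter values. Good extractors decode both. If you see filenames like =?UTF-8?B?...?= after extraction, the tool isn't decoding RFC 2047 encoded-words.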
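Non-ASCII filenames can arrive either as RFC 2231 parameters (filename*=UTF-8''...) or as RFC 2047 encoded-words (=?UTF-8?B?...?=), and Python's standard library decodes them in different places, so a small helper can cover both. A sketch, assuming the default (compat32) message objects that mailbox.mbox returns; decoded_filename is a hypothetical name.

```python
from email.header import decode_header, make_header


def decoded_filename(part):
    """Return a readable filename for a MIME part, or None if it has none.

    get_filename() already unwraps RFC 2231 parameters (filename*=...).
    decode_header()/make_header() then turn any RFC 2047 encoded-words
    that a client put in the filename into plain text.
    """
    raw = part.get_filename()
    if raw is None:
        return None
    return str(make_header(decode_header(raw)))
```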

Corrupted MIME

If an email was manipulated after the fact or the MBOX has transport errors, MIME parsing can fail. Robust converters skip the problematic message and continue; fragile ones stop.
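Python's email parser is on the lenient side: rather than stopping on malformed MIME, it records what it found in each part's defects list and keeps going. A minimal sketch (find_defective is a hypothetical helper) that lists the messages the parser flagged, so you can check their extracted attachments by hand:

```python
import mailbox


def find_defective(mbox_path):
    """List (key, subject, defect names) for messages the parser had to work around."""
    box = mailbox.mbox(mbox_path)
    problems = []
    try:
        for key, msg in box.iteritems():
            # Every part keeps its own list of parsing problems in .defects
            names = [type(d).__name__ for part in msg.walk() for d in part.defects]
            if names:
                problems.append((key, msg.get("Subject"), names))
    finally:
        box.close()
    return problems
```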

Password-protected attachments

Extraction produces the encrypted file as-is. You still need the password to open the PDF or ZIP. Tools don't crack the encryption.

Verification

After extraction, verify a few things:

File counts look reasonable (spot-check a few emails' worth against the original)
Files open correctly (opening 3 random files is a quick check that decoding worked)
Filenames preserved (no base64 gibberish in names)
No zero-byte files (these usually mean decoding failed)
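Some of these checks are easy to automate. The Python sketch below (verification_report is a hypothetical name) counts files and lists zero-byte files and names that still contain RFC 2047 encoded-word markers; opening files and comparing counts against the original still need a human.

```python
from pathlib import Path


def verification_report(root):
    """Count extracted files and flag zero-byte files and still-encoded names."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return {
        "total_files": len(files),
        # Zero bytes usually means decoding failed (or the sender attached an empty file)
        "zero_byte": sorted(str(p) for p in files if p.stat().st_size == 0),
        # Undecoded RFC 2047 names keep the =?charset?...?= wrapper
        "encoded_names": sorted(str(p) for p in files if "=?" in p.name and "?=" in p.name),
    }
```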

Frequently asked questions

How are attachments stored inside an MBOX file?

Inline within each email using MIME base64 encoding. The MBOX contains the full as-transmitted message, so every attachment is there — just encoded as text.

Can I extract without converting emails to PDF?

Yes. MBOX to PDF's extraction is independent of PDF output. You can also use munpack, Python's mailbox module, or Thunderbird with ImportExportTools NG.

How does MBOX to PDF name extracted attachments?

Original filenames are preserved. Per-email subfolders prevent collisions when multiple emails use the same name.

Why are my extracted attachments duplicated?

The same file travels with several messages in a thread (the original plus forwards or replies that re-attach it), and each copy is stored separately in the MBOX. Deduplicate with shasum or a GUI tool like dupeGuru.

Can I extract from a Gmail Takeout MBOX?

Yes — Gmail Takeout is standard MBOX format.

Are inline images extracted too?

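In MBOX to PDF, yes by default: images embedded in HTML bodies (referenced by Content-ID or Content-Location) are separate MIME parts, so the app saves them alongside attachments. Keep them for legal production; skip them if you only want traditional file attachments.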
