📰 Archive Your Pocket Saved Pages as .webarchive Files

Pocket is shutting down.

“After careful consideration, we’ve made the difficult decision to phase out Pocket — our read-it-later and content discovery app. This includes the Pocket Web, Android, iOS, and macOS apps, as well as the Pocket browser extensions.”

getpocket.com/farewell

If you’ve been relying on Pocket to store articles, research, or reading material for later, now’s the time to preserve your collection.

Read on to see how to store your saved pages locally for archival.


Step 0: Export Your Data Now

Fortunately, Pocket now offers an easy export option:

👉 Export your data as a CSV

The CSV file contains two key columns:

  • Title

  • URL

This makes it straightforward to work with in scripts.
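
For reference, here is a hypothetical snippet of what the export might look like (the titles and URLs are made up, and your actual file may include extra columns such as tags or timestamps):

Title,URL
Some Interesting Article,https://example.com/interesting-article
Another Saved Page,https://example.com/another-page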


How to Store the Pages Long-Term

Websites have a nasty habit of changing or disappearing from the net.

We’ll convert your list of saved URLs into .webarchive files — self-contained files that store full HTML content, images, and styles for offline use.

Many thanks to chrisbgp’s gist for the inspiration.


Step 1: Install the Tools

Install Webarchiver:

brew install webarchiver
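
To confirm the tool is on your PATH before processing hundreds of URLs, a quick smoke test on a throwaway page works (example.com is just a stand-in here):

command -v webarchiver
webarchiver -url "https://example.com" -output test.webarchive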

Step 2: Run the Script

Place your exported CSV in your working directory, for example Pocket-Export-May2025.csv, then run:

tail -n +2 Pocket-Export-May2025.csv | cut -d',' -f2 | while read -r LINE; do
  webarchiver -url "$LINE" -output "$(md5 -qs "$LINE").webarchive"
done

What this does:

  • tail -n +2 skips the header row.

  • cut -d',' -f2 extracts the URL column.

  • md5 -qs "$LINE" generates a unique filename based on the URL.

  • webarchiver fetches and saves the page as a .webarchive file in the current working directory.

The result: a local archive of your Pocket reading list.
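
Fetching a few hundred pages takes a while, and individual downloads can fail. Here's a sketch of a re-runnable variant of the same loop that strips stray carriage returns from the export and skips URLs that already have an archive on disk:

tail -n +2 Pocket-Export-May2025.csv | cut -d',' -f2 | tr -d '\r' | while read -r LINE; do
  OUT="$(md5 -qs "$LINE").webarchive"
  # Skip pages already archived by a previous run
  [ -f "$OUT" ] && continue
  webarchiver -url "$LINE" -output "$OUT"
done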

Tips

  • You can open .webarchive files in Safari and keep them for long-term archival.

  • Consider moving your archives to cloud storage or a backup volume.

  • Fair warning: webarchives can get large. My collection of 350 pages ended up at roughly 2 GB on disk.

  • Webarchive files are indexed by Spotlight and therefore searchable, so I don't worry about the file names.

  • If you do want human-readable file names, you could adjust the script to something like this:

tail -n +2 Pocket-Export-May2025.csv | while IFS=',' read -r TITLE URL _; do
  # "_" absorbs any extra columns; note this naive split still
  # misbehaves on titles that themselves contain commas
  # Strip Windows-style carriage returns the export may contain
  TITLE=$(echo "$TITLE" | tr -d '\r')
  URL=$(echo "$URL" | tr -d '\r')

  # Replace whitespace with underscores, drop anything that
  # isn't alphanumeric, underscore, or hyphen
  SAFE_TITLE=$(echo "$TITLE" | tr '[:space:]' '_' | tr -cd '[:alnum:]_-')

  # Fall back to a timestamped name if the title is empty
  [ -z "$SAFE_TITLE" ] && SAFE_TITLE="untitled_$(date +%s)"

  webarchiver -url "$URL" -output "${SAFE_TITLE}.webarchive"
done
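
One caveat with title-based names: two saved pages whose titles sanitize to the same string will overwrite each other. Appending a short hash of the URL, as in this sketch (added just before the webarchiver call), keeps the names unique:

SAFE_TITLE="${SAFE_TITLE}_$(md5 -qs "$URL" | cut -c1-8)"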
