Book 6 — Working with APIs

Python for All

Chapter Three — Saving PDFs to a Folder

Thanasis Troboukis

Every Diavgeia decision carries a documentUrl — a direct link to the attached PDF. In this chapter you will download those files, create an organised output folder, and save each PDF with the ADA code as its filename.

The documentUrl Field

Every decision in a Diavgeia search response includes a documentUrl field. This is the direct address of the attached document — almost always a PDF. It is exactly what you would click if you were browsing the Diavgeia website manually.

Run the cell below to see how documentUrl sits inside each decision, and how to extract all the URLs from a page of results in one loop.

Python · Try it
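
A minimal sketch of such a cell, using hand-written sample data in place of a live response. The ada values, URLs and documentType values are invented for illustration, and the decisions and ada keys are assumed to match the search response from Chapter Two.

```python
# Hand-written stand-in for one page of search results.
# All values are made up; only the field names documentUrl and
# documentType come from the chapter text.
response_data = {
    "decisions": [
        {
            "ada": "ADA-EXAMPLE-001",
            "documentType": "pdf",
            "documentUrl": "https://example.org/doc/ADA-EXAMPLE-001",
        },
        {
            "ada": "ADA-EXAMPLE-002",
            "documentType": "pdf",
            "documentUrl": "https://example.org/doc/ADA-EXAMPLE-002",
        },
        {
            "ada": "ADA-EXAMPLE-003",
            "documentType": None,   # JSON null arrives in Python as None
            "documentUrl": None,
        },
    ]
}

urls = []
for decision in response_data["decisions"]:
    url = decision.get("documentUrl")
    if url:  # None and "" are falsy, so decisions without a file are skipped
        urls.append(url)
        print(decision["ada"], "->", url)
    else:
        print(decision["ada"], "-> no document attached")

print(len(urls), "URLs collected")
```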

      

Notice that the third decision has null for both documentUrl and documentType (Python shows JSON null as None). Not every decision has an attached file. The if url: guard skips those cleanly without crashing.

Text vs Binary — resp.content

In Chapter Two you used resp.json() to get a Python dictionary from the response. PDFs are not text: they are binary files. If you try to read a PDF as text you will get garbled output. The response object gives you three ways to read the body:

Attribute or method | Returns | Use for
resp.text | A Python string, decoded with the encoding requests detects for the response | HTML, JSON, plain text
resp.json() | A Python dict or list | JSON API responses
resp.content | Raw bytes, the exact file as received | PDFs, images, any binary file

To write those bytes to disk you open the file in binary write mode: "wb" instead of the usual "w". The b tells Python to write the bytes exactly as received, without treating them as text:

Python · Copy to your notebook
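
A minimal sketch of the download step, assuming the requests library from Chapter Two. The function name save_pdf and the example URL are hypothetical, and the raise_for_status() call is an addition for safety rather than something the chapter prescribes.

```python
import requests


def save_pdf(url, path):
    """Download url and write the raw bytes to path. Returns the byte count."""
    resp = requests.get(url, allow_redirects=True, timeout=60)
    resp.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
    with open(path, "wb") as f:  # "wb" = write binary
        f.write(resp.content)    # resp.content is bytes, not str
    return len(resp.content)


# Example call (placeholder URL, replace it with a real documentUrl):
# save_pdf("https://example.org/some-document", "decision.pdf")
```

Wrapping the two steps in a function keeps the request and the file write together, so the same code can be reused inside a loop.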

Two details worth noting. allow_redirects=True matters because Diavgeia's document URLs sometimes redirect to a storage server before delivering the file. requests.get() already follows redirects by default, so writing the argument out mainly makes the intent explicit; if redirects were switched off, the request would stop at the redirect and you would save an empty or broken file. And timeout=60 is longer than the 30 seconds you use for API searches, because PDFs can be large.

Creating the Output Folder

Saving all PDFs into the same directory as your script quickly becomes unmanageable. The standard approach is to create a dedicated output folder. Python's os module handles this in one line:

Python · Try it
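
A short sketch of the folder step. The folder name diavgeia_pdfs is a hypothetical choice; any name works.

```python
import os

folder = "diavgeia_pdfs"  # hypothetical folder name

os.makedirs(folder, exist_ok=True)  # first call creates the folder
os.makedirs(folder, exist_ok=True)  # second call finds it and does nothing

print("Folder ready:", os.path.isdir(folder))
```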

      

exist_ok=True is the key argument. Without it, os.makedirs() raises a FileExistsError if the folder is already there — which means your script would crash on the second run. With exist_ok=True, it does nothing if the folder exists and creates it if it doesn't. Safe to call every time.

To build the full path to a file inside that folder, use os.path.join(). It handles the slash between the folder name and the filename correctly on any operating system:

Python · Try it
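
A short sketch of the path-building step. The folder name and the ADA value are made up for illustration; a real ADA would come from the decision itself.

```python
import os

folder = "diavgeia_pdfs"   # hypothetical folder name
ada = "ADA-EXAMPLE-001"    # made-up ADA, for illustration only

filename = ada + ".pdf"
filepath = os.path.join(folder, filename)  # inserts the right separator for this OS

print(filepath)
```

The printed separator depends on the operating system: a forward slash on macOS and Linux, a backslash on Windows.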

      
Why the ADA as filename? The ADA code is unique across the entire Diavgeia registry — no two decisions share the same ADA. Using it as the filename means you can always trace a file back to its source, and you will never accidentally overwrite one decision's PDF with another's.

The Complete Pipeline

You now have all the pieces. The full script searches Diavgeia, loops over the decisions, skips any that have no document, and saves each PDF into the output folder. Run this in a Jupyter notebook or a local Python script — not in the browser, because live HTTP requests to external servers are not possible here.

Python · Copy to your notebook
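
A sketch of how the pieces could fit together, under several assumptions that this chapter does not confirm: the SEARCH_URL endpoint, the PARAMS query, and the decisions and ada keys should be replaced with whatever worked for you in Chapter Two. The folder name and function names are hypothetical, and raise_for_status() is an added safety check.

```python
import os
import time

import requests

# Assumptions: endpoint, parameters and response keys are not given in
# this chapter. Substitute the values you used in Chapter Two.
SEARCH_URL = "https://diavgeia.gov.gr/opendata/search"
PARAMS = {"size": 10}
OUTPUT_FOLDER = "diavgeia_pdfs"  # hypothetical folder name


def search_decisions(url=SEARCH_URL, params=PARAMS):
    """Run one search and return the list of decisions on the page."""
    resp = requests.get(url, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json().get("decisions", [])


def download_pdfs(decisions, folder=OUTPUT_FOLDER, pause=1):
    """Save each decision's document as <ADA>.pdf and return the saved paths."""
    os.makedirs(folder, exist_ok=True)  # create the folder once, before the loop
    saved = []
    for decision in decisions:
        url = decision.get("documentUrl")
        if not url:  # no attached file: skip this decision
            continue
        path = os.path.join(folder, decision["ada"] + ".pdf")
        resp = requests.get(url, allow_redirects=True, timeout=60)
        resp.raise_for_status()
        with open(path, "wb") as f:
            f.write(resp.content)
        saved.append(path)
        time.sleep(pause)  # be polite: wait before the next download
    return saved


# In a notebook or local script:
# saved = download_pdfs(search_decisions())
# print(len(saved), "PDFs saved to", OUTPUT_FOLDER)
```
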
time.sleep(1) pauses for one second after each PDF download. Many APIs and servers enforce rate limits: if you send dozens of requests in quick succession you may get blocked or slow the server down for other users. One second per file is a respectful pace for a tutorial script.

The structure of the script is the same pattern you will reuse whenever you download files from an API: create the folder once before the loop, build the path inside the loop, save inside the loop, pause inside the loop.

Your Turn — Build the Paths

The cell below gives you a list of decisions. Your task: create a folder called decisions_2025, then build and print the full file path for each decision that has a documentUrl. Skip any that don't.

Python · Your turn
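
A starting point with one possible solution, in case you want something to compare against. The decisions list is invented sample data, and the ada key is assumed to match the search response from Chapter Two.

```python
import os

# Invented sample decisions; the second one has no attached document.
decisions = [
    {"ada": "ADA-EXAMPLE-101", "documentUrl": "https://example.org/doc/101"},
    {"ada": "ADA-EXAMPLE-102", "documentUrl": None},
    {"ada": "ADA-EXAMPLE-103", "documentUrl": "https://example.org/doc/103"},
]

folder = "decisions_2025"
os.makedirs(folder, exist_ok=True)  # safe to run more than once

paths = []
for decision in decisions:
    if not decision.get("documentUrl"):  # skip decisions without a file
        continue
    path = os.path.join(folder, decision["ada"] + ".pdf")
    paths.append(path)
    print(path)
```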

      
What you learned in this chapter: every Diavgeia decision exposes its PDF through documentUrl; you download binary files with resp.content and save them with "wb" mode; os.makedirs(folder, exist_ok=True) creates the output folder safely; and os.path.join() builds file paths correctly on any operating system.
