Part One
The documentUrl Field
Every decision in a Diavgeia search response includes a documentUrl field. This is the direct address of the attached document — almost always a PDF. It is exactly what you would click if you were browsing the Diavgeia website manually.
Run the cell below to see how documentUrl sits inside each decision, and how to extract all the URLs from a page of results in one loop.
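A minimal sketch of that loop, assuming a page of results shaped like the sample below. The ada codes, URLs, and documentType values are made up for illustration; only the documentUrl and documentType field names come from this chapter.

```python
import json

# Hypothetical sample shaped like one page of search results.
# Every value here is invented; real responses carry many more fields.
sample_page = json.loads("""
{
  "decisions": [
    {"ada": "AAA1-001", "documentUrl": "https://example.org/doc/AAA1-001", "documentType": "pdf"},
    {"ada": "BBB2-002", "documentUrl": "https://example.org/doc/BBB2-002", "documentType": "pdf"},
    {"ada": "CCC3-003", "documentUrl": null, "documentType": null}
  ]
}
""")

urls = []
for decision in sample_page["decisions"]:
    url = decision.get("documentUrl")  # JSON null arrives in Python as None
    if url:                            # skips None and empty strings
        urls.append(url)

print(urls)
```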
Notice that the third decision has null for both documentUrl and documentType. Not every decision has an attached file. The if url: guard skips those cleanly without crashing.
Part Two
Text vs Binary — resp.content
In Chapter Two you used resp.json() to get a Python dictionary from the response. PDFs are not text; they are binary files, and reading one as text produces garbled output. The response object gives you three ways to read the body:
| Attribute | Returns | Use for |
|---|---|---|
| resp.text | A Python string, decoded with the response's detected encoding (usually UTF-8) | HTML, JSON, plain text |
| resp.json() | A Python dict or list | JSON API responses |
| resp.content | Raw bytes: the exact file as received | PDFs, images, any binary file |
To write those bytes to disk you open the file in binary write mode — "wb" instead of the usual "w". The b tells Python not to try to encode the bytes as text:
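A minimal sketch of the download-and-save step, assuming the requests library from Chapter Two. The function name download_pdf, the optional get parameter, and the stand-in response class are hypothetical helpers added here so the example can run without network access; raise_for_status() is an extra safety check that this chapter does not otherwise discuss.

```python
import os
import tempfile


def download_pdf(url, path, get=None):
    """Fetch url and write the raw response bytes to path."""
    if get is None:
        import requests  # third-party: pip install requests
        get = requests.get
    resp = get(url, allow_redirects=True, timeout=60)
    resp.raise_for_status()          # fail on 4xx/5xx instead of saving an error page
    with open(path, "wb") as f:      # "wb" = write binary
        f.write(resp.content)        # raw bytes, no text decoding
    return len(resp.content)


# Offline stand-ins (hypothetical) so the sketch runs without a network call.
class FakeResponse:
    content = b"%PDF-1.4\n% placeholder bytes, not a real document\n"

    def raise_for_status(self):
        pass


def fake_get(url, **kwargs):
    return FakeResponse()


with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "decision.pdf")
    n_bytes = download_pdf("https://example.org/doc/AAA1-001", out_path, get=fake_get)
    with open(out_path, "rb") as f:  # read back in binary mode
        saved = f.read()

print(n_bytes, saved[:8])
```

For a live download, call download_pdf(url, path) without the get argument, using a real documentUrl.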
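Two details are worth noting. First, Diavgeia's document URLs sometimes redirect to a storage server before delivering the file, so the request has to follow redirects to reach the actual PDF. In the requests library, GET requests already follow redirects by default; passing allow_redirects=True simply makes that behaviour explicit. If redirects were switched off, the request would stop at the redirect and you would save an empty or broken file. Second, timeout=60 is longer than the 30 seconds you use for API searches, because PDFs can be large.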
Part Three
Creating the Output Folder
Saving all PDFs into the same directory as your script quickly becomes unmanageable. The standard approach is to create a dedicated output folder. Python's os module handles this in one line:
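A short sketch of that call. The folder name pdfs is a placeholder, and the temporary base directory is used only so the example leaves nothing behind; it also shows what happens when exist_ok is left out.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    folder = os.path.join(base, "pdfs")   # hypothetical folder name

    os.makedirs(folder, exist_ok=True)    # first call: creates the folder
    os.makedirs(folder, exist_ok=True)    # second call: folder exists, nothing happens
    created = os.path.isdir(folder)

    try:
        os.makedirs(folder)               # no exist_ok: the existing folder is an error
        raised = False
    except FileExistsError:
        raised = True

print(created, raised)  # True True
```

In your own script the single line os.makedirs("pdfs", exist_ok=True) is usually all you need.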
exist_ok=True is the key argument. Without it, os.makedirs() raises a FileExistsError if the folder is already there — which means your script would crash on the second run. With exist_ok=True, it does nothing if the folder exists and creates it if it doesn't. Safe to call every time.
To build the full path to a file inside that folder, use os.path.join(). It handles the slash between the folder name and the filename correctly on any operating system:
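For illustration, with a placeholder folder and file name (both hypothetical):

```python
import os

folder = "pdfs"             # hypothetical output folder
filename = "AAA1-001.pdf"   # hypothetical file name built from a decision ID

path = os.path.join(folder, filename)
print(path)  # pdfs/AAA1-001.pdf on Linux and macOS, pdfs\AAA1-001.pdf on Windows
```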
Part Four
The Complete Pipeline
You now have all the pieces. The full script searches Diavgeia, loops over the decisions, skips any that have no document, and saves each PDF into the output folder. Run this in a Jupyter notebook or a local Python script — not in the browser, because live HTTP requests to external servers are not possible here.
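One way the pipeline could be put together is sketched below. Several things are assumptions rather than part of this chapter: the search endpoint and its q and size parameters (use whatever you called in Chapter Two), the ada field used to name each file, the one-second pause, and the helper names search_decisions, fetch_bytes, and save_pdfs. The offline check at the end uses made-up decisions and a stand-in fetch function, so it runs without network access.

```python
import os
import tempfile
import time

# Assumed search endpoint; replace with the URL you used in Chapter Two.
SEARCH_URL = "https://diavgeia.gov.gr/opendata/search.json"


def search_decisions(query, size=10):
    """Return the decisions from one page of search results."""
    import requests  # third-party; imported lazily so the offline check needs no install
    resp = requests.get(SEARCH_URL, params={"q": query, "size": size}, timeout=30)
    resp.raise_for_status()
    return resp.json().get("decisions", [])


def fetch_bytes(url):
    """Download one document and return its raw bytes."""
    import requests
    resp = requests.get(url, allow_redirects=True, timeout=60)
    resp.raise_for_status()
    return resp.content


def save_pdfs(decisions, folder="pdfs", fetch=fetch_bytes, pause=1.0):
    """Save every attached document into folder and return the saved paths."""
    os.makedirs(folder, exist_ok=True)        # create the folder once, before the loop
    saved = []
    for decision in decisions:
        url = decision.get("documentUrl")
        if not url:                           # no attached file: skip this decision
            continue
        # Assumes each decision has an "ada" identifier to use as the file name.
        path = os.path.join(folder, f"{decision['ada']}.pdf")
        with open(path, "wb") as f:           # binary write mode
            f.write(fetch(url))
        saved.append(path)
        time.sleep(pause)                     # pause between downloads
    return saved


# Offline check with made-up decisions and a stand-in fetch function.
with tempfile.TemporaryDirectory() as tmp:
    sample = [
        {"ada": "AAA1-001", "documentUrl": "https://example.org/a"},
        {"ada": "BBB2-002", "documentUrl": None},
        {"ada": "CCC3-003", "documentUrl": "https://example.org/c"},
    ]
    saved = save_pdfs(sample, folder=os.path.join(tmp, "pdfs"),
                      fetch=lambda url: b"%PDF-1.4 placeholder", pause=0)
    saved_names = [os.path.basename(p) for p in saved]

print(saved_names)  # ['AAA1-001.pdf', 'CCC3-003.pdf']

# Live run (needs network access and the requests package):
# decisions = search_decisions("your search term")
# print(save_pdfs(decisions))
```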
The structure of the script is the same pattern you will reuse whenever you download files from an API: create the folder once before the loop, build the path inside the loop, save inside the loop, pause inside the loop.
Part Five
Your Turn — Build the Paths
The cell below gives you a list of decisions. Your task: create a folder called decisions_2025, then build and print the full file path for each decision that has a documentUrl. Skip any that don't.
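If you want something to compare against after trying it yourself, here is one possible solution. The decision list is made up, and naming each file after the ada field is an assumption, since the task does not fix a naming scheme.

```python
import os

# Hypothetical decisions; the second one has no attached document.
decisions = [
    {"ada": "AAA1-001", "documentUrl": "https://example.org/doc/AAA1-001"},
    {"ada": "BBB2-002", "documentUrl": None},
    {"ada": "CCC3-003", "documentUrl": "https://example.org/doc/CCC3-003"},
]

folder = "decisions_2025"
os.makedirs(folder, exist_ok=True)   # safe to run more than once

paths = []
for decision in decisions:
    if not decision.get("documentUrl"):
        continue                     # skip decisions without a file
    path = os.path.join(folder, decision["ada"] + ".pdf")
    paths.append(path)
    print(path)
```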
To recap: the link to each decision's attached file is in documentUrl; you download binary files with resp.content and save them with "wb" mode; os.makedirs(folder, exist_ok=True) creates the output folder safely; and os.path.join() builds file paths correctly on any operating system.