Book 6 · Chapter Three — Saving PDFs to a Folder

Part One

The documentUrl Field

Every decision in a Diavgeia search response includes a documentUrl field. This is the direct address of the attached document — almost always a PDF. It is exactly what you would click if you were browsing the Diavgeia website manually.

Run the cell below to see how documentUrl sits inside each decision, and how to extract all the URLs from a page of results in one loop.

Python · Try it

import json

raw = """
{
  "info": {"total": 89},
  "decisions": [
    {
      "ada": "ΨΦΔΔ46ΜΤΛΡ-ΑΩΣ",
      "subject": "Ανάληψη υποχρέωσης για την προμήθεια κλιματιστικών μονάδων",
      "issueDate": "2025-06-10",
      "documentUrl": "https://diavgeia.gov.gr/doc/ΨΦΔΔ46ΜΤΛΡ-ΑΩΣ",
      "documentType": "pdf",
      "organization": {"label": "ΔΗΜΟΣ ΑΘΗΝΑΙΩΝ"},
      "decisionType": {"label": "Ανάληψη Υποχρέωσης"}
    },
    {
      "ada": "6ΛΩΖ7ΛΞ-ΦΨΥ",
      "subject": "Σύμβαση προμήθειας και εγκατάστασης κλιματιστικών",
      "issueDate": "2025-06-08",
      "documentUrl": "https://diavgeia.gov.gr/doc/6ΛΩΖ7ΛΞ-ΦΨΥ",
      "documentType": "pdf",
      "organization": {"label": "ΔΗΜΟΣ ΘΕΣΣΑΛΟΝΙΚΗΣ"},
      "decisionType": {"label": "Σύμβαση"}
    },
    {
      "ada": "ΩΒΚ746ΜΦΩΡ-ΠΔΓ",
      "subject": "Συντήρηση κλιματισμού δημοτικών κτιρίων",
      "issueDate": "2025-06-05",
      "documentUrl": null,
      "documentType": null,
      "organization": {"label": "ΔΗΜΟΣ ΠΕΙΡΑΙΩΣ"},
      "decisionType": {"label": "Ανάληψη Υποχρέωσης"}
    }
  ]
}
"""

data = json.loads(raw)

for decision in data["decisions"]:
    ada = decision.get("ada", "")
    url = decision.get("documentUrl")
    if url:
        print(ada, "→", url)
    else:
        print(ada, "→ no document attached")

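Notice that the third decision has null for both documentUrl and documentType: not every decision has an attached file. json.loads() turns JSON null into Python None, so the if url: guard sends that decision to the else branch. The loop reports it and moves on instead of crashing or treating None as a URL.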
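If you want the URLs collected rather than printed, you can write the same loop as a dictionary comprehension. It maps each ADA to its documentUrl and leaves out decisions that have no document. This is a minimal sketch using a trimmed copy of the sample response above; the name urls_by_ada is only illustrative.

```python
import json

# Trimmed copy of the sample search response: only the fields we need.
raw = """
{
  "decisions": [
    {"ada": "ΨΦΔΔ46ΜΤΛΡ-ΑΩΣ", "documentUrl": "https://diavgeia.gov.gr/doc/ΨΦΔΔ46ΜΤΛΡ-ΑΩΣ"},
    {"ada": "6ΛΩΖ7ΛΞ-ΦΨΥ", "documentUrl": "https://diavgeia.gov.gr/doc/6ΛΩΖ7ΛΞ-ΦΨΥ"},
    {"ada": "ΩΒΚ746ΜΦΩΡ-ΠΔΓ", "documentUrl": null}
  ]
}
"""

data = json.loads(raw)

# Keep only decisions whose documentUrl is truthy (JSON null became None).
urls_by_ada = {
    d.get("ada", ""): d["documentUrl"]
    for d in data["decisions"]
    if d.get("documentUrl")
}

for ada, url in urls_by_ada.items():
    print(ada, "→", url)

print(len(urls_by_ada), "documents to download")  # prints: 2 documents to download
```

A dictionary keyed by ADA is convenient later, because the ADA is also what you will use as the filename.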

Part Two

Text vs Binary — resp.content

In Chapter Two you used resp.json() to get a Python dictionary from the response. PDFs are different: they are binary files, not text, and decoding one as text produces garbled output. A requests response gives you three ways to read its body:

Attribute	Returns	Use for
`resp.text`	A Python string, decoded with the encoding requests detects for the response (resp.encoding)	HTML, JSON, plain text
`resp.json()`	A Python dict or list	JSON API responses
`resp.content`	Raw bytes — the exact file as received	PDFs, images, any binary file

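To write those bytes to disk, open the file in binary write mode: "wb" instead of the usual "w". The b tells Python that the file takes raw bytes and should store them unchanged. A file opened with "w" expects strings and raises a TypeError if you pass it bytes: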

Python · Copy to your notebook

import requests

document_url = "https://diavgeia.gov.gr/doc/ΨΦΔΔ46ΜΤΛΡ-ΑΩΣ"

resp = requests.get(document_url, timeout=60, allow_redirects=True)
resp.raise_for_status()

with open("ΨΦΔΔ46ΜΤΛΡ-ΑΩΣ.pdf", "wb") as f:
    f.write(resp.content)

print("Saved:", resp.headers.get("Content-Type"), len(resp.content), "bytes")

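Two details are worth noting. First, allow_redirects=True tells requests to follow any redirect, for example when a document URL hands you on to a separate storage server before it delivers the file. For requests.get() this is already the default, so the argument mainly documents intent. With allow_redirects=False you would receive the redirect response itself, whose body is not the PDF, and you would save a broken file. Second, timeout=60 is longer than the 30 seconds you use for API searches, because large PDFs can be slow to serve. In requests, the timeout limits how long to wait for the connection and for each piece of data. It does not limit the total download time.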
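You can check both the binary-mode rule and the shape of a good download without touching the network. The sketch below uses a short, made-up byte string as a stand-in for resp.content; it is not a real PDF. The check relies on the fact that PDF files begin with the header %PDF-, while an HTML error or redirect page does not. looks_like_pdf is a hypothetical helper of our own, not part of requests.

```python
import os
import tempfile


def looks_like_pdf(content: bytes) -> bool:
    """Simple heuristic: PDF files start with the header b"%PDF-"."""
    return content.startswith(b"%PDF-")


# Made-up stand-in for resp.content: begins like a PDF, then some binary bytes.
fake_pdf_bytes = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"
fake_error_page = b"<!DOCTYPE html><html><body>Not found</body></html>"

print(looks_like_pdf(fake_pdf_bytes))   # True
print(looks_like_pdf(fake_error_page))  # False

# Write into the system temp directory so nothing lands in your project folder.
path = os.path.join(tempfile.mkdtemp(), "example.pdf")

# Binary write mode stores the bytes exactly as given...
with open(path, "wb") as f:
    f.write(fake_pdf_bytes)

# ...and binary read mode returns the identical bytes.
with open(path, "rb") as f:
    round_trip = f.read()
print(round_trip == fake_pdf_bytes)  # True

# Text write mode expects str, so passing bytes raises TypeError.
text_mode_error = None
try:
    with open(path, "w") as f:
        f.write(fake_pdf_bytes)
except TypeError as exc:
    text_mode_error = exc
print(type(text_mode_error).__name__)  # TypeError
```

In the real script you could call looks_like_pdf(resp.content) before opening the output file. For anything that fails the check, print resp.headers.get("Content-Type") to see what the server actually sent.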

Part Three

Creating the Output Folder

Saving all PDFs into the same directory as your script quickly becomes unmanageable. The standard approach is to create a dedicated output folder. Python's os module handles this in one line: os.makedirs(folder, exist_ok=True), where folder is the name of the folder to create.

exist_ok=True is the key argument. Without it, os.makedirs() raises a FileExistsError if the folder is already there — which means your script would crash on the second run. With exist_ok=True, it does nothing if the folder exists and creates it if it doesn't. Safe to call every time.
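The behaviour described above is easy to confirm. This sketch works inside a temporary directory created by tempfile.mkdtemp(), so nothing is created in your project folder. The folder name diavgeia_pdfs is only an example.

```python
import os
import tempfile

# Scratch area in the system's temp directory.
base = tempfile.mkdtemp()
folder = os.path.join(base, "diavgeia_pdfs")

os.makedirs(folder, exist_ok=True)  # first call: creates the folder
os.makedirs(folder, exist_ok=True)  # second call: folder exists, so nothing happens
print("Exists:", os.path.isdir(folder))  # Exists: True

# Without exist_ok=True, calling it again on an existing folder raises FileExistsError.
second_call_error = None
try:
    os.makedirs(folder)
except FileExistsError as exc:
    second_call_error = exc
print("Without exist_ok:", type(second_call_error).__name__)  # Without exist_ok: FileExistsError

# makedirs also creates any missing parent folders along the way.
nested = os.path.join(base, "output", "2025", "pdfs")
os.makedirs(nested, exist_ok=True)
print("Nested exists:", os.path.isdir(nested))  # Nested exists: True
```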

To build the full path to a file inside that folder, use os.path.join(). It inserts the correct separator between the folder name and the filename on any operating system. For example, os.path.join(folder, ada + ".pdf") gives the path for one decision's PDF.

Why the ADA as filename? The ADA code is unique across the entire Diavgeia registry — no two decisions share the same ADA. Using it as the filename means you can always trace a file back to its source, and you will never accidentally overwrite one decision's PDF with another's.
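Combining the folder name with an ADA looks like this. The ADA is taken from the sample decisions earlier in the chapter. The separator in the printed path depends on your operating system.

```python
import os

folder = "diavgeia_pdfs"
ada = "ΨΦΔΔ46ΜΤΛΡ-ΑΩΣ"

# os.path.join uses the separator of the current operating system:
# "/" on macOS and Linux, "\" on Windows.
filepath = os.path.join(folder, ada + ".pdf")
print(filepath)

# The pieces can be recovered from the path again.
print(os.path.dirname(filepath))   # diavgeia_pdfs
print(os.path.basename(filepath))  # ΨΦΔΔ46ΜΤΛΡ-ΑΩΣ.pdf
```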

Part Four

The Complete Pipeline

You now have all the pieces. The full script searches Diavgeia, loops over the decisions, skips any that have no document, and saves each PDF into the output folder. Run this in a Jupyter notebook or a local Python script — not in the browser, because live HTTP requests to external servers are not possible here.

Python · Copy to your notebook

import os
import time
import requests

# ── Settings ──────────────────────────────────────────────
SEARCH_URL = "https://diavgeia.gov.gr/luminapi/api/search"
OUTPUT_FOLDER = "diavgeia_pdfs"
QUERY = 'subject:"κλιματισμός"'
PAGE_SIZE = 10
# ──────────────────────────────────────────────────────────

os.makedirs(OUTPUT_FOLDER, exist_ok=True)

params = {
    "q": QUERY,
    "size": PAGE_SIZE,
    "page": 0,
    "sort": "recent",
}

resp = requests.get(SEARCH_URL, params=params, headers={"Accept": "application/json"}, timeout=30)
resp.raise_for_status()
data = resp.json()

print(f"Total decisions found: {data['info']['total']}")
print(f"Downloading up to {PAGE_SIZE} PDFs...\n")

for decision in data.get("decisions", []):
    ada = decision.get("ada", "")
    doc_url = decision.get("documentUrl")

    if not doc_url:
        print(f"  {ada} — no document, skipping")
        continue

    filepath = os.path.join(OUTPUT_FOLDER, ada + ".pdf")

    pdf_resp = requests.get(doc_url, timeout=60, allow_redirects=True)
    pdf_resp.raise_for_status()

    with open(filepath, "wb") as f:
        f.write(pdf_resp.content)

    print(f"  Saved: {filepath}  ({len(pdf_resp.content):,} bytes)")

    time.sleep(1)

print("\nDone.")

time.sleep(1) pauses for one second after each PDF download. Many servers enforce rate limits. If you send dozens of requests in quick succession, you may get blocked or slow the server down for other users. One second per file is a respectful pace for a tutorial script.

This script follows a pattern you will reuse whenever you download files from an API. Create the folder once, before the loop. Then, inside the loop, build the path, save the file, and pause.
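You can package the same pattern as a function. In the sketch below, save_documents and fetch are hypothetical names. fetch stands in for whatever returns a document's bytes; in the real script, that would be requests.get(doc_url, timeout=60) followed by raise_for_status() and .content. Passing it in as an argument lets the function run here with fake data and no network.

```python
import os
import tempfile
import time
from typing import Callable, Iterable


def save_documents(
    decisions: Iterable[dict],
    folder: str,
    fetch: Callable[[str], bytes],
    delay: float = 1.0,
) -> list:
    """Save each decision's document as <ADA>.pdf in folder; return the saved paths."""
    os.makedirs(folder, exist_ok=True)  # once, before the loop
    saved = []
    for decision in decisions:
        ada = decision.get("ada", "")
        doc_url = decision.get("documentUrl")
        if not doc_url:  # no attached document: skip it
            continue
        filepath = os.path.join(folder, ada + ".pdf")  # build the path
        content = fetch(doc_url)  # fetch first, so a failure leaves no empty file
        with open(filepath, "wb") as f:  # save
            f.write(content)
        saved.append(filepath)
        time.sleep(delay)  # pause
    return saved


# Fake fetch function, so no HTTP request is made in this demo.
def fake_fetch(url: str) -> bytes:
    return b"%PDF-1.4 placeholder for " + url.encode("utf-8")


decisions = [
    {"ada": "ΨΦΔΔ46ΜΤΛΡ-ΑΩΣ", "documentUrl": "https://diavgeia.gov.gr/doc/ΨΦΔΔ46ΜΤΛΡ-ΑΩΣ"},
    {"ada": "ΩΒΚ746ΜΦΩΡ-ΠΔΓ", "documentUrl": None},
]
out = os.path.join(tempfile.mkdtemp(), "diavgeia_pdfs")
paths = save_documents(decisions, out, fake_fetch, delay=0)
print(paths)
```

Fetching before opening the file is a small design choice. If the download fails, no empty <ADA>.pdf is left behind.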

Part Five

Your Turn — Build the Paths

The cell below gives you a list of decisions. Your task: create a folder called decisions_2025, then build and print the full file path for each decision that has a documentUrl. Skip any that don't.

Python · Your turn

import os
import json

raw = """
{
  "decisions": [
    {
      "ada": "ΨΦΔΔ46ΜΤΛΡ-ΑΩΣ",
      "issueDate": "2025-06-10",
      "documentUrl": "https://diavgeia.gov.gr/doc/ΨΦΔΔ46ΜΤΛΡ-ΑΩΣ",
      "organization": {"label": "ΔΗΜΟΣ ΑΘΗΝΑΙΩΝ"}
    },
    {
      "ada": "6ΛΩΖ7ΛΞ-ΦΨΥ",
      "issueDate": "2025-06-08",
      "documentUrl": "https://diavgeia.gov.gr/doc/6ΛΩΖ7ΛΞ-ΦΨΥ",
      "organization": {"label": "ΔΗΜΟΣ ΘΕΣΣΑΛΟΝΙΚΗΣ"}
    },
    {
      "ada": "ΩΒΚ746ΜΦΩΡ-ΠΔΓ",
      "issueDate": "2025-06-05",
      "documentUrl": null,
      "organization": {"label": "ΔΗΜΟΣ ΠΕΙΡΑΙΩΣ"}
    },
    {
      "ada": "ΡΨΞ946ΜΤΛΡ-ΒΩΔ",
      "issueDate": "2025-06-01",
      "documentUrl": "https://diavgeia.gov.gr/doc/ΡΨΞ946ΜΤΛΡ-ΒΩΔ",
      "organization": {"label": "ΔΗΜΟΣ ΠΑΤΡΕΩΝ"}
    }
  ]
}
"""

data = json.loads(raw)

folder = "decisions_2025"
os.makedirs(folder, exist_ok=True)

for decision in data["decisions"]:
    ada = decision.get("ada", "")
    doc_url = decision.get("documentUrl")

    if not doc_url:
        print(f"  {ada} — skipped (no document)")
        continue

    filepath = os.path.join(folder, ada + ".pdf")
    print(f"  Would save to: {filepath}")

print()
print("Folder exists:", os.path.exists(folder))

What you learned in this chapter: every Diavgeia decision exposes its PDF through documentUrl; you download binary files with resp.content and save them with "wb" mode; os.makedirs(folder, exist_ok=True) creates the output folder safely; and os.path.join() builds file paths correctly on any operating system.


Previous: Chapter 2 — Your First API Request