Book 5 — Web Scraping with Python

Python for All

Chapter Two — Getting HTML with requests

Thanasis Troboukis

The requests library fetches live pages from the internet. In this chapter you will fetch https://www.kathimerini.gr/epikairothta/, inspect the response, turn the HTML into a BeautifulSoup object, and then move from page 1 to page 2.

Two Lines to Fetch a Page

The first line makes the HTTP request. The second reads the HTML text that came back from the server.

Python · Copy to your notebook
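A minimal sketch of those two lines, using the URL named above (it needs a live internet connection to run):

```python
import requests

# Line 1: make the HTTP request to the latest-news page.
response = requests.get("https://www.kathimerini.gr/epikairothta/")

# Line 2: read the HTML text that came back from the server.
html = response.text

print(response.status_code)  # 200 on success
print(html[:200])            # a peek at the start of the page
```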

response.text is the HTML as a decoded string (requests guesses the character encoding; response.content gives the raw bytes). response.status_code tells you whether the request succeeded: 200 means success, 404 means the page was not found, and 403 means the server refused the request.

Why these cells do not run here. The browser version of the course cannot make live requests to external sites. Copy the cells into a local Jupyter notebook and run them there.

Looking Like a Browser

Real sites often expect browser-like headers. The most important one is User-Agent. For Greek content, it also helps to set Accept-Language.

Python · Copy to your notebook
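One way to send those headers (the exact User-Agent string below is illustrative; any recent browser string works):

```python
import requests

url = "https://www.kathimerini.gr/epikairothta/"

# Browser-like headers: User-Agent identifies the client,
# Accept-Language asks for Greek content first.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "el-GR,el;q=0.9,en;q=0.8",
}

# timeout=30 means: give up if the server takes more than 30 seconds.
response = requests.get(url, headers=headers, timeout=30)
print(response.status_code)
```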

Always set a timeout. Without a timeout argument (for example timeout=30), your code can wait forever on a slow or unresponsive server.

Basic tools from earlier books are already in play: variables store the URL and headers, the headers are a dictionary, and function calls like requests.get(...) return values that you save into variables.

From Response to Soup

Once you have the HTML string, the next step is always the same: hand it to BeautifulSoup.

Python · Copy to your notebook
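A sketch of that hand-off. A tiny inline snippet stands in for the live page so the cell runs on its own; with a real fetch you would pass response.text instead:

```python
from bs4 import BeautifulSoup

# Stand-in for response.text so this cell runs without a live request.
html = "<html><body><h1>Latest news</h1><a href='/a1'>Story</a></body></html>"

# Hand the HTML string to BeautifulSoup; "html.parser" is the
# parser that ships with Python's standard library.
soup = BeautifulSoup(html, "html.parser")

print(soup.find("h1").get_text())  # Latest news
print(soup.find("a").get("href"))  # /a1
```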

From this point on, the requests part stays mostly the same. The interesting work is what you do with soup. Here are the main methods you will use throughout this book:

| What you want | Method | Example |
| --- | --- | --- |
| First tag that matches | find() | soup.find("a") |
| All tags that match | find_all() | soup.find_all("a") |
| Tag by class name | find(class_=) | soup.find("a", class_="mainlink") |
| Tag by id | find(id=) | soup.find(id="main") |
| Visible text inside a tag | get_text() | tag.get_text(" ", strip=True) |
| Value of an attribute | get() | tag.get("href") |
| Tags matching a CSS selector | select() | soup.select("a.mainlink") |

Current page structure: As of April 22, 2026, the Kathimerini latest-news page exposes article links with <a class="mainlink"> and titles inside <span class="card-title">.
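The methods above can be tried on a made-up snippet that mimics the structure just described (the class names mainlink and card-title come from that note; the snippet itself is invented for illustration):

```python
from bs4 import BeautifulSoup

# Invented HTML mimicking the described structure of the live page.
html = """
<div class="card">
  <a class="mainlink" href="/epikairothta/story-1/">
    <span class="card-title">First headline</span>
  </a>
  <a class="mainlink" href="/epikairothta/story-2/">
    <span class="card-title">Second headline</span>
  </a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("a", class_="mainlink")          # first matching tag
all_links = soup.find_all("a", class_="mainlink")  # every matching tag
css_links = soup.select("a.mainlink")              # same tags, via CSS selector

print(first.get("href"))                # /epikairothta/story-1/
print(first.get_text(" ", strip=True))  # First headline
print(len(all_links), len(css_links))   # 2 2
```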

Moving from Page 1 to Page 2

The first page is https://www.kathimerini.gr/epikairothta/. The second page is https://www.kathimerini.gr/epikairothta/page/2/. This pattern will later let you loop through many pages with range().

Python · Copy to your notebook
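Building the page-2 URL is plain string work, for example with an f-string:

```python
base = "https://www.kathimerini.gr/epikairothta/"

page = 2
next_url = f"{base}page/{page}/"

# Pass next_url to requests.get(...) exactly as before.
print(next_url)  # https://www.kathimerini.gr/epikairothta/page/2/
```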

This is the first important pagination idea of the book: in Python you do not click a "next" button to reach page 2. You build the next page's URL as a string.

Special case for page 1: page 1 uses the base URL, but later pages use page/2/, page/3/, and so on. In the final chapter you will place that if/else logic directly inside a for loop over range().
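One way to sketch that if/else inside a loop (the final chapter's version may differ in details):

```python
base = "https://www.kathimerini.gr/epikairothta/"

urls = []
for page in range(1, 6):  # pages 1 through 5
    if page == 1:
        url = base                  # page 1 uses the base URL
    else:
        url = f"{base}page/{page}/" # later pages use page/2/, page/3/, ...
    urls.append(url)

print(urls[0])  # https://www.kathimerini.gr/epikairothta/
print(urls[1])  # https://www.kathimerini.gr/epikairothta/page/2/
```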

Your Turn — Fetch and Explore

Copy this block into your notebook. It fetches the live page and prints the first five titles it finds.

Python · Copy to your notebook
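One possible version of that cell, assuming the card-title selector from the page-structure note above (it needs a live connection, and the class names may change if the site is redesigned):

```python
import requests
from bs4 import BeautifulSoup

url = "https://www.kathimerini.gr/epikairothta/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "el-GR,el;q=0.9,en;q=0.8",
}

response = requests.get(url, headers=headers, timeout=30)
soup = BeautifulSoup(response.text, "html.parser")

# "card-title" comes from the page-structure note above; adjust it
# if the site's markup has changed since then.
titles = [
    span.get_text(" ", strip=True)
    for span in soup.find_all("span", class_="card-title")
]

for title in titles[:5]:
    print(title)
```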
What you learned in this chapter: how requests.get() fetches a live page; what response.text and response.status_code contain; how to send browser-like headers; how to turn the HTML into soup; and how page 2 follows the URL pattern .../page/2/. In the next chapters you will use BeautifulSoup to extract the exact fields you want.
