Book 5 — Web Scraping with Python

Python for All

Chapter Two — Getting HTML with requests

Thanasis Troboukis

The requests library fetches live pages from the internet. In this chapter you will fetch https://www.kathimerini.gr/epikairothta/, inspect the response, turn the HTML into a BeautifulSoup object, and then move from page 1 to page 2.

Two Lines to Fetch a Page

The first line makes the HTTP request. The second reads the HTML text that came back from the server.

Python · Copy to your notebook
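A minimal sketch of those two lines, using the URL named above (it needs a live internet connection to run):

```python
import requests

# Line 1: make the HTTP request to the latest-news page.
response = requests.get("https://www.kathimerini.gr/epikairothta/")

# Line 2: read the HTML text that came back from the server.
html = response.text

print(response.status_code)  # 200 on success
print(html[:200])            # a peek at the start of the page
```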

response.text is the HTML as a decoded string (requests guesses the character encoding; response.content gives the raw bytes). response.status_code tells you whether the request succeeded: 200 means success, 404 means the page was not found, and 403 means the server refused the request.

Why these cells do not run here. The browser version of the course cannot make live requests to external sites. Copy the cells into a local Jupyter notebook and run them there.

Looking Like a Browser

Real sites often expect browser-like headers. The most important one is User-Agent. For Greek content, it also helps to set Accept-Language.

Python · Copy to your notebook
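One way to send those headers (the exact User-Agent string below is illustrative; any recent browser string works):

```python
import requests

url = "https://www.kathimerini.gr/epikairothta/"

# Browser-like headers: User-Agent identifies the client,
# Accept-Language asks for Greek content first.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "el-GR,el;q=0.9,en;q=0.8",
}

# timeout=30 means: give up if the server takes more than 30 seconds.
response = requests.get(url, headers=headers, timeout=30)
print(response.status_code)
```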

Always set a timeout. Without a timeout argument (for example timeout=30), your code can wait forever on a slow or unresponsive server.

Basic tools from earlier books are already in play: variables store the URL and headers, the headers are a dictionary, and function calls like requests.get(...) return values that you save into variables.

From Response to Soup

Once you have the HTML string, the next step is always the same: hand it to BeautifulSoup.

Python · Copy to your notebook
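A sketch of that hand-off. A tiny inline snippet stands in for the live page so the cell runs on its own; with a real fetch you would pass response.text instead:

```python
from bs4 import BeautifulSoup

# Stand-in for response.text so this cell runs without a live request.
html = "<html><body><h1>Latest news</h1><a href='/a1'>Story</a></body></html>"

# Hand the HTML string to BeautifulSoup; "html.parser" is the
# parser that ships with Python's standard library.
soup = BeautifulSoup(html, "html.parser")

print(soup.find("h1").get_text())  # Latest news
print(soup.find("a").get("href"))  # /a1
```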

From this point on, the requests part stays mostly the same. The interesting work is what you do with soup. Here are the main methods you will use throughout this book:

| What you want | Method | Example |
| --- | --- | --- |
| First tag that matches | find() | soup.find("a") |
| All tags that match | find_all() | soup.find_all("a") |
| Tag by class name | find(class_=) | soup.find("a", class_="mainlink") |
| Tag by id | find(id=) | soup.find(id="main") |
| Visible text inside a tag | get_text() | tag.get_text(" ", strip=True) |
| Value of an attribute | get() | tag.get("href") |
| Tags matching a CSS selector | select() | soup.select("a.mainlink") |

Current page structure: As of April 22, 2026, the Kathimerini latest-news page exposes article links with <a class="mainlink"> and titles inside <span class="card-title">.
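The methods above can be tried on a made-up snippet that mimics the structure just described (the class names mainlink and card-title come from that note; the snippet itself is invented for illustration):

```python
from bs4 import BeautifulSoup

# Invented HTML mimicking the described structure of the live page.
html = """
<div class="card">
  <a class="mainlink" href="/epikairothta/story-1/">
    <span class="card-title">First headline</span>
  </a>
  <a class="mainlink" href="/epikairothta/story-2/">
    <span class="card-title">Second headline</span>
  </a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("a", class_="mainlink")          # first matching tag
all_links = soup.find_all("a", class_="mainlink")  # every matching tag
css_links = soup.select("a.mainlink")              # same tags, via CSS selector

print(first.get("href"))                # /epikairothta/story-1/
print(first.get_text(" ", strip=True))  # First headline
print(len(all_links), len(css_links))   # 2 2
```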

Moving from Page 1 to Page 2

The first page is https://www.kathimerini.gr/epikairothta/. The second page is https://www.kathimerini.gr/epikairothta/page/2/. This pattern will later let you loop through many pages with range().

Python · Copy to your notebook
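Building the page-2 URL is plain string work, for example with an f-string:

```python
base = "https://www.kathimerini.gr/epikairothta/"

page = 2
next_url = f"{base}page/{page}/"

# Pass next_url to requests.get(...) exactly as before.
print(next_url)  # https://www.kathimerini.gr/epikairothta/page/2/
```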

This is the first important pagination idea of the book: in Python you do not click a "next" button to reach page 2. You build the next page's URL as a string.

Special case for page 1: page 1 uses the base URL, but later pages use page/2/, page/3/, and so on. In the final chapter you will place that if/else logic directly inside a for loop over range().
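One way to sketch that if/else inside a loop (the final chapter's version may differ in details):

```python
base = "https://www.kathimerini.gr/epikairothta/"

urls = []
for page in range(1, 6):  # pages 1 through 5
    if page == 1:
        url = base                  # page 1 uses the base URL
    else:
        url = f"{base}page/{page}/" # later pages use page/2/, page/3/, ...
    urls.append(url)

print(urls[0])  # https://www.kathimerini.gr/epikairothta/
print(urls[1])  # https://www.kathimerini.gr/epikairothta/page/2/
```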

Your Turn — Fetch and Explore

Copy this block into your notebook. It fetches the live page and prints the first five titles it finds.

Python · Copy to your notebook
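One possible version of that cell, assuming the card-title selector from the page-structure note above (it needs a live connection, and the class names may change if the site is redesigned):

```python
import requests
from bs4 import BeautifulSoup

url = "https://www.kathimerini.gr/epikairothta/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "el-GR,el;q=0.9,en;q=0.8",
}

response = requests.get(url, headers=headers, timeout=30)
soup = BeautifulSoup(response.text, "html.parser")

# "card-title" comes from the page-structure note above; adjust it
# if the site's markup has changed since then.
titles = [
    span.get_text(" ", strip=True)
    for span in soup.find_all("span", class_="card-title")
]

for title in titles[:5]:
    print(title)
```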
What you learned in this chapter: how requests.get() fetches a live page; what response.text and response.status_code contain; how to send browser-like headers; how to turn the HTML into soup; and how page 2 follows the URL pattern .../page/2/. In the next chapters you will use BeautifulSoup to extract the exact fields you want.
