Part One
Creating a Soup Object
You create a BeautifulSoup object by passing two things: the HTML string and a parser name. The parser is the engine that reads the raw text and builds the internal structure. Use "html.parser" — it is built into Python and works for every real-world page you will encounter in this course.
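A minimal sketch of that call follows. The HTML string is hypothetical, simplified markup and not the course's actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (not the course's actual page).
html = "<div><h3>Incident</h3><p>Example text</p></div>"

# Two arguments: the raw HTML string and the name of the parser to use.
# "html.parser" ships with Python, so nothing extra needs installing.
soup = BeautifulSoup(html, "html.parser")

print(type(soup).__name__)  # BeautifulSoup
```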
The prettify() method returns the parsed document as a string, indented so that every level of nesting is visible. That output is not the actual data; it is a debugging view. You can see every tag, every attribute, every level of nesting. Whenever you are unsure what a page contains, run print(soup.prettify()) first and read the structure before writing any selectors.
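The sketch below prints a prettified view of a small, hypothetical document. Each tag and each piece of text lands on its own line, indented by its depth in the tree.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, written on one line so the nesting is hard to read.
html = "<div><h3>Incident</h3><table><tr><td>Example Town</td></tr></table></div>"
soup = BeautifulSoup(html, "html.parser")

# prettify() returns a str with one tag or text node per line,
# indented by nesting depth. Use it for reading, not for extracting values.
pretty = soup.prettify()
print(pretty)
```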
Part Two
Accessing Tags by Name
The simplest way to reach an element is to access it as a property of the soup object — just type soup. followed by the tag name. soup.h3 gives you the first <h3> on the page. soup.p gives you the first <p>. soup.table gives you the first <table>.
Notice that soup.div gives you the outermost <div> — the tabcontent one — not the inner panel div. Tag-name access always returns the first match in document order. To reach specific elements deeper in the page you will use find() and find_all(), which you will learn in the next chapter.
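A small sketch of "first match in document order", assuming a hypothetical card in which an outer "tabcontent" div wraps an inner panel div that holds two headings:

```python
from bs4 import BeautifulSoup

# Hypothetical card markup: an outer "tabcontent" div wrapping a panel div.
html = """
<div class="tabcontent">
  <div class="panel">
    <h3>First heading</h3>
    <h3>Second heading</h3>
    <p>First paragraph</p>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

print(soup.h3.get_text())     # First heading  (only the first <h3>)
print(soup.div.get("class"))  # ['tabcontent']  (the outermost div, not the panel)
```

Note that class is returned as a list, because HTML allows a tag to carry several classes.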
soup.h3.name returns the string "h3". This is useful when you are iterating over mixed elements and need to check what type each one is.
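One way to use .name when iterating, sketched on hypothetical markup: .children yields both tags and bare text nodes, so the loop keeps only Tag objects and reports the name of each.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup with three different child tags.
html = "<div><h3>Title</h3><p>Intro</p><table></table></div>"
soup = BeautifulSoup(html, "html.parser")

print(soup.h3.name)  # h3

# Skip plain text nodes and record what kind of tag each child is.
names = [child.name for child in soup.div.children if isinstance(child, Tag)]
print(names)  # ['h3', 'p', 'table']
```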
Part Three
Reading Text with get_text()
Every BeautifulSoup tag has a get_text() method. It strips away all the HTML tags inside the element and returns only the human-readable text. This is the main method you will use throughout the book to extract actual values from elements.
The difference matters. Called without arguments, get_text() joins the text pieces with no separator and keeps all of the original whitespace, which can produce strings like "\n Αττική\n Τύπος: ...\n". Passing " " as the separator and strip=True removes the surrounding whitespace from each piece and puts a single space between them, giving you a clean, readable string. Always use get_text(" ", strip=True) unless you specifically need the raw form.
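BeautifulSoup also offers a .string attribute. It returns the text only when the element's content is a single string (or a single child tag that itself wraps a single string). As soon as the element holds more than one child, for example text mixed with tags, .string returns None. Use get_text() instead, because it always works, regardless of inner structure.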
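A short sketch of where .string stops working, using two hypothetical paragraphs: one with plain text only, one mixing text and a <b> tag.

```python
from bs4 import BeautifulSoup

plain = BeautifulSoup("<p>Plain text</p>", "html.parser").p
mixed = BeautifulSoup("<p>Mixed <b>bold</b> text</p>", "html.parser").p

print(plain.string)       # Plain text  (single text child)
print(mixed.string)       # None        (three children: text, <b>, text)
print(mixed.get_text())   # Mixed bold text  (get_text() still works)
```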
Part Four
Navigating Into Nested Tags
You can chain tag-name access to navigate down into the document tree. soup.div.table.tr.td walks down the nesting: the outermost div, then the table inside it, then the row inside the table, then the first cell. Each step returns the first matching tag found anywhere inside the previous one, not only among its direct children.
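A sketch of that chain on hypothetical card markup. With "html.parser" no <tbody> is inserted, although the chain would still reach the row if one were present, since each step searches all nested tags.

```python
from bs4 import BeautifulSoup

# Hypothetical card markup: a div containing a one-row table.
html = """
<div class="tabcontent">
  <table>
    <tr><td>Location: Example Town</td><td>Second cell</td></tr>
  </table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# div -> table -> tr -> first td, each step taking the first match.
first_cell = soup.div.table.tr.td
print(first_cell.get_text(" ", strip=True))  # Location: Example Town
```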
Chaining is convenient for simple, predictable structures. In practice, real pages have more variation — sometimes a tag is missing, sometimes there are extra wrappers. The find() method you will learn next handles these cases more robustly, so it is what you will use most often.
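If any step in the chain finds no match, that step returns None, and the next .something on None raises an AttributeError. The safer version of the chain above is soup.find("td"), which you will learn in the next chapter: it searches the whole document in a single step. It still returns None when there is no <td>, so check the result before reading its text.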
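The failure mode and a guarded alternative, sketched on hypothetical markup that has no table at all:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with no <table>, so the chain breaks at .table.
soup = BeautifulSoup("<div><p>No table here</p></div>", "html.parser")

print(soup.div.table)  # None

chain_failed = False
try:
    soup.div.table.tr.td  # None.tr raises AttributeError
except AttributeError as err:
    chain_failed = True
    print("Chain failed:", err)

# find() also returns None when nothing matches, so guard before using it.
cell = soup.find("td")
text = cell.get_text(" ", strip=True) if cell is not None else None
print(text)  # None
```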
Part Five
Your Turn — Read a Card's Contents
The cell below contains a complete incident card. Use tag-name access (soup.<tag>) and get_text() to extract three things: the section heading, the location from the first cell, and the start date from the second cell.
In this chapter you learned how to create a soup object with BeautifulSoup(html, "html.parser"); how prettify() helps you inspect the structure; how to access tags directly with dot notation; how get_text(" ", strip=True) extracts clean text; and how to chain tag access to navigate nested structures. In the next chapter you will use find() and find_all() to search for elements across the whole document.