Part One
stripped_strings — Line by Line
When a tag contains nested text on separate lines, stripped_strings yields the pieces one by one, with surrounding whitespace removed and whitespace-only nodes skipped. This is useful when a card contains the date, title, and author as separate text nodes.
You get separate strings instead of one long messy string. You will often use it together with get_text(" ", strip=True) when a title contains line breaks.
To collect the pieces, call list(tag.stripped_strings). If the text is already nicely separated, keep the pieces as they are. If you want one clean sentence, use tag.get_text(" ", strip=True).
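A minimal sketch of both calls, assuming a made-up article card (the markup, class names, and values are illustrative and not taken from a real page):

```python
from bs4 import BeautifulSoup

# Hypothetical card markup, assumed for illustration only.
html = """
<div class="card">
  <time datetime="2024-05-01">May 1, 2024</time>
  <h2 class="card-title">Scraping<br>Basics</h2>
  <span class="author">Ada Example</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
card = soup.find("div", class_="card")

# Each text node, stripped; whitespace-only nodes are skipped.
pieces = list(card.stripped_strings)
print(pieces)

# The title is split by <br>, so join its pieces with a single space.
title = card.find("h2", class_="card-title").get_text(" ", strip=True)
print(title)
```

Here the list keeps the date, the two title fragments, and the author apart, while get_text(" ", strip=True) joins the title back into one string.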
Part Two
Reading Attributes
The visible title is text. The article URL and publication timestamp are attributes. Use tag.get("href") and tag.get("datetime") to read them.
Attributes often hold the most useful machine-readable values on a page. The title is for humans. The link and the ISO datetime are perfect for data collection.
.get() for safety: tag["href"] raises a KeyError if the attribute is missing. tag.get("href") returns None instead.
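The difference between the two access styles can be sketched as follows, assuming a small invented snippet (the URL and timestamp are placeholders):

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for illustration only.
html = (
    '<article>'
    '<time datetime="2024-05-01T09:30:00">May 1</time>'
    '<a href="/articles/scraping-basics">Scraping Basics</a>'
    '</article>'
)

soup = BeautifulSoup(html, "html.parser")
link = soup.find("a")
time_tag = soup.find("time")

url = link.get("href")                # attribute present: returns its value
published = time_tag.get("datetime")  # machine-readable timestamp
missing = link.get("title")           # attribute absent: returns None

# Square-bracket access raises KeyError when the attribute is absent.
try:
    link["title"]
    raised = False
except KeyError:
    raised = True

print(url, published, missing, raised)
```

Using .get() lets a scraper record None for a missing field and carry on, rather than stopping at the first incomplete card.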
Part Three
Navigating with .parent
Sometimes you find the smallest useful tag first, like span.card-title, and then need to move upward to reach the link or the full article card.
This pattern is common in scraping: locate the most specific tag first, then climb to the surrounding container.
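As a rough illustration of the climb, assuming a hypothetical card in which the title span sits inside the link and the link sits inside an article tag:

```python
from bs4 import BeautifulSoup

# Hypothetical card markup, assumed for illustration only.
html = """
<article class="card">
  <a href="/articles/scraping-basics"><span class="card-title">Scraping Basics</span></a>
</article>
"""

soup = BeautifulSoup(html, "html.parser")

# Start from the most specific tag.
title = soup.select_one("span.card-title")

# One step up reaches the link; one more reaches the whole card.
link = title.parent
card = link.parent

print(link.name, link.get("href"))
print(card.name, card.get("class"))
```

When the number of levels between the tag and its container is not known in advance, find_parent("article") is an alternative that searches upward by tag name.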
Part Four
find_next_sibling()
The article card places related fields next to each other: time, then link, then author. find_next_sibling() lets you move across that structure.
Use find_next_sibling(), not .next_sibling. The raw sibling is often just a newline or whitespace node.
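The sideways movement might look like this, assuming a hypothetical card laid out as time, then link, then author (markup and values are invented for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical card markup, assumed for illustration only.
html = """
<article class="card">
  <time datetime="2024-05-01">May 1, 2024</time>
  <a href="/articles/scraping-basics">Scraping Basics</a>
  <span class="author">Ada Example</span>
</article>
"""

soup = BeautifulSoup(html, "html.parser")
time_tag = soup.find("time")

# The raw sibling is the whitespace between </time> and <a>.
raw = time_tag.next_sibling
print(repr(raw))

# find_next_sibling() skips text nodes and returns the next matching tag.
link = time_tag.find_next_sibling("a")
author = link.find_next_sibling("span")

print(link.get("href"), author.get_text(strip=True))
```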
Part Five
Your Turn — Build One Record
Extract the date, title, author, and link from the card below, then store them in a dictionary.
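One possible solution, sketched against a hypothetical card (the markup, class names, and values are assumptions for illustration, so adjust the selectors to the real card):

```python
from bs4 import BeautifulSoup

# Hypothetical card markup, assumed for illustration only.
html = """
<article class="card">
  <time datetime="2024-05-01">May 1, 2024</time>
  <a href="/articles/scraping-basics"><span class="card-title">Scraping Basics</span></a>
  <span class="author">Ada Example</span>
</article>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the most specific tag first, then navigate from it.
title_tag = soup.select_one("span.card-title")
link_tag = title_tag.parent           # up to the <a>
card = link_tag.parent                # up to the <article>
time_tag = card.find("time")          # the date sits inside the card
author_tag = link_tag.find_next_sibling("span", class_="author")  # sideways

record = {
    "date": time_tag.get("datetime"),
    "title": title_tag.get_text(" ", strip=True),
    "author": author_tag.get_text(" ", strip=True),
    "link": link_tag.get("href"),
}
print(record)
```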
You have now seen how to read attributes such as href and datetime, how to move upward with .parent, how to move sideways with find_next_sibling(), and how to turn one article card into one Python dictionary. In the next chapter you will do this for many cards and load the result into a DataFrame.