Book 5 — Web Scraping with Python

Python for All

Chapter Five — Searching by Class and id

Thanasis Troboukis

HTML elements can carry a class attribute, which groups them (usually for styling), and an id attribute, which uniquely identifies one of them. These two attributes are how a scraper tells "all the red panels" from "all the green panels", that is, active fires from contained ones.

Searching by Class Name

The class_ argument (note the underscore — Python's built-in class keyword would conflict without it) lets you filter by CSS class name. Pass it to find() or find_all() alongside a tag name and you get only the elements that have that class:
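
A minimal sketch of that filter follows. The incident page is invented for illustration (the place names and markup are assumptions, not the chapter's original example), and it uses Python's built-in "html.parser" so no extra parser is required.

```python
from bs4 import BeautifulSoup

# Hypothetical incident page, invented for this example.
html = """
<div class="panel panel-red"><h3>Pine Ridge</h3></div>
<div class="panel panel-green"><h3>Lake Road</h3></div>
<div class="panel panel-red"><h3>North Valley</h3></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Tag name plus class_ keeps only the <div> elements carrying that class.
active_panels = soup.find_all("div", class_="panel-red")
active_names = [panel.h3.get_text(strip=True) for panel in active_panels]
print(active_names)  # ['Pine Ridge', 'North Valley']

# find() accepts the same arguments but returns only the first match (or None).
first_contained = soup.find("div", class_="panel-green")
first_contained_name = first_contained.h3.get_text(strip=True)
print(first_contained_name)  # Lake Road
```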

An element can have multiple class names at once — class="panel panel-red" means the element belongs to both the panel group and the panel-red group. BeautifulSoup's class_ argument matches if the specified name is anywhere in the class list, so class_="panel-red" correctly finds elements whose class attribute is "panel panel-red".
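
The matching rule can be checked on a two-element snippet. This is a small sketch on made-up markup: one element has both classes, the other has only panel.

```python
from bs4 import BeautifulSoup

# Made-up markup: the first <div> has two classes, the second has one.
html = '<div class="panel panel-red">Active</div><div class="panel">Plain</div>'
soup = BeautifulSoup(html, "html.parser")

# "panel-red" is found even though it is the second name in the class list.
red_panels = soup.find_all("div", class_="panel-red")

# "panel" matches both elements, because both list it among their classes.
all_panels = soup.find_all("div", class_="panel")

print(len(red_panels), len(all_panels))  # 1 2
```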

class_ vs class: Python uses class as a keyword for defining new classes. BeautifulSoup uses class_ (with a trailing underscore) as the argument name to avoid the conflict. Do not forget the underscore — class= will raise a SyntaxError.

Reading Status from the Class Name

The class name is data, not just a styling hook. When an incident card uses panel-red for active fires and panel-green for contained ones, the class name itself tells you the status. You can read it with tag.get("class"), which returns the element's classes as a Python list of strings:
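
One way to turn class names into a status is a lookup dictionary plus a loop over the class list. The sketch below assumes a made-up page and a hypothetical STATUS_BY_CLASS mapping; neither comes from the chapter's original example.

```python
from bs4 import BeautifulSoup

# Made-up incident cards for illustration.
html = """
<div class="panel panel-red"><h3>Pine Ridge</h3></div>
<div class="panel panel-green"><h3>Lake Road</h3></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Hypothetical mapping from CSS class name to incident status.
STATUS_BY_CLASS = {"panel-red": "active", "panel-green": "contained"}

statuses = {}
for panel in soup.find_all("div", class_="panel"):
    class_names = panel.get("class") or []  # e.g. ['panel', 'panel-red']
    status = "unknown"
    for name in class_names:
        if name in STATUS_BY_CLASS:
            status = STATUS_BY_CLASS[name]
            break  # stop at the first class that carries a status
    statuses[panel.h3.get_text(strip=True)] = status

print(statuses)  # {'Pine Ridge': 'active', 'Lake Road': 'contained'}
```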

.get("class") returns a list like ["panel", "panel-red"]. The loop checks each class name in the list against the status dictionary and sets the status when it finds a match. This is cleaner than hardcoding "panel-red" in every comparison.

.get() vs ["class"]: You can access an attribute with tag["class"] (like a dictionary) or with tag.get("class"). The difference: tag["class"] raises a KeyError if the attribute is missing; tag.get("class") returns None instead. Use .get() for any attribute that might be absent.
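
The difference is easy to see on a tag that has no class attribute at all. This is a minimal sketch on made-up markup:

```python
from bs4 import BeautifulSoup

# Made-up markup: the second <div> has no class attribute.
soup = BeautifulSoup('<div class="panel">A</div><div>B</div>', "html.parser")
with_class, without_class = soup.find_all("div")

print(with_class["class"])          # ['panel']
print(without_class.get("class"))   # None

# Dictionary-style access fails when the attribute is missing.
try:
    without_class["class"]
    raised_key_error = False
except KeyError:
    raised_key_error = True
print(raised_key_error)  # True

# .get() also accepts a default, which keeps later loops simple.
safe_classes = without_class.get("class", [])
print(safe_classes)  # []
```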

Searching by id

The id attribute uniquely identifies one element on a page. When you know the id, a single find() call returns that element, with no need to filter by tag name or class. Pass id="the-id" to find():
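
A short sketch of an id lookup followed by a scoped search. The ids "forest-fires" and "urban-fires" and the page content are assumptions made for this example; substitute the ids your page actually uses.

```python
from bs4 import BeautifulSoup

# Made-up page with two sections; the id values are hypothetical.
html = """
<section id="forest-fires">
  <div class="panel panel-red"><h3>Pine Ridge</h3></div>
  <div class="panel panel-green"><h3>Lake Road</h3></div>
</section>
<section id="urban-fires">
  <div class="panel panel-red"><h3>Harbor District</h3></div>
</section>
"""
soup = BeautifulSoup(html, "html.parser")

# find(id=...) returns the single element with that id, or None if absent.
forest_section = soup.find(id="forest-fires")

# Searching from forest_section only looks inside that section.
forest_panels = forest_section.find_all("div", class_="panel")
forest_names = [panel.h3.get_text(strip=True) for panel in forest_panels]
print(forest_names)  # ['Pine Ridge', 'Lake Road']

# Searching from soup covers the whole page, including the urban section.
page_panel_count = len(soup.find_all("div", class_="panel"))
print(page_panel_count)  # 3
```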

Once you have the section by id, every subsequent search is scoped to that section. Calling forest_section.find_all("div", class_="panel") returns only the panels inside the forest fires section, not the panel in the urban fires section.

id is unique — class is a group: Only one element per page should have a given id. Any number of elements can share the same class name. Use id= when you need to pinpoint one specific element; use class_= when you want to collect a category of elements.

CSS Selectors with select()

BeautifulSoup also supports CSS selector syntax through the select() method. If you have ever written CSS stylesheets, you already know these patterns. select() always returns a list; select_one() returns the first match or None.
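
The sketch below exercises both methods on a made-up page; the markup, the id, and the dates are invented for illustration. Selectors follow ordinary CSS rules: "div.panel-red" means a div with that class, a space means "descendant of", and "#name" means an id.

```python
from bs4 import BeautifulSoup

# Made-up page; markup, id, and dates are invented for this example.
html = """
<section id="forest-fires">
  <div class="panel panel-red">
    <table><tr><td>Pine Ridge</td><td>2024-07-14</td></tr></table>
  </div>
  <div class="panel panel-green">
    <table><tr><td>Lake Road</td><td>2024-07-02</td></tr></table>
  </div>
</section>
"""
soup = BeautifulSoup(html, "html.parser")

# Every <td> nested anywhere inside a <div> that has the class panel-red.
red_cells = [td.get_text(strip=True) for td in soup.select("div.panel-red td")]
print(red_cells)  # ['Pine Ridge', '2024-07-14']

# select_one() returns the first match; "#..." selects by id.
section = soup.select_one("#forest-fires")
print(section.name)  # section

# No match: select() gives an empty list, select_one() gives None.
no_matches = soup.select("div.panel-blue")
no_match = soup.select_one("div.panel-blue")
print(len(no_matches), no_match)  # 0 None
```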

select() or find_all(): which to use? Given equivalent criteria, they return the same elements. Use select() when CSS selector syntax is more concise, especially when a single expression such as "div.panel-red td" combines tag, class, and nesting. Use find_all() when you prefer explicit keyword arguments. Both are correct.
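
To make the comparison concrete, here is the same query written both ways on a made-up page. It is a sketch, not a benchmark: the point is only that the nested find_all() calls and the single selector collect the same cells.

```python
from bs4 import BeautifulSoup

# Made-up page for illustration.
html = """
<div class="panel panel-red"><table><tr><td>Pine Ridge</td></tr></table></div>
<div class="panel panel-green"><table><tr><td>Lake Road</td></tr></table></div>
<div class="panel panel-red"><table><tr><td>North Valley</td></tr></table></div>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS selector: one expression covers tag, class, and nesting.
cells_via_select = [td.get_text(strip=True) for td in soup.select("div.panel-red td")]

# Keyword arguments: the nesting becomes an explicit inner search.
cells_via_find_all = [
    td.get_text(strip=True)
    for div in soup.find_all("div", class_="panel-red")
    for td in div.find_all("td")
]

print(cells_via_select)                        # ['Pine Ridge', 'North Valley']
print(cells_via_select == cells_via_find_all)  # True
```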

Your Turn — Active Fires Only

Use find_all() with class_="panel-red" to extract only the active incidents from the page below. For each active panel, print the location and start date.
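
If you want something to practice on, the sketch below uses a stand-in page and shows one possible approach. The markup is an assumption: the span elements with classes "location" and "start-date", the place names, and the dates are all invented, so adjust the inner searches to match the structure of the page you are actually given.

```python
from bs4 import BeautifulSoup

# Stand-in page; structure and values are invented for practice.
html = """
<div class="panel panel-red">
  <span class="location">Pine Ridge</span>
  <span class="start-date">2024-07-14</span>
</div>
<div class="panel panel-green">
  <span class="location">Lake Road</span>
  <span class="start-date">2024-07-02</span>
</div>
<div class="panel panel-red">
  <span class="location">North Valley</span>
  <span class="start-date">2024-07-19</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

active_incidents = []
for panel in soup.find_all("div", class_="panel-red"):
    # Each inner search is scoped to the current panel only.
    location = panel.find("span", class_="location").get_text(strip=True)
    start_date = panel.find("span", class_="start-date").get_text(strip=True)
    active_incidents.append((location, start_date))
    print(f"{location}: started {start_date}")
```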

What you learned in this chapter: how to use class_= to filter elements by CSS class name; how to read the class list from a tag with .get("class"); how to jump to a unique element with find(id="..."); and how select() lets you use CSS selector syntax for concise targeting. In the next chapter you will go deeper into extracting text and attributes — including link URLs and custom data attributes.
