Book 4 — Data Analysis with Python

Python for All

Chapter Seven — Sorting and New Columns

Thanasis Troboukis  ·  All Books

Book Four · Chapter Seven

Sorting and New Columns

Sorting makes patterns visible. Adding computed columns turns raw prices into derived insights — price per gram, affordability scores, percentage change. This chapter covers both.

.sort_values() — Ordering Your Data

A table of prices sorted from cheapest to most expensive is far more readable than the same data in arbitrary order. .sort_values() sorts a DataFrame by one or more columns.

Python · Try it — sort ascending (cheapest first)

      
Python · Try it — sort descending (most expensive first)

      

Sort by multiple columns

You can sort by more than one column. The first column is the primary sort key; tied rows are then broken by the second column.

Python · Try it — sort by category, then by price

      

Adding New Columns

One of the most common data analysis tasks is creating a derived column — a new column calculated from existing ones. You create it simply by assigning to a new column name.

A price-per-100g column

Our dataset has prices for items sold in different weights. To compare fairly, we need a common unit. The data here uses fixed standard weights for illustration.

The calculation happens in two steps. First, price / weight_g gives the price for 1 gram. For example, bread costs 1.29 / 400 = 0.003225, so one gram of bread costs about €0.003225. That number is correct, but it is awkward to read because it is so small.

Then we multiply by 100 to convert the result from price per 1 gram into price per 100 grams. So for bread, 0.003225 × 100 = 0.3225, which means the bread costs about €0.3225 per 100g. That is a much easier unit for comparing foods.

Python · Try it — calculated column

      

When you sort by price_per_100g, the true value ranking is different from the raw price ranking. Rice looks cheap at €0.89/kg, but so does pasta once you normalise by weight. Salmon and cheese are significantly more expensive per gram — an insight hidden in the raw prices.

A price-above-average flag

Sometimes you do not need another number. You just want a quick label that says whether each item is above the average price or not. That is what a flag does. It creates a new column of True and False values. If an item's price is greater than the mean price, the flag is True. Otherwise it is False.

Python · Try it — boolean column

      

.loc[] — Classifying Prices into Bands

Sometimes you want to turn numeric values into labels — for example, grouping prices into bands such as "budget", "mid-range", and "premium". You can do this with a new column plus a few .loc[] assignments.

Python · Try it

      
Order matters: Start with the broadest default label, then overwrite it with narrower conditions. Here we begin with "Premium", then replace some rows with "Mid-range", then replace the cheapest rows with "Budget".

String Methods with .str

pandas exposes Python string methods on text columns through the .str accessor. This lets you clean, search, and transform text without writing loops.

Method What It Does Example
.str.upper() Converts text to uppercase. "Olive Oil" becomes "OLIVE OIL".
.str.lower() Converts text to lowercase. "Milk" becomes "milk".
.str.len() Counts how many characters each string has. "Olive Oil" has length 9.
.str.contains("...") Checks whether text contains a pattern. .str.contains(" ") finds names with a space, such as "Olive Oil".
.str.startswith("...") Checks whether text begins with certain characters. .str.startswith("M") matches "Milk".
.str.endswith("...") Checks whether text ends with certain characters. .str.endswith("d") matches "Bread".
.str.replace(old, new) Replaces one piece of text with another. "Olive Oil" can become "Olive-Oil".
.str.strip() Removes spaces from the beginning and end of text. " Milk " becomes "Milk".
What does regex=True mean? It tells pandas to treat the search text as a pattern, not as ordinary text. For example, df["item"].str.contains("^B", regex=True) means “find names that start with the letter B”, so it matches "Bread" and "Butter". If you want pandas to look for the exact characters only, use regex=False.
Python · Try it

      

Your Turn — Build a Price Index

Create a "value score" column that divides each item's price by the overall mean price. A score below 1.0 means the item is cheaper than average; above 1.0 means it is more expensive. Sort the result from best value to worst.

Python · Your turn

      
What you learned in this chapter: sorting with .sort_values() by one or multiple columns; creating new columns via arithmetic on existing ones; classifying values into categories with .loc[] and boolean conditions; and using .str methods to work with text columns. In the final chapter, you will learn to handle missing data and put all these skills together in a complete analysis.

Chapter Navigation

Move between chapters.

Loading Python environment — this may take a moment…