Weekend Challenges

Extended Challenges

Data pipeline: Write a Python script that downloads a public dataset (e.g. NYC taxi data or a CSV from Our World in Data), validates every row against a schema (use pydantic or manual checks), transforms the data, and writes a cleaned output file. Handle malformed rows gracefully with a log entry and a skip.
Type annotations: Add type hints to every function in your project. Install mypy and run mypy catalogue/ --strict. Fix every error until mypy exits cleanly. Note which of the reported type errors turn out to be genuine logic bugs.
Publish a real CLI tool: Package your Week 3 CLI utility as a proper Python package with a [project.scripts] entry point in pyproject.toml. Install it locally with pip install -e . and run it by name from any directory.
Concurrency exploration: Rewrite a slow loop (e.g. fetching data from 20 URLs sequentially) using asyncio with aiohttp or httpx. Compare the wall-clock time of the sequential and async versions using time.perf_counter() or timeit.
Hypothesis property-based testing: Install hypothesis and write a property-based test for your statistics function: assert that mean is always between min and max for any non-empty list of integers. Let Hypothesis find edge cases you would not have thought of.
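For the data pipeline challenge, the validate, log, and skip loop might be sketched as follows. This is a minimal starting point using manual checks and only the standard library. The schema (country, year, value) and the function names validate_row and clean are assumptions made for illustration; replace them with the columns of the dataset you actually download.

```python
import csv
import logging

logger = logging.getLogger("pipeline")

# Hypothetical schema: column name -> converter that raises ValueError on bad input.
SCHEMA = {"country": str, "year": int, "value": float}


def validate_row(row):
    """Return a cleaned dict, or raise ValueError if the row does not fit SCHEMA."""
    cleaned = {}
    for column, convert in SCHEMA.items():
        raw = row.get(column)
        # DictReader fills short rows with None, so treat None and "" as missing.
        if raw is None or raw.strip() == "":
            raise ValueError(f"missing value for {column!r}")
        cleaned[column] = convert(raw.strip())
    return cleaned


def clean(source, destination):
    """Copy valid rows from source to destination; log and skip malformed ones.

    Both arguments are open text files (or file-like objects).
    Returns a (kept, skipped) pair of row counts.
    """
    reader = csv.DictReader(source)
    writer = csv.DictWriter(destination, fieldnames=list(SCHEMA))
    writer.writeheader()
    kept = skipped = 0
    for row in reader:
        try:
            writer.writerow(validate_row(row))
            kept += 1
        except ValueError as exc:
            # reader.line_num is the source line on which this row ended.
            logger.warning("skipping line %d: %s", reader.line_num, exc)
            skipped += 1
    return kept, skipped
```

Because clean accepts any file-like object, you can exercise it with io.StringIO in a test before pointing it at a real download. Swapping validate_row for a pydantic model is a reasonable next step once the manual version works.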
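For the type annotations challenge, here is one small illustration of how a type error can expose a logic bug. The functions find_price and total_price are hypothetical and not taken from your project. Because find_price is annotated as returning Optional[float], mypy rejects any caller that adds the result to a float without first handling None, which is exactly the case that would otherwise fail at runtime.

```python
from typing import Optional


def find_price(catalogue: dict[str, float], title: str) -> Optional[float]:
    """Return the price for title, or None if the title is not in the catalogue."""
    return catalogue.get(title)


def total_price(catalogue: dict[str, float], titles: list[str]) -> float:
    """Sum the prices of the given titles; raise KeyError for an unknown title."""
    total = 0.0
    for title in titles:
        price = find_price(catalogue, title)
        # Remove this check and mypy flags `total += price`,
        # because price may be None at that point.
        if price is None:
            raise KeyError(title)
        total += price
    return total
```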
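For the CLI packaging challenge, the entry point needs a callable to point at. The sketch below assumes a hypothetical package called mytool with a module cli.py; your Week 3 utility will have its own name and arguments. The comment at the top shows the shape of the matching [project.scripts] line, which maps a command name to "module:function".

```python
# Hypothetical file: mytool/cli.py
# The matching pyproject.toml section would look like:
#
#   [project.scripts]
#   mytool = "mytool.cli:main"
import argparse


def build_parser():
    """Build the argument parser (arguments here are placeholders)."""
    parser = argparse.ArgumentParser(prog="mytool", description="Greet someone.")
    parser.add_argument("name", nargs="?", default="world", help="who to greet")
    parser.add_argument("--shout", action="store_true", help="print in upper case")
    return parser


def main(argv=None):
    """Entry point. argv=None makes argparse read the real command line."""
    args = build_parser().parse_args(argv)
    greeting = f"Hello, {args.name}"
    if args.shout:
        greeting = greeting.upper()
    print(greeting)
    return 0  # used as the process exit status by the generated script
```

Accepting an optional argv makes main easy to call from tests, for example main(["Ada", "--shout"]), without touching sys.argv.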
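For the concurrency challenge, it can help to get the timing harness working before adding a real HTTP client. The sketch below uses only the standard library and simulates each request with asyncio.sleep, so no network access happens. The URLs, the DELAY value, and the function names are placeholders; in your own version, fake_fetch would be replaced by a real aiohttp or httpx call.

```python
import asyncio
import time

DELAY = 0.02  # simulated latency per request in seconds (an assumption)
URLS = [f"https://example.com/item/{i}" for i in range(20)]  # never contacted


async def fake_fetch(url):
    """Stand-in for a network request: wait, then return a dummy body."""
    await asyncio.sleep(DELAY)
    return f"response from {url}"


async def fetch_sequentially(urls):
    """Await each request before starting the next one."""
    return [await fake_fetch(url) for url in urls]


async def fetch_concurrently(urls):
    """Start all requests at once; gather returns results in input order."""
    return await asyncio.gather(*(fake_fetch(url) for url in urls))


def timed(coroutine_function, urls):
    """Run the coroutine function and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coroutine_function(urls))
    return results, time.perf_counter() - start
```

Calling timed(fetch_sequentially, URLS) and timed(fetch_concurrently, URLS) gives two elapsed times to compare. With simulated latency the sequential run should take roughly 20 times DELAY, while the concurrent run should take close to a single DELAY. Real network timings will be noisier.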

How does a lock file differ from a pinned requirements.txt? In what scenario could even a pinned requirements file produce a different environment on two machines?
What is the difference between a direct dependency and a transitive dependency? Who is responsible for fixing a vulnerability in a transitive dependency?
Why is mypy --strict significantly more demanding than basic type hints? What categories of bugs did it find in your code?
When would you choose asyncio, threading, or multiprocessing in Python? What is the GIL, and why does it matter for that choice?
Review your test suite: are you testing behaviour or implementation? If you refactored the internals of a function without changing its public interface, should your tests still pass?
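One way to make the last question concrete is sketched below, under the assumption that your statistics function is a simple mean; the function and test names are hypothetical. The tests check only inputs and outputs, so they say nothing about how the result is computed.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


def test_mean_of_known_values():
    # Behaviour: a known input maps to a known output.
    assert mean([1, 2, 3, 4]) == 2.5


def test_mean_of_single_value():
    assert mean([7]) == 7


def test_mean_rejects_empty_input():
    # Behaviour: empty input is an error, whatever the internals look like.
    try:
        mean([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

If you replaced the body of mean with a running-total loop, these tests should still pass unchanged. A test that instead patched sum and asserted it was called would break on that refactor, which is a sign it was testing implementation.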