Weekend Challenges

Extended Challenges

  • Data pipeline: Write a Python script that downloads a public dataset (e.g. NYC taxi data or a CSV from Our World in Data), validates every row against a schema (use pydantic or manual checks), transforms the data, and writes a cleaned output file. Handle malformed rows gracefully with a log entry and a skip.
  • Type annotations: Add type hints to every function in your project. Install mypy and run mypy catalogue/ --strict. Fix every error until mypy exits cleanly. Notice how type errors reveal logic bugs.
  • Publish a real CLI tool: Package your Week 3 CLI utility as a proper Python package with a [project.scripts] entry point in pyproject.toml. Install it locally with pip install -e . and run it by name from any directory.
  • Concurrency exploration: Rewrite a slow loop (e.g. fetching data from 20 URLs sequentially) using asyncio with aiohttp or httpx. Compare the wall-clock time of the sequential and async versions using time.perf_counter() or timeit.
  • Hypothesis property-based testing: Install hypothesis and write a property-based test for your statistics function: assert that mean is always between min and max for any non-empty list of integers. Let Hypothesis find edge cases you would not have thought of.
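For the data pipeline challenge, the validate, log, and skip loop might be sketched as follows. This is a minimal starting point that uses manual checks and only the standard library. The column names in SCHEMA, the function names validate_row and clean, and the sample rows are all hypothetical; replace them with whatever your chosen dataset actually contains.

```python
import csv
import io
import logging

logger = logging.getLogger("pipeline")

# Hypothetical schema: column name -> converter that raises ValueError on bad input.
SCHEMA = {"country": str, "year": int, "value": float}


def validate_row(row):
    """Return a typed copy of the row, or raise ValueError if it breaks the schema."""
    cleaned = {}
    for column, convert in SCHEMA.items():
        raw = row.get(column)
        # csv.DictReader fills missing trailing fields with None.
        if raw is None or raw.strip() == "":
            raise ValueError(f"missing value for {column!r}")
        cleaned[column] = convert(raw.strip())
    return cleaned


def clean(lines):
    """Validate every row; return (good_rows, number_of_skipped_rows)."""
    good, skipped = [], 0
    # Line 1 is the header, so data starts at line 2
    # (assumes no quoted fields that span several lines).
    for line_no, row in enumerate(csv.DictReader(lines), start=2):
        try:
            good.append(validate_row(row))
        except ValueError as exc:
            logger.warning("skipping line %d: %s", line_no, exc)
            skipped += 1
    return good, skipped


# Made-up sample data: two valid rows, one bad year, one missing value.
SAMPLE = """country,year,value
Kenya,2020,53.7
Chile,20x1,19.1
Peru,2021,
Japan,2019,125.8
"""

if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    rows, skipped = clean(io.StringIO(SAMPLE))
    print(f"kept {len(rows)} rows, skipped {skipped}")
```

Because clean accepts any iterable of lines, the same function works on an open file handle once you have downloaded a real dataset. Swapping validate_row for a pydantic model is a natural next step.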

Reflection

  • How does a lock file differ from a pinned requirements.txt? In what scenario could even a pinned requirements file produce a different environment on two machines?
  • What is the difference between a direct dependency and a transitive dependency? Who is responsible for fixing a vulnerability in a transitive dependency?
  • Why is mypy --strict significantly more demanding than basic type hints? What categories of bugs did it find in your code?
  • When would you choose asyncio, threading, or multiprocessing in Python? What is the GIL, and why does it matter for that choice?
  • Review your test suite: are you testing behaviour or implementation? If you refactored the internals of a function without changing its public interface, should your tests still pass?
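As a starting point for the asyncio question above, a small timing experiment can ground the discussion before you touch real network code. The sketch below simulates I/O with asyncio.sleep instead of fetching URLs. The names fake_fetch, run_sequential, run_concurrent, and timed, and the DELAY value, are assumptions made for illustration.

```python
import asyncio
import time

DELAY = 0.05  # seconds of simulated I/O wait per task (an assumption, not a real fetch)


async def fake_fetch(i):
    """Stand-in for a network request: wait, then return the task's index."""
    await asyncio.sleep(DELAY)
    return i


async def run_sequential(n):
    # Each await finishes before the next task starts.
    return [await fake_fetch(i) for i in range(n)]


async def run_concurrent(n):
    # gather schedules all tasks at once and returns results in argument order.
    return list(await asyncio.gather(*(fake_fetch(i) for i in range(n))))


def timed(coro_fn, n):
    """Run coro_fn(n) in a fresh event loop; return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(n))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (run_sequential, run_concurrent):
        _, elapsed = timed(fn, 20)
        print(f"{fn.__name__}: {elapsed:.2f}s")
```

The concurrent version wins here because the tasks spend their time waiting, not computing. Try replacing the sleep with a CPU-heavy loop and see whether the gap survives; that result bears directly on the GIL part of the question.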