Python Mastery

Every concept, topic by topic. From your first variable to building data structures, ML-ready.

01 The Basics

What is Python
Python is a high-level, interpreted, dynamically typed, garbage-collected, multi-paradigm programming language created by Guido van Rossum and first released in 1991. It was designed around code readability, enforced by significant whitespace (indentation instead of braces) and summarized in a guiding document called The Zen of Python (import this). CPython is the reference implementation, but alternative implementations exist: PyPy (JIT-compiled), Jython (runs on the JVM), IronPython (.NET), and MicroPython (embedded). Python has ranked at or near the top of the TIOBE and PYPL indexes and the Stack Overflow developer surveys in recent years, which makes it one of the most popular programming languages in the world.
Core design philosophy
  • Readability counts: Code is read far more often than written — Python prioritizes clean, English-like syntax.
  • Batteries included: The standard library ships with modules for JSON, HTTP, sockets, threading, regex, SQLite, CSV, XML, hashing, and more, with no extra installs needed.
  • Duck typing: "If it walks like a duck and quacks like a duck, it's a duck." Type compatibility is determined at runtime by available methods, not declared types.
  • Everything is an object: Functions, classes, modules, numbers, and even types themselves are first-class objects you can pass around, inspect, and modify.
  • One obvious way: Prefer one clear idiomatic solution over many clever ones.
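
The duck-typing bullet above can be illustrated with a minimal sketch. The class and function names (Duck, Robot, make_it_speak) are hypothetical and chosen only for this example.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def make_it_speak(thing):
    # No isinstance check: any object with a speak() method is accepted.
    return thing.speak()


sounds = [make_it_speak(obj) for obj in (Duck(), Robot())]
print(sounds)  # ['Quack', 'Beep']

# An object without speak() fails only when the call is attempted.
try:
    make_it_speak(42)
    failed = False
except AttributeError:
    failed = True
print(failed)  # True
```

The two classes share no base class; compatibility is decided at the moment thing.speak() is looked up.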
How it differs
  • vs JavaScript: Python uses indentation instead of { }, has true integers with arbitrary precision (not all numbers are float64), and the standard library is vast. JS has event-loop concurrency by default; Python requires explicit asyncio.
  • vs Java: No compile step, no static types required, no verbose boilerplate. A Python "hello world" is print("hi") vs Java's class+main+System.out. Python has always shipped with a REPL; Java only gained one (jshell) in Java 9.
  • vs Go: Python is dynamically typed and interpreted; Go is statically typed and compiles to a native binary. Go starts faster and has true parallelism; Python has the GIL limiting CPU parallelism.
  • vs C/C++: No manual memory management, no pointers, no headers, no compile step. Pure Python is often 10-100× slower on CPU-bound work, but programs are usually much quicker to write.
  • vs Ruby: Closest cousin philosophically. Python emphasizes "one way to do it," Ruby embraces "many ways." Python dominates data science; Ruby dominates web (Rails).
Why use Python
Python dominates in data science, machine learning, AI, scripting, automation, scientific computing, web backends, DevOps, and education. The ecosystem is hard to match: NumPy, Pandas, PyTorch, TensorFlow, scikit-learn, Django, Flask, FastAPI, Requests, Jupyter. Python is the glue language: well suited to rapid prototyping, ad-hoc data analysis, and bolting together C/C++ libraries via bindings. It is also one of the most widely used teaching languages in universities.
Common gotchas
  • Indentation matters: Mixing tabs and spaces inconsistently raises TabError (a subclass of IndentationError). Use 4 spaces per level (PEP 8).
  • The GIL: In standard CPython only one thread executes Python bytecode at a time, so real CPU parallelism requires multiprocessing or native extensions that release the GIL.
  • Mutable default arguments: def f(x=[]) shares the same list across calls — a classic footgun.
  • Speed: Pure Python is often 10-100× slower than C on CPU-bound loops. Hot loops need NumPy, Cython, or Numba.
Real-world users
Google (early core infra, YouTube), Instagram (Django-based backend), Spotify (data pipelines + backend services), Dropbox (desktop client + server), Netflix (ML infra), NASA (scientific computing), OpenAI (GPT training pipelines), Meta (ML research via PyTorch), Reddit (originally), Pinterest, Quora, and most AI/ML companies.

What is Python?

Python is an interpreted, dynamically typed, garbage-collected language.

# How Python runs your code behind the scenes:
# 1. Lexer tokenizes your source code
# 2. Parser builds an AST (Abstract Syntax Tree)
# 3. Compiler turns AST → bytecode (cached in .pyc files for imported modules)
# 4. Python Virtual Machine (PVM) executes bytecode

# You can actually SEE the bytecode:
import dis
dis.dis(lambda: print("hello"))
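# Output from CPython 3.10 or earlier; opcode names and indexes vary by
# version (3.11+ shows CALL instead of CALL_FUNCTION):
# LOAD_GLOBAL    0 (print)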
# LOAD_CONST     1 ('hello')
# CALL_FUNCTION  1
# RETURN_VALUE
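
The four stages listed above can also be driven by hand with the standard ast module and the built-in compile() and exec(). This is a minimal sketch; the source string and the "<example>" filename are placeholders.

```python
import ast

source = "total = 2 + 3"

tree = ast.parse(source)                    # steps 1-2: tokenize and parse into an AST
code = compile(tree, "<example>", "exec")   # step 3: AST -> code object holding bytecode

namespace = {}
exec(code, namespace)                       # step 4: the interpreter executes the bytecode

print(type(tree).__name__)   # Module
print(type(code).__name__)   # code
print(namespace["total"])    # 5
```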

Variables & Data Types

A variable is a name that points to an object in memory. It's a label stuck on a box, not a box itself.

# ── Basic types ──
name = "Yatin"           # str
age = 25                  # int
height = 5.9              # float
is_dev = True             # bool
nothing = None            # NoneType

print(type(name))    # <class 'str'>
print(type(age))     # <class 'int'>

# ── Everything is an object ──
x = 42
print(id(x))      # memory address
print(type(x))    # <class 'int'>

# Two names can point to the SAME object
y = x
print(id(x) == id(y))  # True

# CPython caches small integers (-5 to 256); this is an implementation detail,
# so compare numbers with ==, not is
a = 256
b = 256
print(a is b)  # True — same cached object
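
Because a variable is a label rather than a box, assigning one name to another never copies the object. A minimal sketch of what that means for a mutable list:

```python
a = [1, 2, 3]
b = a            # b is a second name for the same list, not a copy
b.append(4)

print(a)         # [1, 2, 3, 4]
print(a is b)    # True

c = list(a)      # list() builds a new list holding the same items
c.append(5)

print(a)         # [1, 2, 3, 4]
print(a is c)    # False
```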

Mutable vs Immutable

Immutable objects can't be changed after creation — modifying them creates a new object. Mutable objects can be changed in-place.

Immutable (can't change): int, float, bool, str, tuple, frozenset
Mutable (can change in-place): list, dict, set, bytearray

# Immutable — "changing" creates a NEW object
s = "hello"
print(id(s))      # 4377001200
s = s + " world"
print(id(s))      # 4377055600 — different object!

# Mutable — you modify the SAME object
lst = [1, 2, 3]
print(id(lst))    # 4377102400
lst.append(4)
print(id(lst))    # 4377102400 — same object!
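
The same distinction shows up when objects are passed to functions. The sketch below uses a hypothetical helper, add_exclamation, to contrast rebinding an immutable str with mutating a list in place.

```python
def add_exclamation(text, words):
    text += "!"          # str is immutable: this rebinds the local name only
    words.append("!")    # list is mutable: this changes the caller's object
    return text


greeting = "hi"
parts = ["hi"]
result = add_exclamation(greeting, parts)

print(greeting)  # hi
print(result)    # hi!
print(parts)     # ['hi', '!']

# A tuple is immutable, but it can still hold a mutable object.
record = ("scores", [90, 85])
record[1].append(70)
print(record)    # ('scores', [90, 85, 70])
```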

Operators

Symbols that perform operations on values — arithmetic, comparison, logic, and assignment.

# ── Arithmetic ──
print(10 / 3)     # 3.333 (true division — ALWAYS float)
print(10 // 3)    # 3     (floor division)
print(10 % 3)     # 1     (modulo)
print(10 ** 3)    # 1000  (power)

# ── Identity: is vs == ──
a = [1, 2]
b = [1, 2]
print(a == b)     # True  — same VALUE
print(a is b)     # False — different OBJECTS

# ── Short-circuit ──
print("" or "default")       # "default" — common pattern!
print("hello" and "world")   # "world"

# ── Walrus operator := (3.8+) ──
data = [1, 2, 3, 4, 5]
if (n := len(data)) > 3:
    print(f"Too many: {n}")  # Too many: 5

# ── Truthy / Falsy ──
# Falsy: False, None, 0, 0.0, "", [], (), {}, set(), range(0)
# Almost everything else is truthy (classes can override this with __bool__ or __len__)
print(bool([]))       # False
print(bool([0]))      # True — list has an element
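
One detail the arithmetic examples above do not show is how floor division and modulo behave with negative operands. A short sketch:

```python
print(7 // 2)        # 3
print(-7 // 2)       # -4  (floors toward negative infinity, not toward zero)
print(-7 % 2)        # 1   (result takes the sign of the divisor)
print(7 % -2)        # -1
print(int(-7 / 2))   # -3  (int() truncates toward zero)

# divmod returns quotient and remainder together;
# they always satisfy a == q * b + r.
q, r = divmod(-7, 2)
print(q, r)          # -4 1
```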

Type Conversion

Convert between types using built-in functions like int(), str(), float(), and list().

x = "42"
print(int(x))       # 42
print(float(x))     # 42.0
print(str(42))      # "42"
print(bool(0))      # False
print(list("abc"))  # ['a', 'b', 'c']
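
Conversions can also fail or behave in ways that surprise newcomers. The sketch below shows a few edge cases; to_int is a hypothetical helper, not a built-in.

```python
print(int(3.99))        # 3    (truncates, does not round)
print(round(3.99))      # 4
print(int("ff", 16))    # 255  (second argument is the base)
print(bool("False"))    # True (any non-empty string is truthy)


def to_int(text, default=None):
    """Hypothetical helper: parse text as an int, or return default."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return default


print(to_int("42"))      # 42
print(to_int("3.7"))     # None (int() rejects a decimal string)
print(to_int(None, 0))   # 0
```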

02 Strings

What is it
A Python str is an immutable sequence of Unicode code points. Since Python 3, all strings are Unicode by default; raw binary data lives in the separate bytes type. Strings support indexing, slicing, iteration, and a rich set of methods. Because they are immutable, every "modification" creates a new string object.
Key features
  • f-strings: f"hello {name}", introduced in Python 3.6 and now the preferred formatting style. Supports inline expressions, format specs, and (since 3.8) the debug syntax f"{x=}".
  • Multiple quote styles: Single ', double ", triple ''' or """ for multiline. Raw strings such as r"\n" keep backslashes literal instead of treating them as escapes.
  • Unicode-native: len("café") is 4, len("café".encode()) is 5 (UTF-8).
  • Rich methods: .strip(), .split(), .join(), .replace(), .upper(), .startswith(), .format(), plus 40+ more.
  • Interning: Short strings and identifiers are automatically interned — "foo" is "foo" is often True (implementation detail).
How it differs
  • vs JavaScript: JS strings are also immutable, but Python distinguishes str from bytes. Python has triple-quoted multiline strings built in; JS uses template literals.
  • vs Java: Java strings are also immutable UTF-16, but Python's f-strings are far more ergonomic than Java's String.format(). Python slicing s[1:5] beats s.substring(1,5).
  • vs Go: Go strings are immutable UTF-8 byte sequences — indexing gives bytes, not characters. Python indexing gives code points.
  • vs C/C++: No null-terminator, no buffer-overflow risk, no manual allocation. Python pays with higher memory overhead (roughly 40-50 bytes for an empty string on 64-bit CPython, depending on the version).
Why immutability matters
Immutability makes strings hashable (usable as dict keys), safe to share between threads, and eligible for interning optimizations. The downside: repeated concatenation in a loop can degrade to O(n²), so prefer "".join(list_of_strings) over s += x when building large strings.
Common gotchas
  • Encoding confusion: str ≠ bytes. Opening a text file without encoding="utf-8" uses the platform's default encoding, which is a bug factory on Windows.
  • String + int: "age: " + 25 raises TypeError. Unlike JS, Python does not auto-coerce.
  • Concatenation in loops: O(n²) behavior — use join() or io.StringIO.
  • Comparing with is: s is "hello" may work due to interning but is not guaranteed. Always use ==.
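
Three of the gotchas above (str versus bytes, join versus repeated concatenation, and explicit conversion before concatenating) can be sketched briefly. The sample values are illustrative only.

```python
text = "caf\u00e9"                 # "café" written with a precomposed é
data = text.encode("utf-8")        # str -> bytes

print(len(text))                   # 4 (code points)
print(len(data))                   # 5 (é takes two bytes in UTF-8)
print(data.decode("utf-8") == text)  # True

# Build a string from many pieces with join instead of repeated +=
pieces = [str(n) for n in range(5)]
joined = ",".join(pieces)
print(joined)                      # 0,1,2,3,4

# Mixing str and int needs an explicit conversion
age = 25
label = "age: " + str(age)
print(label)                       # age: 25
```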
Real-world examples
f-strings are ubiquitous in logging (log.info(f"user {uid} logged in")) and message formatting. For SQL, pass values as query parameters rather than interpolating them into the query string. str.format_map() fills a format string directly from a mapping, and re (regex) and string.Template cover pattern matching and simple $-style substitution.

Strings are immutable sequences of Unicode characters. Every operation creates a new string.

# ── Creation ──
s1 = 'single'
s2 = "double"                           # identical
s3 = """multi
line"""
s4 = r"raw: \n stays literal"           # no escape processing

# ── f-strings (THE way to format) ──
name = "Yatin"
print(f"Hello {name}")                  # Hello Yatin
print(f"{3.14159:.2f}")                  # 3.14
print(f"{1000000:,}")                    # 1,000,000
print(f"{'hello':*^20}")                 # *******hello********

# ── Indexing & Slicing ──
s = "Python"
print(s[0])        # P
print(s[-1])       # n
print(s[0:3])      # Pyt    (stop is EXCLUDED)
print(s[::-1])      # nohtyP (reverse)

# ── Common Methods ──
s = "  Hello, World!  "
s.strip()            # "Hello, World!"
s.lower()            # "  hello, world!  "
s.upper()            # "  HELLO, WORLD!  "
s.strip().split(",") # ['Hello', ' World!']
"-".join(["a","b"])  # "a-b"
"hello".replace("l", "L")   # "heLLo"
"hello".startswith("he")    # True
"hello".find("ll")          # 2 (index, -1 if not found)
"42".isdigit()               # True

03 Control Flow

What is it
Control flow determines the order of execution: conditionals (if/elif/else), loops (for, while), loop control (break, continue, else-on-loop), and structural pattern matching (match/case, added in Python 3.10). Python uses indentation, typically 4 spaces, to define blocks, eliminating the need for curly braces.
Key constructs
  • if/elif/else: Standard conditionals. No parentheses required. elif is the keyword (not else if).
  • for item in iterable: Pythonic iteration — always iterates over an iterable, never a C-style counter. Use range(n) or enumerate() for index.
  • while: Runs until condition is false. Common pattern: while True: ... if done: break.
  • match/case: Structural pattern matching that destructures sequences, mappings, and class instances. More powerful than C's switch.
  • Loop else: The else clause on a loop runs if the loop completes without break — a unique and often confusing Python feature.
  • Ternary: x if cond else y — inline conditional expression.
  • Walrus :=: Assignment expression (3.8+) — if (n := len(a)) > 10:.
How it differs
  • vs C/Java/Go: No braces — indentation is syntactically significant. No C-style for(i=0; i<n; i++) — use for i in range(n). No switch until 3.10's match.
  • vs JavaScript: No implicit truthy-to-boolean coercion surprises ([] is falsy in Python; truthy in JS). Python's for...of equivalent is just for.
  • vs Ruby: Similar philosophy but Ruby uses end keywords; Python uses dedent. Ruby has unless; Python does not.
Why use it
Python's control flow is optimized for readability. for item in collection is cleaner than managing an index variable. The walrus operator eliminates duplicate work in while loops reading from files or sockets. Pattern matching (match) is ideal for parsing ASTs, protocol messages, and tagged unions.
Common gotchas
  • Mutating during iteration: Modifying a list while iterating over it causes silent bugs — iterate over a copy (list(lst)) instead.
  • Loop variable leaks: Unlike JavaScript let, the loop variable persists after the loop ends.
  • Loop else: Counter-intuitive — runs only if no break occurred.
  • Chained comparisons: 1 < x < 10 means 1 < x and x < 10 (in C the same expression parses as (1 < x) < 10, which is always true).
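
The first gotcha, mutating a list while iterating over it, is easiest to see in a small sketch. The list contents are arbitrary.

```python
nums = [1, 2, 2, 3, 4]

# Buggy: removing while iterating skips the element after each removal.
buggy = list(nums)
for n in buggy:
    if n % 2 == 0:
        buggy.remove(n)
print(buggy)      # [1, 2, 3]  (one 2 survived)

# Safe: build a new list with a comprehension.
odds = [n for n in nums if n % 2 != 0]
print(odds)       # [1, 3]

# Safe: iterate over a copy when in-place removal is required.
in_place = list(nums)
for n in list(in_place):
    if n % 2 == 0:
        in_place.remove(n)
print(in_place)   # [1, 3]
```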
Real-world examples
Pattern matching is used heavily in interpreter/compiler projects (e.g., walking JSON or AST trees). Walrus is common in while chunk := f.read(4096): streaming loops. Loop-else is used in search loops where fallback logic should run only if nothing was found.
# ── if / elif / else ──
score = 85
if score >= 90:
    grade = "A"
elif score >= 80:
    grade = "B"
else:
    grade = "F"

# ── Ternary ──
age = 25
status = "adult" if age >= 18 else "minor"   # "adult"

# ── match/case (3.10+) — Structural Pattern Matching ──
point = (3, 4)
match point:
    case (0, 0):
        print("Origin")
    case (x, 0):
        print(f"On x-axis at {x}")
    case (x, y):
        print(f"Point at {x}, {y}")  # Point at 3, 4

# ── Loops ──
for fruit in ["apple", "banana"]:
    print(fruit)

for i in range(5):          # 0, 1, 2, 3, 4
    print(i)

for i in range(2, 10, 3): # 2, 5, 8  (start, stop, step)
    print(i)

# ── enumerate & zip ──
fruits = ["apple", "banana"]
for i, f in enumerate(fruits):
    print(f"{i}: {f}")          # 0: apple, 1: banana

names = ["Alice", "Bob"]
scores = [90, 85]
for name, score in zip(names, scores):
    print(f"{name}: {score}")

# ── while ──
count = 0
while count < 5:
    count += 1

# ── break, continue, for...else ──
for n in range(10):
    if n == 3: continue    # skip 3
    if n == 7: break       # stop at 7
    print(n)

# for...else — else runs if loop completed WITHOUT break
for n in range(2, 10):
    for x in range(2, n):
        if n % x == 0: break
    else:
        print(f"{n} is prime")
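
Two patterns mentioned in this section, the walrus-based streaming loop and the for...else search loop, can be sketched as follows. io.BytesIO stands in for a real file or socket, and find_first_negative is a hypothetical helper.

```python
import io

stream = io.BytesIO(b"abcdefghij")   # stand-in for a file or socket

chunks = []
while chunk := stream.read(4):       # read() returns b"" at end of stream
    chunks.append(chunk)
print(chunks)    # [b'abcd', b'efgh', b'ij']


def find_first_negative(values):
    """Hypothetical helper using for...else as a search loop."""
    for v in values:
        if v < 0:
            found = v
            break
    else:
        found = None     # runs only when the loop finished without break
    return found


print(find_first_negative([3, -1, 4]))  # -1
print(find_first_negative([3, 1, 4]))   # None
```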

04 Functions

What is it
A Python function is a first-class object defined with def (or anonymously with lambda). Because functions are objects, you can assign them to variables, pass them as arguments, return them from other functions, store them in lists/dicts, and attach attributes to them. Functions have introspection data (__name__, __doc__, __annotations__, __defaults__, __closure__) and support a remarkably flexible argument system: positional, keyword, default, variadic positional (*args), variadic keyword (**kwargs), and the positional-only (/) and keyword-only (*) separators.
Key features
  • Default arguments: def f(x=10): — evaluated once at definition time, not per call. This is the source of the infamous "mutable default" bug.
  • Keyword arguments: f(name="Alice", age=30) — improves readability, order doesn't matter.
  • Variadic args: *args collects extras into a tuple; **kwargs into a dict.
  • Positional-only / keyword-only: def f(a, b, /, c, *, d): means a and b must be positional, c can be either, and d must be keyword.
  • Docstrings: First string literal in the body becomes __doc__ — used by help() and doc generators like Sphinx.
  • Type hints: def add(x: int, y: int) -> int: annotations are not enforced at runtime; they are stored in __annotations__ and used by mypy, IDEs, and runtime validators like Pydantic.
  • Lambdas: Single-expression anonymous functions — lambda x: x*2. No statements allowed.
How it differs
  • vs JavaScript: Python has no hoisting — functions must be defined before use. JS's function declarations hoist; Python's def does not. Python has true keyword args; JS fakes them with object destructuring.
  • vs Java: Java overloads by signature (multiple add(int, int) / add(double, double)); Python does not — last definition wins. Python uses default args and @singledispatch instead.
  • vs Go: Go returns multiple values natively (return x, err); Python fakes this with tuples. Go has no default args or keyword args at all.
  • vs C++: No function overloading, no templates, no const-ness. But Python functions can take any callable with any shape.
  • vs Ruby: Ruby makes return optional (last expression is returned); Python requires explicit return. Ruby blocks/procs are more central than Python lambdas.
Why use it
Functions are the fundamental unit of reuse and abstraction in Python. The flexible argument system means one function can serve many call styles. First-class function support enables decorators, higher-order functions, callbacks, strategy patterns, and functional composition, patterns that require verbose boilerplate in Java/C++.
Common gotchas
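  • Mutable default args: def f(items=[]): shares one list across all calls! Use def f(items=None): and, inside the body, if items is None: items = []. (The shortcut items = items or [] would also discard an empty list passed by the caller.)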
  • Late binding in closures: [lambda: i for i in range(3)] all return 2, not 0,1,2. Fix with lambda i=i: i.
  • Missing return: Functions without return implicitly return None — easy to forget.
  • Shadowing built-ins: def list(): ... silently hides the built-in list type.
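
The late-binding gotcha can be reproduced in a few lines. This sketch shows the problem, the default-argument fix named above, and a factory-function alternative; make_getter is a hypothetical name.

```python
# Each lambda looks up i when it is CALLED, after the loop has finished.
late = [lambda: i for i in range(3)]
print([f() for f in late])      # [2, 2, 2]

# Fix 1: capture the current value as a default argument.
bound = [lambda i=i: i for i in range(3)]
print([f() for f in bound])     # [0, 1, 2]


# Fix 2: a factory function creates a fresh scope per value.
def make_getter(value):
    return lambda: value


made = [make_getter(i) for i in range(3)]
print([f() for f in made])      # [0, 1, 2]
```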
Real-world examples
Every Python framework is built on functions: Flask/FastAPI route handlers, Django views, pytest test functions, Click/Typer CLI commands, Celery tasks, AWS Lambda handlers. The functools module (reduce, partial, lru_cache, wraps) is pure functional-programming gold used in nearly every real codebase.
# ── Defining ──
def greet(name):
    """Return a greeting. This is a docstring."""
    return f"Hello, {name}!"

print(greet("Yatin"))   # Hello, Yatin!

# ── Default parameters ──
def power(base, exp=2):
    return base ** exp

print(power(3))      # 9
print(power(3, 3))   # 27

# ── Multiple return values (actually a tuple) ──
def divide(a, b):
    return a // b, a % b

q, r = divide(17, 5)  # q=3, r=2

# ── Keyword arguments ──
def create_user(name, age, role="viewer"):
    return {"name": name, "age": age, "role": role}

user = create_user(age=25, name="Yatin", role="admin")
DANGER: Mutable Default Arguments — #1 Python gotcha. Default args are evaluated ONCE at function definition, not each call.
# ── THE BUG ──
def append_to(element, target=[]):   # DON'T DO THIS
    target.append(element)
    return target

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2]  — WHAT?! Remembered the old list!

# ── THE FIX ──
def append_to(element, target=None):
    if target is None:
        target = []
    target.append(element)
    return target
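
The other parameter kinds described at the top of this section (positional-only, keyword-only, *args, **kwargs) can be sketched in one place. connect and describe are hypothetical functions written only for this illustration; the / separator needs Python 3.8 or newer.

```python
def connect(host, port, /, timeout=10, *, retries=3):
    """Hypothetical: host and port are positional-only, retries is keyword-only."""
    return {"host": host, "port": port, "timeout": timeout, "retries": retries}


cfg = connect("localhost", 5432, timeout=5, retries=1)
print(cfg)  # {'host': 'localhost', 'port': 5432, 'timeout': 5, 'retries': 1}

errors = []
for bad_call in (
    lambda: connect(host="localhost", port=5432),   # positional-only passed by keyword
    lambda: connect("localhost", 5432, 5, 1),       # keyword-only passed positionally
):
    try:
        bad_call()
    except TypeError as exc:
        errors.append(type(exc).__name__)
print(errors)  # ['TypeError', 'TypeError']


def describe(*args, **kwargs):
    """Hypothetical: collects extra positional and keyword arguments."""
    return args, kwargs


print(describe(1, 2, unit="cm"))  # ((1, 2), {'unit': 'cm'})
```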

Lambda — Anonymous Functions

Small one-line functions without a name — perfect for quick inline operations like sorting or filtering. Syntax: lambda arguments: expression

Basic Syntax

# lambda is just a shortcut for simple functions
square = lambda x: x ** 2
square(5)   # 25

# Equivalent to:
def square(x):
    return x ** 2

# Multiple arguments
add = lambda a, b: a + b
add(3, 4)    # 7

# No arguments
greet = lambda: "Hello!"
greet()      # "Hello!"

# With default values
power = lambda x, n=2: x ** n
power(3)       # 9  (default n=2)
power(3, 3)    # 27

Sorting with Lambda

# Sort by string length
names = ["Charlie", "Alice", "Bob"]
names.sort(key=lambda n: len(n))
# ['Bob', 'Alice', 'Charlie']

# Sort list of tuples by second element
students = [("Alice", 90), ("Bob", 75), ("Carol", 85)]
students.sort(key=lambda s: s[1])
# [('Bob', 75), ('Carol', 85), ('Alice', 90)]

# Sort dicts by a specific key
users = [
    {"name": "Yatin", "age": 25},
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 20}
]
users.sort(key=lambda u: u["age"])
# sorted by age: Bob(20), Yatin(25), Alice(30)

# Sort dict by value
scores = {"Alice": 90, "Bob": 85, "Carol": 95}
ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)
# [('Carol', 95), ('Alice', 90), ('Bob', 85)]

# Multi-key sort: first by age, then by name
people = [("Bob", 25), ("Alice", 25), ("Carol", 20)]
people.sort(key=lambda p: (p[1], p[0]))
# [('Carol', 20), ('Alice', 25), ('Bob', 25)]

Lambda with map, filter, reduce

nums = [1, 2, 3, 4, 5]

# map — apply function to every element
squares = list(map(lambda x: x**2, nums))
# [1, 4, 9, 16, 25]

# filter — keep elements that return True
evens = list(filter(lambda x: x % 2 == 0, nums))
# [2, 4]

# reduce — combine all elements into one value
from functools import reduce
total = reduce(lambda a, b: a + b, nums)
# 15  (1+2+3+4+5)

product = reduce(lambda a, b: a * b, nums)
# 120  (1*2*3*4*5)

# But list comprehensions are usually cleaner!
squares = [x**2 for x in nums]        # prefer over map + lambda
evens = [x for x in nums if x % 2 == 0] # prefer over filter + lambda

Lambda with Other Built-ins

# min/max with key
words = ["python", "go", "javascript", "rust"]
shortest = min(words, key=lambda w: len(w))   # "go"
longest = max(words, key=lambda w: len(w))    # "javascript"

# Find user with highest score
users = [{"name": "Alice", "score": 90}, {"name": "Bob", "score": 95}]
best = max(users, key=lambda u: u["score"])
# {'name': 'Bob', 'score': 95}

Conditional Logic in Lambda

# Ternary expression inside lambda
check = lambda x: "even" if x % 2 == 0 else "odd"
check(4)    # "even"
check(7)    # "odd"

grade = lambda s: "A" if s >= 90 else "B" if s >= 80 else "C"
grade(95)   # "A"
grade(85)   # "B"
grade(70)   # "C"

Lambda vs def — When to Use Which

# ✓ USE lambda — simple, one-line, inline with sort/map/filter
names.sort(key=lambda n: n.lower())

# ✗ DON'T assign lambda to a variable (use def instead)
# Bad:
multiply = lambda x, y: x * y    # PEP 8 discourages this

# Good:
def multiply(x, y):
    return x * y

# ✗ DON'T use lambda for complex logic
# Lambda can only have ONE expression — no statements, loops, or assignments
# lambda x: for i in x: print(i)  — SyntaxError!
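
For the common "pick a field" and "pre-fill an argument" cases, the standard library offers named alternatives to lambda: operator.itemgetter and functools.partial. A minimal sketch, reusing the sample data shapes from earlier in this section:

```python
from functools import partial
from operator import itemgetter

students = [("Alice", 90), ("Bob", 75), ("Carol", 85)]
by_score = sorted(students, key=itemgetter(1))     # same as key=lambda s: s[1]
print(by_score)   # [('Bob', 75), ('Carol', 85), ('Alice', 90)]

users = [{"name": "Yatin", "age": 25}, {"name": "Bob", "age": 20}]
youngest = min(users, key=itemgetter("age"))       # same as key=lambda u: u["age"]
print(youngest)   # {'name': 'Bob', 'age': 20}

# partial pre-fills arguments of an existing callable.
parse_binary = partial(int, base=2)
print(parse_binary("1010"))   # 10
```

Whether these read better than a lambda is a matter of taste; they mainly help when the same key function is reused in several places.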

Functions are Objects

In Python, functions can be stored in variables, passed as arguments, and put in data structures — just like any other value.

def add(a, b): return a + b
def sub(a, b): return a - b

# Assign to variable, put in dict, pass as argument
ops = {"+": add, "-": sub}
print(ops["+"](10, 3))   # 13

# Higher-order function — takes a function as argument
def apply(func, x, y):
    return func(x, y)

print(apply(add, 5, 3))  # 8

# map, filter
nums = [1, 2, 3, 4]
list(map(lambda x: x**2, nums))          # [1, 4, 9, 16]
list(filter(lambda x: x % 2 == 0, nums)) # [2, 4]
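
Because functions are objects, a function can also take another function and return a wrapped version of it, which is what a decorator does. The sketch below is one minimal way to write one; count_calls is a hypothetical decorator, and functools.wraps is the standard helper that preserves the original name and docstring.

```python
import functools


def count_calls(func):
    """Hypothetical decorator that counts how often func is called."""
    @functools.wraps(func)            # keep func's __name__ and __doc__
    def wrapper(*args, **kwargs):
        wrapper.calls += 1            # functions can carry attributes
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def add(a, b):
    """Return a + b."""
    return a + b


first = add(1, 2)
second = add(3, 4)
print(first, second)   # 3 7
print(add.calls)       # 2
print(add.__name__)    # add
```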

05 Data Structures

What is it
Python ships with four core built-in container types baked into the language: list (ordered, mutable, allows duplicates), tuple (ordered, immutable, hashable if contents are), dict (insertion-ordered since 3.7, mutable key-value map with O(1) average lookup), and set (unordered, mutable, no duplicates, hash-based). All four are implemented in C for speed and are deeply integrated with syntax (literals, comprehensions, unpacking, destructuring).
The four core types
  • list: [1, 2, 3] — dynamic array (amortized O(1) append), supports slicing, comprehensions, and in-place mutation.
  • tuple: (1, 2, 3) is immutable, slightly smaller and faster than a list, and hashable when all of its elements are (so it can be a dict key). Used for fixed records and multi-return values.
  • dict: {"k": 1} is an open-addressing hash table with key-order preservation since 3.7. Used as the backbone of Python (module, class, and instance namespaces are backed by dicts).
  • set: {1, 2, 3} — hash set with O(1) membership test. Supports union |, intersection &, difference -, symmetric diff ^.
  • collections extras: deque (O(1) appends both ends), defaultdict, Counter, OrderedDict, ChainMap, namedtuple.
How it differs
  • vs JavaScript: JS has only Array and Object (plus ES6 Map/Set). Python's dict is insertion-ordered natively; JS object key order is quirky. Python tuples have no JS equivalent.
  • vs Java: Java requires ArrayList<Integer>, HashMap<K,V>, HashSet<T> with explicit generics. Python's literal syntax ([], {}, ()) is vastly terser.
  • vs Go: Go has slices, maps, and no built-in set. Go maps have no ordering guarantee; Python dicts do. Go has no tuple literals (use structs).
  • vs C++: Python dict ≈ std::unordered_map, list ≈ std::vector, set ≈ std::unordered_set, but with type-erased elements and GC.
  • vs Ruby: Ruby has Array, Hash, Set (stdlib), Symbol — no immutable tuple equivalent. Ruby hash also preserves insertion order.
Why use each
list for ordered collections you'll modify; tuple for fixed records, multi-return, and dict keys; dict for key-value lookup and as a general-purpose record type; set for deduplication and membership testing. The right choice can be the difference between O(n²) and O(n): for example, use a set, not a list, for if x in collection in a hot loop.
Common gotchas
  • Shallow copy: a = b[:] copies the outer list but nested lists are still shared.
  • Mutable keys: Only hashable objects can be dict keys — lists and dicts can't.
  • Set of lists: {[1,2]} raises TypeError — lists are unhashable.
  • Dict iteration while mutating: Raises RuntimeError: dictionary changed size during iteration.
  • Tuple of one: (1) is just 1; you need (1,) — trailing comma!
Real-world examples
JSON parsing produces nested dict/list. Database rows are often tuple or namedtuple. Config files load as dict. Deduplication uses set(items). Django QuerySets, pandas Index objects, and Flask's request.args all expose list-like or dict-like interfaces modeled on these containers.
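
The collections extras and the set operators listed above can be sketched in a few lines. The word list is arbitrary sample data.

```python
from collections import Counter, defaultdict, deque

words = ["apple", "bob", "apple", "cat", "bob", "apple"]

counts = Counter(words)
print(counts.most_common(1))    # [('apple', 3)]

by_length = defaultdict(list)   # missing keys start as an empty list
for w in set(words):            # set() removes the duplicates
    by_length[len(w)].append(w)
print(sorted(by_length[3]))     # ['bob', 'cat']

queue = deque([1, 2, 3])
queue.appendleft(0)             # O(1) at the left end
queue.append(4)                 # O(1) at the right end
print(queue.popleft())          # 0
print(list(queue))              # [1, 2, 3, 4]

a, b = {1, 2, 3}, {3, 4}
print(sorted(a | b))            # [1, 2, 3, 4]  union
print(sorted(a & b))            # [3]           intersection
print(sorted(a - b))            # [1, 2]        difference
print(sorted(a ^ b))            # [1, 2, 4]     symmetric difference
```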

Lists — Ordered, Mutable

The most common data structure — you can add, remove, and change elements freely. Keeps insertion order.

Creating Lists

# Different ways to create
empty = []
nums = [1, 2, 3, 4, 5]
mixed = [1, "hello", 3.14, True, None]   # any type
nested = [[1, 2], [3, 4], [5, 6]]        # 2D list
from_range = list(range(5))               # [0, 1, 2, 3, 4]
repeated = [0] * 5                        # [0, 0, 0, 0, 0]
from_string = list("hello")               # ['h', 'e', 'l', 'l', 'o']

Access & Slicing

nums = [10, 20, 30, 40, 50]

# Indexing
nums[0]           # 10  — first
nums[-1]          # 50  — last
nums[-2]          # 40  — second to last

# Slicing — [start:stop:step]
nums[1:4]         # [20, 30, 40]  — index 1 to 3
nums[:3]          # [10, 20, 30]  — first 3
nums[2:]          # [30, 40, 50]  — from index 2 onwards
nums[::]          # [10, 20, 30, 40, 50]  — full copy
nums[::-1]        # [50, 40, 30, 20, 10]  — reversed
nums[::2]         # [10, 30, 50]  — every 2nd element

# Slice assignment (replace a range)
nums[1:3] = [200, 300]   # [10, 200, 300, 40, 50]
nums[1:3] = [99]          # [10, 99, 40, 50]  — can change size!

Adding Elements

fruits = ["apple", "banana"]

fruits.append("cherry")          # add ONE item to end
# ['apple', 'banana', 'cherry']

fruits.insert(1, "mango")        # insert at specific index
# ['apple', 'mango', 'banana', 'cherry']

fruits.extend(["grape", "kiwi"]) # add MULTIPLE items from iterable
# ['apple', 'mango', 'banana', 'cherry', 'grape', 'kiwi']

# + creates a NEW list (doesn't modify original)
new = fruits + ["melon"]         # fruits is unchanged!

# Common mistake: append vs extend
a = [1, 2]
a.append([3, 4])     # [1, 2, [3, 4]]  — adds list AS element!
b = [1, 2]
b.extend([3, 4])     # [1, 2, 3, 4]    — adds each item

Removing Elements

nums = [10, 20, 30, 20, 40]

nums.remove(20)       # removes FIRST occurrence only
# [10, 30, 20, 40]

popped = nums.pop()   # remove & return LAST
# popped = 40, nums = [10, 30, 20]

popped = nums.pop(1)  # remove & return at INDEX
# popped = 30, nums = [10, 20]

del nums[0]           # delete by index (no return)
# [20]

nums = [1, 2, 3, 4, 5]
del nums[1:3]         # delete a slice
# [1, 4, 5]

nums.clear()          # remove ALL elements
# []

Searching & Counting

letters = ["a", "b", "c", "b", "a"]

letters.index("b")       # 1 — first occurrence index
letters.index("b", 2)    # 3 — search starting from index 2
letters.count("a")       # 2 — how many times "a" appears

"c" in letters           # True  — membership check
"z" in letters           # False

len(letters)             # 5
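
Note that list.index raises ValueError when the item is missing. A small sketch of two common workarounds follows; find_index is a hypothetical helper, not a list method.

```python
letters = ["a", "b", "c", "b", "a"]


def find_index(items, target, default=-1):
    """Hypothetical helper: like list.index, but returns default when missing."""
    try:
        return items.index(target)
    except ValueError:
        return default


print(find_index(letters, "c"))   # 2
print(find_index(letters, "z"))   # -1

# All positions of a value, not just the first
all_b = [i for i, ch in enumerate(letters) if ch == "b"]
print(all_b)                      # [1, 3]
```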

Sorting & Reversing

nums = [3, 1, 4, 1, 5]

# In-place sort (modifies original, returns None!)
nums.sort()                      # [1, 1, 3, 4, 5]
nums.sort(reverse=True)         # [5, 4, 3, 1, 1]

# sorted() — returns NEW list, original unchanged
original = [3, 1, 4]
new_sorted = sorted(original)    # [1, 3, 4]
# original is still [3, 1, 4]

# Custom sort with key
words = ["banana", "apple", "cherry"]
words.sort(key=len)              # ['apple', 'banana', 'cherry']

users = [("Yatin", 25), ("Alice", 30), ("Bob", 20)]
users.sort(key=lambda u: u[1])   # sort by age

# Reverse (in-place)
nums = [1, 2, 3]
nums.reverse()                   # [3, 2, 1]
list(reversed(nums))             # returns NEW reversed list
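
Python's sort is stable (equal elements keep their relative order), which allows sorting on mixed directions. The sketch below ranks by score descending and name ascending in two equivalent ways; the player data is made up.

```python
players = [("Bob", 85), ("Alice", 90), ("Carol", 85), ("Dave", 90)]

# One pass: negate the numeric key to get descending score, ascending name.
ranked = sorted(players, key=lambda p: (-p[1], p[0]))
print(ranked)
# [('Alice', 90), ('Dave', 90), ('Bob', 85), ('Carol', 85)]

# Two passes: sort by the secondary key first, then by the primary key.
# Stability keeps the name order within equal scores.
two_pass = sorted(players, key=lambda p: p[0])
two_pass.sort(key=lambda p: p[1], reverse=True)
print(two_pass == ranked)   # True
```

The two-pass form also works when the primary key cannot be negated, for example when it is a string.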

Unpacking & Destructuring

# Basic unpacking
a, b, c = [1, 2, 3]       # a=1, b=2, c=3

# Star unpacking
first, *middle, last = [1, 2, 3, 4, 5]
# first=1, middle=[2, 3, 4], last=5

head, *tail = [1, 2, 3, 4]
# head=1, tail=[2, 3, 4]

*init, last = [1, 2, 3, 4]
# init=[1, 2, 3], last=4

# Swap without temp variable
x, y = 10, 20
x, y = y, x            # x=20, y=10

# Ignore values with _
_, b, _ = [1, 2, 3]     # only care about b

Copying — Shallow vs Deep

import copy

# Shallow copy — 3 ways (all equivalent)
original = [[1, 2], [3, 4]]
a = original.copy()
b = list(original)
c = original[::]

# Shallow = new outer list, but inner lists are SHARED
original[0][0] = 999
print(a[0][0])       # 999 — changed too!

# Deep copy — fully independent at all levels
original = [[1, 2], [3, 4]]
deep = copy.deepcopy(original)
original[0][0] = 999
print(deep[0][0])     # 1 — independent!

Common Patterns

# Enumerate — get index + value
for i, fruit in enumerate(["apple", "banana", "cherry"]):
    print(f"{i}: {fruit}")

# Zip — iterate multiple lists together
names = ["Alice", "Bob"]
ages = [25, 30]
for name, age in zip(names, ages):
    print(f"{name} is {age}")

# List as stack (LIFO)
stack = []
stack.append(1)     # push
stack.append(2)
stack.pop()          # 2 — last in, first out

# Flatten nested list — 3 ways
nested = [[1, 2], [3, 4], [5]]

flat = [x for sub in nested for x in sub]   # comprehension (most Pythonic)
flat = sum(nested, [])                         # shortest but slow for large lists (O(n²))

from itertools import chain
flat = list(chain.from_iterable(nested))        # fastest for big data (O(n))
# all give [1, 2, 3, 4, 5]

# Recursive flatten: handles arbitrary nesting (up to Python's recursion limit)
def flatten(lst):
    result = []
    for item in lst:
        if isinstance(item, (list, tuple)):
            result.extend(flatten(item))   # recurse into nested
        else:
            result.append(item)
    return result

flatten([1, [2, [3, 4]], (5, 6), 7])
# [1, 2, 3, 4, 5, 6, 7]

# Filter with list comprehension
nums = [1, 2, 3, 4, 5, 6]
evens = [n for n in nums if n % 2 == 0]   # [2, 4, 6]

# min, max, sum
min(nums)   # 1
max(nums)   # 6
sum(nums)   # 21

# Check if all/any match a condition
all(n > 0 for n in nums)   # True  — all positive
any(n > 5 for n in nums)   # True  — at least one > 5

List Gotchas

# 1. Mutable default argument — CLASSIC bug
def add_item(item, lst=[]):   # BAD! shared across calls
    lst.append(item)
    return lst

add_item(1)   # [1]
add_item(2)   # [1, 2] — not [2]!

def add_item(item, lst=None):  # GOOD! use None
    if lst is None:
        lst = []
    lst.append(item)
    return lst

# 2. Multiplying nested lists — shared references!
grid = [[0] * 3] * 3       # BAD!
grid[0][0] = 1
# [[1,0,0], [1,0,0], [1,0,0]] — all rows changed!

grid = [[0] * 3 for _ in range(3)]   # GOOD!
grid[0][0] = 1
# [[1,0,0], [0,0,0], [0,0,0]] — only first row

# 3. sort() returns None, not the list
result = [3, 1, 2].sort()   # None! not [1,2,3]
result = sorted([3, 1, 2]) # [1,2,3] ✓

Tuples — Ordered, Immutable

Like lists, but you can't change values once created. Tuples are slightly faster than lists, use less memory, and can be dictionary keys when all of their elements are hashable.

Creating Tuples

# Different ways to create
empty = ()
single = (42,)              # MUST have trailing comma!
not_a_tuple = (42)          # just an int in parentheses
pair = (3, 4)
mixed = (1, "hello", 3.14)
from_list = tuple([1, 2, 3])

# Parentheses are optional (packing)
coords = 10, 20, 30        # same as (10, 20, 30)

Access & Slicing (Same as Lists)

t = (10, 20, 30, 40, 50)

t[0]         # 10
t[-1]        # 50
t[1:4]       # (20, 30, 40)  — slice returns a tuple
t[::-1]      # (50, 40, 30, 20, 10)

len(t)        # 5
min(t)        # 10
max(t)        # 50
sum(t)        # 150

Tuple Methods (Only 2!)

t = (1, 2, 3, 2, 1)

t.count(2)     # 2 — how many times 2 appears
t.index(3)     # 2 — index of first occurrence

# That's it! No append, remove, sort — tuples are immutable

Why Use Tuples Over Lists?

# 1. As dictionary keys (lists CAN'T do this)
locations = {}
locations[(40.7, -74.0)] = "New York"
locations[(51.5, -0.1)] = "London"

# 2. As set elements
points = {(0, 0), (1, 1), (2, 2)}

# 3. Function returning multiple values
def get_user():
    return "Yatin", 25     # returns a tuple

name, age = get_user()     # unpack

# 4. Protect data from accidental changes
DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# 5. Slightly faster & less memory than lists
import sys
sys.getsizeof([1, 2, 3])   # roughly 80-120 bytes on 64-bit CPython (varies by version)
sys.getsizeof((1, 2, 3))   # typically 64 bytes on 64-bit CPython, noticeably smaller

Unpacking & Patterns

# Unpacking works same as lists
a, b, c = (1, 2, 3)
first, *rest = (1, 2, 3, 4)   # first=1, rest=[2,3,4]

# Swap values (conceptually tuple packing followed by unpacking)
x, y = 10, 20
x, y = y, x

# Tuple concatenation (creates NEW tuple)
a = (1, 2)
b = (3, 4)
c = a + b        # (1, 2, 3, 4)
d = a * 3        # (1, 2, 1, 2, 1, 2)

Named Tuples — Tuples with Names

from collections import namedtuple

# Create a "class-like" tuple
Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)

print(p.x, p.y)     # 3 4  — access by name
print(p[0], p[1])   # 3 4  — still works by index

# Practical example
User = namedtuple("User", ["name", "age", "email"])
user = User("Yatin", 25, "y@dev.com")
print(user.name)     # "Yatin"

# Convert to dict
user._asdict()       # {'name': 'Yatin', 'age': 25, 'email': 'y@dev.com'}

# Create modified copy (tuples are immutable, so _replace returns new)
older = user._replace(age=26)
# User(name='Yatin', age=26, email='y@dev.com')

Tuple Gotchas

# 1. Single element needs comma!
t = (42,)     # tuple
t = (42)      # just int 42
type((42,))   # <class 'tuple'>
type((42))    # <class 'int'>

# 2. Immutable BUT can contain mutable objects
t = ([1, 2], [3, 4])
t[0].append(99)    # works! inner list is mutable
# ([1, 2, 99], [3, 4])
# t[0] = [5, 6]     # TypeError! can't reassign tuple element
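
A related consequence, sketched here as an illustration (the variable names are made up for the example): a tuple is only hashable when every element is hashable, so a tuple that holds a list cannot be used as a dict key or set element.

```python
# A tuple of hashable values can be hashed and used as a dict key
ok_key = (1, 2)
lookup = {ok_key: "fine"}

# A tuple that contains a list is NOT hashable
bad_key = ([1, 2], 3)
try:
    hash(bad_key)
    hashable = True
except TypeError:
    hashable = False   # unhashable type: 'list'

print(hashable)   # False
```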

Dictionaries — Key-Value Mappings

Store data as key-value pairs with fast lookup (O(1) on average). The most-used data structure after lists. Keys must be hashable, which in practice usually means immutable values such as str, int, or a tuple of hashables.

Creating Dictionaries

# Different ways to create
empty = {}
user = {"name": "Yatin", "age": 25}
from_tuples = dict([("a", 1), ("b", 2)])
from_kwargs = dict(name="Yatin", age=25)
from_keys = dict.fromkeys(["a", "b", "c"], 0)  # {'a': 0, 'b': 0, 'c': 0}
from_zip = dict(zip(["name", "age"], ["Yatin", 25]))

# Dict comprehension
squares = {n: n**2 for n in range(5)}
# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

Access & Lookup

user = {"name": "Yatin", "age": 25, "skills": ["Python"]}

# Direct access — raises KeyError if missing!
user["name"]                   # "Yatin"
# user["email"]               # KeyError!

# Safe access with .get()
user.get("email")              # None — no error
user.get("email", "N/A")      # "N/A" — custom default

# Check if key exists
"name" in user                # True
"email" in user               # False

Adding & Updating

user = {"name": "Yatin"}

# Add / update single key
user["age"] = 25               # add new key
user["name"] = "Yatin Dora"   # update existing

# Update multiple at once
user.update({"email": "y@dev.com", "age": 26})

# setdefault — add ONLY if key doesn't exist
user.setdefault("name", "Unknown")   # "Yatin Dora" — already exists, no change
user.setdefault("city", "NYC")       # "NYC" — added because key was missing

# Merge with | operator (3.9+)
a = {"x": 1, "y": 2}
b = {"y": 3, "z": 4}

merged = a | b     # {'x': 1, 'y': 3, 'z': 4}
# creates NEW dict — "y" exists in both, b's value (3) wins
# a and b are unchanged

a |= b             # modifies a IN-PLACE (like += for dicts)
# a is now {'x': 1, 'y': 3, 'z': 4}  — same as a.update(b)

# Key rule: when keys overlap, RIGHT side always wins
{"a": 1} | {"a": 99}   # {'a': 99}

# Before 3.9 — old ways to merge
merged = {**a, **b}          # unpacking (3.5+)
a.update(b)                  # in-place (like |=)

Removing Elements

user = {"name": "Yatin", "age": 25, "city": "NYC"}

# del — raises KeyError if missing
del user["city"]

# pop — remove & return value (with optional default)
age = user.pop("age")              # 25
email = user.pop("email", None)   # None — no error

# popitem — remove & return last inserted pair
user["a"] = 1
user["b"] = 2
user.popitem()    # ('b', 2)

# clear — remove everything
user.clear()      # {}

Iteration

user = {"name": "Yatin", "age": 25, "city": "NYC"}

# Loop through keys (default)
for key in user:
    print(key)          # name, age, city

# Loop through values
for val in user.values():
    print(val)          # Yatin, 25, NYC

# Loop through key-value pairs
for k, v in user.items():
    print(f"{k}: {v}")  # name: Yatin, age: 25, city: NYC

# Get all keys/values as lists
keys = list(user.keys())      # ['name', 'age', 'city']
vals = list(user.values())    # ['Yatin', 25, 'NYC']

Nested Dictionaries

users = {
    "user1": {
        "name": "Yatin",
        "skills": ["Python", "JS"],
        "address": {"city": "NYC", "zip": "10001"}
    },
    "user2": {
        "name": "Alice",
        "skills": ["Go", "Rust"],
        "address": {"city": "LA", "zip": "90001"}
    }
}

# Access nested data
users["user1"]["name"]                # "Yatin"
users["user1"]["skills"][0]           # "Python"
users["user1"]["address"]["city"]    # "NYC"

# Safe nested access (avoid KeyError chains)
city = users.get("user3", {}).get("address", {}).get("city", "Unknown")
# "Unknown"

defaultdict & Counter

from collections import defaultdict, Counter

# defaultdict — auto-creates missing keys with a default value
word_count = defaultdict(int)      # default = 0
for w in "hello world hello python hello".split():
    word_count[w] += 1
# {'hello': 3, 'world': 1, 'python': 1}

# Group items
groups = defaultdict(list)
for name, dept in [("Alice", "eng"), ("Bob", "hr"), ("Carol", "eng")]:
    groups[dept].append(name)
# {'eng': ['Alice', 'Carol'], 'hr': ['Bob']}

# Counter — count anything iterable
counts = Counter("mississippi")
# Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})

counts.most_common(2)             # [('i', 4), ('s', 4)]  (ties keep first-seen order)
counts["s"]                       # 4
counts["z"]                       # 0 — no KeyError!

# Counter arithmetic
a = Counter("aab")    # {'a': 2, 'b': 1}
b = Counter("abc")    # {'a': 1, 'b': 1, 'c': 1}
a + b                  # Counter({'a': 3, 'b': 2, 'c': 1})
a - b                  # Counter({'a': 1})

Dict Gotchas

# 1. Keys must be hashable (immutable)
# d = {[1, 2]: "val"}  # TypeError! lists can't be keys
d = {(1, 2): "val"}    # OK — tuples can be keys

# 2. Don't modify dict size while iterating
d = {"a": 1, "b": 2, "c": 3}
# BAD: for k in d: del d[k]  — RuntimeError!

# GOOD: iterate over a copy
for k in list(d.keys()):
    if d[k] < 3:
        del d[k]

# 3. {} is an empty dict, NOT an empty set
type({})       # <class 'dict'>
type(set())    # <class 'set'>

Sets — Unordered, Unique

No duplicates allowed and no guaranteed order. Lightning-fast membership testing with O(1) lookup.

Creating Sets

empty = set()             # NOT {} — that's a dict!
nums = {1, 2, 3, 3}     # {1, 2, 3} — duplicate removed
from_list = set([1, 2, 2, 3])   # {1, 2, 3}
from_string = set("hello")      # {'h', 'e', 'l', 'o'}

# Set comprehension
evens = {n for n in range(10) if n % 2 == 0}
# {0, 2, 4, 6, 8}

Adding & Removing

s = {1, 2, 3}

s.add(4)             # {1, 2, 3, 4}
s.add(3)             # {1, 2, 3, 4} — no effect, already exists

s.update([5, 6])     # add multiple: {1, 2, 3, 4, 5, 6}

s.remove(6)           # raises KeyError if not found
s.discard(99)         # no error if not found — safer!

popped = s.pop()      # remove & return arbitrary element
s.clear()             # empty the set

Set Operations (Math)

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

# Union — everything from both
a | b               # {1, 2, 3, 4, 5, 6}
a.union(b)          # same thing

# Intersection — common elements
a & b               # {3, 4}
a.intersection(b)   # same thing

# Difference — in a but NOT in b
a - b               # {1, 2}
a.difference(b)     # same thing

# Symmetric difference — in either but NOT both
a ^ b               # {1, 2, 5, 6}

# Subset & superset checks
{1, 2} <= {1, 2, 3}      # True  — subset
{1, 2, 3} >= {1, 2}      # True  — superset
{1, 2}.isdisjoint({3, 4}) # True  — no common elements

Practical Uses

# Remove duplicates from a list
names = ["alice", "bob", "alice", "carol", "bob"]
unique = list(set(names))   # ['alice', 'bob', 'carol'] (order may vary)

# Remove duplicates preserving order (Python 3.7+)
unique_ordered = list(dict.fromkeys(names))
# ['alice', 'bob', 'carol'] — original order kept

# O(1) membership test — WAY faster than lists for large data
big_set = set(range(1_000_000))
999999 in big_set      # instant!
999999 in list(big_set)  # slow — has to check each element

# Find common elements between lists
list1 = [1, 2, 3, 4]
list2 = [3, 4, 5, 6]
common = list(set(list1) & set(list2))   # [3, 4]

Quick Comparison

# ┌──────────────┬─────────┬───────────┬─────────┬───────────┐
# │              │  List   │  Tuple    │  Dict   │   Set     │
# ├──────────────┼─────────┼───────────┼─────────┼───────────┤
# │ Syntax       │ [1,2,3] │ (1,2,3)   │ {k: v}  │ {1,2,3}   │
# │ Ordered      │ ✓       │ ✓         │ ✓ (3.7+)│ ✗         │
# │ Mutable      │ ✓       │ ✗         │ ✓       │ ✓         │
# │ Duplicates   │ ✓       │ ✓         │ keys ✗  │ ✗         │
# │ Index access │ ✓       │ ✓         │ by key  │ ✗         │
# │ Hashable     │ ✗       │ ✓         │ keys ✓  │ items ✓   │
# │ Use case     │ general │ fixed     │ lookup  │ unique    │
# │              │ storage │ data      │ mapping │ fast test │
# └──────────────┴─────────┴───────────┴─────────┴───────────┘

06 OOP & Classes

What is it
Python's object-oriented system is based on classes as first-class objects created at runtime by a metaclass (default: type). Everything (integers, strings, functions, modules, even classes themselves) is an instance of some class. Python OOP supports encapsulation (by convention, not enforced), inheritance (single and multiple with C3 linearization MRO), and polymorphism (primarily via duck typing, not interfaces). The entire object model is exposed for introspection: you can list attributes, walk the MRO, and even replace methods at runtime.
Key features
  • Everything is an object: type(42) is int, type(int) is type, type(type) is type (self-referential).
  • Explicit self: The first parameter of instance methods is always self — explicit rather than implicit like Java's this.
  • No true private: _name is convention for "protected", __name triggers name-mangling (_ClassName__name). Nothing is actually hidden.
  • Class/static methods: @classmethod receives the class, @staticmethod receives nothing.
  • Properties: @property turns method calls into attribute access — no getters/setters boilerplate.
  • Duck typing: No interface declarations needed; if an object has .read(), it's "file-like".
  • Multiple inheritance: With C3 linearization (MRO) for method resolution.
How it differs
  • vs Java: No public/private/protected keywords, no interfaces (use ABCs or Protocols), no checked exceptions, supports multiple inheritance (Java only single + interfaces).
  • vs JavaScript: Python classes are real classes, not prototype sugar. Python supports multiple inheritance; JS only single via extends.
  • vs C++: No operator overloading via operator+ — Python uses dunder methods (__add__). No templates; everything is runtime.
  • vs Go: Go has no classes at all — just structs with methods and interfaces satisfied structurally. Python has classes; Go has composition only.
  • vs Ruby: Very similar philosophy. Ruby has true private, open classes (monkey-patching is idiomatic). Python can monkey-patch but it's frowned upon.
Why use it
OOP in Python is best used for modeling domain entities (users, orders, HTTP requests), grouping state and behavior, and building polymorphic plugin systems. For simple data containers, prefer @dataclass or pydantic.BaseModel over raw classes. Python doesn't force OOP: many real codebases use a mix of functions, dataclasses, and occasional classes.
Common gotchas
  • Class vs instance attributes: Mutable class attributes (e.g., items = []) are shared across all instances — a famous bug.
  • Forgetting self: def method(x): inside a class is a common error — self must be first.
  • Overriding __init__ without super().__init__(): Breaks parent initialization.
  • Equality: Default == compares identity, not value. Override __eq__ (and then __hash__).
Real-world examples
Django models (class User(Model)), SQLAlchemy ORM, Pydantic BaseModel, PyTorch nn.Module, Flask View classes, Celery Tasks, pytest fixtures via classes: OOP underpins most major Python frameworks.
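
To make the first gotcha above concrete, here is a minimal sketch of a shared mutable class attribute and the usual fix. The class names BadCart and Cart are hypothetical, chosen only for illustration.

```python
class BadCart:
    items = []                    # class attribute: ONE list shared by all instances

    def add(self, item):
        self.items.append(item)   # mutates the shared list

a = BadCart()
b = BadCart()
a.add("apple")
print(b.items)    # ['apple']  (b sees a's item!)


class Cart:
    def __init__(self):
        self.items = []           # instance attribute: a fresh list per object

    def add(self, item):
        self.items.append(item)

c = Cart()
d = Cart()
c.add("apple")
print(d.items)    # []
```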

A class = blueprint. An object = instance created from that blueprint.

class Dog:
    # Class attribute — shared by ALL instances
    species = "Canis familiaris"

    def __init__(self, name, age):
        # Instance attributes — unique to each dog
        self.name = name
        self.age = age

    def bark(self):
        return f"{self.name} says Woof!"

    def human_years(self):
        return self.age * 7

rex = Dog("Rex", 5)
buddy = Dog("Buddy", 3)

print(rex.bark())          # Rex says Woof!
print(buddy.human_years()) # 21
print(rex.species)         # Canis familiaris (shared)
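
The key features above mention @property without showing it. As a rough sketch, a Dog class could expose human years as a computed attribute and validate age on assignment. The validation rule and the _age backing attribute are assumptions added for illustration, not part of the original example.

```python
class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age             # goes through the setter below

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, value):
        if value < 0:              # hypothetical rule for this sketch
            raise ValueError("age cannot be negative")
        self._age = value

    @property
    def human_years(self):         # computed on access, no parentheses
        return self._age * 7

rex = Dog("Rex", 5)
print(rex.human_years)   # 35
rex.age = 6
print(rex.human_years)   # 42
```

Callers keep using plain attribute syntax (rex.age, rex.human_years), so validation can be added later without changing the call sites.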

Class vs Instance vs Static Methods

Three ways to define methods — instance methods get self, class methods get cls, static methods get nothing. Each serves a different purpose.

Instance Method (Regular) — uses self

The default. Has access to the instance (self) and its data. Can also access class attributes through self.

class User:
    platform = "Web"              # class attribute

    def __init__(self, name):
        self.name = name           # instance attribute

    def greet(self):              # instance method
        return f"Hi, I'm {self.name} on {self.platform}"

u = User("Yatin")
u.greet()    # "Hi, I'm Yatin on Web"

# What actually happens when you call u.greet():
# Python converts it to → User.greet(u)
# That's why self is the first parameter!

Class Method — uses cls

Gets the class itself (not an instance). Can't access instance data (self). Mainly used as factory methods — alternative ways to create objects.

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    @classmethod
    def from_string(cls, data_str):
        # cls IS the User class — same as calling User(...)
        name, age = data_str.split("-")
        return cls(name, int(age))       # creates a new User

    @classmethod
    def from_dict(cls, data):
        return cls(data["name"], data["age"])

# Multiple ways to create a User
u1 = User("Yatin", 25)                          # normal
u2 = User.from_string("Alice-30")               # from string
u3 = User.from_dict({"name": "Bob", "age": 20}) # from dict

Real-World Examples

The core idea: your __init__ takes clean, final values. But real data comes in messy formats — JSON strings, CSV rows, env variables, database rows. Class methods handle the parsing and conversion, then call cls() to create the object. Same object, different input sources.

1. Python's own datetime uses classmethods! — You've already used these. They're all factory methods that create a datetime object from different inputs.

from datetime import datetime
now = datetime.now()                     # from system clock
today = datetime.today()                 # current local date and time (like now() without a tz)
d = datetime.fromtimestamp(1679500000)   # from unix timestamp
# All return a datetime object — just different ways to create one

2. Config loader — Your app needs config settings, but they come from different places depending on the environment. __init__ just stores the settings dict. Each classmethod knows how to read from one source.

import json
import os

class Config:
    def __init__(self, settings):
        self.settings = settings       # just stores clean data

    @classmethod
    def from_json(cls, filepath):      # reads a JSON file, parses it
        with open(filepath) as f:
            return cls(json.load(f))    # cls() calls __init__ with parsed dict

    @classmethod
    def from_env(cls):                 # reads from environment variables
        return cls({
            "debug": os.getenv("DEBUG", "false"),
            "db_url": os.getenv("DATABASE_URL"),
        })

    @classmethod
    def default(cls):                  # hardcoded defaults for dev
        return cls({"debug": "false", "db_url": "sqlite:///app.db"})

# Pick whichever fits the situation
config = Config.from_json("config.json")   # production — read from file
config = Config.from_env()                  # docker/cloud — read from env vars
config = Config.default()                   # development — use defaults

3. User from different data sources — APIs send JSON, files have CSV, databases return tuples. Each classmethod knows how to parse one format and extract name, email, age for __init__.

import json

class User:
    def __init__(self, name, email, age):
        self.name = name               # __init__ only cares about
        self.email = email             # clean, final values
        self.age = age

    @classmethod
    def from_json(cls, json_str):      # API sends '{"name": "...", ...}'
        data = json.loads(json_str)    # parse JSON string → dict
        return cls(data["name"], data["email"], data["age"])

    @classmethod
    def from_csv_row(cls, row):        # CSV file has "Yatin,y@dev.com,25"
        name, email, age = row.split(",")
        return cls(name, email, int(age)) # convert age string → int

    @classmethod
    def from_db_row(cls, row):         # DB returns (id, name, email, age)
        return cls(row[1], row[2], row[3])  # skip id at index 0

# All create the same User object — parsing logic stays in the classmethod
# __init__ stays clean and simple

Why cls Instead of Just Writing the Class Name?

Because cls is dynamic — it becomes whatever class called the method. Hardcoding the class name breaks inheritance.

# ❌ Hardcoded class name — breaks for subclasses
class Animal:
    def __init__(self, name):
        self.name = name

    @classmethod
    def create(cls, name):
        return Animal(name)       # hardcoded!

class Dog(Animal):
    def bark(self): return "Woof!"

d = Dog.create("Buddy")
type(d)       # <class 'Animal'> — NOT a Dog!
# d.bark()    # AttributeError! Animal has no bark()

# ✓ Using cls: works for ALL subclasses
class Animal:
    def __init__(self, name):
        self.name = name

    @classmethod
    def create(cls, name):
        return cls(name)          # cls = whatever class calls it

class Dog(Animal):
    def bark(self): return "Woof!"

d = Dog.create("Buddy")      # cls = Dog → Dog("Buddy")
type(d)       # <class 'Dog'> ✓
d.bark()      # "Woof!" ✓

a = Animal.create("Cat")    # cls = Animal → Animal("Cat")
type(a)       # <class 'Animal'> ✓

self vs cls — What's the Difference?

self is a specific object (instance). cls is the class itself (the blueprint). They operate at different levels.

class Car:
    total_cars = 0                # class-level data (shared by all)

    def __init__(self, brand):
        self.brand = brand        # instance-level data (unique to each)
        Car.total_cars += 1

    # Instance method — works with ONE specific car
    def describe(self):
        return f"I am a {self.brand}"
        # self = the specific car object
        # self.brand = THIS car's brand

    # Class method — works with the Car CLASS itself
    @classmethod
    def get_count(cls):
        return f"{cls.total_cars} {cls.__name__}s made"
        # cls = the Car class (not any specific car)
        # cls.total_cars = class-level data

    @classmethod
    def create_default(cls):
        return cls("Unknown")      # cls() creates a new instance
        # self CAN'T do this — self is already an instance

c1 = Car("BMW")
c2 = Car("Tesla")

c1.describe()        # "I am a BMW"    — self = c1
c2.describe()        # "I am a Tesla"  — self = c2
Car.get_count()      # "2 Cars made"   — cls = Car

# ┌──────────────┬─────────────────────┬────────────────────────┐
# │              │ self                │ cls                    │
# ├──────────────┼─────────────────────┼────────────────────────┤
# │ What is it?  │ ONE specific object │ The class (blueprint)  │
# │ Example      │ self = my BMW       │ cls = Car              │
# │ Access       │ self.brand (unique) │ cls.total_cars (shared)│
# │ Can create?  │ No (already exists) │ Yes — cls() makes new  │
# │ Used in      │ instance methods    │ @classmethod           │
# └──────────────┴─────────────────────┴────────────────────────┘

Static Method — no self, no cls

Gets nothing — can't access instance or class data. It's just a regular function that lives inside the class for organization.

class MathUtils:
    @staticmethod
    def add(a, b):
        return a + b

    @staticmethod
    def is_even(n):
        return n % 2 == 0

# Call without creating an instance
MathUtils.add(3, 4)      # 7
MathUtils.is_even(5)      # False

# Also works on instances (but no point)
m = MathUtils()
m.add(3, 4)               # 7 — same thing

All Three Together

class Pizza:
    base_price = 10

    def __init__(self, toppings):
        self.toppings = toppings

    # Instance method — needs self (instance data)
    def price(self):
        return self.base_price + len(self.toppings) * 2

    # Class method — factory, alternative constructor
    @classmethod
    def margherita(cls):
        return cls(["mozzarella", "tomato"])

    # Static method — utility, no access to self or cls
    @staticmethod
    def is_valid_topping(topping):
        return topping in ["cheese", "tomato", "mushroom"]

p = Pizza.margherita()                     # classmethod — factory
print(p.price())                           # 14 — instance method
print(Pizza.is_valid_topping("cheese"))    # True — static method

Quick Comparison

# ┌────────────────┬───────────────┬────────────────┬───────────────┐
# │                │ Instance      │ @classmethod   │ @staticmethod │
# ├────────────────┼───────────────┼────────────────┼───────────────┤
# │ First param    │ self          │ cls            │ nothing       │
# │ Access instance│ ✓             │ ✗              │ ✗             │
# │ Access class   │ ✓ (via self)  │ ✓ (via cls)    │ ✗             │
# │ Called on      │ instance      │ class or inst  │ class or inst │
# │ Use case       │ work with     │ factory /      │ utility       │
# │                │ object data   │ alt constructor│ function      │
# └────────────────┴───────────────┴────────────────┴───────────────┘

07 __init__ & self — Complete Explanation

What are they
__init__ is Python's initializer (not a constructor!): it's called after the object is already created by __new__. Its job is to set up instance attributes. self is simply the first positional parameter of every instance method; Python automatically passes the current instance as that argument when you call obj.method(). The name "self" is a strong convention: you could technically name it anything, but don't.
The two-step creation process
  • Step 1 — __new__(cls, ...): The real constructor. Allocates memory and returns a new instance. Almost never overridden except for singletons, immutable types, or metaclass magic.
  • Step 2 — __init__(self, ...): The initializer. Receives the freshly-created instance and populates attributes like self.name = name. Returns None (never return a value).
  • Why two steps: Immutable types like int, tuple, and str need to lock in values in __new__ because attributes can't be set afterward.
How it differs
  • vs Java/C++ constructors: Java's Foo() both allocates and initializes. Python splits it. Also Java constructors can be overloaded by signature; Python can't (use default args or classmethods instead).
  • vs JavaScript: JS constructor() is analogous but implicit — no need to name it explicitly in every call. JS has no self parameter; it uses implicit this.
  • vs Go: Go has no constructors at all — convention is NewFoo(...) functions returning structs. No self; Go uses an explicit receiver (f *Foo) in method signatures, which is spiritually identical to Python's self.
  • vs Ruby: Ruby's initializer is initialize, invoked by Foo.new. Ruby uses self implicitly (no self. prefix needed most of the time).
Why explicit self
Guido van Rossum wrote a famous blog post defending it: explicit self makes method definition and call-site behavior consistent, allows decorators and descriptors to work cleanly, and makes the scope of attribute lookup unambiguous (self.x vs x). Most importantly, it makes cls.method(instance) and instance.method() trivially equivalent.
Common gotchas
  • Forgetting self: def greet(): inside a class causes TypeError: greet() takes 0 positional arguments but 1 was given.
  • Returning from __init__: return "something" raises TypeError: __init__() should return None.
  • Shadowing self: self = something_else inside a method only rebinds the local name; the original object is untouched.
  • Calling parent's init: Must use super().__init__(...) explicitly — it's not automatic.
Real-world examples
Every Django model's __init__ accepts **kwargs to populate fields. Pydantic's BaseModel auto-generates an __init__ from type hints. The @dataclass decorator generates __init__, __repr__, and __eq__, eliminating that boilerplate for most simple data classes.

This is one of the most-asked questions in Python. Let's kill it once and for all.

What is __init__?

__init__ is the initializer (NOT the constructor). It's called automatically after the object is created. Its job: set up initial state.

# WITHOUT __init__ — blank, fragile objects
class Dog:
    pass

d = Dog()
d.name = "Rex"     # manually set every time
d.age = 5          # easy to forget one → bugs

# WITH __init__ — guaranteed setup
class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age

d = Dog("Rex", 5)  # one line, always has both attributes

What is self?

self is a reference to the specific instance calling the method. When you call rex.bark(), Python actually does Dog.bark(rex). So self IS rex.

class Demo:
    def __init__(self, name):
        # self.name = the OBJECT's attribute
        # name       = the PARAMETER passed in
        self.name = name
        print(f"I am object at {id(self)}")

d = Demo("test")
print(f"d is at {id(d)}")
# Both print the SAME address — self IS the object

The Full Lifecycle: __new__ vs __init__

__new__ creates the object (allocates memory), then __init__ sets it up. Most of the time you only need __init__.

class Dog:
    def __new__(cls, *args, **kwargs):
        print("1. __new__: CREATING object (allocating memory)")
        instance = super().__new__(cls)
        return instance

    def __init__(self, name):
        print("2. __init__: INITIALIZING object (setting up data)")
        self.name = name

d = Dog("Rex")
# 1. __new__: CREATING object (allocating memory)
# 2. __init__: INITIALIZING object (setting up data)

# So Dog("Rex") actually does:
# 1. obj = Dog.__new__(Dog, "Rex")     ← blank object
# 2. Dog.__init__(obj, "Rex")           ← set attributes
# 3. return obj
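
The earlier point about immutable types can be sketched with a tuple subclass. Point2D is a hypothetical name; the idea is only that the values must be supplied in __new__, because a tuple's contents cannot be changed once it exists.

```python
class Point2D(tuple):
    def __new__(cls, x, y):
        # The tuple's contents are fixed here, at creation time
        return super().__new__(cls, (x, y))

    @property
    def x(self):
        return self[0]

    @property
    def y(self):
        return self[1]

p = Point2D(3, 4)
print(p)          # (3, 4)
print(p.x, p.y)   # 3 4
```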

Real-World __init__

How __init__ is used in production — validating inputs, storing config, and setting up initial state.

class DatabaseConnection:
    def __init__(self, host, port, db, timeout=30):
        # Validate
        if not host:
            raise ValueError("Host required")

        # Store config
        self.host = host
        self.port = port
        self.db = db
        self.timeout = timeout

        # Set up initial state
        self.is_connected = False
        self._connection = None    # _ = private by convention
        self.query_count = 0

    def connect(self):
        print(f"Connecting to {self.host}:{self.port}/{self.db}")
        self.is_connected = True
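
The gotcha about parent initialization, noted earlier, can be illustrated with a small sketch. Base, Broken, and Fixed are made-up names for this example.

```python
class Base:
    def __init__(self, name):
        self.name = name

class Broken(Base):
    def __init__(self, name, level):
        self.level = level          # forgot super().__init__(name)

class Fixed(Base):
    def __init__(self, name, level):
        super().__init__(name)      # run the parent's setup first
        self.level = level

print(hasattr(Broken("a", 1), "name"))   # False
print(Fixed("a", 1).name)                # a
```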

08 Magic / Dunder Methods

What is it
"Dunder" (double-underscore) methods, also called magic methods or special methods, are hooks Python calls automatically when your object is used with built-in operators, functions, or language constructs. They're the engine behind Python's operator overloading and protocol-based polymorphism. Implementing __len__ makes len(obj) work; implementing __iter__ makes your object usable in for x in obj:; implementing __add__ makes obj + other work. They're how you make user-defined classes feel like built-in types.
Key categories
  • Lifecycle: __new__, __init__, __del__, __init_subclass__.
  • Representation: __str__ (human-readable, for print), __repr__ (unambiguous, for REPL and debugging), __format__, __bytes__.
  • Comparison: __eq__, __lt__, __le__, __gt__, __ge__, __hash__.
  • Arithmetic: __add__, __sub__, __mul__, __truediv__, __floordiv__, __mod__, __pow__, plus reversed (__radd__) and in-place (__iadd__) variants.
  • Container: __len__, __getitem__, __setitem__, __delitem__, __contains__, __iter__, __next__.
  • Callable: __call__ — makes instances callable like functions.
  • Context manager: __enter__, __exit__.
  • Attribute access: __getattr__, __setattr__, __getattribute__, __delattr__.
How it differs
  • vs C++/C#: C++ has operator+, operator==, etc. as language keywords. Python's approach is more general — the protocols are just methods, so a third-party class can fully impersonate a builtin.
  • vs Java: Java has NO operator overloading (except + for strings). Equivalents like .equals(), .hashCode(), .compareTo(), .toString() are interface methods with no operator sugar.
  • vs JavaScript: JS has almost no operator overloading — no way to define a + b for custom objects except by coercion via Symbol.toPrimitive.
  • vs Ruby: Ruby also allows defining +, [], etc. as methods but uses cleaner names (+, <=>). Ruby's to_s ≈ Python's __str__.
  • vs Go: Go has NO operator overloading at all — types support only the operators the language knows about.
Why use it
Dunder methods let you build domain types that feel native: a Vector class you can add with +, a Matrix you can index with m[i,j], a Money type you can compare with <. They're also the glue for Python's protocols: the "iterator protocol", "context manager protocol", "sequence protocol" are all just "implement these dunders".
Common gotchas
  • __eq__ without __hash__: Overriding __eq__ sets __hash__ to None, making the class unhashable.
  • __repr__ should be unambiguous: Ideally valid Python that recreates the object.
  • __del__ is unreliable: Not guaranteed to run (circular refs, interpreter shutdown). Use context managers instead.
  • __getattr__ vs __getattribute__: The former is only called when normal lookup fails; the latter is called for every attribute access (dangerous).
Real-world examples
NumPy arrays implement nearly every arithmetic dunder for element-wise ops. pathlib.Path uses __truediv__ so you can write Path("/") / "tmp" / "file". SQLAlchemy overloads __eq__, __lt__ on columns to build SQL WHERE clauses. collections.OrderedDict customizes iteration dunders.
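
As a small sketch of the first gotcha above: defining __eq__ alone makes instances unhashable, and adding a matching __hash__ restores use in sets and as dict keys. Both Money classes are hypothetical.

```python
class MoneyNoHash:
    def __init__(self, cents):
        self.cents = cents

    def __eq__(self, other):
        if not isinstance(other, MoneyNoHash):
            return NotImplemented
        return self.cents == other.cents
    # No __hash__ defined, so Python sets __hash__ = None


class Money:
    def __init__(self, cents):
        self.cents = cents

    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return self.cents == other.cents

    def __hash__(self):
        return hash(self.cents)     # equal objects must hash equally

print(MoneyNoHash.__hash__)             # None
print(len({Money(100), Money(100)}))    # 1
```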

Double-underscore methods let your objects work with Python's operators and built-in functions.

class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):         # for developers (debugging)
        return f"Vector({self.x}, {self.y})"

    def __str__(self):          # for users (print)
        return f"({self.x}, {self.y})"

    def __add__(self, other):   # + operator
        return Vector(self.x + other.x, self.y + other.y)

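    def __eq__(self, other):    # == operator
        if not isinstance(other, Vector):
            return NotImplemented   # lets Python fall back instead of raising AttributeError
        return self.x == other.x and self.y == other.y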

    def __abs__(self):          # abs() function
        return (self.x**2 + self.y**2)**0.5

    def __len__(self):          # len() function
        return int(abs(self))

    def __getitem__(self, i):   # v[0], v[1]
        return (self.x, self.y)[i]

    def __iter__(self):         # for x in v, unpacking
        yield self.x
        yield self.y

    def __call__(self, scalar): # v(3) — make it callable
        return Vector(self.x * scalar, self.y * scalar)

v1 = Vector(3, 4)
v2 = Vector(1, 2)

print(v1)           # (3, 4)         __str__
print(repr(v1))     # Vector(3, 4)   __repr__
print(v1 + v2)      # (4, 6)         __add__
print(abs(v1))      # 5.0            __abs__
print(v1[0])        # 3              __getitem__
print(v1(3))        # (9, 12)        __call__
x, y = v1            # unpacking via __iter__

Reference Table

Quick lookup of all common dunder methods, grouped by what they do and what triggers them.

Category  | Methods                                         | Triggered by
Lifecycle | __new__, __init__, __del__                      | Object creation/destruction
String    | __str__, __repr__, __format__                   | str(), repr(), f-strings, format()
Compare   | __eq__, __ne__, __lt__, __gt__, __le__, __ge__  | ==, !=, <, >, <=, >=
Math      | __add__, __sub__, __mul__, __truediv__          | + - * /
Container | __len__, __getitem__, __setitem__, __contains__ | len(), [], in
Iteration | __iter__, __next__                              | for loops, iter(), next()
Context   | __enter__, __exit__                             | with statement
Callable  | __call__                                        | obj()
Hash      | __hash__                                        | hash(), dict keys, set members
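Two rows of the table, Iteration and Context, are not covered by the Vector example. Here is a minimal sketch of both protocols; Countdown and Timer are hypothetical classes made up for this illustration.

```python
import time


class Countdown:
    """Iterator protocol: __iter__ returns the iterator, __next__ produces values."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration          # tells the for loop to stop
        value = self.current
        self.current -= 1
        return value


class Timer:
    """Context manager protocol: __enter__ runs at `with`, __exit__ runs on the way out."""
    def __enter__(self):
        self.start = time.perf_counter()
        return self                      # bound to the name after `as`

    def __exit__(self, exc_type, exc, tb):
        self.elapsed = time.perf_counter() - self.start
        return False                     # False means: do not swallow exceptions


countdown_values = list(Countdown(3))
print(countdown_values)                  # [3, 2, 1]

with Timer() as timer:
    total = sum(Countdown(1000))
print(total)                             # 500500
print(timer.elapsed >= 0)                # True
```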

09 Inheritance & MRO

What is it
Python supports multiple inheritance: a class can extend any number of parent classes. When an attribute or method is looked up on an instance, Python walks the class hierarchy in a specific order called the Method Resolution Order (MRO), computed using the C3 linearization algorithm. C3 guarantees that (a) subclasses come before parents, (b) parent order from the class declaration is preserved, and (c) any shared ancestor appears only once. You can inspect the MRO with ClassName.__mro__ or ClassName.mro().
Key features
  • Single inheritance: class Dog(Animal): — the common case.
  • Multiple inheritance: class SmartPhone(Phone, Camera, Computer): — parents listed left-to-right.
  • super(): Calls the "next class in the MRO" — not necessarily the direct parent. Essential for cooperative multiple inheritance.
  • Mixins: Small classes providing specific functionality, meant to be combined (e.g., Django's LoginRequiredMixin).
  • Abstract Base Classes (ABCs): from abc import ABC, abstractmethod. A class with unimplemented abstract methods cannot be instantiated, which forces subclasses to implement them.
  • isinstance/issubclass: Runtime type checks that respect the inheritance hierarchy.
How it differs
  • vs Java/C#: Java forbids multiple inheritance of classes — you must use interfaces (or default methods). Python allows it but with C3 MRO to avoid the "diamond problem".
  • vs C++: C++ allows multiple inheritance but leaves the diamond problem to the programmer (via virtual base classes). Python resolves it automatically through the MRO.
  • vs JavaScript: JS only supports single prototype chain inheritance. Mixins must be simulated by copying properties.
  • vs Go: Go has no inheritance at all — only composition via embedded structs. Polymorphism comes from interfaces.
  • vs Ruby: Ruby has single inheritance + modules (mixins via include). Very similar in spirit to Python's multiple inheritance.
Why use it
Inheritance is best for modeling "is-a" relationships and sharing implementation. Mixins are the idiomatic way to compose behaviors (e.g., JsonSerializableMixin, TimestampMixin). ABCs enforce contracts. But the modern Python wisdom is "composition over inheritance": deep hierarchies become hard to reason about.
Common gotchas
  • Diamond problem: If D(B, C) and both B and C inherit from A, MRO ensures A is visited once. But all classes must call super().__init__() for this to work.
  • Forgetting super(): Breaks the MRO chain — later parents' __init__ methods never run.
  • MRO conflicts: Some multi-inheritance layouts are impossible to linearize and raise TypeError at class creation.
  • Calling ParentClass.method(self, ...): Bypasses MRO — almost always wrong in multi-inheritance.
Real-world examples
Django's class-based views use heavy mixin composition (LoginRequiredMixin, PermissionRequiredMixin, ListView). Flask-RESTful resources subclass Resource. SQLAlchemy models inherit from a declarative_base(). PyTorch layers subclass nn.Module. All of these rely on inheritance and the MRO.
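ABCs and mixins are named in the feature list above, so here is a small sketch of both working together. Shape and Rectangle are hypothetical classes; JsonSerializableMixin borrows the name mentioned earlier, but its body is an assumption written for this example.

```python
import json
from abc import ABC, abstractmethod


class Shape(ABC):
    """Abstract base class: subclasses must implement area()."""
    @abstractmethod
    def area(self):
        ...


class JsonSerializableMixin:
    """Mixin: adds to_json() to any class whose instances keep data in __dict__."""
    def to_json(self):
        return json.dumps(self.__dict__, sort_keys=True)


class Rectangle(JsonSerializableMixin, Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


try:
    Shape()                      # abstract method missing, so this raises
    abstract_blocked = False
except TypeError:
    abstract_blocked = True

rect = Rectangle(3, 4)
print(abstract_blocked)   # True
print(rect.area())        # 12
print(rect.to_json())     # {"height": 4, "width": 3}
```

Listing the mixin first is a common convention: it puts the mixin ahead of the base class in the MRO, so its methods win if both define the same name.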
class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        raise NotImplementedError

    def describe(self):
        return f"I am {self.name}, an animal"

class Dog(Animal):
    def speak(self):
        return f"{self.name}: Woof!"

    def describe(self):
        return f"I am {self.name}, a dog"

class Cat(Animal):
    def speak(self):
        return f"{self.name}: Meow!"

# Polymorphism
for a in [Dog("Rex"), Cat("Whiskers")]:
    print(a.speak())

# ── super() ──
# Animal (grandparent) → Dog (parent) → GuideDog (child)
class GuideDog(Dog):
    def __init__(self, name, owner):
        super().__init__(name)   # next __init__ in the MRO: Dog defines none, so Animal.__init__ runs
        self.owner = owner

    def speak(self):
        # call parent (Dog) method
        parent_sound = super().speak()          # "Buddy: Woof!"
        # super().describe() → calls Dog.describe() (parent, one step up MRO)
        parent_desc = super().describe()           # "I am Buddy, a dog"
        # Animal.describe(self) → directly calls grandparent, skips MRO
        grandparent_desc = Animal.describe(self)    # "I am Buddy, an animal"
        return f"{parent_sound} | {parent_desc} | {grandparent_desc}"

    def info(self):
        # call grandparent (Animal) attribute — self.name set by Animal.__init__
        # works because: GuideDog.__init__ → super (Dog) → Dog inherits Animal.__init__ → sets self.name
        return f"Guide dog {self.name} belongs to {self.owner}"

g = GuideDog("Buddy", "Alice")
print(g.speak())  # "Buddy: Woof! | I am Buddy, a dog | I am Buddy, an animal"
print(g.info())   # "Guide dog Buddy belongs to Alice"

# To explicitly call grandparent skipping parent:
# Animal.speak(self)  — works but breaks MRO, avoid this
# super() always follows MRO chain: GuideDog → Dog → Animal → object

# ── MRO (Method Resolution Order) ──
# MRO is the order Python searches for a method when you call it.
# When you call obj.method(), Python checks classes in MRO order
# and uses the FIRST match it finds. super() follows this same order.
#
# Single inheritance: simple chain
#   GuideDog → Dog → Animal → object
#   g.speak() → checks GuideDog first ✓ found, stop
#   g.name   → instance attribute: stored on g itself (g.__dict__) by Animal.__init__, not found via the MRO
#
# Multiple inheritance: diamond problem
# Python uses C3 linearization to decide the order.
# Rules: 1) child before parent  2) left parent before right parent

class A:
    def greet(self): return "A"
class B(A):
    def greet(self): return "B"
class C(A):
    def greet(self): return "C"
class D(B, C):   # Diamond!  D inherits B and C, both inherit A
    pass

#       A          MRO: D → B → C → A → object
#      / \         D.greet() → checks D? ✗ → B? ✓ "B" (stop)
#     B   C        super() in B.greet() would go to C (not A!)
#      \ /         because MRO says C comes next after B
#       D

print(D().greet())  # "B"  — first match in MRO
print(D.__mro__)    # D → B → C → A → object

# isinstance / issubclass
print(isinstance(Dog("Rex"), Animal))  # True
print(issubclass(Dog, Animal))         # True
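The diamond above only overrides greet(). The sketch below shows the other two points from the gotchas list: cooperative super().__init__() calls visiting every class exactly once, and a parent ordering that C3 cannot linearize. All class names here are hypothetical.

```python
class Base:
    def __init__(self):
        self.calls = ["Base"]


class Left(Base):
    def __init__(self):
        super().__init__()          # on a Bottom instance, "next in MRO" is Right
        self.calls.append("Left")


class Right(Base):
    def __init__(self):
        super().__init__()          # next in MRO is Base
        self.calls.append("Right")


class Bottom(Left, Right):
    def __init__(self):
        super().__init__()
        self.calls.append("Bottom")


mro_names = [cls.__name__ for cls in Bottom.__mro__]
print(mro_names)        # ['Bottom', 'Left', 'Right', 'Base', 'object']
print(Bottom().calls)   # ['Base', 'Right', 'Left', 'Bottom']

# Impossible ordering: the declaration asks for Base before Left,
# but Left is a subclass of Base and must come before it.
try:
    class Impossible(Base, Left):
        pass
    mro_conflict = False
except TypeError:
    mro_conflict = True
print(mro_conflict)     # True
```

Base.__init__ runs once even though both Left and Right inherit from it, which is exactly what a hard-coded Base.__init__(self) call in each class would get wrong.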

Properties

Use @property to control attribute access — add validation, computed values, or read-only fields without changing the API.

Why do we need it? Without @property, anyone can set any value directly — even invalid ones like t.celsius = -500. You could write getter/setter methods like get_celsius() / set_celsius(), but then every user of your class has to change from t.celsius to t.get_celsius(). @property lets you add validation/logic behind the scenes while keeping the clean t.celsius syntax.

# ❌ Without @property — no control over what gets set
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

t = Temperature(100)
t.celsius = -500      # no error! but -500°C is physically impossible
print(t.celsius)       # -500  — bad data, no validation

# ✅ With @property — setter runs validation automatically
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius        # goes through the setter, so the initial value is validated and stored in _celsius (private by convention)

    @property                       # GETTER: runs when you READ t.celsius
    def celsius(self):
        return self._celsius

    @celsius.setter                 # SETTER: runs when you WRITE t.celsius = value
    def celsius(self, value):
        if value < -273.15:
            raise ValueError("Below absolute zero!")
        self._celsius = value

    @property                       # read-only — no setter, so can't set t.fahrenheit = x
    def fahrenheit(self):
        return self._celsius * 9/5 + 32  # computed on the fly from _celsius

t = Temperature(100)
print(t.celsius)      # 100      — calls @property getter
print(t.fahrenheit)   # 212.0    — computed, read-only
t.celsius = 0         # calls @celsius.setter → validates → sets _celsius = 0
print(t.fahrenheit)   # 32.0     — auto-recomputed
t.celsius = -300      # ValueError: Below absolute zero!
t.fahrenheit = 50     # AttributeError: can't set — no setter defined

# The magic: caller uses simple t.celsius syntax
# but behind the scenes Python calls your getter/setter methods
# t.celsius      →  Temperature.celsius.fget(t)   (getter)
# t.celsius = 0  →  Temperature.celsius.fset(t, 0) (setter)

Private Variables

Python has no true private variables like Java/C++. Everything is accessible. But there are two conventions:

class BankAccount:
    def __init__(self, balance):
        self.name = "Savings"       # public — anyone can read/write
        self._balance = balance       # _ = "protected" — convention: don't touch from outside
        self.__pin = 1234            # __ = "private" — name mangling (Python renames it)

acc = BankAccount(1000)

# public — works fine
print(acc.name)          # "Savings"

# _ single underscore — works but "please don't"
print(acc._balance)      # 1000  — still accessible! just a convention

# __ double underscore — name mangling
print(acc.__pin)         # AttributeError: no attribute '__pin'
print(acc._BankAccount__pin)  # 1234  — Python renamed it to _ClassName__var
                               # so it's STILL accessible, just harder to find

# Summary:
#   name      → public        anyone can use
#   _name     → protected     "don't touch" convention, still accessible
#   __name    → name-mangled  Python renames to _ClassName__name, still accessible
#
# Bottom line: Python trusts developers — "we're all adults here"
# Use _ for internal attrs, __ only to avoid name clashes in subclasses
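The last line of the summary, using __ to avoid name clashes in subclasses, can be seen in a short sketch. Base and Child are hypothetical classes that both use an attribute spelled __token.

```python
class Base:
    def __init__(self):
        self.__token = "base-secret"     # stored as _Base__token

    def base_token(self):
        return self.__token              # compiled as self._Base__token


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__token = "child-secret"    # stored as _Child__token, so no clash

    def child_token(self):
        return self.__token              # compiled as self._Child__token


c = Child()
print(c.base_token())    # base-secret
print(c.child_token())   # child-secret
print(sorted(vars(c)))   # ['_Base__token', '_Child__token']
```

With a single underscore (self._token) the child would have silently overwritten the parent's value.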

10 Error Handling

What is it
Python uses exceptions as its primary error-handling mechanism, built around the try / except / else / finally construct. Exceptions are objects derived from BaseException, with Exception as the common ancestor of the errors application code normally catches. Python embraces a philosophy known as EAFP ("Easier to Ask Forgiveness than Permission"): attempt the operation and handle failure, instead of checking preconditions first (LBYL, "Look Before You Leap"). This often makes Python code shorter and avoids check-then-act race conditions.
Key features
  • try/except: Catch one or more exception types. Use except (TypeError, ValueError) as e: for multiple.
  • else: Runs only if no exception was raised — useful for narrowing the try block.
  • finally: Always runs — for cleanup (file handles, locks, connections).
  • raise: Throws an exception. raise ValueError("msg") or raise alone to re-raise.
  • Exception chaining: raise NewError from original_err preserves the cause in tracebacks.
  • Custom exceptions: Subclass Exception: class AppError(Exception): pass.
  • except*: Python 3.11+ — catches individual exceptions inside an ExceptionGroup (for async code).
How it differs
  • vs Java: Java has checked exceptions — compiler forces you to declare or catch them. Python has no checked exceptions; any function can raise anything.
  • vs Go: Go uses error return values, not exceptions — if err != nil is idiomatic. Python exceptions give cleaner happy-path code but hide control flow.
  • vs Rust: Rust uses Result<T, E> and the ? operator. Forces explicit handling at every call site.
  • vs JavaScript: Very similar try/catch. JS has no else clause. Async errors need try/catch inside async functions or .catch() on promises.
  • vs C++: Similar exception model but C++ discourages exceptions in performance-sensitive code. No finally — use RAII instead.
Why use it
Exceptions separate happy-path logic from error-handling logic, making normal code cleaner. They propagate automatically up the call stack, so deeply nested callers don't need explicit error returns. Python's EAFP style also sidesteps TOCTOU (time-of-check to time-of-use) bugs that plague LBYL code.
Common gotchas
  • Bare except:: Catches everything including KeyboardInterrupt and SystemExit. Use except Exception: instead.
  • Catching too broadly: except Exception: can swallow bugs. Catch the specific type.
  • Losing the original traceback: When re-raising, use raise NewError from e to preserve chain.
  • Resource leaks: Without finally or with, exceptions can leak file/socket/lock resources.
  • except order: List specific exceptions before general ones — Python checks top-to-bottom.
Real-world examples
Flask/FastAPI catch framework exceptions and return HTTP error responses. SQLAlchemy raises IntegrityError, OperationalError. requests raises ConnectionError, Timeout, HTTPError. Custom domain exceptions (PaymentFailed, InvalidTokenError) form the error vocabulary of any mature Python app.
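The EAFP versus LBYL contrast described above can be sketched side by side. Both functions are hypothetical helpers that read a port number from a config dict; the fallback value 8080 is an arbitrary choice for the example.

```python
def get_port_lbyl(config):
    """LBYL: check every precondition before acting."""
    if "port" in config and isinstance(config["port"], str) and config["port"].isdigit():
        return int(config["port"])
    return 8080


def get_port_eafp(config):
    """EAFP: just try it, then handle the specific failures."""
    try:
        return int(config["port"])
    except (KeyError, ValueError, TypeError):
        return 8080


print(get_port_eafp({"port": "5000"}))   # 5000
print(get_port_eafp({}))                 # 8080
print(get_port_eafp({"port": "abc"}))    # 8080
print(get_port_lbyl({"port": "5000"}))   # 5000
```

The EAFP version is shorter and names exactly the failures it expects, while the LBYL version has to anticipate each one with a separate check.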
# ── try / except / else / finally ──
try:
    result = 10 / 0
except ZeroDivisionError as e:
    print(f"Error: {e}")
except (TypeError, ValueError):
    print("Type or value error")
except Exception as e:       # catch-all (use sparingly)
    print(f"Unknown: {e}")
else:                          # runs ONLY if no exception
    print("Success!")
finally:                       # runs ALWAYS
    print("Cleanup")

# ── raise vs throw ──
# "raise" IS Python's "throw" — same concept, different keyword
# Python: raise    |   Java/C#/JS/C++: throw
#
# What raise does:
# 1. Creates an exception object
# 2. Immediately stops the current function
# 3. Walks UP the call stack looking for a matching except
# 4. If nothing catches it → program crashes with traceback

# ── Raising ──
def validate_age(age):
    if age < 0:
        raise ValueError(f"Age can't be negative: {age}")

# raise separates DETECTING an error from HANDLING it
def divide(a, b):
    if b == 0:
        raise ZeroDivisionError("can't divide by zero")  # stops HERE
    return a / b          # never reached if b == 0

# Caller decides what to do with the error
try:
    divide(10, 0)
except ZeroDivisionError as e:
    print(e)              # can't divide by zero

# Three forms of raise (shown as comments, since each needs its own context):
#   raise ValueError("msg")          # raise a new exception
#   raise                            # re-raise the current exception (only valid inside except)
#   raise TypeError("x") from err    # chain: err is the exception caught by the enclosing except

# ── Custom exceptions ──
class InsufficientFunds(Exception):
    def __init__(self, balance, amount):
        self.balance = balance
        self.amount = amount
        super().__init__(f"Need {amount}, have {balance}")

class BankAccount:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise InsufficientFunds(self.balance, amount)
        self.balance -= amount

# ── Using the custom exception ──
acct = BankAccount(100)
try:
    acct.withdraw(250)
except InsufficientFunds as e:
    print(e)                 # Need 250, have 100
    print(e.balance)          # 100  ← access custom attrs
    print(e.amount)           # 250

# ── Exception hierarchies for your app ──
class AppError(Exception):
    """Base for all app errors — catch this to handle any app error."""

class NotFoundError(AppError):
    pass

class PermissionDeniedError(AppError):   # not "PermissionError": that name would shadow the built-in
    pass

try:
    raise NotFoundError("User 42 missing")
except AppError as e:       # catches NotFoundError AND PermissionDeniedError
    print(e)

# ── Re-raising & chaining ──
try:
    int("abc")
except ValueError as e:
    raise                     # re-raise same exception

try:
    int("abc")
except ValueError as original:
    raise RuntimeError("parse failed") from original  # chain: keeps both tracebacks

# ── Practical pattern: retry, then give up ──
# (risky_operation stands in for any call that may fail)
for attempt in range(3):
    try:
        result = risky_operation()
        break                 # success — exit loop
    except ConnectionError:
        if attempt == 2:
            raise             # last attempt — give up
        print(f"Retry {attempt + 1}...")
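Since risky_operation is only a placeholder, here is a self-contained sketch of the same retry idea that can actually be run. make_flaky and call_with_retry are hypothetical helpers; the final failure is chained with `from` so the original error stays available as __cause__.

```python
def make_flaky(failures):
    """Return a function that raises ConnectionError `failures` times, then succeeds."""
    state = {"remaining": failures}

    def operation():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise ConnectionError("temporary network problem")
        return "ok"

    return operation


def call_with_retry(operation, attempts=3):
    """Try `operation` up to `attempts` times; chain the last error if all fail."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError as err:
            if attempt == attempts:
                raise RuntimeError(f"gave up after {attempts} attempts") from err


result = call_with_retry(make_flaky(failures=2))
print(result)                                   # ok

try:
    call_with_retry(make_flaky(failures=5))
except RuntimeError as exc:
    final_error = exc

print(final_error)                              # gave up after 3 attempts
print(type(final_error.__cause__).__name__)     # ConnectionError
```

Real retry code would usually sleep between attempts (often with exponential backoff); that is left out here to keep the sketch fast and deterministic.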

Exception Hierarchy

# BaseException
#  ├── SystemExit, KeyboardInterrupt, GeneratorExit
#  └── Exception              ← catch THIS, not BaseException
#       ├── ValueError
#       ├── TypeError
#       ├── LookupError
#       │    ├── KeyError
#       │    └── IndexError
#       ├── OSError
#       │    └── FileNotFoundError
#       ├── ArithmeticError
#       │    └── ZeroDivisionError
#       ├── AttributeError
#       └── StopIteration
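The hierarchy matters because an except clause also catches every subclass of the type it names, and clauses are tried top to bottom. A small sketch, using a hypothetical classify() helper, makes both points visible.

```python
def classify(exc):
    """Raise `exc` and report which handler caught it (specific handlers come first)."""
    try:
        raise exc
    except FileNotFoundError:
        return "FileNotFoundError"
    except OSError:                 # also catches PermissionError, TimeoutError, ...
        return "OSError"
    except LookupError:             # parent of KeyError and IndexError
        return "LookupError"
    except Exception:
        return "Exception"


print(classify(FileNotFoundError("missing")))    # FileNotFoundError
print(classify(PermissionError("denied")))       # OSError
print(classify(KeyError("k")))                   # LookupError
print(classify(ValueError("bad")))               # Exception
print(issubclass(KeyboardInterrupt, Exception))  # False
```

If the OSError clause were listed before FileNotFoundError, the more specific handler would never run.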

11 File Handling

What is it
Python exposes file I/O through the built-in open() function, which returns a file object that implements the iterator protocol and the context manager protocol. The idiomatic form is with open(path) as f: because the with statement guarantees the file is closed even if an exception occurs. The io module underlies everything: TextIOWrapper, BufferedReader, BufferedWriter, and raw FileIO. Modern Python also provides pathlib.Path for object-oriented path manipulation that's more pleasant than os.path string-juggling.
Key features
  • Modes: 'r' (read), 'w' (write, truncate), 'a' (append), 'x' (exclusive create), 'r+' (read/write), plus 'b' for binary and 't' for text (default).
  • Text vs binary: Text mode returns str and handles encoding (the default is locale-dependent before Python 3.15 and UTF-8 from 3.15 onward, per PEP 686). Binary mode returns bytes.
  • Encoding: open(path, encoding="utf-8") — always specify explicitly for portability.
  • Iteration: for line in f: streams line-by-line without loading the whole file.
  • pathlib: Path("file.txt").read_text() / .write_text() / .read_bytes() — one-liner helpers.
  • Seeking: f.seek(offset) / f.tell() for random access.
How it differs
  • vs Java: Java's old FileReader/BufferedReader is verbose. Java 7+ added try-with-resources (similar to Python's with). Java NIO Files.readString(path) is closest to Python's read_text().
  • vs Go: Go uses os.Open, returns *os.File and error. Requires defer f.Close() — no built-in context manager.
  • vs JavaScript (Node.js): Node uses fs.readFile (async) or fs.readFileSync. No context manager; callbacks or async/await.
  • vs C: C's fopen/fclose is manual, error-prone, and encoding-ignorant. Python abstracts it all away.
  • vs Rust: Rust's File + BufReader is explicit about buffering and encoding. Similar RAII close behavior but via Drop, not context managers.
Why use it
File I/O is needed for reading configs, writing logs, processing CSVs/JSON, generating reports, and parsing large datasets. The streaming iterator protocol makes Python well suited to processing files larger than RAM: just iterate line-by-line.
Common gotchas
  • Forgetting with: Leaks file descriptors. Always use with.
  • Encoding drift: On Windows, the default encoding before 3.15 was typically a legacy code page such as cp1252, which silently corrupted non-ASCII text. Always pass encoding="utf-8".
  • Reading huge files with .read(): Loads entire file into memory. Use iteration for large files.
  • Writing without newline: f.write("line") does NOT add \n. You need f.write("line\n") or use print(..., file=f).
  • Binary on Windows: Text mode translates \r\n to \n when reading, and on Windows turns \n into \r\n when writing. Use binary mode for exact bytes.
Real-world examples
Config loaders (yaml.safe_load(f)), log file tailers, CSV importers (csv.DictReader(f)), JSON API dumps (json.dump(data, f)), pandas read_csv (uses C-level file I/O underneath), and Jupyter notebook .ipynb serialization are all built on Python's file API.
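Several items from the lists above (the "x" and "a" modes, seek/tell, and the missing-newline gotcha) fit into one short sketch. It works inside a temporary directory so it leaves nothing behind; the file name notes.txt is arbitrary.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "notes.txt"

    # "x" creates the file and fails if it already exists
    with open(path, "x", encoding="utf-8") as f:
        f.write("first")              # write() adds no newline...
        f.write("second\n")           # ...so these two land on the same line
    try:
        open(path, "x").close()
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

    # "a" appends; print(..., file=f) adds the newline for you
    with open(path, "a", encoding="utf-8") as f:
        print("third", file=f)

    # Binary mode: seek() and tell() work in plain byte offsets
    with open(path, "rb") as f:
        f.seek(5)                     # skip b"first"
        chunk = f.read(6)
        position = f.tell()

    lines = path.read_text(encoding="utf-8").splitlines()

print(exclusive_failed)   # True
print(lines)              # ['firstsecond', 'third']
print(chunk)              # b'second'
print(position)           # 11
```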
# ALWAYS use `with` — auto-closes, even on exception
with open("data.txt", "w") as f:
    f.write("Hello\nWorld\n")

with open("data.txt") as f:
    content = f.read()           # entire file

with open("data.txt") as f:
    for line in f:              # line by line (memory efficient)
        print(line.strip())

# Modes: "r" read, "w" write (overwrite!), "a" append
#        "x" create (fail if exists), "b" binary, "+" read+write

# ── Production: use pathlib (modern, cross-platform) ──
from pathlib import Path

config_dir = Path("config")
config_dir.mkdir(parents=True, exist_ok=True)  # create dirs safely

file = config_dir / "app.txt"     # build paths with /  (not string concat!)
file.write_text("hello")            # write + auto-close
content = file.read_text()          # read  + auto-close

file.exists()                        # True
file.name                            # "app.txt"
file.suffix                          # ".txt"
file.stem                            # "app"
file.parent                          # Path("config")
file.resolve()                       # absolute path

# Iterate files in a directory
for p in Path(".").glob("*.py"):     # all .py in current dir
    print(p.name)
for p in Path(".").rglob("*.py"):    # recursive — all subdirs too
    print(p)

# ── Production: handle errors gracefully ──
import json
from pathlib import Path

def read_config(path: str) -> dict:
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(f"Config missing: {p}")
    return json.loads(p.read_text())

# Or catch errors at call site
try:
    data = Path("settings.json").read_text()
except FileNotFoundError:
    data = "{}"                   # fallback to empty config
except PermissionError:
    print("No read access!")

# ── Production: safe writes (atomic — no half-written files) ──
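import os

def safe_write(path: str, content: str):
    """Write to a temp file first, then swap it in, so readers never see a half-written file."""
    p = Path(path)
    tmp = p.with_suffix(".tmp")      # same directory, so the swap stays on one filesystem
    tmp.write_text(content, encoding="utf-8")
    os.replace(tmp, p)               # atomic on POSIX; unlike rename(), it also overwrites an existing file on Windows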

# ── Production: encoding matters ──
# ALWAYS specify encoding — default varies by OS!
with open("data.txt", "w", encoding="utf-8") as f:
    f.write("café ☕")

Path("data.txt").read_text(encoding="utf-8")

# ── JSON ──
import json
with open("data.json", "w", encoding="utf-8") as f:
    json.dump({"name": "Yatin"}, f, indent=2)

with open("data.json", encoding="utf-8") as f:
    data = json.load(f)

# Production: validate JSON & handle bad data
def load_json_safe(path: str, default=None) -> dict:
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return default or {}

# json.dumps / json.loads — work with strings (no file)
text = json.dumps({"age": 25})     # dict → string
obj  = json.loads(text)              # string → dict

# ── CSV ──
import csv

# Reading
with open("data.csv", encoding="utf-8") as f:
    for row in csv.DictReader(f):   # each row is a dict
        print(row["name"])

# Writing
users = [{"name": "Yatin", "age": 25}, {"name": "Bob", "age": 30}]
with open("users.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "age"])
    writer.writeheader()             # name,age
    writer.writerows(users)          # write all rows at once

# ── Large files: stream instead of loading all into memory ──
def count_errors(log_path: str) -> int:
    """Process a 10GB log file without using 10GB of RAM."""
    errors = 0
    with open(log_path, encoding="utf-8") as f:
        for line in f:            # yields one line at a time
            if "ERROR" in line:
                errors += 1
    return errors
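Line-by-line iteration works for text. For binary files, or text with no newlines, the same streaming idea is usually done with fixed-size chunks. A minimal sketch, assuming a hypothetical helper sha256_of_file and an arbitrary 64 KiB chunk size:

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=64 * 1024):
    """Hash a file of any size while holding at most one chunk in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:              # empty bytes means end of file
                break
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "blob.bin"
    payload = b"\x00\x01\x02" * 100_000
    sample.write_bytes(payload)
    streamed = sha256_of_file(sample, chunk_size=4096)

print(streamed == hashlib.sha256(payload).hexdigest())   # True
```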

12 Modules, Packages & Project Structure

What is it
A module is any .py file: importing it runs the file top-to-bottom and binds the resulting namespace to a name. A package is a directory containing an __init__.py (marker file, empty is fine) plus one or more modules or subpackages. Python also supports namespace packages (no __init__.py, introduced in 3.3, PEP 420) that can be split across multiple directories. Imports are resolved via sys.path, which Python populates from the script's directory, the PYTHONPATH env var, the standard library, and installed site-packages.
Key features
  • Absolute imports: from myapp.utils import helper — preferred style.
  • Relative imports: from .utils import helper (same package), from ..models import User (parent package).
  • __init__.py: Marks a directory as a package; code runs when the package is first imported.
  • __main__.py: Lets you run a package with python -m mypackage.
  • if __name__ == "__main__":: The classic idiom — code runs only when the file is executed directly, not when imported.
  • pyproject.toml: Modern (PEP 517/518/621) way to declare dependencies, build system, and metadata. Replaces setup.py.
  • Virtual environments: venv (stdlib), virtualenv, poetry, uv, pipenv — all provide isolated dependency trees.
How it differs
  • vs JavaScript: JS has ESM (import) and CommonJS (require), with package.json and node_modules. Python has one import system but historically fragmented tooling (pip, poetry, conda, uv).
  • vs Java: Java packages are tied to directory structure with a strict naming convention (com.company.app). Python uses sys.path for flexibility but it can lead to confusing import errors.
  • vs Go: Go modules (go.mod) enforce semver and module paths. Go's imports are stricter — no circular imports allowed at all. Python allows circular imports but they're fragile.
  • vs Ruby: Ruby has require / require_relative and Bundler with Gemfile. Similar in spirit to Python's system.
  • vs C/C++: Header files and linker. Python's import system is dramatically simpler but purely runtime.
Why use it
Modules and packages enable code organization, reuse, and distribution. Publishing to PyPI, so that others can run pip install mypackage, requires a proper package structure. Virtual environments isolate project dependencies so two apps can use different versions of the same library.
Common gotchas
  • Circular imports: a.py imports b.py which imports a.py — often raises ImportError for names. Fix by moving imports inside functions or restructuring.
  • ImportError vs ModuleNotFoundError: The latter (3.6+) is a subclass — module wasn't found at all.
  • Running scripts inside packages: python myapp/main.py often breaks relative imports. Use python -m myapp.main instead.
  • Shadowing stdlib: Naming your file email.py or json.py breaks import email.
  • Global state in __init__.py: Runs once per process — side effects can create hidden dependencies.
Real-world examples
Every Python project uses modules. Django apps are Python packages with a convention (models.py, views.py, urls.py). FastAPI apps structure routers as submodules. NumPy, pandas, requests are all packages. Modern tooling such as uv, poetry, hatch, and pdm builds on pyproject.toml.
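The ImportError versus ModuleNotFoundError distinction from the gotchas list can be checked directly. The module name below is deliberately made up and assumed not to be installed.

```python
import importlib

try:
    importlib.import_module("surely_not_an_installed_module_xyz")   # hypothetical name
except ModuleNotFoundError as exc:
    missing_name = exc.name               # the module that could not be found

try:
    from math import no_such_function     # module exists, the name inside it does not
except ImportError as exc:
    error_type = type(exc).__name__

print(missing_name)                                   # surely_not_an_installed_module_xyz
print(error_type)                                     # ImportError
print(issubclass(ModuleNotFoundError, ImportError))   # True
```

Because of the subclass relationship, except ImportError: catches both cases, while except ModuleNotFoundError: catches only the "module is missing entirely" case.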

Modules

A module is just a .py file. import executes it and gives you access to its contents.

# ── math_utils.py ──
PI = 3.14159
def circle_area(r): return PI * r ** 2

# ── main.py ──
import math_utils                         # whole module
from math_utils import circle_area        # specific function
from math_utils import circle_area as ca # with alias

__name__ == "__main__"

Guards code so it only runs when the file is executed directly, not when it's imported as a module.

# When Python runs a file directly: __name__ = "__main__"
# When it's imported:               __name__ = "module_name"

def circle_area(r):
    return 3.14 * r ** 2

if __name__ == "__main__":
    # Only runs when executed directly, NOT when imported
    print(circle_area(5))
    print("Tests passed!")
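To watch __name__ change without creating files by hand, the sketch below writes a tiny throwaway module (guard_demo, a hypothetical name) into a temporary directory, imports it, and then executes the same file as a script with runpy.run_path.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

source = '''
role = "script" if __name__ == "__main__" else "module"
seen_name = __name__
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    module_path = Path(tmp_dir) / "guard_demo.py"
    module_path.write_text(source, encoding="utf-8")

    sys.path.insert(0, tmp_dir)
    try:
        imported = importlib.import_module("guard_demo")                     # like: import guard_demo
        as_script = runpy.run_path(str(module_path), run_name="__main__")    # like: python guard_demo.py
    finally:
        sys.path.remove(tmp_dir)
        sys.modules.pop("guard_demo", None)

print(imported.seen_name, imported.role)            # guard_demo module
print(as_script["seen_name"], as_script["role"])    # __main__ script
```

runpy.run_path returns the globals of the executed file as a dict, which is why the second result is read with square brackets.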

Packages & __init__.py

Directories become importable packages with __init__.py — controls what's publicly accessible.

myproject/
├── __init__.py        # makes this directory a package
├── models/
│   ├── __init__.py    # controls what's importable from models
│   ├── user.py
│   └── product.py
├── utils/
│   ├── __init__.py
│   └── helpers.py
└── main.py
# ── models/__init__.py ──
from .user import User
from .product import Product

# Now users can do:
from models import User, Product     # clean!
# Instead of:
from models.user import User         # verbose

# ── WHY __init__.py in every folder? ──

# Problem: without __init__.py a folder is only an implicit "namespace
# package" (Python 3.3+). `import models` may still succeed, but no init
# code runs and nothing defines what `from models import User` exposes.

# __init__.py tells Python: "This folder is a package."
# Think of it like an index page for a chapter in a book.

# ── What happens when you import a package (full breakdown) ──
#
# Given this structure:
#   models/
#   ├── __init__.py      →  from .user import User
#   └── user.py          →  class User: ...
#
# When you write: `from models import User`
#
# Step 1: FIND the package
#   Python searches sys.path (list of directories) for "models"
#   sys.path includes: current dir, installed packages, stdlib
#   It finds models/ folder and checks for __init__.py → found!
#
# Step 2: EXECUTE __init__.py (this is the key!)
#   Python runs models/__init__.py top to bottom, like any script.
#   That file says: `from .user import User`
#     → Python now runs models/user.py
#     → class User is created and bound to the name "User"
#       inside the models package namespace
#
# Step 3: BIND the name
#   `from models import User` grabs "User" from the models
#   namespace and binds it in YOUR file's namespace.
#   Now you can use User directly.
#
# Step 4: CACHE it
#   Python stores the module in sys.modules["models"].
#   Next time ANYONE imports models, __init__.py does NOT
#   run again — Python reuses the cached version.
#   (This is why init code only runs ONCE.)
# You can see this yourself:
import sys

# Before import — "models" not in cache
print("models" in sys.modules)       # False

from models import User               # triggers Step 1-4

# After import — cached!
print("models" in sys.modules)       # True
print(sys.modules["models"])          # <module 'models' from 'models/__init__.py'>

# See where Python searches for packages:
for p in sys.path:
    print(p)
# /Users/you/project          ← your current directory (first!)
# /usr/lib/python3.12         ← stdlib
# /usr/lib/python3.12/site-packages  ← pip installed

# ── Different import styles — same mechanism ──
import models                  # runs __init__.py → models.User
from models import User        # runs __init__.py → User directly
from models.user import User   # runs __init__.py AND user.py → User
import models.user             # runs __init__.py AND user.py → models.user.User

# KEY INSIGHT: __init__.py ALWAYS runs, no matter which style!
# Even `from models.user import User` triggers __init__.py first.

# ── 4 things __init__.py does ──

# 1. MARKS the directory as a package (can be empty!)
#    utils/__init__.py  → even an empty file works

# 2. CONTROLS the public API — hide internals, expose clean imports
# ── models/user.py ──
class User:
    def __init__(self, name):
        self.name = name

# ── models/__init__.py ──
from .user import User
from .product import Product

# Without __init__.py re-exports:
from models.user import User              # ugly — exposes internal structure
from models.product import Product

# With __init__.py re-exports:
from models import User, Product           # clean! users don't know about user.py
# 3. RUNS INIT CODE — setup that should happen once on import
#
# __init__.py runs ONCE when the package is first imported.
# Perfect for one-time setup. Think of it as the "boot up" for your package.
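The run-once behaviour can be checked with a throwaway package. This sketch builds a hypothetical package called demo_models in a temporary directory, imports it twice, and then forces a re-run with importlib.reload to show the difference.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    pkg = Path(tmp_dir) / "demo_models"
    pkg.mkdir()
    (pkg / "user.py").write_text("class User:\n    pass\n", encoding="utf-8")
    (pkg / "__init__.py").write_text(
        "from .user import User\n"
        "marker = object()\n",        # a fresh object each time __init__.py executes
        encoding="utf-8",
    )

    sys.path.insert(0, tmp_dir)
    try:
        first = importlib.import_module("demo_models")
        marker_before = first.marker
        second = importlib.import_module("demo_models")    # served from sys.modules
        same_module = first is second
        same_marker = second.marker is marker_before
        submodule_cached = "demo_models.user" in sys.modules
        importlib.reload(first)                            # explicitly re-runs __init__.py
        marker_changed = first.marker is not marker_before
    finally:
        sys.path.remove(tmp_dir)
        for name in ("demo_models", "demo_models.user"):
            sys.modules.pop(name, None)

print(same_module, same_marker)   # True True
print(submodule_cached)           # True
print(marker_changed)             # True
```

The second import returns the cached module untouched; only the explicit reload executes __init__.py again and produces a new marker object.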

Example A: Shared database connection

# Problem: Every file creates its own DB connection = wasteful
#
# routes/auth.py
db = Database("sqlite:///app.db")           # connection #1
# routes/api.py
db = Database("sqlite:///app.db")           # connection #2  ← duplicate!
# routes/admin.py
db = Database("sqlite:///app.db")           # connection #3  ← duplicate!

# Solution: create it ONCE in __init__.py
#
# ── db/__init__.py ──
from .connection import Database
default_db = Database("sqlite:///app.db")   # created ONCE on first import

# ── routes/auth.py ──
from db import default_db                   # reuses same connection

# ── routes/api.py ──
from db import default_db                   # same connection — not recreated!

# ── routes/admin.py ──
from db import default_db                   # same connection — shared across app

# WHY? Python caches modules after first import.
# __init__.py runs once → default_db created once → everyone shares it.

Example B: Configure logging for the whole package

# ── myapp/__init__.py ──
import logging

# Runs ONCE when anyone does `import myapp` or `from myapp import ...`
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger("myapp")
logger.info("App package loaded!")          # prints once on startup

# ── myapp/routes/auth.py ──
import logging
logger = logging.getLogger("myapp.auth")    # inherits config from above!
logger.info("Auth route hit")               # formatted the same way

Example C: Register plugins / load components on startup

# ── plugins/__init__.py ──
from .email_plugin import EmailPlugin
from .sms_plugin import SMSPlugin

# Auto-register all plugins into a registry
registry = {}
for plugin in [EmailPlugin, SMSPlugin]:
    registry[plugin.name] = plugin         # built once at import time

# ── anywhere else ──
from plugins import registry
registry["email"].send("hello")            # use directly — no setup needed

Example D: Package version & metadata

# ── mylib/__init__.py ──
__version__ = "2.1.0"
__author__ = "Yatin"

# Now users can check:
import mylib
print(mylib.__version__)                    # "2.1.0"

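# Many real packages expose it the same way:
import requests
print(requests.__version__)                 # e.g. "2.31.0"

# Not every package does: Flask deprecated flask.__version__ in 3.0.
# importlib.metadata works for any installed distribution:
from importlib.metadata import version
print(version("flask"))                     # e.g. "3.0.0"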

Example E: Validate environment on import

# ── payments/__init__.py ──
import os

# Fail FAST — crash at import time, not at runtime 10 minutes later
API_KEY = os.environ.get("STRIPE_API_KEY")
if not API_KEY:
    raise RuntimeError(
        "STRIPE_API_KEY not set! Add it to .env"
    )

# If we get here, key exists — safe to use in any file
# ── payments/charge.py ──
from . import API_KEY                       # guaranteed to exist

# WHY in __init__.py?
# Without: app starts → runs for 10 min → user pays → CRASH (no API key)
# With:    app starts → CRASH immediately → you fix it before deploying
# 4. CONTROLS __all__: what `from package import *` includes
#
# When someone writes: `from models import *`
# Python asks: "what is EVERYTHING in models?"
# __all__ is the answer: a list of names to export.

# ── models/__init__.py ──
from .user import User
from .product import Product
from ._internal import _helper

__all__ = ["User", "Product"]      # ← whitelist for * imports

# ── main.py ──
from models import *               # imports User and Product ONLY
print(User)                        # ✅ works
print(Product)                     # ✅ works
print(_helper)                     # ❌ NameError: not in __all__!

# But explicit imports ALWAYS work, regardless of __all__:
from models import _helper         # ✅ works, __all__ only restricts *

# ── What happens WITHOUT __all__? ──
#
# `from models import *` exports every name in __init__.py's namespace
# that does NOT start with an underscore.
# _helper stays hidden, but User and Product arrive together with
# anything __init__.py imported for its own use: os, sys, json, ...
# Your namespace gets polluted with stuff you didn't want.
#
# __all__ = ["User", "Product"] says:
# "Only these two are the PUBLIC API. Everything else is internal."

# ── Real example: a utils package ──
# utils/__init__.py
import os                          # needed internally
import hashlib                     # needed internally
from .security import hash_password
from .security import verify_password
from ._cache import _build_cache   # internal helper

__all__ = ["hash_password", "verify_password"]

# Without __all__: `from utils import *` gives you:
#   hash_password, verify_password, os, hashlib, ...  ← messy!
#   (_build_cache is skipped only because of its leading underscore)
#
# With __all__: `from utils import *` gives you:
#   hash_password, verify_password  ← clean! only what you intended

# ── __all__ also works in regular .py files, not just __init__.py ──
# helpers.py
__all__ = ["format_date"]

def format_date(d):                # public
    return _pad(d.day)

def _pad(n):                       # internal, not in __all__
    return str(n).zfill(2)

# from helpers import *  → gets format_date only, not _pad

# ── "But it works without __init__.py!" Yes, partly. ──
#
# Structure WITHOUT __init__.py:
# models/
# ├── user.py
# └── product.py
#
# What WORKS without it (Python 3.3+):
from models.user import User           # ✅ works, full path to file
from models.product import Product     # ✅ works

# What BREAKS without it (and WHY):

# ─── BREAK 1: `from models import User` → ImportError ───
#
# Think of it this way:
# `from models import User` means "go to the models PACKAGE, find User"
# But WHERE in models? Python checks __init__.py for that answer.
# No __init__.py = Python has no idea User exists inside models/.
#
# It's like asking a receptionist for "John" but there's no receptionist.
# The building has John inside room 204 (user.py), but nobody at the
# front desk to point you there.
#
from models import User          # ❌ ImportError: cannot import name 'User'
#
# FIX with __init__.py:
# models/__init__.py → from .user import User
# Now Python knows: "User? Yeah, it's in user.py, here you go."

# ─── BREAK 2: `import models` → empty, useless module ───
#
# Without __init__.py, `import models` gives you a namespace package
# that's basically an empty shell. Nothing is loaded.
#
import models
print(dir(models))         # ['__loader__', '__name__', ...] ← no User!
models.User                # ❌ AttributeError: no attribute 'User'
models.user                # ❌ AttributeError: no attribute 'user'
#
# Python found the folder, but didn't load ANY .py files inside it.
# Just because files exist in a folder doesn't mean Python reads them.
# __init__.py is the instruction sheet: "when someone imports me,
# load these things and make them available."
#
# FIX with __init__.py:
# models/__init__.py → from .user import User
# Now: models.User works!

# ─── NOT A BREAK (common myth): relative imports ───
#
# Inside models/user.py, you want to import from product.py:
#
# models/user.py
from .product import Product   # ✅ works when loaded as models.user
#
# The dot (.) means "from my parent package". A namespace package
# still counts as a parent package, so the relative import works as
# long as the module is imported THROUGH the package
# (import models.user, from models.user import User).
#
# The error usually blamed on a missing __init__.py,
#   ImportError: attempted relative import with no known parent package
# comes from running the file directly (python models/user.py).
# Run that way, the file is a top-level script with no package at all,
# with or without __init__.py.
#
# FIX: run it as a module from the project root:
#   python -m models.user

# ── So WHY bother with __init__.py if direct imports work? ──
#
# 1. CLEAN IMPORTS: your users write less, know less about internals
#
#    Without:  from models.user import User           ← must know file name
#              from models.product import Product
#              from models.validators import validate ← if you move validate
#                                                        to another file,
#                                                        ALL imports break!
#
#    With:     from models import User, Product, validate  ← clean, stable
#              # You can move validate from validators.py to utils.py
#              # and NOTHING breaks, __init__.py absorbs the change.
#
# 2. A HOME FOR PACKAGE-LEVEL CODE: re-exports, __all__, __version__,
#    and one-time setup all live in __init__.py.
#    A namespace package has nowhere to put any of them.
#
# 3. TOOLS MAY STUMBLE: some test, type-check, and packaging setups assume it
#
#    pytest models/     ← test modules can be mis-resolved without __init__.py
#    mypy models/       ← may need extra namespace-package configuration
#    pip install .      ← setuptools' find_packages() skips folders
#                         that have no __init__.py
#
# 4. REFACTORING IS SAFE: move files around, __init__.py hides the change
#
#    # You split user.py into user_model.py + user_schema.py
#    # Update __init__.py once:
#    from .user_model import User
#    from .user_schema import UserSchema
#    # Every file that does `from models import User` still works!
#
# BOTTOM LINE:
# Without __init__.py = it works, but fragile: every import is a
#   hard-coded path to a specific file. Rename a file → imports break.
# With __init__.py = stable public API: internal file structure can
#   change freely. This is why most serious projects use them.
# ── Real project example ──
# myapp/
# ├── __init__.py          ← "from myapp import create_app"
# ├── models/
# │   ├── __init__.py      ← "from myapp.models import User"
# │   ├── user.py
# │   └── product.py
# ├── routes/
# │   ├── __init__.py      ← "from myapp.routes import auth_bp"
# │   ├── auth.py
# │   └── api.py
# └── utils/
#     ├── __init__.py      ← "from myapp.utils import hash_password"
#     └── security.py
#
# Every __init__.py re-exports just the public stuff.
# Users of your package never need to know your file structure.
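What a star import actually binds can be measured rather than guessed. This is a small sketch, assuming two hypothetical modules (star_plain and star_listed) written to a temporary directory; the helper star_names is also made up for this demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
body = (
    "import os\n"
    "def format_date():\n"
    "    return 'public'\n"
    "def _pad():\n"
    "    return 'private'\n"
)
(root / "star_plain.py").write_text(body)                          # no __all__
(root / "star_listed.py").write_text(body + "__all__ = ['format_date']\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

def star_names(module_name):
    """Return the names that `from module_name import *` would bind."""
    namespace = {}
    exec(f"from {module_name} import *", namespace)
    namespace.pop("__builtins__", None)   # added by exec, not by the import
    return sorted(namespace)

print(star_names("star_plain"))    # ['format_date', 'os']
print(star_names("star_listed"))   # ['format_date']

# An explicit import ignores __all__ and the underscore convention.
from star_listed import _pad
private_result = _pad()
print(private_result)              # private
```

Without __all__, the underscore-prefixed _pad is skipped but the imported os module leaks through; with __all__, only the listed name is exported.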

13 Comprehensions

What is it
Comprehensions are Python's declarative, expression-based way to build collections from iterables, combining mapping and filtering into a single construct. They come in four flavors: list [x*2 for x in nums], set {x for x in nums}, dict {k: v for k, v in pairs}, and generator expression (x*2 for x in nums). Comprehensions are more than syntactic sugar: CPython compiles them to specialized bytecode that skips the repeated .append attribute lookup and method call, so a list comprehension is usually measurably faster than the equivalent for loop with .append(). The exact gain depends on the Python version and on how much work each iteration does.
The four types
  • List: [x**2 for x in range(10)] — eager, returns a list.
  • Set: {x.lower() for x in words} — deduplicates automatically.
  • Dict: {k: v**2 for k, v in data.items()} — builds a mapping.
  • Generator expression: (x**2 for x in range(10)) is lazy and produces values on demand. The extra parentheses can be omitted when it is the sole argument of a call: sum(x**2 for x in nums).
  • Filtering: Append an if clause: [x for x in nums if x > 0].
  • Nested loops: [(x,y) for x in a for y in b] — equivalent to nested for.
  • Conditional expression: [x if x > 0 else 0 for x in nums] — applies ternary to each element.
How it differs
  • vs JavaScript: JS uses nums.map(x => x*2).filter(x => x > 0), a chain of functional methods that builds an intermediate array at each step. A Python comprehension expresses the map and the filter in one expression and one pass.
  • vs Java: Java Streams nums.stream().map(x -> x*2).collect(toList()) — far more verbose.
  • vs C#: C# LINQ from x in nums where x > 0 select x*2 is the closest analog among mainstream languages. Python's list comprehensions (Python 2.0, 2000) predate LINQ (C# 3.0, 2007).
  • vs Go: Go has no comprehensions at all — you write explicit loops. This is by design (simplicity over convenience).
  • vs Ruby: Ruby uses nums.map { |x| x*2 } — block-based. Similar expressiveness.
  • vs Haskell: Python borrowed comprehension syntax directly from Haskell's list comprehensions (which themselves came from mathematical set-builder notation).
Why use it
Comprehensions are concise, idiomatic, and fast. They signal intent ("I'm building a list by transforming this") more clearly than an imperative loop. Generator expressions let you stream through massive data without materializing a list, which is critical for memory efficiency. Pythonic code uses comprehensions liberally.
Common gotchas
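  • Overuse: Deeply nested or complex comprehensions become unreadable. Break one into a loop if you cannot read it at a glance.
  • Side effects: [print(x) for x in nums] works but builds a throwaway list of None values. Use a real loop.
  • Variable leaks: In Python 2, the loop variable of a list comprehension leaked into the enclosing scope. Python 3 gives every comprehension and generator expression its own scope, so nothing leaks. The one exception is a name bound with := inside a comprehension, which is deliberately assigned in the enclosing scope.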
  • Double evaluation: In [f(x) for x in data if f(x) > 0], f(x) is called twice. Use walrus: [y for x in data if (y := f(x)) > 0].
Real-world examples
Transforming API responses ([user["name"] for user in resp.json()]), building lookup dicts ({u.id: u for u in users}), filtering data ([r for r in rows if r.active]). pandas uses vectorized ops for speed, but for non-numeric data, comprehensions are the Pythonic default.
# ── List ──
squares = [x**2 for x in range(10)]
evens = [x for x in range(20) if x % 2 == 0]
labels = ["even" if x%2==0 else "odd" for x in range(5)]

# Nested — flatten matrix
matrix = [[1,2], [3,4], [5,6]]
flat = [n for row in matrix for n in row]
# [1, 2, 3, 4, 5, 6]

# ── Dict ──
word_lens = {w: len(w) for w in ["hello", "world"]}
swapped = {v: k for k, v in {"a": 1}.items()}

# ── Set ──
unique_lens = {len(w) for w in ["hi", "hey", "hello"]}  # {2, 3, 5}

# ── Generator expression (lazy — no memory for full list) ──
total = sum(x**2 for x in range(1_000_000))  # memory efficient!
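The gotchas listed above can be observed directly. The sketch below assumes a made-up function score that counts its own calls; it compares the double-evaluation pattern with the walrus version, then checks scoping and the memory footprint of a list versus a generator expression.

```python
import sys

calls = 0

def score(x):
    """Hypothetical 'expensive' function that counts how often it runs."""
    global calls
    calls += 1
    return x * x - 4

data = [1, 2, 3, 4]

# score() runs once in the filter and again for every element that passes.
calls = 0
twice = [score(x) for x in data if score(x) > 0]
calls_twice = calls

# The walrus operator keeps the result of the single call.
calls = 0
once = [y for x in data if (y := score(x)) > 0]
calls_once = calls

print(twice, calls_twice)   # [5, 12] 6
print(once, calls_once)     # [5, 12] 4
print(y)                    # 12  (a := target is bound in the enclosing scope)

# The loop variable itself does not leak in Python 3.
x = "outer"
_ = [x for x in range(3)]
print(x)                    # outer

# A list holds every element; a generator object stays small.
eager = [n * n for n in range(10_000)]
lazy = (n * n for n in range(10_000))
print(sys.getsizeof(eager) > sys.getsizeof(lazy))   # True
print(sum(eager) == sum(lazy))                      # True
```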

14 Iterators & Generators

What is it
An iterator in Python is any object implementing the iterator protocol, which consists of two dunder methods: __iter__() (returns self) and __next__() (returns the next value or raises StopIteration). Anything that Python's for loop can walk is an iterable, which is an object with __iter__() that returns an iterator. A generator is a special kind of iterator created with a function containing yield; Python automatically handles the protocol, state-saving, and resumption for you. Generators are the backbone of Python's lazy evaluation and streaming data processing.
Key features
  • Lazy evaluation: Values produced one at a time, on demand — perfect for infinite sequences or large datasets.
  • yield: Pauses execution, remembers state, resumes on next __next__() call.
  • yield from: Delegates to another iterable — yield from sub_gen().
  • Generator expressions: (x*2 for x in nums) — comprehension syntax for generators.
  • Two-way communication: gen.send(value) passes data back into the generator — enabled coroutines before async/await.
  • itertools module: Extensive library of iterator combinators — chain, zip_longest, groupby, takewhile, cycle, product, combinations.
How it differs
  • vs JavaScript: ES6 added generators with function* and yield — directly inspired by Python. JS also has iterators/iterables via Symbol.iterator.
  • vs Java: Java has Iterator<T> and Iterable<T> interfaces but no yield — you must manually implement state machines. Java 8+ Streams offer lazy evaluation.
  • vs C#: C# has yield return, nearly identical to Python. LINQ is heavily lazy.
  • vs Go: Go traditionally uses channels + goroutines for streaming. Go 1.23 added range-over-func iterators — very recent.
  • vs Rust: Rust's Iterator trait is extremely rich and zero-cost. No yield keyword (uses explicit state machines), though gen blocks are experimental.
  • vs C++: C++20 added coroutines with co_yield — similar idea but more complex API.
Why use it
Generators let you process infinite or gigabyte-scale streams with constant memory, for example reading a 100GB log file line by line. They decouple production from consumption (the producer doesn't care how much the consumer will read). They also paved the way for async/await: early asyncio coroutines were generators, and native coroutines reuse the same suspend-and-resume machinery.
Common gotchas
  • Single-use: After iteration, a generator is exhausted. gen = (...); list(gen); list(gen) — second list is empty.
  • No len(): Generators don't know their length ahead of time.
  • Forgetting to iterate: Calling a generator function returns a generator object, not a value — gen() alone does nothing.
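  • Iterator vs iterable: A generator object is an iterator, so it can be walked only once. If a class should be loopable many times, write its __iter__ as a generator method so that every for loop gets a fresh generator.
  • Mutating source during iteration: Changing the size of a dict or set while looping over it raises RuntimeError; a list does not raise but silently skips or repeats items.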
Real-world examples
Reading CSV files row-by-row (csv.reader), streaming database query results, Django's QuerySet iteration, HTTP streaming responses, log processors, infinite random number generators, tokenizers in NLP libraries. asyncio itself is built on generators (originally; now on native coroutines).

The Iterator Protocol

How for loops actually work under the hood — using __iter__() and __next__() to walk through items one by one.

# How `for x in something` ACTUALLY works:
nums = [1, 2, 3]
iterator = iter(nums)        # calls __iter__()
print(next(iterator))        # 1  — calls __next__()
print(next(iterator))        # 2
print(next(iterator))        # 3
# next(iterator)              # StopIteration!
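To make the protocol concrete, here is a hand-written sketch of an iterable and its iterator, plus a function that imitates what a for loop does. The names Countdown, CountdownIterator, and manual_for are illustrative only and are not part of any library.

```python
class Countdown:
    """Iterable: hands out a fresh iterator on every iter() call."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: remembers its position and raises StopIteration at the end."""
    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self              # iterators return themselves

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def manual_for(iterable):
    """Roughly what `for item in iterable` does behind the scenes."""
    collected = []
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break
        collected.append(item)
    return collected


three = Countdown(3)
print(manual_for(three))    # [3, 2, 1]
print(list(three))          # [3, 2, 1]  (iterable: can be walked again)

it = iter(three)
print(list(it), list(it))   # [3, 2, 1] []  (iterator: single pass)
```

Keeping the iterable and the iterator as separate classes is what allows the same Countdown object to be looped over more than once.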

Generators — The Easy Way

yield pauses the function, returns a value, and resumes on next call.

def countdown(n):
    while n > 0:
        yield n     # pause here, return n
        n -= 1

for i in countdown(5):
    print(i)  # 5, 4, 3, 2, 1

# WHY? Memory efficiency.
# This can yield billions of values without storing them all:
def infinite():
    n = 0
    while True:
        yield n
        n += 1

# Real-world: process huge files with constant memory
def read_large_file(path):
    with open(path) as f:
        for line in f:
            yield line.strip()

# ── Generator pipeline (data processing pattern) ──
def parse(lines):
    for line in lines:
        yield line.split(",")

def filter_valid(rows):
    for row in rows:
        if len(row) == 3:
            yield row

# Chain them — nothing runs until you iterate!
pipeline = filter_valid(parse(read_large_file("data.csv")))
for row in pipeline:
    print(row)

# ── yield from ──
def chain(*iterables):
    for it in iterables:
        yield from it

list(chain([1,2], [3,4]))  # [1, 2, 3, 4]
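A few behaviours from the feature and gotcha lists are easiest to see by running them. In this sketch, naturals, noisy, and running_average are hypothetical helpers written for the demonstration; itertools.islice is the standard-library way to take a finite slice of an infinite generator.

```python
from itertools import islice

def naturals():
    n = 0
    while True:
        yield n
        n += 1

# Take a finite slice of an infinite stream.
first_five = list(islice(naturals(), 5))
print(first_five)                  # [0, 1, 2, 3, 4]

# Calling a generator function runs none of its body yet.
events = []
def noisy():
    events.append("body started")
    yield 1

gen = noisy()
started_early = list(events)       # snapshot before the first next()
next(gen)
print(started_early, events)       # [] ['body started']

# Generators are single-use.
squares = (n * n for n in range(4))
first_pass = list(squares)
second_pass = list(squares)
print(first_pass, second_pass)     # [0, 1, 4, 9] []

# Two-way communication with send().
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average      # hands back the average, waits for a number
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)                          # prime: advance to the first yield
averages = [avg.send(v) for v in (10, 20, 30)]
print(averages)                    # [10.0, 15.0, 20.0]
```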

15 Decorators

What is it
A decorator is a callable that takes a function (or class) and returns a new callable, typically one that wraps the original with extra behavior. Python's @decorator syntax is purely sugar: placing @log on the line above def f(): ... is exactly equivalent to writing def f(): ... and then f = log(f). Decorators leverage Python's first-class functions and closures to implement cross-cutting concerns like logging, caching, authentication, rate-limiting, timing, and retries, all without modifying the decorated function's source.
Key features
  • Function decorators: Wrap a function to add behavior — @timer, @cache, @log_calls.
  • Class decorators: Modify a class — @dataclass, @total_ordering, @attr.s.
  • Decorators with arguments: @retry(times=3) — a "decorator factory" that returns the actual decorator.
  • functools.wraps: Preserves the wrapped function's name, docstring, and signature — always use this in your decorators.
  • Built-in decorators: @property, @classmethod, @staticmethod, @functools.lru_cache, @functools.cached_property, @dataclass.
  • Stacking: Multiple decorators apply bottom-up: @a stacked above @b on def f() means f = a(b(f)).
How it differs
  • vs Java: Java has annotations (@Override, @Deprecated) but they are metadata — behavior requires a framework like Spring AOP to act on them. Python decorators execute immediately at definition time.
  • vs C#: C# attributes are metadata-only, like Java. Behavior injection requires reflection or PostSharp.
  • vs JavaScript: JS decorators are a TC39 proposal that reached Stage 3 in 2022, and TypeScript 5.0 (2023) implements that version. TypeScript/Angular used an older experimental form for years. Less ergonomic than Python's.
  • vs Go: Go has NO decorators or AOP — you must manually wrap functions. This is a deliberate simplicity choice.
  • vs Rust: Rust has procedural macros (#[derive(Debug)]) which are compile-time. More powerful but vastly more complex.
Why use it
Decorators keep your core function logic clean while adding cross-cutting concerns declaratively. One @login_required line replaces 5 lines of auth-check boilerplate repeated across 50 view functions. They're also the mechanism behind Python's major web frameworks: the @app.route("/") pattern for Flask/FastAPI is just a decorator that registers the function with the router.
Common gotchas
  • Forgetting @functools.wraps: The wrapped function loses its __name__, __doc__, and signature — breaks introspection and Sphinx docs.
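  • Missing a nesting level: Decorators with arguments need three levels of nesting: factory → decorator → wrapper. Writing @retry without the call parentheses passes the function itself to the factory as its first argument.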
  • Mutable state in closures: A decorator that counts calls needs nonlocal or a mutable container.
  • Class vs instance methods: Applying a function decorator to a method and forgetting self breaks things.
  • Ordering matters: @staticmethod must be applied last (topmost).
Real-world examples
Flask/FastAPI: @app.get("/users/{id}"). pytest: @pytest.fixture, @pytest.mark.parametrize. Click: @click.command(), @click.option(...). Django: @login_required, @cache_page(60). Celery: @app.task. functools: @lru_cache(maxsize=128). Decorators are everywhere.

A decorator takes a function, wraps it, and returns the wrapped version. @ is just syntactic sugar.

import time
from functools import wraps

# ── Basic decorator ──
def timer(func):
    @wraps(func)   # preserves original name/docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter()-start:.4f}s")
        return result
    return wrapper

@timer
def slow():
    time.sleep(1)

slow()  # slow took 1.0012s

# @timer on slow is the same as: slow = timer(slow)

# ── Decorator WITH arguments ──
def retry(max_attempts=3):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    print(f"Attempt {attempt} failed: {e}")
                    if attempt == max_attempts: raise
        return wrapper
    return decorator

@retry(max_attempts=5)
def flaky_api():
    pass

# ── Built-in cache decorator ──
from functools import lru_cache

@lru_cache(maxsize=128)
def fib(n):
    if n < 2: return n
    return fib(n-1) + fib(n-2)

print(fib(100))  # instant!
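Stacking order and the effect of functools.wraps are easier to trust once seen. The sketch below uses invented decorators (tag, no_wraps, count_calls) purely for illustration.

```python
from functools import wraps

def tag(name):
    """Decorator factory: wraps the return value in <name>...</name>."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            return f"<{name}>{func(*args, **kwargs)}</{name}>"
        return wrapper
    return decorator

@tag("b")          # applied second (outermost)
@tag("i")          # applied first (innermost)
def greet(who):
    """Return a greeting."""
    return f"hi {who}"

print(greet("ana"))        # <b><i>hi ana</i></b>
print(greet.__name__)      # greet
print(greet.__doc__)       # Return a greeting.

def no_wraps(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@no_wraps
def plain():
    """Docstring that gets lost."""

print(plain.__name__, plain.__doc__)   # wrapper None

def count_calls(func):
    calls = 0
    @wraps(func)
    def wrapper(*args, **kwargs):
        nonlocal calls          # without this, calls += 1 raises UnboundLocalError
        calls += 1
        wrapper.calls = calls   # expose the count as an attribute
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls
def ping():
    return "pong"

ping()
ping()
print(ping.calls)          # 2
```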

16 Closures & Scope

What is it
A closure is a function that captures variables from its enclosing lexical scope; these captured variables remain alive even after the enclosing function returns. Python resolves names using the LEGB rule: Local → Enclosing → Global → Built-in. To modify (not just read) a variable in an enclosing scope, you must use the nonlocal keyword (3.0+). To modify a module-level variable from inside a function, use global. Closures are the mechanism that makes decorators, factories, and many functional patterns work.
The LEGB lookup order
  • Local: Names defined inside the current function.
  • Enclosing: Names in any enclosing function's scope (for nested functions).
  • Global: Names at the module level.
  • Built-in: Names in the built-in namespace (len, print, range, etc.).
  • nonlocal: Rebinds a name to the nearest enclosing scope.
  • global: Rebinds a name to the module scope.
  • __closure__: A tuple of cell objects holding captured variables — you can inspect what a closure captured.
How it differs
  • vs JavaScript: JS closures are very similar — variables from outer scope are captured. JS has no nonlocal equivalent; you can rebind directly (no implicit shadowing on assignment).
  • vs Java: Java lambdas can only capture effectively final variables — you can't mutate them. Python's nonlocal bypasses this.
  • vs C++: C++ lambdas explicitly declare what they capture with [=] (by value), [&] (by reference), or named captures.
  • vs Go: Go closures capture by reference automatically. Unlike Python, you can modify captured variables without a special keyword.
  • vs Rust: Rust closures are strictly typed (Fn, FnMut, FnOnce) and use the borrow checker to ensure safety.
Why use it
Closures enable function factories (functions that build customized functions), decorators (which capture the wrapped function), stateful callbacks without classes, and data hiding (the closed-over variables are inaccessible from outside). They're also the backbone of partial application and currying in functional Python.
Common gotchas
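  • Late binding in loops: funcs = [lambda: i for i in range(3)] creates three lambdas that all return 2, because each one looks up i when it is called, not when it is defined. Fix with lambda i=i: i (default arg trick) or a factory function that takes i as a parameter.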
  • Assignment creates local: def f(): x = 10 — even if x exists globally, this creates a local. Use global x or nonlocal x.
  • UnboundLocalError: Reading then writing a variable in a function marks it local for the whole function, so the initial read fails.
  • Forgetting nonlocal: Decorator counters (count += 1) break without it.
  • Mutable captures: Captured lists/dicts can be modified in place without nonlocal, which can surprise.
Real-world examples
Almost every function-based decorator is a closure. Flask's @app.route captures app via closure. functools.partial achieves the same effect with a callable object instead of a nested function. Event callbacks in Tkinter, stateful pytest fixtures, and memoization patterns all use closures. Configuration-bound loggers (logger = get_logger("myapp")) often capture config via closure.

LEGB Rule

The order Python searches for variable names — Local, Enclosing, Global, then Built-in.

# Python looks up variables in this order:
# L — Local (inside current function)
# E — Enclosing (in outer function, for nested)
# G — Global (module level)
# B — Built-in (print, len, etc.)

x = "global"
def outer():
    x = "enclosing"
    def inner():
        x = "local"
        print(x)    # "local" (delete this x and inner() prints "enclosing")
    inner()

outer()             # prints: local

# global / nonlocal keywords
count = 0
def inc():
    global count      # rebind the module-level count
    count += 1

inc()
print(count)          # 1

def outer():
    x = 0
    def inner():
        nonlocal x    # rebind the enclosing scope's x
        x += 1
    inner()
    print(x)         # 1

outer()              # prints: 1
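The UnboundLocalError gotcha follows directly from these rules. This is a minimal sketch with made-up function names (broken_increment, working_increment, shadow_demo) showing how it arises, together with a local name shadowing a built-in.

```python
counter = 0

def broken_increment():
    # The assignment makes `counter` local for the WHOLE function,
    # so the read on the right-hand side finds no value yet.
    counter = counter + 1
    return counter

def working_increment():
    global counter              # rebind the module-level name instead
    counter = counter + 1
    return counter

try:
    broken_increment()
except UnboundLocalError as exc:
    error_name = type(exc).__name__

print(error_name)               # UnboundLocalError
print(working_increment())      # 1

# Built-ins are searched last, so a closer scope can shadow them.
def shadow_demo():
    len = lambda _: "shadowed"  # local name hides the built-in inside this function
    return len([1, 2, 3])

print(shadow_demo())            # shadowed
print(len([1, 2, 3]))           # 3  (the built-in is untouched elsewhere)
```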

Closures

A function that remembers variables from the scope where it was created, even after that scope is gone.

# A function that "closes over" variables from its enclosing scope
def make_multiplier(factor):
    def multiply(x):
        return x * factor   # factor is remembered!
    return multiply

double = make_multiplier(2)
triple = make_multiplier(3)
print(double(5))   # 10
print(triple(5))   # 15

# Real-world: config factories
def make_logger(prefix):
    def log(msg):
        print(f"[{prefix}] {msg}")
    return log

error = make_logger("ERROR")
error("Something broke")  # [ERROR] Something broke
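The late-binding gotcha and the __closure__ attribute mentioned earlier can both be inspected at runtime. The helper names below (make_const, make_counter) are hypothetical and exist only for this sketch.

```python
# Late binding: every lambda looks up i when CALLED, after the loop finished.
late = [lambda: i for i in range(3)]
late_results = [f() for f in late]
print(late_results)               # [2, 2, 2]

# Fix 1: freeze the current value as a default argument.
bound = [lambda i=i: i for i in range(3)]
bound_results = [f() for f in bound]
print(bound_results)              # [0, 1, 2]

# Fix 2: a factory function gives each lambda its own enclosing scope.
def make_const(value):
    return lambda: value

factory_results = [f() for f in [make_const(i) for i in range(3)]]
print(factory_results)            # [0, 1, 2]

# Inspecting what a closure captured.
def make_counter():
    count = 0
    def increment():
        nonlocal count
        count += 1
        return count
    return increment

tick = make_counter()
tick()
tick()
print(tick.__code__.co_freevars)            # ('count',)
print(tick.__closure__[0].cell_contents)    # 2
```

Each call to make_counter creates a separate cell, so two counters built this way do not share state.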

17 Context Managers

What is it
A context manager is an object that implements the __enter__ and __exit__ dunder methods, enabling it to be used with Python's with statement. The __enter__ method runs when the block starts, and __exit__ runs when the block ends, guaranteed, even if an exception is raised inside. It's Python's answer to RAII (Resource Acquisition Is Initialization) from C++: acquire the resource on entry, release it on exit. The contextlib module provides helpers, most importantly @contextmanager, which lets you build context managers from generator functions.
Key features
  • Class-based: Implement __enter__(self) and __exit__(self, exc_type, exc_val, tb).
  • Generator-based: @contextlib.contextmanager on a generator with one yield — setup before yield, cleanup after.
  • Multiple managers: with open(a) as f, open(b) as g: — nested but flat syntax.
  • Suppressing exceptions: __exit__ can return True to suppress the exception (rarely recommended).
  • contextlib.suppress: Context manager that swallows specific exceptions — replaces try/except/pass.
  • contextlib.ExitStack: Dynamically push and pop context managers at runtime.
  • Async variants: async with, __aenter__, __aexit__ for asyncio.
How it differs
  • vs Java: Java 7 added try-with-resources try (Resource r = ...). Very similar but Java requires implementing AutoCloseable.
  • vs C#: C# has using (var x = ...) — same concept, implementing IDisposable.
  • vs C++: C++ RAII uses stack-allocated objects whose destructors run on scope exit. More automatic than Python but tied to object lifetime.
  • vs Go: Go uses defer f.Close() — runs on function return. Simpler but less lexically scoped.
  • vs JavaScript: JS has no built-in context manager construct. You use try/finally manually, or the Explicit Resource Management proposal (using).
  • vs Rust: Rust's Drop trait auto-runs cleanup when an object goes out of scope — no explicit with needed.
Why use it
Context managers guarantee cleanup regardless of success or failure, preventing resource leaks (file handles, locks, database connections, network sockets). They make error-safe code look like normal code, with no explicit try/finally cruft. The with block clearly delimits the scope of a resource.
Common gotchas
  • Forgetting with: f = open("x"); f.read() leaks the file until GC.
  • Returning from inside with: Cleanup still happens — __exit__ runs before the value is returned.
  • Exceptions in __enter__: __exit__ does NOT run if __enter__ itself raises.
  • Generator context managers with multiple yields: Raises RuntimeError — only one yield allowed.
  • Swallowing exceptions silently: Accidentally returning truthy from __exit__.
Real-world examples
File I/O (with open(...)), database transactions (with conn.transaction()), thread locks (with lock:), tempfile management (with tempfile.TemporaryDirectory()), redirecting stdout (with contextlib.redirect_stdout()), timing blocks (custom with timer():), pytest's pytest.raises, SQLAlchemy's session.begin().
# ── Class-based ──
import time
class Timer:
    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        print(f"Elapsed: {time.perf_counter()-self.start:.4f}s")
        return False   # don't suppress exceptions

with Timer():
    time.sleep(1)

# ── Generator-based (simpler) ──
from contextlib import contextmanager

@contextmanager
def timer(label):
    start = time.perf_counter()
    try:
        yield      # `with` block runs here
    finally:
        print(f"{label}: {time.perf_counter()-start:.4f}s")

with timer("query"):
    time.sleep(0.5)
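The contextlib helpers listed under key features can be sketched in a few lines. Here resource is a hypothetical stand-in for a real resource such as a file or connection; it only records when it is opened and closed.

```python
import os
import tempfile
from contextlib import ExitStack, contextmanager, suppress

events = []

@contextmanager
def resource(name):
    events.append(f"open {name}")
    try:
        yield name
    finally:
        events.append(f"close {name}")   # runs on success AND on error

# Cleanup runs even when the block raises.
try:
    with resource("db"):
        raise ValueError("boom")
except ValueError:
    events.append("error handled")

# ExitStack: enter a number of managers decided at runtime.
# They are closed in reverse order when the stack exits.
with ExitStack() as stack:
    names = [stack.enter_context(resource(n)) for n in ("a", "b")]

print(names)     # ['a', 'b']
print(events)
# ['open db', 'close db', 'error handled',
#  'open a', 'open b', 'close b', 'close a']

# suppress: replaces try / except FileNotFoundError / pass.
missing = os.path.join(tempfile.mkdtemp(), "missing.txt")
with suppress(FileNotFoundError):
    os.remove(missing)                   # the file does not exist; no exception escapes
still_running = True
```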

18 *args & **kwargs

What is it
*args and **kwargs are Python's mechanism for writing functions that accept a variable number of arguments. *args captures extra positional arguments as a tuple, while **kwargs captures extra keyword arguments as a dict. The names args and kwargs are just convention; the * and ** are the syntax. The same operators also work at the call site for argument unpacking: f(*my_list, **my_dict) expands collections into positional/keyword arguments.
Key uses
  • Accepting arbitrary args: def printer(*args): for a in args: print(a)
  • Accepting arbitrary kwargs: def __init__(self, **kwargs): self.__dict__.update(kwargs)
  • Unpacking: pairs = [(1,2),(3,4)]; list(zip(*pairs)) transposes to [(1,3),(2,4)].
  • Forwarding args: def wrapper(*args, **kwargs): return original(*args, **kwargs) — the decorator pattern.
  • Mixed: def f(a, b, *args, x=1, **kwargs): — positional, then *args, then keyword-only, then **kwargs.
  • Bare *: def f(a, *, b): — forces b to be keyword-only.
  • Positional-only: def f(a, /, b): — forces a to be positional-only (3.8+).
How it differs
  • vs JavaScript: JS rest params function f(...args) collect extras as an array. Spread f(...arr) unpacks. JS has no keyword arguments — object destructuring function f({a, b}) is the workaround.
  • vs Java: Java varargs void f(String... names) accept variable positional args. No keyword args at all — must use a builder or a Map.
  • vs C#: C# has params string[] args and named arguments f(name: "x"), closer to Python.
  • vs Go: Go has variadic args func f(nums ...int) and spread f(slice...). No keyword args.
  • vs C/C++: C uses stdarg.h with va_list — very manual and type-unsafe. C++ has variadic templates, much more powerful but harder.
Why use it
Essential for writing flexible APIs, decorators, wrappers, and delegators. Any decorator that doesn't know the signature of the wrapped function needs *args, **kwargs to forward all calls transparently. Framework code (like print, str.format, or logging.info) uses them to accept arbitrary inputs.
Common gotchas
  • Order matters: def f(*args, x, **kwargs) — after *args, all named params become keyword-only.
  • Key collisions in **kwargs: f(x=1, **{"x": 2}) raises TypeError: multiple values for keyword argument 'x'.
  • Mutable kwargs: Mutating kwargs inside the function doesn't affect the caller — it's a new dict.
  • Unpacking a set: f(*my_set) works but order is undefined.
  • Type-hinting varargs: def f(*args: int) means each arg is an int; inside the function, args is a tuple[int, ...], not a list[int].
Real-world examplesMost general-purpose decorators forward calls with def wrapper(*args, **kwargs): return func(*args, **kwargs). print(*objects, sep=', ') accepts any number of positional values, dict(**kwargs) builds a dict from keyword arguments, and Flask's app.route(rule, **options) passes extra options through. The pattern super().__init__(*args, **kwargs) is ubiquitous in class hierarchies.
# *args  → extra POSITIONAL args as a tuple
# **kwargs → extra KEYWORD args as a dict

def add_all(*args):
    return sum(args)       # args is a tuple
print(add_all(1,2,3,4))  # 10

def profile(**kwargs):
    return kwargs          # kwargs is a dict
profile(name="Yatin", age=25)  # {'name':'Yatin','age':25}

# Combining — ORDER MATTERS
def full(required, default="hi", *args, **kwargs):
    print(required, default, args, kwargs)

# ── Unpacking ──
def add(a, b, c): return a + b + c
nums = [1, 2, 3]
print(add(*nums))                # 6 — unpack list
print(add(**{"a":1,"b":2,"c":3}))  # 6 — unpack dict

# ── Keyword-only (after *) ──
def connect(host, port, *, timeout=30, ssl=True):
    pass   # timeout & ssl MUST be keyword arguments

# ── Positional-only (before /) — 3.8+ ──
def greet(name, /, greeting="Hello"):
    pass   # name MUST be positional
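
The forwarding pattern mentioned above can be sketched as a small decorator. This is a minimal illustration, not a production logger; log_calls and area are hypothetical names introduced only for this example.

```python
import functools

def log_calls(func):
    """Record every call, then forward all arguments unchanged."""
    @functools.wraps(func)              # keep func's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)    # forward positional and keyword args
    wrapper.calls = []
    return wrapper

@log_calls
def area(width, height=1, *, scale=1):  # hypothetical function to wrap
    return width * height * scale

result = area(2, 3, scale=10)
print(result)          # 60
print(area.calls)      # [((2, 3), {'scale': 10})]

# A keyword given twice is rejected before the function body ever runs
try:
    area(2, height=3, **{"height": 4})
except TypeError as exc:
    print("TypeError:", exc)
```

Because wrapper accepts *args and **kwargs, it works for any signature without knowing it in advance.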

19 Type Hints

What is itType hints (introduced by PEP 484 in Python 3.5) are optional annotations on variables, parameters, and return values that describe the expected types. CPython stores them (for example in __annotations__) but does not enforce them at runtime. They exist for static type checkers like mypy, pyright, pyre, and pytype, as well as for IDE autocomplete, documentation, and runtime frameworks like Pydantic and FastAPI that read them at runtime (for example via typing.get_type_hints()). Modern Python supports lowercase built-in generics (list[int], dict[str, int]) from 3.9 and the | operator for unions from 3.10, both without importing typing.
Key features
  • Basic: def add(x: int, y: int) -> int:
  • Collections: list[int], dict[str, float], tuple[int, ...], set[str].
  • Unions: int | str (3.10+) or Union[int, str].
  • Optional: str | None (shortcut for Optional[str]).
  • Generics: def first[T](items: list[T]) -> T: (3.12+ native syntax) or TypeVar.
  • Callable: Callable[[int, int], bool] for a function taking two ints, returning bool.
  • Protocol: Structural typing — "any class with a .read() method".
  • TypedDict: Typed dict-like records.
  • Literal: Literal["GET", "POST"] restricts to specific values.
  • Self type: def clone(self) -> Self: (3.11+).
How it differs
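  • vs TypeScript: Conceptually similar: both are gradual type systems bolted onto a dynamic language. TypeScript is structural throughout, while Python is mostly nominal and gets structural typing through Protocol. TS also runs its checker in the same toolchain as the compiler; Python requires a separate tool.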
  • vs Java/C#: Java/C# types are enforced by the compiler and the runtime. Python types are advisory: f("hello") when f(x: int) expects int will run fine at runtime, only flagged by mypy.
  • vs Go: Go is statically typed with compile-time enforcement. Python's type hints are closer to documentation than enforcement.
  • vs Rust: Rust types are strongly enforced with exhaustive pattern matching. Python types can be gradually adopted file-by-file.
  • vs Ruby: Ruby has RBS and Sorbet — similar optional typing retrofits.
Why use itType hints catch many bugs before the code runs, dramatically improve IDE autocomplete and refactoring, and serve as inline documentation that a type checker keeps from drifting out of date. In large codebases, mypy --strict provides confidence approaching Java-level safety. Frameworks like FastAPI and Pydantic leverage hints at runtime for auto-validation, serialization, and API doc generation.
Common gotchas
  • Forward references: Types that don't exist yet need string literals: def f(x: "SomeClass"): — or use from __future__ import annotations.
  • Not runtime-enforced: Passing wrong types doesn't raise — only mypy warns.
  • Any escape hatch: Any disables all checking; overuse defeats the purpose.
  • Optional vs default: x: int = None is wrong — must be x: int | None = None.
  • Mutable defaults still bite: Types don't fix the def f(x: list = []) sharing bug.
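The "not runtime-enforced" gotcha is easy to see in a few lines. The sketch below assumes nothing beyond the standard library; double and checked_double are hypothetical names, and the manual isinstance check only hints at what validation frameworks automate.

```python
from typing import get_type_hints

def double(x: int) -> int:
    return x * 2

# No runtime check: a str slips through, and str * 2 happens to work
print(double("ab"))                 # abab

# The annotations are still stored and can be read back
hints = get_type_hints(double)
print(hints)                        # {'x': <class 'int'>, 'return': <class 'int'>}

def checked_double(x):
    """Hypothetical manual check built on the stored hints."""
    expected = hints["x"]
    if not isinstance(x, expected):
        raise TypeError(f"x must be {expected.__name__}, got {type(x).__name__}")
    return double(x)

print(checked_double(21))           # 42
```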
Real-world examplesFastAPI reads type hints to auto-generate OpenAPI schemas and validate incoming requests. Pydantic v2 (used by FastAPI) is entirely hint-driven. SQLAlchemy 2.0 uses typed mapped attributes. Django is typed through the separate django-stubs package. Many modern Python libraries (httpx, structlog, typer) ship with full type annotations, so type hints are now table stakes.

Type hints don't affect runtime. They exist for docs, IDE support, and tools like mypy.

def greet(name: str, times: int = 1) -> str:
    return (name + "! ") * times

# Collections
def process(
    items: list[int],
    mapping: dict[str, float],
    coords: tuple[float, float],
) -> None: ...

# Optional / Union (3.10+)
def find(id: int) -> str | None: ...
def parse(data: int | str) -> str: ...

# Callable
from typing import Callable
def apply(f: Callable[[int, int], int], a: int, b: int) -> int:
    return f(a, b)

# TypeAlias
Vector = list[float]
Matrix = list[Vector]
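
Protocol, TypedDict, and Literal appear in the feature list but not in the code above. A minimal sketch follows; Readable, Movie, StringSource, shout, and describe are hypothetical names chosen for illustration.

```python
from typing import Literal, Protocol, TypedDict

class Readable(Protocol):           # structural: anything with .read() -> str
    def read(self) -> str: ...

class Movie(TypedDict):             # a dict with a fixed, typed set of keys
    title: str
    year: int

Method = Literal["GET", "POST"]     # only these two strings are accepted

class StringSource:                 # never inherits from Readable
    def __init__(self, text: str) -> None:
        self.text = text

    def read(self) -> str:
        return self.text

def shout(source: Readable) -> str:
    return source.read().upper()

def describe(movie: Movie, method: Method = "GET") -> str:
    return f"{method} {movie['title']} ({movie['year']})"

print(shout(StringSource("hello")))                  # HELLO
print(describe({"title": "Alien", "year": 1979}))    # GET Alien (1979)
```

A type checker accepts StringSource wherever Readable is expected because the method shapes match; at runtime none of these annotations are enforced.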

20 Data Structures from Scratch

What is itThis section builds classic algorithms-and-data-structures (DSA) primitives — linked lists, stacks, queues, trees, graphs, hash tables, heaps — from scratch in Python, using only the basic language features (classes, references, recursion). This is not how you'd build production code (use collections.deque, heapq, or a library), but it's essential for interview preparation, understanding the costs hidden behind Python's built-ins, and building a mental model of how references and objects really work. It also teaches you when to reach for a specialized structure over a list.
Structures typically covered
  • Linked List: Singly, doubly, circular — O(1) insertion/removal at known positions, O(n) lookup.
  • Stack: LIFO — Python lists already do this with .append() / .pop().
  • Queue: FIFO — collections.deque is O(1) at both ends, unlike list's O(n) .pop(0).
  • Binary Tree / BST: Hierarchical structure for sorted data; average O(log n) ops, worst O(n) if unbalanced.
  • Heap: Priority queue — Python's heapq provides min-heap as function calls on a list.
  • Hash Table: The backbone of Python's own dict and set.
  • Graph: Adjacency list (dict of lists) or matrix representation.
  • Trie: Prefix tree for string lookup.
How it differs from other languages
  • vs C/C++: No manual memory management: Node() allocates the object and the garbage collector reclaims it. No pointer arithmetic. But pure-Python code is often one to two orders of magnitude slower than C.
  • vs Java: No generics boilerplate — just use duck typing. Python's built-in dynamic containers make DSA code shorter.
  • vs JavaScript: Very similar — both use class-based implementations with object references. Python's integer arithmetic is arbitrary-precision, JS's is IEEE 754.
  • vs Go: Go had no generics until 1.18 and has no inheritance, so you'd use interfaces + structs. Verbose but faster.
  • vs Rust: Linked lists in Rust are infamously hard because of the borrow checker — you need Box, Rc<RefCell>, or unsafe. Python makes them trivial.
Why implement from scratchUnderstanding DSA internals is essential for LeetCode-style interviews (FAANG+, HFT, top startups), for making time/space complexity trade-offs, and for reasoning about performance. Most Python coders reach for lists and dicts by default, but knowing when a deque, heap, or set dramatically speeds things up is a senior-level skill.
Common gotchas
  • Recursion depth: Python's default recursion limit is ~1000 — recursive tree traversals can hit it on deep trees.
  • List .pop(0): O(n), not O(1) — use deque for FIFO.
  • Dict / set ordering: Dict is ordered (3.7+), set is not.
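  • Hashability: Objects hash by identity by default. A class that defines __eq__ must also define a consistent __hash__, otherwise its instances become unhashable and can't go into sets or dict keys.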
  • Heap is a min-heap: For max-heap, negate values.
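The min-heap gotcha and the negation trick can be sketched with the standard heapq module; the scores and task names below are made-up sample data.

```python
import heapq

scores = [5, 1, 8, 3]

min_heap = list(scores)
heapq.heapify(min_heap)                  # in place, O(n)
smallest = heapq.heappop(min_heap)
print(smallest)                          # 1

max_heap = [-s for s in scores]          # negate values to simulate a max-heap
heapq.heapify(max_heap)
largest = -heapq.heappop(max_heap)
print(largest)                           # 8

# Priority queue of (priority, task) tuples: lowest priority number pops first
tasks = []
heapq.heappush(tasks, (2, "write tests"))
heapq.heappush(tasks, (1, "fix bug"))
first_task = heapq.heappop(tasks)
print(first_task)                        # (1, 'fix bug')
```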
Real-world relevanceEven though you won't hand-write a linked list in production, the concepts show up: LRU caches (doubly-linked list + dict), Dijkstra's algorithm (heap-backed priority queue in routing), Trie (autocomplete/spellcheck), BST (database indices, though B-trees dominate). Interview questions at Google, Meta, Amazon routinely test these.
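
As a rough sketch of the LRU idea mentioned above: collections.OrderedDict already combines a dict with a doubly-linked ordering, so a small cache can lean on it instead of hand-written nodes. LRUCache and its method names are hypothetical, and this is one possible design rather than how functools.lru_cache is implemented.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")            # "a" is now the most recently used
cache.put("c", 3)         # evicts "b"
print(cache.get("b"))     # None
print(cache.get("a"))     # 1
```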

Building classic data structures in Python teaches you OOP, references, recursion, and how the language actually works under the hood.

Linked List

A chain of nodes where each one points to the next. Good for fast insertions/deletions, but slow random access.

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None       # pointer to next node

class LinkedList:
    def __init__(self):
        self.head = None

    def append(self, data):
        new_node = Node(data)
        if not self.head:           # empty list
            self.head = new_node
            return
        current = self.head
        while current.next:          # walk to the end
            current = current.next
        current.next = new_node      # link it

    def prepend(self, data):
        new_node = Node(data)
        new_node.next = self.head    # point to old head
        self.head = new_node         # become the new head

    def delete(self, data):
        if not self.head:
            return
        if self.head.data == data:
            self.head = self.head.next
            return
        current = self.head
        while current.next:
            if current.next.data == data:
                current.next = current.next.next  # skip over it
                return
            current = current.next

    def __iter__(self):             # make it iterable!
        current = self.head
        while current:
            yield current.data
            current = current.next

    def __repr__(self):
        return " → ".join(str(x) for x in self) + " → None"

# Usage
ll = LinkedList()
ll.append(1)
ll.append(2)
ll.append(3)
ll.prepend(0)
print(ll)           # 0 → 1 → 2 → 3 → None
ll.delete(2)
print(ll)           # 0 → 1 → 3 → None

# Iterate like any Python collection
for val in ll:
    print(val)       # 0, 1, 3

Stack (LIFO)

Last In, First Out — like a stack of plates. The last item you add is the first one you remove.

class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if self.is_empty():
            raise IndexError("Stack is empty")
        return self._items.pop()

    def peek(self):
        if self.is_empty():
            raise IndexError("Stack is empty")
        return self._items[-1]

    def is_empty(self):
        return len(self._items) == 0

    def __len__(self):
        return len(self._items)

    def __repr__(self):
        return f"Stack({self._items})"

s = Stack()
s.push(1); s.push(2); s.push(3)
print(s.pop())   # 3 (last in, first out)
print(s.peek())  # 2

Queue (FIFO)

First In, First Out — like a real queue. The first item added is the first one processed.

from collections import deque

class Queue:
    def __init__(self):
        self._items = deque()   # deque is O(1) for both ends

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        return self._items.popleft()

    def __len__(self):
        return len(self._items)

    def __repr__(self):
        return f"Queue({list(self._items)})"

q = Queue()
q.enqueue("first"); q.enqueue("second")
print(q.dequeue())  # "first" (first in, first out)

Binary Tree

A hierarchical structure where each node has at most two children. A BST keeps values sorted for fast search.

class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

class BinarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, val):
        if not self.root:
            self.root = TreeNode(val)
            return
        self._insert(self.root, val)

    def _insert(self, node, val):
        if val < node.val:
            if node.left:
                self._insert(node.left, val)
            else:
                node.left = TreeNode(val)
        else:
            if node.right:
                self._insert(node.right, val)
            else:
                node.right = TreeNode(val)

    def search(self, val):
        return self._search(self.root, val)

    def _search(self, node, val):
        if not node:
            return False
        if val == node.val:
            return True
        if val < node.val:
            return self._search(node.left, val)
        return self._search(node.right, val)

    def inorder(self):
        """In-order traversal: left → root → right (sorted order!)"""
        return self._inorder(self.root)

    def _inorder(self, node):
        # Private helper, matching the _insert/_search style above
        if not node:
            return []
        return self._inorder(node.left) + [node.val] + self._inorder(node.right)

# Usage
bst = BinarySearchTree()
for val in [5, 3, 7, 1, 4, 6, 8]:
    bst.insert(val)

print(bst.inorder())     # [1, 3, 4, 5, 6, 7, 8] — sorted!
print(bst.search(4))     # True
print(bst.search(99))    # False

# The tree looks like:
#        5
#       / \
#      3   7
#     / \ / \
#    1  4 6  8
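
Recursive traversal can hit Python's recursion limit on a deep, unbalanced tree. One common workaround, sketched here with its own copy of TreeNode so the block is self-contained, is to keep an explicit stack; inorder_iterative is a hypothetical helper name.

```python
class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def inorder_iterative(root):
    """In-order traversal with an explicit stack instead of recursion."""
    result, stack, node = [], [], root
    while stack or node:
        while node:                # go as far left as possible
            stack.append(node)
            node = node.left
        node = stack.pop()         # visit the node
        result.append(node.val)
        node = node.right          # then walk its right subtree
    return result

# Degenerate tree: 5000 nodes chained to the right,
# deeper than the default recursion limit of about 1000
root = TreeNode(0)
tail = root
for i in range(1, 5000):
    tail.right = TreeNode(i)
    tail = tail.right

print(len(inorder_iterative(root)))   # 5000
```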

Hash Map (Dictionary internals)

The idea behind Python's dict: hash each key to a bucket index for O(1) average lookup. This version resolves collisions by chaining; CPython's real dict uses open addressing.

class HashMap:
    """Simplified hash map using separate chaining (CPython's dict uses open addressing)."""
    def __init__(self, size=16):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _hash(self, key):
        return hash(key) % self.size   # map key to bucket index

    def __setitem__(self, key, value):   # hm[key] = value
        idx = self._hash(key)
        for i, (k, v) in enumerate(self.buckets[idx]):
            if k == key:
                self.buckets[idx][i] = (key, value)
                return
        self.buckets[idx].append((key, value))

    def __getitem__(self, key):          # hm[key]
        idx = self._hash(key)
        for k, v in self.buckets[idx]:
            if k == key:
                return v
        raise KeyError(key)

hm = HashMap()
hm["name"] = "Yatin"
print(hm["name"])    # Yatin
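
The HashMap above keeps colliding keys in per-bucket lists. CPython's dict instead stores entries directly in the table and probes for another slot on collision. The toy below illustrates that idea with plain linear probing; it is an assumption-laden sketch (ProbingMap is a hypothetical name, there is no deletion or resizing, and CPython's actual probe sequence is more elaborate).

```python
class ProbingMap:
    """Toy open-addressing table with linear probing (no deletion, no resizing)."""
    _EMPTY = object()

    def __init__(self, size=8):
        self.size = size
        self.slots = [self._EMPTY] * size   # each slot: _EMPTY or a (key, value) pair

    def _probe(self, key):
        idx = hash(key) % self.size
        for _ in range(self.size):
            slot = self.slots[idx]
            if slot is self._EMPTY or slot[0] == key:
                return idx                   # free slot, or the key's own slot
            idx = (idx + 1) % self.size      # collision: try the next slot
        raise RuntimeError("table is full")

    def __setitem__(self, key, value):
        self.slots[self._probe(key)] = (key, value)

    def __getitem__(self, key):
        slot = self.slots[self._probe(key)]
        if slot is self._EMPTY:
            raise KeyError(key)
        return slot[1]

pm = ProbingMap()
pm[1] = "one"
pm[9] = "nine"        # hash(9) % 8 == 1, so this collides with key 1
print(pm[1], pm[9])   # one nine
```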

21 Concurrency & The GIL

What is itPython offers three distinct concurrency models: threading (threading module — OS threads, limited by the GIL for CPU work), multiprocessing (multiprocessing — separate processes with their own memory, bypasses the GIL, gives true parallelism), and asyncio (asyncio, async/await — single-threaded cooperative scheduling for I/O-bound code). The GIL (Global Interpreter Lock) is a CPython-specific mutex that serializes bytecode execution — it's the most famous performance limitation of Python and the reason threading can't speed up CPU-bound code. Python 3.13 added an experimental no-GIL / free-threading build; 3.14+ continues to mature it.
The three models
  • threading: OS-level threads sharing memory. Good for I/O-bound work (network, disk). The GIL is released during blocking I/O calls. Use for: web scraping, API calls, parallel HTTP.
  • multiprocessing: Separate Python processes, each with their own GIL. True parallel CPU execution. Communication via pipes/queues. Use for: numerical computation, image processing, ML preprocessing.
  • asyncio: Single-threaded event loop. async def functions yield at await points. Ultra-cheap "coroutines" — millions possible. Use for: high-concurrency servers, streaming, WebSockets.
  • concurrent.futures: High-level API unifying thread and process pools via ThreadPoolExecutor and ProcessPoolExecutor.
How it differs
  • vs Go: Go has goroutines (lightweight M:N scheduled) and channels built into the language — no GIL, true parallelism by default. Python's closest equivalent is multiprocessing, which is much heavier.
  • vs Java: Java has real threads with no GIL — ExecutorService, ForkJoinPool, and virtual threads (Project Loom, Java 21) give Python-like ergonomics with true parallelism.
  • vs JavaScript: JS is single-threaded with an event loop — very similar to Python's asyncio. Node added Worker Threads for parallelism. JS never had Python's GIL pain because it never promised threading.
  • vs Rust: Rust has fearless concurrency — the borrow checker prevents data races at compile time. No GIL.
  • vs Ruby: Ruby MRI has a GVL (global VM lock), very similar to Python's GIL. Ractor (Ruby 3) offers true parallelism.
Why the GIL existsThe GIL simplifies CPython's reference-counting memory management and makes writing C extensions easier (no need for per-object locks). Removing it risks breaking C extensions that rely on it for thread safety. Efforts to remove it date back to the 1990s; PEP 703, "Making the Global Interpreter Lock Optional in CPython", is the first serious path forward, landing as an opt-in experimental build in 3.13.
Common gotchas
  • Using threads for CPU work: No speedup — the GIL serializes. Use multiprocessing.
  • Mixing sync and async: Calling a blocking function inside an async coroutine blocks the entire event loop.
  • Race conditions: Even with the GIL, compound operations like x += 1 are NOT atomic; protect shared state with a lock.
  • Multiprocessing pickling: Functions/objects sent to workers must be picklable — lambdas and local functions aren't.
  • asyncio.run inside running loop: Raises RuntimeError: asyncio.run() cannot be called from a running event loop.
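The race-condition gotcha is usually handled with a lock around the read-modify-write step. A minimal sketch, assuming four threads and a hypothetical SafeCounter class:

```python
import threading

class SafeCounter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:          # makes the read-modify-write step exclusive
            self.value += 1

def work(counter, n):
    for _ in range(n):
        counter.increment()

counter = SafeCounter()
threads = [threading.Thread(target=work, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)   # 40000
```

Without the lock, updates from different threads can interleave and the final count may come out lower.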
Real-world examplesFastAPI uses asyncio for high-throughput APIs. aiohttp, httpx, Starlette are async HTTP. Celery uses multiprocessing/eventlet workers for background jobs. Dask, Ray, and multiprocessing.Pool parallelize data science workloads. NumPy/pandas release the GIL during C-level array operations — that's why multithreading can still help for numeric work.
The GIL (Global Interpreter Lock) — A mutex allowing only one thread to execute Python bytecode at a time. Threads DON'T give true parallelism for CPU work. They DO help with I/O (network, files) because the GIL is released during I/O.
# ── Threading — for I/O-bound ──
import threading, time

def download(url):
    time.sleep(2)  # simulate I/O

# Sequential: 6s  |  Threaded: ~2s
threads = [threading.Thread(target=download, args=(u,))
           for u in ["u1", "u2", "u3"]]
for t in threads: t.start()
for t in threads: t.join()

# ── Multiprocessing — for CPU-bound (bypasses GIL) ──
from multiprocessing import Pool

def compute(n):
    return sum(i**2 for i in range(n))

if __name__ == "__main__":   # guard required where workers are spawned (Windows, macOS)
    with Pool(4) as pool:
        results = pool.map(compute, [10**6]*4)

# ── asyncio — modern I/O-bound ──
import asyncio

async def fetch(url):
    await asyncio.sleep(2)
    return f"Data from {url}"

async def main():
    results = await asyncio.gather(
        fetch("api/users"), fetch("api/posts")
    )

asyncio.run(main())  # ~2s, not ~4s
Approach         Best For               GIL?
Threading        I/O-bound              Limited
asyncio          Many I/O connections   Single thread
Multiprocessing  CPU-bound              Bypassed
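
concurrent.futures is listed among the models but not shown in the code above. A small sketch of the thread-pool variant follows; fetch and the URL strings are hypothetical placeholders, and time.sleep stands in for a blocking network call.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    time.sleep(0.1)                 # stand-in for a blocking network call
    return f"Data from {url}"

urls = ["u1", "u2", "u3"]           # hypothetical placeholders

with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(fetch, urls))   # results come back in input order

print(results)   # ['Data from u1', 'Data from u2', 'Data from u3']
```

ProcessPoolExecutor exposes the same interface for CPU-bound work, though it needs picklable functions and the usual if __name__ == "__main__" guard.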

22 Memory Model

What is itPython's memory model is built on two layers of garbage collection: (1) reference counting, where each object tracks how many references point to it and is freed immediately when that count hits zero; and (2) a cyclic garbage collector (the gc module) that periodically finds and frees unreachable cycles that refcounts alone can't. Every variable in Python is a name bound to an object, not a slot holding a value, which is why a = b never copies data and only creates a second name for the same object. The id() function returns an identity that is unique among objects alive at the same time (CPython uses the memory address, which may be reused after an object is freed).
Key concepts
  • Everything is an object: Integers, strings, functions, classes, modules — all are heap-allocated objects with a type and refcount.
  • Reference counting: sys.getrefcount(x) shows the current count (one higher than you might expect, because the call itself holds a temporary reference). Most memory is freed the instant the count hits zero.
  • Cyclic GC: Handles reference cycles (a list containing itself). Configurable via gc.set_threshold().
  • Small-int caching: Integers from -5 to 256 are pre-allocated singletons. a = 10; b = 10; a is b is True.
  • String interning: CPython interns identifier-like string literals, so "foo" is "foo" is usually True, but this is an implementation detail; compare strings with ==.
  • Mutable vs immutable: int, str, tuple, frozenset are immutable; list, dict, set are mutable.
  • Object header: Each object carries a refcount and a type pointer (16 bytes on 64-bit CPython) before any payload, so even a small int occupies about 28 bytes (vs 4 bytes for a raw int in C).
How it differs
  • vs Java: Java uses tracing GC (generational mark-sweep). No refcounting. Higher latency, higher throughput. Python is more predictable for simple cases.
  • vs JavaScript: JS uses mark-sweep GC — no refcounting. Similar semantics for references.
  • vs Go: Go uses a low-latency concurrent tri-color mark-sweep GC. No refcounting.
  • vs C/C++: Manual memory management. Python is vastly safer but uses ~3-10× more memory per primitive.
  • vs Rust: Rust's ownership system gives deterministic cleanup at compile time — no runtime GC at all. Lowest memory footprint, but highest learning curve.
Why it mattersUnderstanding the memory model is essential for avoiding accidental aliasing bugs (two names modifying the same list), writing memory-efficient code (use __slots__, generators, arrays instead of lists), debugging leaks (cycles holding large objects), and understanding why a = b and a += x behave differently for lists vs ints.
Common gotchas
  • Shared mutable state: a = [1]; b = a; b.append(2) — now a == [1,2].
  • Default argument persistence: def f(x=[]): — the list is shared across all calls because it's created once.
  • Memory leaks via closures/cycles: Long-lived closures holding references prevent GC.
  • is vs ==: is checks identity; == checks equality. Never use is for comparing values except to None.
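  • Object overhead: A list of a million distinct integers costs roughly 36 bytes per element on 64-bit CPython (an 8-byte pointer plus a 28-byte int object), several times more than the 4 to 8 bytes per element of a C array.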
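The aliasing and default-argument gotchas are easiest to see side by side. A short sketch, with append_bad and append_good as hypothetical helper names:

```python
import copy

a = [1, [2, 3]]
b = a                      # same object, second name
c = copy.copy(a)           # new outer list, shared inner list
d = copy.deepcopy(a)       # fully independent copy

b.append(4)
c[1].append(99)

print(a)                   # [1, [2, 3, 99], 4]
print(c)                   # [1, [2, 3, 99]]
print(d)                   # [1, [2, 3]]
print(a is b, a == d)      # True False

def append_bad(item, bucket=[]):        # default list created once, shared by all calls
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):     # fresh list per call
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(append_bad(1), append_bad(2))     # [1, 2] [1, 2]
print(append_good(1), append_good(2))   # [1] [2]
```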
Real-world examplesNumPy arrays bypass Python's boxed-integer overhead by storing raw C values, typically using several times less memory than an equivalent list. __slots__ on classes eliminates the per-instance dict, which can cut memory substantially for high-volume objects (e.g., game entities, network packets). Tools like memory_profiler, tracemalloc, and objgraph help diagnose leaks in production.
import sys

# Everything is heap-allocated (sizes are typical for 64-bit CPython and vary by version)
print(sys.getsizeof(0))     # about 24-28 bytes, even for an int
print(sys.getsizeof(""))    # about 41-49 bytes for an empty string
print(sys.getsizeof([]))    # about 56 bytes for an empty list

# Reference counting — primary GC
a = [1, 2]   # refcount = 1
b = a        # refcount = 2
del b        # refcount = 1
del a        # refcount = 0 → freed immediately

# Circular refs → generational GC handles them
import gc
gc.collect()

# ── __slots__ — save memory ──
class Point:
    __slots__ = ("x", "y")   # no __dict__ per instance
    def __init__(self, x, y):
        self.x = x
        self.y = y

# Rough figures (instance plus its __dict__; exact sizes vary by CPython version):
# Regular class: ~152 bytes/instance
# With __slots__: ~56 bytes/instance
# 1M points = ~100MB saved
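
To see the cyclic collector at work, a weak reference can serve as a probe that does not keep its target alive. This sketch assumes CPython's behavior; Node here is a hypothetical two-field class.

```python
import gc
import weakref

class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None

x, y = Node("x"), Node("y")
x.partner, y.partner = y, x        # reference cycle: x -> y -> x
probe = weakref.ref(x)             # weak reference does not keep x alive

del x, y                           # refcounts stay above zero because of the cycle
gc.collect()                       # cyclic collector finds the unreachable pair
print(probe() is None)             # True on CPython
```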

23 Metaclasses & Descriptors

What is itA metaclass is the "class of a class" — the thing that creates class objects. By default, Python uses type as the metaclass for every class. Writing a custom metaclass lets you intercept class creation to validate, modify, or register classes as they're defined. A descriptor is an object that defines __get__, __set__, or __delete__ — when stored as a class attribute, these methods are called instead of ordinary attribute access. Descriptors are how @property, @classmethod, @staticmethod, and methods themselves actually work. Together, metaclasses and descriptors are Python's ultimate metaprogramming toolkit.
Key features
  • Metaclass basics: class Foo(metaclass=Meta): means Meta.__new__ creates the class object Foo.
  • type dual role: type(obj) returns an object's type; type(name, bases, dict) creates a new class dynamically.
  • Data descriptors: Define both __get__ and __set__ — higher priority than instance dict.
  • Non-data descriptors: Only __get__ — lower priority than instance dict.
  • __init_subclass__: A simpler alternative to metaclasses for customizing subclass creation (PEP 487).
  • __set_name__: Lets a descriptor know its owner attribute name at class creation time.
How it differs
  • vs Java/C#: Java has annotations + reflection APIs. To achieve metaclass-level power, you need annotation processors or bytecode weaving (ASM, Byte Buddy). Python's is runtime and simpler.
  • vs Ruby: Ruby has singleton classes and class_eval — similar metaprogramming ability. Ruby's is arguably more ergonomic; Python's is more explicit.
  • vs JavaScript: JS has Proxies (ES6) for intercepting property access — spiritually similar to descriptors.
  • vs C++: C++ templates and concepts give compile-time metaprogramming — vastly more complex.
  • vs Go/Rust: Neither has runtime class modification. Go has zero metaprogramming. Rust has macros, but they're compile-time.
Why use themMetaclasses and descriptors are rarely needed in application code, but they're how major frameworks work their magic: Django ORM turns class User(Model): name = CharField() into a SQL-backed class with migrations. Pydantic turns type hints into validators. ABC turns abstract methods into enforcement. You usually don't write metaclasses — you use frameworks that do. But understanding them demystifies what "magic" is actually happening.
Common gotchas
  • Metaclass conflicts: Inheriting from classes with different metaclasses raises TypeError.
  • Overengineering: "Metaclasses are deeper magic than 99% of users should ever worry about. If you wonder whether you need them, you don't." — Tim Peters.
  • Descriptor only works as class attribute: Assigning to an instance bypasses it.
  • Order of descriptor vs instance dict: Data descriptors win; non-data descriptors lose.
  • __init_subclass__ is easier: Most "I need a metaclass" cases are better solved with this hook.
Real-world examplesDjango models: class User(Model) is built by the ModelBase metaclass, which registers fields. SQLAlchemy's declarative base works similarly. ABC: ABCMeta enforces abstract methods. Pydantic: uses ModelMetaclass plus field descriptors. Python's own @property is implemented as a descriptor. Enum: uses EnumMeta.

A metaclass is the "class of a class." type is the default metaclass.

print(type(42))      # <class 'int'>
print(type(int))     # <class 'type'> — int is an instance of type!
print(type(type))    # <class 'type'> — type is its own metaclass

# Singleton metaclass
class SingletonMeta(type):
    _instances = {}
    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class Database(metaclass=SingletonMeta):
    pass

print(Database() is Database())  # True — always same object
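
Many registration-style uses of metaclasses can be handled by __init_subclass__ instead. A minimal sketch, where Plugin, CsvExporter, JsonExporter, and the name keyword are all hypothetical:

```python
class Plugin:
    registry = {}

    def __init_subclass__(cls, name=None, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.plugin_name = name or cls.__name__.lower()
        Plugin.registry[cls.plugin_name] = cls    # runs once per subclass definition

class CsvExporter(Plugin, name="csv"):
    pass

class JsonExporter(Plugin):
    pass

print(sorted(Plugin.registry))                # ['csv', 'jsonexporter']
print(Plugin.registry["csv"] is CsvExporter)  # True
```

The hook runs when each subclass is defined, so no custom metaclass is needed and metaclass conflicts are avoided.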

Descriptors

The low-level mechanism behind @property, @classmethod, and @staticmethod — controls how attribute access works.

# A data descriptor that validates values on assignment
class Positive:
    def __set_name__(self, owner, name):
        self.name = name
        self.storage = f"_{name}"

    def __get__(self, obj, objtype=None):
        if obj is None:              # accessed on the class, not an instance
            return self
        return getattr(obj, self.storage, None)

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError(f"{self.name} must be positive")
        setattr(obj, self.storage, value)

class Product:
    price = Positive()      # descriptor on the CLASS
    quantity = Positive()

    def __init__(self, name, price, qty):
        self.name = name
        self.price = price       # triggers Positive.__set__
        self.quantity = qty
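
The precedence rule for data and non-data descriptors can be checked directly by writing into the instance dict. NonData, Data, and Demo are hypothetical classes used only for this demonstration.

```python
class NonData:
    def __get__(self, obj, objtype=None):
        return "from descriptor"

class Data:
    def __get__(self, obj, objtype=None):
        return "from descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")

class Demo:
    loose = NonData()
    strict = Data()

d = Demo()
d.__dict__["loose"] = "from instance"    # instance dict shadows a non-data descriptor
d.__dict__["strict"] = "from instance"   # ...but not a data descriptor

print(d.loose)     # from instance
print(d.strict)    # from descriptor
```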

24 Design Patterns

What is itDesign patterns are reusable solutions to common object-oriented design problems, first catalogued in the 1994 "Gang of Four" book. Python's flexibility — first-class functions, duck typing, decorators, context managers, multiple inheritance — means many classic patterns become one-liners or disappear entirely. The Strategy pattern is "just pass a function". The Iterator pattern is built into for. Singleton becomes "just use a module". In idiomatic Python, patterns often look nothing like their Java counterparts.
Key patterns in Python
  • Strategy: Pass a function or lambda as a parameter. sorted(xs, key=lambda x: x.name).
  • Factory: A classmethod or plain function that returns an instance — Path.home(), datetime.now().
  • Singleton: A module-level instance. Python modules are already singletons.
  • Observer: Simple callback lists or blinker/PyDispatcher. Frameworks like Django use signals.
  • Decorator pattern: Python's @decorator syntax — more general than the GoF decorator.
  • Adapter: Duck typing often makes adapters unnecessary; when needed, a simple wrapper class.
  • Context Manager: A Python-specific pattern replacing try/finally.
  • Dependency Injection: Pass dependencies explicitly as constructor args, or use libraries like dependency-injector or FastAPI's Depends.
  • Command: First-class functions make this trivial.
How it differs
  • vs Java: Java's inflexibility (no first-class functions pre-8, strict types) forces verbose pattern implementations. Python's flexibility replaces many with 2-liners.
  • vs C++: Similar to Java — patterns tend to be more involved and templated.
  • vs JavaScript: JS also has first-class functions so patterns are similarly compact.
  • vs Go: Go deliberately minimizes OOP, pushing you toward composition-based patterns (small interfaces, struct embedding).
  • vs Ruby: Very similar to Python — dynamic typing and blocks make most patterns natural.
Why use themDesign patterns give you shared vocabulary with other developers ("this is the Strategy pattern"), proven solutions to tricky problems, and architectural blueprints for new features. But in Python the rule is: don't over-pattern. A function often replaces a class; a dict often replaces a class hierarchy.
Common gotchas
  • Java-style overengineering: Porting Java-heavy patterns to Python creates unnecessary classes.
  • Singletons in multi-process code: Each process gets its own module state.
  • Global state: Signals and observers can create hard-to-trace coupling.
  • Abstract Factory overkill: Often a dict of classes is enough.
Real-world examplesDjango signals (observer), Django ORM objects.filter(...) (builder/chainable), Flask app factory (factory), pytest fixtures (dependency injection), FastAPI Depends() (DI), SQLAlchemy session (unit of work), Python's @contextmanager (template method).
# ── Strategy (first-class functions) ──
def sort_by_name(items):
    return sorted(items, key=lambda x: x["name"])

def display(items, strategy):
    return strategy(items)

# ── Observer ──
class EventEmitter:
    def __init__(self):
        self._listeners = {}
    def on(self, event, cb):
        self._listeners.setdefault(event, []).append(cb)
    def emit(self, event, *args):
        for cb in self._listeners.get(event, []):
            cb(*args)

# ── Registry (common in ML frameworks) ──
REGISTRY = {}
def register(name):
    def dec(cls):
        REGISTRY[name] = cls
        return cls
    return dec

@register("linear")
class LinearModel: pass

model = REGISTRY["linear"]()  # create by name
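
The context-manager pattern from the list above, sketched with contextlib. The transaction function and the events log are hypothetical stand-ins for a real resource such as a database connection.

```python
from contextlib import contextmanager

events = []

@contextmanager
def transaction(name):
    events.append(f"begin {name}")
    try:
        yield events                       # body of the with-block runs here
        events.append(f"commit {name}")
    except Exception:
        events.append(f"rollback {name}")
        raise                              # re-raise so callers still see the error
    finally:
        events.append(f"close {name}")     # always runs, like try/finally

with transaction("t1"):
    pass

try:
    with transaction("t2"):
        raise ValueError("boom")
except ValueError:
    pass

print(events)
# ['begin t1', 'commit t1', 'close t1', 'begin t2', 'rollback t2', 'close t2']
```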

25 Testing

What is itPython's testing ecosystem is dominated by pytest, a third-party framework that has become the de facto standard and is widely preferred over the stdlib unittest (which remains available for xUnit-style tests). Pytest uses plain functions and assert statements with powerful introspection to produce helpful failure messages, plus a fixture system for setup/teardown and dependency injection, a parametrize decorator for data-driven tests, and a rich plugin ecosystem. The standard library also ships doctest (tests embedded in docstrings) and unittest.mock for faking dependencies; Hypothesis, a third-party package, adds property-based testing.
Key tools
  • pytest: def test_foo(): assert result == expected — zero boilerplate.
  • Fixtures: @pytest.fixture functions injected into tests by name.
  • Parametrize: @pytest.mark.parametrize("x,expected", [(1,2),(3,6)]) runs the same test with multiple inputs.
  • unittest.mock: @patch("module.func") or Mock() for test doubles.
  • Coverage.py: pytest --cov=myapp (via the pytest-cov plugin) reports which lines aren't tested.
  • Hypothesis: Property-based testing — generates random inputs to find edge cases.
  • tox / nox: Run tests across multiple Python versions and environments.
  • doctest: Embed examples in docstrings and run them with python -m doctest or pytest --doctest-modules.
How it differs
  • vs Java JUnit: JUnit requires classes, annotations (@Test), and assertion methods (assertEquals). Pytest uses plain functions and Python's assert — way less ceremony.
  • vs JavaScript Jest: Jest uses describe()/it() blocks and expect(x).toBe(y). Pytest is flatter and relies on bare asserts.
  • vs Go testing: Go's testing package is minimal — just func TestFoo(t *testing.T). Similar philosophy to pytest but even more spartan.
  • vs Ruby RSpec: RSpec uses a DSL (describe, it, expect). Pytest is simpler and less nested.
  • vs C++ Google Test: GTest requires macros (TEST_F, EXPECT_EQ). Pytest is dramatically cleaner.
Why use it
Testing catches bugs before they reach production, enables confident refactoring, and documents expected behavior. Pytest's low ceremony lowers the barrier to writing tests, which helps teams reach high coverage. The parametrize feature makes it trivial to test 20 variants in 5 lines.
Common gotchas
  • Test interdependence: Tests sharing state through global modules fail unpredictably. Use fixtures.
  • Mocking the wrong thing: Mock where the name is looked up, not where it's defined.
  • Slow tests: Real database / network calls make test suites slow and flaky, so people stop running them. Use fixtures and in-memory replacements.
  • Over-mocking: Tests become change detectors — they break on refactoring without catching real bugs.
  • Fixture scope: Forgetting scope="session" can re-run expensive setup for every test.
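
The "mock where the name is looked up" rule is easier to see in code. The sketch below builds two throwaway in-memory modules, rates and billing (both hypothetical names), so that it runs as a single file; in a real project these would be ordinary .py files.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical module "rates" that defines the real function.
rates = types.ModuleType("rates")
exec("def fetch_rate():\n    return 1.25\n", rates.__dict__)
sys.modules["rates"] = rates

# Hypothetical module "billing" that imports the name into its own namespace.
billing = types.ModuleType("billing")
exec(
    "from rates import fetch_rate\n"
    "def total(amount):\n"
    "    return amount * fetch_rate()\n",
    billing.__dict__,
)
sys.modules["billing"] = billing

# Patching where the function is defined leaves billing's own reference untouched.
with patch("rates.fetch_rate", return_value=2.0):
    wrong_target = billing.total(10)

# Patching where the name is looked up replaces what total() actually calls.
with patch("billing.fetch_rate", return_value=2.0):
    right_target = billing.total(10)

print(wrong_target, right_target)   # 12.5 20.0
```

Because billing did from rates import fetch_rate, it holds its own reference to the function, which is why only the second patch changes the result.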
Real-world examples
pytest is the default choice across much of the Python ecosystem. Django/Flask/FastAPI apps pair it with pytest-django, pytest-flask, or httpx.AsyncClient. pandas, NumPy, and scikit-learn all use pytest for their large test suites. CI platforms (GitHub Actions, GitLab CI, CircleCI) can run pytest on every commit with a few lines of configuration.
import pytest

def add(a, b): return a + b

# Test functions start with test_
def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0

# Test exceptions
def test_divide_zero():
    with pytest.raises(ZeroDivisionError):
        1 / 0

# Fixtures — setup/teardown
@pytest.fixture
def sample_data():
    data = {"users": ["Alice", "Bob"]}
    yield data    # test runs at yield point
    # cleanup after

def test_users(sample_data):
    assert len(sample_data["users"]) == 2

# Parametrize — run same test with different inputs
@pytest.mark.parametrize("a,b,expected", [
    (1,2,3), (-1,1,0), (0,0,0),
])
def test_add_many(a, b, expected):
    assert add(a, b) == expected

# Run: pytest test_file.py -v

26 Dataclasses

What is it
dataclasses (PEP 557, Python 3.7) is a stdlib module whose @dataclass decorator auto-generates boilerplate (__init__, __repr__, __eq__, optionally __hash__, __lt__, etc.) based on class-level type-annotated attributes. It covers most plain Python classes used as data containers, removing the tedious self.x = x; self.y = y; self.z = z pattern. It's pure Python with zero dependencies: your classes remain normal classes, just with auto-generated methods. For validation, serialization, and JSON support, Pydantic and attrs extend the idea further.
Key features
  • Basic: @dataclass class Point: x: int; y: int — gets __init__, __repr__, __eq__.
  • Defaults: x: int = 0; for mutable defaults use field(default_factory=list).
  • frozen=True: Makes instances immutable (attempting to set raises FrozenInstanceError).
  • order=True: Auto-generates __lt__, __le__, etc. based on field order.
  • slots=True: Adds __slots__ (3.10+) — saves memory, speeds up attribute access.
  • kw_only=True: All fields become keyword-only (3.10+).
  • __post_init__: Called after __init__ for custom logic or validation.
  • asdict/astuple: Convert to dict or tuple for serialization.
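
The order=True and astuple features listed above are not shown in the later code, so here is a minimal sketch. The Version class and its fields are hypothetical, chosen only to make the ordering visible.

```python
from dataclasses import dataclass, field, astuple

@dataclass(order=True)
class Version:
    major: int
    minor: int
    patch: int = 0
    # compare=False keeps this field out of __eq__ and the ordering methods
    label: str = field(default="", compare=False)

releases = [Version(3, 12), Version(3, 9, 7), Version(3, 12, 1, label="latest")]
newest = max(releases)            # compares (major, minor, patch) in field order

print(sorted(releases)[0])        # Version(major=3, minor=9, patch=7, label='')
print(astuple(newest))            # (3, 12, 1, 'latest')
print(Version(3, 12) == Version(3, 12, label="rc"))   # True
```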
How it differs
  • vs regular classes: No more def __init__(self, x, y): self.x = x; self.y = y. Dataclass gives you this for free plus __repr__ and __eq__.
  • vs Java records: Java 16+ records are conceptually similar (data carriers with auto-generated methods), but records are always immutable while dataclasses are mutable unless frozen=True.
  • vs C# records: C# 9 records — same idea.
  • vs Kotlin data classes: Kotlin's data class is a close analogue, with auto-generated equals, hashCode, toString, and copy.
  • vs TypeScript interfaces: TS interfaces are compile-time only. Dataclasses are real runtime classes.
  • vs Go structs: Go structs are similar but lack auto-generated methods; you write constructor funcs manually.
  • vs Pydantic: Pydantic adds runtime validation, type coercion, and JSON schema generation (v2's validation core is written in Rust). Dataclasses don't validate, which keeps them lighter when the input is already trusted.
  • vs attrs: attrs predates dataclasses and has more features (validators, converters, slots by default).
Why use it
Dataclasses eliminate tedious boilerplate while keeping your classes pure Python (no dependencies). Perfect for configuration objects, DTOs, domain entities, event payloads, tree/graph nodes. They also pair naturally with type hints for full mypy support.
Common gotchas
  • Mutable defaults: items: list = [] raises ValueError — must use field(default_factory=list).
  • No runtime validation: Type hints are documentation only. Point(x="not an int") just works.
  • Inheritance ordering: Non-default fields can't follow default fields — including through inheritance.
  • __eq__ and unhashable: By default (eq=True, frozen=False), @dataclass sets __hash__ to None, so instances are unhashable. Use frozen=True to make them hashable.
  • InitVar: Special field type for init-only args not stored on the instance.
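
Several of these gotchas can be reproduced in a few lines. The sketch below is illustrative only: the User class, its fields, and the length-based "digest" are assumptions made up for the example, not a real hashing scheme.

```python
from dataclasses import dataclass, field, InitVar

@dataclass
class Point:
    x: int
    y: int

# Type hints are not enforced at runtime.
p = Point(x="not an int", y=2)

# eq=True without frozen=True sets __hash__ to None.
try:
    hash(p)
    hashable = True
except TypeError:
    hashable = False

@dataclass(frozen=True)
class FrozenPoint:
    x: int
    y: int

lookup = {FrozenPoint(1, 2): "ok"}    # frozen instances work as dict keys

@dataclass
class User:
    name: str
    password: InitVar[str]             # passed to __init__, not stored
    digest: int = field(init=False)    # filled in by __post_init__

    def __post_init__(self, password):
        if not isinstance(self.name, str):
            raise TypeError("name must be a str")   # manual validation
        self.digest = len(password)    # placeholder for a real hash function

u = User("alice", "s3cret")
print(hashable, lookup[FrozenPoint(1, 2)], u.digest, hasattr(u, "password"))
# False ok 6 False
```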
Real-world examples
Configuration (@dataclass class Config: host: str; port: int = 8080), API response schemas, message queue payloads, AST nodes, event sourcing events. Many internal tools migrate from NamedTuple to dataclasses for mutability + type hints. Projects like Pydantic models, FastAPI request bodies, and SQLModel ORM build on or parallel the same idea.
from dataclasses import dataclass, field, asdict

# Auto-generates __init__, __repr__, __eq__
@dataclass
class Point:
    x: float
    y: float
    z: float = 0.0

p = Point(1, 2)
print(p)                      # Point(x=1, y=2, z=0.0)
print(p == Point(1, 2))       # True

# frozen=True → immutable (hashable, can be dict keys)
@dataclass(frozen=True)
class Color:
    r: int; g: int; b: int

# slots=True → memory efficient (3.10+)
@dataclass(slots=True)
class Particle:
    x: float; y: float; mass: float

# Mutable defaults need field(default_factory=...)
@dataclass
class Config:
    name: str
    tags: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Runs AFTER auto-generated __init__ — for validation
        if not self.name:
            raise ValueError("Name required")

print(asdict(Config("app", ["ml"])))
# {'name': 'app', 'tags': ['ml']}

27 Functional Programming

What is it
Python is not a pure functional language (it's multi-paradigm), but it supports many functional idioms: first-class functions, closures, higher-order functions, map/filter/reduce, immutable data (tuples, frozensets), comprehensions, generators, and the functools / itertools / operator standard library modules. Guido van Rossum famously proposed dropping lambda, map, filter, and reduce in Python 3, preferring comprehensions; in the end only reduce moved out of the builtins (into functools), and comprehensions remain the Pythonic default for transformations.
Key tools
  • map(fn, iterable): Lazy application of fn to each item. Pythonic alternative: comprehension.
  • filter(pred, iterable): Lazy filtering by predicate.
  • functools.reduce: Left-fold. reduce(operator.add, [1,2,3]) == 6.
  • functools.partial: Partial application — freeze some arguments to create a new function.
  • functools.lru_cache: Memoization decorator.
  • functools.cache, singledispatch, wraps: Other functional helpers.
  • itertools: Lazy combinators — chain, accumulate, groupby, dropwhile, takewhile, starmap.
  • operator: Functions mirroring operators — operator.add, operator.itemgetter("name").
  • Immutable types: tuple, frozenset, @dataclass(frozen=True).
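
A few of the tools listed above that the later code does not touch can be sketched as follows. The people records and the describe function are hypothetical examples.

```python
from functools import lru_cache, singledispatch
from itertools import accumulate, groupby
from operator import itemgetter, mul

@lru_cache(maxsize=None)
def fib(n):
    # Memoized recursion: each n is computed only once
    return n if n < 2 else fib(n - 1) + fib(n - 2)

@singledispatch
def describe(value):
    return f"object: {value!r}"

@describe.register
def _(value: int):
    return f"int: {value}"

@describe.register
def _(value: list):
    return f"list of {len(value)}"

people = [
    {"name": "Alice", "dept": "Eng"},
    {"name": "Bob", "dept": "Mkt"},
    {"name": "Cara", "dept": "Eng"},
]
# groupby only groups adjacent items, so sort by the same key first
dept_key = itemgetter("dept")
by_dept = {
    dept: [p["name"] for p in group]
    for dept, group in groupby(sorted(people, key=dept_key), key=dept_key)
}

print(fib(50))                                # 12586269025
print(describe(3), "|", describe([1, 2]), "|", describe("x"))
print(by_dept)                                # {'Eng': ['Alice', 'Cara'], 'Mkt': ['Bob']}
print(list(accumulate([1, 2, 3, 4], mul)))    # [1, 2, 6, 24]
```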
How it differs
  • vs Haskell: Haskell is pure — no side effects, lazy by default, rich type system. Python is impure but borrows ideas (comprehensions, pattern matching).
  • vs JavaScript: JS has Array.map/filter/reduce as methods. Python's are free functions (or comprehensions).
  • vs Java: Java 8+ Streams give map/filter/reduce. Verbose but type-safe.
  • vs Scala: Scala is a true functional-OOP hybrid with pattern matching, case classes, and immutability. More expressive than Python.
  • vs Clojure: Clojure is immutable-by-default with persistent data structures. Very different mindset from Python.
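  • vs Elixir/Erlang: Functional, actor-based languages built on lightweight isolated processes that communicate by message passing. Python's asyncio tasks are only loosely comparable: they are cooperative and share memory.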
Why use it
Functional idioms encourage data pipelines, immutability, and pure functions, all of which improve testability, reasoning, and parallelism. In Python, the "sweet spot" is hybrid: use comprehensions/generators for transformations, immutable dataclasses for domain objects, and keep side effects at the edges.
Common gotchas
  • lambda limitations: Only expressions, no statements. Multi-line logic needs a real def.
  • reduce readability: Complex reduces are hard to read — often a loop is clearer.
  • Closure capture: Lambdas in loops capture variables by reference, not by value.
  • No tail-call optimization: Deeply recursive functional code blows the stack.
  • Performance: map/filter + lambda is often slower than a comprehension due to per-call function overhead.
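
The closure-capture gotcha is worth seeing once. The sketch below shows the late-binding behavior and two common workarounds; identity is a hypothetical helper used only for the partial variant.

```python
from functools import partial

# Late binding: every lambda looks up i when it is called, after the loop has finished.
late = [lambda: i for i in range(3)]
print([f() for f in late])      # [2, 2, 2]

# Fix 1: bind the current value as a default argument.
bound = [lambda i=i: i for i in range(3)]
print([f() for f in bound])     # [0, 1, 2]

# Fix 2: functools.partial stores the argument at creation time.
def identity(x):
    return x

frozen = [partial(identity, i) for i in range(3)]
print([f() for f in frozen])    # [0, 1, 2]
```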
Real-world examples
pandas' .apply(lambda row: ...), toolz library (functional utilities), PySpark RDD .map().filter().reduce(), dask graph computation, immutables library for persistent dicts. Event sourcing architectures use immutable events + pure reducers.
from functools import reduce, partial
from itertools import chain, count, islice, product

# ── partial: lock in some arguments ──
def power(base, exp): return base ** exp

square = partial(power, exp=2)
print(square(5))   # 25

# ── itertools ──
list(chain([1,2], [3,4]))        # [1, 2, 3, 4]  concatenate iterables
list(islice(count(), 5))         # [0, 1, 2, 3, 4]  lazy slice of an infinite counter
list(product([1,2], ["a","b"])) # Cartesian product: (1,'a'), (1,'b'), (2,'a'), (2,'b')

# reduce: accumulate
print(reduce(lambda a,b: a+b, [1,2,3,4]))  # 10

28 NumPy Foundation

What is it
NumPy (Numerical Python) is the foundation of Python's scientific and machine-learning ecosystem. Its core object is the ndarray: a fixed-type, multi-dimensional, densely-packed array stored in contiguous memory, manipulated by C and Fortran code under the hood (via BLAS, LAPACK, and SIMD). It's often 10-100× faster than pure Python lists for numeric work, uses far less memory (4 bytes per int32 vs ~28 bytes per boxed Python int), and enables vectorization, that is, expressing computations as whole-array operations instead of element-by-element loops.
Key features
  • ndarray: N-dimensional homogeneous array with a single dtype (int32, float64, complex128, etc.).
  • Vectorization: a + b adds entire arrays element-wise without Python-level loops.
  • Broadcasting: Rules for combining arrays of different shapes — a.shape=(3,1) + b.shape=(1,4) produces (3,4).
  • Slicing and views: a[::2] returns a view (no copy), so modifying it affects the original.
  • Fancy indexing: Boolean and integer array indexing — a[a > 0], a[[1,3,5]].
  • Reductions: .sum(), .mean(), .std(), .max() along any axis.
  • Linear algebra: np.dot, np.linalg.inv, np.linalg.eig, SVD, QR, etc.
  • FFT, random, polynomial: Entire mathematical libraries built in.
How it differs
  • vs Python lists: Lists are arrays of boxed heterogeneous objects with pointer indirection. NumPy arrays are packed C primitives — 10-100× faster, dramatically less memory.
  • vs MATLAB: NumPy was designed as a Python alternative to MATLAB. Syntax is similar but Python is free, general-purpose, and has a richer ecosystem.
  • vs R: R's vectors and matrices are similar in spirit; NumPy is more general and integrates better with non-stats code.
  • vs Julia: Julia is designed from scratch for scientific computing and can match or beat NumPy. But Python's ecosystem (sklearn, pytorch, TF) is deeper.
  • vs C++ Eigen/Armadillo: Those are C++ template libraries; expression templates are optimized at compile time, so they run fast but compile slowly and have a more cumbersome API.
Why use it
NumPy is the de-facto numeric standard in Python. Every major library (pandas, scikit-learn, PyTorch, TensorFlow, SciPy, Matplotlib, OpenCV) either uses NumPy arrays internally or interoperates with them via the __array__ protocol. Mastering NumPy's vectorization is the single biggest performance win for Python data code.
Common gotchas
  • Views vs copies: Slicing creates a view; modifying it mutates the original. Use .copy() to be safe.
  • Integer overflow: int32 wraps silently at 2³¹ — NumPy does NOT upcast automatically.
  • dtype upcasting: int32 + float32 becomes float64.
  • Broadcasting surprises: Incompatible shapes raise ValueError, but compatible ones can silently produce an unintended shape, e.g. (3,) + (3,1) gives (3,3).
  • Loops over arrays: for x in arr defeats vectorization — use whole-array ops.
  • NaN propagation: np.nan != np.nan; use np.isnan.
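
Most of these gotchas can be reproduced in a few lines. The sketch below assumes a reasonably recent NumPy; the array values are arbitrary.

```python
import numpy as np

# Views vs copies: basic slicing shares memory, fancy indexing copies.
a = np.arange(6)
view = a[::2]
view[0] = 99                    # also changes a[0]
picked = a[[1, 3]]
picked[0] = -1                  # a is untouched
print(a[0], a[1])               # 99 1

# Integer overflow: int32 arrays wrap around silently.
big = np.array([2**31 - 1], dtype=np.int32)
print(big + 1)                  # [-2147483648]

# dtype upcasting: int32 combined with float32 gives float64.
mixed = np.ones(2, dtype=np.int32) + np.ones(2, dtype=np.float32)
print(mixed.dtype)              # float64

# Broadcasting: (3,) combined with (3, 1) silently yields (3, 3).
grid = np.arange(3) + np.arange(3).reshape(3, 1)
print(grid.shape)               # (3, 3)

# NaN never equals itself; test with np.isnan instead.
x = np.array([1.0, np.nan])
nan_mask = np.isnan(x)
print((x == x).tolist(), nan_mask.tolist())   # [True, False] [False, True]
```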
Real-world examples
Machine learning: Every gradient descent, matrix multiplication, and activation in neural nets runs on NumPy-like arrays. Image processing: OpenCV stores images as NumPy arrays. Scientific computing: Physics simulations, signal processing, finance quant models. PyTorch tensors and TensorFlow tensors both conceptually mirror NumPy arrays (with GPU support added).

NumPy is the foundation of ALL Python ML/data science. Every ML library uses NumPy arrays internally.

import numpy as np

# ── Creation ──
a = np.array([1, 2, 3])
b = np.zeros((3, 4))             # 3x4 zeros
c = np.ones((2, 3))              # 2x3 ones
d = np.arange(0, 10, 2)          # [0, 2, 4, 6, 8]
e = np.linspace(0, 1, 5)         # [0, 0.25, 0.5, 0.75, 1.0]
f = np.random.randn(3, 3)        # 3x3 random normal
g = np.eye(3)                     # identity matrix

# ── Shape ──
print(b.shape)   # (3, 4)
print(b.dtype)   # float64
print(b.ndim)    # 2
x = np.arange(12).reshape(3, 4)  # reshape

# ── Vectorized ops (often 10-100x faster than Python loops) ──
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
print(a + b)       # [11 22 33]
print(a * b)       # [10 40 90]
print(a ** 2)      # [1 4 9]

# ── Boolean indexing ──
data = np.array([10, 25, 3, 40])
print(data[data > 10])   # [25 40]

# ── Broadcasting ──
matrix = np.ones((3, 3))
row = np.array([1, 2, 3])
print(matrix + row)  # row is broadcast across all rows

# ── Linear algebra ──
A = np.array([[1,2],[3,4]])
B = np.array([[5,6],[7,8]])
print(A @ B)                    # matrix multiply
print(np.linalg.inv(A))         # inverse
print(np.linalg.det(A))         # determinant

# ── Aggregations ──
m = np.array([[1,2,3],[4,5,6]])
print(m.sum(axis=0))   # [5 7 9]  per column
print(m.sum(axis=1))   # [ 6 15]  per row
print(m.mean(), m.std())

29 Pandas Foundation

What is it
Pandas is Python's premier library for tabular data manipulation and analysis. Built on top of NumPy, it provides two core data structures: the Series (a labeled 1-D array) and the DataFrame (a 2-D table with labeled rows and columns, like a SQL table or Excel sheet). Pandas gives you vectorized operations, SQL-like group-by / join / filter, and connectors to read/write CSV, JSON, Parquet, SQL, Excel, HDF5. Created by Wes McKinney at AQR Capital in 2008, it's now the lingua franca of data science and analytics in Python.
Key features
  • Series: 1-D labeled array — think of it as a NumPy array with a name and an index.
  • DataFrame: 2-D table — dict of Series sharing an index.
  • I/O: pd.read_csv, read_parquet, read_sql, read_excel, read_json, etc.
  • Filtering: Boolean indexing — df[df.age > 30].
  • Group-by: df.groupby("city").agg({"sales":"sum"}).
  • Join/Merge: pd.merge(left, right, on="id", how="inner") — SQL-like joins.
  • Reshape: pivot, pivot_table, melt, stack, unstack.
  • Time-series: Rich datetime indexing, resampling, rolling windows.
  • Missing data: NaN support with .fillna, .dropna, .isna.
How it differs
  • vs R data.frame: Direct inspiration. Pandas is more Python-integrated; R's dplyr is arguably cleaner.
  • vs SQL: Pandas does in-memory what SQL does on disk. SQL is better for huge tables; pandas for iterative analysis.
  • vs Excel: Pandas scales to millions of rows, is version-controllable, and reproducible. Excel is better for ad-hoc exploration and charts.
  • vs Polars: Polars is a newer Rust-based DataFrame library — much faster, lazy evaluation, type-safer API. The pandas-killer, arguably.
  • vs Dask/PySpark: Dask and PySpark handle out-of-memory and distributed data. Pandas is single-machine.
  • vs JS (Data-Forge / Danfo.js): JS tabular libraries exist but are vastly less mature.
Why use it
Pandas is the default Python tool for tabular data manipulation. Loading a CSV, cleaning it, joining with another table, aggregating, and exporting can all be done in 5-10 lines of pandas. Data scientists, ML engineers, quant analysts, and data analysts rely on it daily.
Common gotchas
  • SettingWithCopyWarning: Modifying a slice may or may not affect the original — always use .loc.
  • Chained indexing: df[df.x > 0]["y"] = 5 may silently fail. Use df.loc[df.x > 0, "y"] = 5.
  • Memory hungry: Loads entire dataset into RAM — not great for 50+ GB files.
  • Integer vs float dtypes: An integer column with NaNs gets promoted to float64; use the nullable Int64 dtype to keep integers.
  • Iterating rows: for idx, row in df.iterrows() is slow; vectorize instead.
  • Inconsistent API: Some methods return views, others copies; some mutate in place, others don't.
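
A short sketch of the safer patterns behind these gotchas, using a made-up three-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"x": [-1, 2, 3], "y": [0, 0, 0]})

# One .loc call selects rows and column together, so the assignment hits df itself.
df.loc[df["x"] > 0, "y"] = 5
print(df["y"].tolist())                      # [0, 5, 5]

# A missing value turns an integer column into float64 ...
as_float = pd.Series([1, 2, None])
print(as_float.dtype)                        # float64

# ... unless the nullable Int64 extension dtype is requested.
as_int = pd.Series([1, 2, None], dtype="Int64")
print(as_int.dtype)                          # Int64

# Vectorized column arithmetic instead of looping with iterrows().
df["double"] = df["x"] * 2
print(df["double"].tolist())                 # [-2, 4, 6]
```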
Real-world examples
Pandas appears in most data science Jupyter notebooks. ETL pipelines load → clean → transform → export with pandas. Feature engineering for ML is very often done in pandas. Other common uses include financial analysis, A/B test evaluation, log analysis, and reporting dashboards. Libraries like scikit-learn, plotly, statsmodels, and seaborn all accept DataFrames natively.
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "salary": [50000, 60000, 70000],
    "dept": ["Eng", "Mkt", "Eng"]
})

# ── Explore ──
df.head()          # first 5 rows
df.info()          # types, non-null counts
df.describe()      # statistics

# ── Select ──
df["name"]                   # one column (Series)
df[["name", "age"]]          # multiple columns
df.loc[0]                     # row by label
df.iloc[0:2]                  # rows by position

# ── Filter (SQL WHERE) ──
df[df["age"] > 28]
df[(df["dept"] == "Eng") & (df["salary"] > 55000)]
# Use & not 'and', | not 'or'

# ── New columns ──
df["bonus"] = df["salary"] * 0.1

# ── GroupBy (SQL GROUP BY) ──
df.groupby("dept")["salary"].mean()

# ── Missing data ──
df.dropna()                # drop rows with NaN
df.fillna(0)               # replace NaN
df.isna().sum()            # count NaNs per column

# ── Merge (SQL JOIN) ──
orders = pd.DataFrame({"uid": [1,2], "product": ["A","B"]})
users = pd.DataFrame({"uid": [1,2], "name": ["Alice","Bob"]})
merged = pd.merge(orders, users, on="uid")

# ── Read/Write ──
# pd.read_csv("data.csv")
# df.to_csv("out.csv", index=False)

30 The ML Bridge — Everything Together

What is it
This section is the capstone that fuses everything (OOP, NumPy vectorization, iterators, closures, type hints, dataclasses, decorators) into a small end-to-end machine learning example (linear regression from scratch). It's meant to show how Python's language features come together in real ML code, and how Python became the undisputed king of ML because it blends fast C-backed numerics (NumPy, BLAS) with ergonomic high-level syntax for rapid iteration.
The Python ML stack
  • NumPy: Raw tensor/array math, the base layer.
  • pandas: Data loading, cleaning, feature engineering.
  • Matplotlib / Seaborn / Plotly: Visualization.
  • scikit-learn: Classical ML — linear models, trees, SVMs, clustering, preprocessing.
  • PyTorch / TensorFlow / JAX: Deep learning with GPU/TPU acceleration and autograd.
  • Hugging Face Transformers: Pre-trained LLMs, vision models, audio models.
  • MLflow / Weights & Biases: Experiment tracking and model registry.
  • FastAPI / BentoML: Model serving.
Why Python dominates ML
  • NumPy + BLAS: Fast C-level math with an easy Python interface.
  • Jupyter notebooks: Interactive, visual, and reproducible exploration.
  • Ecosystem: Decade-plus head start over all competitors.
  • Researcher-friendly syntax: Code reads like pseudocode.
  • Glue language: Easy to wrap C/C++/CUDA kernels (pybind11, Cython).
How it differs
  • vs R: R excels at statistics and visualization but loses on production deployment and deep learning.
  • vs Julia: Julia is designed for scientific computing and is often faster than pure Python, but its ecosystem is much smaller.
  • vs C++ (Caffe, LibTorch): C++ is used for inference and embedded ML, but training code is almost always Python.
  • vs JavaScript (TensorFlow.js): Mostly used for in-browser or Node.js inference; training is supported but rarely done at scale.
  • vs Swift for TensorFlow: Google's experiment; now discontinued.
Common gotchas
  • Mixing NumPy + PyTorch tensors: Each has its own conventions; use .numpy() / torch.from_numpy to bridge.
  • Random seeds: Reproducibility requires seeding NumPy, Python's random, and PyTorch/TensorFlow separately.
  • Data leakage: Fitting preprocessing (e.g., StandardScaler) on the full dataset instead of just the train fold.
  • GPU memory: Keeping tensors on GPU and forgetting to detach gradients leads to out-of-memory.
  • Python speed vs GPU: The Python loop overhead doesn't matter once the GPU kernel is running.
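
Two of these gotchas, seeding and data leakage, can be sketched with NumPy alone. The seed_everything helper is a hypothetical name, and the manual scaling stands in for a library scaler such as StandardScaler.

```python
import random
import numpy as np

def seed_everything(seed):
    # Hypothetical helper: seeds Python's random and NumPy's legacy global RNG.
    # A real project would also seed its deep learning framework here.
    random.seed(seed)
    np.random.seed(seed)

seed_everything(0)
first = np.random.randn(3)
seed_everything(0)
second = np.random.randn(3)
print(np.array_equal(first, second))         # True

# Avoiding leakage: compute scaling statistics on the training rows only.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_train, X_test = X[:80], X[80:]

mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std        # reuse the train statistics, do not refit
```

The test rows are scaled with the training mean and standard deviation, so no information from the held-out data reaches the preprocessing step.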
Real-world examples
Many of the best-known AI systems, including ChatGPT, Claude, Stable Diffusion, and AlphaFold, were built with Python-based frameworks such as PyTorch, TensorFlow, and JAX. Python ML pipelines are also common in industry for recommendations, ETA prediction, and ranking. Python is the language of modern AI.

This example uses almost every Python concept from this guide to build Linear Regression from scratch.

import numpy as np
import pandas as pd
from dataclasses import dataclass
from abc import ABC, abstractmethod
from functools import wraps
import time

# Decorator
def log_time(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"[{func.__name__}] {time.perf_counter()-start:.2f}s")
        return result
    return wrapper

# Dataclass for config
@dataclass
class Config:
    lr: float = 0.01
    epochs: int = 100
    reg: float = 0.001

# ABC for model interface
class BaseModel(ABC):
    def __init__(self, config: Config):
        self.config = config
        self.weights = None

    @abstractmethod
    def fit(self, X, y): ...

    @abstractmethod
    def predict(self, X): ...

    def __repr__(self):
        return f"{self.__class__.__name__}(lr={self.config.lr})"

# Concrete model
class LinearRegression(BaseModel):
    @log_time
    def fit(self, X, y):
        X_b = np.c_[np.ones(X.shape[0]), X]
        self.weights = np.zeros(X_b.shape[1])

        for _ in range(self.config.epochs):
            preds = X_b @ self.weights
            errors = preds - y
            grad = (2 / len(y)) * X_b.T @ errors
            self.weights -= self.config.lr * grad
        return self

    def predict(self, X):
        X_b = np.c_[np.ones(X.shape[0]), X]
        return X_b @ self.weights

# Generator for mini-batches
def batch_gen(X, y, size=32):
    idx = np.random.permutation(len(X))
    for i in range(0, len(X), size):
        yield X[idx[i:i+size]], y[idx[i:i+size]]

# Run it
if __name__ == "__main__":
    np.random.seed(42)
    X = np.random.randn(100, 3)
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + np.random.randn(100) * 0.1

    model = LinearRegression(Config(lr=0.01, epochs=1000))
    model.fit(X, y)

    print(f"Learned: {model.weights[1:]}")
    print(f"True:    {true_w}")
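
The batch_gen generator above is defined but not used by fit, which runs full-batch gradient descent. A minimal sketch of how it could drive mini-batch SGD follows; fit_sgd, its default hyperparameters, and the extra rng parameter are assumptions added for reproducibility, not part of the original example.

```python
import numpy as np

def batch_gen(X, y, size=32, rng=None):
    # Same idea as batch_gen above, with an optional Generator for reproducibility
    if rng is None:
        rng = np.random.default_rng()
    idx = rng.permutation(len(X))
    for i in range(0, len(X), size):
        batch = idx[i:i + size]
        yield X[batch], y[batch]

def fit_sgd(X, y, lr=0.05, epochs=200, size=32, seed=0):
    rng = np.random.default_rng(seed)
    X_b = np.c_[np.ones(X.shape[0]), X]            # prepend the bias column
    weights = np.zeros(X_b.shape[1])
    for _ in range(epochs):
        for xb, yb in batch_gen(X_b, y, size, rng):
            errors = xb @ weights - yb
            grad = (2 / len(yb)) * xb.T @ errors   # MSE gradient on this batch only
            weights -= lr * grad
    return weights

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(size=100) * 0.1

weights = fit_sgd(X, y)
print("Learned:", np.round(weights[1:], 2))
print("True:   ", true_w)
```

Each epoch reshuffles the rows, so the model sees the batches in a different order every pass.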

Concept → ML Usage Map

How every Python concept you learned maps directly to real machine learning workflows.

Concept | Used in ML for
__init__ | Model hyperparameters, weights
Inheritance / ABCs | Base model interfaces
Decorators | Timing, caching, validation
Generators | Data loading, mini-batches
Dataclasses | Config objects, experiment tracking
NumPy | ALL numerical computation
Pandas | Data loading, cleaning, features
Context managers | GPU memory, file handling
Closures | Loss functions, LR schedules
Comprehensions | Feature extraction, transforms

31 Real-World Project Structure

What is it
Python has no single mandated project layout. The community has converged on a handful of conventions based on project type: single-script (one .py file), flat layout (modules at project root), src layout (code under src/, now recommended), and framework-specific layouts (Django's manage.py + apps, Flask blueprints, FastAPI routers). Modern Python uses pyproject.toml for metadata, build config, and tooling settings, replacing the older setup.py, setup.cfg, requirements.txt sprawl.
Typical layout (src style)
  • pyproject.toml — project metadata, deps, tool configs (ruff, black, mypy, pytest).
  • README.md, LICENSE, .gitignore.
  • src/myapp/ — the installable package (modules, subpackages).
  • src/myapp/__init__.py — package marker and exports.
  • tests/ — pytest tests mirroring package structure.
  • docs/ — Sphinx or MkDocs documentation.
  • .github/workflows/ — CI pipelines.
  • .venv/ — virtual environment (gitignored).
Modern tooling
  • uv / poetry / hatch / pdm: Modern dependency/environment managers.
  • ruff: Ultra-fast linter + formatter (replaces flake8, isort, black for many projects).
  • mypy / pyright: Static type checking.
  • pytest: Test runner.
  • pre-commit: Git hooks to run linters before commit.
  • tox / nox: Test against multiple Python versions.
How it differs
  • vs Node.js: package.json + node_modules is much more standardized than Python historically was. Python is catching up with uv and pyproject.toml.
  • vs Java: Maven/Gradle enforce strict layouts (src/main/java). Python is more flexible but less predictable.
  • vs Go: Go has go.mod and a strict layout — much less to decide.
  • vs Rust: Cargo is the gold standard: unified tooling, predictable layouts. Python's uv is written in Rust and takes inspiration from Cargo.
Why it matters
A clean layout makes onboarding new contributors, running tests in CI, building and publishing to PyPI, and managing dependencies straightforward. The src/ layout in particular prevents "it works on my machine" bugs where tests import the local directory instead of the installed package.
Common gotchas
  • Import errors in tests/: With a src layout, tests can't import the package until it is installed, typically in editable mode (pip install -e .).
  • Not pinning versions: requirements.txt without pinned versions (ideally with hashes, or a lock file) leads to irreproducible builds.
  • Committing .venv: Large, platform-specific, pointless.
  • Overusing __init__.py: Side effects in init files create hidden imports.
  • Mixing flat and src layouts: Confuses tooling.
Real-world examples
Projects such as FastAPI, Pydantic, httpx, pandas, and Poetry all configure packaging through pyproject.toml. Some (Poetry, for example) use a src/ layout, while others (FastAPI, pandas) keep the package at the repository root. Django projects follow their own layout (manage.py, apps). Data science projects often use the Cookiecutter Data Science template with data/, notebooks/, models/ dirs.

Basic Script Project

The simplest layout — just a script, dependencies, and a gitignore.

my-script/
├── main.py              # entry point
├── requirements.txt     # pip dependencies
└── .gitignore

Standard Package

The proper layout for reusable Python packages with src directory, tests, and config files.

my-project/
├── src/
│   └── mypackage/
│       ├── __init__.py      # package init, public API
│       ├── core.py          # main logic
│       ├── models.py        # data models / classes
│       ├── utils.py         # helper functions
│       └── config.py        # configuration
├── tests/
│   ├── __init__.py
│   ├── test_core.py
│   └── test_models.py
├── pyproject.toml           # modern project config (replaces setup.py)
├── requirements.txt
├── .gitignore
└── README.md

ML / Data Science Project

Organized layout for ML work — separating data, notebooks, source code, and trained models.

ml-project/
├── data/
│   ├── raw/                     # original, immutable data
│   ├── processed/               # cleaned data
│   └── external/                # third-party data
├── notebooks/
│   ├── 01_exploration.ipynb
│   └── 02_modeling.ipynb
├── src/
│   └── ml_project/
│       ├── __init__.py
│       ├── data/
│       │   ├── __init__.py
│       │   ├── loader.py        # data loading
│       │   └── preprocess.py    # cleaning, feature engineering
│       ├── models/
│       │   ├── __init__.py
│       │   ├── base.py          # ABC for all models
│       │   ├── linear.py
│       │   └── tree.py
│       ├── training/
│       │   ├── __init__.py
│       │   ├── trainer.py       # training loop
│       │   └── evaluate.py      # metrics
│       └── config.py            # dataclass configs
├── tests/
├── models/                      # saved model weights (.pkl, .pt)
├── pyproject.toml
├── requirements.txt
└── .gitignore

Web API Project (FastAPI)

Production-ready API structure with routes, models, services, and database layers separated cleanly.

api-project/
├── app/
│   ├── __init__.py
│   ├── main.py                  # FastAPI app entry
│   ├── api/
│   │   ├── __init__.py
│   │   ├── routes.py            # endpoint definitions
│   │   └── dependencies.py      # shared deps (auth, db)
│   ├── models/
│   │   ├── __init__.py
│   │   ├── schemas.py           # Pydantic models (request/response)
│   │   └── database.py          # ORM models
│   ├── services/
│   │   ├── __init__.py
│   │   └── user_service.py      # business logic
│   └── config.py
├── tests/
├── Dockerfile
├── pyproject.toml
└── .env                         # environment variables (NOT in git!)

Virtual Environments

Isolated Python environments per project — keeps dependencies separate so projects don't conflict.

# Create a virtual environment
python -m venv .venv

# Activate it
source .venv/bin/activate      # macOS/Linux
.venv\Scripts\activate         # Windows

# Install packages
pip install numpy pandas scikit-learn
pip freeze > requirements.txt  # save dependencies

# Recreate environment elsewhere
pip install -r requirements.txt

# Deactivate
deactivate

32 VS Code Setup

What is it
VS Code (with Microsoft's official Python extension and Pylance language server) is one of the most popular Python editors, rivaling PyCharm in developer surveys. It's free, built on an open-source core, lightweight, and its extension ecosystem covers everything: IntelliSense, type checking (Pylance uses Pyright internally), debugging, Jupyter notebooks, refactoring, Git, remote development, and Docker. Proper setup includes selecting the right interpreter, enabling format-on-save, integrating a linter, and configuring launch.json for debugging.
Essential extensions
  • Python (Microsoft): Core interpreter management, debugging, test runner.
  • Pylance: Fast type checking and IntelliSense (uses Pyright).
  • Ruff: Linting and formatting with the blazing-fast Rust-based linter.
  • Jupyter: Native notebook support inside VS Code.
  • Python Debugger: Enhanced debugging via debugpy.
  • Python Test Explorer: GUI for pytest/unittest.
  • autoDocstring: Auto-generates docstring templates.
  • Error Lens: Shows errors inline.
  • GitLens: Enhanced git integration.
  • Dev Containers / Remote SSH: Develop inside Docker or on remote machines.
Key settings
  • Interpreter: Cmd/Ctrl+Shift+P → Python: Select Interpreter — pick your venv.
  • Format on save: "editor.formatOnSave": true with Ruff or Black.
  • Type-check mode: "python.analysis.typeCheckingMode": "strict".
  • launch.json: Debug configurations for scripts, Django, FastAPI, pytest.
  • tasks.json: Run linters, tests, build commands with one keystroke.
  • Notebook support: Directly run .ipynb cells without opening Jupyter.
How it differs
  • vs PyCharm: PyCharm is a heavier, paid (Pro) IDE with deeper refactoring, built-in database tools, and scientific mode. VS Code is free, lighter, and better for polyglot projects.
  • vs Vim/Neovim: Much steeper learning curve but extreme customizability. Tools like jedi-vim, coc.nvim, and Neovim's built-in LSP client give similar features.
  • vs Jupyter Lab: Jupyter is notebook-native; VS Code handles both scripts and notebooks in one app.
  • vs Spyder: Spyder is science-focused (like MATLAB). VS Code is more general-purpose.
  • vs Cursor / Windsurf: AI-first forks of VS Code optimized for Copilot-style coding.
Why use it
VS Code gives you almost PyCharm-level Python power for free, with a much smaller memory footprint and faster startup. The remote development features (SSH, containers, WSL, Codespaces) make it the best tool for ML/data work where you need to edit locally but run on a GPU box.
Common gotchas
  • Wrong interpreter: Linting errors because VS Code is using system Python, not your venv.
  • Pylance false positives: Dynamic code confuses the type checker. Use # type: ignore sparingly.
  • Slow on huge monorepos: Disable unnecessary extensions per-workspace.
  • Conflicting formatters: Only enable one (Ruff or Black, not both).
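A quick way to confirm the "wrong interpreter" gotcha is to ask Python itself which executable is running. The sketch below is illustrative only: the helper name interpreter_report is hypothetical, and the prefix comparison detects venv/virtualenv-style environments (other environment managers may not show up this way).

```python
import sys


def interpreter_report() -> dict:
    """Summarize which Python is executing this script."""
    return {
        "executable": sys.executable,        # path of the running interpreter
        "version": sys.version.split()[0],   # e.g. "3.12.1"
        # Inside a venv, sys.prefix points at the venv while sys.base_prefix
        # points at the base installation, so the two differ.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```

Run it from the VS Code integrated terminal and from the Run button; if the two executables differ, the selected interpreter and the terminal environment are out of sync.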
Real-world examples
VS Code is widely used across the Python community, from engineers at large tech companies to ML researchers. It's the default editor for GitHub Codespaces and Gitpod and integrates with Microsoft's Azure ML. The Dev Containers extension (formerly named Remote - Containers) is the basis of the "dev containers" workflow used in production projects.

Must-Have Extensions

The core extensions every Python developer needs — install these first.

Python Must
ms-python.python
The core extension. IntelliSense, linting, debugging, Jupyter support, virtual env detection — everything starts here.
Pylance Must
ms-python.vscode-pylance
Fast, feature-rich language server. Type checking, auto-imports, semantic highlighting, go-to-definition. Way faster than the old Jedi.
Python Debugger Must
ms-python.debugpy
Step-through debugging, breakpoints, watch variables, call stack inspection. Installed automatically with Python extension.
Ruff Must
charliermarsh.ruff
Ultra-fast Python linter + formatter (replaces Flake8, isort, Black). Written in Rust and roughly 10-100x faster than the tools it replaces. The modern standard.

Highly Recommended

Extensions that significantly boost productivity — you'll want these soon after starting.

Jupyter Rec
ms-toolsai.jupyter
Run .ipynb notebooks directly in VS Code. Interactive cells, variable explorer, plots inline. Essential for data science.
autoDocstring Rec
njpwerner.autodocstring
Type """ and hit Enter — auto-generates docstrings with params, return types, and descriptions from your function signature.
Python Indent Rec
KevinRose.vsc-python-indent
Fixes VS Code's auto-indent for Python. Correctly handles multi-line statements, brackets, and if/else blocks.
GitLens Rec
eamodio.gitlens
Inline git blame, file history, diff views. See who wrote each line and when. Invaluable in any team project.
Error Lens Rec
usernamehw.errorlens
Shows errors and warnings inline right next to the code. No more hovering or squinting at squiggly underlines.
indent-rainbow Rec
oderwat.indent-rainbow
Color-codes indentation levels. In Python where indentation IS syntax, this prevents so many bugs.

Nice to Have

Optional extras that add polish — install these when you want to fine-tune your workflow.

Python Type Stubs Opt
ms-python.python-type-stubs
Better type information for third-party libraries. Improves Pylance auto-complete for numpy, pandas, etc.
Python Test Explorer Opt
LittleFoxTeam.vscode-python-test-adapter
Visual test runner with tree view. Run individual tests with click, see pass/fail status at a glance.
Path Intellisense Opt
christian-kohler.path-intellisense
Autocompletes file paths in your code. No more typos in open("data/somefil.csv").
Better Comments Opt
aaron-bond.better-comments
Color-codes comments by type: ! alert, ? question, TODO, * highlight. Makes comments scannable.

Essential settings.json

Copy-paste these settings to get the best Python development experience out of the box.

// Add to your VS Code settings.json (Cmd/Ctrl+Shift+P → "Preferences: Open User Settings (JSON)")
{
    // ── Python core ──
    "python.analysis.typeCheckingMode": "basic",      // catch type errors
    "python.analysis.autoImportCompletions": true,   // auto-import on autocomplete
    "python.analysis.inlayHints.functionReturnTypes": true,

    // ── Formatting (Ruff) ──
    "[python]": {
        "editor.defaultFormatter": "charliermarsh.ruff",
        "editor.formatOnSave": true,
        "editor.codeActionsOnSave": {
            "source.fixAll": "explicit",
            "source.organizeImports": "explicit"
        }
    },

    // ── Editor comfort ──
    "editor.rulers": [88],                         // line length guide
    "editor.bracketPairColorization.enabled": true,
    "editor.guides.bracketPairs": "active",
    "editor.stickyScroll.enabled": true,            // sticky class/function headers
    "files.autoSave": "afterDelay",

    // ── Testing ──
    "python.testing.pytestEnabled": true,
    "python.testing.pytestArgs": ["tests"]
}

Keyboard Shortcuts You Need

The shortcuts that will save you hours — learn these and you'll fly through code.

Shortcut             Action
F5                   Start debugging
F9                   Toggle breakpoint
F12                  Go to definition
Shift+F12            Find all references
Cmd+Shift+P          Command palette
Cmd+P                Quick open file
Cmd+D                Select next occurrence
Cmd+Shift+L          Select all occurrences
Alt+Up/Down          Move line up/down
Shift+Alt+Up/Down    Duplicate line
Cmd+/                Toggle comment
Cmd+Shift+K          Delete line
Ctrl+`               Toggle terminal
Cmd+B                Toggle sidebar
Cmd+Shift+E          Explorer panel
Cmd+Shift+F          Search across files

33 Common Methods Cheatsheet

What is it
A consolidated reference of the most-used built-in methods, functions, and stdlib helpers you'll reach for daily, grouped by type (strings, lists, dicts, sets, tuples, numbers, files, itertools, functools). Python's standard library is so vast that even experienced devs don't remember every method; this is your quick-lookup for "how do I reverse a list" or "how do I sort a dict by value". Keeping a mental index of these saves countless Google trips.
Why keep a cheatsheet
  • Muscle memory: The top 20% of methods cover 80% of real work — mastering them is a force multiplier.
  • Pythonic idioms: Knowing any(), all(), zip(), enumerate(), sorted(key=...) often makes code several times shorter.
  • Interview prep: Whiteboard questions expect you to know these cold.
  • Code review: Recognizing that a 5-line loop can be replaced with sum(x for x in nums if x > 0).
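To make the points above concrete, here is a small sketch that rewrites a loop with sum() and uses any(), all(), enumerate(), zip(), and sorted(key=...). The sample data (nums, names, scores) is made up for illustration.

```python
nums = [4, -2, 7, 0, -5]
names = ["ada", "grace", "linus"]
scores = [95, 88, 72]

# Loop version: sum of the positive numbers
total = 0
for x in nums:
    if x > 0:
        total += x

# Idiomatic version: generator expression inside sum()
total_idiomatic = sum(x for x in nums if x > 0)

has_negative = any(x < 0 for x in nums)      # at least one negative value?
all_passed = all(s >= 70 for s in scores)    # every score at or above 70?

# enumerate() yields (index, value); zip() pairs items from parallel lists
ranking = [
    f"{i}. {name}: {score}"
    for i, (name, score) in enumerate(zip(names, scores), start=1)
]

# sorted(key=...) orders by a derived value without mutating the input
by_length = sorted(names, key=len)
```

Both totals come out the same; the second form simply states the intent in one line.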
Common must-know methods
  • String: .strip(), .split(), .join(), .replace(), .startswith(), .format(), f-strings.
  • List: .append(), .extend(), .pop(), .sort(), sorted(), slicing, comprehensions.
  • Dict: .get(), .items(), .keys(), .values(), .update(), .setdefault().
  • Set: .add(), .remove(), |, &, -, ^.
  • Built-ins: len(), range(), enumerate(), zip(), map(), filter(), sorted(), reversed(), sum(), min(), max(), any(), all().
  • functools: reduce, partial, lru_cache, wraps.
  • itertools: chain, zip_longest, groupby, product, combinations, permutations.
How it differs
  • vs JavaScript: JS has similar methods on Array/Object but they're methods, not free functions. Python mixes both styles.
  • vs Ruby: Ruby everything-is-a-method; Python prefers free functions for some (e.g., len(obj) not obj.length).
  • vs Java: Java's Collections/Stream APIs are vastly more verbose.
Real-world examples
Every Python script, notebook, and production app uses this core set of methods constantly. Mastering them is the difference between a 10-line script and a 40-line one.

Every method you'll reach for daily. Grouped by type so you can scan fast.

String Methods

s = "  Hello, World!  "

# ── Cleaning ──
s.strip()              # "Hello, World!"       remove whitespace both ends
s.lstrip()             # "Hello, World!  "     left only
s.rstrip()             # "  Hello, World!"     right only
s.strip("! ")          # "Hello, World"        strip specific chars

# ── Case ──
"hello".upper()        # "HELLO"
"HELLO".lower()        # "hello"
"hello world".title()  # "Hello World"
"hello world".capitalize()  # "Hello world"
"Hello".swapcase()     # "hELLO"

# ── Search ──
"hello".find("ll")      # 2         index of first match, -1 if not found
"hello".index("ll")     # 2         same but raises ValueError if not found
"hello".rfind("l")      # 3         search from the right
"hello".count("l")      # 2         count occurrences
"hello".startswith("he")  # True
"hello".endswith("lo")    # True

# ── Replace & Transform ──
"hello".replace("l", "L")       # "heLLo"    replace ALL occurrences
"hello".replace("l", "L", 1)    # "heLlo"    replace first N only
"a-b-c".split("-")              # ['a','b','c']
"a b  c".split()                # ['a','b','c']  split on any whitespace
"line1\nline2".splitlines()     # ['line1', 'line2']
"-".join(["a", "b", "c"])       # "a-b-c"
"hello".center(20, "*")        # "*******hello********"
"42".zfill(5)                   # "00042"    pad with zeros

# ── Checks ──
"42".isdigit()        # True      all digits?
"abc".isalpha()       # True      all letters?
"abc123".isalnum()    # True      letters or digits?
"   ".isspace()       # True      all whitespace?
"Hello".isupper()     # False
"hello".islower()     # True

# ── Encoding ──
"hello".encode("utf-8")   # b'hello'    str → bytes
b"hello".decode("utf-8")  # "hello"     bytes → str

List Methods

lst = [3, 1, 4, 1, 5, 9]

# ── Add ──
lst.append(2)            # [3,1,4,1,5,9,2]       add to end
lst.insert(0, 99)        # [99,3,1,4,1,5,9,2]    add at index
lst.extend([6, 7])       # [..., 6, 7]           add multiple

# ── Remove ──
lst.remove(1)            # remove FIRST occurrence of value 1
lst.pop()                # remove & return LAST element
lst.pop(0)               # remove & return at index 0
lst.clear()              # remove everything

# ── Sort & Order ──
lst.sort()               # in-place, ascending (returns None!)
lst.sort(reverse=True)  # in-place, descending
lst.sort(key=abs)       # sort by custom key (e.g. key=len for a list of strings)
lst.reverse()            # reverse in-place

# ── Search ──
lst.index(4)             # index of first occurrence (ValueError if missing)
lst.count(1)             # count occurrences

# ── Copy ──
lst.copy()               # shallow copy (same as lst[:])

# ── Built-in functions that work with lists ──
sorted(lst)              # NEW sorted list (original unchanged)
reversed(lst)            # iterator in reverse
len(lst)                 # length
sum(lst)                 # sum all elements
min(lst), max(lst)       # min and max
any([False,True])       # True   — any element truthy?
all([True,True])        # True   — all elements truthy?
enumerate(lst)           # pairs of (index, value)
zip(lst, other)          # pairs from two lists
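The "returns None" note on sort() is a frequent source of bugs, so a short sketch may help. The list words is illustrative sample data.

```python
words = ["pear", "fig", "banana"]

result = words.sort()        # sorts in place; the return value is None
in_place = list(words)       # snapshot of the now-sorted list

# sorted() builds a new list and leaves its input untouched
by_len_desc = sorted(words, key=len, reverse=True)
```

Writing words = words.sort() therefore replaces the list with None; use sorted() when you need the result as a value.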

Dictionary Methods

d = {"name": "Yatin", "age": 25, "role": "dev"}

# ── Access ──
d["name"]                      # "Yatin"    KeyError if missing
d.get("name")                   # "Yatin"    None if missing
d.get("email", "N/A")          # "N/A"      custom default

# ── Add / Update ──
d["email"] = "y@dev.com"       # add or overwrite
d.update({"age": 26, "city": "NYC"})  # merge in another dict
d.setdefault("lang", "Python")  # set only if key doesn't exist, return value

# ── Remove ──
d.pop("age")                    # remove & return value (KeyError if missing)
d.pop("age", None)              # safe pop — returns None if missing
d.popitem()                     # remove & return last (key, value) pair
del d["role"]                   # delete key
d.clear()                       # remove everything

# ── Views ──
d.keys()                        # dict_keys(['name', 'age', 'role'])
d.values()                      # dict_values(['Yatin', 25, 'dev'])
d.items()                       # dict_items([('name','Yatin'), ...])

# ── Copy & Merge ──
d.copy()                        # shallow copy
{**d, "new": True}             # merge via unpacking
d | {"new": True}              # merge operator (3.9+)

# ── Useful patterns ──
if "name" in d:                 # check key exists (O(1))
    print(d["name"])

# build a dict from two lists with zip()
keys = ["a", "b", "c"]
vals = [1, 2, 3]
dict(zip(keys, vals))            # {'a': 1, 'b': 2, 'c': 3}
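One common lookup, sorting a dict by value, combines items(), sorted(key=...), and dict(). A minimal sketch with made-up prices:

```python
prices = {"apple": 1.20, "kiwi": 0.50, "mango": 2.75}

# items() yields (key, value) pairs; sort them by the value at index 1.
by_value = dict(sorted(prices.items(), key=lambda kv: kv[1]))
# Dicts preserve insertion order (3.7+), so by_value iterates cheapest first.

most_expensive = max(prices, key=prices.get)     # key with the largest value
descending = sorted(prices, key=prices.get, reverse=True)
```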

Set Methods

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

# ── Add / Remove ──
a.add(5)                   # add one element
a.update([6, 7])           # add multiple
a.remove(5)                # remove (KeyError if missing)
a.discard(99)              # remove (no error if missing)
a.pop()                    # remove & return arbitrary element

# ── Set Math ──
a | b                      # union         {1,2,3,4,5,6}
a.union(b)                 # same thing
a & b                      # intersection  {3, 4}
a.intersection(b)          # same thing
a - b                      # difference    {1, 2}
a.difference(b)            # same thing
a ^ b                      # symmetric diff {1,2,5,6}
a.symmetric_difference(b)  # same thing

# ── Checks ──
a.issubset(b)              # is a ⊆ b?
a.issuperset(b)            # is a ⊇ b?
a.isdisjoint(b)            # no common elements?
3 in a                     # membership test — O(1)!

Built-in Functions (Most Used)

# ── Type & Conversion ──
int("42")           # 42
float("3.14")       # 3.14
str(42)             # "42"
bool(0)             # False
list("abc")         # ['a', 'b', 'c']
tuple([1,2])        # (1, 2)
set([1,1,2])        # {1, 2}
dict(a=1, b=2)      # {'a': 1, 'b': 2}

# ── Math ──
abs(-5)             # 5
round(3.14159, 2)   # 3.14
pow(2, 10)           # 1024
divmod(17, 5)        # (3, 2)     quotient and remainder
min(3, 1, 4)         # 1
max(3, 1, 4)         # 4
sum([1, 2, 3])       # 6

# ── Iteration ──
range(5)             # 0,1,2,3,4
range(2, 10, 3)      # 2,5,8
enumerate(lst)       # (0,a), (1,b), ...
zip(a, b)            # (a1,b1), (a2,b2), ...
map(func, lst)       # apply func to each element
filter(func, lst)    # keep elements where func returns True
sorted(lst)          # new sorted list
reversed(lst)        # reverse iterator
next(iterator)       # get next value from iterator
iter(lst)            # get iterator from iterable

# ── Introspection ──
type(obj)            # what type is it?
isinstance(obj, int) # is it an int (or subclass)?
id(obj)              # unique identity (the memory address in CPython)
dir(obj)             # list all attributes/methods
help(obj)            # interactive help
hasattr(obj, "x")   # does obj.x exist?
getattr(obj, "x")   # get obj.x (can set default)
setattr(obj, "x", 5) # set obj.x = 5
callable(obj)        # can you call obj()?
vars(obj)            # obj.__dict__

# ── I/O ──
print("hello", end="")   # no newline at end
print("a", "b", sep="-") # custom separator → "a-b"
input("Enter: ")         # read user input as string
open("f.txt")            # open file

# ── Logic ──
any([False,False,True])  # True   at least one truthy?
all([True,True,False])   # False  all truthy?

os & pathlib — File System

from pathlib import Path        # modern way (preferred)
import os

# ── pathlib (use this!) ──
p = Path("data/raw/file.csv")
p.exists()               # True/False
p.is_file()              # is it a file?
p.is_dir()               # is it a directory?
p.name                   # "file.csv"
p.stem                   # "file"
p.suffix                 # ".csv"
p.parent                 # Path("data/raw")
p.resolve()              # absolute path
p.read_text()            # read entire file as string
p.write_text("data")     # write string to file

# Directory operations
Path("output").mkdir(parents=True, exist_ok=True)
list(Path(".").glob("*.py"))         # all .py files
list(Path(".").rglob("*.py"))        # recursive

# Path joining (/ operator!)
config_path = Path("project") / "config" / "settings.json"

# ── os module (older but still useful) ──
os.getcwd()              # current working directory
os.listdir(".")          # list directory contents
os.path.join("a", "b")   # "a/b" on POSIX, "a\\b" on Windows  (use Path / instead)
os.path.exists("f.txt")  # True/False
os.environ["HOME"]       # environment variable
os.environ.get("API_KEY", "default")  # safe access

itertools — Iteration Power Tools

from itertools import (
    chain, islice, cycle, repeat, count,
    product, permutations, combinations,
    groupby, accumulate, starmap, zip_longest
)

# ── Combining ──
list(chain([1,2], [3,4]))            # [1,2,3,4]  flatten iterables
list(chain.from_iterable([[1,2],[3]]))  # [1,2,3]  flatten nested
list(zip_longest([1,2], ["a"], fillvalue="-"))
# [(1,'a'), (2,'-')]  — zip but pads shorter

# ── Slicing ──
list(islice(count(), 5))              # [0,1,2,3,4]  take first 5 from infinite
list(islice(count(), 2, 6))           # [2,3,4,5]    skip 2, take until 6

# ── Infinite ──
count(10)                             # 10, 11, 12, ... forever
cycle(["a", "b"])                    # a, b, a, b, ... forever
repeat("hi", 3)                      # "hi", "hi", "hi"

# ── Combinatorics ──
list(product([1,2], ["a","b"]))      # [(1,'a'),(1,'b'),(2,'a'),(2,'b')]
list(permutations([1,2,3], 2))       # [(1,2),(1,3),(2,1),(2,3),(3,1),(3,2)]
list(combinations([1,2,3], 2))       # [(1,2),(1,3),(2,3)]

# ── Grouping ──
data = [("a",1), ("a",2), ("b",3)]
for key, group in groupby(data, key=lambda x: x[0]):
    print(key, list(group))
# a [('a',1), ('a',2)]
# b [('b',3)]
# NOTE: data must be sorted by grouping key first!

# ── Accumulate ──
list(accumulate([1,2,3,4]))           # [1,3,6,10]  running sum
list(accumulate([1,2,3], lambda a,b: a*b))  # [1,2,6]  running product
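The groupby note about sorting first is easy to miss, so this sketch shows what happens with and without it. The records list is illustrative data.

```python
from itertools import groupby
from operator import itemgetter

records = [("b", 3), ("a", 1), ("b", 4), ("a", 2)]
first = itemgetter(0)   # key function: the first element of each pair

# Unsorted input: groupby only merges adjacent items, so keys repeat.
unsorted_keys = [key for key, _ in groupby(records, key=first)]

# Sorting by the same key first yields exactly one group per key.
grouped = {
    key: [value for _, value in group]
    for key, group in groupby(sorted(records, key=first), key=first)
}
```

If sorting is undesirable, a defaultdict(list) loop gives the same grouping without any ordering requirement.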

collections — Specialized Containers

from collections import (
    Counter, defaultdict, OrderedDict,
    deque, namedtuple, ChainMap
)

# ── Counter ──
c = Counter("abracadabra")
c.most_common(3)            # [('a',5), ('b',2), ('r',2)]
c["a"]                       # 5
c["z"]                       # 0  (never KeyError!)
c.update("aaa")              # add more counts
Counter("aab") + Counter("bcc")  # Counter({'a': 2, 'b': 2, 'c': 2})

# ── defaultdict ──
dd = defaultdict(list)      # missing keys auto-create empty list
dd["fruits"].append("apple")  # no KeyError!

dd = defaultdict(int)       # missing keys = 0
dd["count"] += 1

dd = defaultdict(set)       # missing keys = empty set
dd["tags"].add("python")

# ── deque (double-ended queue) ──
dq = deque([1, 2, 3])
dq.appendleft(0)            # deque([0,1,2,3])  O(1) left append!
dq.popleft()                # 0                 O(1) left pop!
dq.rotate(1)                # deque([3,1,2])    rotate right
dq.rotate(-1)               # deque([1,2,3])    rotate left
# deque is O(1) for both ends, list is O(n) for left operations

# ── namedtuple ──
Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)
p.x, p.y                    # 3, 4  (access by name)
p[0], p[1]                  # 3, 4  (still works by index)

# ── ChainMap — search multiple dicts ──
defaults = {"color": "blue", "size": 10}
user_prefs = {"color": "red"}
config = ChainMap(user_prefs, defaults)
config["color"]              # "red"   (user overrides default)
config["size"]               # 10      (falls through to default)
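As a small combined example, Counter and defaultdict together cover most "count it" and "group it" tasks. The sentence below is arbitrary sample text.

```python
from collections import Counter, defaultdict

text = "the quick brown fox jumps over the lazy dog the end"
words = text.split()

# Count occurrences of each word
word_counts = Counter(words)
top_word, top_count = word_counts.most_common(1)[0]

# Group words by their first letter; missing keys start as empty lists
by_first_letter = defaultdict(list)
for word in words:
    by_first_letter[word[0]].append(word)
```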

functools — Function Tools

from functools import (
    lru_cache, cache, partial, reduce, wraps, total_ordering
)

# ── lru_cache — memoization ──
@lru_cache(maxsize=128)
def fib(n):
    if n < 2: return n
    return fib(n-1) + fib(n-2)
fib(100)                     # instant! cached results
fib.cache_info()             # hits, misses, size
fib.cache_clear()            # reset cache

# @cache — same but unlimited (3.9+)

# ── partial — freeze some arguments ──
def power(base, exp): return base ** exp
square = partial(power, exp=2)
square(5)                    # 25

# ── reduce — accumulate ──
reduce(lambda a, b: a + b, [1,2,3,4])  # 10
reduce(lambda a, b: a * b, [1,2,3,4])  # 24

# ── wraps — preserve function metadata in decorators ──
def my_decorator(func):
    @wraps(func)        # without this, wrapper.__name__ = "wrapper"
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

# ── total_ordering — auto-generate comparison methods ──
@total_ordering
class Score:
    def __init__(self, val): self.val = val
    def __eq__(self, other): return self.val == other.val
    def __lt__(self, other): return self.val < other.val
# Now <=, >, >= all work automatically!
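To see total_ordering in use, the sketch below defines only __eq__ and __lt__ and then relies on the generated comparisons. The Version class is a hypothetical example, not part of the cheatsheet above.

```python
from functools import total_ordering


@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

    def __repr__(self):
        return f"Version({self.major}, {self.minor})"


versions = [Version(3, 12), Version(3, 9), Version(3, 11)]
newest = max(versions)       # uses the generated __gt__
ordered = sorted(versions)   # uses __lt__
```

Returning NotImplemented for foreign types lets Python fall back to its default handling instead of raising an AttributeError inside the comparison.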

datetime — Date & Time

from datetime import datetime, date, timedelta

# ── Now ──
now = datetime.now()             # 2026-03-18 14:30:45.123456
today = date.today()             # 2026-03-18

# ── Create ──
dt = datetime(2026, 3, 18, 14, 30)
d = date(2026, 3, 18)

# ── Format & Parse ──
now.strftime("%Y-%m-%d %H:%M")   # "2026-03-18 14:30"
now.strftime("%B %d, %Y")        # "March 18, 2026"
datetime.strptime("2026-03-18", "%Y-%m-%d")  # parse string → datetime

# ── Arithmetic ──
tomorrow = today + timedelta(days=1)
last_week = today - timedelta(weeks=1)
diff = datetime(2026,12,31) - datetime.now()
print(diff.days)                 # whole days until Dec 31, 2026 (negative after that date)

# ── Attributes ──
now.year, now.month, now.day
now.hour, now.minute, now.second
now.weekday()                    # 0=Monday, 6=Sunday
now.isoformat()                  # "2026-03-18T14:30:45.123456"

re — Regular Expressions

import re

text = "Call me at 123-456-7890 or 987-654-3210"

# ── Search (first match) ──
m = re.search(r"\d{3}-\d{3}-\d{4}", text)
if m:
    print(m.group())         # "123-456-7890"
    print(m.start(), m.end()) # 11 23

# ── Find all ──
phones = re.findall(r"\d{3}-\d{3}-\d{4}", text)
# ['123-456-7890', '987-654-3210']

# ── Replace ──
cleaned = re.sub(r"\d", "X", text)
# "Call me at XXX-XXX-XXXX or XXX-XXX-XXXX"

# ── Split ──
re.split(r"[,;\s]+", "a, b; c   d")
# ['a', 'b', 'c', 'd']

# ── Groups ──
m = re.search(r"(\d{3})-(\d{3})-(\d{4})", text)
m.group(1)                   # "123"  (area code)
m.groups()                    # ('123', '456', '7890')

# ── Named groups ──
m = re.search(r"(?P<area>\d{3})-(?P<rest>\d{3}-\d{4})", text)
m.group("area")              # "123"

# ── Compile for reuse ──
phone_pattern = re.compile(r"\d{3}-\d{3}-\d{4}")
phone_pattern.findall(text)  # faster when used multiple times

# ── Common patterns ──
# r"\d+"           digits
# r"\w+"           word chars (letters, digits, _)
# r"\s+"           whitespace
# r"[a-zA-Z]+"     letters only
# r"^...$"         start and end of string
# r"\.py$"         ends with .py
# r"(?i)hello"     case insensitive
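The pieces above (compile, named groups, sub) combine naturally. This sketch reuses the sample text from this section; the group names exchange and line are illustrative additions.

```python
import re

PHONE = re.compile(r"(?P<area>\d{3})-(?P<exchange>\d{3})-(?P<line>\d{4})")
text = "Call me at 123-456-7890 or 987-654-3210"

# finditer() yields match objects, so named groups stay available per match
contacts = [m.groupdict() for m in PHONE.finditer(text)]

# sub() accepts a function; here it keeps the area code and masks the rest
masked = PHONE.sub(lambda m: f"{m['area']}-***-****", text)
```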

json — JSON Encoding/Decoding

import json

# ── Python → JSON string ──
data = {"name": "Yatin", "scores": [90, 85], "active": True}
json_str = json.dumps(data)               # compact string
json_str = json.dumps(data, indent=2)     # pretty print
json_str = json.dumps(data, sort_keys=True) # sorted keys

# ── JSON string → Python ──
parsed = json.loads(json_str)              # dict

# ── File I/O ──
with open("data.json", "w") as f:
    json.dump(data, f, indent=2)          # write to file

with open("data.json") as f:
    loaded = json.load(f)                  # read from file

# dump/load = files,  dumps/loads = strings (s = string)

34 Interview Questions

What is it
A curated set of the most-asked Python interview questions, from FAANG-style technical screens to startup pairing sessions. These questions test both language-level knowledge (GIL, MRO, decorators, generators, memory model) and practical problem-solving (implement a data structure, fix a bug, explain output). Expect questions ranging from "what's the difference between a list and tuple" to "explain how async/await works under the hood".
Topics you'll be asked
  • Language internals: GIL, reference counting, MRO, how the is operator differs from ==, mutable vs immutable.
  • Data structures: When to use dict vs list, how sets work, hashability, deque.
  • OOP: __init__ vs __new__, classmethod vs staticmethod, properties, multiple inheritance.
  • Functions: Mutable default args, closures, decorators, *args/**kwargs.
  • Iteration: Generators vs iterators, yield from, lazy evaluation.
  • Concurrency: GIL, threading vs multiprocessing vs asyncio, when to use each.
  • Error handling: EAFP vs LBYL, exception chaining, custom exceptions.
  • Coding challenges: Reverse a linked list, balance parentheses, LRU cache, two-sum, binary search, tree traversal.
  • Debug output: "What does this code print?" gotcha questions on closures and mutable state.
How it differs
  • vs Java interviews: Java interviews emphasize JVM internals (GC, classloaders) and Spring. Python interviews focus on GIL, metaclasses, and duck typing.
  • vs JavaScript interviews: JS interviews dive into closures, this, event loop. Python overlaps on closures but adds GIL and comprehensions.
  • vs Go interviews: Go interviews emphasize goroutines, channels, and simplicity. Python focuses on flexibility and protocols.
How to prepare
  • Practice LeetCode in Python — focus on arrays, strings, trees, DP, graphs.
  • Build a small project with pytest, typing, and a web framework.
  • Read CPython docs for language internals.
  • Prepare "tell me about a bug you fixed" stories.
  • Know your stdlib: itertools, collections, functools, contextlib are frequent sources of "did you know this exists" questions.
Common gotcha questions
  • Mutable default args: "What does this function print after being called twice?"
  • Late binding in closures: "Why do all these lambdas return the same value?"
  • is vs ==: "Why does a is b return True for small ints but False for large ones?"
  • Dict ordering: "Does Python dict preserve insertion order?"
  • += on mutable in tuple: "t = ([1,2],); t[0] += [3] — does it raise? What's the state of t[0] after?"
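Two of the gotchas above are easiest to understand by running them. The following is a minimal sketch of the late-binding and tuple += questions; variable names are illustrative.

```python
# Late binding: each lambda looks up i when it is called, after the loop ended.
late = [lambda: i for i in range(3)]
late_results = [f() for f in late]

# Fix: bind the current value as a default argument at definition time.
bound = [lambda i=i: i for i in range(3)]
bound_results = [f() for f in bound]

# += on a list inside a tuple: the list is extended in place first,
# then the assignment back into the tuple fails.
t = ([1, 2],)
try:
    t[0] += [3]
    raised = False
except TypeError:
    raised = True
```

So the answer to the last question is "both": a TypeError is raised and the inner list has still been modified.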
Real-world relevance
These questions appear routinely in backend, data engineering, ML engineer, and SRE interviews at companies like Google, Meta, Netflix, Dropbox, Stripe, Airbnb, and countless startups. Mastering them signals both language fluency and production experience.

Fundamentals

What's the difference between a list and a tuple?

List is mutable — you can add, remove, change. Tuple is immutable — once created, can't change.

Tuples can be dict keys and set members as long as all their elements are hashable (lists can't). Tuples are slightly faster and use less memory.

lst = [1, 2, 3]
lst[0] = 99        # OK

tup = (1, 2, 3)
tup[0] = 99        # TypeError!

What is the difference between == and is?

== checks value equality. is checks if they're the same object in memory (same id()).

a = [1, 2]
b = [1, 2]
print(a == b)   # True  — same value
print(a is b)   # False — different objects

c = a
print(a is c)   # True  — same object

# Always use `is` for None:
x = None
if x is None:   # correct (prefer this over x == None)
    print("x is unset")

What are *args and **kwargs?

*args collects extra positional arguments into a tuple. **kwargs collects extra keyword arguments into a dict.

def f(*args, **kwargs):
    print(args)     # (1, 2, 3)
    print(kwargs)   # {'x': 10}

f(1, 2, 3, x=10)

Common use: wrapper functions that forward all arguments to another function.
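A forwarding wrapper of the kind just mentioned might look like the sketch below. The decorator name log_calls and the calls attribute are hypothetical choices for illustration.

```python
import functools


def log_calls(func):
    """Record each call's arguments, then forward them unchanged."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))   # remember what was passed
        return func(*args, **kwargs)           # forward everything as-is
    wrapper.calls = []
    return wrapper


@log_calls
def area(width, height=1, *, scale=1):
    return width * height * scale


result = area(3, 4, scale=2)
```

Because the wrapper accepts *args and **kwargs, it works for any signature without knowing the wrapped function's parameters.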

What does if __name__ == "__main__" do?

When Python runs a file directly, __name__ is "__main__". When imported, it's the module name. This guard runs code only when executed directly.

def helper(): ...

if __name__ == "__main__":
    # Only with `python utils.py`, NOT on import
    helper()

Explain the mutable default argument trap

Default arguments are evaluated once at function definition, not each call. Mutable defaults are shared across calls.

# BUG:
def f(items=[]):
    items.append(1)
    return items
f()  # [1]
f()  # [1, 1]  — same list!

# FIX:
def f(items=None):
    if items is None: items = []
    items.append(1)
    return items

OOP & Classes

What is __init__ and why do we need it?

__init__ is the initializer. Called automatically after object creation to set up attributes. Without it you'd manually set every attribute — fragile and error-prone.

class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.is_active = True

__init__ is NOT the constructor — __new__ allocates memory. __init__ just fills in data.

What is self?

A reference to the current instance. obj.method() becomes Class.method(obj); self is that obj. It's how each instance knows which data belongs to it.

It's a convention — you could name it anything, but never do.
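The equivalence between the two call forms can be checked directly. Greeter is a hypothetical class used only for this sketch.

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello from {self.name}"


g = Greeter("Ada")
via_instance = g.greet()        # Python passes g as self automatically
via_class = Greeter.greet(g)    # the same call written out explicitly
```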

Difference between @classmethod, @staticmethod, and regular methods?

Regular — gets self (instance). @classmethod — gets cls (class), used for factories. @staticmethod — gets nothing, just a namespaced function.

class Date:
    def __init__(self, y, m, d):
        self.y, self.m, self.d = y, m, d

    def display(self):               # regular
        return f"{self.y}-{self.m}"

    @classmethod
    def from_string(cls, s):         # factory
        return cls(*s.split("-"))

    @staticmethod
    def is_valid(s):                 # utility
        return len(s.split("-")) == 3

What is MRO and how does Python resolve diamond inheritance?

MRO (Method Resolution Order) — the order Python searches for methods. Uses C3 linearization.

class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

print(D.__mro__)  # D → B → C → A → object

A class always appears before its parents. Multiple parents keep left-to-right order. super() follows MRO, not just the immediate parent.
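The statement that super() follows the MRO rather than the immediate parent shows up clearly in a diamond. This sketch extends the A/B/C/D example with a hypothetical who() method.

```python
class A:
    def who(self):
        return ["A"]


class B(A):
    def who(self):
        return ["B"] + super().who()


class C(A):
    def who(self):
        return ["C"] + super().who()


class D(B, C):
    def who(self):
        return ["D"] + super().who()


call_order = D().who()
mro_names = [cls.__name__ for cls in D.__mro__]
```

Inside B, super() resolves to C (the next class in D's MRO), not to A, which is why every class runs exactly once.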

Advanced Concepts

What is a decorator and how does it work?

A function that wraps another function. @dec is sugar for func = dec(func).

import functools

def loud(func):
    @functools.wraps(func)          # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        print("CALLING!")
        result = func(*args, **kwargs)
        print("DONE!")
        return result
    return wrapper

@loud
def greet(name):
    print(f"Hi {name}")

Always use @functools.wraps(func) to preserve the original name and docstring.

What is a generator? How is it different from a list?

Generators produce values lazily with yield. Lists store everything in memory. Generators are memory efficient but single-use.

# List: all in memory
[x**2 for x in range(10_000_000)]

# Generator: small, constant size no matter how large the range
(x**2 for x in range(10_000_000))

def squares(n):
    for i in range(n):
        yield i ** 2
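Both properties, low memory use and single use, can be observed with a few lines. The sizes reported by sys.getsizeof are implementation-specific, so the sketch only compares them.

```python
import sys

n = 100_000
as_list = [x * x for x in range(n)]
as_gen = (x * x for x in range(n))

list_bytes = sys.getsizeof(as_list)   # container size grows with n
gen_bytes = sys.getsizeof(as_gen)     # small and independent of n

first_pass = sum(as_gen)              # consumes the generator
second_pass = sum(as_gen)             # already exhausted, nothing left to add
```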

What is the GIL? Why does Python have it?

The Global Interpreter Lock — a mutex allowing only one thread to run Python bytecode at a time. Exists because CPython's reference counting isn't thread-safe.

Workarounds:

  • I/O-bound → threading or asyncio
  • CPU-bound → multiprocessing (separate processes)
  • C extensions (NumPy) release the GIL

What is a closure?

A function that remembers variables from its enclosing scope even after that scope finishes.

def make_counter():
    count = 0
    def counter():
        nonlocal count
        count += 1
        return count
    return counter

c = make_counter()
c()  # 1
c()  # 2 — remembers count!

Stored in c.__closure__. Decorators are a common use of closures.

Explain Python's memory management

1. Reference counting: Each object tracks how many names point to it. Drops to 0 → freed immediately.

2. Generational GC: Handles circular references. 3 generations — new objects checked more often.

import sys
a = [1, 2]
print(sys.getrefcount(a))  # 2: the name a, plus the temporary argument reference
b = a                       # getrefcount(a) would now report 3
del b                       # back to 2
del a                       # last real reference gone → list freed

Even a small int such as 0 takes roughly 28 bytes on 64-bit CPython (check with sys.getsizeof(0)). Use __slots__ to save memory on many instances.
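A minimal sketch of __slots__ follows. The two point classes are hypothetical; the example shows the behavioral difference (no per-instance __dict__, no ad-hoc attributes) rather than exact byte counts, which vary by Python version.

```python
class PointDict:
    def __init__(self, x, y):
        self.x, self.y = x, y


class PointSlots:
    __slots__ = ("x", "y")   # fixed attribute set, no per-instance __dict__

    def __init__(self, x, y):
        self.x, self.y = x, y


p, q = PointDict(1, 2), PointSlots(1, 2)
has_dict = (hasattr(p, "__dict__"), hasattr(q, "__dict__"))

try:
    q.z = 3                  # z is not listed in __slots__
    extra_allowed = True
except AttributeError:
    extra_allowed = False
```

The memory saving comes from dropping the per-instance dictionary, which matters mainly when you create very many small objects.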

What is a metaclass?

A class whose instances are classes. type is the default metaclass. class Foo: calls type("Foo", bases, namespace).

print(type(42))    # <class 'int'>
print(type(int))   # <class 'type'>
print(type(type))  # <class 'type'>

Rarely needed. __init_subclass__ and class decorators solve most cases more simply.
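As an illustration of the simpler alternative, this sketch uses __init_subclass__ to build a class registry, a task often cited as a metaclass use case. Plugin and its subclasses are hypothetical names.

```python
class Plugin:
    registry = {}

    def __init_subclass__(cls, **kwargs):
        # Called automatically whenever a subclass of Plugin is created.
        super().__init_subclass__(**kwargs)
        Plugin.registry[cls.__name__.lower()] = cls


class CsvExporter(Plugin):
    pass


class JsonExporter(Plugin):
    pass


# The class statement is roughly equivalent to calling type() directly:
Dynamic = type("Dynamic", (Plugin,), {"kind": "dynamic"})
```

The hook fires for the dynamically created class as well, since type() goes through the same class-creation machinery.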

Data Structures & Algorithms

When would you use a dict vs a list?

List for ordered, sequential access. Dict for O(1) key lookup.

  • Lookup "user_123"? → Dict
  • Iterate in order? → List
  • Count things? → Counter
  • Unique values? → Set

Dict lookup is O(1) on average; the in operator on a list is O(n). For 1M elements, a dict lookup can be several orders of magnitude faster than a list scan.

How does a Python dict work internally?

It's a hash table. hash(key) → bucket index → store key-value pair. Collisions use open addressing.

Keys must be hashable, which in practice usually means immutable (str, int, tuples of hashables). Lists can't be keys. Average O(1) for get/set/delete.
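Hashability can be sketched with a small custom key type. The Point class below is hypothetical; the point is that __hash__ must agree with __eq__ for dict lookups to work.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal objects must hash equally, so hash the fields __eq__ compares.
        return hash((self.x, self.y))


visits = {Point(0, 0): "origin"}
found = visits[Point(0, 0)]      # a different but equal object finds the entry

try:
    {[1, 2]: "nope"}             # lists are mutable and therefore unhashable
    list_key_ok = True
except TypeError:
    list_key_ok = False
```

Mutating x or y after insertion would change the hash and make the entry unreachable, which is why key types are normally kept immutable.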

Implement a function to reverse a linked list
def reverse(head):
    prev = None
    current = head
    while current:
        next_node = current.next
        current.next = prev
        prev = current
        current = next_node
    return prev

Three pointers. At each step, flip current → next to current → prev. O(n) time, O(1) space.
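The function above assumes nodes with a next attribute. A runnable sketch with a hypothetical Node class and two helper functions (build, to_list) for checking the result:

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def reverse(head):
    prev = None
    current = head
    while current:
        next_node = current.next   # remember the rest of the list
        current.next = prev        # flip the pointer
        prev = current             # advance prev
        current = next_node        # advance current
    return prev


def build(values):
    """Build a linked list from a Python list and return its head."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def to_list(head):
    """Collect node values into a Python list for easy comparison."""
    out = []
    while head:
        out.append(head.value)
        head = head.next
    return out


reversed_values = to_list(reverse(build([1, 2, 3, 4])))
```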

Pythonic & Practical

List comprehension vs map/filter?

Similar result, but in Python 3 map and filter return lazy iterators rather than lists. Comprehensions are generally considered more Pythonic.

[x**2 for x in range(5)]           # preferred over map
[x for x in range(10) if x > 3]  # preferred over filter

Use map when you already have a named function: list(map(str, nums)).

What is a context manager? When do you use one?

Wraps code with setup/teardown via with. Implements __enter__/__exit__. Used for files, DB connections, locks, timing.

with open("f.txt") as f:
    data = f.read()
# auto-closed, even on exception

import time
from contextlib import contextmanager

@contextmanager
def timer():
    start = time.time()
    try:
        yield
    finally:
        # runs even if the with-body raises
        print(f"Took {time.time()-start:.2f}s")
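
The __enter__/__exit__ protocol mentioned above can also be written out by hand. This is a minimal sketch; Resource is a hypothetical class that only records whether teardown ran.

```python
class Resource:
    """Hypothetical resource that records whether it was released."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self            # the value bound by "as"

    def __exit__(self, exc_type, exc, tb):
        self.closed = True     # teardown runs even if the body raised
        return False           # do not swallow the exception


res = Resource()
try:
    with res:
        raise ValueError("boom")
except ValueError:
    pass

print(res.closed)  # True
```

Returning False from __exit__ lets the exception propagate; returning True would suppress it.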
Shallow copy vs Deep copy?

Shallow — new outer object, shared inner objects. Deep — fully independent at every level.

import copy
original = [[1, 2], [3, 4]]
shallow = original.copy()
deep = copy.deepcopy(original)

original[0][0] = 999
shallow[0][0]  # 999 — shared!
deep[0][0]     # 1   — independent
How would you make a class iterable?

Implement __iter__ as a generator. Easiest way.

class Range:
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __iter__(self):
        current = self.start
        while current < self.end:
            yield current
            current += 1

for i in Range(1, 5):
    print(i)  # 1, 2, 3, 4
What is the difference between @property and a regular attribute?

@property lets you define computed attributes with getter/setter logic while keeping the obj.attr syntax. Useful for validation, caching, and derived values.

class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, val):
        if val < 0: raise ValueError("Negative")
        self._radius = val

    @property
    def area(self):  # computed on access
        return 3.14 * self._radius ** 2
What is duck typing?

"If it walks like a duck and quacks like a duck, it's a duck." Python doesn't care about an object's type — only that it has the right methods. You don't need to inherit from an interface; just implement the expected methods.

class Cat:
    def speak(self): return "Meow"
class Dog:
    def speak(self): return "Woof"

# No shared base class needed:
for a in [Cat(), Dog()]:
    print(a.speak())  # just works

Concurrency & Performance

Threading vs multiprocessing vs asyncio — when to use which?

Threading — I/O-bound tasks (API calls, file reads). GIL limits CPU parallelism but releases during I/O.

Multiprocessing — CPU-bound tasks (number crunching). Separate processes bypass the GIL entirely.

asyncio — Many concurrent I/O operations (web servers, scrapers). Single-threaded, event loop, very low overhead.

Rule of thumb: waiting on network? → asyncio. Crunching numbers? → multiprocessing. Simple I/O parallelism? → threading.
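
A minimal sketch of the threading case, with time.sleep standing in for a real network or disk wait (fake_io is a made-up function). concurrent.futures.ProcessPoolExecutor offers the same interface for CPU-bound work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(n):
    time.sleep(0.1)   # placeholder for an I/O wait; the GIL is released here
    return n * 2


with ThreadPoolExecutor(max_workers=4) as pool:
    # map() preserves input order even though the calls overlap in time
    results = list(pool.map(fake_io, range(4)))

print(results)  # [0, 2, 4, 6]
```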

What is async/await and how does it work?

async def creates a coroutine. await pauses it until the awaited thing completes, letting other coroutines run. It's cooperative multitasking on a single thread.

import asyncio

async def fetch(url):
    await asyncio.sleep(1)  # yields control
    return f"data from {url}"

async def main():
    # Runs 3 fetches concurrently, not sequentially
    results = await asyncio.gather(
        fetch("a"), fetch("b"), fetch("c")
    )  # ~1s total, not ~3s
    print(results)

asyncio.run(main())  # starts the event loop and runs main() to completion
How would you optimize a slow Python function?

1. Profile first: cProfile, line_profiler, or timeit. Never guess.

2. Algorithm: O(n) vs O(n^2) matters more than any micro-optimization.

3. Data structures: dict/set for O(1) lookups instead of list scans.

4. Built-ins: sum(), map(), and comprehensions run their loops in C.

5. Caching: @lru_cache for expensive repeated computations.

6. Vectorize: NumPy instead of Python loops for numerical work.

7. Last resort: C extensions, Cython, or multiprocessing.
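
Two of these steps (caching and data-structure choice) can be sketched in a few lines. slow_square and the calls counter are hypothetical, added only to show how often the function body really runs.

```python
from functools import lru_cache

calls = 0


@lru_cache(maxsize=None)
def slow_square(n):
    global calls
    calls += 1        # counts how often the body actually executes
    return n * n


total = sum(slow_square(n) for n in [3, 3, 3, 4])
print(total, calls)   # 43 2

allowed = set(range(1000))   # set membership is a hash lookup, not a scan
print(999 in allowed)        # True
```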

What is a race condition? How do you prevent it in Python?

When two threads access shared state simultaneously and the result depends on timing. Even with the GIL, race conditions happen between bytecode instructions.

import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    with lock:           # only one thread at a time
        counter += 1    # now safe

Prevention: Lock, RLock, Queue, or avoid shared state entirely (use multiprocessing).
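
A runnable version of the lock pattern, assuming four worker threads and 10,000 increments each (both numbers are arbitrary):

```python
import threading

counter = 0
lock = threading.Lock()


def worker(n):
    global counter
    for _ in range(n):
        with lock:          # serialize the read-modify-write
            counter += 1


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Without the lock the final count is not guaranteed, because counter += 1 is a separate load, add, and store.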

Explain Python's garbage collection

Reference counting (primary): Each object tracks references. Drops to 0 → freed instantly.

Generational GC (cycle detector): Catches circular refs. 3 generations — gen0 (new, checked often), gen1, gen2 (old, checked rarely).

gc.collect() forces a collection. gc.disable() turns off the cycle detector (ref counting still works). weakref creates references that don't increase refcount.
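
The sketch below shows the cycle detector at work on CPython. Node is a hypothetical class, and automatic collection is disabled for the demo so that the timing is predictable.

```python
import gc
import weakref


class Node:
    pass


gc.disable()                # keep automatic collection out of the demo
a, b = Node(), Node()
a.other, b.other = b, a     # reference cycle: refcounts can never reach 0
probe = weakref.ref(a)      # weak reference: does not keep a alive

del a, b
print(probe() is not None)  # True: the cycle keeps both objects alive
gc.collect()
print(probe() is None)      # True: the cycle detector freed the pair
gc.enable()
```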

Design & Architecture

What are SOLID principles? How do they apply to Python?

S — Single Responsibility. One class = one job.

O — Open/Closed. Open for extension, closed for modification. Use inheritance or composition.

L — Liskov Substitution. Subclasses should work anywhere the parent does.

I — Interface Segregation. Many small protocols, not one giant ABC.

D — Dependency Inversion. Depend on abstractions (Protocol/ABC), not concrete classes.

Python favors duck typing and Protocols over heavy interface hierarchies. SOLID applies but with a lighter touch.

When would you use composition over inheritance?

Almost always. Inheritance = "is-a" (Dog is an Animal). Composition = "has-a" (Car has an Engine). Composition is more flexible because you can swap components at runtime.

# Inheritance (rigid)
class ElectricCar(Car): ...

# Composition (flexible)
class Car:
    def __init__(self, engine):
        self.engine = engine  # swap gas/electric/hybrid

Use inheritance for true "is-a" relationships and when you need polymorphism. Use composition for everything else.
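
A small sketch of swapping a component at runtime. GasEngine and ElectricEngine are hypothetical classes invented for this example.

```python
class GasEngine:
    def start(self):
        return "vroom"


class ElectricEngine:
    def start(self):
        return "hum"


class Car:
    def __init__(self, engine):
        self.engine = engine      # has-a relationship

    def start(self):
        return self.engine.start()  # delegate to whichever engine is fitted


car = Car(GasEngine())
print(car.start())             # vroom
car.engine = ElectricEngine()  # swap the component, same Car object
print(car.start())             # hum
```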

How do you structure a large Python project?

Separate concerns into packages: models/, services/, api/, utils/. Keep __init__.py files to control public APIs. Use pyproject.toml for project config.

Key principles: avoid circular imports (use dependency injection), keep modules focused, use absolute imports, write tests alongside code, use src/ layout for installable packages.

What are Abstract Base Classes vs Protocols?

ABC: nominal typing. You must subclass explicitly (class Foo(MyABC):). Abstract methods are enforced at instantiation.

Protocol: structural typing. Just implement the right methods. No inheritance needed. Checked by type checkers (mypy), not at runtime (unless @runtime_checkable).

from typing import Protocol

class Drawable(Protocol):
    def draw(self) -> str: ...

class Circle:  # no inheritance!
    def draw(self) -> str: return "O"

# Circle satisfies Drawable structurally

Prefer Protocols for Python — they match duck typing philosophy.
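
The difference can be observed at runtime with a short sketch. Shape is a hypothetical ABC added for contrast, and @runtime_checkable is applied so that isinstance works against the Protocol.

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


class Shape(ABC):
    @abstractmethod
    def draw(self) -> str: ...


@runtime_checkable
class Drawable(Protocol):
    def draw(self) -> str: ...


class Circle:               # no inheritance from either
    def draw(self) -> str:
        return "O"


print(isinstance(Circle(), Drawable))  # True: structural match
print(isinstance(Circle(), Shape))     # False: Circle never subclassed Shape

try:
    Shape()                 # abstract method not implemented
except TypeError as exc:
    print("TypeError:", exc)
```

Note that a runtime_checkable isinstance check only confirms the method exists; it does not verify signatures.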

Explain the descriptor protocol

An object with __get__, __set__, or __delete__. It's the mechanism behind @property, @classmethod, @staticmethod, and even plain methods.

When you access obj.x and x is a descriptor on the class, Python calls x.__get__(obj, type(obj)) instead of returning x directly.

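class Validated:
    def __set_name__(self, owner, name):
        self.name = name  # attribute name this descriptor was assigned to
    def __set__(self, obj, val):
        if val < 0: raise ValueError
        obj.__dict__[self.name] = val
    def __get__(self, obj, cls):
        if obj is None:   # accessed on the class, not an instance
            return self
        return obj.__dict__.get(self.name)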
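
A usage sketch for a descriptor of this shape. Account is a hypothetical class, and the descriptor is restated here (with an error message added) so the block runs on its own.

```python
class Validated:
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, cls=None):
        if obj is None:          # class-level access returns the descriptor
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, val):
        if val < 0:
            raise ValueError(f"{self.name} must be non-negative")
        obj.__dict__[self.name] = val


class Account:                   # hypothetical owner class
    balance = Validated()

    def __init__(self, balance):
        self.balance = balance   # routed through Validated.__set__


acct = Account(10)
print(acct.balance)              # 10
try:
    acct.balance = -5
except ValueError as exc:
    print(exc)                   # balance must be non-negative
```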

Real-World & Debugging

How do you debug a Python application?

1. Read the traceback — bottom is where the error happened, top is what called it.

2. print() / f-strings for quick checks.

3. breakpoint() (Python 3.7+) drops into pdb debugger.

4. VS Code debugger — set breakpoints, inspect variables, step through code.

5. logging module for production — levels (DEBUG, INFO, WARNING, ERROR).

6. pdb.post_mortem() to debug after an exception.

# Quick debug:
breakpoint()  # drops into pdb

# In pdb:
# n (next), s (step into), c (continue)
# p variable (print), l (list code), q (quit)
What is the difference between a module, package, and library?

Module — a single .py file.

Package — a directory with __init__.py containing modules.

Library — a collection of packages distributed together (e.g., requests, numpy). Installed via pip.

Framework — a library that controls the flow (Django, Flask). You write code that the framework calls.

How do you handle circular imports?

Circular imports happen when A imports B and B imports A. Fixes:

1. Move the import inside the function that needs it (lazy import).

2. Restructure — extract shared code into a third module.

3. Use TYPE_CHECKING for type hints only:

from __future__ import annotations
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from .models import User  # only for type checkers
What is monkey patching?

Dynamically modifying a class or module at runtime. Possible because Python is dynamic, but use sparingly — makes code unpredictable.

import math
math.pi = 3  # monkey patched! (don't do this)

# Legitimate use: mocking in tests
from unittest.mock import patch

@patch("module.expensive_api_call")
def test_something(mock_call):
    mock_call.return_value = "fake"
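
As a self-contained variant of the mocking case, the sketch below patches math.sqrt temporarily with patch.object. The choice of math.sqrt is purely illustrative.

```python
import math
from unittest.mock import patch

with patch.object(math, "sqrt", return_value=42):
    patched = math.sqrt(9)    # the mock answers instead of the real function

restored = math.sqrt(9)       # original is back once the block exits

print(patched)   # 42
print(restored)  # 3.0
```

Because the patch is scoped to the with block, the global change cannot leak into other code.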
Explain the difference between deepcopy, copy, and assignment

Assignment (b = a) — both names point to the same object. No copy at all.

Shallow copy (a.copy()) — new outer object, but inner objects are shared.

Deep copy (copy.deepcopy(a)) — everything is fully independent.

import copy

a = [[1]]
b = a            # same object
c = a.copy()     # new list, shared inner
d = copy.deepcopy(a)  # fully new

a[0].append(2)
# b[0] = [1,2]  — same object
# c[0] = [1,2]  — shared inner list
# d[0] = [1]    — independent
What are dataclasses and when do you use them?

@dataclass auto-generates __init__, __repr__, __eq__. Use when you have a class that's mostly data. Cleaner than writing boilerplate.

from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

# Auto-generates __init__, __repr__, __eq__
# frozen=True for immutable, slots=True for memory (slots=True needs Python 3.10+)

Use dataclass for data containers. Use regular class when you need heavy custom behavior. Use NamedTuple for immutable records.
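
A short sketch of the generated methods and of frozen=True in action:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Point:
    x: float
    y: float


p = Point(1.0, 2.0)
print(p)                      # Point(x=1.0, y=2.0)
print(p == Point(1.0, 2.0))   # True: generated __eq__ compares fields

try:
    p.x = 5.0                 # frozen instances reject assignment
except FrozenInstanceError:
    print("frozen")
```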

Python Internals

How does Python's import system work?

1. Check sys.modules cache — if already imported, return cached version.

2. Search sys.path — list of directories to look in.

3. Found → execute the module file, store in sys.modules.

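4. .pyc files (bytecode cache) in __pycache__/ let later runs skip recompiling unchanged modules.

This is why the first import in a process is slow and repeated imports of the same module are near-instant: they are served straight from sys.modules.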
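
The cache lookup can be observed directly. This sketch uses the standard json module as an arbitrary example.

```python
import importlib
import sys

import json                                # first import executes the module

print("json" in sys.modules)               # True
again = importlib.import_module("json")    # second import hits the cache
print(again is sys.modules["json"])        # True: same module object
print(isinstance(sys.path, list))          # True: the search path is a list
```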

What is __slots__ and when should you use it?

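By default, each instance carries a __dict__ for its attributes. __slots__ replaces it with fixed-size storage, which typically cuts per-instance memory substantially (the exact saving depends on the Python version and the number of attributes).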

class Point:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x = x
        self.y = y

# Can't add arbitrary attributes:
# p.z = 3  → AttributeError

Use when creating millions of instances (particles, data points, graph nodes). Don't use everywhere — it limits flexibility.

What are dunder methods you should always implement?

__repr__: always. Unambiguous string for debugging. Should ideally be valid Python.

__str__: when you need a user-friendly display different from repr.

__eq__: when equality should be value-based, not identity-based.

__hash__: if you implement __eq__ and want instances as dict keys/set members. Defining __eq__ without __hash__ makes instances unhashable.

__len__: if your object has a meaningful size.

__iter__: if your object is a collection.

__enter__/__exit__: if your object manages a resource.
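
A sketch of the __repr__, __eq__, and __hash__ trio working together. Money is a hypothetical value class invented for this example.

```python
class Money:
    def __init__(self, amount, currency):
        self.amount = amount
        self.currency = currency

    def __repr__(self):
        return f"Money({self.amount!r}, {self.currency!r})"

    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented   # let Python try the other operand
        return (self.amount, self.currency) == (other.amount, other.currency)

    def __hash__(self):
        # Hash the same fields that __eq__ compares
        return hash((self.amount, self.currency))


print(Money(5, "USD"))                          # Money(5, 'USD')
print(Money(5, "USD") == Money(5, "USD"))       # True
print(len({Money(5, "USD"), Money(5, "USD")}))  # 1
```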

What happens when you type python script.py?

1. OS starts the CPython interpreter process.

2. Lexer tokenizes source code into tokens.

3. Parser builds an AST (Abstract Syntax Tree).

4. Compiler converts AST → bytecode (cached as .pyc for imported modules, not for the script itself).

5. PVM (Python Virtual Machine) executes bytecode instruction by instruction.

You can inspect bytecode with dis.dis(func) and the AST with ast.parse(code).
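
The parse, compile, and execute stages can be driven by hand. In this sketch the source string and the "<example>" filename are arbitrary.

```python
import ast
import dis

source = "x = 1 + 2"

tree = ast.parse(source)                   # source → AST
print(type(tree.body[0]).__name__)         # Assign

code = compile(tree, "<example>", "exec")  # AST → bytecode
dis.dis(code)                              # opcode listing varies by Python version

namespace = {}
exec(code, namespace)                      # the VM executes the bytecode
print(namespace["x"])                      # 3
```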

What is the difference between iterators and iterables?

Iterable: has __iter__ that returns an iterator. Examples: list, str, dict, range.

Iterator: has __next__ that returns the next value (and raises StopIteration when done). Also has __iter__ returning itself. File objects and generators are iterators.

You can iterate an iterable multiple times (fresh iterator each time). An iterator is consumed — single use.

lst = [1, 2, 3]    # iterable
it = iter(lst)      # iterator
next(it)             # 1
next(it)             # 2
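
The single-use nature of an iterator shows up as soon as you consume it twice:

```python
lst = [1, 2, 3]
it = iter(lst)

print(list(it))        # [1, 2, 3]
print(list(it))        # []: the iterator is exhausted
print(list(lst))       # [1, 2, 3]: the list hands out a fresh iterator
print(iter(it) is it)  # True: an iterator's __iter__ returns itself

try:
    next(it)
except StopIteration:
    print("done")
```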
What is the walrus operator and when is it useful?

:= assigns AND returns a value in one expression. Avoids computing something twice or needing a separate line.

# Without:
line = f.readline()
while line:
    process(line)
    line = f.readline()

# With walrus:
while (line := f.readline()):
    process(line)

# In comprehensions:
results = [y for x in data if (y := expensive(x)) > 0]

Tricky & Brain Teasers

Implement infinite currying: add(1)(2)(3)...(n) that returns the sum when printed

Python equivalent of JS infinite currying — use a class with __call__ and __repr__/__str__/__int__.

class add:
    def __init__(self, val):
        self.val = val

    def __call__(self, n):
        self.val += n
        return self          # return self for chaining

    def __repr__(self):
        return str(self.val)

    def __int__(self):
        return self.val

print(add(1)(2)(3))       # 6
print(add(5)(10)(15)(20))  # 50
print(int(add(1)(2)) + 3)  # 6

Alternate — pure function approach with closures:

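def add(x):
    def inner(y):
        return add(x + y)
    inner.val = x  # running total, exposed as an attribute
    # Note: assigning inner.__repr__ would not change what print() shows,
    # because special methods are looked up on the type, not the instance.
    return inner

print(add(1)(2)(3).val)  # 6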

Key concept: __call__ makes instances callable like functions. Returning self enables chaining.

What will this print? — The classic closure-in-loop trap
funcs = []
for i in range(5):
    funcs.append(lambda: i)

print([f() for f in funcs])
# Expected: [0, 1, 2, 3, 4]
# Actual:   [4, 4, 4, 4, 4]  😱

Why? Closures capture the variable, not its value. By the time you call them, the loop is done and i = 4.

Fix 1 — default argument (captures value at creation):

funcs = [lambda i=i: i for i in range(5)]
print([f() for f in funcs])  # [0, 1, 2, 3, 4]

Fix 2 — functools.partial:

from functools import partial
funcs = [partial(lambda i: i, i) for i in range(5)]

The same trap exists in JS when the loop variable is declared with var; the concept and the fix pattern carry over.

What's the output? — Integer caching gotcha
a = 256
b = 256
print(a is b)  # True

a = 257
b = 257
print(a is b)  # False (in REPL) ⚠️

Why? CPython pre-caches integers from -5 to 256. Same value in that range → same object. Outside → new objects each time.

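# Cached vs. freshly created ints (CPython):
x = 100
y = int("100")
print(x is y)  # True: both names point at the cached 100

x = 300
y = int("300")
print(x is y)  # False: 300 is outside the cache, so int() builds a new object

# String interning too:
a = "hello"
b = "hello"
print(a is b)  # True: identifier-like strings are interned

a = "hello world!"
b = "hello world!"
print(a is b)  # False in the REPL; often True in a script, where equal constants are merged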

Lesson: Never use is to compare values. Always use ==. is is only for None, True, False.

What will this print? — Class variable mutation trap
class Student:
    grades = []  # class variable — shared!

    def __init__(self, name):
        self.name = name

s1 = Student("Alice")
s2 = Student("Bob")

s1.grades.append(90)

print(s2.grades)  # [90]  😱 — Bob has Alice's grade!
print(Student.grades)  # [90] — it's shared

Why? grades = [] at class level creates ONE list shared by ALL instances. Mutating it via any instance changes it everywhere.

Fix — initialize in __init__:

class Student:
    def __init__(self, name):
        self.name = name
        self.grades = []  # each instance gets its own list

Gotcha within the gotcha: Reassignment (s1.grades = [90]) creates a new instance variable and doesn't affect others. Only mutation (.append) is shared.

Implement a function pipe: pipe(add1, double, square)(5) → square(double(add1(5)))

Compose multiple functions left-to-right — a common functional programming pattern.

from functools import reduce

def pipe(*fns):
    def inner(x):
        return reduce(lambda acc, f: f(acc), fns, x)
    return inner

# Usage:
add1   = lambda x: x + 1
double = lambda x: x * 2
square = lambda x: x ** 2

transform = pipe(add1, double, square)
print(transform(5))  # (5+1)*2 = 12, 12² = 144

Reverse (compose — right-to-left):

def compose(*fns):
    return pipe(*reversed(fns))
What's the output? — Tuple gotcha and chained comparison trick
# Trap 1: Single-element tuple
a = (1)
b = (1,)
print(type(a))  # <class 'int'>   — just parentheses!
print(type(b))  # <class 'tuple'> — the comma makes it

# Trap 2: Chained comparisons
print(1 < 2 < 3)    # True  — means (1<2) and (2<3)
print(1 < 2 > 0)    # True  — means (1<2) and (2>0)
print(True == 1 == 1.0)  # True!
print(False == 0 == 0.0) # True!

# Trap 3: This is NOT a tuple comparison
print( (0, 1) == 0, 1 )  # False 1  — it's print((0,1)==0, 1)

Why? Python chains comparisons implicitly. a < b < c is a < b and b < c. And bool is a subclass of int, so True == 1 and False == 0.

What's the output? — for/else, try/else, while/else
for i in range(5):
    if i == 3:
        break
else:
    print("Completed!")

# Prints nothing! `else` runs only if loop didn't `break`

for i in range(5):
    if i == 99:
        break
else:
    print("Completed!")  # Prints! No break happened

Real use case — search pattern:

for item in items:
    if item.matches(query):
        result = item
        break
else:
    raise ValueError("Not found")  # only if no break

Think of else as "no break". Works on while too.
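
The while and try forms follow the same idea. first_factor is a hypothetical helper written for this sketch.

```python
def first_factor(n):
    d = 2
    while d * d <= n:
        if n % d == 0:
            break          # found a divisor: skip the else branch
        d += 1
    else:
        return None        # loop ended without break: n has no factor
    return d


print(first_factor(15))    # 3
print(first_factor(13))    # None

try:
    value = int("42")
except ValueError:
    print("bad input")
else:
    print(value)           # 42: else runs only when no exception was raised
```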

Implement a memoize decorator from scratch
def memoize(func):
    cache = {}
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    wrapper.cache = cache  # expose for inspection
    return wrapper

@memoize
def fib(n):
    if n < 2: return n
    return fib(n - 1) + fib(n - 2)

print(fib(100))  # instant — without memoize, heat death of universe
print(fib.cache)  # see all cached results

Production version: just use @functools.lru_cache(maxsize=128) — it handles kwargs, has max size, and is C-optimized.

What's the output? — Mutable objects as dict keys
# This works:
d = {(1, 2): "tuple key"}

# This crashes:
d = {[1, 2]: "list key"}  # TypeError: unhashable type: 'list'

# This is the EVIL gotcha:
class BadKey:
    def __init__(self, val):
        self.val = val
    def __hash__(self):
        return hash(self.val)
    def __eq__(self, other):
        return self.val == other.val

key = BadKey(1)
d = {key: "found"}
key.val = 2          # mutate the key!

print(d[key])         # KeyError! hash changed, can't find it
print(d[BadKey(1)])   # KeyError! right hash, but original key ≠ BadKey(1) now

Rule: Dict keys must be immutable OR their hash must never change. Mutating a key after insertion leaves its entry stranded under the old hash, so lookups can no longer reach it.

What's the output? — Name binding vs mutation
def f(x, lst):
    x = x + 1           # rebinds local x — original NOT affected
    lst.append(99)      # mutates the SAME list object

a = 10
b = [1, 2]
f(a, b)
print(a)  # 10  — unchanged
print(b)  # [1, 2, 99]  — changed!

Why? Python is "pass by object reference." Reassigning a name (x = ...) creates a new local binding. Mutating an object (lst.append) changes it everywhere.

# Another gotcha:
def g(lst):
    lst += [4]  # for lists, += is .extend() — MUTATES in place!

def h(tup):
    tup += (4,)  # for tuples, += creates a NEW tuple — no mutation

a = [1, 2, 3]
g(a)
print(a)  # [1, 2, 3, 4] — mutated!

b = (1, 2, 3)
h(b)
print(b)  # (1, 2, 3) — unchanged
Implement __missing__ — auto-creating nested dicts

Loosely comparable to safe nested access in JS, except that missing levels are created on access instead of being skipped.

class AutoDict(dict):
    def __missing__(self, key):
        self[key] = AutoDict()
        return self[key]

d = AutoDict()
d["a"]["b"]["c"] = 42  # no KeyError!
print(d)  # {'a': {'b': {'c': 42}}}

# Same thing with collections:
from collections import defaultdict

tree = lambda: defaultdict(tree)
d = tree()
d["a"]["b"]["c"] = 42  # works!

Key concept: __missing__ is called by dict.__getitem__ when a key doesn't exist. Only works with [] access, not .get().
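
The difference between [] access and .get() can be checked directly with the same AutoDict class:

```python
class AutoDict(dict):
    def __missing__(self, key):
        self[key] = AutoDict()
        return self[key]


d = AutoDict()
print(d.get("x"))   # None: .get() bypasses __missing__
print("x" in d)     # False: nothing was created
d["x"]              # [] access triggers __missing__
print("x" in d)     # True: the nested dict now exists
print(d)            # {'x': {}}
```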

What's the output? — Scope and the UnboundLocalError trap
x = 10

def foo():
    print(x)   # UnboundLocalError! 😱
    x = 20

foo()

Why? Python sees x = 20 anywhere in the function and marks x as local for the entire function. So the print(x) before the assignment references a local x that doesn't exist yet.

Fixes:

# Fix 1 — use global
def foo():
    global x
    print(x)
    x = 20

# Fix 2 — use a different name
def foo():
    print(x)
    y = 20   # doesn't shadow x

# Nested scope version — use nonlocal
def outer():
    x = 10
    def inner():
        nonlocal x
        x += 1
    inner()
    print(x)  # 11
What's the output? — Boolean arithmetic and short-circuit tricks
# True/False are ints!
print(True + True)      # 2
print(True * 10)        # 10
print(sum([True, False, True]))  # 2

# and/or return VALUES, not True/False
print(0 or "hello")     # "hello"
print("hi" and "bye")   # "bye"
print("" or "default")  # "default"
print(None or 0 or [] or "found")  # "found"

# Common pattern: fall back to a default when the value is falsy
user_input = ""
name = user_input or "Anonymous"

How and/or actually work:

  • or returns the first truthy value, or the last value
  • and returns the first falsy value, or the last value
Flatten an arbitrarily nested list using recursion and generators
def flatten(lst):
    for item in lst:
        if isinstance(item, list):
            yield from flatten(item)  # recursive generator!
        else:
            yield item

nested = [1, [2, [3, 4], 5], [6, [7, [8]]]]
print(list(flatten(nested)))
# [1, 2, 3, 4, 5, 6, 7, 8]

Key concept: yield from delegates to another generator — flattens one level of generator nesting. Without it you'd need a loop + yield.

One-liner (for fixed depth):

# Only 1 level deep:
flat = [x for sub in nested for x in sub]
What's the output? — Multiple assignment and swap gotchas
# Python swap — no temp variable needed:
a, b = 1, 2
a, b = b, a
print(a, b)  # 2 1

# But watch this:
a = [1, 2, 3]
a[0], a[a[0]] = a[a[0]], a[0]
print(a)  # [2, 2, 1]: targets are assigned left to right, so a[a[0]] sees the new a[0]

# The unpacking trick:
a, *b, c = [1, 2, 3, 4, 5]
print(a)  # 1
print(b)  # [2, 3, 4]
print(c)  # 5

# Nested unpacking:
(a, b), c = [1, 2], 3
print(a, b, c)  # 1 2 3

# Swallowing with *_:
first, *_, last = range(100)
print(first, last)  # 0 99
Build a class that behaves like both a list and a callable
class FuncList:
    def __init__(self, *items):
        self._data = list(items)

    def __call__(self, *items):
        self._data.extend(items)
        return self

    def __getitem__(self, idx):
        return self._data[idx]

    def __len__(self):
        return len(self._data)

    def __repr__(self):
        return repr(self._data)

fl = FuncList(1, 2)(3, 4)(5)
print(fl)       # [1, 2, 3, 4, 5]
print(fl[2])    # 3
print(len(fl))   # 5

Key concept: Dunder methods let you make objects behave like any built-in type. __call__ = callable, __getitem__ = indexable, __len__ = sizable.
