Python Learning Guide: Data Structures & Algorithms Basics

Who this guide is for

  • Learners who already understand Python basics and OOP fundamentals
  • Developers who want to write faster and more scalable code
  • Anyone preparing for technical interviews or performance-aware coding tasks

What you’ll learn

  • How Python's built-in data structures differ in behavior and performance
  • Big O basics for time and space complexity analysis
  • How basic sorting and searching algorithms work in Python
  • When to use built-in operations versus custom algorithm implementations
  • Common performance pitfalls and how to avoid them

Why this topic matters

Good code is not only correct; it is also efficient enough for the problem size. Data structures and algorithms help you reason about performance before your application becomes slow or expensive to run.

In Python, you often get excellent performance from built-ins, but you still need algorithmic thinking. Knowing complexity and trade-offs helps you make better choices when handling large datasets, repeated operations, or time-sensitive tasks.

Core concepts

Built-in data structures and their common complexity

Python gives you powerful defaults: list, tuple, set, and dict.

Quick intuition:

  • list: ordered, mutable, good for iteration and append
  • tuple: ordered, immutable, useful for fixed records
  • set: unique values, fast membership checks
  • dict: key-value mapping, fast key lookups

Common complexity patterns (average case):

  • list.append() -> $O(1)$
  • x in list -> $O(n)$
  • tuple[index] -> $O(1)$
  • x in set -> $O(1)$
  • dict[key] lookup -> $O(1)$

For example, membership checks behave very differently on a list and a set:

items = [10, 20, 30, 40]
lookup_set = set(items)

print(30 in items)      # linear scan
print(30 in lookup_set) # hash lookup

Big O notation: compare growth, not exact milliseconds

Big O describes how runtime or memory grows as input size increases.

Useful categories:

  • $O(1)$ constant
  • $O(\log n)$ logarithmic
  • $O(n)$ linear
  • $O(n \log n)$ efficient sorting class
  • $O(n^2)$ quadratic

Example intuition:

  • Linear search scans one by one -> $O(n)$
  • Binary search halves the search space each step -> $O(\log n)$ (requires sorted data)

Focus on growth trends; exact runtime depends on hardware and implementation details.
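To make the growth difference concrete, here is a small sketch that counts comparisons; the `steps` counters are illustrative additions to the search loops, not part of the algorithms themselves:

```python
def linear_steps(values, target):
	"""Count comparisons a linear search makes before finding target."""
	steps = 0
	for value in values:
		steps += 1
		if value == target:
			break
	return steps

def binary_steps(values, target):
	"""Count comparisons a binary search makes on sorted input."""
	left, right = 0, len(values) - 1
	steps = 0
	while left <= right:
		steps += 1
		mid = (left + right) // 2
		if values[mid] == target:
			break
		if values[mid] < target:
			left = mid + 1
		else:
			right = mid - 1
	return steps

data = list(range(1, 1001))        # 1..1000, already sorted
print(linear_steps(data, 999))     # 999 comparisons, close to n
print(binary_steps(data, 999))     # 9 comparisons, about log2(n)
```

Roughly a thousand comparisons versus nine: that gap is what the $O(n)$ versus $O(\log n)$ labels are summarizing.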

Space complexity quick examples:

  • Linear search uses constant extra memory -> $O(1)$ space
  • Merge-style approaches may allocate extra arrays -> often $O(n)$ space
  • In-place operations (when possible) can reduce memory overhead
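As a concrete example of the last point, `list.sort()` reorders a list in place, while `sorted()` allocates a new list:

```python
data = [5, 2, 9, 1]

copy = sorted(data)   # builds a new list: O(n) extra memory
print(data)           # [5, 2, 9, 1]  (original untouched)
print(copy)           # [1, 2, 5, 9]

data.sort()           # sorts in place: no second list is created
print(data)           # [1, 2, 5, 9]
```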

Sorting and searching fundamentals

Python has built-in tools (sorted, .sort()) that are highly optimized. Learn manual algorithms for understanding and interviews, but prefer built-ins in production unless you have a special requirement.

Basic linear search:

def linear_search(values, target):
	for index, value in enumerate(values):
		if value == target:
			return index
	return -1

Basic binary search (sorted input required):

def binary_search(values, target):
	left, right = 0, len(values) - 1
	while left <= right:
		mid = (left + right) // 2
		if values[mid] == target:
			return mid
		if values[mid] < target:
			left = mid + 1
		else:
			right = mid - 1
	return -1

Step-by-step walkthrough

Step 1 — Choose the right data structure for the job

Start with problem intent, not syntax.

Use this simple decision model:

  • Need ordered mutable collection -> list
  • Need unique items / fast membership -> set
  • Need labeled fields by key -> dict
  • Need fixed immutable record -> tuple

For example, deduplicating a list of emails:

emails = ["[email protected]", "[email protected]", "[email protected]"]
unique_emails = set(emails)
print(unique_emails)

Choosing a set removes the duplicates immediately, with clean, minimal code.
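For completeness, a small sketch of the other two choices in the decision model; the field names and values here are illustrative:

```python
# dict: labeled fields with fast lookup by key
user = {"name": "Ada", "email": "[email protected]"}
print(user["email"])  # [email protected]

# tuple: a fixed immutable record, e.g. a 2D point
point = (3, 4)
x, y = point          # unpack by position
print(x + y)          # 7
```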

Step 2 — Implement and test searching approaches

Use linear search for unsorted data or one-off checks. Use binary search for sorted collections with repeated lookups.

values = [3, 7, 11, 19, 25, 31]
print(linear_search(values, 19))
print(binary_search(values, 19))

Expected output:

3
3

Algorithm choice should match data shape and lookup frequency.

Step 3 — Compare sorting implementations

Learn simple manual sorting, then compare with Python built-ins.

def bubble_sort(values):
	arr = values[:]
	n = len(arr)
	for i in range(n):
		for j in range(0, n - i - 1):
			if arr[j] > arr[j + 1]:
				arr[j], arr[j + 1] = arr[j + 1], arr[j]
	return arr


data = [9, 4, 7, 1, 3]
print(bubble_sort(data))
print(sorted(data))

Expected output:

[1, 3, 4, 7, 9]
[1, 3, 4, 7, 9]

bubble_sort is educational ($O(n^2)$), while sorted() is practical and highly optimized (Timsort, $O(n \log n)$).
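In production code, `sorted()` also covers common variations directly through its `key` and `reverse` parameters:

```python
words = ["banana", "Apple", "cherry"]

# Case-insensitive sort via a key function
print(sorted(words, key=str.lower))     # ['Apple', 'banana', 'cherry']

# Descending numeric sort
print(sorted([3, 1, 2], reverse=True))  # [3, 2, 1]
```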

Insertion sort (educational, often better than bubble sort on nearly sorted small lists):

def insertion_sort(values):
	arr = values[:]
	for i in range(1, len(arr)):
		key = arr[i]
		j = i - 1
		while j >= 0 and arr[j] > key:
			arr[j + 1] = arr[j]
			j -= 1
		arr[j + 1] = key
	return arr
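To see why insertion sort does well on nearly sorted input, here is a sketch that counts element shifts; the `shifts` counter is an illustrative addition to the algorithm above:

```python
def insertion_sort_with_shifts(values):
	"""Insertion sort that also reports how many elements were shifted."""
	arr = values[:]
	shifts = 0
	for i in range(1, len(arr)):
		key = arr[i]
		j = i - 1
		while j >= 0 and arr[j] > key:
			arr[j + 1] = arr[j]  # shift a larger element right
			j -= 1
			shifts += 1
		arr[j + 1] = key
	return arr, shifts

print(insertion_sort_with_shifts([1, 2, 3, 5, 4]))  # ([1, 2, 3, 4, 5], 1)
print(insertion_sort_with_shifts([5, 4, 3, 2, 1]))  # ([1, 2, 3, 4, 5], 10)
```

A nearly sorted list needs a single shift; a fully reversed one needs $n(n-1)/2$, which is where the $O(n^2)$ worst case comes from.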

Quicksort (divide-and-conquer concept; average $O(n \log n)$, but worst case $O(n^2)$ with consistently bad pivots):

def quicksort(values):
	if len(values) <= 1:
		return values
	pivot = values[len(values) // 2]
	left = [v for v in values if v < pivot]
	middle = [v for v in values if v == pivot]
	right = [v for v in values if v > pivot]
	return quicksort(left) + middle + quicksort(right)
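One level of the divide step above, spelled out for a sample input:

```python
values = [9, 4, 7, 1, 3]
pivot = values[len(values) // 2]  # middle element: 7

left = [v for v in values if v < pivot]     # [4, 1, 3]
middle = [v for v in values if v == pivot]  # [7]
right = [v for v in values if v > pivot]    # [9]

print(left, middle, right)
# Recursing on left and right, then concatenating, yields [1, 3, 4, 7, 9].
```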

Visual complexity cheat sheet:

| Complexity | Typical pattern |
|---|---|
| $O(1)$ | Hash lookup |
| $O(\log n)$ | Binary search |
| $O(n)$ | Single-pass scan |
| $O(n \log n)$ | Efficient comparison sort |
| $O(n^2)$ | Nested loops (bubble/insertion worst case) |

Memory perspective (space complexity) snapshot:

| Algorithm/Pattern | Typical extra space |
|---|---|
| Linear search | $O(1)$ |
| Binary search (iterative) | $O(1)$ |
| Bubble sort | $O(1)$ |
| Insertion sort | $O(1)$ |
| Quicksort (recursive, average) | $O(\log n)$ stack |

Note that the list-building quicksort shown above also allocates $O(n)$ extra space for its partition lists; the $O(\log n)$ figure applies to the in-place variant.

Practical examples

Example 1 — Fast duplicate detection in large inputs

When checking repeated elements, set is usually the best first tool.

def has_duplicates(values):
	seen = set()
	for value in values:
		if value in seen:
			return True
		seen.add(value)
	return False


print(has_duplicates([1, 2, 3, 4]))
print(has_duplicates([1, 2, 3, 2]))

Expected output:

False
True

This pattern is common in validation, event processing, and data cleaning pipelines.

Example 2 — Frequency counting with dictionary

Counting occurrences is a core operation in analytics and logs.

def frequency_map(words):
	counts = {}
	for word in words:
		counts[word] = counts.get(word, 0) + 1
	return counts


tokens = ["python", "data", "python", "api", "data", "python"]
print(frequency_map(tokens))

Expected output:

{'python': 3, 'data': 2, 'api': 1}

You can later replace this with collections.Counter, but this version builds core understanding.
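For reference, the `collections.Counter` version of the same computation:

```python
from collections import Counter

tokens = ["python", "data", "python", "api", "data", "python"]
counts = Counter(tokens)  # behaves like a dict of counts

print(counts.most_common(2))  # [('python', 3), ('data', 2)]
```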

Common mistakes and how to avoid them

  • Using list for frequent membership checks -> Use set or dict keys for average $O(1)$ lookups.
  • Applying binary search to unsorted data -> Sort first or choose linear search.
  • Rewriting built-in algorithms without need -> Prefer sorted() and standard library tools unless requirements are special.
  • Ignoring input size in design decisions -> Estimate complexity before choosing an algorithm.
  • Optimizing too early without measurement -> Start with clear code, then profile bottlenecks.
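A quick way to observe the first pitfall yourself is `timeit`; absolute numbers vary by machine, but the gap between list and set membership grows with input size:

```python
import timeit

items = list(range(100_000))
items_set = set(items)

# Worst case for the list: the target is the last element, forcing a full scan.
list_time = timeit.timeit(lambda: 99_999 in items, number=100)
set_time = timeit.timeit(lambda: 99_999 in items_set, number=100)

print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

This is also a small example of the last point: measure before and after, rather than assuming where the bottleneck is.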

Quick practice

  • Implement insertion_sort in Python and compare its output with sorted().
  • Write both linear and binary search, then test them on the same sorted list.
  • Build a script that counts word frequencies from a sentence and prints top 3 words.

Key takeaways

  • Data structure choice often gives bigger speed gains than micro-optimizations.
  • Big O helps you predict scalability and avoid poor growth patterns.
  • Understand basic algorithms for reasoning and interviews, but use Python built-ins in production by default.
  • Performance decisions should be informed by both complexity and real-world measurement.

Next step

Continue to Python Package Management & Ecosystem. In the next guide, you will learn modern tooling choices (pip, Poetry, PDM, uv, and others) and how to manage dependencies cleanly.
