Who this guide is for
- Learners who already understand Python basics and OOP fundamentals
- Developers who want to write faster and more scalable code
- Anyone preparing for technical interviews or performance-aware coding tasks
What you’ll learn
- How Python built-in data structures differ in behavior and performance
- Big O basics for time and space complexity analysis
- How basic sorting and searching algorithms work in Python
- When to use built-in operations versus custom algorithm implementations
- Common performance pitfalls and how to avoid them
Why this topic matters
Good code is not only correct; it is also efficient enough for the problem size. Data structures and algorithms help you reason about performance before your application becomes slow or expensive to run.
In Python, you often get excellent performance from built-ins, but you still need algorithmic thinking. Knowing complexity and trade-offs helps you make better choices when handling large datasets, repeated operations, or time-sensitive tasks.
Core concepts
Built-in data structures and their common complexity
Python gives you powerful defaults: list, tuple, set, and dict.
Quick intuition:
- list: ordered, mutable, good for iteration and append
- tuple: ordered, immutable, useful for fixed records
- set: unique values, fast membership checks
- dict: key-value mapping, fast key lookups
Common complexity patterns (average case):
- list.append() -> $O(1)$
- x in list -> $O(n)$
- tuple[index] -> $O(1)$
- x in set -> $O(1)$
- dict[key] lookup -> $O(1)$
```python
items = [10, 20, 30, 40]
lookup_set = set(items)

print(30 in items)       # linear scan -> True
print(30 in lookup_set)  # hash lookup -> True
```
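To see the growth difference in practice, here is a minimal timing sketch using the standard library's timeit module. The input size and iteration count are illustrative choices; absolute numbers vary by machine, but the ordering should not:

```python
import timeit

# Illustrative setup: 100,000 integers, probing a value near the end.
items = list(range(100_000))
lookup_set = set(items)

# 100 repetitions each: linear scan of the list vs hash lookup in the set.
list_time = timeit.timeit(lambda: 99_999 in items, number=100)
set_time = timeit.timeit(lambda: 99_999 in lookup_set, number=100)

print(f"list: {list_time:.4f}s, set: {set_time:.4f}s")
```

On typical hardware the set lookup is faster by several orders of magnitude, which is exactly the $O(n)$ versus $O(1)$ gap made visible.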
Big O notation: compare growth, not exact milliseconds
Big O describes how runtime or memory grows as input size increases.
Useful categories:
- $O(1)$ constant
- $O(\log n)$ logarithmic
- $O(n)$ linear
- $O(n \log n)$ efficient sorting class
- $O(n^2)$ quadratic
Example intuition:
- Linear search scans one by one -> $O(n)$
- Binary search halves the search space each step -> $O(\log n)$ (requires sorted data)
Focus on growth trends; exact runtime depends on hardware and implementation details.
Space complexity quick examples:
- Linear search uses constant extra memory -> $O(1)$ space
- Merge-style approaches may allocate extra arrays -> often $O(n)$ space
- In-place operations (when possible) can reduce memory overhead
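Python's own sorting API illustrates the in-place point: sorted() builds a new list (roughly $O(n)$ extra space), while list.sort() reorders the existing list in place. A small sketch:

```python
data = [5, 2, 9, 1]

# sorted() returns a new list and leaves the original untouched.
copy_sorted = sorted(data)
print(data)         # [5, 2, 9, 1]
print(copy_sorted)  # [1, 2, 5, 9]

# list.sort() mutates in place: no second full-size list is allocated.
data.sort()
print(data)         # [1, 2, 5, 9]
```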
Sorting and searching fundamentals
Python has built-in tools (sorted, .sort()) that are highly optimized. Learn manual algorithms for understanding and interviews, but prefer built-ins in production unless you have a special requirement.
Basic linear search:
```python
def linear_search(values, target):
    for index, value in enumerate(values):
        if value == target:
            return index
    return -1
```
Basic binary search (sorted input required):
```python
def binary_search(values, target):
    left, right = 0, len(values) - 1
    while left <= right:
        mid = (left + right) // 2
        if values[mid] == target:
            return mid
        if values[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1
```
Step-by-step walkthrough
Step 1 — Choose the right data structure for the job
Start with problem intent, not syntax.
Use this simple decision model:
- Need an ordered, mutable collection -> list
- Need unique items / fast membership -> set
- Need labeled fields by key -> dict
- Need a fixed, immutable record -> tuple
```python
# Placeholder addresses, including one duplicate.
emails = ["ana@example.com", "li@example.com", "ana@example.com"]
unique_emails = set(emails)
print(unique_emails)  # the duplicate is gone; set print order is arbitrary
```
Choosing set immediately removes duplicates with clean code.
Step 2 — Implement and test searching approaches
Use linear search for unsorted data or one-off checks. Use binary search for sorted collections with repeated lookups.
```python
values = [3, 7, 11, 19, 25, 31]
print(linear_search(values, 19))
print(binary_search(values, 19))
```
Expected output:
3
3
Algorithm choice should match data shape and lookup frequency.
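For sorted data with repeated lookups, the standard library's bisect module offers a well-tested binary search primitive, so you rarely need to hand-roll one in production. A sketch; the bisect_index wrapper is our own helper, not part of the module:

```python
import bisect

values = [3, 7, 11, 19, 25, 31]  # must already be sorted

def bisect_index(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if absent."""
    i = bisect.bisect_left(sorted_values, target)
    if i < len(sorted_values) and sorted_values[i] == target:
        return i
    return -1

print(bisect_index(values, 19))  # 3
print(bisect_index(values, 20))  # -1
```

bisect_left returns the insertion point, so the extra equality check is needed to distinguish "found" from "would be inserted here".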
Step 3 — Compare sorting implementations
Learn simple manual sorting, then compare with Python built-ins.
```python
def bubble_sort(values):
    arr = values[:]  # copy so the input list is not modified
    n = len(arr)
    for i in range(n):
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr

data = [9, 4, 7, 1, 3]
print(bubble_sort(data))
print(sorted(data))
```
Expected output:
[1, 3, 4, 7, 9]
[1, 3, 4, 7, 9]
bubble_sort is educational ($O(n^2)$), while sorted() uses Timsort, an optimized $O(n \log n)$ algorithm, and is the practical choice.
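Another reason to reach for sorted() beyond raw speed: it accepts a key function, which manual implementations rarely support. A quick sketch sorting by a derived value:

```python
words = ["pear", "fig", "banana", "kiwi"]

# Sort by length; the sort is stable, so equal-length words keep input order.
print(sorted(words, key=len))                    # ['fig', 'pear', 'kiwi', 'banana']

# Sort by length, then alphabetically within each length.
print(sorted(words, key=lambda w: (len(w), w)))  # ['fig', 'kiwi', 'pear', 'banana']
```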
Insertion sort (educational, often better than bubble sort on nearly sorted small lists):
```python
def insertion_sort(values):
    arr = values[:]
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]  # shift larger items one slot right
            j -= 1
        arr[j + 1] = key
    return arr
```
Quicksort (divide-and-conquer concept):
```python
def quicksort(values):
    if len(values) <= 1:
        return values
    pivot = values[len(values) // 2]
    left = [v for v in values if v < pivot]
    middle = [v for v in values if v == pivot]
    right = [v for v in values if v > pivot]
    return quicksort(left) + middle + quicksort(right)
```
Visual complexity cheat sheet:
| Complexity | Typical pattern |
|---|---|
| $O(1)$ | Hash lookup |
| $O(\log n)$ | Binary search |
| $O(n)$ | Single-pass scan |
| $O(n \log n)$ | Efficient comparison sort |
| $O(n^2)$ | Nested loops (bubble/insertion worst case) |
Memory perspective (space complexity) snapshot:
| Algorithm/Pattern | Typical extra space |
|---|---|
| Linear search | $O(1)$ |
| Binary search (iterative) | $O(1)$ |
| Bubble sort | $O(1)$ |
| Insertion sort | $O(1)$ |
| Quicksort (recursive, average) | $O(\log n)$ stack |
Practical examples
Example 1 — Fast duplicate detection in large inputs
When checking repeated elements, set is usually the best first tool.
```python
def has_duplicates(values):
    seen = set()
    for value in values:
        if value in seen:
            return True
        seen.add(value)
    return False

print(has_duplicates([1, 2, 3, 4]))
print(has_duplicates([1, 2, 3, 2]))
```
Expected output:
False
True
This pattern is common in validation, event processing, and data cleaning pipelines.
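If early exit does not matter, a shorter variant of the same idea builds one set up front. It is still $O(n)$ time and space, but it always scans the whole input (and, like the loop version, assumes the values are hashable):

```python
def has_duplicates_short(values):
    # A set keeps only unique values, so a size mismatch means a duplicate.
    return len(set(values)) != len(values)

print(has_duplicates_short([1, 2, 3, 4]))  # False
print(has_duplicates_short([1, 2, 3, 2]))  # True
```

Prefer the explicit loop when duplicates tend to appear early in large inputs; prefer this one-liner for readability.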
Example 2 — Frequency counting with dictionary
Counting occurrences is a core operation in analytics and logs.
```python
def frequency_map(words):
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

tokens = ["python", "data", "python", "api", "data", "python"]
print(frequency_map(tokens))
```
Expected output:
{'python': 3, 'data': 2, 'api': 1}
You can later replace this with collections.Counter, but this version builds core understanding.
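For reference, the collections.Counter version is a near drop-in replacement and also gives you ranking for free:

```python
from collections import Counter

tokens = ["python", "data", "python", "api", "data", "python"]
counts = Counter(tokens)

print(counts["python"])       # 3
print(counts.most_common(2))  # [('python', 3), ('data', 2)]
```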
Common mistakes and how to avoid them
- Using list for frequent membership checks -> Use set or dict keys for average $O(1)$ lookups.
- Applying binary search to unsorted data -> Sort first or choose linear search.
- Rewriting built-in algorithms without need -> Prefer sorted() and standard library tools unless requirements are special.
- Ignoring input size in design decisions -> Estimate complexity before choosing an algorithm.
- Optimizing too early without measurement -> Start with clear code, then profile bottlenecks.
Quick practice
- Implement insertion_sort in Python and compare its output with sorted().
- Write both linear and binary search, then test them on the same sorted list.
- Build a script that counts word frequencies from a sentence and prints top 3 words.
Key takeaways
- Data structure choice often gives bigger speed gains than micro-optimizations.
- Big O helps you predict scalability and avoid poor growth patterns.
- Understand basic algorithms for reasoning and interviews, but use Python built-ins in production by default.
- Performance decisions should be informed by both complexity and real-world measurement.
Next step
Continue to Python Package Management & Ecosystem. In the next guide, you will learn modern tooling choices (pip, Poetry, PDM, uv, and others) and how to manage dependencies cleanly.