Python Data Structures


Why Performance Matters in Python Data Structures

When solving real-world problems or working with large datasets, choosing the right data structure can significantly improve performance and reduce memory usage.



Big-O Time Complexity of Python Built-in Data Structures

Operation List Tuple Set Dictionary
Indexing (x[i]) O(1) O(1) N/A O(1)
Append (x.append(v)) O(1) N/A O(1) O(1)
Insert/Delete (middle) O(n) N/A O(1) avg O(1) avg
Search (x in structure) O(n) O(n) O(1) avg O(1) avg
Iterate O(n) O(n) O(n) O(n)
Sort O(n log n) N/A N/A N/A

Common Data Structures in Python (With Best Use Cases)


List

  • Best for ordered collections where duplicates are allowed
  • Easy to add, remove, update elements
  • Not suitable for frequent searches in large data

Tuple

  • Immutable, lightweight, and hashable
  • Ideal for fixed data structures, function return values, and dictionary keys

Set

  • Unordered, no duplicates
  • Great for fast membership tests, unions, intersections, and difference

Dictionary

  • Key-value mapping
  • Ideal for fast lookups, JSON-like structures, and indexed data access


Performance Tips for Python Data Structures


1. Use List Comprehensions Over Loops

# Faster
squares = [x*x for x in range(100)]

# Slower
squares = []
for x in range(100):
    squares.append(x*x)

2. Prefer Sets for Membership Tests

nums = set([1, 2, 3, 4])
print(3 in nums)  # O(1) vs O(n) in list

3. Use defaultdict or Counter for Counting Items

from collections import Counter

words = ["apple", "banana", "apple"]
count = Counter(words)
print(count["apple"])  # 2

4. Avoid Repeated Concatenation in Loops

# Slow
s = ""
for word in ["a", "b", "c"]:
    s += word

# Fast
s = "".join(["a", "b", "c"])

5. Use Generators Instead of Lists When Possible

# Saves memory
def gen():
    for i in range(1000000):
        yield i

6. Use enumerate() Instead of Manual Indexing

for i, value in enumerate(["a", "b", "c"]):
    print(i, value)

7. Use zip() for Pairing Data

names = ["Alice", "Bob"]
scores = [90, 80]

for name, score in zip(names, scores):
    print(name, score)

Advanced Structures for High Performance

Structure Module Use Case
deque collections Fast append/pop from both ends
defaultdict collections Simplified dict with default values
Counter collections Counting hashable items
heapq heapq Priority queues / Min-heaps
namedtuple collections Lightweight object-like tuples

Summary of Optimization Tips

Tip Benefit
Use set for lookups O(1) average time
Use list comprehensions Cleaner & faster code
Use Counter and defaultdict Cleaner logic for counting
Avoid list resizing in loops Better time and space efficiency
Prefer tuple when immutability needed Faster and memory-efficient