I Used Python's `id()` to Visualize Memory Bias in My Pandas Pipelines — and It Was Shocking
📈 Overview
Most data scientists rely on Pandas daily, but few look under the hood to see when objects are copied and when they are reused. In this post, I trace object identity with Python's built-in `id()` function to see how Pandas operations create or reuse DataFrame and Series objects. Keep in mind that `id()` identifies the Python wrapper object, not the underlying data buffer, so a new ID does not by itself prove that data was copied.
🔧 Setup
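import pandas as pd

# Initial DataFrame
df = pd.DataFrame({
    "x": [1, 2, 3],
    "y": ["a", "b", "c"],
})
print("pandas version:", pd.__version__)  # results below depend on the version
print("Original DataFrame ID:", id(df))
# Keep the Series alive: the id() of a discarded temporary can be reused by a later object
orig_cols = {col: df[col] for col in df.columns}
print("Original Column IDs:", {col: id(s) for col, s in orig_cols.items()})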
🚀 The Memory Trail
Let's trace what happens when we apply common Pandas operations.
1. .copy() creates new DataFrame and Series objects
df2 = df.copy()
print("\nCopy DataFrame ID:", id(df2))
print("Copy Column IDs:", {col: id(df2[col]) for col in df2.columns})
Result:
- DataFrame ID is new
- All column IDs are new
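Object IDs only describe the Python wrapper objects. To check whether the copy also owns separate data, one option is NumPy's `np.shares_memory`. This is a minimal sketch, assuming a numeric column so that `to_numpy()` exposes the underlying buffer; the frame is rebuilt here only to keep the snippet self-contained.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
df2 = df.copy()  # deep copy by default

# Hold both Series so neither is garbage collected while we compare them
orig_x, copy_x = df["x"], df2["x"]

same_object = copy_x is orig_x
shared_buffer = bool(np.shares_memory(orig_x.to_numpy(), copy_x.to_numpy()))

print("same Series object:", same_object)
print("shared buffer:", shared_buffer)
```

Comparing with `is` while both objects are referenced avoids reading anything into two printed numbers that merely happen to match.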
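2. .assign() returns a new DataFrame with new Series objects
df3 = df.assign(z=5)
print("\nAssign DataFrame ID:", id(df3))
# Compare live objects with `is` rather than the printed IDs of temporaries
print("Assign columns same object?:", {col: df3[col] is df[col] for col in df.columns})
Result:
- DataFrame ID is new
- Existing columns come back as new Series objects. If two printed IDs match, a freed temporary's address was most likely reused. With Copy-on-Write enabled, the underlying data can still be shared until one side is written to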
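To separate "same object" from "same data" for `.assign()`, the check below looks at both. It is a sketch under the assumption that the column is numeric; whether the buffer is shared depends on the pandas version and on whether Copy-on-Write is active, so the snippet prints the result rather than promising one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
df3 = df.assign(z=5)

orig_x, new_x = df["x"], df3["x"]  # keep references alive

same_object = new_x is orig_x
# Version dependent: an eager copy without Copy-on-Write, lazily shared data with it
shared_buffer = bool(np.shares_memory(orig_x.to_numpy(), new_x.to_numpy()))

print("pandas", pd.__version__)
print("same Series object:", same_object)
print("shared buffer:", shared_buffer)
```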
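3. Column selection may return a cached Series or a fresh object, depending on the pandas version
x_col = df['x']
print("\nOriginal x ID:", id(x_col))
# Typically True when pandas serves a cached Series (older versions, Copy-on-Write off)
# and False when Copy-on-Write is active, where each selection builds a new Series
print("x_col is df['x']:", x_col is df['x'])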
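Whether two selections of the same column are the same Python object is a separate question from whether they point at the same data. The sketch below checks both, assuming a NumPy-backed integer column; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

first = df["x"]
second = df["x"]

# Identity of the wrapper: depends on whether pandas caches the selected Series
same_object = first is second
# Identity of the data: both selections are expected to view the same buffer
shared_buffer = bool(np.shares_memory(first.to_numpy(), second.to_numpy()))

print("same Series object:", same_object)
print("shared buffer:", shared_buffer)
```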
4. .apply() and .map() return a new Series
mapped = df['x'].map(lambda x: x)
print("\nMapped x ID:", id(mapped))
print("Same as original?:", mapped is df['x'])
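Even an identity function produces a new result when passed through `.map()`. A small sketch, assuming an integer Series, that checks both the object and its buffer:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3], name="x")
mapped = s.map(lambda v: v)  # identity function, values unchanged

same_object = mapped is s
shared_buffer = bool(np.shares_memory(s.to_numpy(), mapped.to_numpy()))

print("same Series object:", same_object)
print("shared buffer:", shared_buffer)
```

For a memory-constrained pipeline this means every `.map()` step allocates a result at least as large as its input, regardless of what the function does.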
🔄 Summary Table
| Operation | DataFrame ID | x Column ID | y Column ID |
|---|---|---|---|
| Original | 10001 | 20001 | 20002 |
| copy() | NEW | NEW | NEW |
| assign() | NEW | NEW (data may be shared under Copy-on-Write) | NEW (data may be shared under Copy-on-Write) |
| map() | N/A | NEW | N/A |
🔍 Observations
- Chained operations create multiple new objects
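- Selecting a column does not copy data up front: the returned Series shares its buffer with the DataFrame unless you call `.copy()`
- Mutating in place (e.g., `df["x"] += 1`) changes `df` itself, so take a `.copy()` first if you still need the original values
- Copy-on-Write (opt-in in pandas 2.x, the default in pandas 3.0) changes this substantially: anything derived from another DataFrame or Series behaves like a copy, and data is only duplicated when one side is written to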
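One way to find out how your installed pandas treats in-place writes is to probe it directly. The sketch below writes through a previously extracted Series and then checks whether the parent DataFrame saw the change; the outcome is expected to differ between classic behaviour and Copy-on-Write, so it is reported rather than assumed.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})
x_col = df["x"]  # reference taken before the mutation

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # some versions may warn about this pattern
    x_col.iloc[0] = 99  # write through the extracted Series

propagated = bool(df["x"].iloc[0] == 99)
print("pandas", pd.__version__)
print("write reached the parent DataFrame:", propagated)
```

If this prints `True`, cached Series references in a pipeline can silently change the source frame; if it prints `False`, the write stayed local to `x_col`.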
💡 Takeaways
- If you're memory-constrained, trace your transformations!
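- Use `id()` to debug hidden object duplication, and compare objects that are still referenced, since the ID of a discarded temporary can be reused
- Be cautious when caching or reusing Series references across pipelines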
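To make this kind of tracing repeatable, the checks can be wrapped in a small helper. `trace_columns` below is a hypothetical function written for this post, not part of pandas; it assumes that only NumPy-backed numeric columns are worth a buffer check and reports `None` for everything else.

```python
import numpy as np
import pandas as pd


def trace_columns(before: pd.DataFrame, after: pd.DataFrame) -> pd.DataFrame:
    """Report, per common column, whether the Series object and its buffer are reused."""
    rows = []
    for col in before.columns.intersection(after.columns):
        a, b = before[col], after[col]  # keep both alive during the comparison
        numeric = a.dtype.kind in "iuf" and b.dtype.kind in "iuf"
        rows.append({
            "column": col,
            "same_object": a is b,
            # Only checked for numeric columns; None means "not checked"
            "shared_buffer": bool(np.shares_memory(a.to_numpy(), b.to_numpy())) if numeric else None,
        })
    return pd.DataFrame(rows)


df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
report = trace_columns(df, df.copy())
print(report)
```

Running it between pipeline stages gives a quick view of which steps allocate new data and which only create new wrappers.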