Hello there

My current technology stack: .NET 9, Python, TypeScript, and Azure.

I develop microservices and Terraform infrastructure of different sizes. Here I share my challenges and key learnings.

About

The views expressed in this blog are my own and do not reflect my employer's. I am not responsible for any consequences of using the information provided. This blog is for educational purposes only, not for commercial use. Readers should apply their own judgment.

I Used Python's `id()` to Visualize Memory Bias in My Pandas Pipelines — and It Was Shocking

April 20, 2025, by Dipankar Haldar

📈 Overview

Most data scientists rely on Pandas daily, but few peek under the hood to understand when objects are copied and when they're reused. In this post, I trace memory identity using Python's built-in id() function to uncover how Pandas operations affect memory, performance, and object references.

🔧 Setup

import pandas as pd

# Initial DataFrame
df = pd.DataFrame({
    "x": [1, 2, 3],
    "y": ["a", "b", "c"]
})

print("Original DataFrame ID:", id(df))
print("Original Column IDs:", {col: id(df[col]) for col in df.columns})
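One caveat before tracing anything: Python only guarantees that `id()` is unique among objects that exist at the same time. Two objects with non-overlapping lifetimes may receive the same id value. If `df[col]` builds a fresh Series on each access (as it does under Copy-on-Write), the dict comprehension above records the id of a temporary that is freed right away. The minimal sketch below shows a safer pattern that keeps the Series objects alive before reading their ids. It assumes you only need the ids for comparisons within one script, and the names `temp_ids`, `held` and `live_ids` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

# Fragile: under Copy-on-Write each df[col] can be a temporary that is freed
# right after id() returns, so a later object may reuse the same id value.
temp_ids = {col: id(df[col]) for col in df.columns}

# Safer: hold the Series objects first, then read ids from live objects.
held = {col: df[col] for col in df.columns}
live_ids = {col: id(series) for col, series in held.items()}

print("Temporary IDs:", temp_ids)
print("Live IDs:", live_ids)
```

While `held` exists, equal ids really do mean "same object". The values in `temp_ids` carry no such guarantee.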

🚀 The Memory Trail

Let's trace what happens when we apply common Pandas operations.

1. `.copy()` creates new DataFrame and Series objects

df2 = df.copy()
print("\nCopy DataFrame ID:", id(df2))
print("Copy Column IDs:", {col: id(df2[col]) for col in df2.columns})

Result:

DataFrame ID is new
All column IDs are new
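A new id only tells you that a new Python object exists. It does not tell you whether the data was copied. The sketch below checks the underlying NumPy buffer with `np.shares_memory`, assuming a NumPy-backed column such as `x`. It contrasts the default deep copy with `copy(deep=False)`, which also yields a new DataFrame id but reuses the same data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

deep = df.copy()               # default deep=True: new objects and new buffers
shallow = df.copy(deep=False)  # new DataFrame object, same underlying data

x = df["x"].to_numpy()
print("deep is df:", deep is df)
print("shallow is df:", shallow is df)
print("deep shares x data:", np.shares_memory(x, deep["x"].to_numpy()))
print("shallow shares x data:", np.shares_memory(x, shallow["x"].to_numpy()))
```

Both copies get new ids, yet only the deep copy owns its data. With Copy-on-Write enabled, the shallow copy still starts out sharing data, but a write to either frame triggers a copy instead of leaking into the other.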

2. `.assign()` returns a new DataFrame; untouched columns may share data under Copy-on-Write

df3 = df.assign(z=5)
print("\nAssign DataFrame ID:", id(df3))
print("Assign Column IDs:", {col: id(df3[col]) for col in df3.columns if col in df.columns})

Result:

DataFrame ID is new
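Existing column IDs are new too; a value that seems to repeat is usually a freed temporary's id being reused
Untouched columns' data is copied without Copy-on-Write and shared lazily with it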
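The sketch below avoids comparing printed id numbers. It keeps both `x` Series alive and compares them directly with `is`, then checks data sharing as a separate question. The identity result should hold on any pandas version. The `shares_memory` result is expected to differ by mode (copied without Copy-on-Write, shared with it), so the code prints it instead of assuming it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
df3 = df.assign(z=5)

x_before, x_after = df["x"], df3["x"]  # keep both Series alive while comparing

print("same DataFrame:", df3 is df)
print("same x Series:", x_after is x_before)
print("x data shared:", np.shares_memory(x_before.to_numpy(), x_after.to_numpy()))
print("original has z:", "z" in df.columns)  # assign() does not modify df
```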

3. Column selection may return the same cached Series object

x_col = df['x']
print("\nOriginal x ID:", id(x_col))
# Typically True in pandas < 3.0 (column Series are cached), False under Copy-on-Write
print("x_col is df['x']:", x_col is df['x'])
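The `is` check above depends on whether pandas caches column Series, so a more stable question is whether two selections read the same data. The following minimal sketch assumes the NumPy-backed integer column `x`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

first, second = df["x"], df["x"]  # two selections, both kept alive

# Identity depends on pandas' internal column cache (version and mode dependent).
print("same object:", first is second)
# Either way, both selections read the same column data.
print("same data:", np.shares_memory(first.to_numpy(), second.to_numpy()))
```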

4. `.apply()` and `.map()` return a new Series

mapped = df['x'].map(lambda x: x)
print("\nMapped x ID:", id(mapped))
print("Same as original?:", mapped is df['x'])
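The same buffer check can confirm that `map()` and `apply()` produce genuinely new data, not just new wrapper objects. This short sketch uses identity functions, so the values match while the memory does not.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
original = df["x"]

mapped = original.map(lambda v: v)     # element-wise, returns a new Series
applied = original.apply(lambda v: v)  # also returns a new Series here

for name, result in [("map", mapped), ("apply", applied)]:
    print(
        name,
        "same object:", result is original,
        "same values:", result.tolist() == original.tolist(),
        "shares data:", np.shares_memory(original.to_numpy(), result.to_numpy()),
    )
```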

🔄 Summary Table

Operation	DataFrame ID	x Column ID	y Column ID
Original	10001	20001	20002
`copy()`	NEW	NEW	NEW
`assign()`	NEW	NEW	NEW
`map()`	N/A	NEW	N/A

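The IDs in the Original row are illustrative, and raw `id()` numbers are hard to compare by eye anyway. The helper below, `identity_report`, is hypothetical and not part of pandas. It builds the same kind of summary from live objects, reporting identity with `is` and data sharing with `np.shares_memory`. It assumes NumPy-backed columns. For extension-array columns, `to_numpy()` may build a fresh array, so the data check could report a copy where none happened.

```python
import numpy as np
import pandas as pd


def identity_report(before: pd.DataFrame, after: pd.DataFrame) -> dict[str, str]:
    """Hypothetical helper: compare two frames by identity and by data sharing.

    Column Series are held in local variables during each comparison, so the
    identity checks never involve freed temporaries.
    """
    report = {"DataFrame": "SAME" if after is before else "NEW"}
    for col in before.columns:
        if col not in after.columns:
            report[col] = "N/A"
            continue
        old, new = before[col], after[col]
        if new is old:
            report[col] = "SAME object"
        elif np.shares_memory(old.to_numpy(), new.to_numpy()):
            report[col] = "NEW object, shared data"
        else:
            report[col] = "NEW object, copied data"
    return report


df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
for label, result in [("copy()", df.copy()), ("assign()", df.assign(z=5))]:
    print(label, identity_report(df, result))
```

Separating "new object" from "copied data" makes the `assign()` row informative: the objects are always new, and only the data-sharing column changes with the Copy-on-Write mode.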

🔍 Observations

Chained operations create multiple new objects
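Column selection may return a cached Series (before Copy-on-Write), but transformations such as `map()` always return new objects
Mutating in place (e.g., df["x"] += 1) can also change other references that share the same data, unless you copy first
Copy-on-Write (opt-in in pandas 2.x via `pd.options.mode.copy_on_write = True`, planned as the default in pandas 3.0) will change this drastically: derived objects share data until one of them is modified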
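The in-place mutation point can be made concrete. The sketch below takes an explicit `.copy()` of a column before mutating the DataFrame, which keeps that snapshot independent on any pandas version. A plain reference taken beforehand (`held`, an illustrative name) may or may not see the write, depending on the pandas version and Copy-on-Write mode, so its result is printed rather than assumed.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

held = df["x"]             # plain reference: may or may not see later writes
snapshot = df["x"].copy()  # explicit copy: owns its own data

df["x"] += 1  # mutate the DataFrame's column

print("df['x']:", df["x"].tolist())
print("held:", held.tolist())          # version/mode dependent
print("snapshot:", snapshot.tolist())  # keeps the original values
```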

💡 Takeaways

If you're memory-constrained, trace your transformations!
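Use id() to debug hidden object duplication, but keep the objects alive while comparing, and remember that id() tracks Python objects, not shared data buffers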
Be cautious when caching or reusing Series references across pipelines