I Used Python's `id()` to Visualize Memory Bias in My Pandas Pipelines — and It Was Shocking

April 20, 2025 Dipankar Haldar

📈 Overview

Most data scientists rely on Pandas daily, but few peek under the hood to understand when objects are copied and when they're reused. In this post, I trace memory identity using Python's built-in id() function to uncover how Pandas operations affect memory, performance, and object references.
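
As a quick refresher on what id() reports, here is a minimal sketch using plain Python lists rather than Pandas objects (the variable names are only for illustration). In CPython, id() returns an integer that is unique among objects alive at the same time, so two names bound to one object share an id while a copy gets its own.

```python
# Two names bound to one object share an id; a copy gets a different one.
original = [1, 2, 3]
alias = original            # second name for the same list object
duplicate = list(original)  # new list object with equal contents

same_object = id(alias) == id(original)
distinct_copy = id(duplicate) != id(original)
equal_contents = duplicate == original

print(same_object, distinct_copy, equal_contents)
```

The same distinction (equal contents versus identical object) is what the rest of the post probes on DataFrames and Series.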

🔧 Setup

import pandas as pd

# Initial DataFrame
df = pd.DataFrame({
    "x": [1, 2, 3],
    "y": ["a", "b", "c"]
})

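# Keep references to the Series so the ids below stay comparable:
# id() values are only unique among objects that are alive at the same time.
orig_cols = {col: df[col] for col in df.columns}

print("Original DataFrame ID:", id(df))
print("Original Column IDs:", {col: id(s) for col, s in orig_cols.items()})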

🚀 The Memory Trail

Let's trace what happens when we apply common Pandas operations.

1. .copy() creates new DataFrame and Series objects
df2 = df.copy()
print("\nCopy DataFrame ID:", id(df2))
print("Copy Column IDs:", {col: id(df2[col]) for col in df2.columns})

Result:

  • DataFrame ID is new
  • All column IDs are new
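
A new id() only says the wrapper object is new. To check whether the data itself was duplicated, one option is NumPy's np.shares_memory. The sketch below is an assumption-laden illustration, not part of the original experiment: it rebuilds the same small DataFrame, compares a default (deep) copy with a shallow one, and looks only at the numeric column x, since to_numpy() on other dtypes may itself copy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

deep = df.copy()               # deep=True is the default
shallow = df.copy(deep=False)  # new DataFrame object over the same data

# Compare the underlying buffers of the integer column.
x_buf = df["x"].to_numpy()
deep_shares = np.shares_memory(x_buf, deep["x"].to_numpy())
shallow_shares = np.shares_memory(x_buf, shallow["x"].to_numpy())

print("deep copy shares x buffer:   ", deep_shares)
print("shallow copy shares x buffer:", shallow_shares)
```

I would expect the deep copy to report no shared memory and the shallow copy to report shared memory, even though both are new DataFrame objects with new ids.
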
2. .assign() may reuse untouched columns
df3 = df.assign(z=5)
print("\nAssign DataFrame ID:", id(df3))
print("Assign Column IDs:", {col: id(df3[col]) for col in df3.columns if col in df.columns})

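Result:

  • DataFrame ID is new
  • Existing column IDs are sometimes preserved
  • A matching id() alone does not prove the data is shared: each df[col] access can build a new Series object, and CPython may reuse the id of an object that has already been freed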
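
To separate "same Series object" from "same underlying data" for .assign(), the following sketch holds both Series in variables (so both are alive during the comparison) and checks the buffers with np.shares_memory. Treat it as a probe rather than a statement of what Pandas does: whether the buffer is shared depends on the installed version and on whether Copy-on-Write is enabled, so no expected output is given.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
df3 = df.assign(z=5)

# Hold both Series in variables so both objects are alive when compared.
x_before = df["x"]
x_after = df3["x"]

same_wrapper = x_before is x_after
same_buffer = np.shares_memory(x_before.to_numpy(), x_after.to_numpy())

print("pandas version:", pd.__version__)
print("same Series object:", same_wrapper)
print("same underlying buffer:", same_buffer)
```
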
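3. Column selection can return the same Series object (unless copied)
x_col = df['x']
print("\nOriginal x ID:", id(x_col))
# Identity check: True when pandas hands back a cached Series, False when
# each access builds a new one (for example with Copy-on-Write enabled)
print("x_col is df['x']:", x_col is df['x'])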
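
Because the identity check can go either way, a buffer-level comparison is a useful companion. This is a minimal sketch under the assumption that x is a NumPy-backed numeric column: two selections of the same column should share memory whether or not they are the same Python object, while an explicit .copy() should not.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

first = df["x"]
second = df["x"]
independent = df["x"].copy()  # explicit deep copy of the column

same_wrapper = first is second  # varies with pandas version / Copy-on-Write
same_buffer = np.shares_memory(first.to_numpy(), second.to_numpy())
copy_shares = np.shares_memory(first.to_numpy(), independent.to_numpy())

print("same Series object:", same_wrapper)
print("selections share a buffer:", same_buffer)
print("explicit copy shares a buffer:", copy_shares)
```
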
4. .apply() and .map() often create new Series
mapped = df['x'].map(lambda x: x)
print("\nMapped x ID:", id(mapped))
print("Same as original?:", mapped is df['x'])
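
The same check can be applied to .map(). In this sketch (again an illustration on the small example frame, with an identity lambda standing in for a real transformation), the mapped Series is a different object with its own buffer even though every value is unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

source = df["x"]
mapped = source.map(lambda v: v)  # identity function, values unchanged

new_object = mapped is not source
shares = np.shares_memory(source.to_numpy(), mapped.to_numpy())
same_values = mapped.tolist() == source.tolist()

print("new Series object:", new_object)
print("shares buffer with source:", shares)
print("values unchanged:", same_values)
```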

🔄 Summary Table

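| Operation | DataFrame ID | x Column ID      | y Column ID      |
|-----------|--------------|------------------|------------------|
| Original  | 10001        | 20001            | 20002            |
| copy()    | NEW          | NEW              | NEW              |
| assign()  | NEW          | SAME (sometimes) | SAME (sometimes) |
| map()     | N/A          | NEW              | N/A              |

(The IDs in the Original row are illustrative placeholders; real values change on every run.)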

🔍 Observations

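  • Each step in a method chain returns a new DataFrame or Series object, so long chains create many intermediates
  • Selecting a column often gives access to the existing data rather than a copy, until something modifies it
  • Mutating in place (e.g., df["x"] += 1) can be visible through other references to the same data, so copy first if you need the original values
  • Copy-on-Write (opt-in in pandas 2.x via pd.options.mode.copy_on_write = True, and planned as the default in pandas 3.0) changes this substantially: objects derived from a DataFrame behave as copies, with the actual copy deferred until a write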
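
The in-place mutation point can be probed with a short sketch like the one below. It is an illustration on the example frame, not a definitive account of Pandas internals: whether the earlier reference x_ref sees the update depends on the pandas version and on Copy-on-Write, so that line is printed without an expected value. The explicit .copy() keeps the original values in either case.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

x_ref = df["x"]              # plain reference taken before the mutation
x_snapshot = df["x"].copy()  # explicit copy, independent of df

df["x"] += 1                 # in-place style update through the DataFrame

print("df['x'] now:   ", df["x"].tolist())
print("x_ref sees:    ", x_ref.tolist())  # varies with version / Copy-on-Write
print("x_snapshot has:", x_snapshot.tolist())
```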

💡 Takeaways

  • If you're memory-constrained, trace your transformations!
  • Use id() to spot new wrapper objects, and pair it with np.shares_memory() to check whether the underlying data was actually duplicated
  • Be cautious when caching or reusing Series references across pipelines
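
One way to put these takeaways into practice is a small tracing helper. The sketch below is hypothetical (trace is my own name, not a Pandas API) and assumes NumPy-backed numeric columns, where to_numpy() exposes the existing buffer instead of copying. It records both the wrapper id and the address of the data buffer, so the two can be compared side by side.

```python
import pandas as pd


def trace(label, series):
    """Hypothetical helper: report the wrapper id and data-buffer address."""
    buf = series.to_numpy()
    return {
        "label": label,
        "object_id": id(series),
        "buffer_addr": buf.__array_interface__["data"][0],
    }


df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
df2 = df.copy()

# Keep the Series in variables so both stay alive while they are compared.
x_orig, x_copy = df["x"], df2["x"]
report = [trace("original", x_orig), trace("copy", x_copy)]
for row in report:
    print(row)
```

Comparing buffer_addr across pipeline steps shows where data was really duplicated, while object_id only shows where a new Python object was created.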