Originally posted on towardsdatascience
Python is an object-oriented programming language, but it can behave strangely. If you come from other OOP languages, this post may benefit you.
In chapter 8 of Fluent Python, Luciano Ramalho discusses how Python objects work under the hood. Here we will define the fundamental concept behind variable storage in Python and explore some relevant notes.
Without further ado, let’s dive in.
1 — Python Variables are not Boxes
Tip: python variables are labels for a value.
In programming 101, we’re often taught that variables are boxes that store a value. For instance, the box a stores the list [1,2,3]. However, in Python this is not the case.
In figure 1, variable names (blue) are “labels” for a location in memory. And note there can be multiple variable names per object, as shown on the left. Locations in memory (black) are the “box,” not the name itself. Those boxes store values (red). Let’s walk through an example.
var_name = 1
var_name      # 1
id(var_name)  # 4443642160
In the above snippet, the variable var_name is assigned the value of 1. The “box” that stores the value is the memory location with id 4443642160.
Now, let’s add another variable to the same location in memory…
another_var_name = var_name
id(var_name)                          # 4443642160
var_name                              # 1
id(var_name) == id(another_var_name)  # True
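As we can see, we have two Python variables that reference the same object. However, as soon as we rebind our second variable, for example by running another_var_name += 1, that name points to a new memory location, 4443642192, and the two names now reference different objects. Integers are immutable, so += creates a new object instead of changing the existing one.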
In python, it’s more conceptually intuitive to think that variable names are assigned to locations in memory. And those locations in memory store values.
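The label idea can be checked with a small experiment. This is only a sketch; the names first, second, count, and alias are made up for illustration. It contrasts mutating a shared list (both labels see the change) with rebinding a name that pointed to an immutable int (only that label moves).

```python
# Two labels attached to one list object.
first = [1, 2, 3]
second = first

# Mutating through one label is visible through the other,
# because both labels still point at the same object.
second.append(4)
still_same_object = first is second
first_sees_change = first == [1, 2, 3, 4]

# Ints are immutable, so += cannot change the object in place.
# It builds a new int and moves only the label `alias` to it.
count = 1000
alias = count
alias += 1
labels_diverged = count is not alias

print(still_same_object, first_sees_change, labels_diverged)
```

The list case and the int case differ only in whether the object can be changed in place; the labeling rule is the same in both.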
2 — Object Equality
Tip: use == to determine if two variables have equal values and is to determine if two variables share a location in memory.
With the setup in section 1, this section will be brief.
== determines if variables have the same value; it’s like the .equals() method in Java. However, if you are truly looking to determine if variables are the same object, you should use is to test memory ids.
a = 1
b = 1
c = a
a == b == c  # True
a is c       # True
Now here’s an interesting side note. We’d expect a is b to evaluate to False. However, sometimes it’s True.
Amazingly, to save space Python sometimes assigns variables to the same object if the object values are the same. This saves memory because, in our case, you don’t need to create separate instances of the integer 1; you can just reference the same object. This is an implementation detail (CPython, for example, caches small integers), so don’t rely on it for equality checks.
Now as soon as we assign a different value to any of these variables, that name will point to a different location in memory.
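To make the distinction concrete without relying on integer caching, here is a minimal sketch using a hypothetical Point class (not from the book) that defines its own __eq__, plus two separately built lists.

```python
class Point:
    """Hypothetical value class used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # == compares values; identity is not consulted here.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


p1 = Point(1, 2)
p2 = Point(1, 2)   # equal value, separate object
p3 = p1            # second label for the same object

equal_values = p1 == p2
same_object = p1 is p2
alias_is_same = p1 is p3

# Two list literals always produce two distinct list objects.
list_a = [1, 2, 3]
list_b = [1, 2, 3]
lists_equal = list_a == list_b
lists_identical = list_a is list_b

print(equal_values, same_object, alias_is_same, lists_equal, lists_identical)
```

In day-to-day code, is tends to be reserved for singletons such as None, while == covers value comparisons.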
3 — Copy vs. Deep Copy
Tip: if you don’t know what you’re doing, use deepcopy.
If you’ve worked with pandas, you’ve probably seen and/or used deepcopy.
deepcopy only impacts you when working with compound objects (objects that contain other objects, like lists or class instances). But for some programming paradigms, compound objects are common.
Here’s the difference between a deep copy and a shallow copy:
- A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.
- A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
An example should help clear this up…
# Example 8.9 from Fluent Python
from copy import copy, deepcopy

class Bus:
    def __init__(self, passengers=None):
        if passengers is None:
            self.passengers = []
        else:
            self.passengers = list(passengers)

    def pick(self, name):
        self.passengers.append(name)

    def drop(self, name):
        self.passengers.remove(name)

bus1 = Bus(['Alice', 'Bill', 'Claire', 'David'])
bus2 = copy(bus1)      # shallow copy: shares bus1's passengers list
bus3 = deepcopy(bus1)  # deep copy: gets its own passengers list
bus1.drop('Bill')

bus1.passengers  # ['Alice', 'Claire', 'David']
bus2.passengers  # ['Alice', 'Claire', 'David']
bus3.passengers  # ['Alice', 'Bill', 'Claire', 'David']
The Bus class creates a list of passengers, thereby making each class instance a compound object.
As we can see, bus2 is a shallow copy of bus1 and bus3 is a deep copy. When objects in bus1 are changed, that will affect bus2, but bus3 is completely unaffected.
That’s why it’s safer to use deepcopy: you will get a completely new copy of an object’s values, including the objects nested inside it.
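The same behavior shows up with plain nested lists, which may be easier to experiment with than a class. The sketch below uses a made-up grid variable; it mutates both an inner list and the outer list to show which changes a shallow copy shares.

```python
from copy import copy, deepcopy

grid = [[1, 2], [3, 4]]
shallow = copy(grid)      # new outer list, same inner lists
deep = deepcopy(grid)     # new outer list and new inner lists

grid[0].append(99)        # change an inner list
grid.append([5, 6])       # change the outer list

# The shallow copy shares inner lists, so it sees the 99,
# but it has its own outer list, so it does not see [5, 6].
shares_inner = shallow[0] is grid[0]
print(shallow)   # [[1, 2, 99], [3, 4]]

# The deep copy is unaffected by either change.
print(deep)      # [[1, 2], [3, 4]]
```

A deep copy costs more time and memory on large structures, so a shallow copy can still be the better fit when the nested objects are never mutated.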
4 — Function Parameters
Tip: if you set an empty mutable variable as a python function default, be careful.
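Python parameters are passed using a system called call by sharing, which means that each parameter receives a copy of the reference to the argument, not a copy of the object. Inside the function, the parameter is an alias for the caller’s object, so mutating it affects the caller, while rebinding it does not.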
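Call by sharing can be observed with two small functions. This is a sketch with hypothetical names (append_item, rebind): one mutates the object its parameter refers to, the other only rebinds its local name.

```python
def append_item(items):
    # `items` is another label for the caller's list,
    # so this mutation is visible to the caller.
    items.append("added")


def rebind(items):
    # Assignment moves only the local label to a new list;
    # the caller's list is left alone.
    items = ["replaced"]
    return items


original = ["start"]
append_item(original)
after_mutation = list(original)   # snapshot for inspection

result = rebind(original)

print(after_mutation)   # ['start', 'added']
print(original)         # ['start', 'added']
print(result)           # ['replaced']
```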
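If you use an empty list as a default, for instance def my_func(x=[]): and then mutate x, you will be editing the single default list object, which is created once when the function is defined and shared by every call that omits x, and not a fresh list for each call.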
The recommended method in the book is to specify defaults at the beginning of the function through case logic…
def my_func(x=None):
    if x is None:
        x = []  # a fresh list on every call that omits x
Personally, I think the above method is overkill and isn’t great for style; however, it’s your call.
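To see why the None pattern exists, it may help to compare both versions side by side. The function names risky_append and safe_append below are invented for this sketch.

```python
def risky_append(item, bucket=[]):
    # The default list is built once, at definition time,
    # and reused by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def safe_append(item, bucket=None):
    # None is a sentinel; a new list is built on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first_risky = risky_append("a")
second_risky = risky_append("b")
shared_default = first_risky is second_risky

first_safe = safe_append("a")
second_safe = safe_append("b")

print(second_risky)    # ['a', 'b']
print(shared_default)  # True
print(first_safe)      # ['a']
print(second_safe)     # ['b']
```

The shared list lives on the function object itself and can be inspected through risky_append.__defaults__.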
5 — Garbage Collection
Tip: Python objects are “garbage collected” when they run out of references.
For our final tip, let’s talk about how python garbage collection works.
Each time we assign a variable name to an object, we give it a reference. So a=1; b=a means that the memory location storing 1 has a reference count of 2.
Now let’s say that we assign both a=None; b=None. Now, because there are no variables referencing the object with a value 1, that memory location can be garbage collected.
If the reference count is 0, the memory location is wiped for reuse.
Pretty simple, right?
Now to cap this off, let’s also talk about the del keyword. del is used to delete variable names, not the objects themselves. By deleting the name of our variable, it frees up that name for later use and thereby avoids potential name conflicts. However, on the backend, if del caused the reference count to be 0, then the object will be deleted as well.
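Reference counting can be observed directly in CPython. The sketch below assumes CPython, where an object is reclaimed as soon as its count reaches zero; other interpreters may delay collection. It uses a hypothetical Payload class in place of the integer 1, since CPython caches small integers and their counts are not meaningful. sys.getrefcount reports one extra reference for its own argument, so only the difference between readings matters.

```python
import sys
import weakref


class Payload:
    """Hypothetical placeholder object for this illustration."""


events = []
obj = Payload()
# Run events.append("collected") when the object is reclaimed.
# finalize holds only a weak reference, so it does not keep obj alive.
weakref.finalize(obj, events.append, "collected")

alias = obj
count_with_two_names = sys.getrefcount(obj)

del alias                      # removes a name, not the object
count_with_one_name = sys.getrefcount(obj)
collected_early = list(events)

del obj                        # last reference gone
collected_late = list(events)

print(count_with_two_names - count_with_one_name)  # 1
print(collected_early)   # []
print(collected_late)    # ['collected'] on CPython
```

The first del leaves the object alive because obj still refers to it; only the second del drops the count to zero.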