5 Advanced Tips on Python Objects

Originally posted on towardsdatascience

Python is an object-oriented programming language but can behave strangely. If you come from other OOP languages, this post may benefit you.

In chapter 8 of Fluent Python, Luciano Ramalho discusses how Python objects work under the hood. Here we will define the fundamental concept behind variable storage in Python and explore some relevant notes.

Without further ado, let’s dive in.

1 — Python Variables are not Boxes

Tip: python variables are labels for a value.

In programming 101, we’re often taught that variables are boxes that store a value. For instance, the box a stores the list [1,2,3]. However, in python this is not the case.

In figure 1, variable names (blue) are “labels” for a location in memory. And note there can be multiple variable names per object, as shown on the left. Locations in memory (black) are the “box,” not the name itself. Those boxes store values (red). Let’s walk through an example.

var_name = 1
var_name         # 1
id(var_name)     # 4443642160

In the above snippet, the variable var_name is assigned the value 1. The “box” that stores the value is identified by id(), which in CPython is the object’s memory address: 4443642160 in this run.

Now, let’s add another variable to the same location in memory…

another_var_name = var_name
id(another_var_name)                    # 4443642160
another_var_name                        # 1
id(var_name) == id(another_var_name)    # True

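As we can see, we have two Python variables that reference the same object. However, integers are immutable, so they can’t be changed in place. As soon as we rebind the second variable, for example by running another_var_name += 1, that name points to a new object (at 4443642192 in this run), and the two names no longer refer to the same object.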

In python, it’s more conceptually intuitive to think that variable names are assigned to locations in memory. And those locations in memory store values.
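
To make the label idea concrete, here is a minimal sketch that contrasts mutating a shared object with rebinding a name. The variable names are illustrative and not from the book.

```python
# Two labels on one mutable object: a change made through one
# label is visible through the other.
first = [1, 2, 3]
second = first
second.append(4)
still_shared = first is second     # True: both names label one list
first                              # [1, 2, 3, 4]

# Rebinding: += on an int builds a new int and moves the label,
# so the other name keeps pointing at the old object.
count = 1
alias = count
alias += 1
count                              # 1
alias                              # 2
```

Lists can be changed in place, so both labels see the update. Integers cannot, so += simply moves one label to a different object.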

2 — Object Equality

Tip: use == to determine if two variables have equal values and is to determine if two variables share locations in memory.

With the setup in section 1, this section will be brief.

== determines if variables have the same value, much like the .equals() method in Java. However, if you are truly looking to determine if variables are the same object, you should use is, which compares object identities (the values returned by id()).

a = 1
b = 1
c = a
a == b == c      # True
a is c           # True

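Now here’s an interesting side note. We might expect a is b to evaluate to False, since a and b were assigned separately. However, sometimes it’s True, and in CPython it is True here.

To save space, Python sometimes binds variables to the same object if the object values are the same. CPython, for example, caches small integers and some strings. This saves memory because, in our case, you don’t need to create separate instances of the integer 1; you can just reference the same object. This is an implementation detail, so don’t rely on is to compare values.

Integers are immutable, so none of these variables can be changed in place. As soon as we rebind one of them to a different value, that name will refer to a different object.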
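
The difference is easier to see with lists, which are never cached or shared automatically. The following is a small sketch with illustrative names:

```python
list_a = [1, 2, 3]
list_b = [1, 2, 3]      # same contents, but built separately
list_c = list_a         # a second label for the same object

equal_values = list_a == list_b      # True: contents match
same_object_ab = list_a is list_b    # False: two distinct lists
same_object_ac = list_a is list_c    # True: one list, two names

# A common idiomatic use of `is`: checking for the None singleton.
result = None
is_missing = result is None          # True
```

In everyday code, == is usually the right choice for comparing values, while is fits singletons such as None.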

3 — Copy vs. Deep Copy

Tip: if you don’t know what you’re doing, use deepcopy().

If you’ve worked with pandas, you’ve probably seen and/or used df.copy(). Using copy vs. deepcopy only impacts you when working with compound objects (objects that contain other objects, like lists or class instances). But for some programming paradigms, compound objects are common.

Here’s the difference between a deep copy and a shallow copy:

  • deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.
  • shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.

An example should help clear this up…

from copy import copy, deepcopy

class Bus:
  def __init__(self, passengers=None): 
    if passengers is None:
      self.passengers = [] 
    else:
      self.passengers = list(passengers) 

  def pick(self, name):
    self.passengers.append(name) 

  def drop(self, name):
    self.passengers.remove(name)
    
bus1 = Bus(['Alice', 'Bill', 'Claire', 'David'])
bus2 = copy(bus1)
bus3 = deepcopy(bus1)

bus1.drop('Bill')

bus1.passengers       # ['Alice', 'Claire', 'David']
bus2.passengers       # ['Alice', 'Claire', 'David']
bus3.passengers       # ['Alice', 'Bill', 'Claire', 'David']

# Example 8.9 from Fluent Python

The Bus class creates a list of passengers, thereby making each class instance a compound object.

As we can see, bus2 is a shallow copy of bus1 whereas bus3 is a deep copy. bus2 shares the same passengers list as bus1, so when that list is changed through bus1, the change will affect bus2. However, bus3 has its own copy of the list and is completely unaffected.

That’s why it’s safer to use deepcopy — you will get a completely new copy of an object’s values.
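
The same behavior shows up with plain nested lists. The sketch below uses a hypothetical routes list (not from the book) to show that a shallow copy gets its own outer container but shares the inner objects:

```python
from copy import copy, deepcopy

routes = [['Alice', 'Bill'], ['Claire']]
shallow = copy(routes)        # new outer list, same inner lists
deep = deepcopy(routes)       # new outer list and new inner lists

routes.append(['David'])      # outer-level change: neither copy sees it
routes[0].remove('Bill')      # inner-level change: the shallow copy sees it

shallow                       # [['Alice'], ['Claire']]
deep                          # [['Alice', 'Bill'], ['Claire']]
inner_shared = shallow[0] is routes[0]   # True
inner_copied = deep[0] is routes[0]      # False
```

So a shallow copy protects you from changes to the container itself, but not from changes to the objects it holds.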

4 — Function Parameters

Tip: if you set an empty mutable variable as a python function default, be careful.

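Python parameters are passed using a system called call by sharing, which means that each parameter receives a copy of the reference to its argument, not a copy of the object. Inside the function, the parameter is an alias for the caller’s object, so mutating it changes the object the caller sees.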
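
Here is a rough sketch of what call by sharing means in practice. The function names add_stop and replace_stops are made up for illustration:

```python
def add_stop(stops, name):
    # Mutates the object the caller passed in.
    stops.append(name)

def replace_stops(stops):
    # Rebinds only the local name; the caller's list is untouched.
    stops = ['Depot']
    return stops

route = ['Main St']
add_stop(route, 'Oak Ave')
route                              # ['Main St', 'Oak Ave']

new_route = replace_stops(route)
route                              # still ['Main St', 'Oak Ave']
new_route                          # ['Depot']
```

A function can change a mutable argument in place, but assigning to the parameter name inside the function never affects the caller’s variable.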

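If you use an empty list as a default, for instance def my_func(x=[]):, and then mutate x, you will be editing the default object itself. Python evaluates the default once, when the function is defined, and reuses that same list on every call that omits x, so changes leak from one call to the next.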
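
The pitfall is easy to reproduce. In this sketch, append_shared is a hypothetical function written only to show the behavior:

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined.
    bucket.append(item)
    return bucket

first_call = append_shared('a')
second_call = append_shared('b')

same_list = first_call is second_call   # True: both calls used one list
second_call                             # ['a', 'b']

# The shared default lives on the function object itself.
append_shared.__defaults__              # (['a', 'b'],)
```

The second call did not start from an empty list, because the first call had already modified the shared default.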

The recommended method in the book is to use None as the default and create the list at the beginning of the function through conditional logic…

def my_func(x=None):
  if x is None:
    x = []

Personally, I think the above method is overkill and isn’t great for style; however, it’s your call.

5 — Garbage Collection

Tip: Python objects are “garbage collected” when they run out of references.

For our final tip, let’s talk about how python garbage collection works.

Each time we assign a variable name to an object, we give it a reference. So a=1; b=a means that the object storing 1 has gained two references from our code. (CPython also keeps its own references to small integers like 1, so picture a freshly created object if you want the count to be exactly 2.)

Now let’s say that we assign both a and b to None e.g. a=None; b=None. Now, because there are no variables referencing the object, it can be garbage collected.

In CPython, once the reference count reaches 0, the object is destroyed and its memory is freed for reuse.

Pretty simple, right?

Now to cap this off, let’s also talk about the del keyword. del is used to delete variable names, not the object itself. Deleting a name frees it up for later use and thereby avoids potential name conflicts. However, on the backend, if del causes the reference count to drop to 0, then the object will be deleted as well.
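
The reference counting described above can be observed directly, at least in CPython. This sketch uses a hypothetical Payload class and relies on CPython destroying an object as soon as its last reference disappears; other Python implementations may delay the cleanup.

```python
import sys
import weakref

class Payload:
    """Hypothetical placeholder class, used only for this demo."""

events = []
a = Payload()
# Ask to be notified when the object is destroyed (holds no strong reference).
weakref.finalize(a, events.append, 'payload destroyed')

# getrefcount also counts its own argument, so compare before and after.
count_one_name = sys.getrefcount(a)
b = a
count_two_names = sys.getrefcount(a)
added = count_two_names - count_one_name        # 1: the new name b

del a                                           # removes a name only
alive_after_first_del = (events == [])          # True: b still refers to it
del b                                           # last reference is gone
destroyed = (events == ['payload destroyed'])   # True in CPython
```

The first del removes only a label. The object is destroyed after the second del, once no names refer to it.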
