Understand __slots__ in Python

Originally posted on towardsdatascience.

A simple way to improve your Python code

When we create an object from a class, the attributes of the object will be stored in a dictionary called __dict__. We use this dictionary to get and set attributes. It allows us to dynamically create new attributes after the creation of the object.

Let’s create a simple class Article that initially has 2 attributes date and writer. If we print out __dict__ of the object, we will get the key and value of each attribute. Meanwhile, we also print __dict__ of the class which will be needed later. After that, a new attribute reviewer is added to the object, and we can see it in the updated __dict__.

class Article:
    def __init__(self, date, writer):
        self.date = date
        self.writer = writer
article = Article("2020-06-01","xiaoxu")
print(article.__dict__)
# {'date': '2020-06-01', 'writer': 'xiaoxu'}
print(Article.__dict__)
# {'__module__': '__main__', '__init__': <function Article.__init__ at 0x10d28f0e0>, 
#  '__dict__': <attribute '__dict__' of 'Article' objects>, '__weakref__': <attribute '__weakref__' of 'Article' objects>, 
#  '__doc__': None}
article.reviewer = "jojo"
print(article.__dict__)
# {'date': '2020-06-01', 'writer': 'xiaoxu', 'reviewer': 'jojo'}
print(article.reviewer)
# jojo

Good enough?

Well, we can’t say this is bad until we find a better solution. Dictionaries are very powerful in Python, but when it comes to creating thousands or millions of objects, we might face some issues:

  1. Dictionaries need memory. Millions of objects, each with its own dictionary, can consume a lot of RAM.
  2. A dictionary is in fact a hash map. The worst-case time complexity of get/set in a hash map is O(n).

__slots__ solution

From Python documentation: __slots__ allows us to explicitly declare data members (like properties) and deny the creation of __dict__ and __weakref__ (unless explicitly declared in __slots__ or available in a parent.)

So, how does it relate to the issues I’ve mentioned?

Let’s create a class ArticleWithSlots. The only difference between the two classes is the extra field __slots__.

class ArticleWithSlots:
    __slots__ = ["date", "writer"]

    def __init__(self, date, writer):
        self.date = date
        self.writer = writer

__slots__ is created on the class level, which means if we print ArticleWithSlots.__dict__, we should be able to see it. Besides, we also see 2 extra attributes on the class level, date: <member 'date' ..> and writer: <member 'writer' ..>, which belong to class member_descriptor.

print(ArticleWithSlots.__dict__)
# {'__module__': '__main__', '__slots__': ['date', 'writer'], '__init__': <function ArticleWithSlots.__init__ at 0x103f6c290>, 
# 'date': <member 'date' of 'ArticleWithSlots' objects>, 'writer': <member 'writer' of 'ArticleWithSlots' objects>, 
#  '__doc__': None}
print(ArticleWithSlots.date.__class__)
# <class 'member_descriptor'>

What is a descriptor in Python?

Before we talk about descriptors, we should understand the default behaviour of accessing attributes in Python. When you do article.writer, Python calls the method __getattribute__(), which looks the name up in the instance dictionary (self.__dict__["writer"]) and returns the value.

If the looked-up attribute is an object with one of the descriptor methods, then the default behaviour is overridden by the descriptor method.

screenshot of ArticleWithSlots class

Descriptor methods include __get__(), __set__() and __delete__(). A descriptor is simply a Python object that implements at least one of these methods.
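To make the idea concrete, here is a minimal sketch of a hand-written descriptor. NonEmptyString and ReviewedArticle are hypothetical names made up for this illustration. They are not part of __slots__, which generates its own descriptors in C.

```python
class NonEmptyString:
    """Hypothetical data descriptor: rejects empty values on assignment."""

    def __set_name__(self, owner, name):
        # Keep the real value under a private name in the instance __dict__.
        self.private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class: return the descriptor itself
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        if not value:
            raise ValueError("value must not be empty")
        setattr(instance, self.private_name, value)


class ReviewedArticle:
    writer = NonEmptyString()

    def __init__(self, writer):
        self.writer = writer  # routed through NonEmptyString.__set__


article = ReviewedArticle("xiaoxu")
print(article.writer)
# xiaoxu
print(article.__dict__)
# {'_writer': 'xiaoxu'}
```

Assigning an empty string, for example article.writer = "", raises ValueError because the assignment goes through __set__() instead of writing straight into __dict__.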

__slots__ automatically creates a descriptor for each attribute, with an implementation of the descriptor methods. You can find them in the screenshot. It means that the object uses __get__(), __set__() and __delete__() to interact with these attributes instead of the default behaviour.

According to Guido van Rossum, the implementation of __get__() and __set__() uses an array instead of a dictionary, and it is implemented entirely in C, which is highly efficient.

__slots__ has faster attribute access

In the following code, I compare the object creation time and attribute access time of Article and ArticleWithSlots. In my measurements, ArticleWithSlots is around 10% faster to create and around 4% faster on attribute access.

@Timer()
def create_object(cls, size):
    for _ in range(size):
        article = cls("2020-01-01", "xiaoxu")

create_object(Article, 1000000)
# 0.755430193 seconds
create_object(ArticleWithSlots, 1000000)
# 0.6753360239999999 seconds

@Timer()
def access_attribute(obj, size):
    for _ in range(size):
        writer = obj.writer
        
article = Article("2020-01-01", "xiaoxu")
article_slots = ArticleWithSlots("2020-01-01", "xiaoxu")
access_attribute(article, 1000000)
# 0.06791842000000003 seconds
access_attribute(article_slots, 1000000)
# 0.06492474199999987 seconds
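The Timer decorator used above is not defined in this article. As a rough stand-in, the same comparison can be sketched with the standard-library timeit module. The helper names time_creation and time_access are my own, and the numbers will differ between machines and Python versions, so no expected output is shown.

```python
import timeit


class Article:
    def __init__(self, date, writer):
        self.date = date
        self.writer = writer


class ArticleWithSlots:
    __slots__ = ["date", "writer"]

    def __init__(self, date, writer):
        self.date = date
        self.writer = writer


def time_creation(cls, number=100_000):
    """Return the seconds needed to create `number` instances of cls."""
    return timeit.timeit(lambda: cls("2020-01-01", "xiaoxu"), number=number)


def time_access(obj, number=100_000):
    """Return the seconds needed to read obj.writer `number` times."""
    return timeit.timeit(lambda: obj.writer, number=number)


for cls in (Article, ArticleWithSlots):
    creation = time_creation(cls)
    access = time_access(cls("2020-01-01", "xiaoxu"))
    print(f"{cls.__name__}: create {creation:.4f}s, access {access:.4f}s")
```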

__slots__ performs slightly better because the time complexity of a get/set operation in a list is better than in a dictionary in the worst case. As O(n) only happens in the worst case, we will not notice the difference most of the time, especially with a small volume of data.

data source: wiki.python.org

__slots__ reduces RAM usage

Since attributes can be accessed as data members (like properties), there is no need to store them in the dictionary __dict__. Actually, __slots__ prevents the creation of __dict__ entirely. So if you print article_slots.__dict__, you will get an AttributeError exception.

article_slots = ArticleWithSlots("2020-01-01", "xiaoxu")
print(article_slots.__dict__)
# AttributeError: 'ArticleWithSlots' object has no attribute '__dict__'

And this behavior reduces the RAM usage of an object. I will compare the size of article and article_slots using pympler. The reason for not using sys.getsizeof() is that getsizeof() doesn’t include the size of referenced objects, and __dict__ is a referenced object, so it would be ignored by getsizeof().

from pympler import asizeof
import sys

a = {"key":"value"}
b = {"key":{"key2":"value"}}

print(sys.getsizeof(a))
# 248
print(sys.getsizeof(b))
# 248
print(asizeof.asizeof(a))
# 360
print(asizeof.asizeof(b))
# 664

It turns out that article_slots saves more than 50% of the memory. Wow, such an amazing improvement!

from pympler import asizeof

article = Article("2020-01-01", "xiaoxu")
article_slots = ArticleWithSlots("2020-01-01", "xiaoxu")

print(asizeof.asizeof(article))
# 416
print(asizeof.asizeof(article_slots))
# 184

This saving comes from the fact that article_slots doesn’t have a __dict__ attribute, which saves a lot of memory.
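If pympler is not available, a similar comparison can be sketched with the standard-library tracemalloc module, measuring many objects at once instead of a single one. This is only an approximation under my own assumptions: the class names PlainArticle and SlottedArticle, the measure helper and the count of 50,000 are illustrative, and the exact byte counts depend on the Python version.

```python
import tracemalloc


class PlainArticle:
    def __init__(self, date, writer):
        self.date = date
        self.writer = writer


class SlottedArticle:
    __slots__ = ["date", "writer"]

    def __init__(self, date, writer):
        self.date = date
        self.writer = writer


def measure(cls, count=50_000):
    """Return the bytes still allocated after creating `count` instances."""
    tracemalloc.start()
    articles = [cls("2020-01-01", "xiaoxu") for _ in range(count)]
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current


plain_bytes = measure(PlainArticle)
slotted_bytes = measure(SlottedArticle)
print(f"without __slots__: {plain_bytes} bytes")
print(f"with __slots__:    {slotted_bytes} bytes")
```

On CPython the second number should come out smaller, although the ratio will not necessarily match the pympler figures above.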

When to use and not use __slots__?

So far, it looks like __slots__ is such a nice feature. Can we add it to every class?

The answer is NO! Apparently, there are some trade-offs.

Fixed attributes

One of the reasons to use __dict__ is its flexibility after creating the object where you can add new attributes. However, __slots__ will fix the attributes when you create the class. So, it’s not possible to add new attributes later.

article_slots = ArticleWithSlots("2020-01-01", "xiaoxu")
article_slots.reviewer = "jojo"
# AttributeError: 'ArticleWithSlots' object has no attribute 'reviewer'

But …

In some cases you want to take advantage of __slots__ and also keep the flexibility of adding new attributes at runtime. You can achieve this by adding __dict__ inside __slots__ as an attribute. Only the newly added attributes will appear in __dict__. This can be useful when your class has 10+ fixed attributes and you want to have 1 or 2 dynamic attributes later.

class ArticleWithSlotsAndDict:
    __slots__ = ["date", "writer", "__dict__"]

    def __init__(self, date, writer):
        self.date = date
        self.writer = writer

article_slots_dict = ArticleWithSlotsAndDict("2020-01-01", "xiaoxu")
print(article_slots_dict.__dict__)
# {}
article_slots_dict.reviewer = "jojo"
print(article_slots_dict.__dict__)
# {'reviewer': 'jojo'}
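The documentation quote earlier also mentions __weakref__, and the same opt-in applies there. The following sketch, with class names invented for illustration, shows that a slotted object cannot be weakly referenced unless "__weakref__" is listed in __slots__.

```python
import weakref


class SlotsOnly:
    __slots__ = ["date"]


class SlotsWithWeakref:
    # Listing "__weakref__" re-enables weak references for instances.
    __slots__ = ["date", "__weakref__"]


try:
    weakref.ref(SlotsOnly())
    weakref_supported = True
except TypeError:
    # Raised because SlotsOnly instances have no __weakref__ slot.
    weakref_supported = False
print(weakref_supported)
# False

article = SlotsWithWeakref()
reference = weakref.ref(article)
print(reference() is article)
# True
```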

Inheritance

If you inherit from a class that defines __slots__, you don’t have to repeat those attributes in the subclass. If you do repeat them, the subclass will take up more space, and the parent’s copy of each repeated attribute becomes inaccessible through normal attribute access.

class ArticleBase:
    __slots__ = ["date", "writer"]

class ArticleAdvanced(ArticleBase):
    __slots__ = ["reviewer"]

article = ArticleAdvanced()
article.writer = "xiaoxu"
article.reviewer = "jojo"
print(ArticleBase.writer.__get__(article))
# xiaoxu
print(ArticleAdvanced.reviewer.__get__(article))
# jojo
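To see the cost of repeating slot names, here is a small sketch that compares __basicsize__, the per-instance size CPython reserves for a class. LeanArticle and RepeatedArticle are hypothetical names for this illustration, and the concrete byte values depend on the platform, so they are not shown.

```python
class ArticleBase:
    __slots__ = ["date", "writer"]


class LeanArticle(ArticleBase):
    # Only the new attribute is declared.
    __slots__ = ["reviewer"]


class RepeatedArticle(ArticleBase):
    # "date" and "writer" are declared again, which allocates extra slots.
    __slots__ = ["date", "writer", "reviewer"]


print(LeanArticle.__basicsize__, RepeatedArticle.__basicsize__)

article = RepeatedArticle()
article.writer = "xiaoxu"  # stored in the subclass slot
try:
    ArticleBase.writer.__get__(article)
    base_slot_filled = True
except AttributeError:
    # The parent's slot was never written to, so it is still empty.
    base_slot_filled = False
print(base_slot_filled)
# False
```

The repeated version reserves more space per instance, and the parent’s own slot stays empty because normal attribute access now goes through the subclass descriptor.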

It works the same when you inherit from a NamedTuple. You don’t need to repeat attributes in the subclass. If you want to understand more about NamedTuple, you can read my article dedicated to this topic.

import collections

ArticleNamedTuple = collections.namedtuple("ArticleNamedTuple", ["date", "writer"])

class ArticleAdvancedNamedTuple(ArticleNamedTuple):
    __slots__ = ()

article = ArticleAdvancedNamedTuple("2020-01-01", "xiaoxu")
print(article.writer)
# xiaoxu 

You can also add the __dict__ attribute in the subclass. Or, if you don’t put __slots__ in the subclass, it will have __dict__ by default.

class ArticleBase:
    __slots__ = ["date", "writer"]

class ArticleAdvanced(ArticleBase):
    __slots__ = ["__dict__"]

article = ArticleAdvanced()
article.reviewer = "jojo"
print(article.__dict__)
# {'reviewer': 'jojo'}

class ArticleAdvancedWithoutSlots(ArticleBase):
    pass

article = ArticleAdvancedWithoutSlots()
article.reviewer = "jojo"
print(article.__dict__)
# {'reviewer': 'jojo'}

If you inherit from a class without __slots__, then the subclass will contain __dict__, even if the subclass defines its own __slots__.

class Article:
    pass

class ArticleWithSlots(Article):
    __slots__ = ["date", "writer"]

article = ArticleWithSlots()
article.writer = "xiaoxu"
article.reviewer = "jojo"
print(article.__dict__)
# {'reviewer': 'jojo'}

Conclusion

I hope you’ve understood what __slots__ is and some details of the implementation. At the end of the article, I want to share the pros and cons that come from my own experience and the internet (linked in the Reference).

pros

__slots__ can definitely be useful when you are under memory pressure. It’s extremely easy to add or remove with just one line of code. The possibility of having __dict__ as an attribute in __slots__ gives developers more flexibility to manage attributes while taking care of the performance.

cons

You need to be clear about what you are doing and what you want to achieve with __slots__, especially when inheriting from a class that uses it. The order of inheritance and the attribute names can make a huge difference in performance.

You can’t inherit from a built-in type such as int, bytes or tuple with non-empty __slots__. Besides, you can’t assign a default value to attributes in __slots__. This is because these attributes are supposed to be descriptors. Instead, you can assign the default value in __init__().

class ArticleNumber(int):
    __slots__ = ["number"]
# TypeError: nonempty __slots__ not supported for subtype of 'int'

class Article:
    __slots__ = ["date", "writer"]
    date = "2020-01-01"
# ValueError: 'date' in __slots__ conflicts with class variable
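The workaround of assigning defaults in __init__() can be sketched as follows. The class name ArticleWithDefaults and the default value "unknown" are placeholders chosen for this example.

```python
class ArticleWithDefaults:
    __slots__ = ["date", "writer"]

    def __init__(self, date="2020-01-01", writer="unknown"):
        # Defaults live in the __init__ signature, not in the class body,
        # so they don't collide with the slot descriptors.
        self.date = date
        self.writer = writer


article = ArticleWithDefaults(writer="xiaoxu")
print(article.date)
# 2020-01-01
print(article.writer)
# xiaoxu
```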

Source: towardsdatascience