Python Objects Mutability and Recycling
An Overview of Object References, Mutation, and Garbage Collection in Python
Objects are the core constructs of any OOP language, and each language defines its own way to create, update, and destroy them. In Python, every object has an identity, a type, and a value. Of the three, only the value may change over time, and only if the object is mutable.
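A quick way to see this is to mutate a list and confirm that its id() and type() stay the same while its value changes, whereas an immutable tuple refuses the change. This is a minimal sketch; the variable names are illustrative.

```python
# A mutable list: its value can change, its identity and type cannot
numbers = [1, 2, 3]
identity_before = id(numbers)
type_before = type(numbers)

numbers.append(4)  # mutate the value in place

print(numbers)                         # [1, 2, 3, 4]
print(id(numbers) == identity_before)  # True
print(type(numbers) is type_before)    # True

# An immutable tuple cannot change its value in place
point = (1, 2)
try:
    point[0] = 10
except TypeError as exc:
    print("TypeError:", exc)
```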
In this article we’ll address these topics:
- What are Variables in Python?
- How to Copy an Object?
- Python Garbage Collection
Variables Are Not Boxes
Variables are usually thought of as boxes or containers, a metaphor that hinders the understanding of reference variables in object-oriented languages.
Python variables are like reference variables in Java: think of them as labels with names attached to objects.
In the next example we modify the list referenced by “var_one”, by appending another item. When we print “var_two” we get the same list.
var_one = [1, 2, 3]
var_two = var_one
var_one.append(4)
print(var_two)
# [1, 2, 3, 4]
This means that “var_two” references the same list as “var_one”.
Nothing prevents an object from having several labels assigned to it, i.e. different variables referencing the same object.
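The label analogy also explains the difference between mutating an object and rebinding a variable: mutation is visible through every label, while assignment only moves one label to a new object. A minimal sketch reusing the names from the example above:

```python
var_one = [1, 2, 3]
var_two = var_one  # a second label on the same list

# Mutating the object is visible through both labels
var_one.append(4)
print(var_two)  # [1, 2, 3, 4]

# Rebinding moves only var_one to a new object;
# var_two keeps pointing at the original list
var_one = [10, 20]
print(var_one)  # [10, 20]
print(var_two)  # [1, 2, 3, 4]
```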
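This leads us to another question: how do we tell whether two variables reference the same object, as opposed to two objects that merely hold equal values?
In Python, every object has an identity, which can be retrieved with the id(obj) function, so two variables referencing the same object have the same id. In CPython the id is the memory address of the object; the language itself only guarantees that it is an integer that is unique and constant for the object during its lifetime.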
charles = {'name': 'Charles L. Dodgson', 'born': 1832}
lewis = charles
print(lewis is charles)
# True
print(id(charles), id(lewis))
# 4300473992 4300473992 (the actual numbers vary between runs)
lewis['balance'] = 950
print(charles)
# {'name': 'Charles L. Dodgson', 'born': 1832, 'balance': 950}
alex = {'name': 'Charles L. Dodgson', 'born': 1832, 'balance': 950}
print(alex == charles)
# True
print(alex is not charles)
# True
The is and is not operators compare the identities of two objects, while the id() function returns an integer representing an object's identity.
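While both objects are alive, `a is b` gives the same answer as `id(a) == id(b)`, and equal values do not imply a shared identity. A short sketch with the dictionaries from the example above:

```python
charles = {'name': 'Charles L. Dodgson', 'born': 1832}
lewis = charles
alex = {'name': 'Charles L. Dodgson', 'born': 1832}

# `is` and comparing id() values agree while the objects are alive
print(lewis is charles, id(lewis) == id(charles))  # True True
print(alex is charles, id(alex) == id(charles))    # False False

# Equal values, different identities
print(alex == charles)  # True
```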
The == operator compares the values of objects, i.e. the data they hold, and this is usually what we care about.
The is operator cannot be overloaded, so Python evaluates it by simply comparing identities; it tends to be faster than the == operator, which can be overloaded and may have to call an __eq__ method.
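This is also why the idiomatic test for a singleton such as None is `x is None`: an overloaded == can return anything. A contrived sketch, where AlwaysEqual is a hypothetical class:

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

print(obj == None)  # True: == calls the overridden __eq__
print(obj is None)  # False: `is` compares identities and cannot be overloaded
```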
Most built-in types override the __eq__ special method so that the == operator compares values; a class that does not override it inherits object.__eq__, which falls back to comparing identities.
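The difference between the inherited and an overridden __eq__ shows up with two hypothetical point classes, sketched below: without an override, two instances holding the same data are not equal; with one, they are.

```python
class PointDefault:
    """Inherits object.__eq__, which falls back to comparing identities."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointByValue(PointDefault):
    """Overrides __eq__ to compare the data the objects hold."""

    def __eq__(self, other):
        if not isinstance(other, PointByValue):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


print(PointDefault(1, 2) == PointDefault(1, 2))  # False
print(PointByValue(1, 2) == PointByValue(1, 2))  # True
```

Note that defining __eq__ without __hash__ makes PointByValue instances unhashable, which is usually what we want for mutable value objects.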
How to Copy an Object? Shallow Copy VS Deep Copy
Shallow copies are the easiest to make, but they may not be what we want.
A shallow copy creates a new outer object but fills it with references to the same items as the original: the embedded objects themselves are not duplicated.
This saves memory and causes no problems if all the items are immutable, but if there are mutable items, it may lead to unpleasant surprises.
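A list can be shallow-copied in several common ways: the list() constructor, a full slice, or copy.copy() from the standard library. The sketch below checks that each produces a new outer list whose items are the very same objects as the original's.

```python
import copy

original = [3, [66, 55, 44], (7, 8, 9)]

# Three common ways to make a shallow copy of a list
copies = [list(original), original[:], copy.copy(original)]

for shallow in copies:
    # Each copy is a new outer list with equal contents...
    assert shallow is not original
    assert shallow == original
    # ...but its items are the very same objects as in the original
    assert all(a is b for a, b in zip(shallow, original))

print("all shallow copies share their items with the original")
```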
list_1 = [3, [66, 55, 44], (7, 8, 9)]
list_2 = list(list_1)
A visualization of the execution of the snippet above shows that making a shallow copy of a list with the list() constructor creates a new outer list whose items still reference the same objects as the original list.
list_1.append(100)
list_1[1].remove(55)
print('list_1:', list_1)
# list_1: [3, [66, 44], (7, 8, 9), 100]
print('list_2:', list_2)
# list_2: [3, [66, 44], (7, 8, 9)]
list_2[1] += [33, 22]
list_2[2] += (10, 11)
print('list_1:', list_1)
# list_1: [3, [66, 44, 33, 22], (7, 8, 9), 100]
print('list_2:', list_2)
# list_2: [3, [66, 44, 33, 22], (7, 8, 9, 10, 11)]
A few things to keep in mind:
- Any in-place change to an embedded object is visible through every variable that references it; immutable objects are safe from this because they cannot be changed in place.
- Tuples are immutable, so an operation such as += builds a new tuple and rebinds it to the target, which is why list_2[2] changed in the example while list_1[2] did not.
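The different behavior of += on a list and on a tuple can be checked directly by keeping an extra label on each original object. A minimal sketch:

```python
inner_list = [66, 44]
inner_tuple = (7, 8, 9)
old_list, old_tuple = inner_list, inner_tuple  # extra labels on the originals

inner_list += [33, 22]   # list.__iadd__ extends the same list in place
inner_tuple += (10, 11)  # tuples have no __iadd__: a new tuple is built and rebound

print(inner_list is old_list)    # True: same object, new value
print(inner_tuple is old_tuple)  # False: the label now points at a new tuple
print(old_tuple)                 # (7, 8, 9): the original tuple is untouched
```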
Deep Copy to The Rescue
Deep copies are duplicates that do not share references to embedded objects: deep-copying a list, or any other object, recursively copies the objects it contains as well (immutable atoms such as numbers and strings may still be shared, since they can never change).
The Python standard library's copy module implements both kinds: copy() makes a shallow copy and deepcopy() makes a deep copy.
import copy

class Bus:
    def __init__(self, passengers=None):
        # Store a copy of the argument so the bus never aliases the caller's list
        if passengers is None:
            self.passengers = []
        else:
            self.passengers = list(passengers)

    def pick(self, name):
        self.passengers.append(name)

    def drop(self, name):
        self.passengers.remove(name)

bus1 = Bus(['Alice', 'Bill', 'Claire', 'David'])
bus2 = copy.copy(bus1)      # shallow copy: shares bus1.passengers
bus3 = copy.deepcopy(bus1)  # deep copy: gets its own passengers list
print(id(bus1), id(bus2), id(bus3))
# 4301498296 4301499416 4301499752 (three distinct objects; exact values vary)
bus1.drop('Bill')
print(bus2.passengers)
# ['Alice', 'Claire', 'David']
print(id(bus1.passengers), id(bus2.passengers), id(bus3.passengers))
# 4302658568 4302658568 4302657800 (bus1 and bus2 share the same list)
print(bus3.passengers)
# ['Alice', 'Bill', 'Claire', 'David']
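Applying deepcopy() to the nested list from the shallow-copy section shows the difference: a change to the original's inner list no longer leaks into the copy. A minimal sketch:

```python
import copy

list_1 = [3, [66, 55, 44], (7, 8, 9)]
list_2 = copy.deepcopy(list_1)

list_1[1].remove(55)  # mutate the embedded list of the original only

print(list_1)                  # [3, [66, 44], (7, 8, 9)]
print(list_2)                  # [3, [66, 55, 44], (7, 8, 9)]
print(list_1[1] is list_2[1])  # False: the inner list was duplicated
```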
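Making deep copies is not always simple. Objects may have cyclic references that would send a naive copying algorithm into an infinite loop (deepcopy() avoids this by remembering the objects it has already copied), and a deep copy may be too deep, for example when objects refer to external resources or singletons that should not be copied.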
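The cyclic case can be seen with two lists that reference each other: deepcopy() completes and reproduces the cycle inside the copy instead of recursing forever. A short sketch:

```python
import copy

a = [10, 20]
b = [a, 30]
a.append(b)  # a -> b -> a: a reference cycle

print(a)  # [10, 20, [[...], 30]]

c = copy.deepcopy(a)  # deepcopy's memo dictionary prevents infinite recursion
print(c)             # [10, 20, [[...], 30]]
print(c[2][0] is c)  # True: the copy contains its own cycle
print(c is not a)    # True
```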
To control what gets copied, a class can implement its own __copy__() and __deepcopy__() special methods.
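As a minimal sketch, assume a hypothetical Report class that holds some data plus a handle to an external resource (FakeConnection, a placeholder, not a real library type). Its __deepcopy__() duplicates the data but shares the resource. The memo argument is the dictionary deepcopy() uses to track objects it has already copied, so it is passed along to the nested deepcopy() call.

```python
import copy


class FakeConnection:
    """Hypothetical stand-in for an external resource such as a socket."""


class Report:
    def __init__(self, rows, connection):
        self.rows = rows
        self.connection = connection

    def __copy__(self):
        # Shallow copy: new Report, same rows list, same connection
        return Report(self.rows, self.connection)

    def __deepcopy__(self, memo):
        # Build the new object without calling __init__ and register it in
        # memo first, so references that lead back to self resolve to the copy
        new = self.__class__.__new__(self.__class__)
        memo[id(self)] = new
        new.rows = copy.deepcopy(self.rows, memo)  # duplicate the data
        new.connection = self.connection           # share the resource
        return new


conn = FakeConnection()
original = Report([[1, 2], [3, 4]], conn)
clone = copy.deepcopy(original)
shallow = copy.copy(original)

print(clone.rows == original.rows)        # True
print(clone.rows[0] is original.rows[0])  # False: data was duplicated
print(clone.connection is conn)           # True: resource is shared
print(shallow.rows is original.rows)      # True
```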
Garbage Collection
Objects in Python are never explicitly destroyed (unlike in C++, where delete destroys an object); instead, they may be garbage collected once they become unreachable.
Python does have a del statement, but it deletes references (names), never the objects themselves.
a = [1, 2]
b = a
del a
print(b)
# [1, 2]
b = [3]
# The original [1, 2] list has no references left, so it can now be reclaimed
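Python discards an object from memory once it is no longer reachable. In CPython this happens mainly through reference counting: an object is destroyed as soon as its reference count drops to zero, and a generational garbage collector periodically reclaims reference cycles that reference counting alone cannot free.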
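weakref.finalize() lets us observe when an object is destroyed without keeping it alive. The sketch below assumes CPython, where reference counting frees an object immediately and gc.collect() runs the cyclic collector on demand; other implementations such as PyPy may run the callbacks later. Node is a placeholder class.

```python
import gc
import weakref

destroyed = []


class Node:
    """Placeholder class; its instances support weak references."""


a = Node()
b = a
weakref.finalize(a, destroyed.append, 'first')  # runs when the object dies

del a             # removes one name; b still references the object
print(destroyed)  # []
b = None          # the last reference is gone
print(destroyed)  # ['first'] (CPython frees the object right away)

# A reference cycle keeps both counts above zero after the names are deleted
x = Node()
y = Node()
x.partner, y.partner = y, x
weakref.finalize(x, destroyed.append, 'cycle')
del x, y
gc.collect()      # the cyclic garbage collector reclaims the pair
print(destroyed)  # ['first', 'cycle']
```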
Conclusion
Python tries, in its own way, to keep our code readable and efficient; thanks to its built-in garbage collection, we rarely have to worry about objects staying in memory longer than they are needed. Still, there is always more we can do to optimize an application's performance.
Here are a few resources that can help you better understand Python's inner workings, so you can implement the right fix for your service.
Further Reading
- Chapter 6, “Object References, Mutability, and Recycling”, of the book Fluent Python (2nd edition)
- The official Python documentation on the copy module
- The “Data Model” chapter of The Python Language Reference book