Introduction
When writing a program, we often want to create a copy of an object (int, float, list, tuple, dictionary, etc.) so that we can work with the copy without changing the content of the original.
There are a number of ways to create a copy of an object in Python. But beware! Most of these methods create a shallow copy, which may not be what you want when the object contains other mutable objects. In that scenario, what you need is a deep copy.
So, what are these different methods of creating a copy of an object? What are shallow copy and deep copy? We are going to answer all these questions in this article.
Let’s first understand that the assignment operator (=) does not create a new object. It only binds another name to the existing object, so both names refer to the same object (and therefore the same memory address). This makes no practical difference for immutable objects such as int, float, decimal, bool, etc., because immutable objects can’t be modified in place. A tuple is immutable too, although any mutable objects stored inside it can still change.
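Here is a minimal sketch of what assignment does. The names a, b, t, and u are only for illustration. Both names refer to one list, so a change made through either name is visible through the other. The tuple case shows that immutability applies to the tuple itself, not to a mutable object stored inside it.

```python
a = [1, 2, 3]
b = a                 # no copy is made: b is a second name for the same list

print(b is a)         # True
b.append(4)
print(a)              # [1, 2, 3, 4]

# The tuple itself cannot be changed, but the list stored inside it can.
t = ([1, 2], 3)
u = t                 # again, just another name for the same tuple
u[0].append(99)
print(t)              # ([1, 2, 99], 3)
```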
Shallow Copy
A shallow copy creates a new compound object and then inserts into it references to the objects found in the original. This definition will become clearer as you go through this section.
There are many ways to create shallow copies in Python. Let’s look at 6 of them below.
Notice the memory locations of the two objects (my_list1 and my_list2) in all the methods shown below. As you can see, id(my_list1) and id(my_list2) return two different values, which indicates that a new object (the copy) has been created.
The id() function returns the identity of an object, which in CPython is its memory address. The actual numbers will differ from run to run.
For loop
my_list1 = [10,20,30,40,50]
my_list2 = []
for num in my_list1:
    my_list2.append(num)
print(id(my_list1), id(my_list2))
Output
------
1964682668352 1964682820800
List comprehension
my_list1 = [10,20,30,40,50]
my_list2 = [ num for num in my_list1]
print(id(my_list1), id(my_list2))
Output
------
1964682602752 1964681785728
Copy method
my_list1 = [10,20,30,40,50]
my_list2 = my_list1.copy()
print(id(my_list1), id(my_list2))
Output
------
1964682620480 1964682668352
Slicing
my_list1 = [10,20,30,40,50]
my_list2 = my_list1[:]
print(id(my_list1), id(my_list2))
Output
------
1964682859008 1964682620480
Constructor
my_list1 = [10,20,30,40,50]
my_list2 = list(my_list1)
print(id(my_list1), id(my_list2))
Output
------
1964683214656 1964683576832
Copy module
import copy
my_list1 = [10,20,30,40,50]
my_list2 = copy.copy(my_list1)
print(id(my_list1), id(my_list2))
Output
------
1964683560512 1964683214784
In all these methods, if you modify (add/delete/change) an element in my_list1 or my_list2, you expect the other list not to get updated. And that’s exactly what happens here: my_list1 and my_list2 are separate list objects, and their elements (for example, in my_list1 = [10,20,30,40,50]) are immutable integers that can’t be changed in place. Let’s see an example to confirm this.
my_list1 = [10,20,30,40,50]
my_list2 = [num for num in my_list1]
print(id(my_list1), id(my_list2))
Output
------
1964682380992 1964682885952
Now, say you append 60 to my_list2. As you can see, this doesn’t modify my_list1, because my_list1 and my_list2 are two different objects.
my_list2.append(60)
print("my_list1:", my_list1)
print("my_list2:", my_list2)
print(id(my_list1), id(my_list2))
Output
------
my_list1: [10, 20, 30, 40, 50]
my_list2: [10, 20, 30, 40, 50, 60]
1964682380992 1964682885952
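The same holds for deleting or replacing an element, not just appending. The short sketch below uses slicing for the copy, but any of the six approaches would behave the same way. Replacing my_list2[0] only changes which object that slot of my_list2 refers to, so my_list1 is left alone.

```python
my_list1 = [10, 20, 30, 40, 50]
my_list2 = my_list1[:]          # shallow copy via slicing

my_list2[0] = 99                # replace an element in my_list2 only
del my_list2[-1]                # delete the last element of my_list2 only

print("my_list1:", my_list1)    # my_list1: [10, 20, 30, 40, 50]
print("my_list2:", my_list2)    # my_list2: [99, 20, 30, 40]
```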
So far so good. The copy operation is working exactly how we think it should. But what if the original object (e.g. a list) has a nested structure (lists within the list) and you try to modify these mutable elements? What happens then?
In the example below, my_list1 has a nested structure, meaning the individual elements of my_list1 are also lists. Next, say you create my_list2 using any of the 6 methods mentioned above. Though my_list1 and my_list2 are two different objects, their elements refer to the same inner lists.
Let’s confirm this programmatically. The id() function clearly shows that my_list1 and my_list2 are two different objects.
my_list1 = [[10,20], [30,40]]
my_list2 = [num for num in my_list1]
print(id(my_list1), id(my_list2))
Output
------
1964682568960 1964682620672
But note that the elements of the two lists are the very same objects: each pair has the same memory address, as you can see below. Because these elements are mutable lists, that sharing matters.
print(id(my_list1[0]), id(my_list1[1]))
print(id(my_list2[0]), id(my_list2[1]))
Output
------
1964682602560 1964682647424
1964682602560 1964682647424
Since they share the same memory address, if you modify my_list2[0] or my_list2[1] in place, the same change shows up in my_list1 as well, even though we did not intend that. That is not what we want, right? We want a real clone that doesn’t affect the original. So, how do you handle this? That’s where deep copy comes into the picture.
my_list2[0][0] = 100
print(my_list1)
print(my_list2)
Output
------
[[100, 20], [30, 40]]
[[100, 20], [30, 40]]
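The example above used a list comprehension. As a quick check, the sketch below builds a copy with each of the six approaches from this section and confirms that every one of them is a new outer list whose elements are still the original inner lists. The names original, loop_copy, and copies are only for illustration.

```python
import copy

original = [[10, 20], [30, 40]]

# Build one copy with each of the six approaches from this section.
loop_copy = []
for item in original:
    loop_copy.append(item)

copies = {
    "for loop": loop_copy,
    "list comprehension": [item for item in original],
    "copy method": original.copy(),
    "slicing": original[:],
    "constructor": list(original),
    "copy module": copy.copy(original),
}

for name, new_list in copies.items():
    is_new_object = new_list is not original
    shares_elements = all(a is b for a, b in zip(new_list, original))
    print(f"{name}: new object={is_new_object}, shares elements={shares_elements}")

# Changing a nested list through the original is visible in every copy.
original[0][0] = 100
print(all(new_list[0][0] == 100 for new_list in copies.values()))  # True
```

Every line of the loop reports True for both checks, which is exactly the shallow copy behaviour described above.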
Deep Copy
A deep copy creates a new compound object and then recursively inserts into it copies of the objects found in the original.
To create a deep copy of an object, you need to use the deepcopy() function from the copy module. Let’s take the same example we covered in the shallow copy section.
As you can see from the code output below, deepcopy() overcomes the issue we had with the shallow copy. Notice that the nested elements don’t share the same address anymore.
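import copy
my_list1 = [[10,20], [30,40]]
my_list2 = copy.deepcopy(my_list1)
# The outer lists are two different objects, e.g. 1898086629760 1898086651392
# (this first line is not repeated in the output below)
print(id(my_list1), id(my_list2))
# The nested lists are different objects as well
print(id(my_list1[0]), id(my_list1[1]))
print(id(my_list2[0]), id(my_list2[1]))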
Output
------
1964682647424 1964682649152
1964682602560 1964682648704
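To mirror the modification we tried in the shallow copy section, here is a small sketch that changes a nested element of the deep copy and checks that the original stays as it was.

```python
import copy

my_list1 = [[10, 20], [30, 40]]
my_list2 = copy.deepcopy(my_list1)

my_list2[0][0] = 100                 # modify a nested list in the copy only

print(my_list1)                      # [[10, 20], [30, 40]]
print(my_list2)                      # [[100, 20], [30, 40]]
print(my_list1[0] is my_list2[0])    # False
```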
Arbitrary Python Objects
In all the methods above, we have used a list for the demonstration. In fact, the same applies to any mutable type, such as a list, set, or dictionary, or to an arbitrary Python object. Let’s look at an example of an arbitrary object below.
import copy
class MyClass:
    def __init__(self):
        self.x = [[1,2], [3,4]]
Shallow copy: In the example below, we first create an object c of MyClass and then create a shallow copy c1. As you can see, even though c (the original object) and c1 (the shallow copy) are different objects, their contents share the same memory. Note that x[0] is [1,2] and x[1] is [3,4].
c = MyClass()
c1 = copy.copy(c)
print(c is c1)
print(id(c.x[0]), id(c.x[1]))
print(id(c1.x[0]), id(c1.x[1]))
Output
------
False
1964683631168 1964683631360
1964683631168 1964683631360
Since we are dealing with a shallow copy, c.x and c1.x refer to the same list. So when we append [5,6] to c.x, the change shows up in c1.x as well, as expected.
c.x.append([5,6])
print(c.x)
print(c1.x)
Output
------
[[1, 2], [3, 4], [5, 6]]
[[1, 2], [3, 4], [5, 6]]
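The reason is that copy.copy() gives the new instance its own set of attributes, but each attribute still refers to the same object as in the original. The sketch below redefines MyClass so that it runs on its own. It also shows that assigning a completely new list to c1.x (an illustrative step, not part of the example above) only rebinds the attribute on the copy.

```python
import copy

class MyClass:
    def __init__(self):
        self.x = [[1, 2], [3, 4]]

c = MyClass()
c1 = copy.copy(c)

print(c.x is c1.x)    # True: both attributes refer to one list

c1.x = [[7, 8]]       # rebind the attribute on the copy only
print(c.x)            # [[1, 2], [3, 4]]
print(c1.x)           # [[7, 8]]
```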
Deep copy: Continuing the same example, we create a deep copy c2 of the object c. The original object c and the copy c2 are two different objects. Since it is a deep copy, the nested objects are not shared; a real copy (clone) of each one is created. As you can see, the elements within c.x and c2.x don’t share the same memory.
c = MyClass()
c2 = copy.deepcopy(c)
print(c is c2)
print(id(c.x[0]), id(c.x[1]))
print(id(c2.x[0]), id(c2.x[1]))
Output
------
False
1964683630976 1964682581248
1964682779456 1964683675008
Hence, even if we change the content of c, the change won’t be reflected in the copy object c2.
c.x.append([5,6])
print(c.x)
print(c2.x)
Output
------
[[1, 2], [3, 4], [5, 6]]
[[1, 2], [3, 4]]
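The same behaviour applies to dictionaries. In the sketch below, the dictionary config and its keys are made-up data for illustration. The shallow copy shares the nested list with the original, while the deep copy gets its own.

```python
import copy

config = {"sizes": [1, 2], "name": "demo"}   # hypothetical example data

shallow = config.copy()            # shallow copy, like copy.copy(config)
deep = copy.deepcopy(config)       # deep copy

config["sizes"].append(3)          # modify the nested list through the original

print(shallow["sizes"])            # [1, 2, 3]
print(deep["sizes"])               # [1, 2]
```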
Conclusion
In this article, we looked at shallow copy and deep copy in Python with examples. We went through 6 different ways of creating shallow copies. To create a deep copy, we used the deepcopy() function from the copy module. In the last section, we saw that shallow and deep copies work on arbitrary Python objects as well.
We hope this answers your questions about shallow copy and deep copy in Python. If you have any further questions, please let us know in the comments section.