Asked  7 Months ago    Answers:  5   Viewed   23 times

While reading up on the documentation for dict.copy(), I saw that it makes a shallow copy of the dictionary. The same goes for the book I am following (Beazley's Python Reference), which says:

The m.copy() method makes a shallow copy of the items contained in a mapping object and places them in a new mapping object.

Consider this:

>>> original = dict(a=1, b=2)
>>> new = original.copy()
>>> new.update({'c': 3})
>>> original
{'a': 1, 'b': 2}
>>> new
{'a': 1, 'c': 3, 'b': 2}

So I assumed this would update the value of original (and add 'c': 3) also since I was doing a shallow copy. Like if you do it for a list:

>>> original = [1, 2, 3]
>>> new = original
>>> new.append(4)
>>> new, original
([1, 2, 3, 4], [1, 2, 3, 4])

This works as expected.

Since both are shallow copies, why is it that dict.copy() doesn't work as I expect it to? Or is my understanding of shallow vs. deep copying flawed?

 Answers

13

"Shallow copying" means the contents of the dictionary are not copied by value; the new dictionary just holds new references to the same objects.

>>> a = {1: [1,2,3]}
>>> b = a.copy()
>>> a, b
({1: [1, 2, 3]}, {1: [1, 2, 3]})
>>> a[1].append(4)
>>> a, b
({1: [1, 2, 3, 4]}, {1: [1, 2, 3, 4]})

In contrast, a deep copy will copy all contents by value.

>>> import copy
>>> c = copy.deepcopy(a)
>>> a, c
({1: [1, 2, 3, 4]}, {1: [1, 2, 3, 4]})
>>> a[1].append(5)
>>> a, c
({1: [1, 2, 3, 4, 5]}, {1: [1, 2, 3, 4]})

So:

  1. b = a: Reference assignment. Makes a and b point to the same object.

    Illustration of 'b = a': 'a' and 'b' both point to '{1: L}', 'L' points to '[1, 2, 3]'.

  2. b = a.copy(): Shallow copying. a and b become two separate objects, but their contents still share the same references.

    Illustration of 'b = a.copy()': 'a' points to '{1: L}', 'b' points to '{1: M}', 'L' and 'M' both point to '[1, 2, 3]'.

  3. b = copy.deepcopy(a): Deep copying, a and b's structure and content become completely isolated.

    Illustration of 'b = copy.deepcopy(a)': 'a' points to '{1: L}', 'L' points to '[1, 2, 3]'; 'b' points to '{1: M}', 'M' points to a different instance of '[1, 2, 3]'.
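
The three cases above can be checked directly with identity tests. This is a minimal sketch; the names alias, shallow and deep are only illustrative and do not appear in the answer.

```python
import copy

a = {1: [1, 2, 3]}

alias = a                   # 1. plain assignment: same dict object
shallow = a.copy()          # 2. new dict, but the same inner list
deep = copy.deepcopy(a)     # 3. new dict and a new inner list

print(alias is a)           # True
print(shallow is a)         # False
print(shallow[1] is a[1])   # True
print(deep[1] is a[1])      # False

a[2] = "new key"            # changes the outer dict only
a[1].append(4)              # mutates the inner list

print(alias)    # {1: [1, 2, 3, 4], 2: 'new key'}
print(shallow)  # {1: [1, 2, 3, 4]}
print(deep)     # {1: [1, 2, 3]}
```

Adding a key shows up only through the alias, which is why new.update({'c': 3}) in the question left original untouched; appending to the inner list shows up through the shallow copy as well.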

Tuesday, June 1, 2021
 
mschuett
answered 7 Months ago
52

For a more general solution that works regardless of the number of dimensions, use copy.deepcopy():

import copy
b = copy.deepcopy(a)
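
As a rough illustration of why the number of dimensions matters, the sketch below assumes a is a nested list (the original snippet does not say what a is) and compares a slice copy with copy.deepcopy():

```python
import copy

a = [[1, 2], [3, 4]]        # hypothetical nested ("2-D") list

shallow = a[:]              # copies only the outer list
deep = copy.deepcopy(a)     # copies every level

a[0][0] = 99                # mutate an inner list

print(shallow)  # [[99, 2], [3, 4]]
print(deep)     # [[1, 2], [3, 4]]
```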
Tuesday, June 1, 2021
 
viper
answered 7 Months ago
66

I want to shallow copy it, so can't I just do this:

int[] shallow = orig;

That's not really a shallow copy. A copy is a discrete entity that is similar to the original, but is not the original item. In your example, what you actually have are two references that are pointing to the same object. When you create a copy, you should have two resulting objects: the original and the copy.

Here, anything you do to modify shallow will happen to orig as well since they both point to the same object.

"Shallowness" comes into play when the object you are copying has references to other objects inside it. For example, if you have an array of integers and you create a copy, you now have two arrays which both contain the same integer values:

Original Array

[0]
[1]
[2]
[3]

After copying:

[0] <--- Original  [0]
[1]                [1]
[2]                [2]
[3]      Copy ---> [3]

However, what if you had an array that consists of objects (let's say objArr1 and objArr2)? When you do a shallow copy you now have two new array objects, but each corresponding entry between the two arrays points to the same object (because the objects themselves haven't been copied; just the references have).

Original Array:

[0:]----> [object 0]
[1:]----> [object 1]
[2:]----> [object 2]
[3:]----> [object 3]

After copying (notice how the corresponding locations are pointing to the same instances):

Original -> [0:]----> [object 0] <----[:0] <- Copy
            [1:]----> [object 1] <----[:1]
            [2:]----> [object 2] <----[:2]
            [3:]----> [object 3] <----[:3]

Now if you modify objArr1 by replacing an entry or deleting an entry, that same thing doesn't happen to objArr2. However if you modify the object at objArr1[0], that is reflected in objArr2[0] as well since those locations point to the same object. So in this case, even though the container objects themselves are distinct, what they contain are references to the same object.

When you do a deep copy, you will have two new arrays where each corresponding location points to a different instance. So essentially you make copies of objects all the way down.
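
This answer is about Java arrays, but the same behaviour can be sketched in Python, the language of the question at the top. The names obj_arr1, obj_arr2 and obj_arr3 below are hypothetical stand-ins for the arrays described above, with small dicts playing the role of the objects.

```python
import copy

obj_arr1 = [{"id": 0}, {"id": 1}]     # "array of objects"
obj_arr2 = list(obj_arr1)             # shallow copy: new list, same dicts
obj_arr3 = copy.deepcopy(obj_arr1)    # deep copy: new list, new dicts

obj_arr1[1] = {"id": 42}    # replace an entry: only obj_arr1 changes
obj_arr1[0]["id"] = -1      # modify a shared object: visible via obj_arr2

print(obj_arr1)  # [{'id': -1}, {'id': 42}]
print(obj_arr2)  # [{'id': -1}, {'id': 1}]
print(obj_arr3)  # [{'id': 0}, {'id': 1}]
```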

My professor said that for primitives, shallow and deep copy are essentially the same, in that we have to copy over each index of the array.

The important distinction to make is that when you copy an array of primitives, you are copying the values over exactly. Each time you get a new primitive. However, when you have an array of objects, what you really have is an array of references to objects. So when you create a copy, all you have done is create a new array that has copies of the references in the original array. However, these new copies of the references still point to the same corresponding objects. This is what's known as a shallow copy. If you deep-copied the array, then the objects that each individual location refers to, will have been copied also. So you would see something like this:

Original -> [0:]----> [object 0] Copy -> [0:]----> [copy of object 0]
            [1:]----> [object 1]         [1:]----> [copy of object 1]
            [2:]----> [object 2]         [2:]----> [copy of object 2]
            [3:]----> [object 3]         [3:]----> [copy of object 3]

But setting the whole array equals to another array does the same thing, right?

No it does not. What you're doing here is simply creating a new reference to an existing array:

arr1 -> [0, 1, 2, 3, 4]

Now let's say you did arr2 = arr1. What you have is:

arr1 -> [0, 1, 2, 3, 4] <- arr2

So here both arr1 and arr2 are pointing to the same array. Any modification you perform using arr1 will be reflected when you access the array using arr2, since you are looking at the same array. This doesn't happen when you make copies.

Friday, September 17, 2021
 
Sauleil
answered 3 Months ago
99

Your confusion is about the difference between variables and values.

So, when you do something like,

val p1 = Person("amit", "shah")
val p2 = p1.copy()

Then p2 is a shallow copy of p1, so the variables p1.firstname and p2.firstname point to the same value of String type which is "amit".

When you are doing p1.firstname = "raghu", you are actually telling variable p1.firstname to point to a different value of String type which is "raghu". Here you are not changing the value itself but the variable.

If you were to change the value itself, then both p1 and p2 would reflect the change. Unfortunately, String values are immutable in Scala, so you cannot modify a String value.

Let me show you by using something modifiable, like an ArrayBuffer.

scala> import scala.collection.mutable.ArrayBuffer
// import scala.collection.mutable.ArrayBuffer

scala> case class A(s: String, l: ArrayBuffer[Int])
// defined class A

scala> val a1 = A("well", ArrayBuffer(1, 2, 3, 4))
// a1: A = A(well,ArrayBuffer(1, 2, 3, 4))

scala> val a2 = a1.copy()
// a2: A = A(well,ArrayBuffer(1, 2, 3, 4))

// Let's modify the `value` pointed to by `a1.l` by removing the element at index 1
scala> a1.l.remove(1)
// res0: Int = 2

// You will see the impact in both a1 and a2.

scala> a1
// res1: A = A(well,ArrayBuffer(1, 3, 4))

scala> a2
// res2: A = A(well,ArrayBuffer(1, 3, 4))
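
For readers coming from the Python question at the top, roughly the same experiment can be written with a dataclass and copy.copy(). This is only an analogue of the Scala session, not a translation of it; the class A below is a hypothetical stand-in for the case class.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class A:                      # hypothetical stand-in for the Scala case class
    s: str
    l: list = field(default_factory=list)


a1 = A("well", [1, 2, 3, 4])
a2 = copy.copy(a1)            # shallow copy: new A, same list object

a1.s = "changed"              # rebinds the variable a1.s; a2.s is unaffected
del a1.l[1]                   # mutates the shared list value (removes index 1)

print(a2.s)   # well
print(a2.l)   # [1, 3, 4]
```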
Sunday, October 10, 2021
 
Kiran Dash
answered 2 Months ago
90

You always clear, then append and then insert the same sectionList, that's why it always overwrites the entries - because you told the program it should.

Always remember: In Python assignment never makes a copy!

Simple fix

Just insert a copy:

formatDict[section] = sectionList.copy()    # changed here

Instead of inserting a reference:

formatDict[section] = sectionList  
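
The difference between the two lines can be seen in a small stand-alone sketch. The loop and the sample data below are assumptions made for illustration, since the code from the original question is not shown here.

```python
section_list = []
by_reference = {}
by_copy = {}

# Hypothetical sections and items, reusing one list as described above.
for section, items in [("AAA", [1, 2]), ("BBB", [3, 4])]:
    section_list.clear()
    section_list.extend(items)
    by_reference[section] = section_list      # same list object every time
    by_copy[section] = section_list.copy()    # independent snapshot

print(by_reference)  # {'AAA': [3, 4], 'BBB': [3, 4]}
print(by_copy)       # {'AAA': [1, 2], 'BBB': [3, 4]}
```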

Complicated fix

There is a lot going on here, and you could make it "better" by using functions for subtasks such as the grouping. Files should also be opened with a with statement so that the file is closed automatically even if an exception occurs, and while loops should be avoided when the number of iterations is known in advance.

Personally I would use code like this:

def groups(seq, width):
    """Group a sequence (seq) into width-sized blocks. The last block may be shorter."""
    length = len(seq)
    for i in range(0, length, width):   # range supports a step argument!
        yield seq[i:i+width]

# Printing the dictionary could be useful in other places as well -> so
# I also created a function for this.
def print_dict_line_by_line(dct):  
    """Print dictionary where each key-value pair is on one line."""
    for key, value in dct.items():
        print("for key =", key, "value =", value)

def mytask(filename, width):
    # "width" is the block size passed on to groups(). The original call
    # omitted it, so it is taken as a parameter here.
    formatDict = {}
    with open(filename) as formatFileHandle:
        # I don't "strip" each line (remove leading and trailing whitespaces/newlines)
        # but if you need that you could also use:
        # for usableLine in (line.strip() for line in formatFileHandle):
        # instead.
        for usableLine in formatFileHandle:
            section = usableLine[:3]
            sectionList = list(groups(usableLine[3:], width))
            formatDict[section] = sectionList
    # upon exiting the "with" scope the file is closed automatically!
    print_dict_line_by_line(formatDict)

if __name__ == '__main__':
    # width=2 is only a placeholder; use your actual block size.
    mytask('insert your filename here', width=2)
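
To see what groups() produces without needing a file, here is a small self-contained sketch. The input line and the width of 2 are made up for the example.

```python
def groups(seq, width):
    """Group a sequence into width-sized blocks. The last block may be shorter."""
    for i in range(0, len(seq), width):
        yield seq[i:i + width]


line = "ABC0102030"                  # hypothetical input line
section = line[:3]                   # first three characters are the key
blocks = list(groups(line[3:], 2))   # width 2 is an assumption

print(section, blocks)  # ABC ['01', '02', '03', '0']
```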
Sunday, October 24, 2021
 
Nasenbaer
answered 1 Month ago