
I have two variables with the following structures:

# initialization
O1 = [[] for g in range(n)]
for i in range(n):
    O1[i] = [[] for g in range(m)]

O2 = [[] for g in range(n)]
for i in range(n):
    O2[i] = [[] for g in range(m)]

Then I pass these two into an initial function, where different values are appended to them:

for i in range(n):
    for j in range(m):
        O1[i][j].append([s, as_, pat, 0])  # `as` is a Python keyword, so as_ is used
        O2[i][j].append([[ol, oa, os, ot],[dl, da, ds],tpat, tp, 0])

As you can see, the internal structure of the lists gets complicated, especially in the case of O2. After this value assignment, O1 and O2 are passed into several other functions. Each function needs to make a deep copy of O1 and O2 and modify the copy for its own use without changing the original O1 and O2 variables. The changes include .remove() and .append() calls as well as +/- operations on values in the internal lists. The important thing is that, no matter the changes, the original O1 and O2 must not change at all. This process runs iteratively: on each iteration, O1 and O2 are assigned new values, then passed into several other functions, where they are copied and edited without any change to the original O1 and O2 that were used as input.

Using copy.deepcopy() is the only way that I know, but as this is an iterative process, deepcopy slows the code down significantly, especially when O1 and O2 are large. I have tried using tuples as an immutable data structure, but given the initial and internal structure and the changes made in the functions, it does not work: tuples prevent the append() and remove() changes, but not the +/- operations.

I would be grateful if someone could suggest a faster way to do this.

  • Explicitly copying exactly as deep as you want (like O1_copy = [[d1.copy() for d1 in d0] for d0 in O1]) should be faster than copy.deepcopy (see the sketch after these comments). Copy-on-write, as MuhammadHassnain suggested in an answer, could be a good solution if your usage pattern suits it. remove is inherently slow in a sense, and you might want to pick a better data structure entirely. It's hard to make specific suggestions without knowing what problem you're trying to solve with these repeatedly copied arrays. Commented Oct 13, 2024 at 7:25
  • copy.deepcopy spends a ton of time figuring out the structure of the input - traversing the entire object graph, finding circular references, looking up copiers for every single object... like 99% of its runtime is overhead. When you already know how your data structures are built, you can massively speed things up with the kind of explicit copy implementation Ry- suggests. Commented Oct 13, 2024 at 7:35
  • A list's .copy() method makes a shallow copy, which looks like it should be all you need for that particular level. If the s, as_, or pat objects are mutable, it may not be enough. Commented Oct 13, 2024 at 7:39
  • The standard number types are not mutable in Python. (It might help to read nedbatchelder.com/text/names.html) Commented Oct 13, 2024 at 7:41
  • @PaulHankin Because copy.deepcopy has to handle arbitrary object graphs, keeping track of already-seen objects (so as not to enter a cycle) and checking the type of every object it encounters, if you know that the list has a specific structure, you can implement a much faster deep copy. Commented Oct 13, 2024 at 10:41
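A minimal sketch of the explicit, structure-aware copy suggested in the comments above. The helper names are hypothetical; it assumes the nesting shown in the question and that the leaf values (s, as_, pat, ol, oa, ...) are immutable numbers or strings, so +/- on them rebinds rather than mutates:

def copy_O1(o1):
    # O1[i][j] holds flat records like [s, as_, pat, 0]; one .copy() per
    # record is enough when the record elements themselves are immutable.
    return [[[rec.copy() for rec in cell] for cell in row] for row in o1]

def copy_O2(o2):
    # O2[i][j] holds records like [[ol, oa, os, ot], [dl, da, ds], tpat, tp, 0];
    # the two inner lists need their own copies, the remaining fields are
    # assumed immutable.
    return [[[[rec[0].copy(), rec[1].copy()] + rec[2:] for rec in cell]
             for cell in row] for row in o2]

# Quick self-check: mutating the copy must leave the original untouched.
original = [[[[[1, 2], [3], 'tpat', 4, 0]]]]
clone = copy_O2(original)
clone[0][0][0][0][0] = 99
assert original[0][0][0][0][0] == 1

Because these comprehensions copy only what the known structure actually needs, they skip the memo bookkeeping and per-object type dispatch that make copy.deepcopy slow.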

2 Answers


There are multiple possible solutions:

  1. Use numpy arrays instead of normal Python lists; their memory layout allows for faster operations.
  2. Use copy-on-write techniques. You can use a library like pyrsistent (https://github.com/tobgu/pyrsistent) that delays copying until modifications are actually made, so only the parts that are modified are copied (see the sketch below).
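A minimal sketch of option 2 with pyrsistent (the shapes here are made up for illustration; freeze turns nested lists into immutable PVectors, thaw converts back):

from pyrsistent import freeze, thaw

O1 = freeze([[[[1, 2, 3, 0]]]])  # nested lists become immutable PVectors

# "Modifying" returns a new structure that shares every unchanged part
# with the original, so nothing has to be deep-copied up front.
modified = O1.transform([0, 0, 0, 3], 5)  # set O1[0][0][0][3] to 5

assert O1[0][0][0][3] == 0        # original is untouched
assert modified[0][0][0][3] == 5  # the new version sees the change

plain = thaw(modified)            # back to ordinary nested lists if needed

The trade-off is that the functions have to work with PVectors instead of plain lists (e.g. v.append(x) and v.remove(x) return new vectors rather than mutating in place), which may or may not count as changing the data structure in your case.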

1 Comment

Thank you for your suggestions; however, both involve changing the data structure, which I cannot do.

If this is not verboten, consider using PyPy. I prepared the following simple rig:

import copy
import itertools
import time

n = 500
cnt = itertools.count(1)
# build an n x n x n nested list filled with consecutive integers
structure = [[[next(cnt) for _ in range(n)] for _ in range(n)] for _ in range(n)]

# time a full deep copy of the structure
t1 = time.time()
structure_copy = copy.deepcopy(structure)
t2 = time.time()

print('Elapsed time (seconds) %.03f' % (t2 - t1))

It creates a 3D structure of shape n x n x n filled with consecutive integers; I then measured the time required to deepcopy it and got the following results:

  • Python 3.10.12 Elapsed time (seconds) 53.793
  • PyPy 7.3.9 with GCC 11.3.0 Elapsed time (seconds) 11.703

That is about 5x faster without any change required in the code. Note that these are not the most recent versions, so you might get different results. Please test this approach if you are allowed to install pypy3, and write back if it gives any speedup.
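For comparison on plain CPython, the same rig can be extended to time a structure-aware explicit copy against copy.deepcopy (a sketch only; no numbers claimed here, measure on your own machine):

import copy
import itertools
import time

n = 100  # smaller than above so the run finishes quickly
cnt = itertools.count(1)
structure = [[[next(cnt) for _ in range(n)] for _ in range(n)] for _ in range(n)]

t1 = time.time()
copy.deepcopy(structure)
t2 = time.time()
# comprehension-based copy: rebuilds the two outer levels, .copy() on the innermost
[[inner.copy() for inner in mid] for mid in structure]
t3 = time.time()

print('deepcopy:      %.3f s' % (t2 - t1))
print('explicit copy: %.3f s' % (t3 - t2))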

1 Comment

Unfortunately, I have not been able to install PyPy. But it seems really interesting, as I am dealing with extremely slow code.
