I scan for a trigger and, when I get it, load a .npy file and process it. Loading the NumPy file from within the process has started to take almost 2 seconds, but loading the same file from a separate script takes 0.015 seconds.

How can I diagnose this? The load used to be instantaneous, and I am not sure why it started taking longer about 3 weeks ago. The production script uses multiprocessing. I created a module containing several functions, and I call one of them from the multiprocessing script:

from multiprocessing import Process
from projectAI.projectAI import dataProcessor
.
.
.
initializations
.
.
.

processes = []
process1 = Process(target = dataProcessor,
                   args   = (genModelName, dataPath, DB, Max, Min),
                   daemon = True)
processes.append(process1)

for i in range(Count):
    process = Process(target = appEstimator,
                      args   = (Data, str(i+1), city),
                      daemon = True)
    # Append inside the loop so every estimator process is started, not just the last one
    processes.append(process)

for process in processes:
    process.start()

for process in processes:
    process.join()

The function "dataProcessor" within the module "projectAI":

def dataProcessor(genModelName, dataPath, DB, Max, Min):

    while True:

        # Check for a new entry in DB using queries

        if newEntry:
            data = np.load(dataPath + newEntry["fileName"] + ".npy")

            # Further processing

This has been in production for months without any issues. Suddenly it started taking 1.8 to 2 seconds to load this file. I executed the following script in the same environment:

import numpy as np, time
T1 = time.time()
data = np.load(dataPath + newEntry["fileName"] + ".npy")
T2 = time.time()
print(T2-T1)

This takes only 0.015 seconds. What might be the cause?

  • We can't see your code; a minimal reproducible example is required. What do you mean by "scanning"? Is it possible that other processes keep thrashing your CPU even after you have found whatever you are scanning for? Commented Oct 21 at 17:28
  • I wonder if that while True loop in your dataProcessor is hammering your system polling for a new database entry... Maybe the process that puts the entry in there can signal you more economically? Commented Oct 25 at 7:18
  • I agree with the comment that "while True" is computationally inefficient, but my more pressing issue is "np.load" taking 2 seconds. I would like to know how to find out why, so that I can address it in the future. Commented Oct 25 at 19:38
  • Just for fun, try putting a sleep() of 0.05s immediately after the while True. Commented Oct 25 at 21:18
  • I already have it after the if block inside the while loop. I can move it to the top. I added logs after every line inside the while loop, and everything before np.load runs within 5 ms. Only the load takes 2 seconds. And I didn't have this problem a few weeks ago. That's the biggest mystery to me. Commented Oct 25 at 23:30

2 Answers

My first suspicion is that combining multiprocessing with the NumPy library may be redundant and could cause the slowdown. If that seems plausible to you, you can restrict the parallelism available to each NumPy instance. However, if you think the bottleneck is the NumPy file read itself, check whether a scanner is slowing the operation down, or whether the location of the production files is not conducive to fast reads. You can isolate the read operation and time it in production to rule this out.
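One way to isolate the read is to time the same np.load call both in the main process and inside a child process, since the slowdown was only seen inside the multiprocessing script. The following is a minimal sketch, not the production code: the helper name timed_load and the temporary sample.npy file are made up for illustration, and the path should be pointed at the real file when testing in production.

```python
import os
import tempfile
import time
from multiprocessing import Process

import numpy as np


def timed_load(path, label):
    """Time one np.load call, print the result, and return the elapsed seconds."""
    start = time.perf_counter()
    data = np.load(path)
    elapsed = time.perf_counter() - start
    print(f"{label}: np.load took {elapsed:.4f} s for shape {data.shape}", flush=True)
    return elapsed


if __name__ == "__main__":
    # Hypothetical stand-in file; replace `path` with the real .npy file.
    path = os.path.join(tempfile.mkdtemp(), "sample.npy")
    np.save(path, np.arange(1_000_000, dtype=np.float64))

    # Same call, once in the parent and once in a child process.
    timed_load(path, "parent")
    worker = Process(target=timed_load, args=(path, "child"))
    worker.start()
    worker.join()
```

If both timings are similar, the multiprocessing setup itself is probably not the cause, and something outside the script (another program touching the file, the storage location) becomes the more likely suspect.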


1 Comment

Thank you for your response. I have done more research on this and found that the call open(os_fspath(file), "rb") in "npyio.py", which contains NumPy's "load" function, is the reason for the delay. I am not sure why it causes a delay in some instances and not in others. If you have any insights into this, I could really use them. Meanwhile, I will keep digging and will let you know when I get an answer.
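To confirm that the time goes into opening the file rather than reading or parsing it, the two steps can be timed separately, because np.load also accepts an already opened file object. This is only a sketch under that assumption: the function name time_open_and_parse and the temporary file are hypothetical, not part of NumPy or of the original script.

```python
import os
import tempfile
import time

import numpy as np


def time_open_and_parse(path):
    """Return (seconds in open(), seconds in read + parse, loaded array)."""
    t0 = time.perf_counter()
    fh = open(path, "rb")          # the step that was reported as slow
    t1 = time.perf_counter()
    try:
        data = np.load(fh)         # np.load accepts an open binary file object
    finally:
        fh.close()
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, data


if __name__ == "__main__":
    # Hypothetical stand-in file; replace `path` with the real .npy file.
    path = os.path.join(tempfile.mkdtemp(), "sample.npy")
    np.save(path, np.zeros((1000, 1000)))

    open_s, parse_s, data = time_open_and_parse(path)
    print(f"open(): {open_s:.4f} s, read + parse: {parse_s:.4f} s, shape={data.shape}")
```

A large open() time with a small read time points at something outside NumPy, such as another program holding or scanning the file.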

Thank you, everyone, for your comments and answers. I found out that it was VS Code all along: it was refreshing the same log file that I was adding data to. I think VS Code started doing that with the latest update, as I had not seen it auto-refresh an open file before.
