I'm currently using pickle/joblib (though I'm flexible about the library) to load a large numpy array. I have an SSD with a 500MB/s read speed.

I'm hoping to read my numpy array faster.

Before investing in a new SSD, I'm wondering whether the newer SSDs with 1000MB/s-3000MB/s read speeds would actually let me read the numpy arrays faster. Are the pickle/joblib libraries themselves limited in read speed?

I have confirmed my current SSD is reading at about 400-500MB/s.

Would I get 1000MB/s if I bought a 1000MB/s SSD?
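One way to answer this before buying anything is to time a load and compare the effective throughput against the drive's rated read speed. Below is a minimal sketch, assuming the array was previously saved with joblib.dump to a file named big_array.joblib (a placeholder); note that the OS page cache can make repeated runs look much faster than the disk, so measure on a cold cache for a fair number.

```python
import os
import time

import joblib

PATH = "big_array.joblib"  # hypothetical file produced by joblib.dump

size_bytes = os.path.getsize(PATH)

start = time.perf_counter()
arr = joblib.load(PATH)
elapsed = time.perf_counter() - start

# Effective throughput: if this is well below the SSD's rated read speed
# while one CPU core sits at 100%, deserialization is the bottleneck and a
# faster drive won't help much; if it roughly matches the rated speed,
# the disk is the limit and a faster SSD should help.
mb = size_bytes / 1e6
print(f"Loaded {mb:.0f} MB in {elapsed:.2f} s ({mb / elapsed:.0f} MB/s)")
```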

  • Almost always disks are the bottleneck. Commented May 20, 2020 at 0:33
  • @Barmar true, but it all depends on how much work you need to do per byte. Pickling doesn't feel like it would be low overhead, but I'd expect numpy to do a good job. Commented May 20, 2020 at 0:37
  • You want to check your CPU usage while loading the data. If the CPU is maxed out, then getting a faster disk won't help; if your CPU is at 50%, then you might be able to get twice as fast. Commented May 20, 2020 at 0:39
  • And if the loading speed is about the same as the disk speed, a faster disk is likely to help. Commented May 20, 2020 at 0:41
  • You are using the entire disk channel, so a faster SSD will very likely help. There are other ways to save arrays that should be faster than pickle: array.tofile to write a binary array and then numpy.memmap to read it back, for instance, if you are dealing with basic types like ints or floats (see the sketch after this list). Commented May 20, 2020 at 0:59
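Here is a minimal sketch of the tofile/memmap approach suggested in the last comment. The shape, dtype, and the filename big_array.bin are placeholders; tofile writes raw bytes only, so you have to remember the dtype and shape yourself when mapping the file back.

```python
import numpy as np

# Placeholder data; substitute your real array.
arr = np.random.rand(10_000, 1_000)  # float64

# Dump the raw bytes with no pickling overhead.
arr.tofile("big_array.bin")

# Map the file instead of reading it eagerly; only the pages you actually
# touch are read from disk. dtype and shape are not stored in the file,
# so they must be supplied here.
loaded = np.memmap("big_array.bin", dtype=np.float64, mode="r",
                   shape=(10_000, 1_000))

print(loaded[0, :5])
```

If you would rather keep the dtype and shape stored alongside the data, np.save writes a small header with that metadata, and np.load(..., mmap_mode="r") gives the same lazy, memory-mapped access.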
