
In the code below, I create a big array and want to plot it as a line plot. I have to use NumPy's memory-mapped arrays, which I learned about here, just to create the array (and the x-values). The post Plotting a large number of points using matplotlib and running out of memory has the same issue, but not with a line plot, and I'm afraid I couldn't figure out how to adapt those ideas to make my line plot work. (Using the tqdm package to track the progress of my loops, it seems that both loops complete, and then the RAM explodes while drawing.)

I am running this on Google Colab, and everything is fine until matplotlib draws the picture. I don't understand how drawing could possibly be the problem: ultimately it's just producing a .png file, dot by dot and line by line, which can't be that large or memory-intensive! EDIT: it also fails if I use plt.savefig() instead of plt.show().

import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

width = 40
height = 8
transition = 0.9
chunk_size = int(1E6)  # Define a chunk size for processing data

n = int(1E8)

def calculate_array(n, tr, filename):
    a = np.memmap(filename, dtype='float64', mode='w+', shape=(n + 1,))
    if tr < 1:
        for k in range(n, int(n * tr), -1):
            a[k] = 1 / (1 - tr) * (n - k) / n
    elif tr == 1:
        a[n] = 1

    for k in tqdm(range(int(n * tr), 0, -1), desc=f"Calculating array for t={tr}"):
        a[k] = 17  # in reality this is more complicated, but those details don't matter here
    
    a.flush()  # Ensure changes are written to disk

# Values of t to be used
t_values = [1, 0.99, 0.9, 0.8, 0.7, 0.6, 0.5]

# Create a memory-mapped file for x_values
x_filename = "x_values.dat"
x_values = np.memmap(x_filename, dtype='int64', mode='w+', shape=(n + 1,))
x_values[:] = np.arange(n + 1)
x_values.flush()

# Plot with both log axes
for t in t_values:
    plt.figure(figsize=(width, height))
    filename = f"array_t_{t}.dat"
    calculate_array(n, t, filename)
    
    # Open the memory-mapped arrays for reading
    a = np.memmap(filename, dtype='float64', mode='r', shape=(n + 1,))
    x_values = np.memmap(x_filename, dtype='int64', mode='r', shape=(n + 1,))
    
    # Plot in chunks
    for start in tqdm(range(1, n, chunk_size), desc=f"Plotting array for t={t}"):
        end = min(start + chunk_size, n)
        x_chunk = x_values[start:end]
        y_chunk = a[start:end]
        
        plt.scatter(x_chunk, y_chunk, s=1, c='blue')
        
        # For chunks after the first, include the last point of the
        # previous chunk so the line stays connected across chunks
        if start > 1:
            plt.plot(x_values[start-1:end], a[start-1:end], linestyle='-', alpha=0.6, color='blue')
        else:
            plt.plot(x_chunk, y_chunk, linestyle='-', alpha=0.6, color='blue')
    
    plt.xscale('log')
    plt.xlabel('Index (log scale)')
    plt.yscale('symlog')
    plt.ylabel('a(k) (symlog scale)')
    plt.title(f'Individual Plot of the array a with both axes in log scale for t={t}')
    plt.legend([f't={t}'])
    plt.grid(True)
    plt.show()
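One thing worth knowing about the loop above: every plt.scatter and plt.plot call keeps a full copy of its data alive inside the figure, so chunking the reads does not reduce what the figure itself holds when it is finally drawn. Since the x-axis is log-scaled anyway, one workaround (a sketch of my own, not from the question; the name log_spaced_indices and the point count are made up for illustration) is to sample the indices log-uniformly, so the drawn points end up evenly spaced on screen:

```python
import numpy as np

def log_spaced_indices(n, n_points=4000):
    # Log-uniformly spaced integer indices in [1, n], rounded,
    # deduplicated, and sorted. On a log-scale x-axis these
    # points appear evenly spaced, so a few thousand suffice.
    idx = np.round(np.geomspace(1, n, n_points)).astype(np.int64)
    return np.unique(idx)

# Usage sketch against the memmaps from the question:
# idx = log_spaced_indices(n)
# plt.plot(x_values[idx], a[idx], '-', alpha=0.6, color='blue')
```

Fancy indexing a memmap with a few thousand indices only pulls those elements into RAM, so the figure ends up holding thousands of points instead of 10^8.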
  • A screen usually has around 1900x1200 pixels - you can't see more pixels than that, so why draw a billion elements you can't even see? Commented Jul 4, 2024 at 1:34
  • @furas I don't have to draw them all, but I don't know how to draw only the ones I need and leave out the redundant ones. There will indeed be many points that are essentially on top of each other - is there a way matplotlib can recognize this so that the memory doesn't explode? Commented Jul 4, 2024 at 1:39
  • You need to think a bit harder. There is no sane way to plot a billion elements - even a million will be a total mess on a 4K-wide screen. Compact the data down to the size of your display and compute the highest, lowest, median, and mean for each block; then you will have something meaningful that summarises its behaviour. Commented Jul 4, 2024 at 9:40
  • matplotlib doesn't have a method for this - it is your job to reduce the data. But maybe you should first check what other people have tried - e.g. python - Interactive large plot with ~20 million sample points and gigabytes of data - Stack Overflow. There is a table there comparing different tools for visualizing data. Commented Jul 4, 2024 at 11:04
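The reduction the commenters describe - one summary per horizontal pixel - can be sketched as a min/max decimation: split the array into roughly as many index bins as the plot is pixels wide, and keep only each bin's extremes, which preserves the line's visual envelope. This is an illustrative sketch, not part of the question; the name minmax_downsample and the default bin count are made up. Slicing works directly on the read-only memmaps, so only one bin's worth of data is in RAM at a time.

```python
import numpy as np

def minmax_downsample(x, y, n_bins=2000):
    # Keep only the min and max of y within each of n_bins index bins
    # (n_bins ~ the plot's pixel width), so the drawn line keeps the
    # same visual envelope with at most 2 * n_bins points.
    n = len(y)
    if n <= 2 * n_bins:
        return np.asarray(x), np.asarray(y)
    edges = np.linspace(0, n, n_bins + 1).astype(np.int64)
    xs, ys = [], []
    for s, e in zip(edges[:-1], edges[1:]):
        if s == e:
            continue
        seg = np.asarray(y[s:e])    # one small slice in RAM at a time
        lo = s + int(np.argmin(seg))
        hi = s + int(np.argmax(seg))
        for j in sorted((lo, hi)):  # keep index order within the bin
            xs.append(x[j])
            ys.append(y[j])
    return np.array(xs), np.array(ys)

# Usage sketch: xd, yd = minmax_downsample(x_values, a)
#               plt.plot(xd, yd, '-', alpha=0.6, color='blue')
```

With the data compacted this way, a single plt.plot call replaces the chunked scatter/plot loop entirely, and the figure never holds more than a few thousand points.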

