I was trying to test the performance of numpy using this very simple script:
import numpy as np
import argparse
from timeit import default_timer as timer
p = argparse.ArgumentParser()
p.add_argument("--N", default=1000, type=int, help="size of matrix A")
p.add_argument(
"--repeat", default=1000, type=int, help="perform computation x = A*b repeat times"
)
args = p.parse_args()
np.random.seed(0)
A = np.random.rand(args.N, args.N)
b = np.random.rand(args.N)
x = np.zeros(args.N)
ts = timer()
for i in range(args.repeat):
x[:] = A.dot(b)
te = timer()
gbytes = 8.0 * (args.N**2 + args.N * 2) * 1e-9
print("bandwidth: %.2f GB/s" % (gbytes * args.repeat / (te - ts)))
What it does is it creates a random dense matrix, performs matrix-vector multiplication repeat times, and computes the averaged bandwidth of such operation, which I believe includes memory read, computation and memory write. However when I run this script on my laptop, the results vary quite significantly for each run:
~/toys/python ❯ python numpy_performance.py --N 8000 --repeat 100
bandwidth: 93.64 GB/s
~/toys/python ❯ python numpy_performance.py --N 8000 --repeat 100
bandwidth: 99.15 GB/s
~/toys/python ❯ python numpy_performance.py --N 8000 --repeat 100
bandwidth: 95.08 GB/s
~/toys/python ❯ python numpy_performance.py --N 8000 --repeat 100
bandwidth: 77.28 GB/s
~/toys/python ❯ python numpy_performance.py --N 8000 --repeat 100
bandwidth: 56.90 GB/s
~/toys/python ❯ python numpy_performance.py --N 8000 --repeat 100
bandwidth: 63.87 GB/s
~/toys/python ❯ python numpy_performance.py --N 8000 --repeat 100
bandwidth: 85.43 GB/s
~/toys/python ❯ python numpy_performance.py --N 8000 --repeat 100
bandwidth: 95.69 GB/s
~/toys/python ❯ python numpy_performance.py --N 8000 --repeat 100
bandwidth: 93.91 GB/s
~/toys/python ❯ python numpy_performance.py --N 8000 --repeat 100
bandwidth: 101.99 GB/s
Is this behavior expected? If so, how can it be explained? Thanks!