
I noticed a significant performance regression in the Polars DataFrame join function after upgrading Polars from 1.30.0 to 1.31.0. The code snippet is below:

import polars as pl
import time
import numpy as np

print(pl.__version__)
np.random.seed(0)

indices = np.arange(2_000)
columns = [f"col_{i}" for i in range(20_000)]

df_1 = pl.DataFrame({
    "index": indices,
    **{col: np.random.rand(len(indices)) for col in columns}
})

df_2 = pl.DataFrame({
    "index": indices,
    **{col: np.random.rand(len(indices)) for col in columns}
})

print("DataFrames created.")

t0 = time.time()
df_merged = df_1.join(df_2, on="index", how="left", suffix="_right")
t1 = time.time()
print(f"Time taken to merge: {t1 - t0:.2f} seconds") 

With Polars 1.30.0, the merge step takes 0.06 seconds:

1.30.0
DataFrames created.
Time taken to merge: 0.06 seconds

With Polars 1.31.0, the same merge step takes almost 30 seconds:

1.31.0
DataFrames created.
Time taken to merge: 27.68 seconds

Does anyone know why this happens?

  • I see the same slowdown. It seems to be specific to the in-memory engine: if I call .lazy() on both frames and run with .collect(engine="streaming"), it is fast again. You could report it on GitHub as a performance issue. Commented Nov 7 at 13:58
  • Check whether it is still present in the latest version before reporting it, though (in case it has already been fixed). Commented Nov 7 at 14:05
  • The latest version is 1.35.1, and I can see the issue is still there. I will report it on GitHub and see how it goes. Commented Nov 7 at 14:11
  • I do not have the issue on the latest version (1.35.2). I get "Time taken to merge: 0.06 seconds" on my (Debian) Linux machine. I do reproduce the issue in 1.35.1, so they probably fixed the problem in the patch release :) Commented Nov 9 at 20:53
  • Can confirm it is fixed in 1.35.2. For those curious: github.com/pola-rs/polars/releases/tag/py-1.35.2 github.com/pola-rs/polars/pull/25222 Commented Nov 10 at 8:59
