I am currently working on a function that assigns values from a 2-D numpy array to a given column of a dataframe, using the polars library in Python.

I have a dataframe df with the following columns: ['HZ', 'FL', 'Q']. The column 'HZ' takes values in [0, EC + H - 1] and the column 'FL' takes values in [1, F].

I also have a numpy array q of shape (EC + H, F), and I want to assign its values to the column 'Q' as follows: if df['HZ'] >= EC, then df['Q'] = q[df['HZ'], df['FL'] - 1].

The pandas statement below does exactly what I want:

df.loc[df['HZ'] >= EC, 'Q'] = q[df.loc[df['HZ'] >= EC, 'HZ'], df.loc[df['HZ'] >= EC, 'FL'] - 1]

Now I want to do it using polars, and I tried to do it this way:

df = df.with_columns(pl.when(pl.col('HZ') >= EC).then(q[pl.col('HZ')][pl.col('FL') - 1]).otherwise(pl.col('Q')).alias('Q'))

And I get the following error:

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

I understand that I am not giving numpy valid indices to look up the corresponding values in the array, but I don't know what to replace them with to get the desired behavior.

Thanks in advance.

  • You'd need to get the actual values out of the dataframe to index the numpy array, e.g. hz, fl = df.filter(pl.col("HZ") >= EC).select(pl.col("HZ"), pl.col("FL") - 1), then use q[hz, fl]. Commented Mar 17, 2023 at 16:19
  • This instruction seems to return the rows of the array. I tried it and I get the following error: TypeError: did not expect value [...] of type <class 'numpy.ndarray'>, maybe disambiguate with pl.lit or pl.col where [...] seems to be a row of my initial numpy array. Commented Mar 17, 2023 at 16:29
  • Perhaps you could add a code example/test case to make it easier to know what's going on exactly. Commented Mar 17, 2023 at 16:46
  • The pandas code does exactly what I want to do. For more details, I opened a question on the exact same topic for pandas here: stackoverflow.com/questions/73770083/… Commented Mar 20, 2023 at 8:22

1 Answer

By test case/example I meant something like:

import numpy as np
import polars as pl

df = pl.DataFrame({
    "HZ": [0, 0, 1, 1],
    "FL": [0, 1, 2, 3],
    "Q": [0, 0, 0, 0]
})
q = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
EC = 1
>>> df
shape: (4, 3)
┌─────┬─────┬─────┐
│ HZ  ┆ FL  ┆ Q   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 0   ┆ 0   ┆ 0   │
│ 0   ┆ 1   ┆ 0   │
│ 1   ┆ 2   ┆ 0   │
│ 1   ┆ 3   ┆ 0   │
└─────┴─────┴─────┘

The problem with your attempted approach is that q[pl.col('HZ')] is evaluated before .with_columns executes, and numpy does not understand pl.col('HZ'): it is an unevaluated polars expression, not an array of integers.
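To make the indexing side of this concrete, here is a minimal numpy-only sketch. The arrays hz and fl are hypothetical stand-ins for the values of the "HZ" and "FL" columns once they have been pulled out of the dataframe. It also shows why the paired form q[hz, fl - 1] is the one that matches the pandas statement, while the chained form q[hz][...] from the attempt first selects whole rows.

```python
import numpy as np

q = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Hypothetical stand-ins for the values of the "HZ" and "FL" columns.
hz = np.array([0, 0, 1, 1])
fl = np.array([1, 2, 3, 4])  # 1-based, as in the question

# Paired integer-array indexing: one (row, column) lookup per position.
paired = q[hz, fl - 1]
print(paired)  # [1 2 7 8]

# Chained indexing is a different operation: q[hz] first selects whole rows.
chained_rows = q[hz]
print(chained_rows.shape)  # (4, 4)
```

Numpy accepts hz and fl here only because they are real integer arrays; a pl.col(...) expression carries no values yet, hence the IndexError.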

One way to use the actual values to index the numpy array is by using .map:

df.with_columns(Q = 
   pl.when(pl.col("HZ") >= EC)
     .then(
        pl.map(
           ["HZ", pl.col("FL") - 1], 
           lambda cols: q[cols[0], cols[1]])
        .flatten())
     .otherwise(pl.col("Q")))
shape: (4, 3)
┌─────┬─────┬─────┐
│ HZ  ┆ FL  ┆ Q   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 0   ┆ 0   ┆ 0   │
│ 0   ┆ 1   ┆ 0   │
│ 1   ┆ 2   ┆ 6   │
│ 1   ┆ 3   ┆ 7   │
└─────┴─────┴─────┘

It's slightly awkward to do. It would probably be better to have your data in a format that suits polars, e.g. another dataframe:

df_q = pl.DataFrame(
   ((row, col, value) for (row, col), value in np.ndenumerate(q)),
   schema=["HZ", "FL", "Q"]
)
>>> df_q
shape: (8, 3)
┌─────┬─────┬─────┐
│ HZ  ┆ FL  ┆ Q   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 0   ┆ 0   ┆ 1   │
│ 0   ┆ 1   ┆ 2   │
│ 0   ┆ 2   ┆ 3   │
│ 0   ┆ 3   ┆ 4   │
│ 1   ┆ 0   ┆ 5   │
│ 1   ┆ 1   ┆ 6   │
│ 1   ┆ 2   ┆ 7   │
│ 1   ┆ 3   ┆ 8   │
└─────┴─────┴─────┘

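This would allow you to use a more regular approach to matching values, such as a .join. Note that the join below matches every row, so the HZ >= EC condition still has to be applied to Q_right afterwards.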

df.join(df_q.with_columns(pl.col("FL") + 1), on=["HZ", "FL"], how="left")
shape: (4, 4)
┌─────┬─────┬─────┬─────────┐
│ HZ  ┆ FL  ┆ Q   ┆ Q_right │
│ --- ┆ --- ┆ --- ┆ ---     │
│ i64 ┆ i64 ┆ i64 ┆ i64     │
╞═════╪═════╪═════╪═════════╡
│ 0   ┆ 0   ┆ 0   ┆ null    │
│ 0   ┆ 1   ┆ 0   ┆ 1       │
│ 1   ┆ 2   ┆ 0   ┆ 6       │
│ 1   ┆ 3   ┆ 0   ┆ 7       │
└─────┴─────┴─────┴─────────┘

1 Comment

Oh thanks, it seems to do the job! I think the .map() function was the missing link in my reasoning. I understand this way of doing things is a bit unusual, but my data is structured in a way that makes this kind of operation necessary for now.
