Goal

I have a NumPy array

true_direction = np.array([1,2,3]).reshape(1,3)

which I want to insert into a Polars DataFrame; that is, repeat this array in every row of the DataFrame.

What I have tried

Here is what I have tried so far:

  1. Repeat the NumPy array and use .with_columns():
    .with_columns(
       pl.Series(
         np.repeat(true_direction, repeats=912, axis=0)
       ).alias('true_direction')
    )
    
    The problem is that I have to know the height of the DataFrame beforehand, which is annoying.
  2. Another way is to not start out with a NumPy array:
    true_direction = [1,2,3]
    
    in which case I can use pl.lit() (suggested by ChatGPT):
     .with_columns(
       pl.lit(true_direction)
       # .cast(pl.Array(pl.Float64, 3))
       .alias('true_direction')
     )
    
    The problem here is that I then have to manually convert the list[f64] column into an array[f64, 3] column, since I need to take a dot product later on.

My question

Is there a more idiomatic Polars way to do this?

1 Answer

With polars.lit, Polars broadcasts the literal to the height of the DataFrame for you. In this case you also need to add .first() so that Polars treats your NumPy array as a scalar to be broadcast.

You mentioned floats, but have an array of ints. The type of the array in Polars will match the type of the input in NumPy, as shown below.

import numpy as np
import polars as pl

true_direction = np.array([1, 2, 3]).reshape(1, 3)
true_direction_float = np.array([1., 2., 3.]).reshape(1, 3)

df = pl.DataFrame({"a": range(10)})

df.with_columns(
    true_direction=pl.lit(true_direction).first(),
    true_direction_float=pl.lit(true_direction_float).first(),
)

outputs

shape: (10, 3)
┌─────┬────────────────┬──────────────────────┐
│ a   ┆ true_direction ┆ true_direction_float │
│ --- ┆ ---            ┆ ---                  │
│ i64 ┆ array[i32, 3]  ┆ array[f64, 3]        │
╞═════╪════════════════╪══════════════════════╡
│ 0   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
│ 1   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
│ 2   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
│ 3   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
│ 4   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
│ 5   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
│ 6   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
│ 7   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
│ 8   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
│ 9   ┆ [1, 2, 3]      ┆ [1.0, 2.0, 3.0]      │
└─────┴────────────────┴──────────────────────┘

If you want to change from int to float, you need to cast (either in NumPy or in Polars). If the other input to your dot product is a float, Polars may cast the result to a float (float being the supertype), but I have not verified that, so test it out.

4 Comments

I get the following error: polars.exceptions.ComputeError: 'value' must be scalar value. I'm using Polars version 1.29.0.
It looks like a scalar check was added to pl.repeat (github.com/pola-rs/polars/pull/22088), which the NumPy array does not pass (I can't figure out whether it should). .with_columns(true_direction=pl.lit(true_direction).first()) could be a potential workaround.
My bad, I was on a recent version of Polars, but not the most recent. I will update this answer with a working solution ASAP (and check out the .first() option suggested below)
Answer has been updated. Thanks very much @jqurious. Yes, I'm also not sure whether the NumPy array should pass the scalar check. In this case the outer dimension only has a single element (suggesting it should be treated as a scalar).
