Newest 'polars' Questions

0 votes

0 answers

66 views

nushell polars plugin panicking

I just created this function for label encoding: # Custom command to perform label encoding on a specified column of a table, it returns a polar dataframe @example "Simple example" { ...

kurokirasama

787

asked Oct 10 at 19:24

1 vote

1 answer

93 views

Converting a Rust `futures::TryStream` to a `polars::LazyFrame`

I have an application where I have a futures::TryStream. Still in a streaming fashion, I want to convert this into a polars::LazyFrame. It is important to note that the TryStream comes from the ...

bmitc

908

asked Sep 30 at 4:00

0 votes

1 answer

106 views

Show progress bar when reading files with globbing with polars

I have a folder with multiple Excel files. I'm reading all of them in a single polars DataFrame concatenated vertically using globbing: import polars as pl df = pl.read_excel("folder/*.xlsx")...

robertspierre

5,379

asked Sep 23 at 3:18

2 votes

1 answer

112 views

Horizontal cumulative sum + unnest bug in polars

When I use horizontal cumulative sum followed by unnest, a "literal" column is formed that stays in the schema even when dropped. Here is an example: import polars as pl def ...

Nicolò Cavalleri

223

asked Aug 4 at 18:49

4 votes

1 answer

596 views

How to use the "is_in" function correctly?

In Polars 0.46.0 it works normally: let df = df!( "id" => [0, 1, 2, 3, 4], "col_1" => [1, 2, 3, 4, 5], "col_2" => [3, 4, 5, 6, 7], ) .unwrap(); dbg!(&...

Alex Avin

43

asked Aug 4 at 10:49

4 votes

1 answer

186 views

How to use Polars copy-on-write principle?

I come from the C++ and R world and have just started using Polars. This is a great library. I want to confirm my understanding of its copy-on-write principle: import polars as pl x = pl.DataFrame({'a': [1, 2, ...

user2961927

1,790

asked Jul 20 at 18:47

1 vote

1 answer

219 views

Polars reading just one file from s3 with glob patterns

I have an S3 location in which I have a list of directories, and each directory contains a CSV named sample_file.csv. I am trying to read these files using a glob pattern in pl.read_csv but it is just ...

figs_and_nuts

5,881

asked Jun 30 at 15:23

0 votes

0 answers

122 views

Trying and failing to encrypt CSV with polars_encryption

When I try to run the below function: import polars as pl from polars_encryption import encrypt, decrypt def crypt(csv_file: str, delim: str, password: str, output_file: str): """ ...

James McIntyre

116

asked Jun 17 at 14:47

2 votes

0 answers

102 views

Best way to trigger lazy evaluation in PySpark and Polars for benchmarking

I'm currently benchmarking PySpark vs the growing alternative Polars. Basically I'm writing various queries (aggregations, filtering, sorting, etc.) and measuring the execution time, RAM and CPU. I ...

Ernest P W

73

asked Jun 5 at 22:21

1 vote

1 answer

83 views

How to join/map a polars dataframe to a dict? [duplicate]

I have a polars dataframe, and a dictionary. I want to map a column in the dataframe to the keys of the dictionary, and then add the corresponding values as a new column. import polars as pl my_dict =...

falsePockets

4,423

asked May 26 at 10:22

0 votes

1 answer

118 views

Is there a polars operation to apply a function over each pair of groups?

I have a polars data frame which could be generated like so: import polars as pl import numpy as np num_points = 10 group_count = 3 df = pl.DataFrame( { "group_id": np....

TOgy

3

asked May 19 at 17:10

6 votes

1 answer

132 views

Split a column of string into list of list

How could I split a column of string into list of list? Minimum example: import polars as pl pl.Config(fmt_table_cell_list_len=6, fmt_str_lengths=100) df = pl.DataFrame({'test': "A,B,C,1\nD,E,F,...

Baffin Chu

217

asked May 17 at 9:00

1 vote

1 answer

207 views

How do I ensure that a Polars expression plugin properly uses multiple CPUs?

I'm writing a polars plugin that works, but never seems to use more than one CPU. The plugin's function is element-wise, and is marked as such in register_plugin_function. What might I need to do to ...

sclamons

103

asked May 15 at 17:40

0 votes

1 answer

180 views

Polars for Python, can I read parquet files with hive_partitioning when the directory structure and files have been manually written?

I manually created directory structures and wrote parquet files rather than used the partition_by parameter in the write_parquet() function of the python polars library because I want full control ...

Matt

7,316

asked May 8 at 2:34

3 votes

1 answer

108 views

Modify list of arrays in place

I have a df like: # /// script # requires-python = ">=3.13" # dependencies = [ # "polars", # ] # /// import polars as pl df = pl.DataFrame( { "points"...

DJDuque

954

asked Apr 28 at 3:55

1 vote

1 answer

140 views

Get a grouped sum in polars, but keep all individual rows

I am breaking my head over this probably pretty simple question and I just can't find the answer anywhere. I want to create a new column with a grouped sum of another column, but I want to keep all ...

gernophil

637

asked Apr 16 at 9:29

2 votes

1 answer

124 views

How to unnest struct columns without dropping empty structs with r-polars

I have a DataFrame that I need to separate columns when there are commas. The problem is when I have columns that are all null. In the example below, I need a DataFrame with the columns "mpg"...

user27247029

65

asked Apr 14 at 20:26

1 vote

0 answers

130 views

How to efficiently apply a gufunc to a 2D region of a Polars DataFrame

Both Polars and Numba are fantastic libraries that complement each other pretty well. There are some limitations when using Numba-compiled functions in Polars: Arrow columns must be converted to ...

Olibarer

423

asked Apr 7 at 12:57

1 vote

1 answer

212 views

How to randomly sample n IDs for each combination of group_id and date in a Polars DataFrame

I am trying to randomly sample n IDs for each combination of group_id and date in a Polars DataFrame. However, I noticed that the sample function is producing the same set of IDs for each date no ...

pinpss

173

asked Apr 3 at 23:13

0 votes

0 answers

191 views

Polars out of core sorting and memory usage

From what I understand this is a main use case for Polars: being able to process a dataset that is larger than RAM, using disk space if necessary. Yet I am unable to achieve this in a Kubernetes ...

Nicolas Galler

1,319

asked Apr 3 at 9:11

0 votes

1 answer

111 views

Polars wheel file

The provided whl files for the polars library are tagged as abi3. I am working with a specific setup that needs the ABI tag to be cp39. I tried unpacking and repacking while changing the tag, but still not ...

RGI

21

asked Mar 28 at 12:58

2 votes

1 answer

167 views

Polars Dataframe from nested dictionaries as columns

I have a dictionary of nested columns with the index as the key in each one. When I try to convert it to a polars dataframe, it gets the column names and the values right, but each column has just one ...

Ghost

1,594

asked Mar 24 at 17:51

-5 votes

1 answer

1k views

Filtering polars dataframe by row with boolean mask [closed]

I'm trying to filter a Polars dataframe by using a boolean mask for the rows, which is generated from conditions on a specific column using: df = df[df['col'] == cond] and it's giving me an error ...

Ghost

1,594

asked Mar 17 at 16:26

0 votes

0 answers

206 views

Values differ on multiple reads from parquet files using polars read_parquet but not with pandas read_parquet by workstation

I read data from the same parquet files multiple times using polars (polars rust engine and pyarrow) and using the pandas pyarrow backend (not fastparquet, as it was very slow); see the code below. All the ...

newandlost

1,080

asked Mar 13 at 13:12

2 votes

2 answers

149 views

Create a uniform dataset in Polars with cross joins

I am working with Polars and need to ensure that my dataset contains all possible combinations of unique values in certain index columns. If a combination is missing in the original data, it should be ...

Olibarer

423

asked Mar 12 at 16:46
