0 votes · 0 answers · 66 views

I just created this function to perform label encoding: # Custom command to perform label encoding on a specified column of a table, it returns a polar dataframe @example "Simple example" { ...
kurokirasama
1 vote · 1 answer · 93 views

I have an application where I have a futures::TryStream. Still in a streaming fashion, I want to convert this into a polars::LazyFrame. It is important to note that the TryStream comes from the ...
bmitc (908 rep)
0 votes · 1 answer · 106 views

I have a folder with multiple Excel files. I'm reading all of them into a single polars DataFrame, concatenated vertically, using globbing: import polars as pl df = pl.read_excel("folder/*.xlsx")...
robertspierre
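A minimal sketch of one way to handle the Excel-glob situation above, assuming the folder/*.xlsx layout from the question and that an Excel engine for pl.read_excel (e.g. fastexcel or xlsx2csv) is installed: expand the glob in Python and concatenate the frames vertically.

```python
import glob

import polars as pl

# Read each workbook separately, then stack the frames vertically.
frames = [pl.read_excel(path) for path in sorted(glob.glob("folder/*.xlsx"))]
df = pl.concat(frames, how="vertical")
```

If the files do not share an identical schema, how="diagonal" is a more forgiving concatenation strategy.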
2 votes · 1 answer · 112 views

When I use horizontal cumulative sum followed by unnest, a "literal" column is formed that stays in the schema even when dropped. Here is an example: import polars as pl def ...
Nicolò Cavalleri
4 votes · 1 answer · 596 views

In Polars 0.46.0 it works normally: let df = df!( "id" => [0, 1, 2, 3, 4], "col_1" => [1, 2, 3, 4, 5], "col_2" => [3, 4, 5, 6, 7], ) .unwrap(); dbg!(&...
Alex Avin
4 votes · 1 answer · 186 views

I come from the C++ and R world and just started using Polars. This is a great library. I want to confirm my understanding of its copy-on-write principle: import polars as pl x = pl.DataFrame({'a': [1, 2, ...
user2961927 (1,790 rep)
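A small, non-authoritative illustration of the immutability the question is getting at: Polars operations return a new DataFrame rather than mutating in place, and unchanged columns can share their underlying Arrow buffers.

```python
import polars as pl

x = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# with_columns returns a new frame; x itself is left untouched.
y = x.with_columns(pl.col("a") * 2)

print(x["a"].to_list())  # [1, 2, 3]
print(y["a"].to_list())  # [2, 4, 6]
```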
1 vote · 1 answer · 219 views

I have an S3 location with a list of directories, and each directory contains a CSV named sample_file.csv. I am trying to read these files using a glob pattern in pl.read_csv but it is just ...
figs_and_nuts
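One hedged workaround for the question above, assuming a bucket layout like bucket/<dir>/sample_file.csv (the names here are invented): let s3fs expand the glob and read each object, then concatenate.

```python
import polars as pl
import s3fs

fs = s3fs.S3FileSystem()

# Hypothetical bucket/prefix; each matching key is one sample_file.csv.
paths = fs.glob("my-bucket/data/*/sample_file.csv")

frames = []
for path in paths:
    with fs.open(path, "rb") as f:
        frames.append(pl.read_csv(f))

df = pl.concat(frames, how="vertical")
```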
0 votes · 0 answers · 122 views

When I try to run the below function: import polars as pl from polars_encryption import encrypt, decrypt def crypt(csv_file: str, delim: str, password: str, output_file: str): """ ...
James McIntyre
2 votes · 0 answers · 102 views

I'm currently benchmarking PySpark against the growing alternative Polars. Basically I'm writing various queries (aggregations, filtering, sorting, etc.) and measuring the execution time, RAM, and CPU usage. I ...
Ernest P W
1 vote · 1 answer · 83 views

I have a polars dataframe, and a dictionary. I want to map a column in the dataframe to the keys of the dictionary, and then add the corresponding values as a new column. import polars as pl my_dict =...
falsePockets (4,423 rep)
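A minimal sketch of one way to do the dictionary lookup described above (the dictionary and column names are invented, not the poster's data): turn the dict into a small lookup frame and left-join it.

```python
import polars as pl

my_dict = {"a": 1, "b": 2, "c": 3}
df = pl.DataFrame({"key": ["a", "b", "c", "a"]})

# Build a two-column lookup table from the dictionary and join it on.
lookup = pl.DataFrame({"key": list(my_dict), "value": list(my_dict.values())})
out = df.join(lookup, on="key", how="left")
```

Recent Polars versions also offer Expr.replace / Expr.replace_strict, which accept a dictionary directly.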
0 votes · 1 answer · 118 views

I have a polars data frame which could be generated like so: import polars as pl import numpy as np num_points = 10 group_count = 3 df = pl.DataFrame( { "group_id": np....
TOgy (3 rep)
6 votes · 1 answer · 132 views

How can I split a string column into a list of lists? Minimal example: import polars as pl pl.Config(fmt_table_cell_list_len=6, fmt_str_lengths=100) df = pl.DataFrame({'test': "A,B,C,1\nD,E,F,...
Baffin Chu
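A hedged sketch for the split question above, using a one-row stand-in for the data: split on newlines first, then split each element on commas inside list.eval, which yields a List(List(String)) column.

```python
import polars as pl

df = pl.DataFrame({"test": ["A,B,C,1\nD,E,F,2"]})

out = df.with_columns(
    pl.col("test")
    .str.split("\n")                          # outer split: one list per row
    .list.eval(pl.element().str.split(","))   # inner split: list of lists
)
```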
1 vote · 1 answer · 207 views

I'm writing a polars plugin that works, but never seems to use more than one CPU. The plugin's function is element-wise, and is marked as such in register_plugin_function. What might I need to do to ...
sclamons (103 rep)
0 votes · 1 answer · 180 views

I manually created directory structures and wrote parquet files rather than using the partition_by parameter of the write_parquet() function in the Python polars library, because I want full control ...
Matt (7,316 rep)
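If the hand-built directories follow the usual key=value convention (e.g. data/year=2024/part.parquet; the path here is an assumption, not the poster's layout), a sketch like this lets scan_parquet recover the partition columns from the directory names:

```python
import polars as pl

# hive_partitioning=True turns key=value directory names into columns.
lf = pl.scan_parquet("data/**/*.parquet", hive_partitioning=True)
df = lf.collect()
```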
3 votes · 1 answer · 108 views

I have a df like: # /// script # requires-python = ">=3.13" # dependencies = [ # "polars", # ] # /// import polars as pl df = pl.DataFrame( { "points"...
DJDuque (954 rep)
1 vote · 1 answer · 140 views

I am racking my brain over this probably pretty simple question and I just can't find the answer anywhere. I want to create a new column with a grouped sum of another column, but I want to keep all ...
gernophil (637 rep)
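A minimal sketch of the usual pattern for the question above, with invented column names: a window expression adds the per-group sum while keeping every original row.

```python
import polars as pl

df = pl.DataFrame({"group": ["a", "a", "b"], "value": [1, 2, 3]})

out = df.with_columns(
    pl.col("value").sum().over("group").alias("group_sum")
)
```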
2 votes · 1 answer · 124 views

I have a DataFrame whose columns I need to split on commas. The problem is when I have columns that are all null. In the example below, I need a DataFrame with the columns "mpg"...
user27247029
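A hedged sketch with invented data for the comma-splitting question above: str.split_exact always produces a struct with a fixed number of fields, which also works when values, or an entire column, are null, and the struct can then be unnested.

```python
import polars as pl

df = pl.DataFrame({"raw": ["21,6,160", None, "22.8,4,108"]})

out = (
    df.with_columns(
        # Cast first in case an all-null column was inferred as Null dtype.
        pl.col("raw").cast(pl.String).str.split_exact(",", 2).alias("parts")
    )
    .unnest("parts")  # field_0, field_1, field_2 become separate columns
)
```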
1 vote · 0 answers · 130 views

Both Polars and Numba are fantastic libraries that complement each other pretty well. There are some limitations when using Numba-compiled functions in Polars: Arrow columns must be converted to ...
Olibarer (423 rep)
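A hedged sketch of the interop pattern the question alludes to (function and column names are invented): move the column to NumPy, run a Numba-compiled kernel, and attach the result as a new column.

```python
import numba
import numpy as np
import polars as pl

@numba.njit
def kernel(x):
    # Toy element-wise computation compiled by Numba.
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        out[i] = x[i] * 2.0
    return out

df = pl.DataFrame({"a": [1.0, 2.0, 3.0]})
df = df.with_columns(pl.Series("a_doubled", kernel(df["a"].to_numpy())))
```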
1 vote · 1 answer · 212 views

I am trying to randomly sample n IDs for each combination of group_id and date in a Polars DataFrame. However, I noticed that the sample function is producing the same set of IDs for each date no ...
pinpss (173 rep)
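One hedged way around the identical-samples issue described above (the data and n are invented): attach an independent random key to every row, sort on it, and take the first n ids per (group_id, date) combination, so each group gets its own draw.

```python
import numpy as np
import polars as pl

df = pl.DataFrame({
    "group_id": [1, 1, 1, 2, 2, 2],
    "date": ["d1", "d1", "d2", "d1", "d1", "d2"],
    "id": [10, 11, 12, 20, 21, 22],
})
n = 1
rng = np.random.default_rng()

out = (
    df.with_columns(pl.Series("rand", rng.random(df.height)))
    .sort("rand")                                    # random order overall
    .group_by("group_id", "date", maintain_order=True)
    .agg(pl.col("id").head(n))                       # first n ids of each group
)
```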
0 votes · 0 answers · 191 views

From what I understand this is a main use case for Polars: being able to process a dataset that is larger than RAM, using disk space if necessary. Yet I am unable to achieve this in a Kubernetes ...
Nicolas Galler
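A hedged sketch of the out-of-core path the question refers to (file names and columns are invented): keep the query lazy via scan_* and write the result with sink_parquet, so the engine can stream batches instead of materialising everything in RAM.

```python
import polars as pl

lf = (
    pl.scan_parquet("big_input/*.parquet")   # lazy scan, nothing loaded yet
    .filter(pl.col("amount") > 0)
    .group_by("customer_id")
    .agg(pl.col("amount").sum())
)

# Streams the result to disk rather than collecting it into memory.
lf.sink_parquet("aggregated.parquet")
```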
0 votes · 1 answer · 111 views

The provided whl files for the polars library are tagged as abi3. I am working with a specific setup that needs the ABI tag to be cp39. I tried unpacking and packing again while changing the tag but still not ...
RGI (21 rep)
2 votes · 1 answer · 167 views

I have a dictionary of nested columns with the index as the key in each one. When I try to convert it to a polars dataframe, it fetches the column names and the values right, but each column has just one ...
Ghost (1,594 rep)
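A guess at the shape being described ({column: {index: value}}), with invented data: flatten each inner dict into a list ordered by the shared index before handing it to pl.DataFrame.

```python
import polars as pl

data = {
    "name": {0: "a", 1: "b", 2: "c"},
    "score": {0: 1.0, 1: 2.0, 2: 3.0},
}

# Use the inner keys as a shared row index and expand each column to a list.
index = sorted(next(iter(data.values())))
df = pl.DataFrame({col: [vals[i] for i in index] for col, vals in data.items()})
```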
-5 votes · 1 answer · 1k views

I'm trying to filter a Polars dataframe by using a boolean mask for the rows, which is generated from conditions on a specific column using: df = df[df['col'] == cond] And it's giving me an error ...
Ghost (1,594 rep)
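A minimal sketch for the question above (column name and condition invented): Polars favours an explicit filter expression over pandas-style boolean-mask indexing.

```python
import polars as pl

df = pl.DataFrame({"col": [1, 2, 3, 4]})

# Equivalent to the intent of df[df["col"] == cond] in pandas.
out = df.filter(pl.col("col") == 3)
```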
0 votes · 0 answers · 206 views

I read data from the same parquet files multiple times using polars (the polars rust engine and pyarrow) and using the pandas pyarrow backend (not fastparquet, as it was very slow); see the code below. All the ...
newandlost (1,080 rep)
2 votes · 2 answers · 149 views

I am working with Polars and need to ensure that my dataset contains all possible combinations of unique values in certain index columns. If a combination is missing in the original data, it should be ...
Olibarer (423 rep)
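A hedged sketch with invented index columns for the question above: build the full grid of unique values with a cross join, then left-join the original data so missing combinations show up as null rows.

```python
import polars as pl

df = pl.DataFrame({"a": [1, 1, 2], "b": ["x", "y", "x"], "val": [10, 20, 30]})

# Cartesian product of the unique values of each index column.
grid = df.select("a").unique().join(df.select("b").unique(), how="cross")

# Left join keeps every combination; missing ones get null in "val".
out = grid.join(df, on=["a", "b"], how="left")
```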