84 questions
0 votes, 0 answers, 66 views
nushell polars plugin panicking
I just created this function to perform label encoding:
# Custom command to perform label encoding on a specified column of a table; it returns a Polars dataframe
@example "Simple example" {
...
1 vote, 1 answer, 93 views
Converting a Rust `futures::TryStream` to a `polars::LazyFrame`
I have an application where I have a futures::TryStream. Still in a streaming fashion, I want to convert this into a polars::LazyFrame. It is important to note that the TryStream comes from the ...
0 votes, 1 answer, 106 views
Show progress bar when reading files with globbing with polars
I have a folder with multiple Excel files.
I'm reading all of them in a single polars DataFrame concatenated vertically using globbing:
import polars as pl
df = pl.read_excel("folder/*.xlsx")...
2 votes, 1 answer, 112 views
Horizontal cumulative sum + unnest bug in polars
When I use horizontal cumulative sum followed by unnest, a "literal" column is formed that stays in the schema even when dropped.
Here is an example:
import polars as pl
def ...
4 votes, 1 answer, 596 views
How to use the "is_in" function correctly?
In Polars 0.46.0 it works normally:
let df = df!(
"id" => [0, 1, 2, 3, 4],
"col_1" => [1, 2, 3, 4, 5],
"col_2" => [3, 4, 5, 6, 7],
)
.unwrap();
dbg!(&...
4 votes, 1 answer, 186 views
How to use Polars copy-on-write principle?
I come from the C++ and R world and have just started using Polars. This is a great library. I want to confirm my understanding of its copy-on-write principle:
import polars as pl
x = pl.DataFrame({'a': [1, 2, ...
1 vote, 1 answer, 219 views
Polars reading just one file from S3 with glob patterns
I have an S3 location that holds a list of directories, and each directory contains a CSV named sample_file.csv. I am trying to read these files using a glob pattern in pl.read_csv but it is just ...
0 votes, 0 answers, 122 views
Trying and failing to encrypt CSV with polars_encryption
When I try to run the below function:
import polars as pl
from polars_encryption import encrypt, decrypt
def crypt(csv_file: str, delim: str, password: str, output_file: str):
"""
...
2 votes, 0 answers, 102 views
Best way to trigger lazy evaluation in PySpark and Polars for benchmarking
I'm currently benchmarking PySpark against Polars, a growing alternative.
Basically I'm writing various queries (aggregations, filtering, sorting etc.) and measuring the execution time, RAM and CPU. I ...
1 vote, 1 answer, 83 views
How to join/map a polars dataframe to a dict? [duplicate]
I have a polars dataframe, and a dictionary. I want to map a column in the dataframe to the keys of the dictionary, and then add the corresponding values as a new column.
import polars as pl
my_dict =...
0 votes, 1 answer, 118 views
Is there a polars operation to apply a function over each pair of groups?
I have a polars data frame which could be generated like so:
import polars as pl
import numpy as np
num_points = 10
group_count = 3
df = pl.DataFrame(
{
"group_id": np....
6 votes, 1 answer, 132 views
Split a column of strings into a list of lists
How could I split a column of strings into a list of lists?
Minimal example:
import polars as pl
pl.Config(fmt_table_cell_list_len=6, fmt_str_lengths=100)
df = pl.DataFrame({'test': "A,B,C,1\nD,E,F,...
1 vote, 1 answer, 207 views
How do I ensure that a Polars expression plugin properly uses multiple CPUs?
I'm writing a polars plugin that works, but never seems to use more than one CPU. The plugin's function is element-wise, and is marked as such in register_plugin_function. What might I need to do to ...
0 votes, 1 answer, 180 views
Polars for Python, can I read parquet files with hive_partitioning when the directory structure and files have been manually written?
I manually created the directory structure and wrote the parquet files myself, rather than using the partition_by parameter of the write_parquet() function in the Python Polars library, because I want full control ...
3 votes, 1 answer, 108 views
Modify list of arrays in place
I have a df like:
# /// script
# requires-python = ">=3.13"
# dependencies = [
# "polars",
# ]
# ///
import polars as pl
df = pl.DataFrame(
{
"points"...
1 vote, 1 answer, 140 views
Get a grouped sum in polars, but keep all individual rows
I am racking my brain over this probably pretty simple question and I just can't find the answer anywhere. I want to create a new column with a grouped sum of another column, but I want to keep all ...
2 votes, 1 answer, 124 views
How to unnest struct columns without dropping empty structs with r-polars
I have a DataFrame whose columns I need to split wherever there are commas. The problem arises when I have columns that are all null. In the example below, I need a DataFrame with the columns "mpg"...
1 vote, 0 answers, 130 views
How to efficiently apply a gufunc to a 2D region of a Polars DataFrame
Both Polars and Numba are fantastic libraries that complement each other pretty well. There are some limitations when using Numba-compiled functions in Polars:
Arrow columns must be converted to ...
1 vote, 1 answer, 212 views
How to randomly sample n IDs for each combination of group_id and date in a Polars DataFrame
I am trying to randomly sample n IDs for each combination of group_id and date in a Polars DataFrame. However, I noticed that the sample function is producing the same set of IDs for each date no ...
0 votes, 0 answers, 191 views
Polars out of core sorting and memory usage
From what I understand this is a main use case for Polars: being able to process a dataset that is larger than RAM, using disk space if necessary. Yet I am unable to achieve this in a Kubernetes ...
0 votes, 1 answer, 111 views
Polars wheel file
The whl files provided for the polars library are tagged as abi3. I am working with a specific setup that needs the ABI tag to be cp39. I tried unpacking and repacking while changing the tag but still not ...
2 votes, 1 answer, 167 views
Polars Dataframe from nested dictionaries as columns
I have a dictionary of nested columns with the index as key in each one. When I try to convert it to a polars dataframe, it fetches the column names and the values right, but each column has just one ...
-5 votes, 1 answer, 1k views
Filtering polars dataframe by row with boolean mask [closed]
I'm trying to filter a Polars dataframe by using a boolean mask for the rows, which is generated from conditions on a specific column using:
df = df[df['col'] == cond]
And it's giving me an error ...
0 votes, 0 answers, 206 views
Values differ on multiple reads from parquet files using polars read_parquet but not with pandas read_parquet by workstation
I read data from the same parquet files multiple times using polars (polars rust engine and pyarrow) and using pandas pyarrow backend (not fastparquet as it was very slow), see below code.
All the ...
2 votes, 2 answers, 149 views
Create a uniform dataset in Polars with cross joins
I am working with Polars and need to ensure that my dataset contains all possible combinations of unique values in certain index columns. If a combination is missing in the original data, it should be ...