I've written a custom function in Polars to generate a horizontal forward/backward fill list of expressions. The function accepts an iterable of expressions (or column names) to determine the order of filling. I want to to use all columns via pl.all() as default. The problem is that pl.all() returns a single expression rather than an iterable, so trying to reverse or iterate over it leads to a TypeError.
Is there a way to convert between single expressions and iterables of expressions? Any suggestions or workarounds are greatly appreciated!
Here is the function:
from typing import Iterable
from polars._typing import IntoExpr
import polars as pl
def fill_horizontal(exprs: Iterable[IntoExpr], forward: bool = True) -> list[pl.Expr]:
"""Generate a horizontal forward/backward fill list of expressions."""
# exprs = exprs or pl.all() # use all columns as default
cols = [col for col in reversed(exprs)] if forward else exprs
return [pl.coalesce(cols[i:]) for i in range(0, len(cols) - 1)]
Here is an example:
df = pl.DataFrame({
"col1": [1, None, 2],
"col2": [1, 2, None],
"col3": [None, None, 3]})
print(df)
# shape: (3, 3)
# ┌──────┬──────┬──────┐
# │ col1 ┆ col2 ┆ col3 │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞══════╪══════╪══════╡
# │ 1 ┆ 1 ┆ null │
# │ null ┆ 2 ┆ null │
# │ 2 ┆ null ┆ 3 │
# └──────┴──────┴──────┘
print('forward_fill')
print(df.with_columns(fill_horizontal(df.columns, forward=True)))
# shape: (3, 3)
# ┌──────┬──────┬──────┐
# │ col1 ┆ col2 ┆ col3 │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞══════╪══════╪══════╡
# │ 1 ┆ 1 ┆ 1 │
# │ null ┆ 2 ┆ 2 │
# │ 2 ┆ 2 ┆ 3 │
# └──────┴──────┴──────┘
print('backward_fill')
print(df.with_columns(fill_horizontal(df.columns, forward=False)))
# shape: (3, 3)
# ┌──────┬──────┬──────┐
# │ col1 ┆ col2 ┆ col3 │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞══════╪══════╪══════╡
# │ 1 ┆ 1 ┆ null │
# │ 2 ┆ 2 ┆ null │
# │ 2 ┆ 3 ┆ 3 │
# └──────┴──────┴──────┘
Edit: Merging @Henry Harbeck's answer and @jqurious's comment seems to be not perfect but a sufficient solution as of now.
def fill_horizontal(
exprs: Iterable[IntoExpr] | None = None,
*,
forward: bool = True,
ncols: int = 1000) -> pl.Expr:
"""Generate a horizontal forward/backward fill expression."""
if exprs is None:
# if forward is false, ncols has to be defined with the present number of cols or more
cols = pl.all() if forward else pl.nth(range(ncols, -1, -1))
else:
cols = exprs if forward else reversed(exprs)
return pl.cum_reduce(lambda s1, s2: pl.coalesce(s2, s1), cols).struct.unnest()
pl.all()instead ofdf.columns? Sorry for being dense I'm on mobile.df.with_columns(fill_horizontal())might be called. Especially, they want a generic expression to iterate over all columns available in the enclosing context.pl.all()which I don't think is currently possible with just expressions. The closest isnthbut it requires you specify the number of columns. You could use some "huge" numberpl.nth(range(100_000, -1, -1))but it would break if the frame had more columns.pl.all()couldn't be used as one cannot dopl.coalesce(...) for i in len(pl.all()), no?