How do I select the longest string from a list of strings in polars?
Example and expected output:
import polars as pl

df = pl.DataFrame({
    "values": [
        ["the", "quickest", "brown", "fox"],
        ["jumps", "over", "the", "lazy", "dog"],
        [],
    ]
})
┌──────────────────────────────┬────────────────┐
│ values                       ┆ longest_string │
│ ---                          ┆ ---            │
│ list[str]                    ┆ str            │
╞══════════════════════════════╪════════════════╡
│ ["the", "quickest", … "fox"] ┆ quickest       │
│ ["jumps", "over", … "dog"]   ┆ jumps          │
│ []                           ┆ null           │
└──────────────────────────────┴────────────────┘
My use case is to select the longest overlapping match.
Edit: to elaborate on the longest overlapping match, here is the output of the example provided by polars:
┌────────────┬───────────┬─────────────────────────────────┐
│ values     ┆ matches   ┆ matches_overlapping             │
│ ---        ┆ ---       ┆ ---                             │
│ str        ┆ list[str] ┆ list[str]                       │
╞════════════╪═══════════╪═════════════════════════════════╡
│ discontent ┆ ["disco"] ┆ ["disco", "onte", "discontent"] │
└────────────┴───────────┴─────────────────────────────────┘
I would like a way to select the longest match in matches_overlapping.