I have a dataframe df made up of n group columns plus one column, "data". The dataframe is then grouped on the n group columns.
df = pd.DataFrame(data={"g0": ["foo", "foo", "bar", "bar"],
                        "g1": ["baz", "baz", "baz", "qux"],
                        ...,
                        "gn": [...],
                        "data": [0.1, 0.3, 0.4, 0.2]},
                  index=["a", "b", "c", "d"])
groups = df.groupby(by=["g0", "g1", ..., "gn"], sort=False)
Then I have a list idx_kept which includes only some of the original dataframe indices, e.g. idx_kept = ["a", "b", "d"]. Is there a way to filter groups so that only the rows whose index is in idx_kept remain? My understanding of DataFrameGroupBy.filter is that it is not appropriate here, as it applies an aggregate predicate and removes whole groups.
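To make that concrete, here is a minimal sketch (using only g0 and g1 to stand in for the n group columns) of why filter does not fit: its predicate keeps or drops entire groups, so it cannot drop a single row from a multi-row group.

```python
import pandas as pd

# Two group columns stand in for the n columns of the question.
df = pd.DataFrame(data={"g0": ["foo", "foo", "bar", "bar"],
                        "g1": ["baz", "baz", "baz", "qux"],
                        "data": [0.1, 0.3, 0.4, 0.2]},
                  index=["a", "b", "c", "d"])
groups = df.groupby(by=["g0", "g1"], sort=False)

# filter() evaluates a predicate per group and keeps or drops the
# whole group at once; e.g. this keeps only groups with >1 row:
kept = groups.filter(lambda g: len(g) > 1)  # rows "a" and "b"
```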
I could filter df directly to get a df_filtered and then call groups_filtered = df_filtered.groupby(by=["g0", "g1", ..., "gn"], sort=False). However, in my process I need both groups and groups_filtered, so my goal is to avoid the second groupby to save some time. Is there an elegant/fast way to achieve that?
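For reference, this is the straightforward (but second-groupby) route I would like to avoid, again sketched with just g0 and g1:

```python
import pandas as pd

# Same toy frame, with two group columns standing in for the n columns.
df = pd.DataFrame(data={"g0": ["foo", "foo", "bar", "bar"],
                        "g1": ["baz", "baz", "baz", "qux"],
                        "data": [0.1, 0.3, 0.4, 0.2]},
                  index=["a", "b", "c", "d"])

idx_kept = ["a", "b", "d"]
# Filter by index label, then regroup -- this is the second groupby.
df_filtered = df.loc[idx_kept]
groups_filtered = df_filtered.groupby(by=["g0", "g1"], sort=False)
```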
Edit: I realise that I should have given some more information, as I received good answers which did not work for my case. My end goal is to compare len(groups) and len(groups_filtered). In the example, using g0, g1, and idx_kept = ["a", "b", "d"], len(groups) = 3 but len(groups_filtered) = 2, because "c" was the only member of its group. However, if idx_kept = ["a", "c", "d"], then len(groups_filtered) = 3, because "b" was part of a group also containing "a". So potentially there is another approach than the one I thought about.
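To make the expected numbers concrete, here is a sketch of the comparison I am after. It happens to use GroupBy.ngroup() on the existing groupby to count distinct groups among the kept indices; whether that (or something else) is the right shortcut is exactly what I am asking:

```python
import pandas as pd

df = pd.DataFrame(data={"g0": ["foo", "foo", "bar", "bar"],
                        "g1": ["baz", "baz", "baz", "qux"],
                        "data": [0.1, 0.3, 0.4, 0.2]},
                  index=["a", "b", "c", "d"])
groups = df.groupby(by=["g0", "g1"], sort=False)

# ngroup() labels each row with its group number from the existing
# groupby, so the filtered group count is the number of distinct
# labels among the kept indices -- no second groupby needed.
gid = groups.ngroup()
n_filtered_1 = gid.loc[["a", "b", "d"]].nunique()  # 2: "c" was alone in its group
n_filtered_2 = gid.loc[["a", "c", "d"]].nunique()  # 3: "b" shared a group with "a"
```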