52

I have a DataFrame that may or may not contain columns in which every value is the same. For example:

    row    A    B
    1      9    0
    2      7    0
    3      5    0
    4      2    0

I'd like to return just:

    row    A
    1      9
    2      7
    3      5
    4      2

Is there a simple way to identify whether any such columns exist and then remove them?

6 Answers

84

I believe this option will be faster than the other answers here, as it traverses the data frame only once for the comparison and short-circuits as soon as a non-unique value is found.

>>> df

   0  1  2
0  1  9  0
1  2  7  0
2  3  7  0

>>> df.loc[:, (df != df.iloc[0]).any()] 

   0  1
0  1  9
1  2  7
2  3  7

8 Comments

+1, thanks for changing. This short-circuits on the any, but only after it has already done the != comparison on every element, so DSM's solution will probably be more efficient... I wonder if there is a better short-circuiting solution.
In my tests, my solution is always faster than counting the unique elements, although the factor varies from 0.1 for a 10×10 DataFrame to around 0.5 for 10000×10. I think the memory you save by not calculating the full equality array trades off against the extra time involved in counting all the unique values (and maintaining a table of values already seen and so on).
Good point, I take back the "more efficient"! I still wonder if there is a way to short-circuit the != after the first difference it sees.
Note that a column with NaNs will not be considered constant. This is technically correct (because NaN ≠ NaN), but it is probably not what we want (since there is no practical difference between one NaN and another).
I have a timestamp column and I get TypeError: int() argument must be a string, a bytes-like object or a number, not 'Timestamp'. I don't understand why.
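A minimal sketch of the comparison approach above wrapped in a function, assuming you want all-NaN columns treated as constant (the NaN ≠ NaN issue raised in the comments) and want an empty frame handled without df.iloc[0] raising. The function name drop_constant_columns and the nan_equal parameter are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def drop_constant_columns(df: pd.DataFrame, nan_equal: bool = True) -> pd.DataFrame:
    """Drop columns whose values all equal the first row's value (hypothetical helper)."""
    if len(df) == 0:
        # df.iloc[0] would raise IndexError on an empty frame; nothing to compare.
        return df.copy()
    first = df.iloc[0]
    # Element-wise "differs from the first row"; NaN != NaN evaluates to True here.
    differs = df.ne(first).to_numpy(dtype=bool)
    if nan_equal:
        # Treat a NaN as equal to a NaN in the first row of the same column.
        both_nan = df.isna().to_numpy() & first.isna().to_numpy()
        differs &= ~both_nan
    # Keep a column if at least one row differs from the first row.
    return df.loc[:, differs.any(axis=0)]


df = pd.DataFrame({
    "row": [1, 2, 3, 4],
    "A": [9, 7, 5, 2],
    "B": [0, 0, 0, 0],
    "C": [np.nan] * 4,
})
print(list(drop_constant_columns(df).columns))                   # ['row', 'A']
print(list(drop_constant_columns(df, nan_equal=False).columns))  # ['row', 'A', 'C']
```

The NaN mask is built with NumPy broadcasting (an n×m array against a length-m row), which avoids relying on pandas alignment rules for the boolean operators. Like the original one-liner, this still evaluates every comparison before any() runs.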
30

Ignoring NaNs as usual, a column is constant if nunique() == 1, so:

>>> df
   A  B  row
0  9  0    1
1  7  0    2
2  5  0    3
3  2  0    4
>>> df = df.loc[:,df.apply(pd.Series.nunique) != 1]
>>> df
   A  row
0  9    1
1  7    2
2  5    3
3  2    4

4 Comments

df.apply(pd.Series.nunique) is more simply df.nunique(), in Pandas 0.20.3 at least.
And if we want NaN to be considered as a unique value, df.nunique(dropna=False) works well (it handles the fact that NaN ≠ NaN as we expect, counting all NaN values as the same value even though they are not equal).
Another alternative using nunique: df[df.columns[df.nunique() > 1]]
@EricOLebigot Subtle but helpful point about the inequality and uniqueness of NaNs!
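A small sketch contrasting the rules from this answer and its comments on a hypothetical frame with an all-NaN column C and a column D holding one value plus a NaN. The difference between != 1 and > 1 only shows up for columns that are entirely NaN, because nunique() reports 0 for them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "row": [1, 2, 3, 4],
    "A": [9, 7, 5, 2],
    "B": [0, 0, 0, 0],              # constant
    "C": [np.nan] * 4,              # entirely NaN
    "D": [1.0, np.nan, 1.0, 1.0],   # one value plus a NaN
})

# The answer's rule: nunique() ignores NaN, so C counts 0 values and survives "!= 1".
print(list(df.loc[:, df.nunique() != 1].columns))             # ['row', 'A', 'C']
# The "> 1" rule from the comments also drops the all-NaN column C.
print(list(df.loc[:, df.nunique() > 1].columns))              # ['row', 'A']
# dropna=False counts NaN as one value: C is constant, D has two values.
print(list(df.loc[:, df.nunique(dropna=False) > 1].columns))  # ['row', 'A', 'D']
```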
15

I compared several methods on a data frame of size 120 × 10000 and found that the most efficient one is:

def drop_constant_column(dataframe):
    """
    Drops constant value columns of pandas dataframe.
    """
    return dataframe.loc[:, (dataframe != dataframe.iloc[0]).any()]

1 loop, best of 3: 237 ms per loop

The other contenders are:

def drop_constant_columns(dataframe):
    """
    Drops constant value columns of pandas dataframe.
    """
    result = dataframe.copy()
    for column in dataframe.columns:
        if len(dataframe[column].unique()) == 1:
            result = result.drop(column,axis=1)
    return result

1 loop, best of 3: 19.2 s per loop

def drop_constant_columns_2(dataframe):
    """
    Drops constant value columns of pandas dataframe.
    """
    for column in dataframe.columns:
        if len(dataframe[column].unique()) == 1:
            dataframe.drop(column,inplace=True,axis=1)
    return dataframe

1 loop, best of 3: 317 ms per loop

def drop_constant_columns_3(dataframe):
    """
    Drops constant value columns of pandas dataframe.
    """
    keep_columns = [col for col in dataframe.columns if len(dataframe[col].unique()) > 1]
    return dataframe[keep_columns].copy()

1 loop, best of 3: 358 ms per loop

def drop_constant_columns_4(dataframe):
    """
    Drops constant value columns of pandas dataframe.
    """
    keep_columns = dataframe.columns[dataframe.nunique()>1]
    return dataframe.loc[:,keep_columns].copy()

1 loop, best of 3: 1.8 s per loop

1 Comment

Using len(df.col.unique()) is very expensive. A simple df.col.nunique(dropna=False) will give the same result with significantly less overhead (plain nunique() drops NaN, so it can differ by one).
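A sketch of a timing harness for reproducing this kind of comparison yourself. The frame shape (assumed here to be 120 rows by 10 000 columns; swap the numbers if the answer meant the opposite), the random integer data, the 100 constant columns, and the function names are all assumptions. No timings are claimed, since they depend on the pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd


def by_comparison(df):
    # Compare every row with the first row (top answer).
    return df.loc[:, (df != df.iloc[0]).any()]


def by_nunique(df):
    # Count distinct values per column, NaN included.
    return df.loc[:, df.nunique(dropna=False) > 1]


def by_unique_len(df):
    # Per-column len(unique()), as in drop_constant_columns_3.
    return df[[c for c in df.columns if len(df[c].unique()) > 1]]


def make_frame(n_rows, n_cols, n_constant, seed=0):
    """Random integer frame whose first n_constant columns are constant (hypothetical test data)."""
    rng = np.random.default_rng(seed)
    data = rng.integers(0, 100, size=(n_rows, n_cols))
    data[:, :n_constant] = 7
    return pd.DataFrame(data)


def benchmark(funcs, df, repeat=3):
    """Best-of-`repeat` wall time in seconds for a single call of each function."""
    return {f.__name__: min(timeit.repeat(lambda: f(df), repeat=repeat, number=1)) for f in funcs}


if __name__ == "__main__":
    df = make_frame(n_rows=120, n_cols=10_000, n_constant=100)
    for name, seconds in benchmark([by_comparison, by_nunique, by_unique_len], df).items():
        print(f"{name}: {seconds:.3f} s")
```

Checking that all candidates return the same columns on a small frame before timing them guards against comparing functions that do different things.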
4

Assuming that the DataFrame is entirely numeric, you can try:

>>> df = df.loc[:, df.var() != 0.0]

which will remove constant (i.e. zero-variance) columns.

If the DataFrame contains both numeric and object columns, then you should try:

>>> import pandas as pd
>>> enum_df = df.select_dtypes(include=['object'])
>>> num_df = df.select_dtypes(exclude=['object'])
>>> num_df = num_df.loc[:, num_df.var() != 0.0]
>>> df = pd.concat([num_df, enum_df], axis=1)

which will drop constant columns of numeric type only (note that pd.concat places the numeric columns first).

If you also want to drop constant object (enum) columns, you should try:

>>> import numpy as np
>>> enum_df = df.select_dtypes(include=['object'])
>>> num_df = df.select_dtypes(exclude=['object'])
>>> enum_df = enum_df.loc[:, [len(np.unique(col)) != 1 for col in enum_df.T.to_numpy()]]
>>> num_df = num_df.loc[:, num_df.var() != 0.0]
>>> df = pd.concat([num_df, enum_df], axis=1)

1 Comment

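Presumably you want df = df.loc[:, df.var() != 0.0] (or ~(df.var() == 0.0); the parentheses matter, since ~ binds more tightly than ==). Selecting with df.var() == 0.0 keeps only the constant columns. It's probably also worth using np.isclose(df.var(), 0) to allow for floating-point errors.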
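Following up on that comment, a sketch that keeps the variance idea but restricts it to numeric columns, uses np.isclose with an absolute tolerance, and preserves the original column order (which the pd.concat version does not). The function name and the atol default of 1e-12 are assumptions; pick a tolerance that suits the scale of your data.

```python
import numpy as np
import pandas as pd


def drop_zero_variance(df: pd.DataFrame, atol: float = 1e-12) -> pd.DataFrame:
    """Drop numeric columns whose sample variance is (close to) zero; other columns are kept."""
    num = df.select_dtypes(include="number")
    variances = num.var()  # NaN for columns with fewer than two non-NaN values
    # np.isclose(NaN, 0.0) is False, so such columns are kept rather than dropped.
    near_zero = np.isclose(variances.to_numpy(), 0.0, atol=atol)
    # drop() leaves the remaining columns in their original order.
    return df.drop(columns=num.columns[near_zero])


df = pd.DataFrame({
    "row": [1, 2, 3, 4],
    "A": [9, 7, 5, 2],
    "B": [0, 0, 0, 0],
    "label": ["x", "x", "x", "x"],
})
print(list(drop_zero_variance(df).columns))  # ['row', 'A', 'label']
```

The constant object column label is deliberately left alone here, matching the "numeric type only" variant in the answer.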
0

Here is my solution, since I needed to handle both object and numerical columns. I'm not claiming it's super efficient or anything, but it gets the job done.

def drop_constants(df):
    """iterate through columns and remove columns with constant values (all same)"""
    columns = df.columns.values
    for col in columns:
        # drop col if unique values is 1
        if df[col].nunique(dropna=False) == 1:
            del df[col]
    return df

One extra caveat: it won't work on columns containing lists or arrays, since those values are not hashable.
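A variant of drop_constants that returns a new DataFrame instead of deleting columns from the caller's object and, as a workaround for the caveat above, falls back to comparing string representations when a column holds unhashable values such as lists. Treating equal str() output as equal values is an assumption and only an approximation (for example, large NumPy arrays are abbreviated with "..." in their string form).

```python
import pandas as pd


def drop_constants(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without columns that hold a single value (NaN counts as a value)."""
    keep = []
    for col in df.columns:
        s = df[col]
        try:
            n_unique = s.nunique(dropna=False)
        except TypeError:
            # Unhashable cells (lists, dicts, arrays): compare their string forms instead.
            n_unique = s.astype(str).nunique(dropna=False)
        if n_unique != 1:
            keep.append(col)
    return df[keep].copy()


df = pd.DataFrame({
    "A": [9, 7, 5, 2],
    "B": [0, 0, 0, 0],
    "L": [[1, 2]] * 4,
    "M": [[1], [2], [1], [1]],
})
print(list(drop_constants(df).columns))  # ['A', 'M']
```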

0

Many examples in this thread do not work properly. Check this answer of mine for a collection of examples that work.