I have a dataframe with more than 10 million rows and about 30 columns.

The first column is ID:

ID   C
1    1
1    2
1    3
1    2
1    3
2    1
2    5
2    9
2    0
2    1

I would like to extract only the first four rows for each ID (they are the newest entries, as the data is already sorted).

I am currently using the code below, but it is very slow: it takes about two hours to process about 5% of the data, so the whole dataset may take a day or so.

df1 = pd.DataFrame() # an empty dataframe
for i in df.ID:   # df is the dataframe which contains the data
    df2 = df[df["ID"]== i] 
    df2 = df2[0:4] # take the first four rows
    df_f = df1.append(df2) 

Is there an efficient way to do the same thing in less time?

  • Is it guaranteed that there are at least four instances of each ID? Commented Dec 6, 2016 at 3:29
  • Yes, there are more than 10 instances for most IDs. I want to get instances from the last four months only, and the instances are already sorted in descending order for each ID. Commented Dec 6, 2016 at 3:31

1 Answer

You need the head() method:

df.groupby("ID").head(4)
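As a minimal sketch using the sample data from the question (the names df and result are only illustrative), head(4) keeps the first four rows of each ID, in their original order and with their original index labels:

```python
import pandas as pd

# Sample data copied from the question: five rows per ID.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "C":  [1, 2, 3, 2, 3, 1, 5, 9, 0, 1],
})

# Keep at most the first four rows of every ID group.
result = df.groupby("ID").head(4)
print(result)
```

If an ID has fewer than four rows, head(4) simply returns all of them, so the result is still valid when the "at least four instances" assumption does not hold.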

Here is a revised version of your original code, timed against the groupby().head() method:

def loop():
    df1 = pd.DataFrame() # an empty dataframe
    for i in df.ID.drop_duplicates():   # df is the dataframe which contains the data
        df2 = df[df["ID"]== i] 
        df2 = df2[0:4] # take the first four rows
        df1 = pd.concat([df1, df2])
    return df1

%timeit loop()
# 100 loops, best of 3: 1.99 ms per loop

%timeit df.groupby("ID").head(4)
# 1000 loops, best of 3: 485 µs per loop
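
The timings above appear to come from a small sample frame. To check that the two approaches return the same rows, a rough sketch on hypothetical random data could look like this (the frame size, the seed, and the list-based variant of loop() are assumptions made for illustration, not part of the original test):

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1,000 rows with IDs already sorted, as in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ID": np.sort(rng.integers(0, 50, size=1_000)),
    "C": rng.integers(0, 10, size=1_000),
})

def loop(frame):
    # Same idea as the loop above, but the pieces are collected in a list
    # and concatenated once at the end.
    pieces = [frame[frame["ID"] == i][0:4] for i in frame["ID"].drop_duplicates()]
    return pd.concat(pieces)

by_loop = loop(df)
by_head = df.groupby("ID").head(4)

# True when both results hold the same rows, index labels, and dtypes.
same = by_loop.equals(by_head)
print(same)
```

The loop filters the whole frame once per distinct ID, so its cost should grow with the number of IDs, while groupby().head() avoids that repeated scan. On a frame with millions of rows the gap is likely to be much larger than in the small test.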
1 Comment

I used your code, df.groupby("ID").head(4), and it solves my problem without using a loop. Thanks so much.