I have a dataframe that is as follows:

    MID        POSITION
1   22596394       R8
2   22596394       R8
3   22596394       R8
4   22591549       R6
5   22591549       R6
6   22591549       R6

I also have a second dataframe, produced as the output of some code, which should look like the following:

Position     Usage
R1             0  
R2             0 
R3             0
R4             0
R5             0
R6             1
R7             0 
R8             1
L1             0
L2             0
L3             0 
...           
L8             0

I would like to fill in the Usage column according to the logic below:

Wherever MID changes, note the corresponding POSITION and fill in the matching Usage row of the output dataframe. For example, for the dataframe above, the Usage rows for R8 and R6 should be filled with 1 and the rows for all other positions with 0. Similarly, if MID changes twice for the same position, say R6, the R6 Usage row should be filled with 2, and so on. What would be the best way to do this? Thanks in advance!

  • Can you add desired output from input? Commented Oct 20, 2016 at 8:09
  • I've updated the output dataframe. To make it clearer, let's say MID changed 2 times while the position was still R6. Then the Usage row for R6 should be filled with 2, and so on. Thanks! Commented Oct 20, 2016 at 8:13
  • Hmm, but 'MID' does not change within R6 or within R8. It is the same value 3 times. Commented Oct 20, 2016 at 8:19
  • Sorry, I'm not making myself clear. Let's say instead that the unique MID values are counted for each position. For example, in the table above, the Usage of R6 and R8 is 1 because each has only one unique MID. Hope that makes it clear. Commented Oct 20, 2016 at 8:21

1 Answer

I think you need nunique and then reindex:

# Number of distinct MID values for each POSITION present in df1
print(df1.groupby('POSITION')['MID'].nunique())
POSITION
R6    1
R8    1
Name: MID, dtype: int64

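# Align the counts to every Position listed in df2 (missing ones become 0),
# then turn the Series back into a two-column DataFrame
print(df1.groupby('POSITION')['MID']
         .nunique()
         .reindex(df2.set_index('Position').index, fill_value=0)
         .rename('Usage')
         .reset_index())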
   Position  Usage
0        R1      0
1        R2      0
2        R3      0
3        R4      0
4        R5      0
5        R6      1
6        R7      0
7        R8      1
8        L1      0
9        L2      0
10       L3      0

Explanation:

To get the number of unique values per group, group by column POSITION and then aggregate column MID with nunique. This gives a new Series indexed by R6 and R8. Next, the remaining values from column Position of df2 have to be added. If those values are unique, one possible solution is to create an index from column Position with set_index and then reindex the Series computed from df1 by that index of df2. Positions that do not occur in df1 would become NaN, which the parameter fill_value=0 replaces with 0. Finally, the index has to become a column again: first rename the Series to Usage with rename, then call reset_index to get a tidy DataFrame.
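
The steps above can be tried end to end with a small self-contained sketch. The contents of df1 are copied from the question; the layout of df2 (positions R1 to R8 followed by L1 to L8, with Usage initialised to 0) is an assumption, because the question elides part of that list.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "MID": [22596394] * 3 + [22591549] * 3,
    "POSITION": ["R8"] * 3 + ["R6"] * 3,
})

# Assumed template: R1..R8 followed by L1..L8 (the question elides L4..L7).
positions = [f"R{i}" for i in range(1, 9)] + [f"L{i}" for i in range(1, 9)]
df2 = pd.DataFrame({"Position": positions, "Usage": 0})

# Step 1: number of distinct MID values per POSITION (only R6 and R8 appear).
unique_mids = df1.groupby("POSITION")["MID"].nunique()

# Step 2: align to every Position in df2; positions absent from df1 get 0.
# Step 3: name the values 'Usage' and move the index back into a column.
usage = (
    unique_mids
    .reindex(df2.set_index("Position").index, fill_value=0)
    .rename("Usage")
    .reset_index()
)

print(usage)
```

Note that set_index('Position') is only used to obtain the row labels to align on; the Usage values themselves come from the nunique result, which is why the column passed to set_index is Position rather than Usage.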

6 Comments

Shouldn't it be Usage instead of Position in (df2.set_index('Position').index, fill_value=0) given that I want to fill the Usage column?
Works as usual. Thanks again!
Glad I could help! Have a nice day!
I don't think you want nunique, but rather something more like (df.POSITION[1:][~(df.MID.shift(1) == df.MID)[1:]]), given your description. You want the corresponding POSITION when MID changes... at least that is what you described at first, but then you said something about uniqueness...
Rather, something to the effect of : (df.POSITION[1:][~(df.MID.shift(1) == df.MID)[1:]]).value_counts().reindex(['R1','R2','R3','R4','R5','R6'], fill_value=0) or use the clever index trick jezrael used in this answer.
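
For comparison, here is a minimal sketch of the change-counting reading suggested in the last two comments, written with ne and shift instead of the slicing expression shown there. It is one interpretation of that suggestion, not code from the answer; the data is the question's sample and the position list is the one used in the comment.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "MID": [22596394] * 3 + [22591549] * 3,
    "POSITION": ["R8"] * 3 + ["R6"] * 3,
})

# Position list taken from the comment; extend it to cover all positions.
positions = ["R1", "R2", "R3", "R4", "R5", "R6"]

# True where MID differs from the previous row.
changed = df["MID"].ne(df["MID"].shift())
# The first row has no predecessor, so it does not count as a change.
changed.iloc[0] = False

# Count how often each POSITION is the row where a change happens,
# then align to the full position list with 0 for the rest.
change_counts = (
    df.loc[changed, "POSITION"]
    .value_counts()
    .reindex(positions, fill_value=0)
)

print(change_counts)
```

On the sample data this counts a single change, at the row where MID switches and POSITION is R6, so it differs from the nunique result, which reports 1 for both R6 and R8. Which one is wanted depends on whether "usage" means transitions or distinct MIDs per position.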