Python pandas dataframe populate hierarchical levels from parent/child relationship

Question

I have a dataframe with information about all employees of a given company. Every employee has an ID and the corresponding manager ID.

Example:
import pandas as pd

data = pd.DataFrame({'Parent': ['a','a','b','c','c','f','q','z','k'],
                     'Child':  ['b','c','d','f','g','h','k','q','w']})

a
├── b
│   └── d
└── c
    ├── f
    │   └── h
    └── g
z
└── q
    └── k
        └── w

(example: w reports to k and k reports to q and q reports to z)
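The parenthetical above describes walking up a reporting chain one manager at a time. Here is a minimal sketch of that walk. It assumes every employee has at most one manager and that there are no cycles, as in the example data. The helper name `reporting_chain` is hypothetical.

```python
import pandas as pd

data = pd.DataFrame({'Parent': ['a', 'a', 'b', 'c', 'c', 'f', 'q', 'z', 'k'],
                     'Child':  ['b', 'c', 'd', 'f', 'g', 'h', 'k', 'q', 'w']})

# Map each employee to their manager (assumes one manager per employee).
manager_of = dict(zip(data['Child'], data['Parent']))

def reporting_chain(employee):
    """Hypothetical helper: chain from the top manager down to `employee`.

    Assumes the hierarchy has no cycles; otherwise this loop never ends.
    """
    chain = [employee]
    while chain[-1] in manager_of:   # stop at someone who reports to no one
        chain.append(manager_of[chain[-1]])
    return chain[::-1]               # top manager first

print(reporting_chain('w'))  # ['z', 'q', 'k', 'w']
```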

I would like to get a new dataframe that contains the following information for every employee:

 child  level1  level2  level x
   a      a        -        -
   b      a        -        -
   d      a        b        -
   c      a        -        -
   f      a        c        -
   h      a        c        f
   g      a        c        -
   z      z        -        -
   q      z        -        -
   k      z        q        -
   w      z        q        k

I do not know the number of levels in advance, so I have used 'level x' as a placeholder. I suspect I need some recursive pattern to iterate over the dataframe.
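Because the question suspects recursion is needed, here is a minimal sketch of a recursive depth-first walk that reproduces the expected table above, including its row order. It assumes the hierarchy is a forest: each employee has at most one manager and there are no cycles. The names `reports`, `walk` and `rows` are hypothetical.

```python
import pandas as pd

data = pd.DataFrame({'Parent': ['a', 'a', 'b', 'c', 'c', 'f', 'q', 'z', 'k'],
                     'Child':  ['b', 'c', 'd', 'f', 'g', 'h', 'k', 'q', 'w']})

# Direct reports of each manager, in the order they appear in `data`.
reports = {}
for parent, child in zip(data['Parent'], data['Child']):
    reports.setdefault(parent, []).append(child)

# Top-level managers appear as a Parent but never as a Child.
children = set(data['Child'])
roots = [p for p in data['Parent'].unique() if p not in children]

def walk(employee, managers, rows):
    """Pre-order depth-first walk; `managers` is the chain above `employee`."""
    # Top-level managers fill level1 with their own ID, as in the expected output.
    rows.append([employee] + (managers or [employee]))
    for report in reports.get(employee, []):
        walk(report, managers + [employee], rows)

rows = []
for root in roots:
    walk(root, [], rows)

n_levels = max(len(r) for r in rows) - 1   # the first entry is the child itself
result = pd.DataFrame(
    [r + ['-'] * (n_levels + 1 - len(r)) for r in rows],
    columns=['child'] + [f'level{i}' for i in range(1, n_levels + 1)],
)
print(result)
```

The number of level columns is derived from the longest chain, so it does not need to be known up front. Python's default recursion limit (about 1000 frames) bounds the hierarchy depth this sketch can handle; a very deep hierarchy would need an explicit stack instead.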

Could you make this question more specific? On the left you want the child, and then you always want to go from the top level of the hierarchy down to the lowest level?
– Uwe.Schneider, Commented Apr 4, 2023 at 15:45
Please provide enough code so others can better understand or reproduce the problem.
– Community Bot, Commented Apr 4, 2023 at 16:41
The bottom part of your expected output doesn't look right. 'z' isn't a parent.
– Bill, Commented Apr 7, 2023 at 4:15
The second part depicts the reporting structure from high level (left) to lower level (right). z (as well as a) are the two top-level managers, reporting to no one.
– C. Pappy, Commented Apr 7, 2023 at 14:54
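The top-level managers mentioned in the last comment are exactly the IDs that appear in `Parent` but never in `Child`. Here is a short sketch of that check with `Series.isin`, using the example data.

```python
import pandas as pd

data = pd.DataFrame({'Parent': ['a', 'a', 'b', 'c', 'c', 'f', 'q', 'z', 'k'],
                     'Child':  ['b', 'c', 'd', 'f', 'g', 'h', 'k', 'q', 'w']})

# A top-level manager is a Parent that never shows up as anyone's Child.
parents = pd.Series(data['Parent'].unique())
top_level = parents[~parents.isin(data['Child'])].tolist()
print(top_level)  # ['a', 'z']
```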

C. Pappy · Accepted Answer · 2023-04-07 04:01:40Z

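I'm posting this code in the hope that someone other than the OP finds it useful, since the OP seems to have lost interest.

Note that the output does NOT exactly meet the OP's requirements: each top-down row ends with the employee itself rather than listing only their managers, and the rows are sorted alphabetically.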

import pandas as pd

def get_manager(row, column, data):
    """Return the manager of the person in row[column], or '-' if there is none."""
    manager_ids = data.index[data['Child'] == row[column]].tolist()
    return data.at[manager_ids[0], 'Parent'] if manager_ids else '-'

data = pd.DataFrame({'Parent': ['a','a','b','c','c','f','q','z','k'],
                     'Child':  ['b','c','d','f','g','h','k','q','w']})
staff = sorted(set(data['Parent']) | set(data['Child']))
df = pd.DataFrame(staff, columns=[0])  # start with all staff in the first column
for i in range(len(staff)):  # there can't be more than len(staff) levels
    # column i+1 holds the manager of whoever is in column i
    df[i+1] = df.apply(lambda row: get_manager(row, i, data), axis=1)
    if (df[i+1] == '-').all():
        break  # no higher-level managers left
print(df)  # we could stop here, but the OP wants the order reversed
for _, row in df.iterrows():
    row = list(row)
    row.reverse()  # top managers first
    i = len(row) - 1 - row[::-1].index('-')  # index of the last '-'
    # Move the names to the front and the '-' padding to the end, dropping one '-'
    # (equivalent to dropping df's last column, which is always all '-').
    row = row[i+1:] + row[:i]
    print('  '.join(row))

Output:

    0  1  2  3  4
0   a  -  -  -  -
1   b  a  -  -  -
2   c  a  -  -  -
3   d  b  a  -  -
4   f  c  a  -  -
5   g  c  a  -  -
6   h  f  c  a  -
7   k  q  z  -  -
8   q  z  -  -  -
9   w  k  q  z  -
10  z  -  -  -  -
a  -  -  -
a  b  -  -
a  c  -  -
a  b  d  -
a  c  f  -
a  c  g  -
a  c  f  h
z  q  k  -
z  q  -  -
z  q  k  w
z  -  -  -
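`get_manager` filters the whole `data` frame once per cell. A possible alternative (not the answer author's code), assuming each employee has at most one manager, is to build a child-to-manager dict once and climb one level per column with `Series.map`. For the example data this gives the same first table as above.

```python
import pandas as pd

data = pd.DataFrame({'Parent': ['a', 'a', 'b', 'c', 'c', 'f', 'q', 'z', 'k'],
                     'Child':  ['b', 'c', 'd', 'f', 'g', 'h', 'k', 'q', 'w']})

manager_of = dict(zip(data['Child'], data['Parent']))   # child -> manager
staff = sorted(set(data['Parent']) | set(data['Child']))

df = pd.DataFrame({0: staff})
for level in range(len(staff)):   # a chain can't be longer than the staff list
    # IDs missing from the dict (top managers and '-') map to NaN, shown as '-'.
    df[level + 1] = df[level].map(manager_of).fillna('-')
    if (df[level + 1] == '-').all():
        break   # no higher-level managers left
print(df)
```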

Doesn't meet the OP's exact requirements, but it does almost exactly what I want it to do 2 years later...
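If you want the answer's top-down layout as real dataframe columns rather than printed strings, one option is to keep each reporting path as a string and let pandas spread it into level columns with `Series.str.split(..., expand=True)`. This sketch assumes '/' never appears in an ID and that there are no cycles. The `reporting_path` helper and the `level1`... column names are hypothetical.

```python
import pandas as pd

data = pd.DataFrame({'Parent': ['a', 'a', 'b', 'c', 'c', 'f', 'q', 'z', 'k'],
                     'Child':  ['b', 'c', 'd', 'f', 'g', 'h', 'k', 'q', 'w']})
manager_of = dict(zip(data['Child'], data['Parent']))

def reporting_path(employee):
    """Hypothetical helper: '/'-joined chain from the top manager down to employee."""
    chain = [employee]
    while chain[-1] in manager_of:
        chain.append(manager_of[chain[-1]])
    return '/'.join(reversed(chain))

staff = sorted(set(data['Parent']) | set(data['Child']))
paths = pd.Series([reporting_path(e) for e in staff], index=staff, name='path')

# One column per level; shorter paths are padded with None, then replaced by '-'.
levels = paths.str.split('/', expand=True).fillna('-')
levels.columns = [f'level{i + 1}' for i in range(levels.shape[1])]
print(levels)
```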
