I have a wide data frame with several years:
import numpy as np
import pandas as pd

# np.nan is used instead of np.NaN, which was removed in NumPy 2.0
df = pd.DataFrame(
    index=pd.Index([29925, 223725, 280165, 813285, 956765], name='ID'),
    columns=pd.Index([1991, 1992, 1993, 1994, 1995, 1996, '2010-2012'], name='Year'),
    data=np.array([[np.nan, np.nan, 16, 17, 18, 19, np.nan],
                   [16, 17, 18, 19, 20, 21, np.nan],
                   [np.nan, np.nan, np.nan, np.nan, 16, 17, 31],
                   [np.nan, 22, 23, 24, np.nan, 26, np.nan],
                   [36, 36, 37, 38, 39, 40, 55]]))
Year    1991  1992  1993  1994  1995  1996  2010-2012
ID
29925    NaN   NaN  16.0  17.0  18.0  19.0        NaN
223725  16.0  17.0  18.0  19.0  20.0  21.0        NaN
280165   NaN   NaN   NaN   NaN  16.0  17.0       31.0
813285   NaN  22.0  23.0  24.0   NaN  26.0        NaN
956765  36.0  36.0  37.0  38.0  39.0  40.0       55.0
Each row holds the ages of one person, identified by a unique ID. I want to fill the NaN values in every year of every row, based on the age values that already exist in that row.
For example, ID 29925 is 16 in 1993, so we know they were 15 in 1992 and 14 in 1991, and I want to replace the NaN for 29925 in the 1992 and 1991 columns accordingly. Similarly, I want to replace the NaN in the 2010-2012 column based on the existing age values for 29925. Let's assume that in the 2010-2012 column 29925 is 15 years older than in 1996. What is the fastest way to do this for the whole data frame, i.e. for all IDs?
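To make the rule above concrete, here is a minimal vectorized sketch, not a definitive answer. It assumes that the 2010-2012 column can be treated as the year 2011 (which matches "15 years older than in 1996"), that the first known age in each row is used as the anchor, and that existing values are never overwritten. The names `year_of`, `anchor` and `filled` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    index=pd.Index([29925, 223725, 280165, 813285, 956765], name='ID'),
    columns=pd.Index([1991, 1992, 1993, 1994, 1995, 1996, '2010-2012'], name='Year'),
    data=np.array([[np.nan, np.nan, 16, 17, 18, 19, np.nan],
                   [16, 17, 18, 19, 20, 21, np.nan],
                   [np.nan, np.nan, np.nan, np.nan, 16, 17, 31],
                   [np.nan, 22, 23, 24, np.nan, 26, np.nan],
                   [36, 36, 37, 38, 39, 40, 55]]))

# Assumption: the range column '2010-2012' stands for 2011 (1996 + 15).
year_of = np.array([2011 if c == '2010-2012' else int(c) for c in df.columns],
                   dtype=float)

vals = df.to_numpy(dtype=float)

# age - year is constant for a person whose ages advance by one per year.
offset = vals - year_of                       # shape (n_rows, n_cols)

# Position of the first known age in each row (0 if the row is all NaN,
# in which case the anchor is NaN and the row stays NaN).
first = np.argmax(~np.isnan(offset), axis=1)
anchor = offset[np.arange(len(df)), first]    # one offset per ID

# Estimated age for every cell, used only where the original is NaN.
estimate = anchor[:, None] + year_of
filled = pd.DataFrame(np.where(np.isnan(vals), estimate, vals),
                      index=df.index, columns=df.columns)

print(filled)
```

Because the work is done with NumPy broadcasting rather than a row-by-row loop, this should scale reasonably to many IDs. Rows whose known ages are not consistent with each other (such as 956765, which is 36 in both 1991 and 1992) are left as they are, since only NaN cells are touched.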


35 NaN 36 and it becomes impossible to know whether that NaN should become 35 or 36... NaNs, it should be filled according to a general rule of +1 or -1 according to the year.