1

I don't know whether it can be achieved or not using python pandas. Here is the scenario I'm trying to do

I created a databases connection to MSSQL using python (pyodbc, sqlalchemy)

I read one table and saved it as dataframe like this

data = pd.read_sql_table('ENCOUNTERP1', conn)

and the dataframe looks like this

ENCOUNTERID DIAGCODE DIAGSEQNO POA DIAGVERFLAG
0        78841   3GRNFC         3   P
1        89960                  6
2        86479  N18BZON         9   K
3        69135    MPPY3         9   9           0
4        32422   DS6SBT         2               P
5        69135                  4   D           H
6        92019      PP0         1
7        42105                  2               L
8        99256        U         1               J
9        33940  II9ZODF         3   2
10       33940       OH         1
11       65108   CI6COE         8   U
12       77871   Y3ZHN1         7               S
13       65108  73BJBZV         8   7
14       99256        7         1               T

Now I have one more dataframe (dp = pd.read_sql_table('tblDiagnosis', conn))which has DIAGCODE column in it and they all are unique

I want to get those DIAGCODE from dataframe dp and update it to dataframe data['DIAGCODE']

I tried to do like this iterate over each row and update another dataframe row by row but here in this code the second for loop will start from 0 index every time so, finally the entire row is filled with one value.

for index, row in dp.iterrows(): 
        for i, r in data.iterrows():
            r['DIAGCODE'] = row['Code']

First of all the two dataframe's are not equal in size this is dataframe for data

Code Description Category IcdSet 0 001 001 - CHOLERA CHOLERA 9 1 0010 0010 - CHOLERA D/T V. CHOLERAE CHOLERA 9 2 0011 0011 - CHOLERA D/T V. EL TOR CHOLERA 9 3 0019 0019 - CHOLERA NOS CHOLERA 10 4 002 002 - TYPHOID/PARATYPHOID FEV TYPHOID AND PARATYPHOID FEVERS 9 5 0020 0020 - TYPHOID FEVER TYPHOID AND PARATYPHOID FEVERS 9

and the output should be something like this

ENCOUNTERID DIAGCODE DIAGSEQNO POA DIAGVERFLAG 0 78841 001 3 P 1 89960 0010 6 2 86479 0011 9 K 3 69135 0019 9 9 0 4 32422 002 2 P 5 69135 0020 4 D H

I would like to add one condition from dataframe dp like this

for index, row in dp.iterrows(): for i, r in data.iterrows(): if row['Code'] == 10: r['DIAGCODE'] = row['Code']

5
  • What is the connecting link between both tables? Some ID? Commented Feb 19, 2018 at 17:30
  • 1
    I don't think you want to do row-by-row updates because that just sounds like you want to replace the column of data with a column from dp. I think what you want is actually a join. To better understand your intent, are you trying to get the code descriptions from dp and add them to data? Commented Feb 19, 2018 at 17:30
  • As a general rule of thumb you should: 1. post all relevant data to reproduce your problem and 2 show the code you tried and lastly 3 expected output. We're lacking input from 1 and 3. Commented Feb 19, 2018 at 17:37
  • @Scratch'N'Purr yes, I want to copy whole column from dp to data but what if I had some condition in between so I can update row by row ? Commented Feb 19, 2018 at 18:03
  • @supar Could you possibly update your question with an example of your input files from both sources and what your expected output is? Also elaborate on what the condition is. Commented Feb 19, 2018 at 18:15

1 Answer 1

1

I assume that the two tables have same row size and are both in desired order you wanted. If it's correct, then you can simply use:

df = pd.concat([data, pd], axis=1)

Then extract the columns you wanted:

df = df.ix[;,['ENCOUNTERID','CODE', 'DIAGSEQNO', 'POA', 'DIAGVERFLAG']].rename(columns={'CODE': 'DIAGCODE'})

If this meets your requirement, please vote.


Sorry, the .ix was deprecated even it can still be used without problem. So please use

df = df[['ENCOUNTERID','CODE', 'DIAGSEQNO', 'POA', 'DIAGVERFLAG']].rename(columns={'CODE': 'DIAGCODE'})

BTW, the issue in your code is that you were using two loops which makes the last value of inside loop to be the final value of outside loop. So here is solution:

for row, r in zip(pd.iterrows(),data.iterrows()):
    r[1]['DIAGCODE']=row[1]['CODE']
Sign up to request clarification or add additional context in comments.

4 Comments

thanks for the answer though the dataframes are not equal in size what I can do in this case ?
Could you please describe the rule of order for the concatenation? If they are not in the same size, how you drop off the extra rows in the bigger size data set?
Please add the condition of that how these 2 dataframes connected and what is the common key values in both dataframe.
Your modification confused me. It looks like for the sample you provided, the data and dp simply connected row by row without condition. But you said later that the condition is that row['Code'] in dp == 10 and then the r['DIAGCODE'] in data will be replaced by row['Code'] which is 10. That means all the r['DIAGCODE'] will be 10. It doesn't make any sense. Please review your statement and make it clear.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.