I have a dataset of approximately 30,000 rows and approximately 30 columns, including:
- 1 column with a unique identifier (SSN)
- 6 columns of ICD diagnostic codes
- 6 columns with the description of each respective ICD diagnostic code
Here's what the original data looks like:
[Image: the original data]
I would like to restructure the data to 'combine' the diagnostic codes and diagnostic descriptions, so that each row consists of 1 SSN identifier, 1 diagnostic code, and its respective diagnostic description. SSNs with multiple diagnoses would then appear on separate rows, one row per diagnosis.
Here's the table structure I would like to end up with:
[Image: the structure of the data after transposing the variables]
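To make the target layout concrete, here is a minimal sketch of the reshaping described above, written as a DATA step with arrays rather than the Transpose task. Everything named here is an assumed placeholder and not from my real data: the datasets work.have and work.want, the variables ssn, icd1-icd6 and desc1-desc6, the output variables icd_code and icd_desc, and the lengths $10 and $200. It also assumes the code and description columns are character variables.

```sas
/* Sketch only: dataset and variable names below are hypothetical placeholders. */
data work.want (keep=ssn icd_code icd_desc);
    set work.have;

    /* Pair up the 6 code columns with the 6 description columns. */
    array codes{6} $ icd1-icd6;
    array descs{6} $ desc1-desc6;

    /* Assumed lengths; adjust to match the real columns. */
    length icd_code $10 icd_desc $200;

    /* Write one output row per non-missing diagnosis for this SSN. */
    do i = 1 to dim(codes);
        if not missing(codes{i}) then do;
            icd_code = codes{i};
            icd_desc = descs{i};
            output;
        end;
    end;
run;
```

In this sketch, an SSN with three populated diagnosis columns would produce three rows, each holding the SSN, one code, and the description from the matching position.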
Since I am using SAS EG, I am using the 'Transpose' task. The options SAS EG gives when assigning variables are:
- Transpose variables (I've tried ICD codes individually & ICD codes/descriptions)
- Copy variables (Those variables not SSN/ICD related)
- New column names (limit 1): (I am not including anything here)
- Group analysis by: (I am using SSN for this variable).
When I attempt to transpose both the ICD code and description columns, the result looks like this:
[Image: SAS EG results after the transpose]
SAS EG creates 4 new variables, Column1 - Column4. I realize these names come from the 'New column names' option above, but the output mixes both ICD codes and descriptions in all 4 columns.
Even if I 'transpose' the ICD code columns first while 'copying' the description columns, and then transpose the descriptions in a second step, the result is the same: both codes and descriptions end up mixed together in the newly created Column1 - Column4 variables.
Is the Transpose task (proc transpose) the wrong tool for this? What I want is a table where the first column is the SSN, the 2nd column is the ICD code, and the 3rd column is the respective ICD description.
Thanks for reading so far down!