This is hopefully an easy question for someone out there:

I have one data frame that looks like this:

import pandas as pd
names_raw = {
    'device_id': [ '1d28d33a-c98e-4986-a7bb-5881d222c9a8','54322099-e76d-4986-afd2-0861e2113a16','ec3a9f9d-8e4d-4986-bea8-c17c361366e9','cc8e247d-4e2e-4986-b783-e516d03a358c','ca2d8769-ccf5-4986-8aed-741ca68e94cd','12178e22-6d64-4986-966a-374326fdaf3d','50ba7a2e-a1aa-4986-86a7-08e0605dc702','f427c8e9-65d4-46de-b986-8f8e79242842','cee68e2b-135f-45b0-be4b-7c23009866ba','e785988e-2693-47ad-9899-0049860ccaa7','a1986866-13f8-4dbe-b661-8c9f78eac745','a9998ecd-9fe9-4932-870d-29c6b5df1214','9b88e362-b06d-4317-96f5-f266c986a8d6','a04498ef-fd7c-4aa4-bffc-9158ccbad3a1'],
    'pod_id': ['B00001','B00011','B00013','B00016','B00021','B00023','B00024','B00026','B00027','B00028','B00030','B00032','B00034','B00039'],
    'native_id': ['zim_pod_0001','zim_pod_0002',    'zim_pod_0003', 'zim_pod_0004', 'zim_pod_0005', 'zim_pod_0006', 'zim_pod_0007', 'zim_pod_0008', 'zim_pod_0009', 'zim_pod_0010', 'zim_pod_0011', 'zim_pod_0012', 'zim_pod_0013','zim_pod_0014']
    }
names = pd.DataFrame(names_raw, columns = ['device_id', 'pod_id', 'native_id'])

And another data frame that looks like this:

>>> df
                          device_id      day  month  year  rain
0  1d28d33a-c98e-4986-a7bb-5881d222c9a8   31     12  2016   0.0
1  54322099-e76d-4986-afd2-0861e2113a16   31     12  2016   0.0
2  ec3a9f9d-8e4d-4986-bea8-c17c361366e9   31     12  2016   0.0
3  cc8e247d-4e2e-4986-b783-e516d03a358c   31     12  2016   1.2
4  ca2d8769-ccf5-4986-8aed-741ca68e94cd   31     12  2016   2.2
5  12178e22-6d64-4986-966a-374326fdaf3d   31     12  2016   0.2
6  9b88e362-b06d-4317-96f5-f266c986a8d6   31     12  2016   0.0

I want to replace the device_id column in df with the matching native_id values from names. How can this be done in the fewest lines of code?

The final data frame should look something like this:

>>> df
                           native_id      day  month  year  rain
0                          zim_pod_0001   31     12  2016   0.0
1                          zim_pod_0002   31     12  2016   0.0
2                          zim_pod_0003   31     12  2016   0.0

etc. etc...

2 Answers

Try this:

df['native_id'] = df.device_id.map(names.set_index('device_id')['native_id'])

Or, if you don't want to keep the device_id column in df:

In [210]: df['native_id'] = df.pop('device_id').map(names.set_index('device_id')['native_id'])

In [211]: df
Out[211]:
   day  month  year  rain     native_id
0   31     12  2016   0.0  zim_pod_0001
1   31     12  2016   0.0  zim_pod_0002
2   31     12  2016   0.0  zim_pod_0003
3   31     12  2016   1.2  zim_pod_0004
4   31     12  2016   2.2  zim_pod_0005
5   31     12  2016   0.2  zim_pod_0006
6   31     12  2016   0.0  zim_pod_0013
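
Note that the one-liner above leaves native_id as the last column, whereas the question shows it first. Here is a minimal sketch of one way to get that ordering with DataFrame.insert. The tiny frames and the short ids ("aaa", "zzz", etc.) are made up for illustration and are not the asker's data; "zzz" is deliberately missing from names to show that map leaves NaN for ids it cannot find.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the question's two frames.
names = pd.DataFrame({
    "device_id": ["aaa", "bbb", "ccc"],
    "native_id": ["zim_pod_0001", "zim_pod_0002", "zim_pod_0003"],
})
df = pd.DataFrame({
    "device_id": ["aaa", "ccc", "zzz"],  # "zzz" has no entry in names
    "rain": [0.0, 1.2, 0.2],
})

# Build a device_id -> native_id lookup Series.
lookup = names.set_index("device_id")["native_id"]

# Remove device_id from df and translate it through the lookup.
native = df.pop("device_id").map(lookup)

# Put the translated column at position 0 instead of appending it.
df.insert(0, "native_id", native)

# Ids that were not found in names show up as NaN.
unmatched = df["native_id"].isna()
```

If unmatched.any() is True, some device ids in df have no counterpart in names, which is usually worth checking before dropping the original column.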

1 Comment

@JAG2024, glad i could help :)

Use pd.merge(), which is built into pandas. It works like a SQL join and is straightforward to use. Specify device_id as the join key, then select the columns you want, like so:

df2 = pd.merge(df,names,on="device_id")[["native_id","day","month","year","rain"]]

Result:

      native_id  day  month  year  rain
0  zim_pod_0001   31     12  2016   0.0
1  zim_pod_0002   31     12  2016   0.0
2  zim_pod_0003   31     12  2016   0.0
3  zim_pod_0004   31     12  2016   1.2
4  zim_pod_0005   31     12  2016   2.2
5  zim_pod_0006   31     12  2016   0.2
6  zim_pod_0013   31     12  2016   0.0
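
One thing to be aware of: merge defaults to an inner join, so any row of df whose device_id is missing from names is silently dropped. The sketch below shows a left join as an alternative that keeps every row of df. The small frames and short ids are invented for illustration, and validate="many_to_one" is an optional extra check (not part of the answer above) that raises if names contains duplicate device_id values.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the question's two frames.
names = pd.DataFrame({
    "device_id": ["aaa", "bbb", "ccc"],
    "native_id": ["zim_pod_0001", "zim_pod_0002", "zim_pod_0003"],
})
df = pd.DataFrame({
    "device_id": ["aaa", "ccc", "zzz"],  # "zzz" has no entry in names
    "rain": [0.0, 1.2, 0.2],
})

# Default (inner) join: the "zzz" row disappears.
inner = pd.merge(df, names, on="device_id")

# Left join: every row of df survives; unmatched ids get NaN.
merged = df.merge(
    names[["device_id", "native_id"]],
    on="device_id",
    how="left",
    validate="many_to_one",  # raise if names has duplicate device_id values
)
df2 = merged[["native_id", "rain"]]
```

With the left join, len(df2) equals len(df), so a missing mapping shows up as a NaN in native_id rather than as a missing row.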
