Issue with creating new variable with python

Question

I'm very new to Python but have an analysis to do.

I have created a new variable from each of the data sets data_2015 and data_2016:

rfd_per_district_15 = data_2015[['Nom_Districte', 'Índex RFD Barcelona = 100']].groupby(['Nom_Districte'], as_index=False).sum()['Índex RFD Barcelona = 100']
rfd_per_district_16 = data_2016[['Nom_Districte', 'Índex RFD Barcelona = 100']].groupby(['Nom_Districte'], as_index=False).sum()['Índex RFD Barcelona = 100']

but the outcomes are very different. 'rfd_per_district_16' came out exactly the way I wanted, in float format, so I can carry on working with it:

0     367.7
1     719.4
2     523.3
3     921.2
4     476.6
5     676.0
6     494.4
7     952.6
8     604.0
9    1054.8
Name: Índex RFD Barcelona = 100, dtype: float64

but 'rfd_per_district_15' came out very strangely, as if the values from multiple rows had been joined together:

0                                     75.8108.576.696.4
1                          104.895.8165.8128.9103.898.6
2                              111.289.0114.4106.3103.0
3          90.488.182.795.256.974.593.572.392.689.380.9
4                                       124.5109.9250.5
6     65.061.448.851.155.955.647.855.454.035.647.134...
7                          43.160.258.075.676.870.185.8
8           78.784.796.8150.295.6162.554.4102.868.357.5
9                      74.336.970.483.282.274.377.088.2
10                       151.7199.1214.1188.9205.1141.0
Name: Índex RFD Barcelona = 100, dtype: object

I can see that the difference is that 'rfd_per_district_15' came out as object type, but why? I had to delete index [5] in 'rfd_per_district_15' because it contained a weird value, but even after that the data still came out strangely (not like 'rfd_per_district_16'). I only know the basics of Python, so I really don't know how to figure this out.

sam · Accepted Answer · 2021-05-12 13:07:51Z

1

Compare the last line of the two outputs:

Name: Índex RFD Barcelona = 100, dtype: float64
Name: Índex RFD Barcelona = 100, dtype: object

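The problem is the data type (dtype) of the column. In data_2015 the column is stored as object, i.e. as strings, so sum() joins the values together instead of adding them.

Convert the column in data_2015 so that its dtype matches the one in data_2016.

Use data_2015.dtypes to check the dtypes of the columns.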
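A minimal sketch of the effect described above, using a small made-up frame rather than the real data_2015 (the district names and values here are hypothetical). It assumes the column holds strings, which is what the object dtype in the question suggests:

```python
import pandas as pd

col = "Índex RFD Barcelona = 100"

# Hypothetical toy data: the index values are strings, not floats.
toy = pd.DataFrame({
    "Nom_Districte": ["A", "A", "B"],
    col: pd.Series(["75.8", "108.5", "104.8"], dtype=object),
})

# dtypes reports the column as object rather than float64.
print(toy.dtypes)
is_object = pd.api.types.is_object_dtype(toy[col])

# Summing strings concatenates them, which reproduces the odd output.
summed = toy.groupby("Nom_Districte", as_index=False).sum()[col]
print(summed.tolist())
```

Once the column is numeric, the same groupby call adds the values instead of joining them.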


1 Comment

Yuliya Hilevich Over a year ago

Hi Sam, yes, you are right. For some reason data_2015['Índex RFD Barcelona = 100'] is stored as object type; I noticed that as well. I tried rfd_per_district_15 = list(np.float_(rfd_per_district_15)), as I did with other variables from the same dataset, but it says it could not convert string to float.

Yuliya Hilevich · Accepted Answer · 2021-05-12 18:58:19Z

1

data_2015["Índex RFD Barcelona = 100"] = data_2015['Índex RFD Barcelona = 100'].astype('float')

With this code I managed to convert the type of the column. Thank you, @sam.
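If astype('float') fails with a "could not convert string to float" error, as in the comment above, one option is pd.to_numeric with errors="coerce", which turns unparseable entries into NaN so they can be inspected. This is only a sketch on hypothetical data: the "-" below stands in for whatever the odd value in the real data_2015 is, which the question does not show.

```python
import pandas as pd

col = "Índex RFD Barcelona = 100"

# Hypothetical stand-in for data_2015; "-" plays the role of the odd value.
data = pd.DataFrame({
    "Nom_Districte": ["A", "A", "B", "B"],
    col: pd.Series(["75.8", "108.5", "-", "104.8"], dtype=object),
})

# Entries that cannot be parsed as numbers become NaN instead of raising.
converted = pd.to_numeric(data[col], errors="coerce")

# Rows that held a value but failed to convert: these need a closer look.
bad_rows = data[converted.isna() & data[col].notna()]
print(bad_rows)

# Store the numeric column and aggregate; sum() skips NaN by default.
data[col] = converted
totals = data.groupby("Nom_Districte", as_index=False)[col].sum()
print(totals)
```

Whether to drop, correct, or keep the rows in bad_rows depends on what those values mean in the source data.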

