Linear regression using Python (Pandas and Numpy)

Question

I am trying to implement linear regression using Python.

I did the following steps:

import pandas as p
import numpy as n
data = p.read_csv("...path\Housing.csv", usecols=[1]) # I want the first col
data1 = p.read_csv("...path\Housing.csv", usecols=[3]) # I want the 3rd col
x = data
y = data1

Then I try to obtain the coefficients using the following:

regression_coeff = n.polyfit(x,y,1)

And then I get the following error:

raise TypeError("expected 1D vector for x")
TypeError: expected 1D vector for x

I am unable to get my head around this, as when I print x and y, I can very clearly see that they are both 1D vectors.

Can someone please help?

Dataset can be found here: DataSets

The original code is:

import pandas as pd
import numpy as np

data = pd.read_csv('...\housing.csv', usecols = [1])
data1 = pd.read_csv('...\housing.csv', usecols = [3])

x = data
y = data1
regression = np.polyfit(x, y, 1)  # raises TypeError: expected 1D vector for x

I was using IDLE. Everything I have done so far is in the question above.
– Pragyaditya Das, Commented Apr 1, 2016 at 14:23

Mike Müller · Accepted Answer · 2016-04-01 16:50:14Z

6

This should work:

np.polyfit(data.values.flatten(), data1.values.flatten(), 1)

data is a dataframe and its values are 2D:

>>> data.values.shape
(546, 1)

flatten() turns it into a 1D array:

>>> data.values.flatten().shape
(546,)

which is needed for polyfit().
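
The shape difference can be reproduced without the original file. The following is a minimal sketch, assuming two small made-up single-column DataFrames in place of the Housing.csv columns (the values are hypothetical and only serve to show the shapes):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two single-column DataFrames in the question.
data = pd.DataFrame({"price": [10.0, 20.0, 30.0, 40.0]})
data1 = pd.DataFrame({"bedrooms": [1.0, 2.0, 3.0, 4.0]})

shape_2d = data.values.shape            # one column, but still two dimensions
shape_1d = data.values.flatten().shape  # a true 1D array

# Passing the DataFrames directly reproduces the error from the question.
try:
    np.polyfit(data, data1, 1)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Flattening both inputs lets the fit go through.
coeffs = np.polyfit(data.values.flatten(), data1.values.flatten(), 1)
```

Printing a single-column DataFrame looks like a plain list of numbers, which is probably why x and y appeared to be 1D; checking `.shape` or `.ndim` shows the actual dimensionality.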

Simpler alternative:

df = pd.read_csv("Housing.csv")
np.polyfit(df['price'], df['bedrooms'], 1)
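
This version works because selecting one column with single brackets returns a 1D Series, while read_csv() always returns a DataFrame, even when usecols picks a single column. A small sketch with made-up values (only the column names price and bedrooms come from the answer; the numbers are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data; the real Housing.csv values are not reproduced here.
df = pd.DataFrame({
    "price": [100.0, 200.0, 300.0, 400.0],
    "bedrooms": [1.0, 2.0, 3.0, 4.0],
})

series_ndim = df["price"].ndim    # single brackets: a Series
frame_ndim = df[["price"]].ndim   # list of columns: still a DataFrame

# A Series is accepted directly because it converts to a 1D array.
coeffs = np.polyfit(df["price"], df["bedrooms"], 1)
```

Note that np.polyfit(x, y, deg) takes the independent variable first, so this call models bedrooms as a function of price; swap the arguments to fit the reverse relationship.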

edited Apr 1, 2016 at 16:50

answered Apr 1, 2016 at 14:25

Mike Müller


2 Comments

Pragyaditya Das Over a year ago

Thanks a lot Mike :) It worked perfectly. Can you please explain why it worked once you added flatten()? What does it actually do?

Mike Müller Over a year ago

Added some explanation.

Stefan · Accepted Answer · 2016-04-01 16:56:15Z

2

pandas.read_csv() returns a DataFrame, which is two-dimensional, whereas np.polyfit() expects a 1D vector for both x and y for a single fit. You can convert the output of read_csv() to a pd.Series with .squeeze() so that it matches the input format np.polyfit() expects:

data = pd.read_csv('../Housing.csv', usecols = [1]).squeeze()
data1 = pd.read_csv('../Housing.csv', usecols = [3]).squeeze()
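
To check what .squeeze() does here without the original file, the sketch below reads a tiny in-memory CSV. The CSV text and its column names are assumptions made up for illustration, not the contents of Housing.csv:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for Housing.csv.
csv_text = "id,price,lotsize,bedrooms\n1,100,50,1\n2,200,60,2\n3,300,70,3\n"

# usecols=[1] keeps only the second column, but the result is still a DataFrame.
frame = pd.read_csv(io.StringIO(csv_text), usecols=[1])

# Squeezing along the column axis collapses the single column into a Series.
series = frame.squeeze("columns")

frame_shape = frame.shape
series_shape = series.shape
```

Passing "columns" is a defensive choice: a bare .squeeze() on a DataFrame with exactly one row and one column would collapse all the way to a scalar rather than a Series.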

edited Apr 1, 2016 at 16:56

answered Apr 1, 2016 at 14:43

Stefan


1 Comment

Pragyaditya Das Over a year ago

Worked perfectly. But, can you please give me some basic background, or at least provide a link for a place to refer and learn?

Alessandro · Accepted Answer · 2017-04-24 00:11:06Z

2

Python is telling you that the data is not in the right format. In particular, x must be a 1D array, but in your case it is a 2D-ish pandas object (a DataFrame with a single column). You can convert your data to a NumPy array and squeeze it to fix the problem.

import pandas as pd
import numpy as np

data = pd.read_csv('../Housing.csv', usecols = [1])
data1 = pd.read_csv('../Housing.csv', usecols = [3])
data = np.squeeze(np.array(data))
data1 = np.squeeze(np.array(data1))

x = data
y = data1
regression = np.polyfit(x, y, 1)
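
Once the inputs are 1D, the result of np.polyfit() with degree 1 is an array of two coefficients, highest power first, so it unpacks as slope and intercept. A minimal sketch using made-up points that lie exactly on a line (the data is an assumption for illustration, not the housing data):

```python
import numpy as np

# Made-up data lying exactly on the line y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# For degree 1, the coefficients come back as [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

# np.polyval evaluates the fitted line at the given x values.
predicted = np.polyval([slope, intercept], x)
```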

edited Apr 24, 2017 at 0:11

CommunityBot


answered Apr 1, 2016 at 14:39

Alessandro


1 Comment

Pragyaditya Das Over a year ago

How is it a 2D-ish array? I am clearly taking only one column. Please help me understand this better.
