I am trying to implement linear regression using Python.

I did the following steps:

import pandas as p
import numpy as n
data = p.read_csv("...path\Housing.csv", usecols=[1]) # I want the first col
data1 = p.read_csv("...path\Housing.csv", usecols=[3]) # I want the 3rd col
x = data
y = data1

Then I try to obtain the coefficients using the following:

regression_coeff = n.polyfit(x,y,1)

And then I get the following error:

raise TypeError("expected 1D vector for x")
TypeError: expected 1D vector for x

I am unable to get my head around this, as when I print x and y, I can very clearly see that they are both 1D vectors.

Can someone please help?

Dataset can be found here: DataSets

The original code is:

import pandas as pd
import numpy as np

data = pd.read_csv('...\housing.csv', usecols = [1])
data1 = pd.read_csv('...\housing.csv', usecols = [3])

x = data
y = data1
regression = np.polyfit(x, y, 1)
  • There is nothing after "and use the following:". Commented Apr 1, 2016 at 14:20
  • I did not get you. Commented Apr 1, 2016 at 14:21
  • You probably forgot to paste your code. Commented Apr 1, 2016 at 14:22
  • I was using IDLE; whatever I have done so far is in the question above. Commented Apr 1, 2016 at 14:23
  • Sorry, can't help debug code that I cannot see. Commented Apr 1, 2016 at 14:29

3 Answers

This should work:

np.polyfit(data.values.flatten(), data1.values.flatten(), 1)

data is a dataframe and its values are 2D:

>>> data.values.shape
(546, 1)

flatten() turns it into a 1D array:

>>> data.values.flatten().shape
(546,)

which is needed for polyfit().
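To see the difference without the original file, here is a minimal sketch that uses a small made-up DataFrame in place of the read_csv() result. The values and the variable names shape_2d, shape_1d and error_message are assumptions for illustration only, not taken from the Housing dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the one-column read_csv() results (made-up values).
data = pd.DataFrame({"price": [1.0, 2.0, 3.0, 4.0]})
data1 = pd.DataFrame({"bedrooms": [3.0, 5.0, 7.0, 9.0]})

shape_2d = data.values.shape            # one column, but still two dimensions
shape_1d = data.values.flatten().shape  # flattened to a single dimension

# Passing the DataFrames directly reproduces the error from the question.
try:
    np.polyfit(data, data1, 1)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Flattening both inputs lets the fit go through.
slope, intercept = np.polyfit(data.values.flatten(), data1.values.flatten(), 1)
print(shape_2d, shape_1d, error_message, slope, intercept)
```

The made-up points lie on the line y = 2x + 1, so the fitted slope and intercept should come out close to 2 and 1.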

Simpler alternative:

df = pd.read_csv("Housing.csv")
np.polyfit(df['price'], df['bedrooms'], 1)

2 Comments

Thanks a lot Mike :) It worked perfectly. Can you please say why it worked when you added the flatten(), what did it actually do?
Added some explanation.

pandas.read_csv() returns a DataFrame, which has two dimensions, while np.polyfit() wants a 1D vector for both x and y for a single fit. You can simply convert the output of read_csv() to a pd.Series, which matches the np.polyfit() input format, using .squeeze():

data = pd.read_csv('../Housing.csv', usecols = [1]).squeeze()
data1 = pd.read_csv("...path\Housing.csv", usecols=[3]).squeeze()
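As a rough illustration of what .squeeze() changes, the sketch below reads a tiny in-memory CSV instead of the real file. The CSV text, its id and lotsize columns, and all values are made up for the example; only the price and bedrooms column names appear elsewhere in this thread. It passes "columns" to squeeze() so that only the column axis is collapsed, which is a cautious variation on the call above rather than a requirement.

```python
import io

import numpy as np
import pandas as pd

# Made-up CSV text standing in for Housing.csv (hypothetical columns and values).
csv_text = """id,price,lotsize,bedrooms
1,100.0,10,2
2,200.0,20,3
3,300.0,30,4
"""

# Without squeeze(): a DataFrame with a single column (2D).
frame = pd.read_csv(io.StringIO(csv_text), usecols=[1])

# With squeeze("columns"): the single column collapses to a Series (1D).
x = pd.read_csv(io.StringIO(csv_text), usecols=[1]).squeeze("columns")
y = pd.read_csv(io.StringIO(csv_text), usecols=[3]).squeeze("columns")

coeffs = np.polyfit(x, y, 1)
print(frame.ndim, x.ndim, type(x).__name__, coeffs)
```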

1 Comment

Worked perfectly. But, can you please give me some basic background, or at least provide a link for a place to refer and learn?

Python is telling you that the data is not in the right format: x must be a 1D array, but in your case it is a 2D pandas DataFrame (a single column still has shape (n, 1)). You can convert your data to a NumPy array and squeeze it to fix the problem.

import pandas as pd
import numpy as np

data = pd.read_csv('../Housing.csv', usecols = [1])
data1 = pd.read_csv('../Housing.csv', usecols = [3])
data = np.squeeze(np.array(data))
data1 = np.squeeze(np.array(data1))

x = data
y = data1
regression = np.polyfit(x, y, 1)
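A one-column selection can still be two-dimensional, which is easy to check on a small example. The sketch below uses a made-up DataFrame (the values and the names one_col_frame, one_col_series, coeffs_squeezed and coeffs_series are assumptions for illustration) and compares the np.squeeze() route with selecting each column as a Series.

```python
import numpy as np
import pandas as pd

# Made-up data; the points lie on y = 2x.
df = pd.DataFrame({"price": [1.0, 2.0, 3.0], "bedrooms": [2.0, 4.0, 6.0]})

one_col_frame = df[["price"]]  # list selector: DataFrame with shape (3, 1)
one_col_series = df["price"]   # label selector: Series with shape (3,)

# np.squeeze() drops the length-1 axis, leaving a plain 1D array.
x = np.squeeze(np.array(one_col_frame))
y = np.squeeze(np.array(df[["bedrooms"]]))

coeffs_squeezed = np.polyfit(x, y, 1)
coeffs_series = np.polyfit(df["price"], df["bedrooms"], 1)
print(one_col_frame.shape, one_col_series.shape, x.shape)
print(coeffs_squeezed, coeffs_series)
```

Both routes hand np.polyfit() one-dimensional input, so the two sets of coefficients should agree.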

1 Comment

How is it a 2D-ish array? I am clearly taking only one column. Please guide me toward a better understanding.
