I am brand new to programming and am taking a course in Python. I was asked to do linear regression on a data set that my professor gave out. Below is the program I have written (it doesn't work).

from math import *

f=open("data_setshort.csv", "r")
data = f.readlines()
f.close()

xvalues=[]; yvalues=[]

for line in data:
    x,y=line.strip().split(",")
    x=float(x.strip())
    y=float(y.strip())

    xvalues.append(x)
    yvalues.append(y)

def regression(x,y):
    n = len(x)
    X = sum(x)
    Y = sum(y)

    for i in x:
        A = sum(i**2)
        return A
    for i in x:
        for j in y:
            C = sum(x*y)
        return C
    return C

    D = (X**2)-nA
    m = (XY - nC)/D
    b = (CX - AY)/D

    return m,b

print "xvalues:", xvalues
print "yvalues:", yvalues   

regression(xvalues,yvalues)

I am getting an error that says: line 23, in regression, A = sum(i**2). TypeError: 'float' object is not iterable.

I need to eventually create a plot for this data set (which I know how to do) and for the line defined by the regression. But for now I am trying to do linear regression in Python.

3
  • 1
    i is a single float, so there is no point in summing its square (e.g. what is the sum of 4 * 4?). Also, you might want to restudy what a return statement does. Commented Apr 30, 2015 at 16:59
  • 1
    It doesn't seem like you know what return actually does. Commented Apr 30, 2015 at 16:59
  • This question has nothing to do with linear regression. I'd suggest you edit the title. Commented Apr 30, 2015 at 20:52
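To illustrate the point the comments make about return, here is a minimal sketch (the function names and sample values are made up for illustration, and it is written for Python 3). A return inside a loop ends the whole function on the first pass, so the loop never reaches the remaining elements, and any code placed after it is never executed.

```python
def total_buggy(values):
    total = 0.0
    for v in values:
        total += v
        return total   # inside the loop: the function exits on the first pass


def total_fixed(values):
    total = 0.0
    for v in values:
        total += v
    return total       # after the loop: every element has been added


print(total_buggy([1.0, 2.0, 3.0]))  # 1.0
print(total_fixed([1.0, 2.0, 3.0]))  # 6.0
```

This is likely why the function in the question could never reach the lines that compute D, m and b, even once the TypeError is fixed.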

2 Answers

1

You can't sum over a single float, but you can sum over a list. For example, you probably mean A = sum([xi**2 for xi in x]) to calculate the sum of the squares of the elements of x. You also have several return statements that don't make sense and can be removed, e.g. return C after the loop. A return ends the function immediately, so nothing after the first one ever runs. Additionally, multiplying two variables a and b in Python requires the * operator: a*b. Writing ab does not multiply them. It is treated as a single variable named "ab".
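A small sketch of both points, using made-up sample values and Python 3 syntax: calling sum() on a float reproduces the error from the question, and writing two names side by side is looked up as one undefined variable.

```python
x = [1.0, 2.0, 3.0]

# sum() needs an iterable; a single float is not one.
try:
    sum(x[0] ** 2)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'float' object is not iterable

# Summing a list of squares works as intended.
A = sum([xi ** 2 for xi in x])
print(A)  # 14.0

# Adjacent names form one identifier, not a product.
n = len(x)
try:
    nA  # looked up as a single variable called "nA"
    name_found = True
except NameError:
    name_found = False

print(name_found)  # False
print(n * A)       # 42.0
```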

The corrected code could look like this:

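def regression(x,y):
    n = len(x)
    X = sum(x)    # sum of the x values
    Y = sum(y)    # sum of the y values

    A = sum([xi**2 for xi in x])             # sum of the squared x values
    C = sum([xi*yi for xi, yi in zip(x,y)])  # sum of the x*y products

    D = X**2 - n*A
    m = (X*Y - n*C) / float(D)    # slope; float() avoids integer division in Python 2
    b = (C*X - A*Y) / float(D)    # intercept

    return (m, b)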
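As a quick sanity check, here is a sketch that runs the function on a tiny made-up data set lying exactly on y = 2x + 1, so the slope should come out as 2 and the intercept as 1. The function is repeated so the snippet runs on its own, and the data and variable names are only for illustration. Note that the returned tuple has to be captured. Calling regression(xvalues, yvalues) on its own line, as in the question, discards the result.

```python
def regression(x, y):
    # Same formulas as the function above, repeated so this runs alone.
    n = len(x)
    X = sum(x)
    Y = sum(y)
    A = sum([xi ** 2 for xi in x])
    C = sum([xi * yi for xi, yi in zip(x, y)])
    D = X ** 2 - n * A
    m = (X * Y - n * C) / float(D)
    b = (C * X - A * Y) / float(D)
    return (m, b)


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Unpack the returned tuple into slope and intercept.
m, b = regression(xs, ys)
print(m, b)
```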

3 Comments

Okay, I knew about using * to do multiplication. Forgetting to have done that probably shows how new I am to this.
I don't understand what you're doing to calculate A and C. It looks like you are doing a for-loop in one line. What is the difference between the two lines you have and a typical for-loop?
[xi**2 for xi in x] means: Take every element xi from list x, raise it to the power of 2 (by **2) and then add it to a new list. [xi*yi for xi, yi in zip(x,y)] means: First build a new list of pairs of x and y values by merging lists x and y. The result would be something like [(x1,y1), (x2,y2), ...]. Then take each of these pairs xi and yi, multiply them by xi*yi and then add the product to a new list. In both cases you use sum() to calculate the sum of these lists/values. Generating lists this way is more concise and thus usually easier to read once you understand it.
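To make the comparison with a typical for-loop concrete, here is a short sketch with made-up sample values (Python 3 syntax). The comprehension and the explicit loop build the same list, and zip pairs the two lists element by element.

```python
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

# List comprehension, as in the answer.
squares = [xi ** 2 for xi in x]

# The same list built with an ordinary for-loop.
squares_loop = []
for xi in x:
    squares_loop.append(xi ** 2)

# zip pairs the two lists element by element.
pairs = list(zip(x, y))
products = [xi * yi for xi, yi in pairs]

print(squares == squares_loop)  # True
print(pairs)     # [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(products)  # [2.0, 8.0, 18.0]
```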
0

You should probably use something like A += i**2. As the error message says, you cannot iterate over a float: if i = 2, you can't iterate over it because it is not a list. Since you need to sum the squares of all the elements of x, set A = 0 before the loop, iterate over x with for i in x, add each square i**2 to A with A += i**2, and return A only after the loop has finished.
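A minimal sketch of that accumulator approach, with hypothetical helper names and Python 3 syntax. The second function applies the same idea to the sum of products, walking both lists in step with zip rather than with the nested loops from the question.

```python
def sum_of_squares(x):
    A = 0.0            # start the running total at zero
    for i in x:        # i is one float taken from the list
        A += i ** 2    # add its square to the total
    return A           # return once, after the loop


def sum_of_products(x, y):
    C = 0.0
    for i, j in zip(x, y):   # one (x, y) pair per pass, not nested loops
        C += i * j
    return C


print(sum_of_squares([1.0, 2.0, 3.0]))                    # 14.0
print(sum_of_products([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 28.0
```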

Hope this helps!
