1

I am regressing Y~(X1, X2) such that this fit is perfect (R squared is 1), where the R squared for Y~X1 and Y~X2 are chosen somewhere in the interval $(0,1)$. I am trying to create the numerical example from Silverfish's excellent answer to this question in order to understand how R squared values relate to angles between the data.

I managed to create the code for this example, but I'm not entirely sure how to reconcile the angles I get from my code to the plot I am looking at.

import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

import math

Angle1 = 45
Angle2 = 120

# u, v computed from https://stats.stackexchange.com/questions/164586/can-the-coefficient-of-determination-r2-be-more-than-one-what-is-its-upper-b
Rad1, Rad2 = math.radians(Angle1), math.radians(Angle2)
Rsquared1, Rsquared2 = math.cos(Rad1)**2, math.cos(Rad2)**2
p1, p2 = np.sqrt(Rsquared1), np.sqrt(Rsquared2)

u = (np.array([2, -1, -1])/np.sqrt(6)).reshape(-1, 1)
v = (np.array([0, 1, -1])/np.sqrt(2)).reshape(-1, 1)

# define X and Y
X1, X2 = u, p1 * u + np.sqrt(1-p1**2)*v
X = np.hstack((X1, X2))
Y = (p1*p2 - np.sqrt(1-p1**2)*np.sqrt(1-p2**2))*u + (p1*np.sqrt(1-p1**2)*np.sqrt(1-p2**2))*v

# Fit joint regression
model = LinearRegression(fit_intercept=False)
model.fit(X, Y)
Yh = model.predict(X)
e = Y - Yh

# Fit univariate regressions
model1 = LinearRegression(fit_intercept=False)
model1.fit(X1, Y)
Y1 = model1.predict(X1)

model2 = LinearRegression(fit_intercept=False)
model2.fit(X2, Y)
Y2 = model2.predict(X2)

costheta = np.linalg.norm(Yh - np.mean(Y)) / np.linalg.norm(Y - np.mean(Y))
costheta1 = np.linalg.norm(Y1 - np.mean(Y)) / np.linalg.norm(Y - np.mean(Y))
costheta2 = np.linalg.norm(Y2 - np.mean(Y)) / np.linalg.norm(Y - np.mean(Y))

print(f"Angle1 = {Angle1}")
print(f"Angle2 = {Angle2}")
print(f"Rsquared1 = {Rsquared1}")
print(f"Rsquared2 = {Rsquared2}")

print(f"cos(theta)^2 = {costheta ** 2}")
print(f"cos(theta1)^2 = {costheta1 ** 2}")
print(f"cos(theta2)^2 = {costheta2 ** 2}")

print(f"theta:  {math.degrees(math.acos(costheta))}")
print(f"theta1:  {math.degrees(math.acos(costheta1))}")
print(f"theta2:  {math.degrees(math.acos(costheta2))}")

I get

Angle1 = 45
Angle2 = 120
Rsquared1 = 0.5000000000000001
Rsquared2 = 0.24999999999999983
cos(theta)^2 = 1.0
cos(theta1)^2 = 0.2632245247831985
cos(theta2)^2 = 0.05961678695043583
theta:  0.0
theta1:  59.13252220932562
theta2:  75.86747779067439

The angles theta, theta1 and theta2 seem correct as their squared cosines agree with the r2_score function. However, the "nominal" values of R2, namely Rsquared1 and Rsquared2 are different, but I don't see where the disconnect is with Silverfish's answer in the link above. Shouldn't these align with the Angle1 and Angle2 variables I have set above, or are these angles measuring something else?

I have plotted X1, X2, their span, Y1 and Y2 as well as the joint regression Y using this code:

# Plot
mesh_size = 30
scale = np.max(np.abs(Y))
xs = np.linspace(-scale, scale, mesh_size)
U, V = np.meshgrid(xs, xs)

# get basis from X
q, _ = np.linalg.qr(X)
v1, v2 = q[:, 0], q[:, 1]
q1 = U * v1[0] + V * v2[0]
q2 = U * v1[1] + V * v2[1]
q3 = U * v1[2] + V * v2[2]

fig = plt.figure(figsize=(9, 7))
ax = fig.add_subplot(111, projection='3d')
# Column space of X visualised
ax.plot_surface(q1, q2, q3, alpha=0.3, color='lightblue', edgecolor='none', label='Fitted Plane')

# Plot origin and basis vectors
origin = np.array([0, 0, 0])
ax.quiver(*origin, *X1, color='r', linestyle=':', linewidth=1, label='X1')
ax.quiver(*origin, *X2, color='g', linewidth=1, linestyle=':', label='X2')
ax.quiver(*origin, *Y1, color='r', linewidth=2, label='Y ~ X1')
ax.quiver(*origin, *Y2, color='g', linewidth=2, label='Y ~ X2')
ax.quiver(*origin, *Yh, color='b', linewidth=1, label='Y ~ X = (X1, X2)')
ax.scatter(*origin, color='k', s=15, label='origin')
ax.legend()
plt.show()

The prediction "Y hat" of course overlaps with Y we have a perfect fit by design. Can you help me relate what I am seeing with the angles chosen in my script?

3
  • 1
    Can you provide the plotting code? Commented Nov 5 at 0:05
  • 1
    From the way you set it up, Rsquared1 is indeed the square of the cosine of the angle between X1 and X2; you can quickly check with math.degrees(math.acos(math.sqrt(Rsquared1))). So in this sense the values are alligned; is that what you are looking for? Commented Nov 5 at 8:48
  • I forgot to add the plotting code @NickODell, done! @mcvincekova is it not true that math.degrees(math.acos(math.sqrt(Rsquared1))) is the angle between X1 and Y, not X1 and X2? When I use your code, the angles I get are (45, 60). How do I reconcile these angles with the input, Angle1 = 45, Angle2 = 120, and the plots? I Commented Nov 5 at 14:28

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.