I am regressing Y~(X1, X2) such that this fit is perfect (R squared is 1), where the R squared for Y~X1 and Y~X2 are chosen somewhere in the interval $(0,1)$. I am trying to create the numerical example from Silverfish's excellent answer to this question in order to understand how R squared values relate to angles between the data.
I managed to create the code for this example, but I'm not entirely sure how to reconcile the angles I get from my code to the plot I am looking at.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
import math
Angle1 = 45
Angle2 = 120
# u, v computed from https://stats.stackexchange.com/questions/164586/can-the-coefficient-of-determination-r2-be-more-than-one-what-is-its-upper-b
Rad1, Rad2 = math.radians(Angle1), math.radians(Angle2)
Rsquared1, Rsquared2 = math.cos(Rad1)**2, math.cos(Rad2)**2
p1, p2 = np.sqrt(Rsquared1), np.sqrt(Rsquared2)
u = (np.array([2, -1, -1])/np.sqrt(6)).reshape(-1, 1)
v = (np.array([0, 1, -1])/np.sqrt(2)).reshape(-1, 1)
# define X and Y
X1, X2 = u, p1 * u + np.sqrt(1-p1**2)*v
X = np.hstack((X1, X2))
Y = (p1*p2 - np.sqrt(1-p1**2)*np.sqrt(1-p2**2))*u + (p1*np.sqrt(1-p1**2)*np.sqrt(1-p2**2))*v
# Fit joint regression
model = LinearRegression(fit_intercept=False)
model.fit(X, Y)
Yh = model.predict(X)
e = Y - Yh
# Fit univariate regressions
model1 = LinearRegression(fit_intercept=False)
model1.fit(X1, Y)
Y1 = model1.predict(X1)
model2 = LinearRegression(fit_intercept=False)
model2.fit(X2, Y)
Y2 = model2.predict(X2)
costheta = np.linalg.norm(Yh - np.mean(Y)) / np.linalg.norm(Y - np.mean(Y))
costheta1 = np.linalg.norm(Y1 - np.mean(Y)) / np.linalg.norm(Y - np.mean(Y))
costheta2 = np.linalg.norm(Y2 - np.mean(Y)) / np.linalg.norm(Y - np.mean(Y))
print(f"Angle1 = {Angle1}")
print(f"Angle2 = {Angle2}")
print(f"Rsquared1 = {Rsquared1}")
print(f"Rsquared2 = {Rsquared2}")
print(f"cos(theta)^2 = {costheta ** 2}")
print(f"cos(theta1)^2 = {costheta1 ** 2}")
print(f"cos(theta2)^2 = {costheta2 ** 2}")
print(f"theta: {math.degrees(math.acos(costheta))}")
print(f"theta1: {math.degrees(math.acos(costheta1))}")
print(f"theta2: {math.degrees(math.acos(costheta2))}")
I get
Angle1 = 45
Angle2 = 120
Rsquared1 = 0.5000000000000001
Rsquared2 = 0.24999999999999983
cos(theta)^2 = 1.0
cos(theta1)^2 = 0.2632245247831985
cos(theta2)^2 = 0.05961678695043583
theta: 0.0
theta1: 59.13252220932562
theta2: 75.86747779067439
The angles theta, theta1 and theta2 seem correct as their squared cosines agree with the r2_score function. However, the "nominal" values of R2, namely Rsquared1 and Rsquared2 are different, but I don't see where the disconnect is with Silverfish's answer in the link above. Shouldn't these align with the Angle1 and Angle2 variables I have set above, or are these angles measuring something else?
I have plotted X1, X2, their span, Y1 and Y2 as well as the joint regression Y using this code:
# Plot
mesh_size = 30
scale = np.max(np.abs(Y))
xs = np.linspace(-scale, scale, mesh_size)
U, V = np.meshgrid(xs, xs)
# get basis from X
q, _ = np.linalg.qr(X)
v1, v2 = q[:, 0], q[:, 1]
q1 = U * v1[0] + V * v2[0]
q2 = U * v1[1] + V * v2[1]
q3 = U * v1[2] + V * v2[2]
fig = plt.figure(figsize=(9, 7))
ax = fig.add_subplot(111, projection='3d')
# Column space of X visualised
ax.plot_surface(q1, q2, q3, alpha=0.3, color='lightblue', edgecolor='none', label='Fitted Plane')
# Plot origin and basis vectors
origin = np.array([0, 0, 0])
ax.quiver(*origin, *X1, color='r', linestyle=':', linewidth=1, label='X1')
ax.quiver(*origin, *X2, color='g', linewidth=1, linestyle=':', label='X2')
ax.quiver(*origin, *Y1, color='r', linewidth=2, label='Y ~ X1')
ax.quiver(*origin, *Y2, color='g', linewidth=2, label='Y ~ X2')
ax.quiver(*origin, *Yh, color='b', linewidth=1, label='Y ~ X = (X1, X2)')
ax.scatter(*origin, color='k', s=15, label='origin')
ax.legend()
plt.show()
The prediction "Y hat" of course overlaps with Y we have a perfect fit by design. Can you help me relate what I am seeing with the angles chosen in my script?
Rsquared1is indeed the square of the cosine of the angle betweenX1andX2; you can quickly check withmath.degrees(math.acos(math.sqrt(Rsquared1))). So in this sense the values are alligned; is that what you are looking for?math.degrees(math.acos(math.sqrt(Rsquared1)))is the angle betweenX1andY, notX1andX2? When I use your code, the angles I get are(45, 60). How do I reconcile these angles with the input,Angle1 = 45,Angle2 = 120, and the plots? I