3
$\begingroup$

I currently work a lot with projective transformations, for 3D rendering as well as pose estimation, and I still have a hard time wrapping my head around the underlying math. My intuition usually suffices, except for one thing.

I want to "back project" an image point into 3D space, with the constraint that the 3D point I am looking for lies on the ground plane. Without the tools of projective transformations, I would simply do the ray-plane intersection math by solving the relevant equations. But I wonder whether I can express this using the "right" projective transformation, which would be nice, since then I could use a single matrix for all calculations. I have the camera parameters (focal length and optical center), as well as its rotation and translation.

$\endgroup$

1 Answer

2
$\begingroup$

It’s pretty easy to construct a matrix that performs this back mapping using the same sort of trick that’s often used to derive the perspective projection matrix. I’ll follow the common computer graphics convention of using row vectors to represent points and vectors and right-multiplication by a matrix to apply a transformation.

In the canonical camera coordinate system, the camera is at the origin and sights along the negative $z$-axis. The image plane is taken to be $z=f$, with $f\lt0$, so a point $P$ on the image plane has coordinates of the form $(x,y,f)$. (It’s also common to have the camera point in the positive direction instead, but that doesn’t affect this solution.) The back-projection of a point on the image plane to the ground plane is the intersection of the ray $\lambda P=\lambda(x,y,f)$ with the ground plane. If we plug this into the normal equation of the ground plane $\mathbf n\cdot\mathbf p=d$ and solve for $\lambda$, we get $\lambda=d/(\mathbf n\cdot P)$, so the corresponding point $P'$ on the ground plane is $$P'={d\over\mathbf n\cdot P}P.$$ Observe that this blows up when $\mathbf n\cdot P=0$, but in that case the ray through $P$ is parallel to the ground plane, so the intersection is at infinity.
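As a sanity check, the ray-plane formula $P'={d\over\mathbf n\cdot P}P$ can be sketched in a few lines of plain Python. The function name `back_project_ray` and the sample plane and image point are made up for illustration, and everything is in camera coordinates.

```python
def back_project_ray(x, y, f, n, d):
    """Intersect the ray lambda * (x, y, f) with the plane n . p = d.

    All quantities are in camera coordinates. Returns None when the ray
    is parallel to the plane (n . P == 0).
    """
    P = (x, y, f)
    n_dot_P = n[0] * P[0] + n[1] * P[1] + n[2] * P[2]
    if n_dot_P == 0:
        return None  # intersection is at infinity
    lam = d / n_dot_P
    return tuple(lam * c for c in P)


# Illustrative values: ground plane y = -2 (n = (0, 1, 0), d = -2),
# image plane at z = f = -1, image point (0.5, -1).
point = back_project_ray(0.5, -1.0, -1.0, (0.0, 1.0, 0.0), -2.0)
print(point)  # (1.0, -2.0, -2.0)
```

A negative $\lambda$ would put the intersection behind the camera; this sketch does not check for that.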

The homogeneous coordinates of this ground-plane point are $$P'={dx\over\mathbf n\cdot P}:{dy\over\mathbf n\cdot P}:{df\over\mathbf n\cdot P}:1=dx:dy:df:\mathbf n\cdot P$$ from which we can directly construct the transformation matrix $$\begin{bmatrix}d&0&0&n_x\\0&d&0&n_y\\0&0&d&n_z\\0&0&0&0\end{bmatrix}.$$ All that’s left to do for a complete solution is to transform the resulting point from camera to world coordinates.
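To see the matrix at work, here is a minimal sketch in plain Python that builds it and applies it to the row vector $(x,y,f,1)$ by right-multiplication, then divides by the last coordinate. The helper names and the sample values are assumptions chosen for illustration.

```python
def back_projection_matrix(n, d):
    """4x4 matrix M with (x, y, f, 1) @ M = (d*x, d*y, d*f, n . P)."""
    return [
        [d, 0.0, 0.0, n[0]],
        [0.0, d, 0.0, n[1]],
        [0.0, 0.0, d, n[2]],
        [0.0, 0.0, 0.0, 0.0],
    ]


def row_times_matrix(v, m):
    """Right-multiply a row vector: result[j] = sum_i v[i] * m[i][j]."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def dehomogenize(h):
    """Divide by the last coordinate; None for a point at infinity."""
    if h[-1] == 0:
        return None
    return [c / h[-1] for c in h[:-1]]


# Illustrative values: ground plane y = -2, image plane z = f = -1.
M = back_projection_matrix((0.0, 1.0, 0.0), -2.0)
h = row_times_matrix([0.5, -1.0, -1.0, 1.0], M)
ground_point = dehomogenize(h)
print(h)             # [-1.0, 2.0, 2.0, -1.0]
print(ground_point)  # [1.0, -2.0, -2.0]
```

The fourth row of the matrix is zero because the trailing $1$ of the input does not contribute to any output coordinate.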

Computing $\mathbf n$ and $d$ is also easy if you don’t happen to have them handy. Take any three noncollinear points $P_0$, $P_1$ and $P_2$ on the ground plane. Then $\mathbf n=(P_1-P_0)\times(P_2-P_0)$ and $d=\mathbf n\cdot P_0$ (or the dot product with either of the other two points). If $d=0$, then the ground plane passes through the origin, i.e. the camera is at ground level and this back-mapping is impossible, since every ray through a point on the image plane either lies in the ground plane or intersects it at the camera’s position. Remember that these coordinates are camera-relative, so you might have to map from world to camera coordinates first before deriving these two values.
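The three-point construction might look like this in plain Python. The function `plane_from_points` and the sample points are hypothetical; the points are assumed to be given in camera coordinates already.

```python
def sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def plane_from_points(p0, p1, p2):
    """Return (n, d) with n . p = d for the plane through three points."""
    n = cross(sub(p1, p0), sub(p2, p0))
    if all(c == 0 for c in n):
        raise ValueError("points are collinear; no unique plane")
    return n, dot(n, p0)


# Three illustrative points on the plane y = -2 (camera coordinates).
n, d = plane_from_points((0.0, -2.0, -1.0), (1.0, -2.0, -1.0), (0.0, -2.0, -3.0))
print(n, d)
```

Here $\mathbf n$ comes out as $(0,2,0)$ with $d=-4$, which is the plane $y=-2$. The normal is not unit length, but that is harmless: scaling $\mathbf n$ and $d$ by the same factor leaves ${d\over\mathbf n\cdot P}P$ unchanged. With noisy measured points, the exact-zero collinearity test would need a tolerance instead.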

It’s probably easiest just to append $f$ and $1$ to the $(x,y)$-coordinates of a point in the image plane, but if you like you can incorporate some of that into a matrix as well. To map from $(x,y,1)$ to $(x,y,f,1)$ multiply by the matrix $$\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&f&1\end{bmatrix}.$$ Combining this with the back-mapping derived above gives $$\begin{bmatrix}d&0&0&n_x\\0&d&0&n_y\\0&0&fd&fn_z\end{bmatrix}$$ for the back-mapping matrix.
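A short sketch of the combined $3\times4$ matrix, again with row vectors and right-multiplication: it takes $(x,y,1)$ straight to the homogeneous ground-plane point. The names `image_to_ground_matrix` and `back_project` and the numbers are assumptions for illustration.

```python
def image_to_ground_matrix(n, d, f):
    """3x4 matrix H with (x, y, 1) @ H = (d*x, d*y, d*f, n . (x, y, f))."""
    return [
        [d, 0.0, 0.0, n[0]],
        [0.0, d, 0.0, n[1]],
        [0.0, 0.0, f * d, f * n[2]],
    ]


def back_project(x, y, H):
    """Apply H to the row vector (x, y, 1) and dehomogenize."""
    v = (x, y, 1.0)
    h = [sum(v[i] * H[i][j] for i in range(3)) for j in range(4)]
    if h[3] == 0:
        return None  # ray parallel to the ground plane
    return tuple(c / h[3] for c in h[:3])


# Illustrative values: ground plane y = -2, image plane z = f = -1.
H = image_to_ground_matrix((0.0, 1.0, 0.0), -2.0, -1.0)
ground_point = back_project(0.5, -1.0, H)
print(ground_point)  # (1.0, -2.0, -2.0)
```

The result is still in camera coordinates, so the camera-to-world transformation has to be applied afterwards.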

$\endgroup$
  • $\begingroup$ Ah, cool. Ok, the part where you use the normal equation and the trick with the scaling of the homogeneous coordinates really unknots it for me! $\endgroup$ Commented Apr 5, 2017 at 8:09
  • $\begingroup$ but the "multiply-from-right" convention is new to me... $\endgroup$ Commented Apr 5, 2017 at 8:14
  • $\begingroup$ It’s an old convention that’s still used in computer graphics, in OpenGL for instance. It does make some sense if you think of a pipeline of transformations: when you add another transformation to the pipeline, it might seem more natural for folks who normally read from left to right for the end of the pipeline to be on the right. Besides, it saves having to put those superscript $T$s on all of the vectors ;) $\endgroup$ Commented Apr 5, 2017 at 23:06
  • $\begingroup$ If you have the camera matrix $\mathtt P=\left[\mathtt M\mid\mathbf p_4\right]$, you can construct the world-coordinate back-projection matrix directly: $\left(\mathbf C\mathbf\Pi^T-\mathbf C^T\mathbf\Pi\mathtt I_4\right)\mathtt P^+$. $\mathbf C$ is the camera center, $\mathbf\Pi$ is the homogeneous vector $(\mathbf n^T,-d)^T$ of the ground plane, and $\mathtt P^+$ is the pseudo-inverse $\mathtt P^T(\mathtt P\mathtt P^T)^{-1}$. Alternatively, it is $\left(\mathbf C\mathbf\Pi^T-\mathbf C^T\mathbf\Pi\mathtt I_4\right)\left[\mathtt M^{-T}\mid\mathbf 0\right]^T$. $\endgroup$ Commented Jul 10, 2018 at 0:32
  • $\begingroup$ N.B.: The previous comment uses the usual mathematical convention of column vectors left-multiplied by the transformation matrix—transposed relative to the answer. $\endgroup$ Commented Jul 12, 2018 at 18:50
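For anyone who wants to try the world-coordinate formula from the comments, here is a rough plain-Python sketch of the second variant, $\left(\mathbf C\mathbf\Pi^T-\mathbf C^T\mathbf\Pi\mathtt I_4\right)\left[\mathtt M^{-T}\mid\mathbf 0\right]^T$, in the column-vector convention. It assumes $\mathtt M$ is invertible and takes the camera center as $\mathbf C=(-\mathtt M^{-1}\mathbf p_4,1)$. The function names, the camera matrix and the plane below are invented for the example.

```python
def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate (assumes det != 0)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def world_back_projection_matrix(P, plane):
    """4x3 matrix (C Pi^T - (C . Pi) I_4) [M^{-1}; 0^T], column convention.

    P is the 3x4 camera matrix [M | p4]; plane is Pi = (n_x, n_y, n_z, -d).
    """
    M_inv = inv3([row[:3] for row in P])
    p4 = [row[3] for row in P]
    c = [sum(M_inv[r][k] * p4[k] for k in range(3)) for r in range(3)]
    C = [-c[0], -c[1], -c[2], 1.0]  # homogeneous camera center
    s = sum(plane[k] * C[k] for k in range(4))  # C . Pi
    A = [[C[r] * plane[k] - (s if r == k else 0.0) for k in range(4)]
         for r in range(4)]
    B = M_inv + [[0.0, 0.0, 0.0]]  # M^{-1} stacked on a zero row
    return [[sum(A[r][k] * B[k][j] for k in range(4)) for j in range(3)]
            for r in range(4)]


def apply_and_dehomogenize(H, x):
    """Left-multiply the column vector x by H and divide by the last entry."""
    X = [sum(H[r][j] * x[j] for j in range(3)) for r in range(4)]
    if X[3] == 0:
        return None  # ray parallel to the plane
    return [X[r] / X[3] for r in range(3)]


# Made-up camera: K = [[2,0,1],[0,2,1],[0,0,1]], R = I, center (1, 0, -1).
P = [[2.0, 0.0, 1.0, -1.0], [0.0, 2.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
plane = (0.0, 1.0, 0.0, -2.0)  # world plane y = 2, i.e. n = (0,1,0), d = 2
H = world_back_projection_matrix(P, plane)
# The world point (3, 2, 3) lies on the plane and projects to (2, 2, 1).
world_point = apply_and_dehomogenize(H, [2.0, 2.0, 1.0])
print(world_point)  # [3.0, 2.0, 3.0]
```

The round trip (project a known point on the plane, then map its image back) is a convenient way to check the signs and the convention for a real camera matrix.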
