It’s pretty easy to construct a matrix that performs this back mapping using the same sort of trick that’s often used to derive the perspective projection matrix. I’ll follow the common computer graphics convention of using row vectors to represent points and vectors and right-multiplication by a matrix to apply a transformation.
In the canonical camera coordinate system, the camera is at the origin and sights along the negative $z$-axis. The image plane is taken to be $z=f$, with $f\lt0$, so a point $P$ on the image plane has coordinates of the form $(x,y,f)$. (It’s also common to have the camera point in the positive direction instead, but that doesn’t affect this solution.) The back-projection of a point on the image plane to the ground plane is the intersection of the ray $\lambda P=\lambda(x,y,f)$ with the ground plane. If we plug this into the normal equation of the ground plane $\mathbf n\cdot\mathbf p=d$ and solve for $\lambda$, we find that the corresponding point $P'$ on the ground plane is $$P'={d\over\mathbf n\cdot P}P.$$ Observe that this blows up when $\mathbf n\cdot P=0$, but in that case the ray through $P$ is parallel to the ground plane, so the intersection is at infinity.
The homogeneous coordinates of this ground-plane point are $$P'={dx\over\mathbf n\cdot P}:{dy\over\mathbf n\cdot P}:{df\over\mathbf n\cdot P}:1=dx:dy:df:\mathbf n\cdot P$$ from which we can directly construct the transformation matrix $$\begin{bmatrix}d&0&0&n_x\\0&d&0&n_y\\0&0&d&n_z\\0&0&0&0\end{bmatrix}.$$ All that’s left to do for a complete solution is to transform the resulting point from camera to world coordinates.
Computing $\mathbf n$ and $d$ is also easy if you don’t happen to have them handy. Take any three noncolinear points $P_0$, $P_1$ and $P_2$ on the ground plane. Then $\mathbf n=(P_1-P_0)\times(P_2-P_0)$ and $d=\mathbf n\cdot P_0$ (or the dot product with either of the other two points). If $d=0$, then the ground plane passes through the origin, i.e. the camera is at ground level and this back-mapping is impossible since every ray through a point on the image plane either lies in the ground plane or intersects it at the camera’s position. Remember that these coordinates are camera-relative, so you might have to map from world to camera coordinates first before deriving these two values.
It’s probably easiest just to append $f$ and $1$ to the $(x,y)$-coordinates of a point in the image plane, but if you like you can incorporate some of that into a matrix as well. To map from $(x,y,1)$ to $(x,y,f,1)$ multiply by the matrix $$\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&f&1\end{bmatrix}.$$ Combining this with the back-mapping derived above gives $$\begin{bmatrix}d&0&0&n_x\\0&d&0&n_y\\0&0&fd&fn_z\end{bmatrix}$$ for the back-mapping matrix.