Extract subexpressions, and express them as identities.
There are two distinct questions here:
- When doing manipulations, how can I best use writing as a tool?
- How should I present proofs that use manipulations? Proofs should be succinct and easy to follow, yet explicitly show their correctness.
When manipulating, if there's any doubt, write it out. Writing a manipulation out, neatly and clearly, is faster and less straining than doing it in your head; and it leaves your mind free to assess reasonability and notice unexpected patterns. It's a pure win: the only downsides are the cost of a separate section in your notebook, the few movements of your fingers, and giving up bragging rights ("I did it in my head").
However, you need a good strategy for how you will organize and perform these manipulations. The naive strategy of rewriting the entire expression anew on each line is terrible. It's tedious to write, difficult to read, and error-prone. It's akin to programming without subroutines or classes: you have one mess of code with no structure or boundaries.
Instead, you must extract self-contained subproblems, and manipulate them in isolation. Just like in software engineering, extract subproblems or expressions that have self-contained meaning (identity) and can be manipulated in isolation (interface). These can, and should, be as trivial as a simple integral or identity: the simpler the better. Elegance and power come not from complexity, but from the simplest building blocks providing unexpected results.
As you work the problem, you'll find new ways of organizing and extracting, until you've reached the solution.
Then, it's time to present your work. Your extractions will provide good suggestions for how to present it; often these can simply be stated as identities without needing to show any interim steps. If not, pick one or two or at most three key interim steps to show; any more suggests a need for better decomposition.
Example
Let's illustrate this with a simple, concrete problem:
Let $X \sim \text{Exponential}(\lambda = 1)$ and fix $k \ge 0$. Bob repeatedly draws independent samples $X_1, X_2, \dots$ of $X$, stopping at the first $X_i > k$. What is the expected value of that $X_i$?
Work: Reasoning, we realize the answer must be
$$
\int_k^\infty e^kxe^{-x}\ dx.
$$
Evaluating that is a routine manipulation, but hard to do in our heads. So we extract a subproblem: $\int xe^{-x}\ dx$. This is a perfect subproblem, because it has universal meaning and is cleanly disconnected from the other particulars of the problem. We guess $-xe^{-x}$ and proceed to check and adjust, writing out each step explicitly:
$$
\begin{align*}
[-xe^{-x}]'
&= &-x[e^{-x}]' &+ [-x]'e^{-x} \\
&= &xe^{-x} &- e^{-x} \\
[-xe^{-x} - e^{-x}]'
&= &xe^{-x} &- e^{-x} + e^{-x}\\
&= &xe^{-x} &&✔
\end{align*}
$$
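The same check can also be run numerically. The following is a minimal Python sketch, not part of the original notes: the names `F`, `f`, and `central_difference` are illustrative, and the sample points and step size are arbitrary choices. It compares a finite-difference derivative of the candidate antiderivative $-xe^{-x} - e^{-x}$ with the integrand $xe^{-x}$.

```python
import math

def F(x):
    """Candidate antiderivative from the text: -x e^{-x} - e^{-x}."""
    return -x * math.exp(-x) - math.exp(-x)

def f(x):
    """Integrand that F' should match: x e^{-x}."""
    return x * math.exp(-x)

def central_difference(g, x, h=1e-5):
    """Approximate g'(x) with a symmetric difference quotient."""
    return (g(x + h) - g(x - h)) / (2 * h)

# Compare F'(x) with f(x) at a few arbitrary sample points.
sample_points = (0.0, 0.5, 1.0, 2.0, 5.0)
max_error = max(abs(central_difference(F, x) - f(x)) for x in sample_points)
print(max_error)  # should be tiny if the identity holds
```

A numerical check like this is no substitute for the written derivation, but it catches sign errors cheaply.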
With that identity, evaluating the integral is straightforward (but still too complex to do without writing):
$$
e^k\left[-xe^{-x} - e^{-x}\right]\bigg\rvert_k^\infty = e^k \cdot (k + 1)e^{-k} = k + 1.
$$
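As a sanity check on $k + 1$, here is a rough Python sketch that approximates the integral with a midpoint rule. The function name `conditional_mean`, the cutoff `upper_offset`, and the step count are assumptions made for illustration. The infinite upper limit is truncated, which is harmless here because the integrand decays like $e^{-(x-k)}$.

```python
import math

def conditional_mean(k, upper_offset=50.0, steps=100_000):
    """Midpoint-rule estimate of the integral of e^k * x * e^{-x}
    over [k, k + upper_offset]; the tail beyond that is negligible."""
    width = upper_offset / steps
    total = 0.0
    for i in range(steps):
        x = k + (i + 0.5) * width
        # e^k * x * e^{-x}, combined into one exponential to avoid overflow
        total += x * math.exp(k - x)
    return total * width

# Each estimate should land close to k + 1.
for k in (0.0, 1.0, 3.5):
    print(k, conditional_mean(k))
```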
Write-up: Now we go to present our work, and immediately realize our notes above leave out the most important step: Where did this expression come from? We can be explicit without being verbose:
For any continuous rv $X$ with PDF $f$, $$\mathbb E[X \mid X > k] = \int_k^\infty x \cdot \frac{f(x)}{P[X > k]} \ dx$$ (where defined).
In our case, with $f(x) = e^{-x}$, we get $$\mathbb E[X \mid X > k] = \int_k^\infty x \cdot \frac{e^{-x}}{e^{-k}} \ dx = k + 1$$
since $P[X > k] = \int_k^\infty e^{-x} \ dx = e^{-k}$ and $\int xe^{-x} \ dx = -xe^{-x} -e^{-x}$.
Voila! We've presented the proof clearly and succinctly, while demonstrating its method and correctness.
Wait a second: Writing it up this way makes me realize an alternative I like even better (because it hints at the memoryless property):
$$\int_k^\infty x \cdot \frac{e^{-x}}{e^{-k}} \ dx = \int_0^\infty (x+k)e^{-x} \ dx = k + 1.$$
I can do this in my head, since I know that the expected value of an Exponential(1), $\int_0^\infty xe^{-x}\ dx$, is $1$, and that its density integrates to $1$. So this gives our final step:
When done, interpret your conclusions; this may obviate the need for manipulation.
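To make that interpretation concrete, here is a minimal Python sketch that simulates Bob's procedure and estimates the mean overshoot $X_i - k$. The function names, trial count, and seed are illustrative assumptions, not part of the original problem. If the memoryless reading is right, the estimate should hover near $1$ whatever the value of $k$.

```python
import random

def sample_until_exceeds(k, rng):
    """Draw Exponential(1) samples until one exceeds k; return that sample."""
    while True:
        x = rng.expovariate(1.0)
        if x > k:
            return x

def mean_excess(k, trials=20_000, seed=0):
    """Monte Carlo estimate of E[X - k | X > k]."""
    rng = random.Random(seed)
    total = sum(sample_until_exceeds(k, rng) - k for _ in range(trials))
    return total / trials

# The overshoot's mean should not depend on k.
for k in (0.0, 1.0, 2.0):
    print(k, mean_excess(k))
```

Adding $k$ back to the overshoot recovers the $k + 1$ found above, with no integration by parts.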