1
$\begingroup$

The Deep Learning Book (Goodfellow et al., 2016) defines a random variable as follows (see section 3.2):

A random variable is a variable that can take on different values randomly. We typically denote the random variable itself with a lower case letter in plain typeface, and the values it can take on with lower case script letters. For example, $x_1$ and $x_2$ are both possible values that the random variable $\mathrm{x}$ can take on. For vector-valued variables, we would write the random variable as $\mathbf{x}$ and one of its values as $\boldsymbol{x}$. On its own, a random variable is just a description of the states that are possible; it must be coupled with a probability distribution that specifies how likely each of these states are.

Random variables may be discrete or continuous. A discrete random variable is one that has a finite or countably infinite number of states. Note that these states are not necessarily the integers; they can also just be named states that are not considered to have any numerical value. A continuous random variable is associated with a real value.

This definition seems to differ from the regular definition of a random variable, which, from my understanding, requires the value to be numerical (more concretely, the random variable defines a mapping from the sample space to a measurable space).

This becomes apparent when you compare the regular definition of the expected value (discrete case):

$$ \mathbb{E}[X] = \sum_x xP(x) $$

with the definition in the Deep Learning Book (see section 3.8):

$$ \mathbb{E}_{x \sim P}[f(x)] = \sum_x P(x)f(x) $$

Here, since their definition of a random variable is not necessarily numerical, we need this mapping function $f$, which is basically what the random variable is actually supposed to be in the standard definition. That is, their definition is closer to a synonym for the outcome of an experiment (i.e. a variable that takes on a value from the sample space).

Is my understanding here correct? If so, is there a good reason for deviating from the standard definition of a random variable? It seems quite confusing that they use a definition different from the standard one.

$\endgroup$
7
  • 1
    $\begingroup$ Seems natural to me. Suppose you have a hat filled with white and red balls. $\endgroup$ Commented Aug 31, 2023 at 13:56
  • $\begingroup$ This definition is no deviation from the standard. $\endgroup$ Commented Aug 31, 2023 at 14:17
  • $\begingroup$ related math.stackexchange.com/questions/3456658/… math.stackexchange.com/questions/240673/… stats.stackexchange.com/questions/236765/… $\endgroup$ Commented Aug 31, 2023 at 15:35
  • 1
    $\begingroup$ The book is wrong. A random variable, by definition, is real valued. A random element can take values in more general spaces, but not random variables. $\endgroup$ Commented Aug 31, 2023 at 18:28
  • $\begingroup$ I once met a well-known statistician, who asked (rhetorically): "What is a random variable?" His answer: "A random variable is a number you don't know." I think the moral is that there are different definitions for different purposes. $\endgroup$ Commented Sep 1, 2023 at 13:36

2 Answers

1
$\begingroup$

The textbook does not use

$$ \mathbb{E}_{x \sim P}[f(x)] = \sum_x P(x)f(x) $$

as the definition of the expected value of $x$. They use it as the definition of the expected value of $f(x)$. Quote from section 3.8: "The expectation, or expected value, of some function $f(x)$ with respect to a probability distribution $P(x)$ is the average, or mean value, that $f$ takes on when $x$ is drawn from $P$."

Indeed, if you take $f(x)=x$, then you recover the expression for the expected value of $x$:

$$ \mathbb{E}[X] = \sum_x xP(x) $$
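
As a minimal sketch of this point, here is the sum $\sum_x P(x)f(x)$ evaluated over named states and then over numeric ones. The helper `expectation` and the distributions `ball_pmf` and `die_pmf` are hypothetical names chosen for illustration, not anything from the book.

```python
from fractions import Fraction


def expectation(pmf, f):
    """Return sum_x P(x) * f(x) over the states of a discrete distribution."""
    return sum(p * f(x) for x, p in pmf.items())


# Hypothetical distribution over named, non-numeric states (balls in a hat).
ball_pmf = {"red": Fraction(3, 10), "white": Fraction(7, 10)}


def is_red(color):
    """Map a named state to a number: 1 for red, 0 otherwise."""
    return 1 if color == "red" else 0


# The states have no numeric value, so an expectation only exists through f.
e_red = expectation(ball_pmf, is_red)

# With numeric states, the identity f(x) = x recovers the usual E[X].
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
e_die = expectation(die_pmf, lambda x: x)
```

Calling `expectation(ball_pmf, lambda x: x)` would raise a `TypeError`, since a probability cannot be multiplied by the string `"red"` and summed; the identity choice of $f$ only makes sense when the states are themselves numbers.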

$\endgroup$
2
  • 2
    $\begingroup$ Re last sentence: You actually don’t because, as the book writes, $X$ need not be a number. In particular, it need not take values in a vector space, according to the book at least. $\endgroup$ Commented Sep 1, 2023 at 21:42
  • $\begingroup$ Ah, yes, you're right. Either way, though, I wanted to illustrate that the book added a layer of indirection by using $f(x)$. $\endgroup$ Commented Sep 1, 2023 at 21:45
0
$\begingroup$

The book is using the standard definition of random variable. They are just trying to explain it in a self-contained, intuitive, less formal way, one that can be understood by a reader who is less sophisticated with mathematics. They are not trying to define a different notion of random variable, and indeed, everything they do is consistent with the standard mathematical definition of random variables.
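
One way to make this consistency concrete is the following sketch. It assumes the convention in which a random variable may take values in any measurable space (some authors reserve "random element" for that case), and the symbols $\Omega$, $S$, and $\mathbb{P}$ are introduced here for illustration rather than taken from the book. Let $X \colon \Omega \to S$ map outcomes to a countable set of states $S$, which may be named rather than numeric, e.g. $S = \{\text{red}, \text{white}\}$. The book's $P(x)$ is then the distribution of $X$, and for any $f \colon S \to \mathbb{R}$ the composition $f(X)$ is an ordinary real-valued random variable:

$$ P(x) = \mathbb{P}(\{\omega \in \Omega : X(\omega) = x\}), \qquad \mathbb{E}[f(X)] = \sum_{x \in S} f(x)\,P(x) $$

provided the sum converges absolutely. When $S \subset \mathbb{R}$, choosing $f(x) = x$ gives the familiar $\mathbb{E}[X] = \sum_x xP(x)$.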

$\endgroup$
