Chapter 2 - Information Exercises

Exercises for Chapter 2 - Information and Conditioning.

Exercise 2.1 We want to show that if $X$ is measurable with respect to the trivial $\sigma$-algebra $\mathcal{F}_0 = \{\emptyset, \Omega\}$, then $X$ is constant.

Since $X$ is measurable with respect to $\mathcal{F}_0$, for every Borel set $B \subseteq \mathbb{R}$ the preimage $X^{-1}(B)$ belongs to $\mathcal{F}_0$; that is, every preimage is either $\emptyset$ or $\Omega$.

Fix any $\omega_0 \in \Omega$ and set $c = X(\omega_0)$. The preimage $X^{-1}(\{c\})$ contains $\omega_0$, so it is nonempty and must therefore equal $\Omega$. In other words,

\[X(\omega) = c, \quad \forall \omega \in \Omega.\]

Thus, $X$ is constant and not random, meaning it is degenerate.

Exercise 2.2 (i) The random variable $X$ is defined by:

\[X = \begin{cases} 1, & \text{if } S_2 = 4, \\ 0, & \text{if } S_2 \neq 4. \end{cases}\]

From the problem statement, $S_2(HH) = 16$, $S_2(HT) = S_2(TH) = 4$, and $S_2(TT) = 1$. This means that $X=1$ on $\{HT, TH\}$ and $X=0$ on $\{HH, TT\}$. The $\sigma$-algebra generated by $X$ is:

\[\sigma(X) = \{\emptyset, \{HT, TH\}, \{HH, TT\}, \Omega\}.\]

(ii) The stock price $S_1$ takes values $S_1(HH) = 8$, $S_1(HT) = 8$, $S_1(TH) = 2$, $S_1(TT) = 2$. So the level sets of $S_1$ form the $\sigma$-algebra:

\[\sigma(S_1) = \{\emptyset, \{HH, HT\}, \{TH, TT\}, \Omega\}.\]

(iii) To check independence under $\tilde{\mathbb{P}}$, we verify that for any $A \in \sigma(X)$ and $B \in \sigma(S_1)$,

\[\tilde{\mathbb{P}}(A \cap B) = \tilde{\mathbb{P}}(A) \tilde{\mathbb{P}}(B).\]

Consider $A = \{HT, TH\}$ (where $X=1$) and $B = \{HH, HT\}$ (where $S_1 = 8$):

\[\tilde{\mathbb{P}}(A) = \tilde{\mathbb{P}}(\{HT, TH\}) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2},\] \[\tilde{\mathbb{P}}(B) = \tilde{\mathbb{P}}(\{HH, HT\}) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}.\]

The intersection is $A \cap B = \{HT\}$, so:

\[\tilde{\mathbb{P}}(A \cap B) = \tilde{\mathbb{P}}(\{HT\}) = \frac{1}{4}.\]

Since

\[\tilde{\mathbb{P}}(A \cap B) = \tilde{\mathbb{P}}(A) \tilde{\mathbb{P}}(B) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4},\]

and since each of $\sigma(X)$ and $\sigma(S_1)$ is generated by a single event together with its complement (and independence of a pair of events carries over to their complements), checking this one pair suffices: $X$ and $S_1$ are independent under $\tilde{\mathbb{P}}$.

(iv) Under $\mathbb{P}$, we check the same independence condition.

\[\mathbb{P}(A) = \mathbb{P}(\{HT, TH\}) = \frac{2}{9} + \frac{2}{9} = \frac{4}{9},\] \[\mathbb{P}(B) = \mathbb{P}(\{HH, HT\}) = \frac{4}{9} + \frac{2}{9} = \frac{6}{9}.\]

The intersection is $A \cap B = \{HT\}$, so:

\[\mathbb{P}(A \cap B) = \mathbb{P}(\{HT\}) = \frac{2}{9}.\]

However,

\[\mathbb{P}(A) \mathbb{P}(B) = \frac{4}{9} \times \frac{6}{9} = \frac{24}{81} = \frac{8}{27} \neq \frac{2}{9}.\]

Since $\mathbb{P}(A \cap B) \neq \mathbb{P}(A) \mathbb{P}(B)$, $X$ and $S_1$ are not independent under $\mathbb{P}$.
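
As a quick sanity check, the following Python sketch enumerates the four outcomes and verifies both independence computations by brute force; the value $\mathbb{P}(TT) = \frac{1}{9}$ is the one implied by the stated probabilities summing to one.

```python
from fractions import Fraction as F

# Outcomes, stock prices, and the indicator X from Exercise 2.2.
omega = ["HH", "HT", "TH", "TT"]
S1 = {"HH": 8, "HT": 8, "TH": 2, "TT": 2}
S2 = {"HH": 16, "HT": 4, "TH": 4, "TT": 1}
X = {w: 1 if S2[w] == 4 else 0 for w in omega}

P_tilde = {w: F(1, 4) for w in omega}                              # risk-neutral measure
P = {"HH": F(4, 9), "HT": F(2, 9), "TH": F(2, 9), "TT": F(1, 9)}   # actual measure

def prob(measure, event):
    """Probability of a set of outcomes under the given measure."""
    return sum(measure[w] for w in event)

A = {w for w in omega if X[w] == 1}     # {HT, TH}
B = {w for w in omega if S1[w] == 8}    # {HH, HT}

for name, Q in (("P~", P_tilde), ("P", P)):
    lhs, rhs = prob(Q, A & B), prob(Q, A) * prob(Q, B)
    print(f"{name}: P(A∩B) = {lhs}, P(A)P(B) = {rhs}, independent: {lhs == rhs}")
```

Under $\tilde{\mathbb{P}}$ the two sides agree ($\frac{1}{4}$ each); under $\mathbb{P}$ they do not ($\frac{2}{9}$ versus $\frac{8}{27}$).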

(v) We are given that under $\mathbb{P}$,

\[\mathbb{P}(S_1 = 8) = \frac{2}{3}, \quad \mathbb{P}(S_1 = 2) = \frac{1}{3}.\]

However, if we are told that $X = 1$, this means that we are restricting to the event where $S_2 = 4$, which happens for $\{HT, TH\}$. We must then compute the conditional probabilities of $S_1$ given $X=1$. From the problem setup:

  • $S_1(HT) = 8$, $S_1(TH) = 2$.
  • $\mathbb{P}(HT) = \frac{2}{9}$, $\mathbb{P}(TH) = \frac{2}{9}$. Since $X=1$ corresponds to $\{HT, TH\}$, we normalize:
\[\mathbb{P}(S_1 = 8 \mid X = 1) = \frac{\mathbb{P}(HT)}{\mathbb{P}(HT) + \mathbb{P}(TH)} = \frac{\frac{2}{9}}{\frac{4}{9}} = \frac{1}{2}.\]

Similarly,

\[\mathbb{P}(S_1 = 2 \mid X = 1) = \frac{\mathbb{P}(TH)}{\mathbb{P}(HT) + \mathbb{P}(TH)} = \frac{\frac{2}{9}}{\frac{4}{9}} = \frac{1}{2}.\]

Thus, knowing that $X=1$ changes our estimate of the distribution of $S_1$ from:

\[\mathbb{P}(S_1 = 8) = \frac{2}{3}, \quad \mathbb{P}(S_1 = 2) = \frac{1}{3}\] to \[\mathbb{P}(S_1 = 8 \mid X=1) = \frac{1}{2}, \quad \mathbb{P}(S_1 = 2 \mid X=1) = \frac{1}{2}.\]
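
The same enumeration gives the conditional distribution of $S_1$ given $X=1$ under $\mathbb{P}$ directly; as above, $\mathbb{P}(TT) = \frac{1}{9}$ is the value implied by the remaining probabilities.

```python
from fractions import Fraction as F

# Conditional distribution of S1 on the event {X = 1} = {S2 = 4} = {HT, TH}.
P = {"HH": F(4, 9), "HT": F(2, 9), "TH": F(2, 9), "TT": F(1, 9)}
S1 = {"HH": 8, "HT": 8, "TH": 2, "TT": 2}
event = ["HT", "TH"]

total = sum(P[w] for w in event)
cond = {s: sum(P[w] for w in event if S1[w] == s) / total for s in (8, 2)}
print(cond)   # each value gets conditional probability 1/2
```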

Exercise 2.3 Since $X$ and $Y$ are standard normal, we have:

\[\mathbb{E}[X] = 0, \quad \mathbb{E}[Y] = 0, \quad \text{Var}(X) = 1, \quad \text{Var}(Y) = 1.\]

Using linearity of expectation,

\[\mathbb{E}[V] = \mathbb{E}[X \cos \theta + Y \sin \theta] = \cos \theta \mathbb{E}[X] + \sin \theta \mathbb{E}[Y] = 0.\]

Similarly,

\[\mathbb{E}[W] = \mathbb{E}[-X \sin \theta + Y \cos \theta] = -\sin \theta \mathbb{E}[X] + \cos \theta \mathbb{E}[Y] = 0.\]

Now, compute the variances:

\[\begin{align*} \text{Var}(V) &= \text{Var}(X \cos \theta + Y \sin \theta) \\ &= \cos^2 \theta \text{Var}(X) + \sin^2 \theta \text{Var}(Y) + 2\cos \theta \sin \theta \text{Cov}(X,Y). \end{align*}\]

Since $X$ and $Y$ are independent, $\text{Cov}(X, Y) = 0$, so:

\[\text{Var}(V) = \cos^2 \theta + \sin^2 \theta = 1.\]

Similarly, for $W$:

\[\begin{align*} \text{Var}(W) &= \text{Var}(-X \sin \theta + Y \cos \theta) \\ &= \sin^2 \theta \text{Var}(X) + \cos^2 \theta \text{Var}(Y) - 2\sin \theta \cos \theta \text{Cov}(X,Y) = 1. \end{align*}\]

Moreover, $V$ and $W$ are linear combinations of the independent normal random variables $X$ and $Y$, so each is normally distributed; having mean $0$ and variance $1$, both are standard normal. To check independence, we compute the covariance:

\[\begin{align*} \text{Cov}(V, W) &= \text{Cov}(X \cos \theta + Y \sin \theta, -X \sin \theta + Y \cos \theta) \\ &= -\cos \theta \sin \theta \text{Var}(X) + \sin \theta \cos \theta \text{Var}(Y) \\ &\quad + (\cos \theta \cos \theta - \sin \theta \sin \theta) \text{Cov}(X, Y). \end{align*}\]

Since $X$ and $Y$ are independent, $\text{Cov}(X, Y) = 0$, so:

\[\text{Cov}(V, W) = -\cos \theta \sin \theta + \sin \theta \cos \theta = 0.\]

Since $(V, W)$ is obtained from the jointly normal pair $(X, Y)$ by a linear (rotation) transformation, $V$ and $W$ are jointly normal; being jointly normal and uncorrelated, they are independent.
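
A Monte Carlo sketch (using NumPy, with an arbitrarily chosen angle $\theta$) illustrates the result: the rotated pair has approximately standard normal marginals and negligible correlation.

```python
import numpy as np

# Rotate two independent standard normals by an angle theta (Exercise 2.3).
rng = np.random.default_rng(0)
n, theta = 1_000_000, 0.7
X = rng.standard_normal(n)
Y = rng.standard_normal(n)

V = X * np.cos(theta) + Y * np.sin(theta)
W = -X * np.sin(theta) + Y * np.cos(theta)

print("mean/var of V:", V.mean(), V.var())       # ≈ 0, ≈ 1
print("mean/var of W:", W.mean(), W.var())       # ≈ 0, ≈ 1
print("corr(V, W):  ", np.corrcoef(V, W)[0, 1])  # ≈ 0
```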

Exercise 2.4 (i) We compute the joint moment-generating function (MGF) of $(X, Y)$. Given that $Y = XZ$ and that $Z$ takes values $\pm 1$ with equal probability, we have:

\[\mathbb{E}[e^{uX + vY}] = \mathbb{E}[e^{uX + vXZ}].\]

Since $Z$ is independent of $X$, we condition on $Z$:

\[\mathbb{E}[e^{uX + vXZ}] = \frac{1}{2} \mathbb{E}[e^{uX + vX}] + \frac{1}{2} \mathbb{E}[e^{uX - vX}].\]

Factor out the terms:

\[= \frac{1}{2} \mathbb{E}[e^{(u+v)X}] + \frac{1}{2} \mathbb{E}[e^{(u-v)X}].\]

Using the MGF of a standard normal variable, $\mathbb{E}[e^{tX}] = e^{\frac{1}{2} t^2}$, we get:

\[= \frac{1}{2} e^{\frac{1}{2} (u+v)^2} + \frac{1}{2} e^{\frac{1}{2} (u-v)^2}.\]

Expanding the squares:

\[= \frac{1}{2} e^{\frac{1}{2} (u^2 + 2uv + v^2)} + \frac{1}{2} e^{\frac{1}{2} (u^2 - 2uv + v^2)}.\]

Factor out the common term:

\[= e^{\frac{1}{2} (u^2 + v^2)} \cdot \frac{e^{uv} + e^{-uv}}{2}.\]

Since $\frac{e^{uv} + e^{-uv}}{2} = \cosh(uv)$, we obtain the final result:

\[\mathbb{E}[e^{uX + vY}] = e^{\frac{1}{2} (u^2 + v^2)} \cdot \cosh(uv).\]

(ii) To find $\mathbb{E}[e^{vY}]$, set $u = 0$ in the joint MGF:

\[\mathbb{E}[e^{vY}] = e^{\frac{1}{2} v^2} \cdot \frac{e^0 + e^0}{2} = e^{\frac{1}{2} v^2}.\]

This is the MGF of a standard normal variable, so $Y$ is standard normal.

(iii) If $X$ and $Y$ were independent, their joint MGF would factor as the product of their individual MGFs:

\[\mathbb{E}[e^{uX+vY}] = \mathbb{E}[e^{uX}] \mathbb{E}[e^{vY}].\]

We check:

\[\begin{align*} \mathbb{E}[e^{uX}] \mathbb{E}[e^{vY}] &= e^{\frac{1}{2} u^2} \cdot e^{\frac{1}{2} v^2} = e^{\frac{1}{2} (u^2 + v^2)}. \end{align*}\]

Comparing with the joint MGF:

\[\mathbb{E}[e^{uX + vY}] = e^{\frac{1}{2} (u^2 + v^2)} \cosh(uv).\]

Since $\cosh(uv) \neq 1$ for $uv \neq 0$, the joint MGF is not the product of the individual MGFs, proving that $X$ and $Y$ are not independent.
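
The following sketch checks the MGF computation numerically for one (arbitrarily chosen) pair $(u, v)$: the empirical average of $e^{uX+vY}$ matches $e^{(u^2+v^2)/2}\cosh(uv)$, not the product of the marginal MGFs.

```python
import numpy as np

# Y = X*Z with Z = ±1 independent of X (Exercise 2.4).
rng = np.random.default_rng(1)
n = 2_000_000
X = rng.standard_normal(n)
Z = rng.choice([-1.0, 1.0], size=n)
Y = X * Z

u, v = 0.5, 0.8
empirical = np.mean(np.exp(u * X + v * Y))
joint_formula = np.exp(0.5 * (u**2 + v**2)) * np.cosh(u * v)
product_of_marginals = np.exp(0.5 * u**2) * np.exp(0.5 * v**2)
print(empirical, joint_formula, product_of_marginals)
# empirical ≈ joint_formula ≈ 1.69, while the product of marginals ≈ 1.56
```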

Exercise 2.5 To find the marginal density of $X$, integrate out $y$:

\[f_X(x) = \int_{- |x|}^{\infty} f_{X,Y}(x,y) dy.\]

Substituting $f_{X,Y}(x,y)$:

\[f_X(x) = \int_{- |x|}^{\infty} \frac{2|x|+y}{\sqrt{2\pi}} \exp \left(-\frac{(2|x|+y)^2}{2} \right) dy.\]
Let $u = 2|x| + y$, so $du = dy$; the lower limit $y = -|x|$ corresponds to $u = |x|$:
\[f_X(x) = \int_{|x|}^{\infty} \frac{u}{\sqrt{2\pi}} e^{-u^2/2} du.\]

Since $\int_{|x|}^{\infty} u\, e^{-u^2/2}\, du = \left[ -e^{-u^2/2} \right]_{|x|}^{\infty} = e^{-x^2/2}$, we obtain

\[f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.\]

Thus, $X \sim \mathcal{N}(0,1)$. A similar calculation shows $Y \sim \mathcal{N}(0,1)$. We compute:

\[\mathbb{E}[XY] = \int_{-\infty}^{\infty} \int_{-|x|}^{\infty} xy f_{X,Y}(x,y) dy dx.\]

Since $f_{X,Y}(x,y)$ depends on $x$ only through $|x|$ (and the region of integration is symmetric under $x \mapsto -x$), the integrand $x y\, f_{X,Y}(x,y)$ is odd in $x$ for each fixed $y$, so the integral evaluates to zero:

\[\mathbb{E}[XY] = 0.\]

Since $\mathbb{E}[X] = \mathbb{E}[Y] = 0$, we conclude:

\[\text{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X] \mathbb{E}[Y] = 0.\]

Thus, $X$ and $Y$ are uncorrelated. If $X$ and $Y$ were independent, we would have $f_{X,Y}(x,y) = f_X(x) f_Y(y)$ for (almost) all $(x,y)$. However, the given joint density vanishes whenever $y < -|x|$, while the product of the standard normal marginals is strictly positive everywhere, so the joint density cannot factor as a product. Hence $X$ and $Y$ are not independent.
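
A numerical sketch with SciPy confirms the marginal and the zero correlation, using the joint density exactly as written above (support $y \ge -|x|$); the $x$-range in the double integral is truncated, which is harmless because the tails are negligible.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

# Joint density from Exercise 2.5: nonzero only for y >= -|x|.
def f_xy(x, y):
    t = 2 * abs(x) + y
    return t * np.exp(-t**2 / 2) / np.sqrt(2 * np.pi) if y >= -abs(x) else 0.0

# The marginal of X should match the standard normal density.
for x in (-1.5, 0.3, 2.0):
    fx, _ = integrate.quad(lambda y: f_xy(x, y), -abs(x), np.inf)
    print(f"f_X({x}) = {fx:.6f}  vs  N(0,1) pdf = {norm.pdf(x):.6f}")

# E[XY] should be zero (the integrand is odd in x); truncate x at ±8.
exy, _ = integrate.dblquad(lambda y, x: x * y * f_xy(x, y),
                           -8, 8, lambda x: -abs(x), lambda x: 8.0)
print("E[XY] ≈", exy)
```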

Exercise 2.6 (i) List the sets in $\sigma(X)$. The $\sigma$-algebra $\sigma(X)$ consists of all sets that can be distinguished by observing $X$. Since $X$ takes only two values, we partition $\Omega$ based on those values:

\[\sigma(X) = \{ \emptyset, \Omega, \{a,b\}, \{c,d\} \}.\]

This follows because $X(a) = X(b) = 1$ and $X(c) = X(d) = -1$, so $\sigma(X)$ is generated by the partition $\{ \{a,b\}, \{c,d\} \}$.

(ii) Determine $\mathbb{E}[Y|X]$. We compute $\mathbb{E}[Y | X]$ by conditioning on the sets in $\sigma(X)$:

  • If $X = 1$ (i.e., $\omega \in \{a, b\}$),
\[\mathbb{E}[Y | X = 1] = \mathbb{E}[Y | \{a,b\}] = \frac{\mathbb{P}(a) Y(a) + \mathbb{P}(b) Y(b)}{\mathbb{P}(a) + \mathbb{P}(b)}\] \[= \frac{\frac{1}{6} (1) + \frac{1}{3} (-1)}{\frac{1}{6} + \frac{1}{3}} = \frac{\frac{1}{6} - \frac{2}{6}}{\frac{3}{6}} = -\frac{1}{3}.\]
  • If $X = -1$ (i.e., $\omega \in \{c, d\}$),
\[\mathbb{E}[Y | X = -1] = \mathbb{E}[Y | \{c,d\}] = \frac{\mathbb{P}(c) Y(c) + \mathbb{P}(d) Y(d)}{\mathbb{P}(c) + \mathbb{P}(d)}\] \[= \frac{\frac{1}{4} (1) + \frac{1}{4} (-1)}{\frac{1}{4} + \frac{1}{4}} = 0.\]

Thus,

\[\mathbb{E}[Y | X] = \begin{cases} - \frac{1}{3}, & X = 1, \\ 0, & X = -1. \end{cases}\]

We verify the partial-averaging property over the whole space $\Omega$:

\[\mathbb{E}[\mathbb{E}[Y | X]] = -\frac{1}{3} \cdot \mathbb{P}(\{a, b\}) + 0 \cdot \mathbb{P}(\{c,d\}) = -\frac{1}{3} \times \frac{1}{2} + 0 \times \frac{1}{2} = -\frac{1}{6}.\]

Since $\mathbb{E}[Y] = \frac{1}{6}(1) + \frac{1}{3}(-1) + \frac{1}{4}(1) + \frac{1}{4}(-1) = -\frac{1}{6}$, the property holds.

(iii) Determine $\mathbb{E}[Z|X]$. Since $Z = X + Y$, we use linearity of conditional expectation:

\[\mathbb{E}[Z | X] = \mathbb{E}[X | X] + \mathbb{E}[Y | X] = X + \mathbb{E}[Y | X].\]
Thus, using our previous result for $\mathbb{E}[Y | X]$:
\[\mathbb{E}[Z | X] = \begin{cases} 1 - \frac{1}{3} = \frac{2}{3}, & X = 1, \\ -1 + 0 = -1, & X = -1. \end{cases}\]

Again, verifying the partial-averaging property over $\Omega$:

\[\mathbb{E}[\mathbb{E}[Z | X]] = \frac{2}{3} \cdot \mathbb{P}(X=1) + (-1) \cdot \mathbb{P}(X=-1) = \frac{2}{3} \times \frac{1}{2} + (-1) \times \frac{1}{2} = -\frac{1}{6}.\]

Since $\mathbb{E}[Z] = \mathbb{E}[X] + \mathbb{E}[Y] = 0 + \left(-\frac{1}{6}\right) = -\frac{1}{6}$, the property holds.

(iv) Compute $\mathbb{E}[Z | X] - \mathbb{E}[Y | X]$ and explain why $\mathbb{E}[X | X] = X$
From part (iii), we already have $\mathbb{E}[Z | X] = X + \mathbb{E}[Y | X]$, so:
\[\mathbb{E}[Z | X] - \mathbb{E}[Y | X] = X + \mathbb{E}[Y | X] - \mathbb{E}[Y | X] = X.\]
This follows directly from the fact that $\mathbb{E}[X | X] = X$: since $X$ is $\sigma(X)$-measurable, conditioning on $X$ provides no new information about $X$.
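
Because the probability space has only four points, everything in this exercise can be verified by direct enumeration; a minimal sketch using exact fractions:

```python
from fractions import Fraction as F

# The four-point space of Exercise 2.6, with Z = X + Y.
P = {"a": F(1, 6), "b": F(1, 3), "c": F(1, 4), "d": F(1, 4)}
X = {"a": 1, "b": 1, "c": -1, "d": -1}
Y = {"a": 1, "b": -1, "c": 1, "d": -1}
Z = {w: X[w] + Y[w] for w in P}

def cond_exp(V, x):
    """E[V | X = x] computed directly on the finite space."""
    block = [w for w in P if X[w] == x]
    return sum(P[w] * V[w] for w in block) / sum(P[w] for w in block)

for x in (1, -1):
    print(f"E[Y|X={x}] = {cond_exp(Y, x)},  E[Z|X={x}] = {cond_exp(Z, x)}")

# Partial averaging over the whole space: both sides equal -1/6.
print(sum(P[w] * Y[w] for w in P), sum(P[w] * cond_exp(Y, X[w]) for w in P))
```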

Exercise 2.7 We are given an integrable random variable $Y$ and a sub-$\sigma$-algebra $\mathcal{G}$. The best estimate of $Y$ given $\mathcal{G}$ is $\mathbb{E}[Y | \mathcal{G}]$, and we define the error term as:

\[\text{Err} = Y - \mathbb{E}[Y | \mathcal{G}].\]

We want to show that for any other $\mathcal{G}$-measurable estimate $X$, we have:

\[\text{Var}(\text{Err}) \leq \text{Var}(Y - X).\]

Define $\mu = \mathbb{E}[Y - X]$. The variance of $Y - X$ can be rewritten as:

\[\mathbb{E}[(Y - X - \mu)^2] = \mathbb{E} \left[ \left( (Y - \mathbb{E}[Y | \mathcal{G}]) + (\mathbb{E}[Y | \mathcal{G}] - X - \mu) \right)^2 \right].\]

Expanding the square, we get:

\[\mathbb{E}[(Y - X - \mu)^2] = \mathbb{E}[(Y - \mathbb{E}[Y | \mathcal{G}])^2] + \mathbb{E}[(\mathbb{E}[Y | \mathcal{G}] - X - \mu)^2] + 2\mathbb{E}[(Y - \mathbb{E}[Y | \mathcal{G}])(\mathbb{E}[Y | \mathcal{G}] - X - \mu)].\]

By linearity of conditional expectation (and the fact that $\mathbb{E}[Y | \mathcal{G}]$ is $\mathcal{G}$-measurable),

\[\mathbb{E}[(Y - \mathbb{E}[Y | \mathcal{G}]) \mid \mathcal{G}] = \mathbb{E}[Y \mid \mathcal{G}] - \mathbb{E}[Y \mid \mathcal{G}] = 0.\]
Since $\mathbb{E}[Y | \mathcal{G}] - X - \mu$ is $\mathcal{G}$-measurable, we may take it out of the conditional expectation and then take overall expectations (the tower property):
\[\mathbb{E}\big[(Y - \mathbb{E}[Y | \mathcal{G}])(\mathbb{E}[Y | \mathcal{G}] - X - \mu)\big] = \mathbb{E}\big[\mathbb{E}[(Y - \mathbb{E}[Y | \mathcal{G}]) \mid \mathcal{G}]\, (\mathbb{E}[Y | \mathcal{G}] - X - \mu)\big] = 0.\]

Thus, the variance simplifies to:

\[\mathbb{E}[(Y - X - \mu)^2] = \mathbb{E}[(Y - \mathbb{E}[Y | \mathcal{G}])^2] + \mathbb{E}[(\mathbb{E}[Y | \mathcal{G}] - X - \mu)^2].\]

Since the second term on the right-hand side is non-negative, and since $\mathbb{E}[\text{Err}] = \mathbb{E}[Y] - \mathbb{E}[\mathbb{E}[Y | \mathcal{G}]] = 0$ so that $\text{Var}(\text{Err}) = \mathbb{E}[\text{Err}^2]$, we conclude:

\[\text{Var}(\text{Err}) = \mathbb{E}[(Y - \mathbb{E}[Y | \mathcal{G}])^2] \leq \mathbb{E}[(Y - X - \mu)^2] = \text{Var}(Y - X).\]
Thus, $\mathbb{E}[Y | \mathcal{G}]$ minimizes the variance of the error among all $\mathcal{G}$-measurable estimates, proving the result.
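
A Monte Carlo sketch illustrates the inequality in an assumed toy model where the conditional expectation is known in closed form: $\mathcal{G} = \sigma(X)$ and $Y = X + \varepsilon$ with $X, \varepsilon$ independent standard normals, so $\mathbb{E}[Y | \mathcal{G}] = X$.

```python
import numpy as np

# Toy model (an assumption for illustration): Y = X + eps, G = sigma(X),
# so E[Y|G] = X and the error Y - E[Y|G] = eps has variance 1.
rng = np.random.default_rng(2)
n = 1_000_000
X = rng.standard_normal(n)
Y = X + rng.standard_normal(n)

print("Var(Y - E[Y|G]) ≈", (Y - X).var())               # ≈ 1, the minimum
for estimate in (0.5 * X, 2 * X, X**3, np.zeros(n)):     # other G-measurable guesses
    print("Var(Y - estimate) ≈", (Y - estimate).var())   # all ≥ 1
```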

Exercise 2.8 We are given integrable random variables $X$ and $Y$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. We decompose $Y$ into two components:

\[Y = Y_1 + Y_2,\]

where

\[Y_1 = \mathbb{E}[Y | X]\]

is $\sigma(X)$-measurable, and

\[Y_2 = Y - \mathbb{E}[Y | X]\]

is the residual component. We want to show that $Y_2$ and $X$ are uncorrelated and, more generally, that $Y_2$ is uncorrelated with any $\sigma(X)$-measurable random variable. The covariance between $Y_2$ and $X$ is given by:

\[\text{Cov}(Y_2, X) = \mathbb{E}[Y_2 X] - \mathbb{E}[Y_2] \mathbb{E}[X].\]
Since $Y_2 = Y - \mathbb{E}[Y | X]$, we substitute:
\[\mathbb{E}[Y_2 X] = \mathbb{E}[(Y - \mathbb{E}[Y | X]) X].\]

Using the linearity of expectation:

\[\mathbb{E}[Y_2 X] = \mathbb{E}[Y X] - \mathbb{E}[\mathbb{E}[Y | X] X].\]

Since $\mathbb{E}[Y | X]$ is $\sigma(X)$-measurable, we take out what is known and then apply the tower property of conditional expectation:

\[\mathbb{E}[\mathbb{E}[Y | X]\, X] = \mathbb{E}[\mathbb{E}[Y X | X]] = \mathbb{E}[Y X].\]

Thus,

\[\mathbb{E}[Y_2 X] = \mathbb{E}[Y X] - \mathbb{E}[Y X] = 0.\]

Since $\mathbb{E}[Y_2] = \mathbb{E}[Y] - \mathbb{E}[\mathbb{E}[Y | X]] = 0$ by construction, we conclude:

\[\text{Cov}(Y_2, X) = 0.\]

Thus, $Y_2$ and $X$ are uncorrelated. Let $Z$ be any $\sigma(X)$-measurable random variable. We compute:

\[\mathbb{E}[Y_2 Z] = \mathbb{E}[(Y - \mathbb{E}[Y | X]) Z].\]

Conditioning on $X$ and using the tower property, we write:

\[\mathbb{E}[Y_2 Z] = \mathbb{E}[\mathbb{E}[(Y - \mathbb{E}[Y | X]) Z | X]].\]
Taking out the $\sigma(X)$-measurable factor $Z$ and using linearity of conditional expectation,
\[\mathbb{E}[(Y - \mathbb{E}[Y | X]) Z \mid X] = Z\, \mathbb{E}[(Y - \mathbb{E}[Y | X]) \mid X] = Z\,\big(\mathbb{E}[Y | X] - \mathbb{E}[Y | X]\big) = 0.\]

Thus, for any $\sigma(X)$-measurable $Z$,

\[\mathbb{E}[Y_2 Z] = 0.\]

Since $\mathbb{E}[Y_2] = 0$, this gives $\text{Cov}(Y_2, Z) = \mathbb{E}[Y_2 Z] - \mathbb{E}[Y_2]\mathbb{E}[Z] = 0$; that is, $Y_2$ is uncorrelated with every (integrable) $\sigma(X)$-measurable random variable.
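
A Monte Carlo sketch of the decomposition, again in an assumed toy model where $\mathbb{E}[Y|X]$ is known explicitly ($Y = 0.8X + \varepsilon$ with $X, \varepsilon$ independent standard normals, so $Y_1 = 0.8X$):

```python
import numpy as np

# Toy model (an assumption for illustration): Y = 0.8*X + eps, so E[Y|X] = 0.8*X.
rng = np.random.default_rng(3)
n = 1_000_000
X = rng.standard_normal(n)
Y = 0.8 * X + rng.standard_normal(n)

Y1 = 0.8 * X          # the sigma(X)-measurable part E[Y|X]
Y2 = Y - Y1           # the residual

for name, Z in (("X", X), ("X^2", X**2), ("sin(X)", np.sin(X))):
    print(f"Cov(Y2, {name}) ≈ {np.cov(Y2, Z)[0, 1]:.5f}")   # all ≈ 0
```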

Exercise 2.9 (i) Example where $\sigma(f(X))$ is strictly smaller than $\sigma(X)$

Let the probability space be:

\[\Omega = \{ \omega_1, \omega_2, \omega_3, \omega_4 \},\]

with the power set $\mathcal{F} = 2^{\Omega}$ as the $\sigma$-algebra and a uniform probability measure $\mathbb{P}$ assigning probability $\frac{1}{4}$ to each outcome.

Define a random variable $X: \Omega \to \mathbb{R}$ by:

\[X(\omega_1) = 1, \quad X(\omega_2) = 2, \quad X(\omega_3) = 3, \quad X(\omega_4) = 4.\]

The $\sigma$-algebra generated by $X$, denoted $\sigma(X)$, is generated by the level sets of $X$, which here are the four singletons $\{\omega_1\}, \{\omega_2\}, \{\omega_3\}, \{\omega_4\}$. Taking all unions of these level sets gives the full power set:

\[\sigma(X) = 2^{\Omega},\]

which contains all $16$ subsets of $\Omega$.

Now, define a function $f: \mathbb{R} \to \mathbb{R}$ taking two distinct values $a \neq b$, with:

\[f(X) = \begin{cases} a, & X = 1 \text{ or } 2, \\ b, & X = 3 \text{ or } 4. \end{cases}\]

The $\sigma$-algebra generated by $f(X)$ is:

\[\sigma(f(X)) = \{ \emptyset, \Omega, \{ \omega_1, \omega_2 \}, \{ \omega_3, \omega_4 \} \}.\]

Since $\sigma(f(X))$ is a proper subset of $\sigma(X)$, it is strictly smaller.

(ii) Can $\sigma(f(X))$ ever be strictly larger than $\sigma(X)$?

No, the $\sigma$-algebra generated by $f(X)$ can never be strictly larger than $\sigma(X)$.

By definition, $\sigma(f(X))$ consists of all subsets of $\Omega$ that can be expressed in terms of $f(X)$, while $\sigma(X)$ consists of all subsets expressible in terms of $X$. Since $f$ is Borel measurable, every event in $\sigma(f(X))$ has the form $\{f(X) \in B\} = \{X \in f^{-1}(B)\}$ for some Borel set $B$, and is therefore already in $\sigma(X)$. That is:

\[\sigma(f(X)) \subseteq \sigma(X).\]

Thus, $\sigma(f(X))$ can either be equal to $\sigma(X)$ (if $f$ is injective on the range of $X$) or strictly smaller but never larger.
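
On a finite space the generated $\sigma$-algebras can be listed exhaustively: $\sigma(V)$ consists of all unions of level sets of $V$. The sketch below rebuilds both $\sigma$-algebras from part (i) and checks the inclusion.

```python
from itertools import chain, combinations

omega = ("w1", "w2", "w3", "w4")
X = {"w1": 1, "w2": 2, "w3": 3, "w4": 4}
f = lambda x: "a" if x in (1, 2) else "b"

def sigma_algebra(V):
    """All unions of level sets of V -- the sigma-algebra generated by V."""
    levels = {}
    for w in omega:
        levels.setdefault(V[w], set()).add(w)
    blocks = list(levels.values())
    subsets = chain.from_iterable(combinations(blocks, r) for r in range(len(blocks) + 1))
    return {frozenset().union(*combo) for combo in subsets}

sig_X = sigma_algebra(X)
sig_fX = sigma_algebra({w: f(X[w]) for w in omega})
print(len(sig_X), len(sig_fX))   # 16 and 4
print(sig_fX <= sig_X)           # True: sigma(f(X)) is contained in sigma(X)
```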

Exercise 2.10 To show that $\mathbb{E}[Y|X] = g(X)$, we verify the partial-averaging property. First, recall the definition of $g$ in terms of the conditional density:

\[\mathbb{E}[Y|X = x] = g(x) = \int_{-\infty}^{\infty} y f_{Y|X}(y|x) dy.\]

By definition of the conditional density:

\[f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)}.\]

Thus, we rewrite:

\[g(x) = \int_{-\infty}^{\infty} y \frac{f_{X,Y}(x,y)}{f_X(x)} dy.\]

To confirm that $\mathbb{E}[Y|X] = g(X)$, we check the partial-averaging property: for every set $A \in \sigma(X)$, which has the form $A = \{X \in B\}$ for some Borel set $B \subseteq \mathbb{R}$, we must show:

\[\int_A g(X) d\mathbb{P} = \int_A Y d\mathbb{P}.\]

Writing the left-hand side in terms of the density of $X$ and substituting the definition of $g$:

\[\int_A g(X)\, d\mathbb{P} = \int_B g(x) f_X(x)\, dx = \int_B \left( \int_{-\infty}^{\infty} y\, f_{Y|X}(y|x)\, dy \right) f_X(x)\, dx.\]

By Fubini's theorem and the identity $f_{Y|X}(y|x) f_X(x) = f_{X,Y}(x,y)$:

\[\int_A g(X)\, d\mathbb{P} = \int_B \int_{-\infty}^{\infty} y\, f_{X,Y}(x,y)\, dy\, dx.\]

The right-hand side of the partial-averaging property is the same quantity, computed from the joint density:

\[\int_A Y\, d\mathbb{P} = \mathbb{E}\left[Y\, \mathbb{1}_B(X)\right] = \int_B \int_{-\infty}^{\infty} y\, f_{X,Y}(x,y)\, dy\, dx.\]

This verifies that $\mathbb{E}[Y | X] = g(X)$, completing the proof.
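
A Monte Carlo sketch of the partial-averaging property in an assumed concrete example: $(X,Y)$ bivariate normal with correlation $\rho$, for which $g(x) = \mathbb{E}[Y|X=x] = \rho x$, and $A = \{X > 1\}$.

```python
import numpy as np

# Bivariate normal example (an assumption for illustration): E[Y|X=x] = rho*x.
rng = np.random.default_rng(4)
n, rho = 2_000_000, 0.6
X = rng.standard_normal(n)
Y = rho * X + np.sqrt(1 - rho**2) * rng.standard_normal(n)

g = lambda x: rho * x        # the conditional-mean function
A = X > 1.0                  # a set in sigma(X)

print("∫_A g(X) dP ≈", np.mean(g(X) * A))   # the two estimates should agree
print("∫_A Y dP    ≈", np.mean(Y * A))      # up to Monte Carlo error
```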

Exercise 2.11 (i) Existence of a function $g$ such that $W = g(X)$

Since $W$ is $\sigma(X)$-measurable, by definition, there exists a function $g: \mathbb{R} \to \mathbb{R}$ such that:

\[W = g(X).\]

To construct $g(X)$, consider that every set in $\sigma(X)$ is of the form $\{X \in B\}$ for some Borel set $B \subset \mathbb{R}$. We analyze this in steps:

  1. Step 1: Indicator Functions
    Suppose $W$ is an indicator function of such a set $B$:
\[W = \mathbb{1}_{B}(X).\]

Then we can simply define:

\[g(x) = \mathbb{1}_{B}(x),\]

which satisfies $W = g(X)$.

  2. Step 2: Simple Functions
    If $W$ is a simple function, it can be written as:
\[W = \sum_{i=1}^{n} c_i \mathbb{1}_{B_i}(X).\]

Defining:

\[g(x) = \sum_{i=1}^{n} c_i \mathbb{1}_{B_i}(x),\]

again ensures $W = g(X)$.

  3. Step 3: General Nonnegative Functions
    More generally, any nonnegative $\sigma(X)$-measurable $W$ is the pointwise limit of an increasing sequence of simple functions $W_n$. Each $W_n$ has a corresponding Borel function $g_n$ with $W_n = g_n(X)$, so we define:
\[g(x) = \limsup_{n \to \infty} g_n(x).\]

This $g$ is Borel measurable, and $g(X) = \lim_{n \to \infty} g_n(X) = \lim_{n \to \infty} W_n = W$. A general (not necessarily nonnegative) $W$ is handled by writing $W = W^+ - W^-$ and applying the construction to each part. Thus, there exists a function $g$ such that $W = g(X)$, as required; a small discrete illustration follows.
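
When $X$ takes only finitely many values, the construction of $g$ is especially concrete: a $\sigma(X)$-measurable $W$ is constant on each level set $\{X = x\}$, so $g$ is just a lookup table. A minimal sketch on an assumed four-point example:

```python
# Assumed toy example: X takes three values, and W is sigma(X)-measurable,
# i.e. constant on each level set of X.
omega = ["w1", "w2", "w3", "w4"]
X = {"w1": 1, "w2": 1, "w3": 2, "w4": 3}
W = {"w1": 10, "w2": 10, "w3": 20, "w4": 30}

table = {}
for w in omega:
    # W must agree on outcomes sharing the same X-value, else g is not well defined.
    assert table.setdefault(X[w], W[w]) == W[w]

g = lambda x: table[x]
print(all(g(X[w]) == W[w] for w in omega))   # True: W = g(X)
```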