# Random Variable

 Definition (random variable) A random variable ${\displaystyle X}$ on a sample space ${\displaystyle \Omega }$ is a real-valued function ${\displaystyle X:\Omega \rightarrow \mathbb {R} }$. A random variable ${\displaystyle X}$ is called a discrete random variable if its range is finite or countably infinite.

For a random variable ${\displaystyle X}$ and a real value ${\displaystyle x\in \mathbb {R} }$, we write "${\displaystyle X=x}$" for the event ${\displaystyle \{a\in \Omega \mid X(a)=x\}}$, and denote the probability of the event by

${\displaystyle \Pr[X=x]=\Pr(\{a\in \Omega \mid X(a)=x\})}$.
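As a minimal sketch of this notation, the following treats a random variable literally as a function on a finite sample space and computes ${\displaystyle \Pr[X=x]}$ by collecting the outcomes that map to ${\displaystyle x}$. The two-dice sample space, the uniform distribution, and the helper name `pr_X_equals` are assumptions made for illustration; they do not come from the text above.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair six-sided dice (assumed example).
omega = list(product(range(1, 7), repeat=2))

def X(a):
    """Random variable X: Omega -> R, here the sum of the two dice."""
    return a[0] + a[1]

def pr_X_equals(x):
    """Pr[X = x] = Pr({a in Omega | X(a) = x}) under the uniform distribution."""
    event = [a for a in omega if X(a) == x]
    return Fraction(len(event), len(omega))

print(pr_X_equals(7))  # 1/6
```

Exact fractions are used instead of floats so that the probabilities can be compared for equality.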

# Independent Random Variables

Independence can also be defined for random variables:

 Definition (Independent variables) Two random variables ${\displaystyle X}$ and ${\displaystyle Y}$ are independent if and only if ${\displaystyle \Pr[(X=x)\wedge (Y=y)]=\Pr[X=x]\cdot \Pr[Y=y]}$ for all values ${\displaystyle x}$ and ${\displaystyle y}$. Random variables ${\displaystyle X_{1},X_{2},\ldots ,X_{n}}$ are mutually independent if and only if, for any subset ${\displaystyle I\subseteq \{1,2,\ldots ,n\}}$ and any values ${\displaystyle x_{i}}$, where ${\displaystyle i\in I}$, ${\displaystyle {\begin{aligned}\Pr \left[\bigwedge _{i\in I}(X_{i}=x_{i})\right]&=\prod _{i\in I}\Pr[X_{i}=x_{i}].\end{aligned}}}$

Note that in probability theory, "mutual independence" is not equivalent to "pairwise independence": mutual independence implies pairwise independence, but not conversely. We will return to this distinction later.
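A standard illustration of the gap between the two notions uses two fair bits and their XOR. The sketch below checks the product rule from the definition by brute force; the sample space, the three variables, and the helper names `pr` and `independent` are assumptions chosen for this example.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair bits under the uniform distribution (assumed example).
omega = list(product([0, 1], repeat=2))

# Three random variables on omega: the first bit, the second bit, and their XOR.
variables = [
    lambda a: a[0],
    lambda a: a[1],
    lambda a: a[0] ^ a[1],
]

def pr(event):
    """Probability of the set of outcomes on which `event` is true."""
    return Fraction(sum(1 for a in omega if event(a)), len(omega))

def independent(indices):
    """Check Pr[AND (X_i = x_i)] == PROD Pr[X_i = x_i] for all value choices."""
    for values in product([0, 1], repeat=len(indices)):
        joint = pr(lambda a: all(variables[i](a) == v
                                 for i, v in zip(indices, values)))
        prod = Fraction(1)
        for i, v in zip(indices, values):
            prod *= pr(lambda a, i=i, v=v: variables[i](a) == v)
        if joint != prod:
            return False
    return True

pairwise = all(independent(pair) for pair in [(0, 1), (0, 2), (1, 2)])
mutual = pairwise and independent((0, 1, 2))
print(pairwise, mutual)  # True False
```

Every pair satisfies the product rule, but the triple does not: for instance, both bits being 0 while the XOR is 1 has probability 0 rather than 1/8.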

# Expectation

Let ${\displaystyle X}$ be a discrete random variable. The expectation of ${\displaystyle X}$ is defined as follows.

 Definition (Expectation) The expectation of a discrete random variable ${\displaystyle X}$, denoted by ${\displaystyle \mathbf {E} [X]}$, is given by ${\displaystyle {\begin{aligned}\mathbf {E} [X]&=\sum _{x}x\Pr[X=x],\end{aligned}}}$ where the summation is over all values ${\displaystyle x}$ in the range of ${\displaystyle X}$.
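The definition can be sketched in code by first grouping outcomes by the value of ${\displaystyle X}$ and then summing ${\displaystyle x\Pr[X=x]}$ over the range. The fair die, the particular function `X`, and the helper names `distribution` and `expectation` are hypothetical choices for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# A fair six-sided die: each outcome has probability 1/6 (assumed example).
prob = {a: Fraction(1, 6) for a in range(1, 7)}

def X(a):
    """An example random variable: squared distance of the roll from 3."""
    return (a - 3) ** 2

def distribution(X):
    """Map each value x in the range of X to Pr[X = x]."""
    dist = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for a, p in prob.items():
        dist[X(a)] += p
    return dict(dist)

def expectation(X):
    """E[X] = sum over values x in the range of X of x * Pr[X = x]."""
    return sum(x * p for x, p in distribution(X).items())

print(expectation(X))  # 19/6
```

Here the range of `X` is {0, 1, 4, 9} with probabilities 1/6, 2/6, 2/6, 1/6, so the sum is (0 + 2 + 8 + 9)/6 = 19/6.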

### Linearity of Expectation

Perhaps the most useful property of expectation is its linearity.

 Theorem (Linearity of Expectations) For any discrete random variables ${\displaystyle X_{1},X_{2},\ldots ,X_{n}}$, and any real constants ${\displaystyle a_{1},a_{2},\ldots ,a_{n}}$, ${\displaystyle {\begin{aligned}\mathbf {E} \left[\sum _{i=1}^{n}a_{i}X_{i}\right]&=\sum _{i=1}^{n}a_{i}\cdot \mathbf {E} [X_{i}].\end{aligned}}}$
Proof.
 By the definition of expectation, it is easy to verify (try to prove it yourself) that for any discrete random variables ${\displaystyle X}$ and ${\displaystyle Y}$, and any real constant ${\displaystyle c}$, ${\displaystyle \mathbf {E} [X+Y]=\mathbf {E} [X]+\mathbf {E} [Y]}$ and ${\displaystyle \mathbf {E} [cX]=c\mathbf {E} [X]}$. The theorem follows by induction on ${\displaystyle n}$.
${\displaystyle \square }$

The linearity of expectation gives an easy way to compute the expectation of a random variable if the variable can be written as a sum.

Example
Suppose that we have a biased coin that comes up HEADs with probability ${\displaystyle p}$. If we flip the coin ${\displaystyle n}$ times, what is the expected number of HEADs?
Intuitively the answer must be ${\displaystyle np}$, but how can we prove it? We could apply the definition of expectation and compute the sum by brute force. A more convenient way is to use the linearity of expectation: let ${\displaystyle X_{i}}$ indicate whether the ${\displaystyle i}$-th flip is HEADs, that is, ${\displaystyle X_{i}=1}$ if the ${\displaystyle i}$-th flip is HEADs and ${\displaystyle X_{i}=0}$ otherwise. Then ${\displaystyle \mathbf {E} [X_{i}]=1\cdot p+0\cdot (1-p)=p}$, and the total number of HEADs after ${\displaystyle n}$ flips is ${\displaystyle X=\sum _{i=1}^{n}X_{i}}$. Applying the linearity of expectation, the expected number of HEADs is:
${\displaystyle \mathbf {E} [X]=\mathbf {E} \left[\sum _{i=1}^{n}X_{i}\right]=\sum _{i=1}^{n}\mathbf {E} [X_{i}]=np}$.
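For comparison, here is a sketch of the brute-force route: it enumerates all ${\displaystyle 2^{n}}$ flip sequences and sums (number of HEADs) times (probability of the sequence), which regroups to the same sum as in the definition. It assumes the flips are independent, so that a sequence with ${\displaystyle k}$ HEADs has probability ${\displaystyle p^{k}(1-p)^{n-k}}$; the function name and the values of `n` and `p` are chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

def expected_heads_brute_force(n, p):
    """E[number of HEADs] summed directly over all 2^n flip sequences."""
    total = Fraction(0)
    for flips in product([0, 1], repeat=n):  # 1 = HEADs, 0 = TAILs
        heads = sum(flips)
        # Probability of this exact sequence, assuming independent flips.
        prob = p**heads * (1 - p) ** (n - heads)
        total += heads * prob
    return total

n, p = 5, Fraction(1, 3)
print(expected_heads_brute_force(n, p), n * p)  # 5/3 5/3
```

The enumeration takes time exponential in `n`, whereas the linearity argument gives the answer immediately.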

The real power of the linearity of expectation is that it does not require the random variables to be independent, so it can be applied to any collection of random variables. For example:

${\displaystyle \mathbf {E} \left[\alpha X+\beta X^{2}+\gamma X^{3}\right]=\alpha \cdot \mathbf {E} [X]+\beta \cdot \mathbf {E} \left[X^{2}\right]+\gamma \cdot \mathbf {E} \left[X^{3}\right].}$

However, do not exaggerate this power!

• For an arbitrary function ${\displaystyle f}$ (not necessarily linear), the equation ${\displaystyle \mathbf {E} [f(X)]=f(\mathbf {E} [X])}$ does not hold in general.
• For variances, the equation ${\displaystyle \mathbf {Var} [X+Y]=\mathbf {Var} [X]+\mathbf {Var} [Y]}$ does not hold in general without a further assumption, such as the independence of ${\displaystyle X}$ and ${\displaystyle Y}$.
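Both warnings, and the dependent-variable example before them, can be checked on a small case. The sketch below takes ${\displaystyle X}$ to be a fair die roll, uses ${\displaystyle f(x)=x^{2}}$, and takes ${\displaystyle Y=X}$ as the dependent partner in the variance check; all of these choices, and the helper names `E` and `var`, are assumptions made for illustration.

```python
from fractions import Fraction

# Distribution of X: a fair six-sided die (assumed example).
dist = {x: Fraction(1, 6) for x in range(1, 7)}

def E(f):
    """E[f(X)] computed by summing f(x) * Pr[X = x]."""
    return sum(f(x) * p for x, p in dist.items())

def var(f):
    """Variance of f(X), computed as E[(f(X) - E[f(X)])^2]."""
    mean = E(f)
    return E(lambda x: (f(x) - mean) ** 2)

# Linearity holds even though X and X^2 are dependent.
assert E(lambda x: 2 * x + 3 * x**2) == 2 * E(lambda x: x) + 3 * E(lambda x: x**2)

# E[f(X)] != f(E[X]) for f(x) = x^2.
print(E(lambda x: x**2), E(lambda x: x) ** 2)  # 91/6 49/4

# With Y = X, var(X + Y) = var(2X) differs from var(X) + var(Y).
print(var(lambda x: x + x), 2 * var(lambda x: x))  # 35/3 35/6
```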

# Conditional Expectation

Conditional expectation is defined analogously, with conditional probabilities in place of probabilities:

 Definition (conditional expectation) For random variables ${\displaystyle X}$ and ${\displaystyle Y}$, ${\displaystyle \mathbf {E} [X\mid Y=y]=\sum _{x}x\Pr[X=x\mid Y=y],}$ where the summation is taken over the range of ${\displaystyle X}$.

### The Law of Total Expectation

There is also a law of total expectation.

 Theorem (law of total expectation) Let ${\displaystyle X}$ and ${\displaystyle Y}$ be two random variables. Then ${\displaystyle \mathbf {E} [X]=\sum _{y}\mathbf {E} [X\mid Y=y]\cdot \Pr[Y=y].}$
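The law can be verified numerically on a small example. In this sketch, ${\displaystyle X}$ is the sum of two fair dice and ${\displaystyle Y}$ is the first die; the sample space and the helper names `pr` and `cond_expectation` are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair six-sided dice, uniform distribution (assumed example).
omega = list(product(range(1, 7), repeat=2))

def X(a):
    return a[0] + a[1]  # sum of the two dice

def Y(a):
    return a[0]  # value of the first die

def pr(event):
    """Probability of the set of outcomes on which `event` is true."""
    return Fraction(sum(1 for a in omega if event(a)), len(omega))

def cond_expectation(y):
    """E[X | Y = y] = sum_x x * Pr[X = x | Y = y], for y with Pr[Y = y] > 0."""
    pr_y = pr(lambda a: Y(a) == y)
    xs = {X(a) for a in omega}
    return sum(x * pr(lambda a: X(a) == x and Y(a) == y) / pr_y for x in xs)

# Left side: E[X] from the definition.
lhs = sum(x * pr(lambda a: X(a) == x) for x in {X(a) for a in omega})
# Right side: sum_y E[X | Y = y] * Pr[Y = y].
rhs = sum(cond_expectation(y) * pr(lambda a: Y(a) == y) for y in range(1, 7))
print(lhs, rhs)  # 7 7
```

Conditioning on the first die gives ${\displaystyle \mathbf {E} [X\mid Y=y]=y+7/2}$, and averaging these six values recovers ${\displaystyle \mathbf {E} [X]=7}$.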