Let $X_1, \ldots, X_n$ be random variables,
$Y_1, \ldots, Y_n$ be another set of random variables, and
$\mathbf{X} = (X_1, \ldots, X_n)^T$ and $\mathbf{Y} = (Y_1, \ldots, Y_n)^T$ be random (column) vectors.
Suppose the vector-valued function[1] $g: \mathbb{R}^n \to \mathbb{R}^n$ is bijective (it is also called a one-to-one correspondence in this case).
Then, its inverse $g^{-1}$ exists.
After that, we can transform $\mathbf{X}$ to $\mathbf{Y}$ by applying the transformation $g$,
i.e. by $\mathbf{Y} = g(\mathbf{X})$,
and transform $\mathbf{Y}$ back to $\mathbf{X}$ by applying the inverse transformation $g^{-1}$,
i.e. by $\mathbf{X} = g^{-1}(\mathbf{Y})$.
We are often interested in deriving the joint probability function
of $Y_1, \ldots, Y_n$,
given the joint probability function of $X_1, \ldots, X_n$.
We will examine the discrete and continuous cases one by one in the following.
For continuous random variables, the situation is more complicated.
Let us first investigate the case of a univariate pdf, which is simpler.
Proof.
Under the assumption that $g$ is differentiable and strictly monotone,
the cdf of $Y = g(X)$ is
$$F_Y(y) = P(g(X) \le y) = \begin{cases} P(X \le g^{-1}(y)) = F_X(g^{-1}(y)), & g \text{ strictly increasing}, \\ P(X \ge g^{-1}(y)) = 1 - F_X(g^{-1}(y)), & g \text{ strictly decreasing} \end{cases}$$
($g^{-1}$ exists since $g$ is strictly monotonic.)
Differentiating both sides of the above equation (assuming the cdf's involved are differentiable) gives
$$f_Y(y) = \begin{cases} f_X(g^{-1}(y)) \dfrac{d}{dy} g^{-1}(y), & g \text{ strictly increasing}, \\ -f_X(g^{-1}(y)) \dfrac{d}{dy} g^{-1}(y), & g \text{ strictly decreasing}. \end{cases}$$
Since $x = g^{-1}(y)$, we can write $\dfrac{d}{dy} g^{-1}(y)$ as $\dfrac{dx}{dy}$.
Also, we can summarize the above case-defined function into a single expression by applying the absolute value function to both sides:
$$f_Y(y) = f_X(x) \left| \dfrac{dx}{dy} \right|,$$
where the absolute value sign is applied only to $\dfrac{dx}{dy}$, since the pdf's must be nonnegative, and thus we do not need to apply the sign to them.
Remark.
- To explain this theorem in a more intuitive manner, we rewrite the equation in the theorem as
$$f_Y(y)\,|dy| = f_X(x)\,|dx|,$$
- where both sides of the equation can be regarded as differential areas, which are nonnegative due to the absolute value signs.
- This equation should intuitively hold since both sides represent areas under the pdf's, which represent probabilities. The right-hand side $f_X(x)\,|dx|$ is the area of the region under the pdf of $X$ over an "infinitesimal" interval $[x, x + dx]$, which represents the probability for $X$ to lie in this infinitesimal interval. After the transformation, we get the pdf of $Y$, and the original region is transformed to a region under the pdf of $Y$ over an infinitesimal interval $[y, y + dy]$ with area $f_Y(y)\,|dy|$. Since $g$ is a bijective function (its strict monotonicity implies this), $[y, y + dy]$ "corresponds" to $[x, x + dx]$ in some sense, and we know that the values in $[y, y + dy]$ "originate" from the values in $[x, x + dx]$, and so does the randomness. It follows that the probability for $X$ to lie in $[x, x + dx]$ and for $Y$ to lie in $[y, y + dy]$ should be the same, and hence the two differential areas are the same.
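To make the theorem concrete, here is a small numerical sketch (my own example, not from the original text): we take $X \sim \mathcal{N}(0, 1)$ and the strictly increasing map $g(x) = e^x$, and check by simulation that the histogram of $Y = e^X$ matches $f_X(g^{-1}(y))\,\bigl|\tfrac{d}{dy} g^{-1}(y)\bigr| = \phi(\ln y)/y$.

```python
# Sketch: Monte Carlo check of f_Y(y) = f_X(g^{-1}(y)) * |d g^{-1}(y)/dy|
# for X ~ N(0, 1) and the strictly increasing map g(x) = e^x.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = np.exp(x)                      # Y = g(X) = e^X

# Density of Y predicted by the theorem: phi(ln y) / y
ys = np.linspace(0.5, 4.0, 8)
predicted = stats.norm.pdf(np.log(ys)) / ys

# Empirical density of the simulated Y values, from a histogram
hist, edges = np.histogram(y, bins=200, range=(0, 10), density=True)
centers = (edges[:-1] + edges[1:]) / 2
empirical = np.interp(ys, centers, hist)

print(np.round(predicted, 3))
print(np.round(empirical, 3))      # close, up to Monte Carlo noise
```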
Let us now define the Jacobian matrix, and introduce several notations in the definition.
Definition.
(Jacobian matrix)
Suppose the inverse function $g^{-1}$ is differentiable (then it follows that each of its component functions is differentiable).
The Jacobian matrix is
$$\frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)} = \begin{pmatrix} \dfrac{\partial x_1}{\partial y_1} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial x_n}{\partial y_1} & \cdots & \dfrac{\partial x_n}{\partial y_n} \end{pmatrix},$$
in which $x_i$ is the $i$th component function of $g^{-1}$
for each $i \in \{1, \ldots, n\}$, i.e.
$x_i = g_i^{-1}(y_1, \ldots, y_n)$.
Remark.
- We have .
Example.
Suppose , , and
.
Then,
,, and
Also, .
Then, ,
, and
Proof.
Partial proof:
Assume $g$ is differentiable and bijective.
First, for each (measurable) set $B$ in the support of $\mathbf{Y}$,
$$P(\mathbf{Y} \in B) = \int_B f_{\mathbf{Y}}(\mathbf{y}) \, d\mathbf{y}.$$
On the other hand,
we have
$$P(\mathbf{Y} \in B) = P(\mathbf{X} \in A) = \int_A f_{\mathbf{X}}(\mathbf{x}) \, d\mathbf{x},$$
where $A = g^{-1}(B)$, which is the preimage of the set $B$ under $g$.
Applying the change of variable formula to this integral (whose proof is advanced and uses our assumptions), we get
$$\int_A f_{\mathbf{X}}(\mathbf{x}) \, d\mathbf{x} = \int_B f_{\mathbf{X}}(g^{-1}(\mathbf{y})) \left| \det \frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)} \right| d\mathbf{y}.$$
Comparing the integrands of the two expressions for $P(\mathbf{Y} \in B)$, we can observe the desired result.
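As a sanity check of the resulting formula $f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}(g^{-1}(\mathbf{y}))\,\bigl|\det \partial(x_1,\ldots,x_n)/\partial(y_1,\ldots,y_n)\bigr|$, the following sketch (my own example, not from the text) applies it to a bijective linear map $g(\mathbf{x}) = A\mathbf{x}$ of two independent standard normal r.v.'s, for which the density of $\mathbf{Y} = A\mathbf{X}$ is also known directly.

```python
# Sketch: the multivariate transformation formula for the linear map g(x) = A x,
# with X1, X2 independent N(0, 1), compared against the known density of Y = A X.
import numpy as np
from scipy import stats

A = np.array([[2.0, 1.0],
              [0.5, 1.5]])          # invertible, so g is bijective
A_inv = np.linalg.inv(A)
jac_det = abs(np.linalg.det(A_inv)) # |det of Jacobian of g^{-1}| = 1 / |det A|

y = np.array([1.0, -0.5])           # a test point for Y
x = A_inv @ y                       # g^{-1}(y)

# Density of Y from the transformation theorem
f_y_theorem = stats.norm.pdf(x[0]) * stats.norm.pdf(x[1]) * jac_det

# Density of Y known directly: Y = A X is bivariate normal with covariance A A^T
f_y_direct = stats.multivariate_normal(mean=[0, 0], cov=A @ A.T).pdf(y)

print(f_y_theorem, f_y_direct)      # the two values agree
```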
Definition.
(Moment generating function)
The moment generating function (mgf) for the distribution of a
random variable $X$ is
$$M_X(t) = \mathbb{E}\big[e^{tX}\big].$$
Remark.
- For comparison: the cdf is $F_X(x) = P(X \le x)$.
- The mgf, similar to the pmf, pdf and cdf, gives a complete description of a distribution, so it can also uniquely identify a distribution, provided that the mgf exists (the expectation may be infinite),
- i.e., we can recover the probability function from the mgf.
- The proof of this result is complicated, and thus omitted.
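The following minimal sketch (my own example, not from the text) simply evaluates the definition $M_X(t) = \mathbb{E}[e^{tX}]$ for a simple discrete r.v., a Bernoulli($p$) variable, and compares it with the closed form $(1 - p) + p e^t$.

```python
# Sketch: computing an mgf directly from the definition M_X(t) = E[e^(tX)]
# for a discrete random variable.
import numpy as np

def mgf(values, probs, t):
    """E[e^(tX)] for a discrete r.v. with the given support and pmf."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return np.sum(probs * np.exp(t * values))

# Example: X ~ Bernoulli(p), whose mgf is (1 - p) + p * e^t.
p, t = 0.3, 0.7
print(mgf([0, 1], [1 - p, p], t))      # from the definition
print((1 - p) + p * np.exp(t))         # closed form, same value
```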
Proof.
- The result follows from simplifying the above expression by
Proof.
Similarly,
- lote: law of total expectation
Remark.
- This equality does not hold if the random variables involved are not independent.
In the following, we will use $\mathbf{t} \cdot \mathbf{X}$ to denote the dot product $t_1 X_1 + \cdots + t_n X_n$ of $\mathbf{t} = (t_1, \ldots, t_n)^T$ and $\mathbf{X} = (X_1, \ldots, X_n)^T$.
Remark.
- When $n = 1$, the dot product of the two vectors is just the product of two numbers.
- .
Proposition.
(Relationship between independence and mgf)
Random variables $X_1, \ldots, X_n$ are independent if and only if
$$M_{X_1, \ldots, X_n}(t_1, \ldots, t_n) = M_{X_1}(t_1) \cdots M_{X_n}(t_n) \quad \text{for every } t_1, \ldots, t_n.$$
Proof.
'only if' part:
Assume $X_1, \ldots, X_n$ are independent. Then,
$$M_{X_1, \ldots, X_n}(t_1, \ldots, t_n) = \mathbb{E}\big[e^{t_1 X_1 + \cdots + t_n X_n}\big] = \mathbb{E}\big[e^{t_1 X_1}\big] \cdots \mathbb{E}\big[e^{t_n X_n}\big] = M_{X_1}(t_1) \cdots M_{X_n}(t_n).$$
The proof of the 'if' part is quite complicated, and thus is omitted.
Analogously, we have the marginal mgf.
Definition.
(Marginal mgf)
The marginal mgf of $X_i$, which is a member of the random variables $X_1, \ldots, X_n$, is
$$M_{X_i}(t_i) = M_{X_1, \ldots, X_n}(0, \ldots, 0, t_i, 0, \ldots, 0),$$
i.e. the joint mgf with every argument other than the $i$th set to zero.
Proof.
Remark.
- If $X_1, \ldots, X_n$ are independent, then
$$M_{a_1 X_1 + \cdots + a_n X_n + b}(t) = e^{bt} M_{X_1}(a_1 t) \cdots M_{X_n}(a_n t).$$
- This provides an alternative, and possibly more convenient, method to derive the distribution of $a_1 X_1 + \cdots + a_n X_n + b$, compared with deriving it from the probability functions of $X_1, \ldots, X_n$.
- Special case: if $a_1 = \cdots = a_n = 1$ and $b = 0$, then the linear transformation becomes $X_1 + \cdots + X_n$, which is the sum of the r.v.'s.
- So, $M_{X_1 + \cdots + X_n}(t) = M_{X_1}(t) \cdots M_{X_n}(t)$.
- In particular, if $X_1, \ldots, X_n$ are independent and identically distributed, then $M_{X_1 + \cdots + X_n}(t) = \big(M_{X_1}(t)\big)^n$.
- We can use this result to prove the formulas for sums of independent r.v.'s, instead of using the proposition about the convolution of r.v.'s.
- Special case: if $n = 1$, then the expression for the linear transformation becomes $a_1 X_1 + b$.
- So, $M_{a_1 X_1 + b}(t) = e^{bt} M_{X_1}(a_1 t)$.
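Here is a hedged numerical check (the example distributions and constants are my own) of the linear-transformation rule $M_{aX+b}(t) = e^{bt} M_X(at)$ and of the product rule for a sum of two independent r.v.'s, using Monte Carlo estimates of $\mathbb{E}[e^{tX}]$.

```python
# Sketch: Monte Carlo check of
#   M_{aX+b}(t)   = e^{bt} M_X(at)          (linear transformation, n = 1)
#   M_{X1+X2}(t)  = M_{X1}(t) M_{X2}(t)      (independent X1, X2)
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.exponential(scale=1.0, size=1_000_000)
x2 = rng.exponential(scale=2.0, size=1_000_000)   # independent of x1

def mgf_mc(sample, t):
    """Monte Carlo estimate of E[e^(t X)]."""
    return np.mean(np.exp(t * sample))

t, a, b = 0.2, 1.5, -0.5
# Linear transformation: the identity holds sample-wise, so the values match exactly
print(mgf_mc(a * x1 + b, t), np.exp(b * t) * mgf_mc(x1, a * t))
# Independent sum: the two estimates agree up to Monte Carlo noise
print(mgf_mc(x1 + x2, t), mgf_mc(x1, t) * mgf_mc(x2, t))
```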
Moment generating function of some important distributions
Proposition.
(Moment generating function of binomial distribution)
The moment generating function of $X \sim \mathrm{Binom}(n, p)$ is $M_X(t) = \big(1 - p + p e^t\big)^n$.
Proof.
Proposition.
(Moment generating function of Poisson distribution)
The moment generating function of $X \sim \mathrm{Pois}(\lambda)$ is $M_X(t) = e^{\lambda(e^t - 1)}$.
Proof.
Proposition.
(Moment generating function of exponential distribution)
The moment generating function of $X \sim \mathrm{Exp}(\lambda)$ (with rate parameter $\lambda$) is $M_X(t) = \dfrac{\lambda}{\lambda - t}$ for $t < \lambda$.
Proof.
- The result follows.
Proposition.
(Moment generating function of gamma distribution)
The moment generating function of $X \sim \mathrm{Gamma}(\alpha, \lambda)$ (with shape $\alpha$ and rate $\lambda$) is $M_X(t) = \left(\dfrac{\lambda}{\lambda - t}\right)^{\alpha}$ for $t < \lambda$.
Proof.
- We use a proof technique similar to that in the proof for the mgf of the exponential distribution.
Proposition.
(Moment generating function of normal distribution)
The moment generating function of $X \sim \mathcal{N}(\mu, \sigma^2)$ is $M_X(t) = e^{\mu t + \sigma^2 t^2 / 2}$.
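As a quick sanity check (the particular parameter values are mine), the sketch below compares two of these closed forms, the Poisson and normal mgf's, with Monte Carlo estimates of $\mathbb{E}[e^{tX}]$.

```python
# Sketch: comparing closed-form mgf's with Monte Carlo estimates of E[e^(tX)].
import numpy as np

rng = np.random.default_rng(2)
t = 0.3

# Poisson(lambda): mgf e^{lambda (e^t - 1)}
lam = 4.0
pois = rng.poisson(lam, size=1_000_000)
print(np.mean(np.exp(t * pois)), np.exp(lam * (np.exp(t) - 1)))

# N(mu, sigma^2): mgf e^{mu t + sigma^2 t^2 / 2}
mu, sigma = 1.0, 2.0
norm = rng.normal(mu, sigma, size=1_000_000)
print(np.mean(np.exp(t * norm)), np.exp(mu * t + sigma**2 * t**2 / 2))
```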
We will prove some propositions about the distributions of linear transformations of random variables using mgf's. Some of them were mentioned in previous chapters.
As we will see, proving these propositions using mgf's is quite simple.
Proposition.
(Distribution of linear transformation of normal r.v.'s)
Let $X \sim \mathcal{N}(\mu, \sigma^2)$.
Then, $aX + b \sim \mathcal{N}(a\mu + b, a^2 \sigma^2)$ for constants $a \ne 0$ and $b$.
Proof.
- The mgf of $aX + b$ is
$$M_{aX+b}(t) = e^{bt} M_X(at) = e^{bt} e^{\mu a t + \sigma^2 a^2 t^2 / 2} = e^{(a\mu + b)t + (a^2\sigma^2) t^2 / 2},$$
- which is the mgf of $\mathcal{N}(a\mu + b, a^2 \sigma^2)$, and the result follows since the mgf identifies a distribution uniquely.
Proposition.
(Sum of independent binomial r.v.'s)
Let $X_i \sim \mathrm{Binom}(n_i, p)$ for $i = 1, \ldots, k$, in which $X_1, \ldots, X_k$ are independent. Then,
$$X_1 + \cdots + X_k \sim \mathrm{Binom}(n_1 + \cdots + n_k,\, p).$$
Proof.
- The mgf of $X_1 + \cdots + X_k$ is
$$M_{X_1 + \cdots + X_k}(t) = \prod_{i=1}^{k} \big(1 - p + p e^t\big)^{n_i} = \big(1 - p + p e^t\big)^{n_1 + \cdots + n_k},$$
- which is the mgf of $\mathrm{Binom}(n_1 + \cdots + n_k, p)$, as desired.
Proposition.
(Sum of independent Poisson r.v.'s)
Let $X_i \sim \mathrm{Pois}(\lambda_i)$ for $i = 1, \ldots, k$, in which $X_1, \ldots, X_k$ are independent. Then,
$$X_1 + \cdots + X_k \sim \mathrm{Pois}(\lambda_1 + \cdots + \lambda_k).$$
Proof.
- The mgf of $X_1 + \cdots + X_k$ is
$$M_{X_1 + \cdots + X_k}(t) = \prod_{i=1}^{k} e^{\lambda_i (e^t - 1)} = e^{(\lambda_1 + \cdots + \lambda_k)(e^t - 1)},$$
- which is the mgf of $\mathrm{Pois}(\lambda_1 + \cdots + \lambda_k)$, as desired.
Proof.
- The mgf of is
- which is the mgf of , as desired.
Proposition.
(Sum of independent gamma r.v.'s)
Let $X_i \sim \mathrm{Gamma}(\alpha_i, \lambda)$ for $i = 1, \ldots, k$, in which $X_1, \ldots, X_k$ are independent. Then,
$$X_1 + \cdots + X_k \sim \mathrm{Gamma}(\alpha_1 + \cdots + \alpha_k,\, \lambda).$$
Proof.
- The mgf of $X_1 + \cdots + X_k$ is
$$M_{X_1 + \cdots + X_k}(t) = \prod_{i=1}^{k} \left(\frac{\lambda}{\lambda - t}\right)^{\alpha_i} = \left(\frac{\lambda}{\lambda - t}\right)^{\alpha_1 + \cdots + \alpha_k}, \quad t < \lambda,$$
- which is the mgf of $\mathrm{Gamma}(\alpha_1 + \cdots + \alpha_k, \lambda)$, as desired.
Proposition.
(Sum of independent normal r.v.'s)
Let $X_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$ for $i = 1, \ldots, k$, in which $X_1, \ldots, X_k$ are independent. Then
$$X_1 + \cdots + X_k \sim \mathcal{N}(\mu_1 + \cdots + \mu_k,\; \sigma_1^2 + \cdots + \sigma_k^2).$$
Proof.
- The mgf of $X_1 + \cdots + X_k$ (in which they are independent) is
$$M_{X_1 + \cdots + X_k}(t) = \prod_{i=1}^{k} e^{\mu_i t + \sigma_i^2 t^2 / 2} = e^{(\mu_1 + \cdots + \mu_k)t + (\sigma_1^2 + \cdots + \sigma_k^2) t^2 / 2},$$
- which is the mgf of $\mathcal{N}(\mu_1 + \cdots + \mu_k, \sigma_1^2 + \cdots + \sigma_k^2)$, as desired.
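The following simulation sketch (my own example, not from the text) illustrates two of these sum results empirically: a sum of independent Poisson r.v.'s behaves like a Poisson r.v. with the summed rate, and a sum of independent normal r.v.'s has the summed mean and variance.

```python
# Sketch: simulating sums of independent r.v.'s and comparing with the claimed laws.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 200_000

# Pois(1.5) + Pois(2.5) should behave like Pois(4.0)
s_pois = rng.poisson(1.5, n) + rng.poisson(2.5, n)
emp_pmf = np.bincount(s_pois, minlength=15)[:15] / n
print(np.max(np.abs(emp_pmf - stats.poisson.pmf(np.arange(15), 4.0))))  # small

# N(1, 2^2) + N(-2, 1^2) should behave like N(-1, 5)
s_norm = rng.normal(1, 2, n) + rng.normal(-2, 1, n)
print(s_norm.mean(), s_norm.var())   # roughly -1 and 5
```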
We will provide a proof of the central limit theorem (CLT) using mgf's here.
Proof.
- Define . Then, we have
- which is in the form of .
- Therefore,
and the result follows from the property that the mgf identifies a distribution uniquely.
Remark.
- Since the standardized sum in the CLT converges in distribution to $\mathcal{N}(0, 1)$,
- the sample mean $\overline{X}_n = \frac{1}{n}(X_1 + \cdots + X_n)$ is approximately distributed as $\mathcal{N}(\mu, \sigma^2 / n)$ for large $n$.
- The same result holds exactly for the sample mean of normal r.v.'s with the same mean $\mu$ and the same variance $\sigma^2$,
- since if $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\mu, \sigma^2)$, then $\overline{X}_n \sim \mathcal{N}(\mu, \sigma^2 / n)$.
- It follows from the proposition about the distribution of linear transformations of normal r.v.'s that the sample sum, i.e. $X_1 + \cdots + X_n$, is approximately distributed as $\mathcal{N}(n\mu, n\sigma^2)$ for large $n$.
- The same result holds exactly for the sample sum of normal r.v.'s with the same mean $\mu$ and the same variance $\sigma^2$,
- since if $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\mu, \sigma^2)$, then $X_1 + \cdots + X_n \sim \mathcal{N}(n\mu, n\sigma^2)$.
- If a r.v. converges in distribution to some distribution, then we can use that distribution to approximate probabilities involving the r.v.
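To see the CLT approximation at work, the sketch below (my own example) standardizes sample means of i.i.d. $\mathrm{Exp}(1)$ r.v.'s, whose mean and standard deviation are both 1, and measures how close their distribution is to $\mathcal{N}(0, 1)$ as the sample size grows.

```python
# Sketch: CLT simulation with standardized sample means of Exp(1) r.v.'s.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
mu, sigma = 1.0, 1.0            # mean and standard deviation of Exp(1)

for n in (5, 30, 200):
    samples = rng.exponential(scale=1.0, size=(50_000, n))
    z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))
    # Kolmogorov-Smirnov distance from the standard normal cdf shrinks as n grows
    print(n, stats.kstest(z, 'norm').statistic)
```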
A special case of using the CLT as an approximation is using the normal distribution to approximate a discrete distribution.
To improve accuracy, we should ideally apply a continuity correction, as explained in the following.
Remark.
- The reason for doing this is to make $i$ lie at the 'middle' of the approximating interval $[i - \tfrac{1}{2}, i + \tfrac{1}{2}]$, so that $P(X = i)$ is better approximated.
Illustration of continuity correction:
|
| /
| /
| /
| /|
| /#|
| *##|
| /|##|
| /#|##|
| /##|##|
| /|##|##|
| / |##|##|
| / |##|##|
| / |##|##|
| / |##|##|
*------*--*--*---------------------
i-1/2 i i+1/2
|
| /
| /
| /
| /
| /
| *
| /|
| /#|
| /##|
| /###|
| /####|
| /#####|
| /|#####|
| / |#####|
*---*-----*------------------------
i-1 i
|
| /|
| /#|
| /##|
| /###|
| /####|
| *#####|
| /|#####|
| / |#####|
| / |#####|
| / |#####|
| / |#####|
| / |#####|
| / |#####|
| / |#####|
*---------*-----*------------------
i i+1
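The sketch below (parameter values are mine) makes the comparison in the diagrams numerical: for $X \sim \mathrm{Binom}(50, 0.4)$ it approximates $P(X = i)$ by the normal area over the centred interval $[i - \tfrac{1}{2}, i + \tfrac{1}{2}]$ and over the two uncentred intervals $[i - 1, i]$ and $[i, i + 1]$.

```python
# Sketch: continuity correction when approximating a binomial pmf by a normal area.
from scipy import stats

n, p, i = 50, 0.4, 22
mu = n * p
sd = (n * p * (1 - p)) ** 0.5
norm = stats.norm(mu, sd)

exact = stats.binom.pmf(i, n, p)
corrected = norm.cdf(i + 0.5) - norm.cdf(i - 0.5)   # interval centred at i
left = norm.cdf(i) - norm.cdf(i - 1)                # interval [i-1, i]
right = norm.cdf(i + 1) - norm.cdf(i)               # interval [i, i+1]

print(exact, corrected, left, right)   # the centred interval is typically closest
```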
- ↑ or equivalently, a transformation between the supports of $\mathbf{X}$ and $\mathbf{Y}$