From 32f4af5f369fa9f0b2988ecad7797f4bec3661c3 Mon Sep 17 00:00:00 2001
From: Holden Rohrer
Date: Tue, 21 Sep 2021 17:12:46 -0400
Subject: notes and homework

---
 zhilova/04_events           | 89 +++++++++++++++++++++++++++++++++++++++++++++
 zhilova/05_random_variables | 48 ++++++++++++++++++++++++
 zhilova/06_ev               | 67 ++++++++++++++++++++++++++++++++++
 zhilova/07_mgf              | 22 +++++++++++
 zhilova/08_jensen           | 23 ++++++++++++
 5 files changed, 249 insertions(+)
 create mode 100644 zhilova/04_events
 create mode 100644 zhilova/05_random_variables
 create mode 100644 zhilova/06_ev
 create mode 100644 zhilova/07_mgf
 create mode 100644 zhilova/08_jensen

diff --git a/zhilova/04_events b/zhilova/04_events
new file mode 100644
index 0000000..551d4cc
--- /dev/null
+++ b/zhilova/04_events
@@ -0,0 +1,89 @@
+Bayes' Theorem is useful for determining something like ``how likely is
+XYZ to have disease A if they pass test B?'' because it lets us convert
+conditionals in the other direction (e.g. test given disease).
+
+ Independent Random Events
+(C, \bb B, P) is a probability space.
+With A, B \in \bb B and A, B \subseteq C, A and B are independent iff
+P(A \cap B) = P(A)P(B).
+
+A group of events A_1, ..., A_n in \bb B is
+
+(1) pairwise independent iff P(A_i \cap A_j) = P(A_i)P(A_j) (i \neq j).
+
+(2) triplewise independent iff P(A_i \cap A_j \cap A_k) =
+P(A_i)P(A_j)P(A_k) (i \neq j \neq k \neq i).
+
+(3) mutually independent iff for every subset S of {A_1, ..., A_n},
+P(intersection of S) = product of all P(A) where A in S.
+
+(3) implies (2) and (1), but (2) doesn't imply (1), and (1) doesn't
+imply (3): pairwise independence is strictly weaker than mutual
+independence.
+
+Independence can also be characterized equivalently (when P(B) > 0) as:
+P(A | B) = P(A)
+
+A, B are conditionally independent given C if
+P(A \cap B | C) = P(A|C)P(B|C)
+
+ Random Variables
+
+[What lol]
+
+X = X(w) : C \to D where D is the range of X.
+
+X need not be invertible, but preimages always make sense:
+X^{-1}(A) = {all w : X(w) in A}.
+
+P_X(A) = P({all w : X(w) in A})
+
+Key Properties
+
+1) P_X(A) is a probability set function on D.
+2) P_X(A) \geq 0
+3) P_X(D) = 1
+4) P_X(\emptyset) = 0
+5) P_X(A) = 1 - P_X(D \setminus A)
+6,7) monotonicity, sigma-additivity.
+
+ Discrete r.v. have countable domain.
+Ex: Binomial r.v.
+
+X ~ Binomial(n, p)
+n in N, p in (0,1)
+
+D = {0, 1, ..., n}
+
+P(X = x) = (n choose x)p^x(1-p)^{n-x}
+
+X ~ Poisson(\lambda), \lambda > 0
+
+D = {0, 1, 2, ...}
+
+P(X = x) = \lambda^x e^{-\lambda}/x!
+
+ Probability Mass Function (pmf)
+
+For r.v. with countable domain D,
+
+P_X(x) := P(X = x) (if x \in D, 0 otherwise)
+
+Properties of P_X(x), x \in D:
+ (Correspond directly to probability set function properties)
+
+1) P_X(x) > 0 for all x \in D (taking D to be the support);
+P_X(x) \geq 0 is also acceptable.
+
+2) \sum_{x \in D} P_X(x) = 1.
+
+3) {X in A} is shorthand for the event {w in C : X(w) in A}
+
+ r.v. of continuous type
+Ex: Let X uniformly take values in [0, 1].
+P(X in (a, b]) = b - a for 0 \leq a < b \leq 1.
+
+ Cumulative distribution function
+Defined for discrete and continuous type r.v.
+
+F_X(x) := P(X \leq x).
+
+F_X : R -> [0,1] [couldn't it be from any ordered domain?]
+1) 0 \leq F_X \leq 1
+2) non-decreasing
+3) right-continuous
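+
+[Not from lecture: a quick numerical sanity check of the pmf/CDF facts
+above, as a Python sketch (standard library only).  The helper names
+and the parameter values n, p, lam are arbitrary illustrative choices.]
+
+from math import comb, exp, factorial
+
+def binom_pmf(x, n, p):
+    # P(X = x) = (n choose x) p^x (1-p)^(n-x)
+    return comb(n, x) * p**x * (1 - p)**(n - x)
+
+def poisson_pmf(x, lam):
+    # P(X = x) = lam^x e^(-lam) / x!
+    return lam**x * exp(-lam) / factorial(x)
+
+n, p, lam = 10, 0.3, 2.5   # arbitrary illustrative parameters
+
+# pmf property 2: the probabilities sum to 1 over the domain D.
+assert abs(sum(binom_pmf(x, n, p) for x in range(n + 1)) - 1) < 1e-12
+# The Poisson domain is infinite; truncating the sum at a large cutoff
+# is enough to get 1 up to rounding error.
+assert abs(sum(poisson_pmf(x, lam) for x in range(200)) - 1) < 1e-12
+
+# CDF built from the pmf: F_X(x) = sum of P(X = k) over k <= x.
+# It is non-decreasing and reaches 1 at the top of the domain.
+cdf, acc = [], 0.0
+for x in range(n + 1):
+    acc += binom_pmf(x, n, p)
+    cdf.append(acc)
+assert all(a <= b for a, b in zip(cdf, cdf[1:]))
+assert abs(cdf[-1] - 1) < 1e-12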
diff --git a/zhilova/05_random_variables b/zhilova/05_random_variables
new file mode 100644
index 0000000..fbf8bc0
--- /dev/null
+++ b/zhilova/05_random_variables
@@ -0,0 +1,48 @@
+ Cumulative Distribution Function (CDF)
+Def: CDF of a r.v. X, taking values in R, is
+F_X(x) = \Pr(X \leq x) = \Pr(X \in (-\infty, x] ) % to appease vim, ')'
+
+Th 1.5.1 (Properties of a CDF)
+0) 0 \leq F_X(x) \leq 1 \forall x \in R
+1) It is non-decreasing: x_1 \leq x_2 implies F_X(x_1) \leq F_X(x_2).
+2) F_X(x) -> 0 as x -> -\infty
+3) F_X(x) -> 1 as x -> +\infty
+4) F_X(x) is right-continuous.
+
+ Continuous R.V.
+Over an uncountable domain D like (0, 1) or R.
+
+Let there be a CDF F_X(x) = P(X \leq x).
+
+Assume there exists f_X(x) := d/dx F_X(x), the probability density
+function.
+[discontinuities might be able to be resolved with a delta function]
+By the second fundamental theorem of calculus,
+F_X(x) = P(X \leq x) = \int_{-\infty}^x f_X(t) dt.
+
+In the discrete case, we have the pmf (probability mass function)
+where P_X(t) = P(X = t).
+
+For a < b, P(a < X \leq b) = F_X(b) - F_X(a).
+
+Examples:
+- Uniform Distribution
+X ~ U[a, b]
+f_X(x) = { 1/(b-a) for a \leq x \leq b
+         { 0 otherwise.
+
+- Exponential Distribution
+X ~ Exp(\lambda), \lambda > 0
+f_X(x) = { \lambda e^{-\lambda x}, x \geq 0
+         { 0 otherwise
+
+F_X(x) = { 1 - e^{-\lambda x}, x \geq 0
+         { 0 otherwise
+
+- Normal Distribution
+X ~ N(\mu, \sigma^2), \mu \in R, \sigma^2 > 0.
+\sigma = stdev. \sigma^2 = variance. \mu = mean/center.
+
+f_X(x) = 1/\sqrt{2\pi \sigma^2} \exp( - (x-\mu)^2 / (2\sigma^2) )
+
+F_X(x) = \int_{-\infty}^x f_X(t) dt
diff --git a/zhilova/06_ev b/zhilova/06_ev
new file mode 100644
index 0000000..c13c159
--- /dev/null
+++ b/zhilova/06_ev
@@ -0,0 +1,67 @@
+ Expectation/Expected Value/Mean Value/Average of an r.v.:
+ (Does not exist for all r.v.)
+We must assume that \int_{-\infty}^\infty |x| f_X(x) dx < \infty; then
+
+E(X) := \int_{-\infty}^\infty x f_X(x) dx
+(also written {\bb E} X or E X).
+
+If discrete,
+E(X) = \sum_{x \in D} x p_X(x)
+
+ Higher (order) moments of X
+moment of kth order := {\bb E}(X^k)
+Again, they do not always exist, but they do exist whenever
+{\bb E}|X|^k < \infty.
+
+ Variance/dispersion of X
+Var(X) = {\bb E}(X - {\bb E} X)^2,
+the mean squared deviation from the mean.
+\def\exp{{\bb E}}
+
+Thm (1): [ proof in textbook ]
+Let g : R -> R with \int |g(x)| f_X(x) dx < \infty.
+Then \exp g(X) = \int_{-\infty}^\infty g(x) f_X(x) dx.
+
+Ex:
+ \exp X^2 = \int x^2 f_X(x) dx
+ \exp(X-a) = \int (x-a) f_X(x) dx
+ \exp \sin X = \int \sin x f_X(x) dx
+
+ Stdev
+Stdev := \sqrt{Var(X)}
+
+ Properties of E(X)
+1) Linearity
+ Where E(X), E(Y) exist, and a, b \in R,
+ E(aX + bY) = aE(X) + bE(Y)
+ By thm (1), \int a x f_X(x) dx = a \int x f_X(x) dx.
+2) E(a) = a
+3) If g(x) \geq 0 for all x, then E(g(X)) \geq 0, regardless of X.
+
+Example application:
+Var(X)
+= E [X - E[X]]^2
+= E [ X^2 - 2X * E[X] + [E[X]]^2 ]
+= E[X^2] - 2E[X]^2 + [E[X]]^2
+ ^ linearity applied with E[X] as constant
+= E[X^2] - E[X]^2
+
+For real-valued X (by property 3),
+Var(X) \geq 0
+\to E(X^2) - E(X)^2 \geq 0
+\to E(X^2) \geq E(X)^2 [the inequality is strict unless X is constant,
+i.e. X = a with probability 1]
+
+More example:
+Var(aX) = E[(aX)^2] - (E[aX])^2
+ = E[a^2 X^2] - (a E[X])^2
+ = a^2 E[X^2] - a^2 E[X]^2
+ = a^2 Var(X)
+
+Definitions:
+1) centering: X - \exp X. \exp[X - \exp X] = 0.
+2) rescaling: With c > 0, cX. Var(cX) = c^2 Var X.
+3) centering and standardization: centering and rescaling s.t.
+\exp Y = 0 and Var(Y) = 1:
+ Y = (X - \exp X)/\sqrt{Var X}
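+
+[Not from lecture: a rough Monte Carlo sketch, in Python (standard
+library only), of the variance identities and the standardization
+above, using X ~ Exp(lam) with E X = 1/lam and Var X = 1/lam^2; lam, a,
+and the sample size are arbitrary illustrative choices.]
+
+import random
+from statistics import fmean, pvariance
+
+random.seed(0)
+lam, a, n = 2.0, 3.0, 200_000   # arbitrary illustrative choices
+xs = [random.expovariate(lam) for _ in range(n)]
+
+ex = fmean(xs)                      # Monte Carlo estimate of E X = 0.5
+ex2 = fmean([x * x for x in xs])    # Monte Carlo estimate of E X^2
+var = pvariance(xs)                 # sample analogue of Var X
+
+# Var(X) = E[X^2] - (E[X])^2: both sides computed from the same sample,
+# so they agree up to rounding.
+print(var, ex2 - ex**2)
+
+# Var(aX) = a^2 Var(X)
+print(pvariance([a * x for x in xs]), a**2 * var)
+
+# Standardized Y = (X - E X)/sqrt(Var X): sample mean ~ 0, variance ~ 1
+ys = [(x - ex) / var**0.5 for x in xs]
+print(fmean(ys), pvariance(ys))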
diff --git a/zhilova/07_mgf b/zhilova/07_mgf
new file mode 100644
index 0000000..5d5a007
--- /dev/null
+++ b/zhilova/07_mgf
@@ -0,0 +1,22 @@
+ Moment-generating Function
+(Still technically lecture #6 but very different topic)
+X := real r.v.
+M_X(t) = \exp e^{tX}, t \in R (here \exp is the expectation {\bb E}
+from the previous notes, not the exponential).
+Defined if \int_{-\infty}^\infty e^{tx} f_X(x) dx < \infty
+ for t \in (-h, h) for some h > 0. [the set where M_X(t) < \infty is
+ always an interval containing 0, but it need not be symmetric;
+ requiring finiteness on some (-h, h) is just the standard assumption
+ so that M_X is finite on both sides of 0]
+
+e^{tx} gives a nice Taylor series.
+For M_X(t) around 0 (note M_X(0) = \exp e^0 = 1),
+M_X(t) = M_X(0) + M_X'(0) t + M_X''(0) t^2/2 + M_X'''(0) t^3/3! + ...
+M_X^{(k)}(t) = {d^k\over dt^k} \exp e^{tX} = {d^k\over dt^k}
+\int_{-\infty}^\infty e^{tx} f_X(x) dx
+= \int_{-\infty}^\infty x^k e^{tx} f_X(x) dx
+= \exp[X^k e^{tX}]
+ = [with t = 0] \exp[X^k].
+
+Why is it useful?
+Example: the moments of X ~ N(\mu, \sigma^2) *can* be computed directly
+by integration by parts (probably table method), but it is easier to
+differentiate the mgf M_X(t) = e^{\mu t + \sigma^2 t^2/2} instead.
diff --git a/zhilova/08_jensen b/zhilova/08_jensen
new file mode 100644
index 0000000..20a8158
--- /dev/null
+++ b/zhilova/08_jensen
@@ -0,0 +1,23 @@
+Definition: A fn. f : R -> R is called convex on an interval (a,b) if
+f(cx + dy) \leq cf(x) + df(y)
+\forall x, y \in (a, b)
+\forall c \in (0, 1), d = 1-c.
+f is concave iff -f is convex.
+
+Essentially: the graph of f lies on or below the chord connecting
+(x, f(x)) and (y, f(y)) [or above it, in the concave case].
+
+Strictly convex: f(cx+dy) < cf(x) + df(y) whenever x \neq y.
+
+ Jensen's Inequality
+Let f be convex and X an r.v. with E|X| < \infty and E|f(X)| < \infty.
+Then f(E X) \leq E f(X).
+If f is strictly convex, the inequality is strict unless X is a
+constant r.v.
+
+Further theorems:
+(1) If f is differentiable on (a,b),
+f is convex <=> f' is nondecreasing on (a,b).
+f is strictly convex <=> f' is strictly increasing on (a,b).
+(2) If f is twice differentiable on (a,b),
+f is convex <=> f'' \geq 0 on (a,b).
+f'' > 0 on (a,b) => f is strictly convex (the converse can fail,
+e.g. f(x) = x^4 is strictly convex but f''(0) = 0).
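+
+[Not from lecture: a small Monte Carlo sketch of Jensen's inequality
+f(E X) <= E f(X), in Python (standard library only).  Taking X uniform
+on [0, 1] and the convex functions x^2 and e^x are arbitrary
+illustrative choices.]
+
+import math
+import random
+
+random.seed(0)
+xs = [random.random() for _ in range(200_000)]    # sample of X ~ U[0, 1]
+mean = sum(xs) / len(xs)                          # ~ E X = 1/2
+
+for name, f in [("x^2", lambda x: x * x), ("e^x", math.exp)]:
+    lhs = f(mean)                                 # f(E X)
+    rhs = sum(f(x) for x in xs) / len(xs)         # ~ E f(X)
+    # Both functions are strictly convex and X is not constant, so the
+    # inequality should be strict, well beyond Monte Carlo error here:
+    # 1/4 vs 1/3 for x^2, e^(1/2) vs e - 1 for e^x.
+    print(name, lhs, "<=", rhs, lhs <= rhs)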