From 32f4af5f369fa9f0b2988ecad7797f4bec3661c3 Mon Sep 17 00:00:00 2001
From: Holden Rohrer
Date: Tue, 21 Sep 2021 17:12:46 -0400
Subject: notes and homework

---
 zhilova/04_events           | 89 +++++++++++++++++++++++++++++++++++++++++++++
 zhilova/05_random_variables | 48 ++++++++++++++++++++++++
 zhilova/06_ev               | 67 ++++++++++++++++++++++++++++++++++
 zhilova/07_mgf              | 22 +++++++++++
 zhilova/08_jensen           | 23 ++++++++++++
 5 files changed, 249 insertions(+)
 create mode 100644 zhilova/04_events
 create mode 100644 zhilova/05_random_variables
 create mode 100644 zhilova/06_ev
 create mode 100644 zhilova/07_mgf
 create mode 100644 zhilova/08_jensen

diff --git a/zhilova/04_events b/zhilova/04_events
new file mode 100644
index 0000000..551d4cc
--- /dev/null
+++ b/zhilova/04_events
@@ -0,0 +1,89 @@
+Bayes' Theorem is useful for determining something like ``how likely is
+XYZ to have disease A if they pass test B?'' because it lets us convert
+conditionals in the other direction (e.g. test given disease).
+
+ Independent Random Events
+(C, \bb B, P) is a probability space.
+With A, B \in \bb B and A, B \subseteq C, A and B are independent iff
+P(A \cap B) = P(A)P(B).
+
+A group of events A_1, ..., A_n in \bb B is
+
+(1) pairwise independent iff P(A_i \cap A_j) = P(A_i)P(A_j) (i \neq j).
+
+(2) triplewise independent iff P(A_i \cap A_j \cap A_k) =
+P(A_i)P(A_j)P(A_k) (i \neq j \neq k \neq i).
+
+(3) mutually independent iff for every subset S of {A_1, ..., A_n},
+P(intersection of S) = product of all P(A) where A in S.
+
+(3) implies (2) and (1), but (2) doesn't imply (1), and (1) doesn't
+imply (3): pairwise independence is strictly weaker than mutual
+independence.
+
+Independence can also be characterized equivalently (when P(B) > 0) as:
+P(A | B) = P(A)
+
+A, B are conditionally independent given C if
+P(A \cap B | C) = P(A|C)P(B|C)
+
+ Random Variables
+
+[What lol]
+
+X = X(w) : C \to D where D is the range of X.
+
+X need not be invertible, but preimages always make sense:
+X^{-1}(A) = {all w : X(w) in A}.
+
+P_X(A) = P({all w : X(w) in A})
+
+Key Properties
+
+1) P_X(A) is a probability set function on D.
+2) P_X(A) \geq 0
+3) P_X(D) = 1
+4) P_X(\emptyset) = 0
+5) P_X(A) = 1 - P_X(D \setminus A)
+6,7) monotonicity, sigma-additivity.
+
+ Discrete r.v. have countable domain.
+Ex: Binomial r.v.
+
+X ~ Binomial(n, p)
+n in N, p in (0,1)
+
+D = {0, 1, ..., n}
+
+P(X = x) = (n choose x)p^x(1-p)^{n-x}
+
+X ~ Poisson(\lambda), \lambda > 0
+
+D = {0, 1, 2, ...}
+
+P(X = x) = \lambda^x e^{-\lambda}/x!
+
+ Probability Mass Function (pmf)
+
+For r.v. with countable domain D,
+
+P_X(x) := P(X = x) (if x \in D, 0 otherwise)
+
+Properties of P_X(x), x \in D:
+ (Correspond directly to probability set function properties)
+
+1) P_X(x) > 0 for all x \in D (taking D to be the support);
+P_X(x) \geq 0 is also acceptable.
+
+2) \sum_{x \in D} P_X(x) = 1.
+
+3) {X in A} is shorthand for the event {w in C : X(w) in A}
+
+ r.v. of continuous type
+Ex: Let X uniformly take values in [0, 1].
+P(X in (a, b]) = b - a for 0 \leq a < b \leq 1.
+
+ Cumulative distribution function
+Defined for discrete and continuous type r.v.
+
+F_X(x) := P(X \leq x).
+
+F_X : R -> [0,1] [couldn't it be from any ordered domain?]
+1) 0 \leq F_X \leq 1
+2) non-decreasing
+3) right-continuous
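+
+[Not from lecture: a quick numerical sanity check of the pmf/CDF facts
+above, as a Python sketch (standard library only).  The helper names
+and the parameter values n, p, lam are arbitrary illustrative choices.]
+
+from math import comb, exp, factorial
+
+def binom_pmf(x, n, p):
+    # P(X = x) = (n choose x) p^x (1-p)^(n-x)
+    return comb(n, x) * p**x * (1 - p)**(n - x)
+
+def poisson_pmf(x, lam):
+    # P(X = x) = lam^x e^(-lam) / x!
+    return lam**x * exp(-lam) / factorial(x)
+
+n, p, lam = 10, 0.3, 2.5   # arbitrary illustrative parameters
+
+# pmf property 2: the probabilities sum to 1 over the domain D.
+assert abs(sum(binom_pmf(x, n, p) for x in range(n + 1)) - 1) < 1e-12
+# The Poisson domain is infinite; truncating the sum at a large cutoff
+# is enough to get 1 up to rounding error.
+assert abs(sum(poisson_pmf(x, lam) for x in range(200)) - 1) < 1e-12
+
+# CDF built from the pmf: F_X(x) = sum of P(X = k) over k <= x.
+# It is non-decreasing and reaches 1 at the top of the domain.
+cdf, acc = [], 0.0
+for x in range(n + 1):
+    acc += binom_pmf(x, n, p)
+    cdf.append(acc)
+assert all(a <= b for a, b in zip(cdf, cdf[1:]))
+assert abs(cdf[-1] - 1) < 1e-12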
diff --git a/zhilova/05_random_variables b/zhilova/05_random_variables
new file mode 100644
index 0000000..fbf8bc0
--- /dev/null
+++ b/zhilova/05_random_variables
@@ -0,0 +1,48 @@
+ Cumulative Distribution Function (CDF)
+Def: CDF of a r.v. X, taking values in R, is
+F_X(x) = \Pr(X \leq x) = \Pr(X \in (-\infty, x] ) % to appease vim, ')'
+
+Th 1.5.1 (Properties of a CDF)
+0) 0 \leq F_X(x) \leq 1 \forall x \in R
+1) It is non-decreasing: x_1 \leq x_2 implies F_X(x_1) \leq F_X(x_2).
+2) F_X(x) -> 0 as x -> -\infty
+3) F_X(x) -> 1 as x -> +\infty
+4) F_X(x) is right-continuous.
+
+ Continuous R.V.
+Over an uncountable domain D like (0, 1) or R.
+
+Let there be a CDF F_X(x) = P(X \leq x).
+
+Assume there exists f_X(x) := d/dx F_X(x), the probability density
+function.
+[discontinuities might be able to be resolved with a delta function]
+By the second fundamental theorem of calculus,
+F_X(x) = P(X \leq x) = \int_{-\infty}^x f_X(t) dt.
+
+In the discrete case, we have the pmf (probability mass function)
+where P_X(t) = P(X = t).
+
+For a < b, P(a < X \leq b) = F_X(b) - F_X(a).
+
+Examples:
+- Uniform Distribution
+X ~ U[a, b]
+f_X(x) = { 1/(b-a) for a \leq x \leq b
+         { 0 otherwise.
+
+- Exponential Distribution
+X ~ Exp(\lambda), \lambda > 0
+f_X(x) = { \lambda e^{-\lambda x}, x \geq 0
+         { 0 otherwise
+
+F_X(x) = { 1 - e^{-\lambda x}, x \geq 0
+         { 0 otherwise
+
+- Normal Distribution
+X ~ N(\mu, \sigma^2), \mu \in R, \sigma^2 > 0.
+\sigma = stdev. \sigma^2 = variance. \mu = mean/center.
+
+f_X(x) = 1/\sqrt{2\pi \sigma^2} \exp( - (x-\mu)^2 / (2\sigma^2) )
+
+F_X(x) = \int_{-\infty}^x f_X(t) dt
diff --git a/zhilova/06_ev b/zhilova/06_ev
new file mode 100644
index 0000000..c13c159
--- /dev/null
+++ b/zhilova/06_ev
@@ -0,0 +1,67 @@
+ Expectation/Expected Value/Mean Value/Average of an r.v.:
+ (Does not exist for all r.v.)
+We must assume that \int_{-\infty}^\infty |x| f_X(x) dx < \infty; then
+
+E(X) := \int_{-\infty}^\infty x f_X(x) dx
+(also written {\bb E} X or E X).
+
+If discrete,
+E(X) = \sum_{x \in D} x p_X(x)
+
+ Higher (order) moments of X
+moment of kth order := {\bb E}(X^k)
+Again, they do not always exist, but they do exist whenever
+{\bb E}|X|^k < \infty.
+
+ Variance/dispersion of X
+Var(X) = {\bb E}(X - {\bb E} X)^2,
+the mean squared deviation from the mean.
+\def\exp{{\bb E}}
+
+Thm (1): [ proof in textbook ]
+Let g : R -> R with \int |g(x)| f_X(x) dx < \infty.
+Then \exp g(X) = \int_{-\infty}^\infty g(x) f_X(x) dx.
+
+Ex:
+ \exp X^2 = \int x^2 f_X(x) dx
+ \exp(X-a) = \int (x-a) f_X(x) dx
+ \exp \sin X = \int \sin x f_X(x) dx
+
+ Stdev
+Stdev := \sqrt{Var(X)}
+
+ Properties of E(X)
+1) Linearity
+ Where E(X), E(Y) exist, and a, b \in R,
+ E(aX + bY) = aE(X) + bE(Y)
+ By thm (1), \int a x f_X(x) dx = a \int x f_X(x) dx.
+2) E(a) = a
+3) If g(x) \geq 0 for all x, then E(g(X)) \geq 0, regardless of X.
+
+Example application:
+Var(X)
+= E [X - E[X]]^2
+= E [ X^2 - 2X * E[X] + [E[X]]^2 ]
+= E[X^2] - 2E[X]^2 + [E[X]]^2
+ ^ linearity applied with E[X] as constant
+= E[X^2] - E[X]^2
+
+For real-valued X (by property 3),
+Var(X) \geq 0
+\to E(X^2) - E(X)^2 \geq 0
+\to E(X^2) \geq E(X)^2 [the inequality is strict unless X is constant,
+i.e. X = a with probability 1]
+
+More example:
+Var(aX) = E[(aX)^2] - (E[aX])^2
+ = E[a^2 X^2] - (a E[X])^2
+ = a^2 E[X^2] - a^2 E[X]^2
+ = a^2 Var(X)
+
+Definitions:
+1) centering: X - \exp X. \exp[X - \exp X] = 0.
+2) rescaling: With c > 0, cX. Var(cX) = c^2 Var X.
+3) centering and standardization: centering and rescaling s.t.
+\exp Y = 0 and Var(Y) = 1:
+ Y = (X - \exp X)/\sqrt{Var X}
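+
+[Not from lecture: a rough Monte Carlo sketch, in Python (standard
+library only), of the variance identities and the standardization
+above, using X ~ Exp(lam) with E X = 1/lam and Var X = 1/lam^2; lam, a,
+and the sample size are arbitrary illustrative choices.]
+
+import random
+from statistics import fmean, pvariance
+
+random.seed(0)
+lam, a, n = 2.0, 3.0, 200_000   # arbitrary illustrative choices
+xs = [random.expovariate(lam) for _ in range(n)]
+
+ex = fmean(xs)                      # Monte Carlo estimate of E X = 0.5
+ex2 = fmean([x * x for x in xs])    # Monte Carlo estimate of E X^2
+var = pvariance(xs)                 # sample analogue of Var X
+
+# Var(X) = E[X^2] - (E[X])^2: both sides computed from the same sample,
+# so they agree up to rounding.
+print(var, ex2 - ex**2)
+
+# Var(aX) = a^2 Var(X)
+print(pvariance([a * x for x in xs]), a**2 * var)
+
+# Standardized Y = (X - E X)/sqrt(Var X): sample mean ~ 0, variance ~ 1
+ys = [(x - ex) / var**0.5 for x in xs]
+print(fmean(ys), pvariance(ys))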
diff --git a/zhilova/07_mgf b/zhilova/07_mgf
new file mode 100644
index 0000000..5d5a007
--- /dev/null
+++ b/zhilova/07_mgf
@@ -0,0 +1,22 @@
+ Moment-generating Function
+(Still technically lecture #6 but very different topic)
+X := real r.v.
+M_X(t) = \exp e^{tX}, t \in R (here \exp is the expectation {\bb E}
+from the previous notes, not the exponential).
+Defined if \int_{-\infty}^\infty e^{tx} f_X(x) dx < \infty
+ for t \in (-h, h) for some h > 0. [the set where M_X(t) < \infty is
+ always an interval containing 0, but it need not be symmetric;
+ requiring finiteness on some (-h, h) is just the standard assumption
+ so that M_X is finite on both sides of 0]
+
+e^{tx} gives a nice Taylor series.
+For M_X(t) around 0 (note M_X(0) = \exp e^0 = 1),
+M_X(t) = M_X(0) + M_X'(0) t + M_X''(0) t^2/2 + M_X'''(0) t^3/3! + ...
+M_X^{(k)}(t) = {d^k\over dt^k} \exp e^{tX} = {d^k\over dt^k}
+\int_{-\infty}^\infty e^{tx} f_X(x) dx
+= \int_{-\infty}^\infty x^k e^{tx} f_X(x) dx
+= \exp[X^k e^{tX}]
+ = [with t = 0] \exp[X^k].
+
+Why is it useful?
+Example: the moments of X ~ N(\mu, \sigma^2) *can* be computed directly
+by integration by parts (probably table method), but it is easier to
+differentiate the mgf M_X(t) = e^{\mu t + \sigma^2 t^2/2} instead.
diff --git a/zhilova/08_jensen b/zhilova/08_jensen
new file mode 100644
index 0000000..20a8158
--- /dev/null
+++ b/zhilova/08_jensen
@@ -0,0 +1,23 @@
+Definition: A fn. f : R -> R is called convex on an interval (a,b) if
+f(cx + dy) \leq cf(x) + df(y)
+\forall x, y \in (a, b)
+\forall c \in (0, 1), d = 1-c.
+f is concave iff -f is convex.
+
+Essentially: the graph of f lies on or below the chord connecting
+(x, f(x)) and (y, f(y)) [or above it, in the concave case].
+
+Strictly convex: f(cx+dy) < cf(x) + df(y) whenever x \neq y.
+
+ Jensen's Inequality
+Let f be convex and X an r.v. with E|X| < \infty and E|f(X)| < \infty.
+Then f(E X) \leq E f(X).
+If f is strictly convex, the inequality is strict unless X is a
+constant r.v.
+
+Further theorems:
+(1) If f is differentiable on (a,b),
+f is convex <=> f' is nondecreasing on (a,b).
+f is strictly convex <=> f' is strictly increasing on (a,b).
+(2) If f is twice differentiable on (a,b),
+f is convex <=> f'' \geq 0 on (a,b).
+f'' > 0 on (a,b) => f is strictly convex (the converse can fail,
+e.g. f(x) = x^4 is strictly convex but f''(0) = 0).
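+
+[Not from lecture: a small Monte Carlo sketch of Jensen's inequality
+f(E X) <= E f(X), in Python (standard library only).  Taking X uniform
+on [0, 1] and the convex functions x^2 and e^x are arbitrary
+illustrative choices.]
+
+import math
+import random
+
+random.seed(0)
+xs = [random.random() for _ in range(200_000)]    # sample of X ~ U[0, 1]
+mean = sum(xs) / len(xs)                          # ~ E X = 1/2
+
+for name, f in [("x^2", lambda x: x * x), ("e^x", math.exp)]:
+    lhs = f(mean)                                 # f(E X)
+    rhs = sum(f(x) for x in xs) / len(xs)         # ~ E f(X)
+    # Both functions are strictly convex and X is not constant, so the
+    # inequality should be strict, well beyond Monte Carlo error here:
+    # 1/4 vs 1/3 for x^2, e^(1/2) vs e - 1 for e^x.
+    print(name, lhs, "<=", rhs, lhs <= rhs)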