\font\bbten=msbm10
\font\bbsev=msbm7
\font\bbfiv=msbm5
\newfam\bbold
\textfont\bbold=\bbten
\scriptfont\bbold=\bbsev
\scriptscriptfont\bbold=\bbfiv
\def\bb{\fam\bbold}
\def\bmatrix#1{\left[\matrix{#1}\right]}
\def\fr#1#2{{#1\over#2}}
\def\E{{\bb E}}
\def\var{\mathop{\rm var}}
\def\cov{\mathop{\rm cov}}

We want to fit a quadratic regression model (with errors) to a dataset
$$\{(X_1,Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\},$$
where each $X_k$ is a known constant and the $Y_k$ are independent
random variables with
$$Y_k \sim {\cal N}(aX_k^2 + bX_k + c,\ \sigma^2),$$
where $a,$ $b,$ $c,$ and $\sigma$ are unknown model parameters.
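
Equivalently, the model can be restated with an explicit additive error
term: each observation is the quadratic evaluated at its design point
plus independent Gaussian noise,
$$Y_k = aX_k^2 + bX_k + c + \varepsilon_k, \qquad
\varepsilon_k \sim {\cal N}(0, \sigma^2)\ \hbox{independently}.$$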

We use a least-squares estimator for the quadratic model parameters,
obtaining $\hat a,$ $\hat b,$ and $\hat c$ as the least-squares solution
of the following overdetermined linear system:
$$
\bmatrix{X_1^2&X_1&1\cr X_2^2&X_2&1\cr\vdots&\vdots&\vdots\cr X_n^2&X_n&1}
\bmatrix{\hat a\cr\hat b\cr\hat c} =
\bmatrix{Y_1\cr Y_2\cr\vdots\cr Y_n}
$$

Write $X$ for the leftmost matrix and $y$ for the vector of $Y_k$ on the
right, and assume the $X_k$ take at least three distinct values, so that
$X^TX$ is invertible. The least-squares solution satisfies the normal
equations $X^TX\bmatrix{\hat a\cr\hat b\cr\hat c} = X^Ty,$ so (note that
these estimated parameters are also random variables)
$$\bmatrix{\hat a\cr\hat b\cr\hat c} = (X^TX)^{-1}X^Ty.$$
Writing $\vec a,$ $\vec b,$ and $\vec c$ for the three rows of
$(X^TX)^{-1}X^T,$ the estimators are the dot products
$$\hat a = \vec a\cdot y, \qquad \hat b = \vec b\cdot y, \qquad
\hat c = \vec c\cdot y.$$

Note that $\vec a,$ $\vec b,$ and $\vec c$ are constants determined only
by the values of $(X_1, X_2, \ldots, X_n).$
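
As a quick sanity check, these estimators are unbiased: since
$\E(y) = X\bmatrix{a\cr b\cr c}$ and $(X^TX)^{-1}X^TX = I,$
$$\E\bmatrix{\hat a\cr\hat b\cr\hat c} = (X^TX)^{-1}X^T\E(y)
= (X^TX)^{-1}X^TX\bmatrix{a\cr b\cr c} = \bmatrix{a\cr b\cr c}.$$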

We now want an estimator for $\sigma^2,$ so we'll start by treating the
regression parameters as known and then adapt to the estimates we have.
If the regression parameters were known, the following would be the
(fairly straightforward) maximum-likelihood estimator; the derivation is
sketched below.
$$\hat\sigma^2 = \sum_{i=1}^n {(Y_i - aX_i^2 - bX_i - c)^2\over n}$$
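The sketch: with $a,$ $b,$ $c$ known, the log-likelihood of the data is
$$\log L(\sigma^2) = -{n\over2}\log(2\pi\sigma^2)
- {1\over2\sigma^2}\sum_{i=1}^n (Y_i - aX_i^2 - bX_i - c)^2,$$
and setting its derivative with respect to $\sigma^2$ to zero gives the
display above.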
Substituting our estimates for the parameters (and writing
$Y_i = \vec e_i\cdot y$), we obtain the statistic
$$S^2 = \sum_{i=1}^n {(Y_i - \hat a X_i^2 - \hat b X_i - \hat c)^2\over n}
= \sum_{i=1}^n {(y\cdot(\vec e_i - \vec a X_i^2 - \vec b X_i - \vec c))^2\over n}.$$

Let $\vec R_i := \vec e_i - \vec a X_i^2 - \vec b X_i - \vec c,$
giving a matrix $R$ whose $i$th row is $\vec R_i;$ in matrix form,
$R = I - X(X^TX)^{-1}X^T.$
The sum above is then $S^2 = \fr1n(Ry)\cdot(Ry).$
To check whether this estimator is biased, we find its expectation.
Expanding $(Ry)\cdot(Ry)$ row by row and splitting each square into a
variance and a squared mean,
$$\E\bigl((Ry)\cdot(Ry)\bigr)
= \sum_{k=1}^n \E\bigl((\vec R_k\cdot y)^2\bigr)
= \sum_{k=1}^n\Bigl(\var(\vec R_k\cdot y)
+ \bigl(\vec R_k\cdot\E(y)\bigr)^2\Bigr)
$$$$
= \sum_{k=1}^n\sum_{1\leq i,j\leq n} R_{k_i}R_{k_j}\cov(Y_i,Y_j)
+ ||R\E(y)||^2
= \sigma^2\sum_{k=1}^n ||\vec R_k||^2 + ||R\E(y)||^2,$$
because $\cov(Y_i,Y_j)$ is $\sigma^2$ if $i=j,$ and $0$ otherwise (by
independence). Moreover $\E(y) = X\bmatrix{a\cr b\cr c}$ and
$RX = X - X(X^TX)^{-1}X^TX = 0,$ so the second term vanishes:
$R\E(y) = 0.$
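
Assuming $X$ has full column rank (at least three distinct $X_k$), the
remaining sum is pinned down by the fact that $R = I - X(X^TX)^{-1}X^T$
is symmetric and idempotent:
$$\sum_{k=1}^n ||\vec R_k||^2 = \mathop{\rm tr}(R^TR) = \mathop{\rm tr}(R)
= n - 3, \qquad\hbox{so}\qquad \E(S^2) = {n-3\over n}\,\sigma^2.$$
So $S^2$ is biased (it underestimates $\sigma^2$), and rescaling by
$n/(n-3)$ gives an unbiased estimator of $\sigma^2.$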

\iffalse
We might expand this statistic to
$$S^2 = \fr1n\sum_{1\leq i,j,k \leq n} Y_j\cdot R_{i_j} Y_k\cdot R_{i_k}$$
To transform this into an unbiased estimator, if it even is a decent
estimator, we try to find its expectation
$$\E[S^2] = \fr1n\sum_{1\leq i,j\leq n} \E[(Y_j R_{i_j})^2]
+
\fr1n\sum_{1\leq i,j\neq k \leq n} \E[Y_j R_{i_j}]\E[Y_k R_{i_k}]
$$$$
= \fr1n\sum_{1\leq i,j\leq n} [R_{i_j}^2\sigma^2 + R_{i_j}^2(aX_j^2+bX_j+c)^2]
+
\fr1n\sum_{1\leq i,j\neq k \leq n} \E[Y_jR_{i_j}]\E[Y_kR_{i_k}]
$$$$
= \sigma^2\sum_{1\leq i,j\leq n} R_{i_j}^2
+
\sum_{1\leq i,j,k\leq n}
R_{i_j}(aX_j^2+bX_j+c)R_{i_k}(aX_k^2+bX_k+c)
$$
\fi

\bye