\[p(⚀) + p(⚁) + p(⚂) + p(⚃) + p(⚄) + p(⚅) = \sum_{i = 1}^{n} p(x_i) = 1\]
For a continuous variable the sum is replaced by an integral:
\[\int p(x)\,dx = 1\]
\[p(A, B) = p(B, A)\]
\[p(B|A) = \frac{p(A, B)}{p(A)}\]
\[p(A|B) = \frac{p(A, B)}{p(B)}\Rightarrow p(A|B) \times p(B) = p(A, B)\] \[p(B|A) = \frac{p(B, A)}{p(A)}\Rightarrow p(B|A) \times p(A) = p(B, A)\] \[p(A|B) \times p(B) = p(B|A) \times p(A)\] \[p(A|B) = \frac{p(B|A)p(A)}{p(B)}\]
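Expanding the denominator with the law of total probability over the values \(a_i\) of \(A\):
\[p(A|B) = \frac{p(B|A)p(A)}{\sum_{i=1}^{n} p(B|a_i) \times p(a_i)}\]
or, for a continuous \(A\):
\[p(A|B) = \frac{p(B|A)p(A)}{\int p(B|a) \times p(a)\,da}\]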
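A small numerical check of these formulas. The joint distribution below (three values \(a_1, a_2, a_3\) of \(A\) and a binary \(B\)) is made up purely for illustration; exact fractions are used so that equalities can be compared without rounding.

```python
from fractions import Fraction as F

# Hypothetical joint distribution p(a_i, b); the six entries sum to 1
joint = {
    ("a1", True): F(1, 10), ("a1", False): F(2, 10),
    ("a2", True): F(3, 10), ("a2", False): F(1, 10),
    ("a3", True): F(1, 10), ("a3", False): F(2, 10),
}
A_values = ["a1", "a2", "a3"]

# Marginals: p(a_i) sums over B, p(B) sums over A
p_a = {a: joint[(a, True)] + joint[(a, False)] for a in A_values}
p_b = sum(joint[(a, True)] for a in A_values)

# Likelihood p(B | a_i) = p(a_i, B) / p(a_i)
lik = {a: joint[(a, True)] / p_a[a] for a in A_values}

# Denominator by total probability: sum_i p(B | a_i) p(a_i)
evidence = sum(lik[a] * p_a[a] for a in A_values)

# Posterior p(a_i | B) via Bayes' rule
posterior = {a: lik[a] * p_a[a] / evidence for a in A_values}

# The same quantity read directly off the joint: p(a_i, B) / p(B)
direct = {a: joint[(a, True)] / p_b for a in A_values}

print(evidence)             # 1/2
print(posterior)
print(posterior == direct)  # True
```

The sum in the denominator reproduces \(p(B)\), so both routes give the same posterior.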
What happens in the numerator:
What happens during the division:
\[\frac{w}{w+x+y+z}:\frac{w+x}{w+x+y+z} = \frac{w}{w+x}\]
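A numerical instance with arbitrarily chosen (hypothetical) counts \(w = 2\), \(x = 3\), \(y = 1\), \(z = 4\): the common total \(w+x+y+z = 10\) cancels.
\[\frac{2}{10}:\frac{2+3}{10} = \frac{2}{10}\times\frac{10}{5} = \frac{2}{5} = \frac{w}{w+x}\]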
\[p(θ|Data) = \frac{p(Data|θ)\times p(θ)}{p(Data)}\]
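One way to watch this formula work without any conjugacy is a grid approximation: evaluate prior × likelihood on a grid of \(θ\) values and divide by their sum, which plays the role of \(p(Data)\). This is a sketch assuming a uniform prior and hypothetical data of 7 successes in 10 binomial trials.

```python
import math

# Grid of candidate θ values in [0, 1]
grid = [i / 1000 for i in range(1001)]

# Uniform prior p(θ) (a hypothetical choice)
prior = [1.0 for _ in grid]

# Hypothetical data: k successes in n trials, binomial likelihood p(Data | θ)
n, k = 10, 7
likelihood = [math.comb(n, k) * t**k * (1 - t) ** (n - k) for t in grid]

# Numerator of Bayes' rule at every grid point
unnormalized = [l * p for l, p in zip(likelihood, prior)]

# p(Data) approximated by the sum over the grid; dividing makes the posterior sum to 1
evidence = sum(unnormalized)
posterior = [u / evidence for u in unnormalized]

# Posterior mode and mean on the grid
mode = grid[max(range(len(grid)), key=posterior.__getitem__)]
mean = sum(t * p for t, p in zip(grid, posterior))
print(mode, round(mean, 3))
```

With a uniform prior the mode sits at the observed proportion 0.7, and the mean is close to 8/12, the mean of the exact Beta(8, 4) posterior.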
Bayes’ rule gets us from a prior belief, p(θ), to a posterior belief, p(θ|D), when we take into account some data D. Now suppose we observe some more data, which we’ll denote D’. We can then update our beliefs again, from p(θ|D) to p(θ|D’, D). Does our final belief depend on whether we update with D first and D’ second, or update with D’ first and D second?
\[p(θ|Data, Data') = p(θ|Data', Data)\]
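Assuming D and D' are independent given θ, the answer is no: each update multiplies by one more likelihood factor, and multiplication commutes. A grid sketch of this, with a hypothetical uniform prior and two hypothetical batches of binomial data:

```python
import math

grid = [i / 200 for i in range(201)]

def update(prior, k, n):
    """One Bayesian update on the grid: prior × binomial likelihood, then renormalize."""
    unnorm = [p * math.comb(n, k) * t**k * (1 - t) ** (n - k) for p, t in zip(prior, grid)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical uniform prior and two hypothetical data batches (k successes, n trials)
prior = [1 / len(grid)] * len(grid)
D = (3, 10)
D_prime = (12, 20)

post_D_then_Dprime = update(update(prior, *D), *D_prime)
post_Dprime_then_D = update(update(prior, *D_prime), *D)

# Largest pointwise difference between the two final posteriors
max_diff = max(abs(a - b) for a, b in zip(post_D_then_Dprime, post_Dprime_then_D))
print(max_diff)  # only floating-point noise
```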
So, to model correctly, we need to know the distribution family of the likelihood and its conjugate prior, for example the binomial likelihood and the Beta prior:
\[P(k | n, θ) = \frac{n!}{k!(n-k)!} \times θ^k \times (1-θ)^{n-k} = {n \choose k} \times θ^k \times (1-θ)^{n-k}\] \[0 \leq θ \leq 1;\ n \in \mathbb{N};\ k \in \{0, 1, \dots, n\}\]
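A direct transcription of the binomial formula; the values of \(n\) and \(θ\) used for the check are arbitrary.

```python
import math

def binom_pmf(k: int, n: int, theta: float) -> float:
    """P(k | n, θ) = C(n, k) θ^k (1 - θ)^(n - k)."""
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)

# The probabilities over all k = 0..n sum to 1
n, theta = 10, 0.3
probs = [binom_pmf(k, n, theta) for k in range(n + 1)]
print(round(sum(probs), 10))  # 1.0
print(binom_pmf(2, 4, 0.5))   # 0.375
```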
\[P(x; α, β) = \frac{x^{α-1}\times (1-x)^{β-1}}{B(α, β)}; 0 \leq x \leq 1; α, β > 0\] Beta function (the factorial form holds for integer \(α, β\)): \[B(α, β) = \frac{Γ(α)\times Γ(β)}{Γ(α+β)} = \frac{(α-1)!(β-1)!}{(α+β-1)!}\]
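A sketch of the Beta density computed on the log scale with `math.lgamma`, so that the gamma functions do not overflow for large parameters (`math.gamma` already overflows above roughly 171). The parameter values are arbitrary choices for the checks.

```python
import math

def log_beta_fn(a: float, b: float) -> float:
    """log B(α, β) = log Γ(α) + log Γ(β) - log Γ(α + β)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_pdf(x: float, a: float, b: float) -> float:
    """Beta density x^(α-1) (1-x)^(β-1) / B(α, β) for 0 < x < 1, evaluated on the log scale."""
    log_density = (a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_beta_fn(a, b)
    return math.exp(log_density)

# For integer α, β the factorial form of B(α, β) agrees with the gamma form
a, b = 3, 5
fact_form = math.factorial(a - 1) * math.factorial(b - 1) / math.factorial(a + b - 1)
print(math.isclose(math.exp(log_beta_fn(a, b)), fact_form))  # True

# Midpoint-rule check that the density integrates to about 1
m = 10_000
area = sum(beta_pdf((i + 0.5) / m, a, b) for i in range(m)) / m
print(round(area, 6))

# Large, arbitrarily chosen parameters still give a finite density value
print(beta_pdf(0.048, 3000, 60000))
```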
In a corpus-based frequency dictionary, the particle “не” has frequency 0.05389. In some text (61981 words) it appears 2540 times. How does this change our prior knowledge?
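p(Data): … we don’t need it, since the numerator (binomial likelihood × Beta prior) is already proportional to a Beta distribution, so the posterior is a Beta distribution too.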
Alpha and beta for the prior distribution, taking the prior strength \(\alpha+\beta\) equal to the text length (61981): \[\mu = \frac{\alpha}{\alpha+\beta} \Rightarrow \alpha = \mu \times (\alpha+\beta) = 61981 \times 0.05389 \approx 3340.156\] \[\beta = 61981 - 3340.156 = 58640.844\]
Alpha and beta for the posterior distribution: \[\alpha_{posterior} = \alpha_{prior} + \#\text{successes} = 3340.156 + 2540 = 5880.156\] \[\beta_{posterior} = \beta_{prior} + \#\text{failures} = 58640.844 + 61981 - 2540 = 118081.844\]
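The arithmetic of this conjugate update as a short sketch. It assumes that every word of the text is an independent trial (“не” or not) and that the prior strength \(\alpha+\beta\) equals the text length, 61981 words, which is the choice that yields \(\alpha \approx 3340.156\).

```python
# Prior mean from the frequency dictionary and the counts observed in the text
mu_prior = 0.05389
n_words = 61981
k_ne = 2540

# Assumption: prior strength alpha + beta equals the text length
strength = n_words
alpha_prior = mu_prior * strength
beta_prior = strength - alpha_prior

# Conjugate update: successes go to alpha, failures go to beta
alpha_post = alpha_prior + k_ne
beta_post = beta_prior + (n_words - k_ne)

prior_mean = alpha_prior / (alpha_prior + beta_prior)
data_mean = k_ne / n_words
post_mean = alpha_post / (alpha_post + beta_post)

print(round(alpha_prior, 3), round(beta_prior, 3))  # 3340.156 58640.844
print(round(alpha_post, 3), round(beta_post, 3))    # 5880.156 118081.844
print(round(prior_mean, 5), round(data_mean, 5), round(post_mean, 5))
```

Because the prior strength equals the text length here, the posterior mean lands exactly halfway between the dictionary frequency and the observed proportion; a smaller \(\alpha+\beta\) would let the text dominate.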
Suppose now that we have several novels. We could calculate the proportion of “не” in each of them, then approximate a Beta distribution from the distribution of these proportions and use it as a prior. Updating this prior with each novel’s counts gives shrunken estimates: every proportion is pulled toward the common mean, and the shorter the text, the stronger the pull. The true outliers will then be visible.
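A sketch of this empirical Bayes procedure. The per-novel counts are entirely hypothetical, and the Beta prior is fitted by matching the mean and variance of the proportions (the formulas for that follow). Each novel’s estimate is the posterior mean of Beta(α + k, β + n − k).

```python
from statistics import mean, pvariance

# Hypothetical data: (count of "не", total words) for several novels
novels = {
    "novel_A": (410, 10_000),
    "novel_B": (55, 1_000),
    "novel_C": (1_900, 40_000),
    "novel_D": (12, 150),
    "novel_E": (760, 16_000),
}

# Raw proportion of "не" in each novel
props = {name: k / n for name, (k, n) in novels.items()}
values = list(props.values())

# Fit a Beta prior to the proportions by matching mean and variance
mu = mean(values)
var = pvariance(values)
alpha = ((1 - mu) / var - 1 / mu) * mu**2
beta = alpha * (1 / mu - 1)

# Posterior mean per novel: a weighted average of the prior mean and k / n
shrunk = {name: (alpha + k) / (alpha + beta + n) for name, (k, n) in novels.items()}

for name in novels:
    print(f"{name}: raw={props[name]:.4f} shrunk={shrunk[name]:.4f}")
```

With these made-up numbers, the short `novel_D` is pulled toward the prior mean the most, while the long `novel_C` barely moves.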
The easiest (and the worst) way to fit the Beta distribution is the method of moments. From this system of equations:
\[\mu = \frac{\alpha}{\alpha+\beta}\] \[\sigma^2 = \frac{\alpha\times\beta}{(\alpha+\beta)^2\times(\alpha+\beta+1)}\]
…we can get \(\alpha\) and \(\beta\):
\[\alpha = \left(\frac{1-\mu}{\sigma^2} - \frac{1}{\mu}\right)\times \mu^2\] \[\beta = \alpha\times\left(\frac{1}{\mu} - 1\right)\]
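These two formulas as a small function, with a round-trip check: take a known Beta(2, 5) (an arbitrary choice), compute its mean and variance, and recover the parameters. The guard reflects that \(\alpha > 0\) requires \(\sigma^2 < \mu(1-\mu)\).

```python
def beta_from_moments(mu: float, var: float) -> tuple[float, float]:
    """Method-of-moments Beta parameters from a mean mu and a variance var (sigma^2)."""
    if not 0 < mu < 1:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0 < var < mu * (1 - mu):
        raise ValueError("variance must satisfy 0 < sigma^2 < mu * (1 - mu)")
    alpha = ((1 - mu) / var - 1 / mu) * mu**2
    beta = alpha * (1 / mu - 1)
    return alpha, beta

# Round trip: moments of Beta(2, 5), then back to the parameters
a, b = 2.0, 5.0
mu = a / (a + b)
var = a * b / ((a + b) ** 2 * (a + b + 1))
print(beta_from_moments(mu, var))  # approximately (2.0, 5.0)
```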