@jemoka / Jemoka Knowledge Base / wiki/concepts/naive_bayeee.md
---
title: "Naive Bayes"
type: concept
related: [Maximum Likelihood Parameter Learning, Bayes Theorem]
source: https://www.jemoka.com/posts/kbhnaive_bayeee/
confidence: high
status: active
---

## constituents

- \(x \in \qty {0,1}^{m}\) for \(m\) features
- \(y \in \qty {0,1}\) for labels
- \(\phi_{j|y=1}\), for \(p\qty(x_{j} = 1 | y=1)\)
- \(\phi_{j|y=0}\), for \(p\qty(x_{j} = 1 | y=0)\)
- \(\phi_{y}\), for \(p\qty(y=1)\), the "class prior"

## assumption

ASSUME: features in \(x\) are conditionally independent given \(y\)

Without this assumption, the chain rule gives (say, for \(m = 1000\) features):

\begin{align}
p\qty(x|y) &= p\qty(x_1, x_2, \dots, x_{1000} | y) \\
&= p\qty(x_1|y) p\qty(x_2|y, x_1) p\qty(x_3|y, x_1, x_2) \dots p\qty(x_{1000}|y, x_1, \dots, x_{999})
\end{align}

This is insane to compute! But if we assume all \(x\) slots are conditionally independent given \(y\), we write:

\begin{equation}
p\qty(x|y) = p\qty(x_1|y) p\qty(x_2|y) \dots = \prod_{j=1}^{m} p\qty(x_{j}|y)
\end{equation}

## requirements

To find the best parameters, use Maximum Likelihood Parameter Learning: maximize the joint likelihood of the \(n\) training examples,

\begin{equation}
\mathcal{L}\qty(\phi) = \prod_{i=1}^{n} p\qty(x^{(i)}, y^{(i)} \mid \phi)
\end{equation}

You get exactly what you expect:

\begin{equation}
\phi_{y} = p\qty(y=1) = \frac{\sum_{i=1}^{n} 1\qty {y^{(i)}=1}}{n}
\end{equation}

\begin{equation}
\phi_{j|y=1} = p\qty(x_{j} = 1 | y=1)= \frac{\sum_{i=1}^{n}1 \qty {x_{j}^{(i)} =1, y^{(i)}= 1}}{\sum_{i=1}^{n} 1\qty {y^{(i)}=1}}
\end{equation}

\begin{equation}
\phi_{j|y=0} = p\qty(x_{j} = 1 | y=0)= \frac{\sum_{i=1}^{n}1 \qty {x_{j}^{(i)} =1, y^{(i)}= 0}}{\sum_{i=1}^{n} 1\qty {y^{(i)}=0}}
\end{equation}

To classify a new \(x\), check which label makes \(p\qty(y|x)\) larger using Bayes rule:

\begin{equation}
p\qty(y|x) \propto p\qty(y) \prod_{j=1}^{m} p\qty(x_{j}|y)
\end{equation}

## additional information

### pseudocounting

One problem with this approach is that it won't handle out-of-distribution (OOD) text that well. In particular, suppose you never see a particular feature being \(1\) in training:

\begin{equation}
p\qty(x_{k} = 1 | y= 1) = 0
\end{equation}

for some \(k\). Any input with \(x_{k} = 1\) then gets \(p\qty(x|y=1) = 0\) no matter what the other features say; if the same holds for \(y=0\), Bayes rule gives \(0/0\).
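The unsmoothed estimates and the zero-probability failure can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names `fit_mle` and `joint` and the toy data are assumptions made up for this example, and the code assumes both labels appear in the training set.

```python
def fit_mle(X, y):
    """Unsmoothed maximum-likelihood estimates for binary-feature Naive Bayes."""
    n, m = len(X), len(X[0])
    n1 = sum(y)          # number of examples with y = 1
    n0 = n - n1          # number of examples with y = 0
    phi_y = n1 / n
    phi_1 = [sum(x[j] for x, label in zip(X, y) if label == 1) / n1
             for j in range(m)]
    phi_0 = [sum(x[j] for x, label in zip(X, y) if label == 0) / n0
             for j in range(m)]
    return phi_y, phi_1, phi_0


def joint(x, phi, prior):
    """p(x | y) p(y) under the conditional-independence assumption."""
    p = prior
    for xj, pj in zip(x, phi):
        p *= pj if xj == 1 else 1.0 - pj
    return p


# Hypothetical toy data: 4 examples, 3 binary features.
# The last feature is never 1 in training.
X = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
y = [1, 1, 0, 0]

phi_y, phi_1, phi_0 = fit_mle(X, y)

# A new input that switches on the never-seen feature.
x_new = [1, 0, 1]
p1 = joint(x_new, phi_1, phi_y)        # unnormalized p(y=1 | x)
p0 = joint(x_new, phi_0, 1 - phi_y)    # unnormalized p(y=0 | x)
print(phi_1, phi_0, p1, p0)
```

Here both `p1` and `p0` come out as zero, so normalizing them to get \(p\qty(y|x)\) would divide zero by zero.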
So in practice, we add pseudocounts when estimating the probabilities (Laplace Smoothing). Each feature gets one extra count for each of its two possible values, hence \(+1\) in the numerator and \(+2\) in the denominator:

\begin{equation}
\phi_{j|y=1} = p\qty(x_{j} = 1 | y=1)= \frac{1+\sum_{i=1}^{n}1 \qty {x_{j}^{(i)} =1, y^{(i)}= 1}}{2+\sum_{i=1}^{n} 1\qty {y^{(i)}=1}}
\end{equation}

\begin{equation}
\phi_{j|y=0} = p\qty(x_{j} = 1 | y=0)= \frac{1+ \sum_{i=1}^{n}1 \qty {x_{j}^{(i)} =1, y^{(i)}= 0}}{2+ \sum_{i=1}^{n} 1\qty {y^{(i)}=0}}
\end{equation}
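A minimal sketch of the smoothed estimates, with prediction done in log space so that a product over many features does not underflow. The names `fit_laplace` and `posterior_y1` and the toy data are hypothetical, chosen only for this example. As in the formulas above, only the per-feature parameters are smoothed, so the sketch assumes both labels occur in the training set.

```python
import math


def fit_laplace(X, y):
    """Laplace-smoothed estimates: +1 in the numerator, +2 in the denominator."""
    n, m = len(X), len(X[0])
    n1 = sum(y)
    n0 = n - n1
    phi_y = n1 / n  # class prior is left unsmoothed, as in the formulas above
    phi_1 = [(1 + sum(x[j] for x, label in zip(X, y) if label == 1)) / (2 + n1)
             for j in range(m)]
    phi_0 = [(1 + sum(x[j] for x, label in zip(X, y) if label == 0)) / (2 + n0)
             for j in range(m)]
    return phi_y, phi_1, phi_0


def posterior_y1(x, phi_y, phi_1, phi_0):
    """p(y=1 | x) via Bayes rule, computed from log joint probabilities."""
    def log_joint(phi, prior):
        return math.log(prior) + sum(
            math.log(p if xj == 1 else 1.0 - p) for xj, p in zip(x, phi)
        )

    a = log_joint(phi_1, phi_y)
    b = log_joint(phi_0, 1.0 - phi_y)
    top = max(a, b)  # subtract the max before exponentiating for stability
    ea, eb = math.exp(a - top), math.exp(b - top)
    return ea / (ea + eb)


# Hypothetical toy data: the last feature is never 1 in training.
X = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
y = [1, 1, 0, 0]

phi_y, phi_1, phi_0 = fit_laplace(X, y)
post = posterior_y1([1, 0, 1], phi_y, phi_1, phi_0)
print(phi_1, phi_0, post)
```

With smoothing, every \(\phi_{j|y}\) lies strictly between 0 and 1, so the never-seen feature contributes equally to both classes and the posterior is decided by the features that were actually observed.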