I don't know a lot about Bayesian statistics, but I'd like to understand a few terms. I often hear "prior" and "posterior" thrown around, and here's my understanding of them after a look at Wikipedia:
It seems the prior (or prior probability) is the measure of uncertainty about an event before taking any evidence (specific features) into account.
Apparently the posterior (or again, posterior probability) is the conditional probability assigned after relevant evidence is taken into account.
So, now to construct an example that illustrates what I currently believe about these concepts: If, in a given corpus, 50% of the tokens are determiners, then the chance of selecting a token you know nothing about and finding that it is a determiner is 50%. I believe that's the prior. However, if 70% of the tokens occurring after verbs are determiners, then the posterior probability is the conditional probability P(determiner|verb) ["probability of a determiner given a verb"], so 70%.
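To make the corpus example concrete, here's a minimal sketch; the token counts are invented, chosen only so they reproduce the 50% and 70% figures above:

```python
# Hypothetical corpus counts (made up to match the percentages in the text).
total_tokens = 1000
determiners = 500        # 50% of all tokens are determiners
after_verb = 200         # tokens occurring immediately after a verb
det_after_verb = 140     # 70% of those are determiners

# Prior: probability a token is a determiner, knowing nothing else.
prior = determiners / total_tokens              # P(determiner) = 0.5

# Posterior: probability it's a determiner, given it follows a verb.
posterior = det_after_verb / after_verb         # P(determiner|verb) = 0.7

print(prior, posterior)
```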
In Bayes' theorem, which is the one part of Bayesian anything that I am, one might almost say, *too* familiar with, the prior is multiplied by the likelihood and then normalized to obtain the posterior. So:

P(A|B) = P(B|A) × P(A) / P(B)
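To see that multiply-then-normalize step recover the 70% from the determiner example, here's a quick sketch; the likelihood and evidence values are assumptions consistent with the made-up counts (140 of 500 determiners follow a verb, 200 of 1000 tokens follow a verb):

```python
# Hypothetical values consistent with the 50%/70% figures in the text.
p_det = 0.5                    # prior: P(determiner)
p_verb_given_det = 0.28        # likelihood: P(after a verb | determiner), assumed
p_verb = 0.2                   # evidence (normalizer): P(after a verb), assumed

# Bayes' theorem: posterior = likelihood * prior / evidence
posterior = p_verb_given_det * p_det / p_verb

print(posterior)
```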
However, one confusing segment of the Wikipedia entry for a prior is:
"of an uncertain quantity p (for example, suppose p is the proportion of voters who will vote for the politician named Smith in a future election) is the probability distribution that would express one's uncertainty about p before the "data" (for example, an opinion poll) is taken into account."
That seems to suggest that we can't take *any* data into account when choosing the prior. Don't we then just have to guess? Sounds like more reading may be in order...
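On the "just have to guess" worry: as I understand it, one common move for the quoted voter example is to put a distribution over p itself, e.g. a uniform Beta(1, 1) prior meaning "any proportion is equally plausible before the poll", and then let the poll data update it. A hedged sketch (the poll numbers are invented):

```python
# Uniform Beta(1, 1) prior over p, the proportion voting for Smith:
# this expresses "no data taken into account yet".
alpha, beta = 1, 1

# Hypothetical poll: 60 of 100 respondents say they'll vote for Smith.
yes, n = 60, 100

# Standard Beta-Binomial conjugate update:
# posterior is Beta(alpha + successes, beta + failures).
alpha_post = alpha + yes
beta_post = beta + (n - yes)

# Posterior mean estimate of p.
posterior_mean = alpha_post / (alpha_post + beta_post)

print(alpha_post, beta_post, posterior_mean)
```

So the prior isn't found from the data; it's stated before the data, and the data then moves it toward the posterior.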