The main reason I love emacs is that sometimes I will inadvertently hit a strange key combination and -- something unexpected will happen! Today I learned this way that M-c capitalizes a word. That's a feature I don't need often, but whenever I do -- emacs will be there!
Update: I learned this while preparing to work on creating slides from a paper. A few minutes later, as I was collecting all statements of each main idea from my tex file and assimilating them into one single, complete statement for each main idea, I was about to manually capitalize a word but I remembered in time: Meta-c!
Thursday, July 21, 2011
Monday, July 18, 2011
Bayes vs. Markov
I was perhaps unjustifiably surprised, as I was going through the Naive Bayes classifier model, to find that it looks very similar to something I'm already quite familiar with. Basically, if you start from Bayes' Theorem and go in one direction (conditional probability), then make independence assumptions, you end up with the model for the NB classifier. If you go in a different direction (chain rule) and then make independence assumptions, you end up with a Markov model. I'm guessing a lot of other models are quite similar too...
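To make the parallel concrete, here is a rough sketch of the two derivations side by side. The symbols are my own choices for illustration and are not from any particular textbook: c is a class label, x_1 ... x_n are features, and w_1 ... w_n are the tokens of a sequence. The Markov half assumes a first-order model.

```latex
% Naive Bayes: Bayes' theorem, then assume the features are
% conditionally independent given the class c.
\begin{align*}
P(c \mid x_1, \dots, x_n)
  &= \frac{P(c)\, P(x_1, \dots, x_n \mid c)}{P(x_1, \dots, x_n)} \\
  &\propto P(c) \prod_{i=1}^{n} P(x_i \mid c)
\end{align*}

% Markov model: chain rule, then assume each token depends only on
% the one before it (first-order Markov assumption).
\begin{align*}
P(w_1, \dots, w_n)
  &= P(w_1) \prod_{i=2}^{n} P(w_i \mid w_1, \dots, w_{i-1}) \\
  &\approx P(w_1) \prod_{i=2}^{n} P(w_i \mid w_{i-1})
\end{align*}
```

In both cases the second line is where the independence assumption comes in: a joint probability that is hard to estimate gets replaced by a product of small conditional probabilities.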
Prior, posterior
I don't know a lot about Bayesian statistics, but I'd like to understand a few terms. I often hear "prior" and "posterior" thrown around, and here's my understanding of them after a look at Wikipedia:
It seems the prior (or prior probability) is the measure of uncertainty about an event without taking any evidence (specific features) into account.
Apparently the posterior (or again, posterior probability) is the conditional probability assigned after relevant evidence is taken into account.
So, now to construct an example that illustrates what I currently believe about these concepts: If, in a given corpus, 50% of the tokens are determiners, then the chance of selecting a token you know nothing about and finding it to be a determiner is 50%. I believe that's the prior. However, if 70% of tokens occurring after verbs are determiners, then the posterior probability is the conditional probability P(determiner|verb) ["probability of a determiner given that the preceding token is a verb"], so 70%.
In Bayes' Theorem, which is the one part of Bayesian anything that I am, one might almost say, *too* familiar with, the prior is multiplied by the likelihood function and then normalized to obtain the posterior. So:
posterior = (likelihood × prior) / normalizing constant
or, equivalently:
P(A|B) = P(B|A) P(A) / P(B)
However, one confusing segment of the Wikipedia entry for a prior is:
"of an uncertain quantity p (for example, suppose p is the proportion of voters who will vote for the politician named Smith in a future election) is the probability distribution that would express one's uncertainty about p before the "data" (for example, an opinion poll) is taken into account."
That seems to suggest that we can't take *any* data into account in order to find it. Don't we then just have to guess? Sounds like more reading may be in order...
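As a sanity check on the determiner example, here is a minimal Python sketch of Bayes' Theorem. The 50% prior is from the example above; the other two probabilities are hypothetical numbers I made up so that the result lands on the 70% figure, and the function and variable names are just for illustration.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' Theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    if evidence <= 0:
        raise ValueError("evidence probability must be positive")
    return likelihood * prior / evidence


# From the example: half of all tokens are determiners.
p_det = 0.5

# Hypothetical values, chosen only so the numbers work out:
p_after_verb = 0.20            # P(preceding token is a verb)
p_after_verb_given_det = 0.28  # P(preceding token is a verb | determiner)

# Posterior: P(determiner | preceding token is a verb)
p_det_given_after_verb = posterior(p_det, p_after_verb_given_det, p_after_verb)
print(round(p_det_given_after_verb, 2))
```

The prior (0.5) gets scaled up because, with these made-up numbers, determiners follow verbs more often than tokens in general do.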
Sunday, July 17, 2011
Short vs. long papers
I'm confused about short vs. long papers. I suppose a publication in a given venue is a publication in a given venue, but are long papers more prestigious than short papers?
The page limits vary a lot. It seems like in typical conferences, it's 4 vs. 8 pages. For RANLP, it's 7 vs. 8 pages, so when I had a paper accepted as a short paper, I just had to shorten it by a page. For IWPT, it's 4 vs. 10! A 6-page difference?
I wish I had a wide readership I could query about their thoughts on this...
Saturday, July 16, 2011
Cluster, I hate you.
Working with huge datasets as I do, I have to use my school's cluster computing environment, which is really different from any other computer with which I am familiar. For example, in order to use programs such as emacs, svn, or java, I have to enable them using SoftEnv. This has to be done either each time you log on or with a startup script. I've been doing the former, since basically all I use are emacs, svn, and java, and usually not all of them every time I log in, but I'm planning to do the latter eventually... (Hint: This is foreshadowing about how my laziness may have been my salvation.)
Yesterday, late at night, I was struggling to do something I had done successfully before and kept getting bizarre errors. I finally tried unsuccessfully one last time to run my script, gave up and went to bed. This morning I began to tackle the problem again. My first step was to verify the error by rerunning the script.
Um....what error? Errors, I hate you when you exist, but I hate you even more when by not existing, you make me look crazy.
When I had slept and could think clearly, I found the problem was that, since I had to add java yesterday in order to use javac, it messed up my ability to use a particular jarfile by being the wrong version. Java versions, I hate you.
The lessons I take from this are: First, on the cluster, if I'm having problems running something I have run before, I should log on in a new shell and see if that fixes the problem. Second, if I decrease my announced memory requirements and do all my processing at night, my process won't sit in the queue as long, so I'll find out more quickly if there are any immediate problems I need to solve.