I got this up and running fairly quickly. It came in a nice jar file. At first I was confused about the input format: it seemed to accept only XML, and I couldn't figure out how to structure it. Finally, when all else failed, I looked in the user manual, which told me how to run it on a plain text example. I quickly patched together a test text file containing the sentence:
This is a test of using Morphadorner to adorn plain english (modern) texts.
I ran it and got the following output:
This This d This this 0
is is vbz is be 0
a a dt a a 0
test test n1 test test 0
of of pp-f of of 0
using using vvg using use 0
Morphadorner Morphadorner n1 Morphadorner morphadorner 0
to to pc-acp to to 0
adorn adorn vvi adorn adorn 0
plain plain j plain plain 0
english english n1 english english 0
( ( ( ( ( 0
modern modern j modern modern 0
) ) ) ) ) 0
texts texts n2 texts text 0
. . . . . 1
Somewhere in the user manual I found an explanation, not of the fields in field order, but of the kinds of information the fields might contain, and from that I was able to match each field to its definition.
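To make the field matching concrete, here is a minimal Python sketch that reads the six-column output shown above. The field names (token, spelling, pos, standard, lemma, eos) are my own guesses from looking at the output, not names taken from the manual, and the sketch assumes the columns are separated by whitespace and that no token contains a space.

```python
from typing import List, NamedTuple


class AdornedToken(NamedTuple):
    # Field names are assumptions inferred from the sample output,
    # not the official MorphAdorner column names.
    token: str      # column 1: token as it appeared in the text
    spelling: str   # column 2: looks like the spelling as given
    pos: str        # column 3: part-of-speech tag (e.g. vvg, n2)
    standard: str   # column 4: looks like a standardized spelling
    lemma: str      # column 5: lemma (e.g. "use" for "using")
    eos: bool       # column 6: True when the flag is 1


def parse_adorned(text: str) -> List[AdornedToken]:
    """Parse whitespace-separated six-column output into records."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            # Skip blank lines or anything that is not a six-column row.
            continue
        *first_five, flag = fields
        rows.append(AdornedToken(*first_five, eos=(flag == "1")))
    return rows


# Three rows copied from the output above.
sample = (
    "using using vvg using use 0\n"
    "texts texts n2 texts text 0\n"
    ". . . . . 1\n"
)
tokens = parse_adorned(sample)
for t in tokens:
    print(t.token, t.pos, t.lemma, t.eos)
```

In the rows above, the last column is 1 only on the final period, which is why the sketch treats it as a sentence-boundary flag; that reading is an inference from this one example.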
It was then that I realized a lemmatizer is not a morphological analyzer...