Monday, June 13, 2011


I got this up and running fairly quickly. It came as a single jar file. At first I was confused about the input format: it seemed to accept only XML, and I couldn't figure out how to structure it. When all else failed, I looked in the user manual, which explained how to run it on a plain text example. I quickly patched together a test text file containing the sentence:

This is a test of using Morphadorner to adorn plain english (modern) texts.

I ran it and got the following output:

This    This    d       This    this    0
is      is      vbz     is      be      0
a       a       dt      a       a       0
test    test    n1      test    test    0
of      of      pp-f    of      of      0
using   using   vvg     using   use     0
Morphadorner    Morphadorner    n1      Morphadorner    morphadorner    0
to      to      pc-acp  to      to      0
adorn   adorn   vvi     adorn   adorn   0
plain   plain   j       plain   plain   0
english english n1      english english 0
(       (       (       (       (       0
modern  modern  j       modern  modern  0
)       )       )       )       )       0
texts   texts   n2      texts   text    0
.       .       .       .       .       1

Somewhere in the user manual I found an explanation, not of the fields themselves in output order, but of the kinds of information the fields might contain, and I was able to match each field to its definition.
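
For what it's worth, here is a rough sketch of how the six columns could be read into named fields. The field names are my own guesses from the sample output above, not the manual's official names: the third column looks like a part-of-speech tag, the fifth looks like the lemma (is/be, using/use, texts/text), and the last column is 1 only on the final period, so I am assuming it marks the end of a sentence. It also assumes no token contains whitespace.

```python
from typing import NamedTuple


class AdornedToken(NamedTuple):
    # Field names are hypothetical, inferred from the sample output.
    token: str             # column 1: token as it appeared in the text
    spelling: str          # column 2: looks identical to column 1 here
    pos: str               # column 3: part-of-speech tag (vbz, n1, ...)
    standard: str          # column 4: apparently a standardized spelling
    lemma: str             # column 5: lemma (is -> be, texts -> text)
    end_of_sentence: bool  # column 6: assumed end-of-sentence flag


def parse_line(line: str) -> AdornedToken:
    """Split one whitespace-separated output line into six fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    token, spelling, pos, standard, lemma, eos = fields
    return AdornedToken(token, spelling, pos, standard, lemma, eos == "1")


# A few lines copied from the output above.
SAMPLE = """\
is      is      vbz     is      be      0
using   using   vvg     using   use     0
texts   texts   n2      texts   text    0
.       .       .       .       .       1
"""

tokens = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]

for t in tokens:
    print(f"{t.token}\t{t.pos}\t{t.lemma}")
```

Using a NamedTuple keeps each row readable as t.lemma or t.pos instead of a bare index, which helps when the column order is something you had to work out by hand.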

It was then that I realized a lemmatizer is not a morphological analyzer...

