Monday, March 25, 2013

Traversing the HTK Tutorial: A Comedy of Errors

1. Create a grammar, which HParse can then compile into a word network:

Pillar-of-Autumn:SR rcantrel$ HParse gram wdnet
Segmentation fault: 11



Gr.... I verified with tracing (-T 1) that it is reading the grammar:

Pillar-of-Autumn:SR rcantrel$ HParse -T 1 gram wdnet2
Creating HParse net from file gram
Generating Lattice with 5 nodes and 4 links
Writing Word Lattice to wdnet2
Segmentation fault: 11
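The "11" in "Segmentation fault: 11" is a signal number. The shell is reporting that HParse was killed by signal 11 (SIGSEGV) rather than exiting with one of HTK's own error messages, which is why there is no ERROR line to go on. When scripting runs like these, a minimal Python sketch like the one below can tell a crash apart from an ordinary failure. `run_and_classify` is a hypothetical helper, and it relies on the POSIX behavior of Python's `subprocess` module, which reports a child killed by signal N as return code -N.

```python
import signal
import subprocess
import sys


def run_and_classify(cmd):
    """Run cmd (a list of strings) and describe how it ended.

    On POSIX systems, subprocess reports "killed by signal N" as
    returncode -N, so a segfault (SIGSEGV, signal 11) shows up as -11.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    rc = result.returncode
    if rc < 0:
        return f"killed by {signal.Signals(-rc).name}"
    if rc > 0:
        return f"exited with status {rc}"
    return "ok"


if __name__ == "__main__":
    # Hypothetical usage mirroring the run above (needs HTK on PATH):
    #   run_and_classify(["HParse", "-T", "1", "gram", "wdnet2"])
    # Self-contained demo: a child Python process that segfaults on purpose.
    crasher = [sys.executable, "-c",
               "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    print(run_and_classify(crasher))
```

A non-zero positive status usually means the program noticed a problem and quit on its own (like the FATAL ERROR case), while a negative one means it never got the chance.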


Just to check what happens with an erroneous grammar, I deleted a semicolon:

Pillar-of-Autumn:SR rcantrel$ HParse -T 1 gram wdnet2
Creating HParse net from file gram
  ERROR [+3130]  PVariable: Variable digit is undefined
 FATAL ERROR - Terminating program HParse
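One plausible reading of that error: in an HParse grammar, each variable definition (`$name = ... ;`) ends with a semicolon, and a variable appears to need a complete definition before it can be referenced. Once the semicolon is gone, the parser keeps reading the next line as part of the same definition, so a reference such as `$digit` inside it is reached before `$digit` has been fully defined. The toy Python checker below mimics only that one rule. It is not HParse's parser, and the grammar in it is a hypothetical cut-down version of the tutorial's digit grammar rather than the `gram` file used here.

```python
import re

# A definition looks like "$name = expression" (its ";" is consumed by split).
DEF_RE = re.compile(r"^\s*\$(\w+)\s*=(.*)$", re.S)
# Any "$name" occurrence is treated as a variable reference.
REF_RE = re.compile(r"\$(\w+)")


def undefined_variables(grammar):
    """Return variables referenced before their definition is complete.

    Toy model: every piece before the last ";" is a "$var = ..." definition,
    and the final piece is the main network expression. A variable only
    counts as defined once its terminating semicolon has been reached.
    """
    defined = set()
    problems = []
    for piece in grammar.split(";"):
        match = DEF_RE.match(piece)
        name, body = (match.group(1), match.group(2)) if match else (None, piece)
        for ref in REF_RE.findall(body):
            if ref not in defined and ref not in problems:
                problems.append(ref)
        if name is not None:
            defined.add(name)
    return problems


# Hypothetical grammar in the style of the HTK tutorial.
GOOD = """$digit = ONE | TWO | THREE;
( SENT-START <$digit> SENT-END )"""
# The same grammar with its semicolon deleted.
BAD = GOOD.replace(";", "", 1)

print(undefined_variables(GOOD))  # []
print(undefined_variables(BAD))   # ['digit']
```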


Fine, I decided to move on. (To be clear, that's the short (very short) version of how I decided to move on.)

2. Create the dictionary:

The tutorial has a set of steps for building the word list from sample sentences, but I decided to strip this one down to its essentials. Emotionally, I needed a win, and I was determined to get one. So I created a tiny word list:

ONE

I downloaded the beep dictionary. After several trials, I realized that I needed to sort the dictionary into dictionary order:

Pillar-of-Autumn:SR rcantrel$ sort -d beep/beep-1.0  > beepsorted3
Pillar-of-Autumn:SR rcantrel$ HDMan -m -w wlist -l glorp test beepsorted3
Pillar-of-Autumn:SR rcantrel$

Okay, that was disturbingly silent...

Sure enough, even though the beep dictionary contained every word in my word list, the log file (glorp) showed that HDMan couldn't find a single one!

Missing Words
-------------
ONE

Dictionary Usage Statistics
---------------------------
  Dictionary    TotalWords WordsUsed  TotalProns PronsUsed
 beepsorted3         0          0          0          0
        test         0          0          0          0

1 words required, 1 missing


Dictionary test created
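The `TotalWords 0` entry for `beepsorted3` is the real clue here: it suggests HDMan never read a single entry from that file, so every word in the list was bound to come up missing. A quick way to confirm that a dictionary file parses at all is a minimal Python sketch of the same missing-word check. It assumes one entry per line, with the headword as the first whitespace-separated field and `#` starting a comment line. That is an assumption about the input file, not a description of how HDMan itself parses dictionaries, and the function names are hypothetical.

```python
def read_word_list(text):
    """One word per line; blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_headwords(text):
    """Collect the first field of every non-blank line not starting with '#'."""
    headwords = set()
    for line in text.splitlines():
        fields = line.split()
        if fields and not fields[0].startswith("#"):
            headwords.add(fields[0])
    return headwords


def missing_words(word_list_text, dictionary_text):
    """Print a short report and return word-list entries absent from the dictionary."""
    words = read_word_list(word_list_text)
    headwords = read_headwords(dictionary_text)
    missing = [w for w in words if w not in headwords]
    print(f"{len(headwords)} distinct headwords parsed")
    print(f"{len(words)} words required, {len(missing)} missing: {missing}")
    return missing


if __name__ == "__main__":
    # Hypothetical inputs; with real files, pass open("wlist").read() and so on.
    missing_words("ONE\n", "# comment line\nONE  w ah n\nTWO  t uw\n")
```

If the first line of the report says 0, the problem is the dictionary file's format, not the word list.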


At this point, I decided to move on...to brownies. Yummy, delicious brownies. Then, while enjoying the brownies, I discovered that the beep dictionary wasn't in the format HDMan expected. I downloaded the VoxForge lexicon (thanks, HTK/Julius tutorial!) and had my win in the form of a dictionary:

ONE             [ONE]           w ah n sp
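That line follows HTK's dictionary layout: the word, an optional output symbol in square brackets (the text the recognizer prints when it recognizes the word), then the phones, here ending with the `sp` short-pause model that the tutorial appends to every pronunciation. Below is a minimal Python sketch for turning a plain `WORD phone phone ...` line into an entry of that shape and reading it back. The helpers `to_htk_entry` and `parse_htk_entry` are hypothetical, the input layout is an assumption, and optional pronunciation probabilities are not handled.

```python
def to_htk_entry(line, add_sp=True):
    """Turn "WORD p1 p2 ..." into "WORD [WORD] p1 p2 ... sp" in aligned columns."""
    word, *phones = line.split()
    if not phones:
        raise ValueError(f"no pronunciation in line: {line!r}")
    if add_sp:
        phones.append("sp")  # short-pause model appended to each pronunciation
    # Pad to 15 columns plus one space so long words never run together.
    return f"{word:<15} {'[' + word + ']':<15} {' '.join(phones)}"


def parse_htk_entry(entry):
    """Split an entry into (word, output symbol or None, list of phones)."""
    word, *rest = entry.split()
    outsym = None
    if rest and rest[0].startswith("[") and rest[0].endswith("]"):
        outsym = rest.pop(0)[1:-1]  # "[]" yields "", meaning print nothing
    return word, outsym, rest


if __name__ == "__main__":
    entry = to_htk_entry("ONE w ah n")
    print(entry)
    print(parse_htk_entry(entry))
```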

5 comments:

  1. I have the same problem with the HParse command (the "Segmentation fault: 11" problem).

    The problem is in these two lines:
    OutputIntField('S',st,format&HLAT_LBIN,"%-4d",file);
    OutputIntField('E',en,format&HLAT_LBIN,"%-4d",file);

    They are in the function WriteOneLattice(...) in the file HNet.c.

    If you comment out those lines, it works (though of course it doesn't print everything).

    That's all I have for now :)

    Did you solve it?

    1. I managed to get it to run by adding a bunch of checks for null values, and it printed a network, but not a complete one. I ended up figuring out how to create a network manually, and by extension how a Python script could generate the network, but I haven't actually written the script yet. An example of a network, taken from http://www.ee.columbia.edu/ln/LabROSA/doc/HTKBook21/node130.html, is:

      # Define size of network: N=num nodes and L=num arcs
      N=4 L=8
      # List nodes: I=node-number, W=word
      I=0 W=start
      I=1 W=end
      I=2 W=bit
      I=3 W=but
      # List arcs: J=arc-number, S=start-node, E=end-node
      J=0 S=0 E=2
      J=1 S=0 E=3
      J=2 S=3 E=1
      J=3 S=2 E=1
      J=4 S=2 E=3
      J=5 S=3 E=3
      J=6 S=3 E=2
      J=7 S=2 E=2
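The reply mentions a Python script for generating the network that had not been written yet. A minimal sketch of what it might look like: given the node words and a list of (start, end) node pairs, it produces the same text as the example above. The function name `write_network` is hypothetical, and the sketch writes only the fields that appear in the example (`N`/`L`, `I`/`W`, `J`/`S`/`E`), none of the other optional fields the lattice format supports.

```python
def write_network(words, arcs):
    """Render nodes and arcs in the lattice text format shown above.

    words: node words, indexed by node number (I=...)
    arcs:  (start_node, end_node) pairs, indexed by arc number (J=...)
    """
    for start, end in arcs:
        if not (0 <= start < len(words) and 0 <= end < len(words)):
            raise ValueError(f"arc ({start}, {end}) refers to a missing node")
    lines = [
        "# Define size of network: N=num nodes and L=num arcs",
        f"N={len(words)} L={len(arcs)}",
        "# List nodes: I=node-number, W=word",
    ]
    lines += [f"I={i} W={word}" for i, word in enumerate(words)]
    lines.append("# List arcs: J=arc-number, S=start-node, E=end-node")
    lines += [f"J={j} S={s} E={e}" for j, (s, e) in enumerate(arcs)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The bit/but network from the example above.
    words = ["start", "end", "bit", "but"]
    arcs = [(0, 2), (0, 3), (3, 1), (2, 1), (2, 3), (3, 3), (3, 2), (2, 2)]
    print(write_network(words, arcs), end="")
```

Keeping nodes and arcs as plain lists means a grammar-to-network step only has to produce those two lists; the text formatting stays in one place.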

  2. Check out man.as/htk-tutorial too :)

    (You can use Google Translate to read it in some kind of English.)

    1. Thanks, I'll check that out! I think I've found most of the good English resources, so I'm very happy to be pointed to ones I wouldn't have found with an English search. Now let's see how good that translation is... :D

  3. I had the very same segmentation fault. Following what was done here:
    http://pbrusco.github.io/htk.html
    I installed Ubuntu on my Mac, and HParse ran silently. After all the unavoidable segfaults on the Mac, it was a horror-movie kind of silence.
