Up-to-date knowledge about natural language processing is mostly locked away in academia. And academics are mostly pretty self-conscious when we write. We don't want to stick our necks out too much. But timid recommendations suck, so here's how to write a good part-of-speech tagger.

There are a tonne of "best known techniques" for POS tagging, and you should ignore the others and just use Averaged Perceptron.

You should use two tags of history, and features derived from the Brown word clusters.

If you only need the tagger to work on carefully edited text, you should use case-sensitive features, but if you want a more robust tagger you should avoid them because they'll make you over-fit to the conventions of your training domain. Instead, features that ask "how frequently is this word title-cased, in a large sample from the web?" work well. Then you can lower-case your training data.

For efficiency, you should figure out which frequent words in your training data have unambiguous tags, so you don't have to do anything but output their tags when they occur. About 50% of the words can be tagged that way.

And unless you really, really can't do without an extra 0.1% of accuracy, you probably shouldn't bother with any kind of search strategy. You should just use a greedy model.

If you do all that, you'll find your tagger easy to write and understand, and an efficient Cython implementation will tag the standard evaluation, 130,000 words of text from the Wall Street Journal, at about 97% accuracy in 4s.

The 4s includes initialisation time. The actual per-token speed is high enough to be irrelevant; it won't be your bottleneck.

It's tempting to look at 97% accuracy and say something similar, but that's not true. My parser is about 1% more accurate if the input has hand-labelled POS tags, and the taggers all perform much worse on out-of-domain data. Unfortunately accuracies have been fairly flat for the last ten years. That's why my recommendation is to just use a simple and fast tagger that's roughly as good.
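To make the recipe concrete, here is a minimal sketch in plain Python (not the Cython implementation mentioned above) of a greedy averaged-perceptron tagger with two tags of history, lower-cased features, and a dictionary of frequent unambiguous words. Everything named here is an assumption made for illustration: the function and class names, the feature templates, the `-START-` placeholders, and the `min_count=20` / `min_ratio=0.97` thresholds. The Brown cluster and web-scale title-case features are left out because they need external data.

```python
import random
from collections import defaultdict

START = ("-START-", "-START2-")  # placeholder tags before the first word


class AveragedPerceptron:
    """Multi-class perceptron whose weights are averaged over all steps."""

    def __init__(self, classes):
        self.classes = sorted(classes)
        self.weights = {}                  # feature -> {tag: weight}
        self._totals = defaultdict(float)  # (feature, tag) -> summed weight
        self._stamps = defaultdict(int)    # (feature, tag) -> step last changed
        self.steps = 0

    def predict(self, feats):
        scores = defaultdict(float)
        for feat in feats:
            for tag, weight in self.weights.get(feat, {}).items():
                scores[tag] += weight
        # Ties are broken by tag name so results are deterministic.
        return max(self.classes, key=lambda tag: (scores[tag], tag))

    def update(self, truth, guess, feats):
        self.steps += 1
        if truth == guess:
            return
        for feat in feats:
            for tag, delta in ((truth, 1.0), (guess, -1.0)):
                key = (feat, tag)
                w = self.weights.setdefault(feat, {}).get(tag, 0.0)
                # Credit the old weight for every step it was in force.
                self._totals[key] += (self.steps - self._stamps[key]) * w
                self._stamps[key] = self.steps
                self.weights[feat][tag] = w + delta

    def average(self):
        """Replace weights by their average; call once, after training."""
        for feat, by_tag in self.weights.items():
            for tag, w in by_tag.items():
                key = (feat, tag)
                total = self._totals[key] + (self.steps - self._stamps[key]) * w
                by_tag[tag] = total / self.steps if self.steps else w


def get_features(words, i, prev, prev2):
    """Lower-cased word features plus two tags of history."""
    word = words[i].lower()
    prev_word = words[i - 1].lower() if i > 0 else "-START-"
    next_word = words[i + 1].lower() if i + 1 < len(words) else "-END-"
    return ["bias", "word=" + word, "suffix=" + word[-3:],
            "prev_tag=" + prev, "prev2_tags=" + prev2 + "+" + prev,
            "prev_tag+word=" + prev + "+" + word,
            "prev_word=" + prev_word, "next_word=" + next_word]


def build_tagdict(sentences, min_count=20, min_ratio=0.97):
    """Map frequent words that (almost) always take one tag to that tag."""
    counts = defaultdict(lambda: defaultdict(int))
    for words, tags in sentences:
        for word, tag in zip(words, tags):
            counts[word.lower()][tag] += 1
    tagdict = {}
    for word, tag_counts in counts.items():
        tag, n = max(tag_counts.items(), key=lambda kv: kv[1])
        total = sum(tag_counts.values())
        if total >= min_count and n / total >= min_ratio:
            tagdict[word] = tag
    return tagdict


def train(sentences, n_iter=5, seed=0, min_count=20):
    tagdict = build_tagdict(sentences, min_count=min_count)
    model = AveragedPerceptron({t for _, tags in sentences for t in tags})
    rng = random.Random(seed)
    data = list(sentences)
    for _ in range(n_iter):
        rng.shuffle(data)
        for words, gold in data:
            prev, prev2 = START
            for i, word in enumerate(words):
                guess = tagdict.get(word.lower())
                if guess is None:  # only ambiguous or rare words hit the model
                    feats = get_features(words, i, prev, prev2)
                    guess = model.predict(feats)
                    model.update(gold[i], guess, feats)
                prev2, prev = prev, guess  # history uses predicted tags
    model.average()
    return model, tagdict


def tag(words, model, tagdict):
    """Greedy left-to-right decoding: commit to each tag, no search."""
    prev, prev2 = START
    tags = []
    for i, word in enumerate(words):
        guess = tagdict.get(word.lower())
        if guess is None:
            guess = model.predict(get_features(words, i, prev, prev2))
        tags.append(guess)
        prev2, prev = prev, guess
    return tags
```

Training data is assumed to be a list of `(words, tags)` pairs, so usage would look like `model, tagdict = train(sentences)` followed by `tag("The dog barks".split(), model, tagdict)`. Note that training feeds the model its own predicted tags as history rather than the gold ones, so it sees the same kind of mistakes at training time that it will make at test time.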