so there...
guess i need someplace like this to put textual thoughts up, not just images, as i do over at smudo.org
i’m on vacation, but still at work: i need time to write on my thesis. today’s topic is pre-tagging and the creation of text corpora.
the general idea behind pre-tagging is to have a classifier, possibly trained on texts from a different domain, tag a new text before a human annotator deals with it, in order to create a high-quality marked-up corpus.
now, there exist two seemingly valid opinions regarding this: use pre-tagging, and do not use pre-tagging.
the people advocating the former view claim that pre-tagging may turn the process of manually marking up text into one of manually revising the text, which would reduce the burden for the human annotator.
in contrast, people advocating the latter view claim that pre-tagging introduces a bias which affects the human annotator in such a way that he or she will fail to mark up things in the text that would have been noticed had the raw text been presented instead.
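to make the two views a bit more concrete, here is a minimal python sketch of the workflow. everything in it is an assumption made for illustration: the toy training data, the tag names (PER, LOC, O), and the helper functions train_tagger, pre_tag and revise are all made up, and the "classifier" is just a most-frequent-tag lookup, not a real tagger.

```python
from collections import Counter, defaultdict

def train_tagger(tagged_sentences):
    """learn the most frequent tag for each word from (word, tag) pairs."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def pre_tag(tagger, tokens, default="O"):
    """tag a new text; words the tagger has never seen get the default tag."""
    return [(tok, tagger.get(tok.lower(), default)) for tok in tokens]

def revise(pre_tagged, corrections):
    """apply the annotator's corrections, given as {token index: new tag}."""
    return [(tok, corrections.get(i, tag)) for i, (tok, tag) in enumerate(pre_tagged)]

# toy "out-of-domain" training data (hypothetical)
train = [
    [("Stockholm", "LOC"), ("is", "O"), ("cold", "O")],
    [("Anna", "PER"), ("visited", "O"), ("Stockholm", "LOC")],
]
tagger = train_tagger(train)

# the pre-tagged draft the annotator gets to see
draft = pre_tag(tagger, ["Anna", "moved", "to", "Uppsala"])

# the annotator only has to fix what the tagger got wrong
final = revise(draft, {3: "LOC"})
```

in this toy run the annotator changes one token out of four instead of tagging all four from scratch, which is the argument for pre-tagging. the argument against shows up in the same spot: "Uppsala" arrives already marked as O, and an annotator skimming the draft might simply accept that.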
i know that it is very unlikely that someone will read this post, and if someone does, the chance that he or she will have anything to say about its content is next to none. but anyway, if you know of any references supporting either of the views on pre-tagging, please let me know.
cheers,
f