Because our training data does not contain all possible combinations, our models are especially poor when a POS sequence occurs only once or not at all. To combat this we experimented with smoothing the POS sequence frequencies. We used two forms of smoothing: first, Good-Turing smoothing [3], to ensure that no context has zero occurrences; and second, a form of back-off smoothing [6], i.e. using progressively smaller contexts to estimate the frequency of a context when only a few instances of it exist in the training set.
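The two smoothing steps can be illustrated with a minimal sketch. It is not our actual implementation. It assumes the textbook Good-Turing estimate r* = (r + 1) N(r+1) / N(r), applied only to small counts, and a simplified back-off that drops the most distant tag without discounting. The function names, the `max_r` and `min_count` thresholds, and the label names are all hypothetical.

```python
from collections import Counter


def good_turing_counts(counts, max_r=5):
    """Return Good-Turing adjusted counts r* = (r + 1) * N[r+1] / N[r].

    The adjustment is applied only for r <= max_r and only when N[r+1] > 0;
    otherwise the raw count is kept (an assumption made for this sketch).
    """
    freq_of_freq = Counter(counts.values())  # N[r]: items seen exactly r times
    adjusted = {}
    for item, r in counts.items():
        n_r, n_r1 = freq_of_freq[r], freq_of_freq[r + 1]
        if r <= max_r and n_r1 > 0:
            adjusted[item] = (r + 1) * n_r1 / n_r
        else:
            adjusted[item] = float(r)
    return adjusted


def unseen_mass(counts):
    """Probability mass reserved for unseen items: N[1] / N."""
    total = sum(counts.values())
    singletons = sum(1 for r in counts.values() if r == 1)
    return singletons / total if total else 0.0


def backoff_estimate(context, label, context_label_counts, min_count=2):
    """Estimate P(label | context), shortening the context when it is rare.

    The context is a tuple of POS tags. If it was seen fewer than min_count
    times, the leftmost tag is dropped and the lookup is repeated, down to
    the empty context. Which tag to drop first is an assumption.
    """
    ctx = tuple(context)
    while True:
        label_counts = context_label_counts.get(ctx, {})
        total = sum(label_counts.values())
        if total >= min_count or not ctx:
            return label_counts.get(label, 0) / total if total else 0.0
        ctx = ctx[1:]


# Toy data: hypothetical POS contexts and labels, not taken from our corpus.
toy_counts = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 3}
toy_contexts = {
    ("DT", "NN"): {"B": 1},
    ("NN",): {"B": 3, "NB": 1},
    (): {"B": 10, "NB": 30},
}
smoothed = good_turing_counts(toy_counts)
rare = backoff_estimate(("DT", "NN"), "B", toy_contexts)
unseen = backoff_estimate(("JJ", "VB"), "B", toy_contexts)
```

In the toy data, the context `("DT", "NN")` is seen only once, so the estimate falls back to `("NN",)`; the context `("JJ", "VB")` is never seen at any length, so the estimate falls back to the empty context.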
Smoothing slightly improved our results. Again using tagset 23, a 6-gram, and a POS sequence model of two tags before and one following, we obtain: