AI RESEARCH
Chinese Word Boundary Recovery through Character Alignment Projection
arXiv CS.CL
arXiv:2605.28128v1 Announce Type: new Chinese word segmentation is especially fragile in non-standard text, where language learner errors and other character-level divergences disrupt the word boundaries assumed by downstream annotation and evaluation. This paper formulates Chinese word boundary recovery as an alignment-based projection task. Given a noisy source sentence and a cleaner target counterpart, we first align the two strings at the character level and then project target-side word boundaries back onto the source. Beyond the recovery method itself, we.
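
The align-then-project idea described in the abstract can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the paper's implementation: it uses the standard library's `difflib.SequenceMatcher` as a stand-in character aligner, and the function name `project_boundaries` and the example sentences are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def project_boundaries(source: str, target_words: List[str]) -> List[str]:
    """Segment `source` by projecting word boundaries from a segmented target.

    Assumption: difflib's SequenceMatcher stands in for the character aligner;
    the aligner actually used in the paper is not specified here.
    """
    if not source:
        return []

    target = "".join(target_words)

    # Target-side boundaries as character offsets (end of text excluded).
    boundaries = set()
    pos = 0
    for word in target_words[:-1]:
        pos += len(word)
        boundaries.add(pos)

    matcher = SequenceMatcher(None, source, target, autojunk=False)
    cuts = set()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if i2 - i1 == j2 - j1:
            # One-to-one block (match or equal-length substitution):
            # each target offset maps to the source offset at the same distance.
            for j in range(j1, j2 + 1):
                if j in boundaries:
                    cuts.add(i1 + (j - j1))
        else:
            # Insertion, deletion, or unequal substitution: only the block
            # edges can be localized, so interior boundaries are dropped.
            if j1 in boundaries:
                cuts.add(i1)
            if j2 in boundaries:
                cuts.add(i2)

    # Cut the source string at the projected offsets.
    segments, start = [], 0
    for cut in sorted(c for c in cuts if 0 < c < len(source)) + [len(source)]:
        segments.append(source[start:cut])
        start = cut
    return segments


# Hypothetical learner sentence with one miswritten character (平 for 苹).
print(project_boundaries("我喜欢吃平果", ["我", "喜欢", "吃", "苹果"]))
# → ['我', '喜欢', '吃', '平果']
```

In this sketch a substituted character inherits the boundaries of the character it aligns to, while a character missing from the source simply collapses the boundaries around it onto a single source offset. How boundaries inside unaligned spans should be handled is a design choice that this sketch makes arbitrarily.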