In this thesis, we present the development and evaluation of a suite of annotation tools for unrestricted Irish text, covering tokenization, morphological analysis, part-of-speech tagging, and partial parsing. Developing such tools requires a large body of text for testing purposes. We therefore begin by describing our involvement in the creation of a 30-million-word corpus of Irish texts (New Corpus for Ireland). From this corpus,
we randomly extracted 3,000 sentences, which we annotated and manually corrected to create a Gold Standard Corpus for evaluation purposes. We then present the annotation tools. First, we describe scaling a proof-of-concept implementation of finite-state tokenization and morphological analysis, based on Xerox Finite State Tools (Uí Dhonnchadha, 2002, p. 146), to unrestricted text. After semi-automatic population of the finite-state morphology (FSM) lexical resources, the morphological analyser
contains a lexicon of ...