C-structures and f-structures for the British National Corpus |
Wagner, Joachim; Seddah, Djamé; Foster, Jennifer; van Genabith, Josef
|
We describe how the British National Corpus (BNC), a one-hundred-million-word balanced corpus of British English, was parsed into Lexical Functional Grammar (LFG) c-structures and f-structures using a treebank-based parsing architecture. The architecture uses a state-of-the-art statistical parser and reranker trained on the Penn Treebank to produce context-free phrase structure trees, and an annotation algorithm that automatically derives LFG f-structures from these trees. We describe the pre-processing steps taken to accommodate the differences between the Penn Treebank and the BNC, and discuss some of the issues encountered in applying the parsing architecture on such a large scale. We also describe the process of annotating a gold standard set of 1,000 parse trees. We evaluate the c-structures produced by the statistical parser against this c-structure gold standard, and the f-structures produced by the annotation algorithm against an automatically constructed f-structure gold standard. The c-structures achieve an f-score of 83.7% and the f-structures an f-score of 91.2%.
|
Keyword(s):
|
Machine translating; lexical functional grammar |
Publication Date:
|
2007 |
Type:
|
Conference item |
Peer-Reviewed:
|
Yes |
Language(s):
|
English |
Institution:
|
Dublin City University |
Funder(s):
|
Irish Research Council for Science Engineering and Technology; Science Foundation Ireland |
Citation(s):
|
Wagner, Joachim and Seddah, Djamé and Foster, Jennifer and van Genabith, Josef (2007) C-structures and f-structures for the British National Corpus. In: Lexical Functional Grammar 2007, 28-30 July 2007, California, USA. |
Publisher(s):
|
CSLI Publications |
File Format(s):
|
application/pdf |
Related Link(s):
|
http://doras.dcu.ie/15205/1/jwagner_et_al_07.pdf |
First Indexed:
|
2010-02-18 05:05:06 |
Last Updated:
|
2015-03-23 05:23:02 |