Limited By:
Subject = Corpus;
20 items found
Displaying Results 1 - 20 of 20 on page 1 of 1
A corpus of Irish English – Past, present, future
(1999)
O'Keeffe, Anne; Barker, Gosia
http://hdl.handle.net/10395/1797
Building a corpus to represent a variety of a language (Pre-published version)
(2010)
Clancy, Brian
Abstract:
Building a corpus to represent a variety of a language.
http://hdl.handle.net/10395/2722
CoFiF: A corpus of financial reports in French language
(2019)
Ahmadi, Sina; Daudert, Tobias
Abstract:
In an era when machine learning and artificial intelligence have huge momentum, the data demand to train and test models is steadily growing. We introduce CoFiF, the first corpus comprising company reports in the French language. It contains over 188 million tokens in 2655 reports, covering reference documents, annual, semestrial and trimestrial reports. Our main focus is on the 60 largest French companies listed in France's main stock indices CAC40 and CAC Next 20. The corpus spans over 20 years, ranging from 1995 to 2018. To evaluate this novel collection of organizational writing, we use CoFiF to generate two character-level language models, a forward and a backward one, which we use to demonstrate the corpus's potential for business, economics, and management research in the French language.
2019-08-10
http://hdl.handle.net/10379/15276
Corpus analysis (Pre-published version)
(2015)
O'Keeffe, Anne; Vaughan, Elaine Claire
Abstract:
Corpus analysis.
http://hdl.handle.net/10395/2668
Corpus creation for sentiment analysis in code-mixed Tamil-English text
(2020)
Chakravarthi, Bharathi Raja; Muralidaran, Vigneshwaran; Priyadharshini, Ruba; McCrae, John P.
Abstract:
Understanding the sentiment of a comment from a video or an image is an essential task in many applications. Sentiment analysis of a text can be useful for various decision-making processes. One such application is to analyse the popular sentiments of videos on social media based on viewer comments. However, comments from social media do not follow strict rules of grammar, and they contain mixing of more than one language, often written in non-native scripts. Non-availability of annotated code-mixed data for a low-resourced language like Tamil also adds difficulty to this problem. To overcome this, we created a gold standard Tamil-English code-switched, sentiment-annotated corpus containing 15,744 comment posts from YouTube. In this paper, we describe the process of creating the corpus and assigning polarities. We present inter-annotator agreement and show the results of sentiment analysis trained on this corpus as a benchmark.
This publication has emanated from research supporte...
http://hdl.handle.net/10379/16102
Corpus-based function-to-form approaches (Pre-published version)
(2018)
O'Keeffe, Anne
Abstract:
Corpus-based function-to-form approaches.
http://hdl.handle.net/10395/2666
Emotional Speech Corpus Construction, Annotation and Distribution
(2008)
Vaughan, Brian; Cullen, Charlie; Kousidis, Spyros; McAuley, John
Abstract:
This paper details a process of creating an emotional speech corpus by collecting natural emotional speech assets, analysing and tagging them (for certain acoustic and linguistic features) and annotating them within an on-line database. The definition of specific metadata for use with an emotional speech corpus is crucial, in that poorly (or inaccurately) annotated assets are of little use in analysis. This problem is compounded by the lack of standardisation for speech corpora, particularly in relation to emotion content. The ISLE Metadata Initiative (IMDI) is the only cohesive attempt at corpus metadata standardisation performed thus far. Although not a comprehensive (or universally adopted) standard, IMDI represents the only current standard for speech corpus metadata available. The adoption of the IMDI standard allows the corpus to be re-used and expanded, in a clear and structured manner, ensuring its re-usability and usefulness as well as addressing issues of data-sparsity w...
https://arrow.dit.ie/dmccon/92
Generation of High Quality Audio Natural Emotional Speech Corpus using Task Based Mood Induction
(2006)
Cullen, Charlie; Vaughan, Brian; Kousidis, Spyros; Wang, Yi; McDonnell, Ciaran; Campbell, Dermot
Abstract:
Detecting emotional dimensions [1] in speech is an area of great research interest, notably as a means of improving human-computer interaction in areas such as speech synthesis [2]. In this paper, a method of obtaining high quality emotional audio speech assets is proposed. The methods of obtaining emotional content are subject to considerable debate, with distinctions between acted [3] and natural [4] speech being made based on the grounds of authenticity. Mood Induction Procedures (MIPs) [5] are often employed to stimulate emotional dimensions in a controlled environment. This paper details experimental procedures based around MIP 4, using performance-related tasks to engender activation and evaluation responses from the participant. Tasks are specified involving two participants, who must co-operate in order to complete a given task [6] within the allotted time. Experiments designed in this manner also allow for the specification of high quality audio assets (notably 24bit/192Kh...
https://arrow.dit.ie/dmccon/90
He's after getting up a load of wind: a corpus-based exploration of be + after + V-ing constructions in spoken and written corpora (Pre-published version)
(2018)
O'Keeffe, Anne; Amador-Moreno, Carolina P.
Abstract:
He's after getting up a load of wind: a corpus-based exploration of be + after + V-ing constructions in spoken and written corpora.
http://hdl.handle.net/10395/2687
Identifying Topic Shift and Topic Shading in Switchboard
(2018)
Spillane, Brendan; Wade, Vincent; Gilmartin, Emer; Saam, Christian; Clark, Leigh; Cowan, Benjamin R.
Abstract:
This paper highlights some of the ongoing work on the ADELE project, namely the identification and annotation of topic shift and topic shading in the Switchboard-1 Release-2 corpus. The purpose of this is to train an Artificial Neural Network to create a digital companion for the elderly that can communicate through informal, yet informed social dialogue, on a variety of topics of interest to a user over a prolonged time scale. To this end the project is focussing on topic shift and shading, the mechanisms which underpin the development of such conversations [6,8]. In the past, dialogue systems have predominantly focussed on practical tasks due to the complexity of modelling realistic everyday social talk [1]. With increasing awareness of the need for home robots and virtual home care agents to help assist in the provision of care for a rapidly ageing population, it is necessary to develop a more caring, involved, and personalised virtual care agent capable of such social dialogue.
http://hdl.handle.net/2262/92479
Introduction: International Journal of Corpus Linguistics (Pre-published version)
(2011)
O'Keeffe, Anne; Farr, Fiona
Abstract:
Introduction
http://hdl.handle.net/10395/2827
Introduction: Teanga (Pre-published version)
(2004)
O'Keeffe, Anne; Farr, Fiona
Abstract:
Introduction to 'Teanga' volume 21, pages 2-5.
http://hdl.handle.net/10395/2826
‘Like the wise virgins and all that jazz’ – using a corpus to examine vague language and shared knowledge
(2004)
O'Keeffe, Anne
http://hdl.handle.net/10395/1683
Prototype Speech Corpus
(2007)
Campbell, Dermot; Wang, Yi; McDonnell, Ciaran
Abstract:
DIT’s prototype speech corpus allows language learners and researchers access to real, informal dialogues, not just the transcripts of dialogues. Because the dialogues were created at a very high acoustic level they are capable of being slowed down using a time-scaling tool for more detailed study of speech production. The recording methodology used produces a natural dialogue exhibiting all the features of native-to-native interchanges. The speech corpus is capable of serving the needs of students and researchers of spoken language. LinguaTag is the first prototype tool within the language work package in the FP6 EU project SALERO to automatically tag speech for integration with lip synchronisation in the field of animation. It can also be used as a tagging tool for other linguistic features such as speed of delivery and formulaicity. Erman and Warren (2000) have calculated that formulaic sequences constitute 58.6% of the spoken English discourse they analysed. Initial studies with...
https://arrow.dit.ie/dmccon/38
The Limerick corpus of Irish English: design, description and application
(2004)
O'Keeffe, Anne; Farr, Fiona; Murphy, Brona
http://hdl.handle.net/10395/1696
The Limerick Corpus of Irish English: design, description and application
(2004)
Farr, Fiona; Murphy, Brona; O'Keeffe, Anne
Abstract:
This paper describes an on-going corpus development and application project at the Mary Immaculate College and the University of Limerick, Ireland. The Limerick Corpus of Irish English is a one-million word corpus of English as it is spoken in Ireland. The corpus is genre-based and consists primarily of casual conversational data. Details of the corpus design, development and applications, both research and pedagogic, are described. An illustrative example of the linguistic phenomenon of HEDGING is explored and some classroom activities based on the findings are developed.
http://hdl.handle.net/10344/4712
The SSIX corpora: three gold standard corpora for sentiment analysis in English, Spanish and German financial microblogs
(2019)
Gaillat, Thomas; Zarrouk, Manel; Freitas, André; Davis, Brian
Abstract:
This paper introduces the three SSIX corpora for sentiment analysis. These corpora address the need to provide annotated data for supervised learning methods. They focus on stock-market related messages extracted from two financial microblog platforms, i.e., StockTwits and Twitter. In total they include 2,886 messages with opinion targets. These messages are provided with polarity annotation set on a continuous scale by three or four experts in each language. The annotation information identifies the targets with a sentiment score. The annotation process includes manual annotation verified and consolidated by financial experts. The creation of the annotated corpora took into account principled sampling strategies as well as inter-annotator agreement before consolidation in order to maximize data quality.
We would like to thank all the people involved in the creation of the Gold Standard. This work is funded by the SSIX Horizon 2020 project (Grant agreement No 645425).
http://hdl.handle.net/10379/14950
Using a corpus to enhance pragmatic awareness (Pre-published version)
(2012)
Clancy, Brian; O'Keeffe, Anne
Abstract:
Using a corpus to enhance pragmatic awareness.
http://hdl.handle.net/10395/2720
Using a corpus to look at variational pragmatics: response tokens in British and Irish discourse
(2008)
O'Keeffe, Anne; Adolphs, Svenja
http://hdl.handle.net/10395/1796
We can check it in the corpus shur: Framing the use of corpus and corpus methodologies through an investigation of the pragmatic marker shur in Irish English
(2013)
Clancy, Brian; Vaughan, Elaine Claire
Abstract:
We can check it in the corpus shur: Framing the use of corpus and corpus methodologies through an investigation of the pragmatic marker shur in Irish English.
http://hdl.handle.net/10395/2798
Institution
Dublin Institute of Technology (3)
Mary Immaculate College (12)
NUI Galway (3)
Trinity College Dublin (1)
University of Limerick (1)
Item Type
Book chapter (6)
Conference item (8)
Journal article (6)
Peer Review Status
Peer-reviewed (14)
Non-peer-reviewed (3)
Unknown (3)
Year
2020 (1)
2019 (2)
2018 (3)
2015 (1)
2013 (1)
2012 (1)
2011 (1)
2010 (1)
2008 (2)
2007 (1)
2006 (1)
2004 (4)
1999 (1)