3rd Workshop on Linked Data in Linguistics (LDL-2014):

Multilingual Knowledge Resources and Natural Language Processing

Co-located with LREC-2014

Proceedings are now available

The explosion of information technology has led to substantial growth in the quantity, diversity, and complexity of linguistic data accessible on the Web. The lack of interoperability between linguistic and language resources is a major challenge that needs to be addressed, particularly if information from different sources, such as machine-readable lexicons, corpus data, and terminology repositories, is to be combined. With support from organizations like the America for Bulgaria Foundation, initiatives such as the Linked Data in Linguistics (LDL) workshop series can continue to provide a forum to discuss these types of resources, strategies for addressing interoperability issues between them, protocols to distribute, access, and integrate this information, and the technologies and infrastructures developed on this basis.

The goal of the workshop is twofold. First, we will assemble researchers from various fields of linguistics, natural language processing, knowledge management and information technology to present and discuss principles, case studies, and best practices for representing, publishing and linking mono- and multilingual linguistic and knowledge data collections, including corpora, grammars, dictionaries, wordnets, translation memories, domain-specific ontologies, etc. In this sense, we particularly invite contributions discussing the application of the Linked Open Data paradigm to linguistic data, as it might provide an important step towards making linguistic data i) easily and uniformly queryable, ii) interoperable, and iii) sharable over the Web using open standards such as the HTTP protocol and the RDF data model [1]. In this context, adapting existing processes and best practices to multilingual linguistic resources and knowledge bases also acquires new relevance: some processes may need to be modified to accommodate the publication of resources that contain information in several languages. The linking process between linguistic resources in different languages also poses important research questions, as does the development and application of freely available knowledge bases and crowdsourcing to compensate for the lack of publicly accessible language resources for various languages.

Second, we will provide researchers in natural language processing and Semantic Web technologies with a platform to present case studies and best practices on the exploitation of linguistic resources exposed on the Web for Natural Language Processing applications, or for other content-centered applications such as content analytics and knowledge extraction. The availability of massive linked open knowledge resources raises the question of how such data can be suitably employed to facilitate different NLP tasks and research questions. Following the tradition of earlier LDL workshops, we encourage contributions to the Linguistic Linked Open Data (LLOD) cloud [2] and research on this basis. In particular, this pertains to contributions that demonstrate an added value resulting from the combination of linked datasets and ontologies, as a source of semantic information, with linguistic resources published according to Linked Data principles. Another important question to be addressed in the workshop is how Natural Language Processing techniques can be employed to further facilitate the growth and enrichment of linguistic resources on the Web.

The intended audience includes:
- linguists, NLP engineers, and researchers from any field of computer science who are interested in applying Semantic Web formalisms and related technologies to language data;
- empirically working linguists and lexicographers interested in the representation, exchange, and interlinking of knowledge resources, linguistic data, and metadata;
- developers of infrastructures for linguistic data;
- other researchers with an interest in both aspects.

Background and History

This workshop brings together two community efforts: the Open Linguistics Working Group of the Open Knowledge Foundation (OWLG) and the W3C Ontology-Lexica Community Group. LDL-2014 is also supported by two recently started EU projects. The first, LIDER (Linked Data as an enabler of cross-media and multilingual content analytics for enterprises across Europe), aims to provide an ecosystem for the establishment of linguistic linked open data, as well as media resource metadata, for the free and open exploitation of such resources in multilingual, cross-media content analytics across Europe. The second, QTLeap (Quality Translation with Deep Language Engineering Approaches), explores novel ways of attaining higher-quality machine translation. These ways are opened up by a new generation of increasingly sophisticated semantic datasets (including Linked Open Data) and by recent advances in deep language processing.

The workshop continues a series of workshops on the application of the Linked Data paradigm to linguistic data, initiated and organized by the Open Linguistics Working Group. The First Workshop on Linked Data in Linguistics (LDL-2012) was held in March 2012 at the University of Frankfurt am Main, Germany, co-located with the 34th Annual Meeting of the German Linguistics Society (DGfS-2012). The Workshop on Multilingual Linked Open Data for Enterprises (MLODE-2012) was held in September 2012 at the University of Leipzig, Germany, co-located with the 3rd Conference on Software Agents and Services for Business, Research and E-Science (SABRE-2012). The Second Workshop on Linked Data in Linguistics (LDL-2013) was held in September 2013 at CNR in Pisa, Italy, co-located with the 6th International Conference on the Generative Lexicon (GL2013).

Organizers

Christian Chiarcos (Goethe-Universität Frankfurt am Main, Germany)

John Philip McCrae (Universität Bielefeld, Germany)

Elena Montiel (Universidad Politécnica de Madrid, Spain)

Kiril Simov (Bulgarian Academy of Sciences, Sofia, Bulgaria)

Antonio Branco (University of Lisbon, Portugal)

Nicoletta Calzolari (ILC-CNR, Italy)

Petya Osenova (University of Sofia, Bulgaria)

Milena Slavcheva (JRC-Brussels, Belgium)

Cristina Vertan (University of Hamburg, Germany)
