EuroGOV: Engineering a Multilingual Web Corpus

Börkur Sigurbjörnsson, Jaap Kamps, and Maarten de Rijke.

In: Accessing Multilingual Information Repositories: 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005. Lecture Notes in Computer Science. 2006.

Link: springerlink

Abstract

EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawled from the European Union portal, European Union member state governmental web sites, and Russian governmental web sites. The corpus contains over 3 million documents written in more than 20 different European languages. In this paper we provide a detailed description of the EuroGOV collection.

Bibtex

@inproceedings{sigurbjornsson2006eurogov,
  author    = {B{\"o}rkur Sigurbj{\"o}rnsson and Jaap Kamps and Maarten de Rijke},
  title     = {EuroGOV: Engineering a Multilingual Web Corpus},
  editor    = {C. Peters and F.C. Gey and J. Gonzalo and G.J.F. Jones and M. Kluck and B. Magnini and H. M{\"u}ller and M. de Rijke},
  booktitle = {Accessing Multilingual Information Repositories: 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005},
  series    = {Lecture Notes in Computer Science},
  volume    = {4022},
  pages     = {825--836},
  publisher = {Springer-Verlag},
  year      = {2006},
  doi       = {http://dx.doi.org/10.1007/11878773_90},
}