The Classical Language Toolkit is happy to present its Google Summer of Code projects for 2018:
- Eleftheria Chatziargyriou (Aristotle University of Thessaloniki): Extending NLP functionality for Germanic Languages
- Andrew Deloucas (Harvard University): The Road to CDLI’s Corpora Integration into CLTK: An Undertaking
- James Gawley (University at Buffalo): Expanding the CLTK with Synonyms, Translations and Word Embeddings
As we see it, each of this year’s projects has the potential to move the CLTK forward in its own way.
Eleftheria’s project takes on one of the CLTK’s primary development paths: extending the core tools to a new language or language group. Her work will be the first substantial introduction of resources for Germanic languages in the CLTK, a welcome addition.
Andrew’s project moves in a similar direction, but with a different focus. The incorporation of cuneiform languages in the CLTK is a clear desideratum. The project needs not only to extend and develop existing corpora, resources, and tools, but also to rethink how these tools work at a basic level, taking into account encoding and multiple transliteration schemes, among other things. A challenge to be sure, but also an opportunity to open up future development for other pictographic/iconographic text systems.
James’s project capitalizes on an emerging priority among CLTK developers and contributors – the use of the project as the basis for cutting-edge research in natural language processing for historical languages. He will develop the CLTK’s ML models and datasets for the discovery of intra-language synonyms and inter-language translations. Another clear benefit of James’s work is that, by formalizing model production for Latin and Greek, he will also provide a path forward for the development of similar resources for other CLTK languages.
Congratulations again to this year’s CLTK Summer of Code participants. To the participants: the organizers and mentors can’t wait to start working with you on taking your projects from proposal to working code to commits to the codebase. To our development community: look forward to upcoming blog posts from each of the participants with more detailed introductions. And to everyone whose project was not selected this year: we want to acknowledge your hard work on the proposals. The simple fact is that there are more great ideas for NLP work on historical languages than GSoC spots. We look forward to seeing how your work develops and encourage you to submit again next year.