Translated by Luca De Filippis
Last weekend in Madrid, Casa del Lector was the ideal setting for Lenguando, the first national meeting on language and technology. This pioneering initiative, successfully brought to life by our colleagues at Molino de Ideas, Cálamo & Cran and Xosé Castro, was backed by Daedalus's Stilus, among other sponsors.
The aim of the conference was to bring together translators, proofreaders, philologists and other communication and language professionals in one place, with an emphasis on, among other issues, the technological revolution in the sector.
The talks on advances in language technology and the parallel workshops on their practical application were the most eagerly awaited sessions. In particular, according to the organizers, the workshop given in the main auditorium by Concepción Polo (the author of this post) on behalf of the Stilus team was one of those that attendees most looked forward to.
Corpus Linguistics applied to proofreading
With the aim of presenting innovative and, above all, practical content, the workshop considered the possible applications of Corpus Linguistics (CL) in the specific area of professional automatic proofreading.
The first aspect to arouse interest was the news (new to many) that lemmatized and morphological search is finally offered by the academic corpora Corpus del Nuevo Diccionario Histórico (CDH) and Corpus del Español del Siglo XXI (CORPES XXI). Another key topic was a brief comparison between the capabilities of these new corpora from the Royal Spanish Academy and those of the lesser-known, though magnificent and long-established, Corpus del Español by Mark Davies.
After the theory came some reflections: how and for what purpose a professional can apply Corpus Linguistics in the decision-making process of proofreading and, in addition, how to automate proofreading patterns, for example with Word macros.
In the last part of the workshop we explained how an intelligent automatic proofreader is able to address contextual issues that remain beyond the reach of a user working alone. It was time to examine and understand the pseudo C++ code on which Stilus' linguistic rules are based. The surprise among the participants without experience in Natural Language Processing lay both in the potential of this technology and in the mere fact of being able to interpret C++-style rules that handle formal, morphological, syntagmatic and even semantic elements.
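To give a flavour of what a contextual rule looks like, here is a minimal sketch in Python rather than in the pseudo C++ shown at the workshop. It is not a Stilus rule: the Token type, the tag names and the agreement check are all assumptions made for illustration, and the tokens are tagged by hand where a real proofreader would rely on a morphological analyzer.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str    # surface form as written in the text
    pos: str     # part of speech (hypothetical tag set: "DET", "NOUN", ...)
    number: str  # "sg", "pl", or "" when number does not apply


def check_det_noun_agreement(tokens):
    """Flag a determiner immediately followed by a noun of a different number."""
    issues = []
    # Walk over adjacent pairs of tokens: the rule needs context, not single words.
    for left, right in zip(tokens, tokens[1:]):
        if (left.pos == "DET" and right.pos == "NOUN"
                and left.number and right.number
                and left.number != right.number):
            issues.append(f"{left.form} {right.form}")
    return issues


# Hand-tagged example sentence (illustrative only): "this books are old"
sentence = [
    Token("this", "DET", "sg"),
    Token("books", "NOUN", "pl"),
    Token("are", "VERB", "pl"),
    Token("old", "ADJ", ""),
]

print(check_det_noun_agreement(sentence))
```

Neither "this" nor "books" is wrong on its own. The error only appears when the morphological features of two neighbouring words are compared, which is the kind of check a plain word list cannot express.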
Presentation of Stilus Macro
Indeed, the availability of tagged corpora makes it possible to carry out empirical research on the syntactic and lexical phenomena of a language on a previously unimaginable scale, and its application to computational linguistics is highly beneficial. Still, the examination of corpora shows that there are thousands of incorrect word sequences that can be detected without any morphosyntactic support, and this is precisely the purpose of Stilus Macro: an add-on (still in development) that we presented at the end of the workshop, capable of running at high speed more than 230,000 context-independent patterns for spelling, grammar and style proofreading in Word. The task is essentially simple, but unfeasible for a human.
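The idea of context-independent patterns can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Stilus Macro is implemented: the three sample patterns, the function names and the single-regex approach are hypothetical, and a real list would hold many thousands of entries.

```python
import re

# Hypothetical sample patterns: incorrect sequence -> suggested replacement.
# Keys are lowercase; these entries are not taken from Stilus Macro.
PATTERNS = {
    "could of": "could have",
    "in regards to": "with regard to",
    "alot": "a lot",
}


def compile_patterns(patterns):
    """Build one case-insensitive regex matching any listed sequence as whole words."""
    # Longest first, so longer sequences win over shorter overlapping ones.
    alternatives = sorted(patterns, key=len, reverse=True)
    body = "|".join(re.escape(p) for p in alternatives)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


def find_issues(text, patterns):
    """Return (offset, matched text, suggestion) for every match in the text."""
    regex = compile_patterns(patterns)
    return [
        (m.start(), m.group(0), patterns[m.group(0).lower()])
        for m in regex.finditer(text)
    ]


for issue in find_issues("I could of done alot more.", PATTERNS):
    print(issue)
```

Because each pattern is wrong regardless of its surroundings, no tagging or parsing is needed. The work is simple but repetitive, which is why it suits a machine far better than a human reader.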
For more information, access the full presentation.