January 3, 2019
The digital universe grows by a massive amount of structured and unstructured content every day. This flood of data comes in a broad range of standardized and proprietary formats and languages, serving countless applications around the planet. Much of this data never leaves the silo for which it was created, but business analysts and information scientists have long researched how it might be useful in other applications, markets, and languages. For example, tagging documents, data, and applications with their topics or categories could make them reusable, discoverable by other applications, and perhaps even salable.
For decades, theorists have encouraged programmers to document their code, both internally and in repository indexes, and have begged content authors to create taxonomies, structure their writing, insert tags, and harvest terminology. The problem is that most programmers and content authors, already stressed by deadlines and productivity measures, fail to enhance their work with metadata (that is, information about the content) that would broaden its usefulness. Although recent advances such as OpenCalais, which automatically enriches data with links to Thomson Reuters news articles, have simplified the task for narrow use cases, no systematic approach has emerged to support it, and the multilingual challenges in this area remain virtually unaddressed.
Automating this kind of content enrichment is the mission that the FREME project took on in February 2015.
Last week, Vistatec's CTO Phil Ritchie briefed us on the development of Ocelot, an open-source translation editor that the company is using as a deployment platform for FREME services and the backbone for its new "Deep Content" service. To start his demo, he opened a technical documentation file for translation. As soon as he did:
FREME's funders expect that the project's enhanced access to public and private sector data will generate hundreds of multilingual applications and reuse billions of open data records. If the six e-services validated by this project work as well as they did in Vistatec's technology preview, we see far more opportunity for sharing and reuse than just documents and their translations. For example, they could make marketing software far more intelligent in processing the many inputs it receives from the marketing supply and campaign management chain. They could make application code smarter and more transparent to its developers. And with their metadata available to analytics programs, they would make all this multilingual content, in innumerable formats, far more analyzable.
The European Commission has been funding language technology projects for nearly a decade under the Seventh Framework Programme (FP7) and Horizon 2020. At conferences that we've attended and addressed, the EC has highlighted its goal of sparking innovation in the European development community. Projects like FREME and Falcon show that its money is well spent. Initiatives like these add real value and make cutting-edge research technology accessible to the public in Europe and beyond.
An earlier version of this post omitted iMinds, the eighth partner in the FREME consortium.