Artificial intelligence, Automated content enrichment

FREME Services Power Vistatec Automated Content Enrichment

January 3, 2019

CSA Research

The digital universe grows by a massive amount of structured and unstructured content every day. This flood of data comes in a broad range of standardized and proprietary formats and languages for uncounted application needs around the planet. Much of this data never leaves the silo for which it was created, but business analysts and information scientists have long researched how it might be useful in other applications, markets, and languages. For example, tagging the topics or categories of documents, data, and applications could make them reusable, discoverable by other applications, and perhaps even salable.
For decades, theorists have encouraged programmers to document their code, both internally and in repository indexes, and have begged content authors to create taxonomies, structure their writing, insert tags, and harvest terminology. The problem is that most programmers and content authors – already stressed by deadlines and productivity measures – fail to enhance their work with metadata (that is, information about the content) that would broaden its usefulness. Although recent advances – such as OpenCalais, which automatically enriches data with links to Thomson Reuters news articles – have simplified the task for narrow use cases, no systematic approach has emerged to support it, and the multilingual challenges in this area remain virtually unaddressed.
Automating such content enrichment is the mission that the FREME project undertook in February 2015.

The project. Funded by a European Commission Horizon 2020 Innovation Action, it will determine whether six multilingual and semantic technologies are ready for use in real-life business cases: 1) the Internationalization Tag Set (ITS) for tagging information related to content internationalization and localization; 2) Linked Data based on the Natural Language Processing Interchange Format (NIF), the DBpedia Ontology, and RDF; 3) an entity processing service to recognize, link, and classify entities – such as names, places, and events – in multilingual texts; 4) a terminology service to identify, manage, and annotate terms; 5) a machine translation service; and 6) a publishing service to package and export content in the open EPUB3 format.
The goal. FREME's mission is to "build an open, innovative, commercial-grade framework of e-services for multilingual and semantic enrichment of digital content. Its e-services will be capable to process (harvest and analyze) content, capture datasets, and add value throughout content and data value chains across sectors, countries, and languages." The term the project employs is "content enrichment," but what these language technologies and linked data services allow for is the automated creation of metadata that can accompany code, data, and files wherever they go.
The partners. The eight FREME partners are: language technology firms Tilde in Latvia and Vistatec in Ireland; web analytics firm Wripl in Ireland; agricultural and food sciences specialist Agroknow in Greece; iMinds, a use case partner for the publishing industry, in Belgium; and three research institutes, ISMB in Italy and DFKI and InfAI, both in Germany.

Last week, Vistatec's CTO Phil Ritchie briefed us on the development of Ocelot, an open-source translation editor that the company is using as a deployment platform for FREME services and the backbone for its new "Deep Content" service. To start his demo, he opened a technical documentation file for translation. As soon as he did:

Ocelot immediately began executing a pipeline of instructions behind the scenes. It called a cloud-based MT server to translate the document, identified acronyms and looked them up in DBpedia, harvested terms, and presented several other metadata elements to the editor. In just a few seconds, it produced metadata that would help a content author, translator, or reviewer better understand what the file contains and how it could be used.
The Linked Data service used the URLs assigned to all of the participating components and software to define and document the relationships among them. These stored links let the objects communicate their attributes to one another, and because the links persist, the internet becomes a vast database of semantic and business connections among those objects.
As users open more files and applications, Ocelot picks up more attributes about what they do and how they do it, thus increasing the value of the content they contain – and eliminating much of the tedious work associated with today's mostly manual content-enrichment schemes. In the translation and localization arena, both end-buyers and language service providers will benefit, with relatively little effort, from the intelligent content that Vistatec's innovation brings to the table. Search engines, too, will be able to zero in with laser-like focus on the attributes these files carry with them, making today's search tools seem crude by comparison.
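The behind-the-scenes pipeline described above can be pictured with a minimal Python sketch. Everything in it is a hypothetical illustration, not Ocelot's or FREME's actual API: the stage names, the `Document` class, the small acronym table standing in for a DBpedia lookup, and the placeholder translation step are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """A piece of content plus the metadata that enrichment attaches to it."""
    text: str
    metadata: Dict[str, object] = field(default_factory=dict)


# Hypothetical stand-in for a DBpedia lookup: acronym -> linked-data URL.
ACRONYM_LINKS = {
    "MT": "http://dbpedia.org/resource/Machine_translation",
    "RDF": "http://dbpedia.org/resource/Resource_Description_Framework",
}


def translate(doc: Document) -> Document:
    # Placeholder for a call to a cloud MT server; it only tags the text.
    doc.metadata["translation"] = f"[de] {doc.text}"
    return doc


def link_acronyms(doc: Document) -> Document:
    # Find runs of two or more capitals and attach a link where one is known.
    found = re.findall(r"\b[A-Z]{2,}\b", doc.text)
    doc.metadata["acronyms"] = {a: ACRONYM_LINKS.get(a) for a in dict.fromkeys(found)}
    return doc


def harvest_terms(doc: Document) -> Document:
    # Naive term harvesting: keep distinct words longer than eight letters.
    words = re.findall(r"[A-Za-z]+", doc.text)
    doc.metadata["terms"] = sorted({w.lower() for w in words if len(w) > 8})
    return doc


# The pipeline is an ordered list of stages, each adding its own metadata.
PIPELINE: List[Callable[[Document], Document]] = [
    translate,
    link_acronyms,
    harvest_terms,
]


def enrich(text: str) -> Document:
    doc = Document(text)
    for stage in PIPELINE:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    result = enrich("The MT server returns RDF annotations for localization.")
    for key, value in result.metadata.items():
        print(key, "->", value)
```

Because each stage only adds keys to the shared metadata dictionary, a new enrichment service could be appended to the list without touching the others, which is the property that makes this kind of automated pipeline cheaper than manual tagging.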

FREME's funders expect that the project's enhanced access to public and private sector data will generate hundreds of multilingual applications and reuse billions of open data records. If the six e-services validated by this project work as well as they did in Vistatec's technology preview, we see far more opportunity in sharing and reuse than just documents and their translations. For example, it could make marketing software far more intelligent in processing the many inputs that it gets from the marketing supply and campaign management chain. It can make application code smarter and more transparent to its developers. And with its metadata available to analytics programs, it will make all this multilingual content in innumerable formats far more analyzable.
The European Commission has been funding language technology projects for nearly a decade under the Seventh Framework Programme (FP7) and Horizon 2020. At conferences that we've attended and addressed, the EC has highlighted its goal of sparking innovation in the European development community. Projects like FREME and Falcon show that its money is well spent. Initiatives like these add real value and make cutting-edge research technology accessible to the public in Europe and beyond.
An earlier version of this post omitted iMinds, the eighth partner in the FREME consortium.
