TBX:2019: A New Version of the ISO Standard Raises the Bar

July 17, 2019

Arle Lommel

Localization industry veterans may recall when the OSCAR standards group in the now-defunct Localization Industry Standards Association introduced TermBase eXchange (TBX) way back in 2002, based on earlier work from 1999. Released in the early days of XML, it promised to be a major step forward for making terminological data useful. After it was adopted as an international standard (ISO 30042) in 2008, it seemed that it had reached maturity and a firm place as a star among language industry standards. However, TBX never quite lived up to its potential. A new version, released this year, could rehabilitate its position and prepare it for the next generation of content applications.

Translation tool vendors claimed to support TBX, but their products never quite managed to interoperate properly with competing and complementary terminology tools. As a result, many – if not most – LSPs and translators continued to exchange terminology via spreadsheets or CSV files, even though these mechanisms have serious problems, such as inconsistent formats and encodings and a lack of vital metadata. Even though TBX represented a good solution to such difficulties, users preferred the apparent simplicity of a spreadsheet.

TBX Grows Up

The situation has recently changed. Over the past few years TBX underwent a major overhaul to address its limitations and prepare it to meet new goals. A steering committee in ISO Technical Committee 37 (composed of representatives from ASTM, CSA Research, FH Köln, LTAC Global, Universidad de Las Palmas de Gran Canaria, Kent State University, and the XLIFF committee) recently completed the 2019 version of ISO 30042, which ISO has now published. This new edition streamlines the format and addresses many of the complaints about and limitations of the 2008 version. Some of the major changes are:

  • Updated XML syntax. The earlier version adopted a syntax where data categories appeared as attribute values in the XML code. Since that version appeared, XML best practice has shifted to using tag names for this purpose. As a result, TBX now supports two “styles” of XML: the original DCA (Data Categories as Attributes) and a newer DCT (Data Categories as Tag names). In the long term, practice may evolve to DCT exclusively, and the current format provides a migration path for existing TBX implementations. Other changes that apply to both styles are designed to make it easier to parse and work with TBX files and to use terminological data with XLIFF.
  • Dialects simplify adoption. Perhaps the biggest impediment in the past has been that TBX does not define a single format for terminological data, but instead a way to represent the different formats various termbases use. As a result, many different data sets with different models have proven to be incompatible for interchange purposes. The newer version defines several “dialects” of TBX intended for common use and data interchange. The availability of standard dialects will remove a lot of guesswork and provide specific implementation targets for tool developers. In addition, the official dialects – TBX-Core, TBX-Min, and TBX-Basic – “telescope” into each other: Each one is a progressive superset of the preceding one, which facilitates interoperability between them. The standard also provides approaches for handling customized data categories and for developing custom dialect extensions.
     
  • Required dialect names. The 2008 version was problematic because implementers often ignored the requirement to declare, in a separate file attached to every document, which variant of TBX they were using. As a result, when someone received a file, there often was no way of knowing what data categories it would contain. The new version makes this declaration mandatory and replaces the separate file with a dialect name, so that implementers know what to expect from a given TBX document. No longer will someone receive a “TBX file” with no guidance concerning which data categories it implements. Creators of customized extensions to dialects are required to post formal dialect definitions as links (using XML namespaces for the DCT style) where users can find the information they need to ensure reliable interchange scenarios.
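
To make the difference between the two styles and the dialect declaration concrete, here is a minimal Python sketch. It is not taken from the standard: the namespace URIs, the `type` and `style` attributes on the root element, and the element names are assumptions modeled on publicly available TBX samples, the sample documents are trimmed (they omit the header a real file would carry), and `read_terms` is a hypothetical helper. Check all of these against ISO 30042:2019 and the TBXinfo.net resources before relying on them.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs, modeled on public TBX samples (not verified here
# against the text of ISO 30042:2019).
TBX_NS = "urn:iso:std:iso:30042:ed-2"
MIN_NS = "http://www.tbxinfo.net/ns/min"

# DCA style: the data category name is the value of a "type" attribute.
DCA_SAMPLE = f"""<tbx xmlns="{TBX_NS}" type="TBX-Min" style="dca" xml:lang="en">
  <text><body>
    <conceptEntry id="c1">
      <langSec xml:lang="en">
        <termSec>
          <term>termbase</term>
          <termNote type="partOfSpeech">noun</termNote>
        </termSec>
      </langSec>
    </conceptEntry>
  </body></text>
</tbx>"""

# DCT style: the data category name is the tag name, in a dialect namespace.
DCT_SAMPLE = f"""<tbx xmlns="{TBX_NS}" xmlns:min="{MIN_NS}" type="TBX-Min" style="dct" xml:lang="en">
  <text><body>
    <conceptEntry id="c1">
      <langSec xml:lang="en">
        <termSec>
          <term>termbase</term>
          <min:partOfSpeech>noun</min:partOfSpeech>
        </termSec>
      </langSec>
    </conceptEntry>
  </body></text>
</tbx>"""


def read_terms(xml_text):
    """Hypothetical helper: return (dialect, style, terms) for a TBX-like file.

    Raises ValueError when the root element does not declare a dialect name
    and a known style, mirroring the idea that the declaration is mandatory.
    """
    root = ET.fromstring(xml_text)
    dialect = root.get("type")
    style = root.get("style")
    if not dialect or style not in ("dca", "dct"):
        raise ValueError("missing or unknown dialect/style declaration")

    terms = []
    for term_sec in root.iter(f"{{{TBX_NS}}}termSec"):
        term = term_sec.findtext(f"{{{TBX_NS}}}term")
        if style == "dca":
            # Look for a generic element whose "type" names the category.
            pos = None
            for note in term_sec.findall(f"{{{TBX_NS}}}termNote"):
                if note.get("type") == "partOfSpeech":
                    pos = note.text
        else:
            # Look for an element named after the category itself.
            pos = term_sec.findtext(f"{{{MIN_NS}}}partOfSpeech")
        terms.append({"term": term, "partOfSpeech": pos})
    return dialect, style, terms


if __name__ == "__main__":
    for sample in (DCA_SAMPLE, DCT_SAMPLE):
        print(read_terms(sample))
```

Both samples yield the same term data once parsed; only the place where the data category name lives differs. Because the dialect name is read from the document itself, a consumer can decide up front whether it supports the file instead of guessing at its data categories.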

To simplify implementation, the TBX Steering Committee set up TBXinfo.net with guidance, tools, and resources for implementers. This site helps ensure that materials needed to work with TBX are open to the public and freely available. By contrast, the standard itself, which carries a price of CHF158 (~US$160), has been streamlined and shortened to reduce cost. In most cases, only developers will need to purchase ISO 30042; other interested parties will find answers to their questions at the TBXinfo.net site.

TBX Plays a Vital Role in the Intelligent Content World

Why does this matter to language service providers and enterprise content creators? The most common type of translation error is failure to comply with terminology. Although TBX cannot resolve every problem, it does provide a standards-based approach to exchanging data about terms and implementing best practices for terminology management. Managing and controlling terminology is also a key requirement for creating intelligent content and translating it. Terminology management is thus set to become more important in the language industry, especially as TBX guides processes past spreadsheets to automated workflows and deployment of terminological resources.

The changes to TBX have modernized it and prepared it for the next generation of content applications. The new version resolves many of the challenges that implementers of the previous version faced and sets TBX up to fill a vital role in the language industry and intelligent content applications.

Meet Our Analyst

Arle Lommel

VP of Research

After obtaining a BA in linguistics in 1997, I began working for the now-defunct Localization Industry Standards Association (LISA), where I headed up standards development and worked on quality assessment models. At the same time, I completed a PhD in ethnographic research at Indiana University in 2011. In 2012 I began work for the German Research Center for Artificial Intelligence (DFKI) in Berlin, Germany, where I headed up development of the Multidimensional Quality Metrics (MQM) system for quality evaluation and worked on various EU and German government-funded projects. In 2015 I returned to the United States and began working for CSA Research in January 2016. In my life I have lived in Alaska, Utah, Indiana, Hungary, and Germany. I speak English, Hungarian, and German, as well as bits and pieces of many other languages.
