Wednesday, May 13, 2009

Is XML easier to Translate?


I've always been a proponent of XML content. I think it's faster to get things translated, and with a CMS it cuts out unnecessary emails and gives you a real translation workflow. However, there are a couple of fundamental problems in XML construction that you should address before setting up that workflow. One common problem is the use of CDATA sections.

Problem:
It's a very bad idea to use CDATA sections to separate database content from database structure if you want to translate this content. CDATA sections exist precisely to hold text and tags without having them parsed, so an XML parser treats everything inside them as plain character data. This makes it almost impossible for third-party tools like Translation Memory Systems (TRADOS/SDLX) to separate the text from the tags/elements within the CDATA sections. TRADOS/SDLX exposes the CDATA content for translation without further parsing. Therefore, you get all the tags as "real text".
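
To see the effect for yourself, here is a small sketch using Python's standard xml.etree.ElementTree parser. The record/body element names and the sample sentence are made up for illustration, and this is not how TRADOS/SDLX work internally; it only shows what any XML parser hands to a downstream tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample records: same sentence, once wrapped in CDATA, once as real markup.
CDATA_DOC = "<record><body><![CDATA[Click <b>Save</b> to continue.]]></body></record>"
INLINE_DOC = "<record><body>Click <b>Save</b> to continue.</body></record>"


def describe(xml_text):
    """Return the leading text of <body> and the tags of its child elements."""
    body = ET.fromstring(xml_text).find("body")
    child_tags = [child.tag for child in body]
    return body.text, child_tags


cdata_text, cdata_children = describe(CDATA_DOC)
inline_text, inline_children = describe(INLINE_DOC)

# With CDATA, the <b> tag is just characters inside the text and there are no child elements.
print(repr(cdata_text), cdata_children)
# Without CDATA, <b> is a real element that a tool can protect from translation.
print(repr(inline_text), inline_children)
```

In the CDATA version the translator sees the angle brackets as part of the sentence, which is exactly how tags end up deleted or mangled.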

Solution:
You should consider using namespaces instead: place the database structure XML in its own namespace and keep the database content in a second namespace (or in no namespace at all). SDL TRADOS Snippet might be a more or less usable workaround here, but it limits the tags you can "hide", and text within CDATA sections that are not closed will be hidden as well.
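
As a rough sketch of the namespace approach, again in Python with xml.etree.ElementTree: the urn:example:db namespace URI, the db: prefix, and the element names below are all invented for the example, so substitute whatever your own schema uses. Structure lives in the db namespace, and the content sits inside it with no namespace.

```python
import xml.etree.ElementTree as ET

DB_NS = "urn:example:db"  # hypothetical namespace URI for the database structure

DOC = """<db:record xmlns:db="urn:example:db" db:id="42">
  <db:field db:name="description">
    <p>Click <b>Save</b> to continue.</p>
  </db:field>
</db:record>"""


def is_structure(elem):
    """True if the element belongs to the database-structure namespace."""
    return elem.tag.startswith("{" + DB_NS + "}")


def translatable_segments(xml_text):
    """Collect content elements that sit directly inside structure elements."""
    root = ET.fromstring(xml_text)
    segments = []
    for elem in root.iter():
        if not is_structure(elem):
            continue
        for child in elem:
            if not is_structure(child):
                text = "".join(child.itertext())
                inline_tags = [e.tag for e in child.iter() if e is not child]
                segments.append((text, inline_tags))
    return segments


segments = translatable_segments(DOC)
print(segments)
```

Because the b element is real markup, a filter can tell it apart from the words around it, and everything in the db namespace can be excluded from translation by a single rule.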

I hope this helps those of you out there who use XML as a content editor, or are planning to send XML files for translation. It's always a good idea to communicate with your translation vendor, or with potential vendors when they are bidding on your project. Work together on finding ways to solve potential structural problems within the XML file. You certainly don't want unnecessary content translated, and this will allow you to preempt any exporting problems you might face with deleted or missing tags when you want to publish your translated content.

2 comments:

Madeline Ann Clayton said...

Lionel - telling it like it is! Kudos.

Iwan Davies said...

Agree with all of that. When it comes to handling tags or formatting marks in CDATA sections containing translatable text in the past, I've had to create filters in Okapi Rainbow - I can remember one particularly painful instance of having to deal with RTF code in one particular set of XML documents... Ouch!