Instruction introduction
One of the challenges with software localization is the sheer variety of file formats. Instructions on how to translate all these different software file types tend to pile up. However, software development moves so quickly that these detailed instructions are often obsolete by the time anyone reads them, and software companies and their language services partners waste time following them.
The ideal, therefore, is to write your software in such a way as to eliminate the need for localization and translation instructions altogether. A savvy person might ask, "How much cheaper will my software localization project be if I spend money on creating the software so that instructions are not needed?" Unfortunately, the answer is likely "Not very much and probably not at all."
So why spend money for no apparent gain? There are three reasons:
1) better quality
2) faster turnaround
3) money saved on your own engineering resources
This article will look at a couple of areas where software itself can be improved to streamline software localization.
Turn down the volume
First, look at the volume of instructions. We often receive lengthy instructions on what to translate in RC and Java properties files. These formats are well understood and should need no instructions, but we are duty bound to read your instructions nonetheless. If it does not need explaining, then don't explain it. Since problems arise when you deviate from standard file formats, do explain any deviations you introduce. As a general rule, it is good to stick with unadulterated, well-understood file formats, as these require no special instructions.
A common and painful "solution" to the problem of instructions is to extract all translatable text from the source files and paste it into an Excel spreadsheet. It is tempting to think that extracting software text and pasting it into Excel "helps" the translator; however the opposite is actually true. By eliminating non-translatable code, the amount of context the translator has is reduced. An alphabetically sorted list gives the translator no chance whatsoever to build up a mental picture of what the software might look like once it is running. Additionally, since Excel is one of the worst file formats for manipulating text, this approach costs you a lot of money.
From a technical perspective, translation tools do not process Excel files very well and we often end up copying the translatable text into a text file to get around this problem. Once the text is translated, we then have to paste the translation back into Excel. This manual process racks up billable hours, adding cost and delays to your localization project, as well as possibly introducing errors. Let Excel work with numbers and not text localization.
An obvious aside: CSV files are even more problematic than XLS/XLSX files, despite being plain text. The character encoding of a CSV file depends on the operating system it is saved on, so saving one out of Excel invites all manner of character conversion. Even when Excel is not used, CSV converters frequently mishandle phrases that contain commas. All in all, CSV is worse than Excel, which is about as bad as it gets.
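The comma problem is easy to reproduce. The sketch below uses Python's standard csv module and a made-up two-row string table (the IDs and strings are invented for illustration). It compares a naive converter that simply joins fields with commas against a writer that quotes fields properly. It is only meant to show why a translated phrase containing a comma can end up split across columns, not how any particular converter works.

```python
import csv
import io

# Hypothetical string table: (resource ID, translatable text).
rows = [
    ("MSG_SAVE", "Save changes, then close"),
    ("MSG_QUIT", 'Click "Quit" to exit'),
]

# A naive converter that joins fields with commas and does no quoting.
naive_lines = [",".join(row) for row in rows]
naive_parsed = list(csv.reader(naive_lines))

# The csv module quotes any field that contains a comma or a double quotation mark.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
safe_parsed = [tuple(r) for r in csv.reader(io.StringIO(buffer.getvalue()))]

# The comma inside the phrase has split the first naive row into three fields.
print(len(naive_parsed[0]))
# The properly quoted version survives the round trip unchanged.
print(safe_parsed == rows)
```

Even with correct quoting, the encoding question remains, so this illustrates only one half of the CSV problem.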
Your text is "special"
The final point I would like to mention in this article is the use of special characters, in particular the use of single (') and double (") quotation marks in Java properties files. During localization, these creatures cause considerable concerns, so it is a good idea to design away the problem from the start.
Quotation problems arise because single and double quotation marks are used in translations even where there are none in the (usually English) source. In German, for example, double quotation marks are used to indicate software options, whilst the single quotation mark, also known as the apostrophe, is used extensively in both French and Italian. In an English manual you may encounter something along the lines of: After you have made your selection press the Next button. In English we use capitalization or bold to indicate software options. The equivalent phrase in German would read: After you have made your selection press the "Next" button. Some software companies forbid translators from inserting double quotation marks where none appear in the source. However, this leads to a stilted translation in German. Things get more difficult still when you take the same approach with single quotation marks, as these are essential in French and Italian. Leaving them out means the translations will be incorrect.
From a design point of view it is therefore preferable to enclose strings in double quotation marks rather than single quotation marks. Doing this allows French and Italian translators to add apostrophes (single quotation marks) as and when they are needed, without breaking the code.
So why not just ask the translators to escape the quotation marks when needed?
Programmers, that's why. Developers often do not know which escape sequence to use in the bit of code that translators later convert. It therefore becomes a question of trial and error in deciding which escape sequence to use. We see many different examples of escaping a single quotation mark (e.g. '', \', \\' and \\\'). However, developers can be an inconsistent bunch, and between any two developers, there may not be a firm standard. Often escape sequences differ from file to file and in some cases even from string to string. Finding out which sequence to use is expensive both for the localizers and for your engineers who have to check in code multiple times until the correct escape sequences are found. The trial-and-error method prolongs the localization cycle considerably and racks up your costs.
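One way to take some of the guesswork out is to audit your own files before handing them off. The following is a minimal sketch, assuming the strings are available as plain text. The function name apostrophe_styles and the sample keys and strings are hypothetical. It simply counts how each apostrophe is written, so that mixed escaping styles show up before the translators have to guess.

```python
import re
from collections import Counter

# Matches an optional run of backslashes followed by one or two apostrophes.
APOSTROPHE = re.compile(r"(\\*)('{1,2})")

def apostrophe_styles(text):
    """Count each distinct way an apostrophe is written in `text`."""
    styles = Counter()
    for match in APOSTROPHE.finditer(text):
        styles[match.group(0)] += 1
    return styles

# Hypothetical file contents; keys and strings are invented for illustration.
sample = "\n".join([
    r"menu.open=Ouvrir l''archive",
    r"menu.save=Enregistrer l\'archive",
    r"menu.close=Fermer l\\'archive",
    r"menu.help=Afficher l'aide",
])

styles = apostrophe_styles(sample)
for style, count in sorted(styles.items()):
    print(f"{style}: {count}")
print("consistent" if len(styles) <= 1 else "mixed escaping styles found")
```

A report like this does not tell you which style is correct for a given file. It only flags that more than one is in use, which is the point at which a firm standard needs to be agreed.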
Another reason why using escape sequences for apostrophes is not a good idea is that spell checkers do not always know how to handle them. Therefore, as a general rule, do not use single quotation marks to delimit your strings.
This still leaves the problem of how to handle double quotation marks. As with single quotation marks, we have seen a myriad of escape schemes for double quotation marks. The best of the bunch, from a processing point of view, is to double up the double quotation marks. This works well in RC files, so for consistency, and thus to speed up your localization project, use doubled double quotation marks in other software files as well.
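Doubling is also simple to automate, which is part of its appeal. As a rough sketch, the two helpers below (escape_quotes and unescape_quotes are hypothetical names, and the German string is invented for illustration) convert between the translator's text and the doubled form, and check that nothing is lost on the way back.

```python
def escape_quotes(text):
    """Double every double quotation mark, as is done inside RC string literals."""
    return text.replace('"', '""')

def unescape_quotes(text):
    """Collapse each doubled double quotation mark back into a single one."""
    return text.replace('""', '"')

# Hypothetical German translation that adds quotation marks around an option name.
german = 'Klicken Sie auf "Weiter".'

escaped = escape_quotes(german)
print(escaped)

# Converting back should give exactly what the translator typed.
print(unescape_quotes(escaped) == german)
```

The translator can then type ordinary quotation marks and leave the doubling to a processing step, rather than entering escape sequences by hand.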
Instructive ending
In summary, reduce the volume of your instructions by eliminating the need for them. Get rid of the exceptions, and the exceptions to the exceptions (ad infinitum). Lengthy instructions increase the effort and the time required to process your files.
Also, remove Excel from your localization process. Leave word whacking to software designed for such purposes.
Last, but perhaps most importantly, understand and be consistent in the use of characters which need to be escaped. The words "trial" and "error" are as painful as they sound, and a fine way to torture your localization team.