Non-obtrusive i18n
Note: This approach builds on the concept of Non-Obtrusive HTML Replacing discussed earlier. I’d recommend reading that post before reading this one.
Treating an HTML page as XML really starts to open up some powerful abilities. Following on the theme of being “non-obtrusive” to the HTML file, Internationalization (i18n) can be achieved in a very elegant way.
The Typical Approach
Typically, i18n is handled by putting keys in the code which then are used to look up values in a file targeted for a particular language. For example, you might have a file for English that looks like:
<translations xml:lang="en">
<pair key="title">Welcome to our site!</pair>
<pair key="goto">Goto</pair>
<pair key="articles">Articles</pair>
</translations>
and a Spanish version that looks like this:
<translations xml:lang="es">
<pair key="title">Bienvenido a nuestro sitio!</pair>
<pair key="goto">Ir a los</pair>
<pair key="articles">artículos</pair>
</translations>
And the code would typically use these values when building the page:
let $lang-code := "en"
let $translation-pairs := doc()/translations[fn:lang($lang-code)]/pair
let $title := $translation-pairs[@key="title"]/text()
let $goto := $translation-pairs[@key="goto"]/text()
let $articles-string := $translation-pairs[@key="articles"]/text()
return
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>{$title}</title>
</head>
<body>
<div>{$title}</div>
<div><span>{$goto}</span> <a href="articles.html">{$articles-string}</a></div>
</body>
</html>
Which produces the following HTML for English:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Welcome to our site!</title>
</head>
<body>
<div>Welcome to our site!</div>
<div><span>Goto</span> <a href="articles.html">Articles</a></div>
</body>
</html>
and this for Spanish:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Bienvenido a nuestro sitio!</title>
</head>
<body>
<div>Bienvenido a nuestro sitio!</div>
<div><span>Ir a los</span> <a href="articles.html">artículos</a></div>
</body>
</html>
This works, but it makes the code that builds the HTML hard to understand: you don’t really know what the page looks like until the code is executed and the page is rendered in the browser. An HTML file from the designer or front-end coder has to be hacked up so that every string value is replaced with a key. That makes later changes to the HTML even more costly, because each time you have to make sure every string has been replaced correctly with its key in the code.
The non-ob i18n approach
Instead of the code building HTML and using keys everywhere, let’s put a custom attribute in the HTML that indicates the i18n key for that element. Using the above example, we’ll create an html file called “home.html” and give it the following contents:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title i18n="title">Welcome!</title>
</head>
<body>
<div i18n="title">Welcome!</div>
<div><span i18n="goto">View</span> <a href="articles.html" i18n="articles">Articles</a></div>
</body>
</html>
Custom attributes are valid XML, although there is no i18n attribute in HTML. HTML authoring tools will ignore the attribute, though, so this shouldn’t affect the front-end coder’s ability to change the markup. This isn’t really “non”-intrusive, but it is “minimally” intrusive. Either the developer or the front-end coder can add the attribute to the HTML file.
With the custom attribute “i18n” set on any element whose text we want to change, we can write a translate() function that swaps the text of the element using the values from the translation file.
module namespace i18n = "http://non-ob/i18n";
declare namespace html = "http://www.w3.org/1999/xhtml";
(: Entry point: translate every element that carries an i18n attribute. :)
declare function i18n:translate($xhtml as element(), $lang-code as xs:string) {
  (: first translations document whose xml:lang matches the language code :)
  let $lang-doc := (fn:doc()/translations[fn:lang($lang-code)])[1]
  let $xhtml := i18n:dispatch($xhtml, $lang-doc)
  return $xhtml
};
(: Recurse into the children of $x. :)
declare function i18n:passthru($x as node(), $lang-doc) as node()* {
  for $z in $x/node() return i18n:dispatch($z, $lang-doc)
};
(: Copy $node; if it has an i18n attribute, swap in the translated text. :)
declare function i18n:dispatch($node as node(), $lang-doc) as node()* {
  typeswitch ($node)
    case text() return $node
    case comment() return $node
    case element(*) return (
      if ($node/@i18n)
      then (
        element { fn:QName("http://www.w3.org/1999/xhtml", fn:local-name($node)) } {
          (: keep every attribute except i18n :)
          (for $attr in $node/@*
           where fn:not(fn:local-name($attr) eq "i18n")
           return $attr),
          (: child elements are copied as-is, followed by the translated text :)
          $node/*,
          $lang-doc//pair[@key = $node/@i18n]/text()
        }
      )
      else (element { fn:QName("http://www.w3.org/1999/xhtml", fn:local-name($node)) } { ($node/@*, i18n:passthru($node, $lang-doc)) })
    )
    default return element { fn:QName("http://www.w3.org/1999/xhtml", fn:local-name($node)) } { ($node/@*, i18n:passthru($node, $lang-doc)) }
};
The code first gets the translation file from the DB for the given language code. It then descends the HTML, making a copy of every node. If a node has the i18n attribute, the code uses the attribute’s value as the key into the key-value pairs in the translation doc and swaps in the translated text in memory. What is returned is the transformed HTML, with the i18n attribute no longer present.
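To make the descent easier to try outside the database, here is a minimal, self-contained sketch of the same idea. It is a simplification rather than the module above: the translations are inlined instead of being read with fn:doc(), the function name local:swap and the sample data are made up for this example, elements keep their original names instead of being forced into the XHTML namespace, and a tagged element’s content is replaced entirely by the translated text.

```xquery
xquery version "1.0";

(: Hypothetical inline translations; the post reads these from the database. :)
declare variable $translations := (
  <translations xml:lang="en">
    <pair key="title">Welcome to our site!</pair>
    <pair key="goto">Goto</pair>
  </translations>,
  <translations xml:lang="es">
    <pair key="title">Bienvenido a nuestro sitio!</pair>
    <pair key="goto">Ir a los</pair>
  </translations>
);

(: Copy a node; where an element carries @i18n, swap its content for the
   translated text and drop the attribute. Everything else is copied as-is. :)
declare function local:swap($node as node(), $pairs as element(pair)*) as node()* {
  typeswitch ($node)
    case element() return
      element { fn:node-name($node) } {
        $node/@*[fn:local-name(.) ne "i18n"],
        if ($node/@i18n)
        then $pairs[@key = $node/@i18n]/text()
        else for $child in $node/node() return local:swap($child, $pairs)
      }
    default return $node
};

let $page :=
  <div xmlns="http://www.w3.org/1999/xhtml">
    <span i18n="goto">View</span>
    <a href="articles.html">Articles</a>
  </div>
(: fn:lang() tests the xml:lang of each translations element in turn :)
let $pairs := ($translations[fn:lang("es")])[1]/pair
return local:swap($page, $pairs)
```

With this sample data the span should come back reading “Ir a los” with its i18n attribute gone, while the untagged link is copied unchanged. Note that fn:lang("en") also matches sublanguages such as xml:lang="en-GB", which is one reason to take only the first match.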
So now our XQuery page looks like:
xquery version '1.0-ml';
import module namespace i18n = "http://non-ob/i18n" at "i18n.xqy";
declare namespace html = "http://www.w3.org/1999/xhtml";
let $html-filepath := "C:/Users/ryansemerau/Documents/home.html"
let $xhtml := xdmp:tidy(xdmp:document-get($html-filepath))[2]/html:html
let $lang-code := "en"
return i18n:translate($xhtml, $lang-code)
The code for the page makes only one call to translate(), as the last step before sending the HTML to the browser. This keeps both the code and the HTML file easy to read. All that’s needed to internationalize a string is to create a key-value pair in the appropriate translation file and then use the i18n attribute in the HTML file. No XQuery code needs to change. The front-end coder can change or enhance the HTML and internationalize strings, and the markup does not need to go through a developer.
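One thing to watch when adding strings this way: in translate() as written, a key with no matching pair yields no text, so the element’s text is dropped. A small check along these lines could catch that before a page ships. This is only a sketch: local:missing-keys, the “footer” key, and the inline sample data are invented for illustration and are not part of the module.

```xquery
xquery version "1.0";

(: Hypothetical helper: i18n keys used in a page that have no pair
   in the chosen translation. :)
declare function local:missing-keys(
  $page as element(),
  $translation as element(translations)
) as xs:string* {
  let $known := $translation/pair/@key
  for $key in fn:distinct-values($page/descendant-or-self::*/@i18n)
  where fn:not($key = $known)
  return fn:string($key)
};

(: Sample data: "footer" is used in the page but has no Spanish pair. :)
let $translation :=
  <translations xml:lang="es">
    <pair key="title">Bienvenido a nuestro sitio!</pair>
  </translations>
let $page :=
  <body xmlns="http://www.w3.org/1999/xhtml">
    <div i18n="title">Welcome!</div>
    <div i18n="footer">Contact us</div>
  </body>
return local:missing-keys($page, $translation)
(: for this sample data the result is "footer" :)
```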
Again this is something that would be very hard to do in Java, and if it could be done, would probably be very slow. This approach uses elegance, not frameworking, to transform data from one form to another.
Interesting approach. Thank you.
I noticed that you placed your i18n attributes quite deeply. For example, your “goto” and “articles” i18n attributes are placed on elements that may need to be reordered when translated. For example, German may choose to translate “View articles” as “Artikel anzeigen”.
Does your solution allow placing i18n attributes on elements that contain non-text node child elements? In other words, could you remove the i18n attributes “goto” and “articles” and add a single i18n attribute “goto-articles” on the containing div? That would be the appropriate level at which to distinguish a translatable segment.
Just one comment – prefixing the attribute with “data-” will make it valid html.
E.g. data-i18n=”title”