Technical details on techniques used in the development competition
The post on winning the development competition has generated a lot of interest (relative to my other posts), so I thought I’d go into more detail about the technology we used.
First, there are some guiding principles that I follow when it comes to XQuery web apps on MarkLogic. They are not set in stone, and individual projects may deviate. But these are my starting points:
- Embrace the functional programming paradigm and the XML data model. Don’t fight it. Don’t try to turn XQuery into Java, or XML into an object model. Don’t try to reproduce Hibernate, Spring, JSP, PHP, Rails, or any other object-oriented framework in MarkLogic. Let the server do its thing. You bought the server, so use it for what it is. Otherwise, write your own server.
- Use a conceptually simple data model. Don’t start off by creating a data model structured around how it will be queried. Think of the entities that you will need and keep the number of them low. Give each type of entity minimal references to other entities and keep the entities coarse. Keep it simple.
- Make the application data- and configuration-driven for flexibility. Write the application so that it works according to the data it has. Facets are an example of this in search. The facets that are displayed are based on what came back in the results. You don’t have to specify up front what the individual facets will be. The server will figure that out at runtime. Build navigation and other options off of values in the DB. Write modular code that is discoverable at runtime. This is an advanced topic that can’t be sufficiently covered here, but the Masquerade approach (discussed below) is an example of this.
- Leave the HTML files intact. A rich interface requires plenty of HTML work, so leave it alone. Don’t shred the HTML into XQuery. Don’t make the HTML work only through a server. Let the front-end development take place in a static context, and then make replacements as needed dynamically at runtime. When the HTML needs to be modified (which is constantly), the front-end dev can easily do it because the file didn’t have to change to be served on the server. The Nonob approach discussed below is an example of this.
- Let collections and permissions restrict the data access for the user. Don’t write your own access restriction. Write code so that it just operates on the correct data based on the user’s permissions and the current collection in use.
- Code to “bare metal” as much as possible. Use the built-in APIs if they can do what you need, even if you have to compose them together a little. Avoid custom functions just to “keep the code clean”. Avoid creating “gateway” functions, that is, functions that get the data the “right” way for you. Let the code get the data it needs, when and how it needs it. Not only does this reduce the amount of code you write, it allows you to surgically get just the data you want using the predicates you want. It also reduces bug risk because the built-in APIs are much more likely to be bug-free than something custom you wrote.
- Use static JavaScript and CSS assets. MarkLogic is a powerful markup machine, so use it to handle all the markup, and let the js and CSS be static and cacheable. This way most requests to the server are just for new markup that will have very little js or CSS in it.
Ok, so how exactly do you do this? Below are some approaches, patterns, libraries, and techniques that my team used on the application:
- Non-obtrusive HTML replacement (nonob). In this approach, the front-end dev creates the static HTML with whatever js and CSS assets he wants, using dummy data in the HTML, so that it’s fully mature right off of the filesystem. He then checks the HTML and assets into source control. The XQuery dev then finds which parts of the HTML page need to be dynamically replaced and writes replacement rules in an XQuery file. The XQuery dev checks in his code and leaves the HTML alone. The MarkLogic server executes the XQuery file, pulls in the HTML, makes the replacements using the rules in the file and whatever data it needs, and returns the resulting HTML to the browser. The assets are also in MarkLogic, so those requests simply return the requested js and CSS files that the front-end dev put in. This is probably the single biggest time saver in development. There are several reasons why:
- Front-end development and server development can happen in parallel. Given a design, both the front-end dev and XQuery dev can determine what needs to be done and they can both start working on their areas. It’s common for the front-end dev to check in HTML, which the XQuery dev then refers to for replacements, even while the front-end dev keeps making changes to that HTML.
- Changes can be made at any time by front-end or XQuery devs. At any point, if either front-end devs or XQuery devs need to change code or HTML, they can do it without having to consult with each other. There are limits, but this is true for the most part.
- XQuery code only needs to focus on making the replacements, not building the page. The only XQuery code to write for HTML pages is getting the necessary data to make the replacements. Let the front-end guy deal with doing all the HTML.
- The same XQuery code can do replacements on different HTML files. You can have one HTML page for desktop, one for tablet, one for mobile, one for IE6, whatever, and the same XQuery code can be used to do the replacements for any of the files. You can also have multiple views (large image, no image, banner, etc.) of a page and the same XQuery code is used. It’s just that the input HTML page changes.
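To make the nonob idea concrete, here is a minimal sketch written in Python rather than XQuery. The post does not show the actual library, so everything here is an assumption for illustration: the function names (`apply_rules`, `set_text`, `set_items`), the rule format (a mapping from element `id` to a small function), and the requirement that the static page be well-formed XHTML.

```python
import xml.etree.ElementTree as ET

# Static page as the front-end dev wrote it, with dummy data in place.
STATIC_HTML = """<html><body>
<h1 id="title">Dummy title</h1>
<ul id="news"><li>Dummy item</li></ul>
<p id="footer">Static footer</p>
</body></html>"""


def set_text(value):
    """Rule factory (hypothetical): replace the element's text content."""
    def rule(element):
        element.text = value
    return rule


def set_items(values):
    """Rule factory (hypothetical): swap the dummy <li> children for real ones."""
    def rule(element):
        for child in list(element):
            element.remove(child)
        for value in values:
            ET.SubElement(element, "li").text = value
    return rule


def apply_rules(html, rules):
    """Visit every element once and run the rule registered for its id, if any."""
    root = ET.fromstring(html)
    # Snapshot the elements first so rules can safely add or remove children.
    for element in list(root.iter()):
        rule = rules.get(element.get("id"))
        if rule is not None:
            rule(element)
    return ET.tostring(root, encoding="unicode", method="html")


# The rules never mention the page layout, only the ids they replace,
# so the same rules could be applied to a different HTML file.
rules = {"title": set_text("About us"), "news": set_items(["First", "Second"])}
page = apply_rules(STATIC_HTML, rules)
```

Elements without a matching rule (the footer here) pass through untouched, which is what lets the front-end dev keep editing the HTML without coordinating every change.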
- Masquerade. This is a technique that abstracts the addressing of a resource from the code that will service that resource. Friendly URLs are a common example of this, where rules or configuration map URLs to pages. Masquerade takes it a few steps further by querying the DB for code that has indicated that it responds to that URL, and the appropriate code is used to generate the HTML based on the session or user agent. Here is how it works:
- Request comes into the rewriter in MarkLogic
- The rewriter queries the DB (in a particular collection, and using the permission set of the user) to find a “masq” that has a “url” element that matches the URL path. An example XML file for a masq may look like this:
<masq key="aboutus">
  <url type="preferred">/about-us</url>
  <url>/aboutus</url>
  <url>/about-us.html</url>
  <view type="mobile" xml:lang="en">
    <html>about-us-mobile.html</html>
    <code>about-us.xqy</code>
  </view>
  <view type="desktop" xml:lang="en">
    <html>about-us.html</html>
    <code>about-us.xqy</code>
  </view>
</masq>
- The rewriter picks the best “view” of the masq that fits the client.
- The rewriter calls the code that the view specifies and passes the starting HTML that the view specifies
- The code uses nonob to make replacements on the HTML and returns the HTML back to the browser.
- Making a new page and new page rule involves writing the HTML, the code, and creating a masq XML file. No changes to the rewriter are needed because masqs are queried at runtime
- Pages that are off-limits to certain users can be protected by setting the permissions on the masq XML file so that the query for masqs would not find the restricted ones for that user. You can also set the permissions on the xqy file.
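The lookup steps above can be sketched as follows, again in Python rather than XQuery. This is only an illustration under stated assumptions: the in-memory `MASQS` list stands in for the database query, the function names are hypothetical, the sample masq pairs the mobile view with `about-us-mobile.html` (an assumption about the intended mapping), and the scoring used to pick the “best” view is a guess, since the post does not say how the rewriter ranks views.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Stand-in for the masq documents the rewriter would query from the DB.
MASQS = [ET.fromstring("""
<masq key="aboutus">
  <url type="preferred">/about-us</url>
  <url>/aboutus</url>
  <url>/about-us.html</url>
  <view type="mobile" xml:lang="en">
    <html>about-us-mobile.html</html>
    <code>about-us.xqy</code>
  </view>
  <view type="desktop" xml:lang="en">
    <html>about-us.html</html>
    <code>about-us.xqy</code>
  </view>
</masq>""")]


def find_masq(masqs, path):
    """Return the first masq that lists the request path in a <url>."""
    for masq in masqs:
        if any(url.text == path for url in masq.findall("url")):
            return masq
    return None


def pick_view(masq, device, lang):
    """Assumed ranking: a device match outweighs a language match.

    Expects the masq to contain at least one <view>.
    """
    def score(view):
        return 2 * (view.get("type") == device) + (view.get(XML_LANG) == lang)
    return max(masq.findall("view"), key=score)


def resolve(masqs, path, device, lang):
    """Map a request to the (html, code) pair that should serve it."""
    masq = find_masq(masqs, path)
    if masq is None:
        return None  # nothing claims this URL; the caller decides what to do
    view = pick_view(masq, device, lang)
    return view.findtext("html"), view.findtext("code")
```

Under these assumptions, `resolve(MASQS, "/aboutus", "mobile", "en")` yields the mobile HTML file together with `about-us.xqy`. Adding a page means adding another masq document to the collection, with no change to the lookup code.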
- Nonob i18n. In order to facilitate the page being in multiple languages, the nonob i18n library does the following:
- In the static HTML that the front-end developer writes, custom HTML attributes are added to all the parts of the page that need to be i18n-ized. These custom attributes do not interfere with any HTML coding tools, and browsers ignore attributes they don’t understand, but the attributes are removed before the page is sent to the browser, as part of the nonob replacement.
- Here’s an example of i18n attribute on a div. The value of the i18n attribute is the key to lookup the translation in a resource file.
<div id="mydiv" i18n="myapp.welcome-page.title">Welcome to my site!</div>
- Here’s an example where you want to i18n-ize an input field:
<input type="text" i18n="pages.register.name.placeholder" i18n-target="value" value="Name"/>
- The resource files are stored by language. An example of one might be:
<resource-bundle xml:lang="es">
  <resource key="pages.register.name.placeholder">Nombre</resource>
  <resource key="pages.register.submit">Submito</resource>
  ...
</resource-bundle>
- The resource to use is determined at runtime using the user’s language preferences sent in the HTTP headers (or previously saved settings) and the resource bundle languages available. So if the browser says, “ru,es,en” then the framework will pick the most appropriate language to use, which in our example is Spanish (es), because there is no Russian version.
- After nonob has made its replacements (which it does in one pass through the HTML), i18n-nonob makes its replacements in one more pass: it finds all the i18n attributes, gets the values from the resource files for the given language, makes the replacements, and returns the HTML to the browser.
- This requires one line of code to use the library, and the only effort is to add the i18n attributes to the static HTML and create the resource files. Note that the page can dynamically set i18n attributes if it wants in the nonob step because the i18n-nonob step is just going to parse the whole HTML again after nonob is done.
- Nonob + i18n-nonob usually takes about 50 milliseconds.
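As a rough illustration of the i18n pass, the sketch below picks a language from the browser’s preference list and then rewrites the i18n attributes in a single walk over the page. It is Python, not the XQuery library described here, and the details are assumptions: the `BUNDLES` dictionary stands in for the resource files, the function names are hypothetical, the Spanish title string is made up for the example, and the language picker simply honors header order and ignores quality values.

```python
import xml.etree.ElementTree as ET

# Stand-in for the per-language resource files (sample strings are illustrative).
BUNDLES = {
    "es": {"pages.register.name.placeholder": "Nombre",
           "myapp.welcome-page.title": "¡Bienvenido a mi sitio!"},
    "en": {"pages.register.name.placeholder": "Name",
           "myapp.welcome-page.title": "Welcome to my site!"},
}


def pick_language(accept_language, available, default="en"):
    """Return the first preferred language that has a resource bundle."""
    for entry in accept_language.split(","):
        # Drop any quality value (";q=0.8") and region subtag ("es-MX" -> "es").
        lang = entry.split(";")[0].strip().lower().split("-")[0]
        if lang in available:
            return lang
    return default


def localize(html, bundle):
    """One pass: apply translations and strip the custom i18n attributes."""
    root = ET.fromstring(html)
    for element in root.iter():
        key = element.attrib.pop("i18n", None)
        target = element.attrib.pop("i18n-target", None)
        if key is None or key not in bundle:
            continue  # no key, or no translation: keep the original content
        if target:
            element.set(target, bundle[key])  # translate an attribute value
        else:
            element.text = bundle[key]  # translate the element's text
    return ET.tostring(root, encoding="unicode", method="html")


PAGE = """<form>
<div id="mydiv" i18n="myapp.welcome-page.title">Welcome to my site!</div>
<input type="text" i18n="pages.register.name.placeholder"
       i18n-target="value" value="Name"/>
</form>"""

lang = pick_language("ru,es,en", BUNDLES)
localized = localize(PAGE, BUNDLES[lang])
```

With the header “ru,es,en” and no Russian bundle, the picker falls through to Spanish, matching the example in the text. Because the i18n attributes are popped during the same walk, none of them reach the browser.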
- Nonob + masquerade + i18n-nonob provides a lot of flexibility and permutations with minimal coding.
Those are the more important parts that contributed the most to rapid development. We also already had a configuration and deployment tool and an authentication library, so we didn’t have to spend much time on those. The book I am writing goes into more detail about how to do some of these things.
Enjoy