<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>XQuery Web Application Development</title>
	<atom:link href="http://xquerywebappdev.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://xquerywebappdev.wordpress.com</link>
	<description>Two-tier architecture using XQuery</description>
	<lastBuildDate>Thu, 26 Jan 2012 21:55:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='xquerywebappdev.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>XQuery Web Application Development</title>
		<link>http://xquerywebappdev.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://xquerywebappdev.wordpress.com/osd.xml" title="XQuery Web Application Development" />
	<atom:link rel='hub' href='http://xquerywebappdev.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Using a Semantic Data System for Genealogy Data</title>
		<link>http://xquerywebappdev.wordpress.com/2012/01/24/using-a-semantic-data-system-for-genealogy-data/</link>
		<comments>http://xquerywebappdev.wordpress.com/2012/01/24/using-a-semantic-data-system-for-genealogy-data/#comments</comments>
		<pubDate>Tue, 24 Jan 2012 20:13:00 +0000</pubDate>
		<dc:creator>Ryan Semerau</dc:creator>
				<category><![CDATA[commentary]]></category>

		<guid isPermaLink="false">http://xquerywebappdev.wordpress.com/?p=882</guid>
		<description><![CDATA[Last spring at my brother&#8217;s wedding I had an opportunity to talk to my aunt who is getting a degree in Genealogy from Brigham Young University. Far from being a quaint little hobby of grandmas, genealogy is some serious business to professionals in the industry. My aunt told me about a project she had been [...]]]></description>
			<content:encoded><![CDATA[<p>Last spring at my brother&#8217;s wedding I had an opportunity to talk to my aunt who is getting a degree in Genealogy from Brigham Young University. Far from being a quaint little hobby of grandmas, genealogy is some serious business to professionals in the industry. My aunt told me about a project she had been working on where she had filled up four full-sized whiteboards completely with information in the effort to come to conclusions for her project. Naturally I asked about what computer applications she uses and what format the data is in. That&#8217;s when the rush of frustration was unleashed on the state of the technology for genealogy.</p>
<p>While there are some good sites that host indexes you can search, there really was nothing that met her needs for compiling and coalescing the information and for capturing and presenting her conclusions. The amateur software was obviously not sufficient for what she needed, and there seemed to be no good data format for any of the data except pedigrees, which are actually the result of research, i.e. the conclusion, not the pieces of data that lead up to determining a pedigree.</p>
<p>I told my aunt that I imagined that genealogy data is pretty messy, that it&#8217;s often incomplete, inconsistent, and contradictory, and that it comes in different levels of detail, quality, uncertainty, and authority. Rather than trying to fit messy data into defined buckets or categories or pedigrees, what there really needs to be is something that embraces the messiness and just captures everything, so that what the system produces is a probable profile of an individual. Individuals and families become more of a statistical probability of a convergence of loose data points, rather than a &#8220;Person&#8221; in the system. This may sound a little harsh and impersonal, but we often don&#8217;t have data sufficient to make an authoritative call on what a &#8220;Person&#8221; is based on the data available. So we have to switch to thinking of the data as revealing information about an individual with a non-definite amount of certainty.</p>
<p>My aunt of course lit up at this, and after I said I have experience in a system that can do just that, I was invested. I bought several books on genealogy for professionals and my understanding of them and the field increased tremendously. A professional genealogist is part historian, part biographer, part information analyst, part linguist, part author, part forensic expert, and other things too I&#8217;m sure. Evidence is paramount. If you can&#8217;t provide sources and correct citations for the information and conclusions you&#8217;ve drawn, then you have no conclusions that anyone can accept. The most common publishing avenue is in document form, as in reports, books, and articles. The sharing of the data itself is not where it needs to be, often being trapped in footnotes, in-document tables and charts, and of course whiteboards (or photos of whiteboards).</p>
<h2>The RootsTech Developer Challenge</h2>
<p>The <a href="http://rootstech.org/challenges/overview">RootsTech Developer Challenge</a> was a good opportunity for me to turn into reality the ideas I have had on how to address the need for improved genealogy systems. The challenge is billed more for &#8220;apps&#8221; to enhance the public&#8217;s engagement with genealogy and to foster increased interest in the field. I did not think the problem, or the next step, was better apps. I think the problem is how the data is handled. Even amazing apps aren&#8217;t going to be able to compensate for a weak system for handling the data. What I thought is that we need a fundamentally different way to handle genealogy data and embrace the messiness.</p>
<p>What we needed was a system that is designed to take in data on just about anything, from individuals to locations, from factoids of an individual&#8217;s life to full and complete pedigrees. We needed a system that did not seek the one correct data set on a person but rather had a high tolerance for uncertainty, contradiction, and even error. Some genealogy systems in the past sought to create a single pedigree for everyone, but what ended up happening is that people overwrote others&#8217; previous entries and the errors that were introduced were perpetuated, causing a gradual degradation of the quality of the data in the system in general. We needed a system that could take in anything and, with quality and certainty measures, tag the data points so users could know the quality of the data. And we need the users themselves to be able to set some of the quality and certainty measures.</p>
<p>We also need users to be able to upload the results of their work, research and conclusions, not just in report or final form, but even the fragments and bits that they have. Users need to be able to make these data public, keep them private, or share them with others that they choose. Users need to be able to incorporate these data in their searches, and they need to be able to pick and choose what kind of data (public, private or shared) when they search. They also need to be able to choose the quality and certainty level of data they want to include in their search to improve the signal to noise ratio of their results.</p>
<p>And why can&#8217;t the system go through the data and infer some results? If John Smith was divorced from Alice in 1834, why can&#8217;t the system infer that John Smith is male, was married before 1834, and that his wife was probably named Alice Smith? And why can&#8217;t the system infer that the John Smith in that divorce record is the same John Smith in another census record that says he lived in Pleasantdale, Maine and had two kids, or at least provide a level of probability? Why can&#8217;t the information from these two records then be combined when I search for &#8220;Smith divorced Maine&#8221; and get a hit, showing me all the combined information for that person with the quality and certainty levels displayed?</p>
<p>Why can&#8217;t users add their conclusions, assertions, and inferences, too? Why can&#8217;t Sally add that she found an obituary for John Smith in Pleasantdale in an old microfiched copy of a newspaper and it says he was survived by three children? And after all these derivative data points are added in the system, why can&#8217;t I trace each one of them back to their original sources, with contact information of the people who captured or entered the information?</p>
<p>It might sound nice but pie-in-the-sky, but that&#8217;s what I did and that&#8217;s what I entered into the RootsTech Developer Challenge. Granted the application is ugly (think Geocities&#8230;<em>beta</em>) and I&#8217;m sure it has bugs, but everything I described above is included and it works and works fast (all subsecond with a 4GB database), with the exception of being able to include certainty levels in searches. I don&#8217;t expect it to win and I am kind of done with that challenge because I feel like I figured it out and now I want to move on with it, but I enjoyed the challenge.</p>
<h2>How it Works</h2>
<p>The data model was the single most important factor in getting this system to work. Relational data models with tables and columns are just insufficient because the data could be virtually anything and could be in virtually any structure, and that just doesn&#8217;t fit in tables and columns. XML is better because it can allow any structure in a document, with any field and any values, but that still doesn&#8217;t make searching the data any easier. To search the data and make sense of the values you need to know ahead of time what fields are available. A <a href="http://en.wikipedia.org/wiki/Semantic_data_model">Semantic Data Model</a> provides a way for any data to be of any type and associated with any other thing, be it a value or an entity, so that&#8217;s the model I chose.</p>
<p>The system I built at its heart is a Semantic Data system with some modifications. Semantic data is in Subject &#8211; Predicate &#8211; Object form. The Subject can be anything; often it is an id or key of some &#8220;thing&#8221;. The Predicate is the &#8220;type&#8221; of relationship between the Subject and the Object. A Predicate can be for a value (&#8220;Eye color is&#8221;) or it can be for the relationship between two things (&#8220;is Father of&#8221;). The Object is either a value (&#8220;Blue&#8221;) or the id or key of some entity (&#8220;123-John-Smith&#8221;). The Predicate usually tells you if the Object is a value or another entity, which would probably be used elsewhere as a Subject in some other fact.</p>
<p>This kind of triple, S-P-O, is called a <a href="http://searchcio-midmarket.techtarget.com/definition/tuple">tuple</a> and can be used to model virtually any kind of data. For genealogy, there may be a tuple like:</p>
<p>&#8220;123456789&#8221; &#8220;first-name&#8221; &#8220;John&#8221;</p>
<p>and another tuple</p>
<p>&#8220;123456789&#8221; &#8220;last-name&#8221; &#8220;Smith&#8221;</p>
<p>and another</p>
<p>&#8220;123456789&#8221; &#8220;married-to&#8221; &#8220;4443333222&#8221;</p>
<p>and another</p>
<p>&#8220;4443333222&#8221; &#8220;birthplace&#8221; &#8220;Chicago&#8221;</p>
<p>and another</p>
<p>&#8220;11111111&#8221; &#8220;father-of&#8221; &#8220;4443333222&#8221;</p>
<p>and another</p>
<p>&#8220;11111111&#8221; &#8220;last-name&#8221; &#8220;Grant&#8221;</p>
<p>etc.</p>
<p>So in the above example, we know that there is someone with a name of &#8220;John Smith&#8221; who married someone who was born in Chicago and whose father&#8217;s last name was Grant. Even without complete information, we can &#8220;walk the graph&#8221; and go from person to person using relationships defined in the data and view the data we have for each person at each step. Note too that the relationships can be bi-directional. We may know that A is the father of B or we may know that B is the daughter of A. Either way we can walk the relationship graph and <em>infer</em> the parent-child relationship even if that relationship was originally defined in only one direction. Better yet, the system can make the inference and add that inference as a new tuple into the system.</p>
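<p>To make the graph walk concrete, here is a minimal Python sketch. It is not the MarkLogic code behind the application. The tuples are the example ones above; the helper names <code>objects_of</code>, <code>subjects_of</code>, and <code>infer_inverses</code>, the <code>child-of</code> predicate, and the inverse rule table are assumptions made up for this illustration.</p>

```python
# Hypothetical sketch: S-P-O tuples held in a plain Python list.
TUPLES = [
    ("123456789", "first-name", "John"),
    ("123456789", "last-name", "Smith"),
    ("123456789", "married-to", "4443333222"),
    ("4443333222", "birthplace", "Chicago"),
    ("11111111", "father-of", "4443333222"),
    ("11111111", "last-name", "Grant"),
]

# Assumed rule table: each predicate maps to the predicate of its inverse.
INVERSES = {"father-of": "child-of", "married-to": "married-to"}


def objects_of(tuples, subject, predicate):
    """All objects recorded for a subject under one predicate."""
    return [o for s, p, o in tuples if s == subject and p == predicate]


def subjects_of(tuples, predicate, obj):
    """All subjects that point at an object through one predicate."""
    return [s for s, p, o in tuples if p == predicate and o == obj]


def infer_inverses(tuples):
    """Return tuples implied by INVERSES that are not already present."""
    known = set(tuples)
    inferred = []
    for s, p, o in tuples:
        if p in INVERSES:
            candidate = (o, INVERSES[p], s)
            if candidate not in known:
                known.add(candidate)
                inferred.append(candidate)
    return inferred


# Walk: John -> spouse -> spouse's father -> father's last name.
spouse = objects_of(TUPLES, "123456789", "married-to")[0]
father = subjects_of(TUPLES, "father-of", spouse)[0]
father_last_name = objects_of(TUPLES, father, "last-name")[0]

# Inferred tuples can be stored alongside the originals.
new_tuples = infer_inverses(TUPLES)
```

<p>In a real system the inferred tuples would presumably carry their own source and quality markers so they can be told apart from captured data.</p>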
<p>For a collaborative semantic genealogy data system, Subject &#8211; Predicate &#8211; Object is actually insufficient. Three other aspects need to be added: Time, Source, and Quality. If we know that an 1850 census says John Smith was 34, we want to capture that original data as-is with no interpretation in order to preserve the data integrity. So we would have:</p>
<p>&#8220;12345-page-4-line-2&#8221; (or whatever we decide the subject should be) &#8220;Age is&#8221; &#8220;34&#8221;</p>
<p>But that fact was not always true, just when the census worker captured the information. So we need to add Time, which in this case is &#8220;1850&#8221; or as much as we know. So now we have Subject &#8211; Predicate &#8211; Object &#8211; Time (S-P-O-C, where C is the time field):</p>
<p>&#8220;12345-page-4-line-2&#8221; &#8220;Age is&#8221; &#8220;34&#8221; &#8220;1850&#8221;</p>
<p>But what if that data is coming from something written in someone&#8217;s Bible? We want to capture that too. So now we&#8217;ll have Subject &#8211; Predicate &#8211; Object &#8211; Time &#8211; Source (S-P-O-C-E, where E is the source field):</p>
<p>&#8220;12345-page-4-line-2&#8221; &#8220;Age is&#8221; &#8220;34&#8221; &#8220;1850&#8221; &#8220;Alison-Grant-Family-Bible&#8221;</p>
<p>Now the source would be a key to some other data that would have detailed information about the source and contact information for whoever got the information from the source. But how reliable is this data? Is it for the right John Smith? Is the information legible in the Bible? Is it firsthand, secondhand, or more? We need a quality value, so now we have Subject &#8211; Predicate &#8211; Object &#8211; Time &#8211; Source &#8211; Quality (S-P-O-C-E-Quality):</p>
<p>&#8220;12345-page-4-line-2&#8221; &#8220;Age is&#8221; &#8220;34&#8221; &#8220;1850&#8221; &#8220;Alison-Grant-Family-Bible&#8221; &#8220;4&#8221;</p>
<p>This assumes a scale from 1 to 10 where 10 is absolutely certain. Now we have a model that can capture virtually any data, for anything, true at a particular time, of any quality, and noting the source. For sharing and privacy we can either add those into the tuples, or use system permission controls to secure the tuples themselves.</p>
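<p>As a rough sketch of the six-part tuple, here is how it might look as a Python record, with a filter on the quality value. The class name <code>Fact</code>, the function <code>at_least</code>, and the second record are hypothetical and only illustrate the shape of the data; this is not how the tuples are stored in MarkLogic.</p>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    time: Optional[str] = None     # as much as is known, e.g. "1850"
    source: Optional[str] = None   # key of another record describing the source
    quality: Optional[int] = None  # 1 to 10, where 10 is absolutely certain


def at_least(facts: List[Fact], minimum_quality: int) -> List[Fact]:
    """Keep facts whose quality is known and meets the threshold."""
    return [f for f in facts
            if f.quality is not None and f.quality >= minimum_quality]


facts = [
    Fact("12345-page-4-line-2", "Age is", "34", "1850",
         "Alison-Grant-Family-Bible", 4),
    # Hypothetical second record, invented only to show the filter.
    Fact("12345-page-4-line-2", "Age is", "35", "1850",
         "hypothetical-census-transcript", 8),
]

trusted = at_least(facts, 5)
```

<p>Keeping both records, rather than picking a winner at load time, is the point: the contradiction stays visible and the user decides what quality level to search at.</p>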
<h2>Yeah But How Can This Be Fast?</h2>
<p>We covered conceptually how to model the data to achieve the goals I listed at the beginning. Fortunately there are several products in the industry for Semantic or Graph Databases. I chose to use MarkLogic because it was familiar to me, I knew it was extremely fast and can scale to the petabyte level, and most importantly it has very advanced language tools. Other graph databases may be fast at linking different pieces of data together, but I also needed one that was fast when searching for first names that sound like &#8220;John&#8221;, or contain &#8220;John&#8221;, or are spelled similarly to &#8220;John.&#8221; I needed something that could find word stem hits, so that when I search for &#8220;run&#8221; I also get &#8220;ran&#8221;, &#8220;runs&#8221;, and &#8220;running&#8221;. For this free text searching (which is almost entirely against the Object in the tuple) I need a powerful search system. So MarkLogic gave me both the free text searching and the fast data linking, plus security down to the user and tuple level, all of which can scale to petabytes. I suppose this system could be built on other platforms, but I&#8217;ll leave it to the reader as an exercise to prove out.</p>
<p>The implementation details and the actual code used can be pretty complicated and lengthy, but for those interested almost all the magic is in the prolific use of cts:queries in MarkLogic. For searches such as &#8220;Family of John Smith&#8221; there is a Predicate Resolution step, which builds a list of all Predicate types that belong to a &#8220;family&#8221; type as determined by rules (&#8220;husband-to&#8221;, &#8220;brother-of&#8221;, etc.). This Predicate Set is then sent as exact values to match against the Predicate (P). The queries against Objects (O) are stemmed, case-insensitive, diacritic-insensitive, punctuation-insensitive, and whitespace-insensitive. I also created a custom Double Metaphone index on all the values for Objects whose Predicate types are names and locations: I took all unique values of O for those types and created a file for each type holding the calculated Double Metaphone values, which MarkLogic provides an API for. The Object Resolution step then includes the original value typed plus all the values from the Double Metaphone index (which are existing values in O) that have the same Double Metaphone value, and these are added to the O set sent to the query. All linking and joins are done through the Subject (S). For tuples that have been linked &#8220;Same-As&#8221;, I query all Objects, get their Subjects, and filter them to the ones that have Same-As matches within the result set. This is the Subject Resolution step. And voilà, I have my results.</p>
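<p>The Object Resolution idea can be sketched in a few lines of Python. To keep the example self-contained it swaps in a simplified Soundex key in place of Double Metaphone, so the matches will differ from what the real index produces. The names <code>soundex</code>, <code>build_index</code>, and <code>resolve_objects</code> and the sample values are assumptions for illustration only.</p>

```python
from collections import defaultdict

# Letter-to-digit table for a simplified American Soundex.
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        _CODES[letter] = digit


def soundex(word):
    """Simplified Soundex key; a stand-in for Double Metaphone."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    previous = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != previous:
            key += code
        if c not in "HW":  # H and W do not separate equal codes
            previous = code
    return (key + "000")[:4]


def build_index(values):
    """Map each phonetic key to the set of stored Object values that share it."""
    index = defaultdict(set)
    for value in values:
        index[soundex(value)].add(value)
    return index


def resolve_objects(term, index):
    """Return the typed term plus stored values with the same phonetic key."""
    return sorted({term} | index.get(soundex(term), set()))


name_index = build_index(["Smith", "Smyth", "Smythe", "Grant", "John", "Jon"])
expanded = resolve_objects("Smith", name_index)
```

<p>The expanded list is what would be handed to the Object query as the set of values to match, so the phonetic work happens once at index time instead of on every search.</p>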
<p>Relevancy is not so good yet, and all the sharing and access control I&#8217;ve done via yet another field to the tuple: Sharing. But I&#8217;m rethinking that and considering using document permissions instead. Below are a couple of functions so people can see some of the code. It is most of the code in a free text search field that expects that you are going to include a first name and last name in your search.</p>
<h2>What&#8217;s Left?</h2>
<p>I don&#8217;t think I&#8217;ve met what my aunt needs for her individual research, but this provides the engine and capability to build on. There is a lot of application interface coding left to do, too. Getting data for this has actually been pretty hard. Any genealogy data dumps I could get were on CDs from the 90s in Infobases format, with threatening EULAs about not using the data in another system. So I didn&#8217;t use them. Oddly there is a county in Maine that has a decent amount of census data online (trapped in HTML pages): Aroostook County. The State of Maine has several databases posted online (mostly as Access databases), such as military records, divorces, court proceedings, and Revolutionary War Land Grants. I spent about half of my time getting the data and processing it into a format I could use, eventually in semantic form.</p>
<p>This system can be used as a way for researchers to enter, share, and publish their work and conclusions, and they can do so without any original data being in the system. They can make the Subjects individuals on the big genealogy sites, and they can make the sources point to those specific URLs elsewhere. This would then effectively be a big metadata system. The data need not be limited to name and date information, but can include photographs, scans, video, PDFs, and any other kind of data. But I really hope I can either get data or work with one of the big websites to incorporate these capabilities. It&#8217;s just time and money.</p>
<p><pre class="brush: java;">
declare function free-search($phrase, $sources, $start, $end) {

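	(: Split the phrase on spaces and drop empty tokens :)
	let $terms := fn:tokenize($phrase, &quot; &quot;)[. != &quot;&quot;]
	(: Subjects whose Objects match every term, limited to the requested page :)
	let $subjects := reduce((), $terms)[$start to $end]
	(: Add any subjects linked to them by Same-As :)
	let $subjects := ($subjects, get-all-same-as-subjects($subjects))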
	let $subjects :=
		cts:search(/t,
			cts:and-query((
				cts:element-value-query(xs:QName(&quot;s&quot;), $subjects, &quot;exact&quot;)
			,
				get-cts-search-source-query($sources)
			))
		)/s/text()
	let $subjects := fn:distinct-values($subjects)

	return
		&lt;results&gt;
			{
			for $s in $subjects
			let $family-name-hit := (/t[s = $s][p = $type:person-family-name]/o/text())[1]
			let $given-name-hit := (/t[s = $s][p = $type:person-given-name]/o/text())[1]
			let $name := fn:concat($given-name-hit, &quot; &quot;, $family-name-hit)
			let $query :=
				cts:and-query((
					cts:element-value-query(xs:QName(&quot;s&quot;), $subjects, &quot;exact&quot;)
				,
					cts:element-word-query(xs:QName(&quot;o&quot;), $terms)
				,
					get-cts-search-source-query($sources)
				))

			let $t := cts:search(/t, $query)

			return
				&lt;hit&gt;
					&lt;s&gt;{$s}&lt;/s&gt;
					&lt;name&gt;{$name}&lt;/name&gt;
					{
					for $at in $t
					return
						&lt;highlight&gt;
							{$at/p}
							{$at/o}
							{$at/c}
							{$at/e}
						&lt;/highlight&gt;
					}
				&lt;/hit&gt;
			}
		&lt;/results&gt;
};

declare function reduce($subjects, $terms) {
  let $matched-subjects :=
    if (fn:empty($subjects))
    then
      cts:search(/t,
        cts:element-word-query(xs:QName(&quot;o&quot;), $terms[1])
      )/s/text()
    else
      cts:search(/t,
        cts:and-query((
          cts:element-value-query(xs:QName(&quot;s&quot;), $subjects, &quot;exact&quot;)
        ,
          cts:element-word-query(xs:QName(&quot;o&quot;), $terms[1])
        ))
      )/s/text()
   return
     if (fn:count($terms) = 1)
     then fn:distinct-values($matched-subjects)
     else reduce(fn:distinct-values($matched-subjects), $terms[2 to fn:last()])
};

declare function get-all-same-as-subjects($subject) {
  get-same-as-subjects($subject, ())
};

declare function get-same-as-subjects($check-subjects, $found-subjects) {

  (: Every Subject or Object that appears in a Same-As tuple with a checked subject :)
  let $other-subjects := fn:distinct-values(
    for $check-subject in $check-subjects
    return
      cts:search(/t,
        cts:and-query((
          cts:element-value-query(xs:QName(&quot;p&quot;), $type:person-same-as, &quot;exact&quot;)
        ,
          cts:or-query((
            cts:element-value-query(xs:QName(&quot;o&quot;), $check-subject, &quot;exact&quot;)
          ,
            cts:element-value-query(xs:QName(&quot;s&quot;), $check-subject, &quot;exact&quot;)
          ))
        ))
      )/(s|o)/text()
  )

  (: Keep only the ones not seen in an earlier pass :)
  let $new-subjects :=
    for $other-subject in $other-subjects
    return
      if ($other-subject = $found-subjects)
      then ()
      else $other-subject

  (: Recurse until a pass finds nothing new :)
  return
    if (fn:exists($new-subjects))
    then get-same-as-subjects($new-subjects, ($new-subjects,$found-subjects))
    else $found-subjects

};
</pre></p>
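<p>For readers who don&#8217;t use MarkLogic, the Same-As step above amounts to a transitive closure over the Same-As links. A minimal Python sketch of that idea follows; the function name <code>same_as_closure</code> and the sample pairs are hypothetical. Unlike the XQuery version, this sketch always includes the starting subjects in its result.</p>

```python
def same_as_closure(start_subjects, same_as_pairs):
    """Return every subject reachable from start_subjects through Same-As links."""
    # Same-As is treated as symmetric, so index each pair in both directions.
    neighbours = {}
    for a, b in same_as_pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    found = set(start_subjects)
    frontier = list(found)
    while frontier:
        current = frontier.pop()
        for other in neighbours.get(current, ()):
            if other not in found:
                found.add(other)
                frontier.append(other)
    return found


pairs = [("A", "B"), ("B", "C"), ("X", "Y")]  # hypothetical Same-As tuples
linked = same_as_closure(["A"], pairs)
```

<p>The loop plays the role of the recursion in <code>get-same-as-subjects</code>: it stops when a pass over the frontier turns up no subject that has not been seen already.</p>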
]]></content:encoded>
			<wfw:commentRss>http://xquerywebappdev.wordpress.com/2012/01/24/using-a-semantic-data-system-for-genealogy-data/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/94a1fe92ae42a403272231e6a67a9d49?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">xquerywebappdev</media:title>
		</media:content>
	</item>
		<item>
		<title>Code for translating content using Google Translate</title>
		<link>http://xquerywebappdev.wordpress.com/2012/01/17/code-for-translating-content-using-google-translate/</link>
		<comments>http://xquerywebappdev.wordpress.com/2012/01/17/code-for-translating-content-using-google-translate/#comments</comments>
		<pubDate>Tue, 17 Jan 2012 23:25:23 +0000</pubDate>
		<dc:creator>Ryan Semerau</dc:creator>
				<category><![CDATA[Tips n' Tricks]]></category>

		<guid isPermaLink="false">http://xquerywebappdev.wordpress.com/?p=874</guid>
		<description><![CDATA[Often I use machine translated content when I am developing so that I have content in various languages before I get the official translation. I&#8217;m not too concerned that the translation is correct, just as long as it is representative of content in that language. I&#8217;ve found that creating a website from the beginning in [...]]]></description>
			<content:encoded><![CDATA[<p>Often I use machine translated content when I am developing so that I have content in various languages before I get the official translation. I&#8217;m not too concerned that the translation is correct, just as long as it is representative of content in that language. I&#8217;ve found that creating a website from the beginning in at least two different languages helps me avoid coding myself into a box and making assumptions that are only valid in a single-language website, particularly in my queries for content in the DB and in how the web page layout handles strings of different lengths.</p>
<p>I usually create an initial resource bundle of some sort in English and then use Google Translate to translate the strings in the bundle into various target languages. I automate this so that I just push a button or have the script execute in a trigger so that when I change or add to the resource bundle in English I can re-translate it easily.</p>
<p>Here is the type of script I use to make the call to Google Translate:</p>
<p><pre class="brush: java;">
declare function local:get-google-translation($text, $source-lang, $target-lang) {

  let $url := fn:concat(&quot;http://translate.google.com/translate_a/t?client=t&amp;amp;text=&quot;, xdmp:url-encode($text), &quot;&amp;amp;sl=&quot;, $source-lang, &quot;&amp;amp;tl=&quot;, $target-lang, &quot;&amp;amp;multires=1&amp;amp;sc=1&quot;)

  let $response := xdmp:http-get($url,
    &lt;options xmlns=&quot;xdmp:http-get&quot;&gt;
      &lt;format xmlns=&quot;xdmp:document-get&quot;&gt;text&lt;/format&gt;
    &lt;/options&gt;
  )

  return fn:tokenize($response[2], '&quot;')[2]
};

let $text := &quot;Rome (CNN) -- Transcripts published Tuesday capture the dramatic conversations between port officials and a cruise ship captain, who a judge ruled can be held under house arrest while Italian authorities investigate his role in last week's disaster.&quot;
let $source-lang := &quot;en&quot;
let $target-lang := &quot;es&quot;

return local:get-google-translation($text, $source-lang, $target-lang)


=&gt; ROMA (CNN) - Las transcripciones publicadas el martes la captura de las conversaciones entre los funcionarios del puerto espectacular y un capitán de barco de crucero , que puede ser un juez dictaminó bajo arresto domiciliario mientras las autoridades italianas investigar su papel en el desastre de la semana pasada .
</pre></p>
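<p>The resource bundle workflow described earlier in this post can be sketched roughly like this in Python. The bundle contents, the function name <code>translate_bundle</code>, and the placeholder <code>fake_translate</code> are all assumptions for illustration; the placeholder stands in for a real call such as the one above so that the sketch runs without a network connection.</p>

```python
def translate_bundle(bundle, source_lang, target_langs, translate):
    """Translate every string in a source bundle into each target language."""
    return {
        lang: {key: translate(text, source_lang, lang)
               for key, text in bundle.items()}
        for lang in target_langs
    }


def fake_translate(text, source_lang, target_lang):
    """Placeholder translator so the sketch runs offline; swap in a real call."""
    return "[" + target_lang + "] " + text


bundle_en = {"greeting": "Hello", "farewell": "Goodbye"}  # hypothetical bundle
bundles = translate_bundle(bundle_en, "en", ["es", "fr"], fake_translate)
```

<p>Passing the translator in as a function keeps the bundle loop independent of whichever translation service is behind it, which also makes it easy to re-run whenever the English bundle changes.</p>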
]]></content:encoded>
			<wfw:commentRss>http://xquerywebappdev.wordpress.com/2012/01/17/code-for-translating-content-using-google-translate/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/94a1fe92ae42a403272231e6a67a9d49?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">xquerywebappdev</media:title>
		</media:content>
	</item>
		<item>
		<title>Node.js is good for solving problems I don&#8217;t have</title>
		<link>http://xquerywebappdev.wordpress.com/2011/11/18/node-js-is-good-for-solving-problems-i-dont-have/</link>
		<comments>http://xquerywebappdev.wordpress.com/2011/11/18/node-js-is-good-for-solving-problems-i-dont-have/#comments</comments>
		<pubDate>Fri, 18 Nov 2011 23:54:14 +0000</pubDate>
		<dc:creator>Ryan Semerau</dc:creator>
				<category><![CDATA[commentary]]></category>
		<category><![CDATA[node.js]]></category>

		<guid isPermaLink="false">http://xquerywebappdev.wordpress.com/?p=857</guid>
		<description><![CDATA[I have recently started programming with Node.js and I like how simple and easy it is to write HTTP server code with it. Just because it&#8217;s easy doesn&#8217;t mean it&#8217;s appropriate for my needs or that it&#8217;s ready for prime time. What I have noticed in learning and using Node is that it was created [...]]]></description>
			<content:encoded><![CDATA[<p>I have recently started programming with Node.js and I like how simple and easy it is to write HTTP server code with it. Just because it&#8217;s easy doesn&#8217;t mean it&#8217;s appropriate for my needs or that it&#8217;s ready for prime time. What I have noticed in learning and using Node is that it was created primarily as a response to a problem that I just don&#8217;t have, or in fact that most web applications shouldn&#8217;t have.</p>
<p>Node was created to provide an event-based web server programming model that better utilizes threads on the server, particularly when it comes to IO operations (like filesystem reads or database calls). So rather than a thread having to wait for an IO operation to finish before program execution continues, the thread registers a callback to be called when the IO operation is finished. This way the threads are able to serve more requests because they aren&#8217;t waiting for expensive operations to complete.</p>
<p>Who has this problem? Whose web application performance bottleneck is that their threads are waiting for IO to complete? If this is your problem, then I don&#8217;t think you&#8217;ve got a very good web application implementation. Let me explain why.</p>
<h3>HTML Generation is usually not the slowest part</h3>
<p>Given that a web application is correct, available, and secure, users care most about speed. They don&#8217;t care about your hardware utilization or how many requests are handled per server, and they also don&#8217;t care how fast your web app is on average; they care about how fast it is for them. Page load time and time-until-usable are what users are concerned about.</p>
<p>Looking at cnn.com, there are 89 requests, totalling a little less than 1MB, and the page took 4.24s to load for me. Of those 89 requests, 3 were HTML requests from cnn.com (1 for the HTML page and 2 for weather). The HTML from cnn.com is about 30KB&#8230;out of 1MB! So if you want to speed up your site, where is the best place to focus? HTML from the server that makes up 3% of the total weight, 3% of the total number of requests, and 9% of the total page load time? Or would you focus on reducing the number of requests, the size of assets, and the caching of those assets?</p>
<p>Cnn.com&#8217;s HTML took 336 ms to get to me. Let&#8217;s say you made that 10x faster. You would have then reduced the total page time by 300 ms or about 7% of the total page load time and still get about 4 seconds for page load. You could have a <a title="The 1 Millisecond Page" href="http://xquerywebappdev.wordpress.com/2011/08/19/the-1-millisecond-page/">1 ms HTML response time</a> and still have a slow site. The HTML generation and return time is usually not where the problem is for web application performance.</p>
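<p>As a rough check on that arithmetic, here is a minimal XQuery sketch using the numbers I measured above (4.24 s total, 336 ms for the HTML). The variable names are just illustrative.</p>
<pre>let $total-ms := 4240   (: total page load time measured above :)
let $html-ms := 336     (: time for the HTML to arrive :)
let $saved-ms := $html-ms - ($html-ms div 10)   (: savings if the HTML were 10x faster :)
return (
  $saved-ms,                                 (: ms saved :)
  fn:round(100 * $saved-ms div $total-ms),   (: percent of total load time saved :)
  $total-ms - $saved-ms                      (: remaining load time in ms :)
)
=&gt; 302.4 7 3937.6</pre>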
<p>Most of the assets on a web page are static (meaning they don&#8217;t change per request) so they can be served by a cache server (so the origin server isn&#8217;t hit) and by the browser (so not even the cache server is hit). The origin server can generate the HTML and serve up the static assets if needed, but it really shouldn&#8217;t do that very often because the browser cache and cache servers should be serving them. So then what you really need is a content server that is geared toward HTML generation, whether it be static or dynamic. So you have the origin server generating dynamic but cacheable HTML (like templated but little-changing info pages), and handling dynamic but non-cacheable HTML (like search).</p>
<p>The content server should hardly need to do any IO. Why would an HTML content server need to write to the filesystem? Even if it does, why does the web visitor need to wait on the result of that file write operation before seeing the server response? If you really need to write to the filesystem, spawn a thread or offload that operation to something else that can queue up write operations. Your content server doesn&#8217;t need to do it; it just needs to invoke something else to do it.</p>
<p>If your content server is serving up dynamic content, what else can it be doing before it gets the data from the database? It&#8217;s primarily going to be formatting and creating presentation using the data from the DB, and if it has something it can be doing in the meantime I&#8217;m arguing it should be doing it. Something else can communicate with other services and cache HTML fragments or whatever. All the content server does is process content, so if it has to wait for the data, it waits.</p>
<p>But why would there be any IO for data that takes much time at all? If the data is so far removed from the presentation engine (the content server) that it blocks for any noticeable amount of time, you&#8217;ve got a problem with data retrieval. The answer isn&#8217;t to create a callback for when the data finally arrives from the DB; the answer is to fix the problem of data coming back so slowly from the DB.</p>
<h3>Functional programming facilitates optimized and parallelized execution</h3>
<p>One of the reasons I like functional programming is because the execution engine is able to parallelize function calls because functions only operate on data coming into the function and only output a result. Functions don&#8217;t change properties or state on objects in memory. Since there&#8217;s no shared state or objects that can be accessed by two different processes, all operations are threadsafe. Better yet, with lazy evaluation like what MarkLogic does for many things, you can capture the result of a function call in a variable, but the execution engine doesn&#8217;t need to actually make the function call until you access something on that variable, which could be at any later point in your program. In fact, if you never access the variable the execution engine may never actually call the function that returns the value for that variable. Order of execution becomes much less important because the functions have no side effects and can be executed whenever the execution engine decides. The execution of one function does not affect another, so you can execute them all at once, or whenever resources are available. With Node, you&#8217;d be writing code to do all that: optimizing the method calling yourself. Instead, use a functional language and let the execution engine do it for you.</p>
<h3>The problem I have is processing a lot of data quickly</h3>
<p>I have megabytes and gigabytes of data to query and format for display on a web page. I need to be able to find a needle in a haystack and transform it into presentation quickly, for every request. First I need to get the speed down for just one user because that is as fast as I can go (unless another user were to cache it). Then I need that speed to remain fairly constant at scale, both with web traffic and amount of content. I am less concerned about how many requests each server can handle because I can scale horizontally if needed for both traffic and content size. With MarkLogic I have extremely fast access to the content I need. There&#8217;s no IO blocking to speak of. Even if there were, the execution engine will do some optimizing so parts of my code can execute in parallel. I spend time reducing query times, not coding callbacks for them.</p>
<h3>Node enthusiasts are front-end coders not wanting to do server coding</h3>
<p>I have used Javascript for over 15 years. I learned it before I learned Java. It&#8217;s really not too bad. I think what has happened in the web developer community is that some people who know front-end programming have gotten all excited that they can use their front-end skills to program the server. In fact, they think that they can even move a lot of processing that used to be done on the server up into the browser, using the programming languages and techniques they are used to, and all of a sudden it&#8217;s revolutionary and cutting edge. That&#8217;s a big reason CouchDB gained popularity, because there was no need for server programming. With HTML5, some have the idea that we hardly even need a backend service at all, just something to persist some state once in a while.</p>
<p>So the Node community has tried to sell Node as solving a fundamental problem with server programming (blocking IO calls) but that&#8217;s really not the problem with web page speed or even server speeds, especially per request. I think the real reason is that they are mostly novices that want to use Javascript for the server side but they use the &#8220;blocking&#8221; argument to convince others. All the Node enthusiasts I know, some personally, are not very skilled server programmers but have pretty strong front-end skills. This revolution is more about front end coders not having to deal with the server side than any breakthroughs about how to do the server side. And the exuberance and arrogance from enthusiasts is meant to shame non-enthusiasts into thinking they&#8217;re old school, antiquated, or unable to learn new things, that this is the future and in a few years we&#8217;ll all be programming in Javascript and if you don&#8217;t get on board you&#8217;ll be out of a job (I heard this first-hand). Node <em>has</em> to be adopted, otherwise all these front-end coders will have to learn server programming.</p>
<p>There are lots of things I like about Node, just not the community. I plan on using Node for easy HTTP server programming and for handling a large number of connections. But I need a Big Data server and a content server to generate dynamic and personalized HTML and to handle search. I&#8217;ll offload the HTML assets and cache as much as possible to cache servers, and I&#8217;ll optimize the front-end code to increase performance. Blocking calls, including IO, are just not one of my problems.</p>
]]></content:encoded>
			<wfw:commentRss>http://xquerywebappdev.wordpress.com/2011/11/18/node-js-is-good-for-solving-problems-i-dont-have/feed/</wfw:commentRss>
		<slash:comments>41</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/94a1fe92ae42a403272231e6a67a9d49?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">xquerywebappdev</media:title>
		</media:content>
	</item>
		<item>
		<title>How to set system variables on a MarkLogic App Server</title>
		<link>http://xquerywebappdev.wordpress.com/2011/11/11/how-to-set-system-variables-on-a-marklogic-app-server/</link>
		<comments>http://xquerywebappdev.wordpress.com/2011/11/11/how-to-set-system-variables-on-a-marklogic-app-server/#comments</comments>
		<pubDate>Sat, 12 Nov 2011 00:03:58 +0000</pubDate>
		<dc:creator>Ryan Semerau</dc:creator>
				<category><![CDATA[Tips n' Tricks]]></category>

		<guid isPermaLink="false">http://xquerywebappdev.wordpress.com/?p=850</guid>
		<description><![CDATA[Sometimes you want to be able to set variables at the system level and have your code be able to retrieve those values at run time. For example, if you want to know what lane you are on (dev, test, prod, etc.) or what endpoint you need to call for a service which would depend [...]]]></description>
			<content:encoded><![CDATA[<p>Sometimes you want to be able to set variables at the system level and have your code be able to retrieve those values at run time. For example, you might want to know what lane you are on (dev, test, prod, etc.) or what endpoint you need to call for a service, which would depend on what box you are running on. MarkLogic doesn&#8217;t have a formal way of setting system variables but there is a little trick I learned today that mimics this pretty well.</p>
<p>Global Namespaces can be added at the Group or Application level in MarkLogic. Through the Admin Interface on port 8001 or through the API you can set a prefix and namespace URI which is accessible in the code. You set it on the Group and then all App Servers in the Group will be able to access it, or you can set it on the App Server. The App Server&#8217;s namespace will override an existing Group namespace.</p>
<p>So if I wanted to set the type of lane my code is running in, I could set a namespace at the Group level that has a prefix of &#8220;lane&#8221; and a URI of &#8220;prod&#8221;. The following code would get the value:</p>
<pre>fn:namespace-uri-for-prefix("lane", &lt;lane:blah/&gt;)
=&gt; prod</pre>
<p>And if I wanted to set some endpoint, I could create a namespace on the Group with the prefix &#8220;endpoint&#8221; and URI &#8220;http://mysite:6005&#8221;:</p>
<pre>fn:namespace-uri-for-prefix("endpoint", &lt;endpoint:blah/&gt;)
=&gt; http://mysite:6005</pre>
<p>Since these are global namespaces you don&#8217;t have to declare the namespace in the prolog, so you don&#8217;t need any more code than shown above.</p>
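<p>As a usage sketch, the value can then drive a decision at run time. This assumes the &#8220;lane&#8221; global namespace from above is configured; the two service URLs are hypothetical placeholders.</p>
<pre>(: assumes a global namespace with prefix "lane" is set on the Group or App Server :)
let $lane := fn:namespace-uri-for-prefix("lane", &lt;lane:blah/&gt;)
return
  if ($lane eq "prod")
  then "http://mysite:6005"        (: hypothetical production endpoint :)
  else "http://localhost:6005"     (: hypothetical non-production endpoint :)</pre>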
<p>Granted, this is not using global namespaces for their intended purpose, but it seems to work pretty well.</p>
]]></content:encoded>
			<wfw:commentRss>http://xquerywebappdev.wordpress.com/2011/11/11/how-to-set-system-variables-on-a-marklogic-app-server/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/94a1fe92ae42a403272231e6a67a9d49?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">xquerywebappdev</media:title>
		</media:content>
	</item>
		<item>
		<title>My Three Favorite New Features of MarkLogic 5</title>
		<link>http://xquerywebappdev.wordpress.com/2011/11/04/my-three-favorite-new-features-of-marklogic-5/</link>
		<comments>http://xquerywebappdev.wordpress.com/2011/11/04/my-three-favorite-new-features-of-marklogic-5/#comments</comments>
		<pubDate>Fri, 04 Nov 2011 19:14:35 +0000</pubDate>
		<dc:creator>Ryan Semerau</dc:creator>
				<category><![CDATA[MarkLogic 5]]></category>

		<guid isPermaLink="false">http://xquerywebappdev.wordpress.com/?p=838</guid>
		<description><![CDATA[There are three new features in MarkLogic 5 that I am especially excited to see: better binary content handling, configuration importing and exporting, and retrieving the original URL of the request before URL rewriting.  All of these save me development time and amount of code that I need to write. Better binary content handling MarkLogic [...]]]></description>
			<content:encoded><![CDATA[<p>There are three new features in MarkLogic 5 that I am especially excited to see: better binary content handling, configuration importing and exporting, and retrieving the original URL of the request before URL rewriting.  All of these save me development time and amount of code that I need to write.</p>
<h2>Better binary content handling</h2>
<p>MarkLogic has always been able to store binary files in the database, but if the files were too big or if you had too many files, your caches may have been adversely affected and your database merges may have taken longer than they needed to. In the past, when we had a lot of binary content that we wanted to serve off of a MarkLogic-powered website we would keep the binary files on the filesystem and just put the metadata file in the MarkLogic database. This worked fine, even streaming the files off the filesystem through MarkLogic, but we had to code the implementation and we always had to make sure the metadata files were in sync with the binary files on the filesystem. We don&#8217;t have to do this anymore with MarkLogic 5.</p>
<p>MarkLogic 5 introduces Rich Media Support which means that large binary files are handled differently than XML and text files under the covers in the server. There is a configurable threshold for the size at which a binary file is considered &#8220;large&#8221; and handled in a more efficient way. These large binary files are handled by MarkLogic as efficiently as if you saved them to the filesystem yourself. But you don&#8217;t need to use a special API or different functions than you would use for the XML and text files. You just insert the file using xdmp:document-insert() and MarkLogic will handle the rest.</p>
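<p>A minimal sketch of what that looks like, assuming a file at the hypothetical filesystem path /tmp/big-video.mp4 and a hypothetical database URI. Whether the file is treated as &#8220;large&#8221; depends on the threshold configured for the database.</p>
<pre>(: read the file from the filesystem as a binary node (the path is hypothetical) :)
let $binary := xdmp:document-get("/tmp/big-video.mp4",
  &lt;options xmlns="xdmp:document-get"&gt;
    &lt;format&gt;binary&lt;/format&gt;
  &lt;/options&gt;)
(: the same insert call you would use for XML or text documents :)
return xdmp:document-insert("/media/big-video.mp4", $binary)</pre>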
<h2>Configuration importing and exporting</h2>
<p>The Administration Interface on port 8001 provides a nice graphical, point-and-click interface for managing and configuring a MarkLogic installation. But for mature implementations, you&#8217;ll probably want a way to declare the settings for the servers, databases, forests, etc. and script the configuration changes. There are several good implementations that do this outside of MarkLogic, but now you can just export the settings of an installation and get the full configuration settings in an XML file. You can import this XML file into a separate machine to stand up an installation with the exact same settings. You can also check the configuration settings file into source control, make changes to it, and re-import the file back into the MarkLogic installation to effect those changes. As part of troubleshooting you can take a fresh export of the settings of an installation and compare those settings to the configuration settings file you had in source control to see if there were any inadvertent changes to the installation.</p>
<h2>Getting the original URL of the request</h2>
<p>This may seem to be a minor feature, but it is one that can save me code and complexity. It&#8217;s always been possible to get the request URL from within XQuery code by calling xdmp:get-request-url(). But this returns the URL after the URL rewriter has rewritten the URL. What if you wanted to get the URL before the URL was rewritten? In previous versions of MarkLogic you&#8217;d have to get the request URL (by calling xdmp:get-request-url()) in the URL rewriter itself and add the original URL as a parameter to the rewritten URL. For example,</p>
<p><code>fn:concat("/new/url?orig-url=", xdmp:get-request-url())</code></p>
<p>Then in subsequent code you&#8217;d get the original URL by getting the request field, like <code>xdmp:get-request-field("orig-url")</code>. That works but it can be a pain if you forget to add the URL as a parameter, or you make an error in the code to retrieve it. But now in MarkLogic 5 you can just call <code>xdmp:get-original-url()</code> which will return the URL as it was before the URL rewriter changed it. Less code I have to write. Less complexity. Fewer bugs.</p>
<p>MarkLogic is fast in terms of performance but also in terms of development time. I spent ten years in the Java world and time-to-market was extremely important, and it still is now. I have never been able to implement mature, high-performance, enterprise solutions faster on any other platform than on MarkLogic. The new features of MarkLogic 5 that excite me the most are the ones that reduce that time-to-market for me even more. Most if not all of these features are the results of customers lobbying for them, and MarkLogic has listened. I have been vocal about binary content handling and now it&#8217;s part of the server. I&#8217;m looking forward to this new version so I can continue to push the boundaries of delivering solutions for my customers in less time and with less risk.</p>
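<p>Here is a sketch of the two pieces under MarkLogic 5, assuming a hypothetical rewriter that maps /articles/123 to an article.xqy module (the paths and the id parameter are made up for illustration).</p>
<pre>(: rewriter.xqy (hypothetical): no need to pass the original URL along :)
let $url := xdmp:get-request-url()
return
  if (fn:starts-with($url, "/articles/"))
  then fn:concat("/article.xqy?id=", fn:substring-after($url, "/articles/"))
  else $url

(: article.xqy (hypothetical): the URL as the client requested it :)
xdmp:get-original-url()</pre>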
]]></content:encoded>
			<wfw:commentRss>http://xquerywebappdev.wordpress.com/2011/11/04/my-three-favorite-new-features-of-marklogic-5/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/94a1fe92ae42a403272231e6a67a9d49?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">xquerywebappdev</media:title>
		</media:content>
	</item>
		<item>
		<title>How about &#8220;variably-structured data&#8221;?</title>
		<link>http://xquerywebappdev.wordpress.com/2011/09/09/hierarchical-and-variably-structured-data/</link>
		<comments>http://xquerywebappdev.wordpress.com/2011/09/09/hierarchical-and-variably-structured-data/#comments</comments>
		<pubDate>Fri, 09 Sep 2011 16:56:58 +0000</pubDate>
		<dc:creator>Ryan Semerau</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://xquerywebappdev.wordpress.com/?p=835</guid>
		<description><![CDATA[&#8220;Unstructured&#8221; and &#8220;Semi-structured&#8221; never seem to be very good terms describing the data model used for XML and MarkLogic. And sometimes a term that people are unfamiliar with is better because then they don&#8217;t use pre-existing notions to deduce an incorrect understanding of the term. Often it&#8217;s better, if you can, to create a new [...]]]></description>
			<content:encoded><![CDATA[<p>&#8220;Unstructured&#8221; and &#8220;Semi-structured&#8221; never seem to be very good terms describing the data model used for XML and MarkLogic. And sometimes a term that people are unfamiliar with is better because then they don&#8217;t use pre-existing notions to deduce an incorrect understanding of the term. Often it&#8217;s better, if you can, to create a new term with a new definition for them.</p>
<p>Similarly, database guys have pointed out to me that XML is still &#8220;relational&#8221; in that there are still relationships among the data. So maybe a better descriptor is &#8220;tabular data&#8221; when referring to databases that use tables, rows, and columns.</p>
<p>So I have been using different terms once in a while that seem to get the points across better. I refer to Oracle, MySQL, and SQL Server as databases that store data in a &#8220;tabular&#8221; form. And I refer to MarkLogic as storing &#8220;variably-structured data&#8221;, which immediately begs for a definition, which I give as:</p>
<blockquote><p>Variably structured data is data that doesn&#8217;t necessarily conform to a schema imposed on it by the database, although the data may have internal structure and that internal structure may be the same as other data in the system, or it may vary to a degree. The internal structure may vary over time as the data itself changes.</p></blockquote>
<p>It seems like there&#8217;s really nothing &#8220;semi&#8221; structured about XML at all. It&#8217;s structured, but the data itself gets to define its own structure. That&#8217;s key point #1: the system doesn&#8217;t impose structure, rather the data does it on itself. Key point #2: that structure can vary across documents, and within a document over time. Hence, the structure is variable.</p>
<p>Maybe I&#8217;m going against some long-standing academic history here but I don&#8217;t like &#8220;semistructured&#8221; and I want to stop using it. What do you all think?</p>
]]></content:encoded>
			<wfw:commentRss>http://xquerywebappdev.wordpress.com/2011/09/09/hierarchical-and-variably-structured-data/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/94a1fe92ae42a403272231e6a67a9d49?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">xquerywebappdev</media:title>
		</media:content>
	</item>
		<item>
		<title>Technical details on techniques used in the development competition</title>
		<link>http://xquerywebappdev.wordpress.com/2011/08/23/technical-details-on-techniques-used-in-the-development-competition/</link>
		<comments>http://xquerywebappdev.wordpress.com/2011/08/23/technical-details-on-techniques-used-in-the-development-competition/#comments</comments>
		<pubDate>Tue, 23 Aug 2011 20:32:53 +0000</pubDate>
		<dc:creator>Ryan Semerau</dc:creator>
				<category><![CDATA[commentary]]></category>

		<guid isPermaLink="false">http://xquerywebappdev.wordpress.com/?p=830</guid>
		<description><![CDATA[The post on winning the development competition has generated a lot of interest (relative to my other posts) so I thought I&#8217;d go into more detail about the technology we used. First, there are some guiding principles that I follow when it comes to XQuery web apps on MarkLogic. They are not set in stone, [...]]]></description>
			<content:encoded><![CDATA[<p>The <a title="MarkLogic decidedly wins development competition" href="http://xquerywebappdev.wordpress.com/2011/08/20/marklogic-decidedly-wins-development-competition/">post on winning the development competition</a> has generated a lot of interest (relative to my other posts) so I thought I&#8217;d go into more detail about the technology we used.</p>
<p>First, there are some guiding principles that I follow when it comes to XQuery web apps on MarkLogic. They are not set in stone, and individual projects may deviate. But these are my starting points:</p>
<ol>
<li><strong>Embrace the functional programming paradigm and the XML data model.</strong> Don&#8217;t fight it. Don&#8217;t try to turn XQuery into Java, or XML into an object model. Don&#8217;t try to reproduce hibernate, spring, jsp, php, rails, or any other object-oriented framework in MarkLogic. Let the server do its thing. You bought the server, so use it for what it is. Otherwise, write your own server.</li>
<li><strong>Use a conceptually simple data model.</strong> Don&#8217;t start off by creating a data model structured around how it will be queried. Think of the <a title="Avoid data coupling as well as code coupling" href="http://xquerywebappdev.wordpress.com/2011/08/20/avoid-data-coupling-as-well-as-code-coupling/">entities</a> that you will need and keep the number of them low. Give each type of entity minimal references to other entities and keep the entities coarse. Keep it simple.</li>
<li><strong>Make the application data- and configuration-driven for flexibility.</strong> Write the application so that it works according to the data it has. Facets are an example of this in search. The facets that are displayed are based on what came back in the results. You don&#8217;t have to specify up front what the individual facets will be. The server will figure that out at runtime. Build navigation and other options off of values in the DB. Write modular code that is discoverable at runtime. This is an advanced topic that can&#8217;t be sufficiently covered here, but the Masquerade approach (discussed below) is an example of this.</li>
<li><strong>Leave the HTML files intact.</strong> A rich interface requires plenty of HTML work so leave it alone. Don&#8217;t shred the HTML into XQuery. Don&#8217;t make the HTML work only through a server. Let the front-end development take place in a static context, and then make replacements as needed dynamically at runtime. When the HTML needs to be modified (which is constantly) the front-end dev can easily do it because the file didn&#8217;t have to change to be served on the server. The Nonob approach discussed below is an example of this.</li>
<li><strong>Let collections and permissions restrict the data access for the user.</strong> Don&#8217;t write your own access restriction. Write code so that it just operates on the correct data based on the user&#8217;s permissions and the current collection in use.</li>
<li><strong>Code to &#8220;bare metal&#8221; as much as possible.</strong> Use the built-in APIs if they can do what you need, even if you have to compose them together a little. Avoid custom functions just to &#8220;keep the code clean&#8221;. Avoid creating &#8220;gateway&#8221; functions, that is, functions that get the data the &#8220;right&#8221; way for you. Let the code get the data it needs, when and how it needs it. Not only does this reduce the amount of code you write, it allows you to surgically get just the data you want using the predicates you want. It also reduces bug risk because the built-in APIs are much more likely to be bug-free than something custom you wrote.</li>
<li><strong>Use static javascript and CSS assets.</strong> MarkLogic is a powerful markup machine so use it to handle all the markup, and let the js and CSS be static and cacheable. This way most requests to the server are just for new markup that will have very little js or CSS in it.</li>
</ol>
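<p>To make the facet example in principle 3 concrete, here is a minimal sketch in Python (used only so the snippet is self-contained; it is not MarkLogic code). The <code>facets</code> helper and the sample records are hypothetical. The point is that the facet values are discovered from the results at runtime instead of being listed up front:</p>

```python
from collections import Counter

# Hypothetical search results; in a real app these would come from the database.
results = [
    {"status": "open", "year": 2011},
    {"status": "done", "year": 2011},
    {"status": "open"},
]


def facets(results, fields):
    """Count the values that actually occur in the results, per field."""
    return {
        field: Counter(r[field] for r in results if field in r)
        for field in fields
    }


# Only values present in the data show up; nothing is configured per value.
status_and_year = facets(results, ["status", "year"])
```

<p>If a new status value appears in the data tomorrow, it shows up as a facet without any code change.</p>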
<p>Ok, so how exactly do you do this? Below are some approaches, patterns, libraries, and techniques that my team used on the application:</p>
<ol>
<li><strong>Non-obtrusive HTML replacement (<a title="Non-obtrusive HTML Replacing (non-ob)" href="http://xquerywebappdev.wordpress.com/non-obtrusive-html-replacing-non-ob/">nonob</a>).</strong> In this approach, the front-end dev creates the static HTML with whatever js and CSS assets he wants, using dummy data in the HTML, so that it&#8217;s fully mature right off of the filesystem. He then checks the HTML and assets into source control, and the XQuery dev finds which parts of the HTML page need to be dynamically replaced and writes replacement rules in an XQuery file. The XQuery dev checks in his code and leaves the HTML alone. The MarkLogic server executes the XQuery file, pulls in the HTML, makes the replacements using the rules in the file (pulling in whatever data it needs), and returns the resulting HTML to the browser. The assets are also in MarkLogic, so those requests simply return the requested js and CSS files that the front-end dev put in. This is probably the single biggest time saver in development. There are several reasons why:
<ul>
<li><strong>Front-end development and server development can happen in parallel.</strong> Given a design, both the front-end dev and XQuery dev can determine what needs to be done and they can both start working on their areas. It&#8217;s common for the front-end dev to check in HTML, which the XQuery dev then refers to for replacements, even while the front-end dev keeps making changes to that HTML.</li>
<li><strong>Changes can be made at any time by front-end or XQuery devs.</strong> At any point, if either front-end devs or XQuery devs need to change code or HTML, they can do it without having to consult with each other. There are limits, but this is true for the most part.</li>
<li><strong>XQuery code only needs to focus on making the replacements, not building the page.</strong> The only XQuery code to write for HTML pages is getting the necessary data to make the replacements. Let the front-end guy deal with doing all the HTML.</li>
<li><strong>The same XQuery code can do replacements on different HTML files.</strong> You can have one HTML page for desktop, one for tablet, one for mobile, one for IE6, whatever, and the same XQuery code can be used to do the replacements for any of the files. You can also have multiple views (Large image, no image, banner, etc) of a page and the same XQuery code is used. It&#8217;s just that the input HTML page changes.</li>
</ul>
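<p>As a rough illustration of the idea (this is not the actual nonob library, which is XQuery running in MarkLogic), the sketch below uses Python&#8217;s standard library to apply replacement rules to an untouched static page. The page markup, the element ids, and the <code>apply_rules</code>, <code>set_text</code>, and <code>set_items</code> helpers are all made up for this example, and the page is assumed to be well-formed XHTML:</p>

```python
import xml.etree.ElementTree as ET

# Static page as the front-end dev wrote it, with dummy data in place.
STATIC_HTML = """<html>
  <body>
    <h1 id="greeting">Hello, Dummy User</h1>
    <ul id="projects"><li>Dummy project</li></ul>
  </body>
</html>"""


def apply_rules(html_text, rules):
    """Return the page with each element that has a rule rewritten by it.

    `rules` maps an element id to a function that mutates that element.
    Elements without a rule stay exactly as the front-end dev wrote them.
    """
    root = ET.fromstring(html_text)
    # Snapshot the elements first so rules can safely add or remove children.
    for element in list(root.iter()):
        rule = rules.get(element.get("id"))
        if rule is not None:
            rule(element)
    return ET.tostring(root, encoding="unicode")


def set_text(text):
    """Build a rule that swaps the dummy text for real data."""
    def rule(element):
        element.text = text
    return rule


def set_items(items):
    """Build a rule that replaces the dummy list items with real ones."""
    def rule(element):
        for child in list(element):
            element.remove(child)
        for item in items:
            ET.SubElement(element, "li").text = item
    return rule


# The server-side dev only writes rules; the HTML file is never edited.
rules = {
    "greeting": set_text("Hello, Ryan"),
    "projects": set_items(["Timeline", "Summary"]),
}
page = apply_rules(STATIC_HTML, rules)
```

<p>Because the rules are keyed on ids rather than on page structure, the same <code>rules</code> can be applied to a different HTML file (say, a phone layout) as long as it uses the same ids.</p>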
</li>
<li><strong>Masquerade.</strong> This is a technique that abstracts the addressing of a resource from the code that will service that resource. Friendly URLs are a common example of this, where rules or configuration map URLs to pages. Masquerade takes it a few steps further by querying the DB for code that has indicated that it responds to that URL, and the appropriate code is used to generate the HTML based on the session or user agent. Here is how it works:
<ul>
<li>Request comes into the rewriter in MarkLogic</li>
<li>The rewriter queries the DB (particular collection, and using the permission set of the user) to find a &#8220;masq&#8221; that has an &#8220;url&#8221; element that matches the URL path. An example XML file for a masq may look like this:
<pre>&lt;masq key="aboutus"&gt;
    &lt;url type="preferred"&gt;/about-us&lt;/url&gt;
    &lt;url&gt;/aboutus&lt;/url&gt;
    &lt;url&gt;/about-us.html&lt;/url&gt;
    &lt;view type="mobile" xml:lang="en"&gt;
        &lt;html&gt;about-us-mobile.html&lt;/html&gt;
        &lt;code&gt;about-us.xqy&lt;/code&gt;
    &lt;/view&gt;
    &lt;view type="desktop" xml:lang="en"&gt;
        &lt;html&gt;about-us.html&lt;/html&gt;
        &lt;code&gt;about-us.xqy&lt;/code&gt;
    &lt;/view&gt;
&lt;/masq&gt;</pre>
</li>
<li>The rewriter picks the best &#8220;view&#8221; of the masq that fits the client.</li>
<li>The rewriter calls the code that the view specifies and passes the starting HTML that the view specifies</li>
<li>The code uses nonob to make replacements on the HTML and returns the HTML back to the browser.</li>
<li>Making a new page and new page rule involves writing the HTML, the code, and creating a masq XML file. No changes to the rewriter are needed because masqs are queried at runtime</li>
<li>Pages that are off-limits to certain users can be protected by setting the permissions on the masq XML file so that the query for masqs would not find the restricted ones for that user. You can also set the permissions on the xqy file.</li>
</ul>
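<p>The lookup steps can be sketched as follows. This is a Python stand-in, not the real rewriter: the <code>resolve</code> and <code>view_matches</code> functions, the <code>/contact</code> masq, and especially the ranking used to pick the &#8220;best&#8221; view (exact device and language, then device only, then any view) are assumptions made for illustration. Permissions are not modeled; in MarkLogic the query itself would simply not return masqs the user cannot read.</p>

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A hypothetical masq in the same shape as the example above.
MASQS = [ET.fromstring("""<masq key="contact">
    <url type="preferred">/contact</url>
    <url>/contact-us</url>
    <view type="mobile" xml:lang="en">
        <html>contact-mobile.html</html>
        <code>contact.xqy</code>
    </view>
    <view type="desktop" xml:lang="en">
        <html>contact.html</html>
        <code>contact.xqy</code>
    </view>
</masq>""")]


def view_matches(view, device, lang):
    """True if the view fits; None for device or lang means 'do not care'."""
    return ((device is None or view.get("type") == device)
            and (lang is None or view.get(XML_LANG) == lang))


def resolve(path, device, lang, masqs):
    """Return (html_file, code_file) for a request, or None if no masq matches."""
    for masq in masqs:
        if path not in [url.text for url in masq.findall("url")]:
            continue
        # Assumed ranking: device + language, then device only, then any view.
        for want_device, want_lang in ((device, lang), (device, None), (None, None)):
            for view in masq.findall("view"):
                if view_matches(view, want_device, want_lang):
                    return view.findtext("html"), view.findtext("code")
    return None


phone = resolve("/contact", "mobile", "en", MASQS)
desktop_fr = resolve("/contact-us", "desktop", "fr", MASQS)
missing = resolve("/no-such-page", "desktop", "en", MASQS)
```

<p>Adding a page means adding one more masq document to the data; <code>resolve</code> itself never changes, which mirrors the point about not touching the rewriter.</p>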
</li>
<li><strong>Nonob i18n.</strong> In order to facilitate the page being in multiple languages, the nonob i18n library does the following:
<ul>
<li>In the static HTML that the front-end developer writes, custom HTML attributes are added to all the parts of the page that need to be i18n-ized. These custom attributes do not interfere with any HTML coding tools, and browsers ignore attributes they don&#8217;t understand, but the attributes are removed before the page is sent to the browser as part of the nonob replacement.</li>
<li>Here&#8217;s an example of i18n attribute on a div. The value of the i18n attribute is the key to lookup the translation in a resource file.
<pre>&lt;div id="mydiv" i18n="myapp.welcome-page.title"&gt;Welcome to my site!&lt;/div&gt;</pre>
</li>
<li>Here&#8217;s an example where you want to i18n-ize an input field:
<pre>&lt;input type="text" i18n="pages.register.name.placeholder"
i18n-target="value" value="Name"/&gt;</pre>
</li>
<li>The resource files are stored by language. An example of one might be:
<pre>&lt;resource-bundle xml:lang="es"&gt;
    &lt;resource key="pages.register.name.placeholder"&gt;Nombre&lt;/resource&gt;
    &lt;resource key="pages.register.submit"&gt;Submito&lt;/resource&gt;
    ...
&lt;/resource-bundle&gt;</pre>
</li>
<li>The resource to use is determined at runtime using the user&#8217;s language preferences sent in the HTTP headers (or previously saved settings) and the resource bundle languages available. So if the browser says &#8220;ru,es,en&#8221;, the framework will pick the most appropriate language to use, which in our example is Spanish (es), because there is no Russian version.</li>
<li>After nonob has made its replacements (which it does in one pass through the HTML), i18n-nonob does its replacements in a second pass: it finds all the i18n attributes, gets the values from the resource files for the given language, makes the replacements, and returns the HTML to the browser.</li>
<li>Using the library requires one line of code; the only other effort is adding the i18n attributes to the static HTML and creating the resource files. Note that the page can dynamically set i18n attributes in the nonob step if it wants, because the i18n-nonob step parses the whole HTML again after nonob is done.</li>
<li>Nonob + i18n-nonob usually takes about 50 milliseconds.</li>
<li>Nonob + masquerade + i18n-nonob provides a lot of flexibility and permutations with minimal coding.</li>
</ul>
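<p>Here is a minimal sketch of the two i18n steps, language selection and attribute replacement, again in Python rather than XQuery so it is self-contained. The <code>pick_language</code> and <code>localize</code> functions and the bundle contents are hypothetical, and the language matching is deliberately simplified: it takes the header in order and ignores q-values.</p>

```python
import xml.etree.ElementTree as ET

# Resource bundles keyed by language (hypothetical contents).
BUNDLES = {
    "en": {"pages.register.name.placeholder": "Name"},
    "es": {"pages.register.name.placeholder": "Nombre"},
}


def pick_language(accept_language, available, default="en"):
    """Return the first language from the header that has a bundle."""
    for part in accept_language.split(","):
        tag = part.split(";")[0].strip().lower()
        # Try the full tag first ("es-mx"), then its primary subtag ("es").
        for candidate in (tag, tag.split("-")[0]):
            if candidate in available:
                return candidate
    return default


def localize(html_text, bundle):
    """Apply i18n attributes in one pass and strip them from the output."""
    root = ET.fromstring(html_text)
    for element in root.iter():
        key = element.attrib.pop("i18n", None)
        target = element.attrib.pop("i18n-target", None)
        if key is None or key not in bundle:
            continue  # no translation: keep what the front-end dev wrote
        if target is None:
            element.text = bundle[key]          # translate the element text
        else:
            element.set(target, bundle[key])    # translate the named attribute
    return ET.tostring(root, encoding="unicode")


PAGE = """<form>
  <div id="mydiv" i18n="pages.register.title">Register</div>
  <input type="text" i18n="pages.register.name.placeholder"
         i18n-target="value" value="Name"/>
</form>"""

lang = pick_language("ru,es,en", BUNDLES)
localized = localize(PAGE, BUNDLES[lang])
```

<p>With the header &#8220;ru,es,en&#8221; and no Russian bundle, the Spanish bundle is chosen. The div keeps its original text because this sample bundle has no entry for its key, which is one reasonable fallback, though the real library may behave differently.</p>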
</li>
</ol>
<p>Those are the more important parts that contributed the most to rapid development. We also already had a configuration and deployment tool and an authentication library, so we didn&#8217;t have to spend much time on those. The <a title="New Book Coming Out: “MarkLogic Server 4.2: XQuery Application Development”" href="http://xquerywebappdev.wordpress.com/2011/02/28/new-book-coming-out-marklogic-server-4-2-xquery-application-development/">book</a> I am writing goes into more detail about how to do some of these things.</p>
<p>Enjoy</p>
]]></content:encoded>
			<wfw:commentRss>http://xquerywebappdev.wordpress.com/2011/08/23/technical-details-on-techniques-used-in-the-development-competition/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/94a1fe92ae42a403272231e6a67a9d49?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">xquerywebappdev</media:title>
		</media:content>
	</item>
		<item>
		<title>Avoid data coupling as well as code coupling</title>
		<link>http://xquerywebappdev.wordpress.com/2011/08/20/avoid-data-coupling-as-well-as-code-coupling/</link>
		<comments>http://xquerywebappdev.wordpress.com/2011/08/20/avoid-data-coupling-as-well-as-code-coupling/#comments</comments>
		<pubDate>Sat, 20 Aug 2011 18:39:30 +0000</pubDate>
		<dc:creator>Ryan Semerau</dc:creator>
				<category><![CDATA[commentary]]></category>
		<category><![CDATA[Tips n' Tricks]]></category>

		<guid isPermaLink="false">http://xquerywebappdev.wordpress.com/?p=824</guid>
		<description><![CDATA[We talk a lot about reducing code coupling as a way to reduce the interdependency in the code so that changes to one piece of code can be made more safely. Implementations can be refactored, code can be modified, and new features can be added more easily if the code you are changing only has [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=xquerywebappdev.wordpress.com&amp;blog=13094960&amp;post=824&amp;subd=xquerywebappdev&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>We talk a lot about reducing code coupling as a way to reduce the interdependency in the code so that changes to one piece of code can be made more safely. Implementations can be refactored, code can be modified, and new features can be added more easily if the code you are changing only has 5 interface points with other code, rather than 50.</p>
<p>Isn&#8217;t it the same with data? When you define a table in a relational database, you are defining what data parts an entity can have. And when you put data in that table, you are mixing that entity&#8217;s data with all the other entities that have data in that table. You are essentially coupling the data among all the entities in that table. Now you can&#8217;t separate the entities very easily and you can&#8217;t change the data structure or types for an entity without affecting all the other entities.</p>
<p>In a document-centric model, if each entity has its own document you can insert, delete and modify them with no effect on other entities. You can also create new types of entities, and even convert some old entities into the new type, without affecting other entities.</p>
<p>But even if you are using XML you still may gravitate to data coupling if the data gets sharded too much. I use a rule of thumb that an entity is something that has its own lifecycle. It can reference other entities (like a foreign key) but those are references to other entities, not sharded data parts of the same entity.</p>
<p>For example, a Person entity may have a &#8220;name&#8221; element which does not exist independently and does not have its own lifecycle. So &#8220;name&#8221; is an element of a Person. But that Person may be in a Company, and that Company does exist independently from any Person and has its own lifecycle, so Company should be a separate entity and each Person entity would have an id reference to the Company that it belongs to.</p>
<p>Now what happens if a Company entity gets deleted when a Person references it? Then you have an ID that points to nothing, but the Person still exists and can still be viewed, modified, and potentially reassigned; the Person entity is still complete. And when a Person referencing some Company is deleted, the Company doesn&#8217;t know, doesn&#8217;t care.</p>
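<p>A small sketch of that Person and Company example, using a Python dictionary as a stand-in for a document database (the URIs, field names, and the <code>find_company</code> helper are hypothetical):</p>

```python
# Each entity is its own document, keyed by URI.
documents = {
    "/company/acme.xml": {"type": "company", "id": "acme", "name": "Acme"},
    "/person/ann.xml": {
        "type": "person",
        "id": "ann",
        "name": "Ann",           # part of the Person, no lifecycle of its own
        "company-ref": "acme",   # reference to a separate entity
    },
}


def find_company(person, docs):
    """Resolve a Person's company reference; None if the Company is gone."""
    for doc in docs.values():
        if doc["type"] == "company" and doc["id"] == person.get("company-ref"):
            return doc
    return None


ann = documents["/person/ann.xml"]
before = find_company(ann, documents)

# Deleting the Company touches exactly one document.
del documents["/company/acme.xml"]
after = find_company(ann, documents)

# The Person is still complete and can be reassigned later.
ann["company-ref"] = "globex"
```

<p>After the delete, the reference resolves to nothing, but the Person document is unchanged and still usable on its own.</p>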
<p>I typically expect that the number of data models for entities in a MarkLogic web application should be able to be counted on one hand. And usually starting with what makes conceptual sense is the best approach: what &#8220;entities&#8221; do there seem to be that have their own lifecycles and how do they relate to each other?</p>
<p>This reduces data coupling so changes can be made easily and with low risk. Entities can be inserted and deleted without affecting other entities. It also tends to foster queries that are simple and &#8220;fully searchable&#8221; so that they utilize the indexes appropriately for maximum performance and minimal disk reads.</p>
]]></content:encoded>
			<wfw:commentRss>http://xquerywebappdev.wordpress.com/2011/08/20/avoid-data-coupling-as-well-as-code-coupling/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/94a1fe92ae42a403272231e6a67a9d49?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">xquerywebappdev</media:title>
		</media:content>
	</item>
		<item>
		<title>MarkLogic decidedly wins development competition</title>
		<link>http://xquerywebappdev.wordpress.com/2011/08/20/marklogic-decidedly-wins-development-competition/</link>
		<comments>http://xquerywebappdev.wordpress.com/2011/08/20/marklogic-decidedly-wins-development-competition/#comments</comments>
		<pubDate>Sat, 20 Aug 2011 16:32:42 +0000</pubDate>
		<dc:creator>Ryan Semerau</dc:creator>
				<category><![CDATA[commentary]]></category>

		<guid isPermaLink="false">http://xquerywebappdev.wordpress.com/?p=812</guid>
		<description><![CDATA[The challenge A few weeks ago I got called out of my coding lair and was brought into a meeting with management where they informed me and 5 other people that there was going to be a friendly internal competition. A web application for tracking portfolios, initiatives, projects, and deliverables had been thoroughly designed for [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=xquerywebappdev.wordpress.com&amp;blog=13094960&amp;post=812&amp;subd=xquerywebappdev&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<h3>The challenge</h3>
<p>A few weeks ago I got called out of my coding lair and was brought into a meeting with management where they informed me and 5 other people that there was going to be a friendly internal competition. A web application for tracking portfolios, initiatives, projects, and deliverables had been thoroughly designed for desktop/tablet and phone use. There would be two teams, each consisting of 1 QA and 2 developers. We were to stop our current work and devote our time and efforts to this, and in 1 week the two teams would reveal what they had done. Management would take care of clearing our schedules. At the end of the week, the implementation that won would go on to be used for real by our group (about 120 people) and possibly other groups within the organization.</p>
<p>We were allowed to use any technology we wanted. The requirements (if we could meet them) were that the web application used our organization&#8217;s login accounts, was behind SSL, worked on a desktop and tablet (which had one design) and phone (which had a modified design). We were given high-fidelity mockups and access to the designer and the main user, and we were told to come back in a week and show how much we got done.</p>
<h3>The requirements</h3>
<p>I can&#8217;t post any screenshots because I don&#8217;t have permission, but I will describe the application a little. The main page of the application showed a timeline of all the projects for all the initiatives for a given portfolio. There may be 10 to 15 initiatives and each initiative may have 10 projects. Projects last 1 to 12 months and usually have a few deliverables. There are about 12 portfolios. The timeline on the main page has different colors for each deliverable indicating status, which can be modified via the main page if the user has permissions to do so.</p>
<p>There are three levels of access in the application: View, Edit, and Admin, and users are scoped to the particular portfolios that they should have any access to. Admin users can add other users and assign portfolios and roles. Users are added just by entering the user&#8217;s existing internal organization account username and the web app will import the user information from LDAP.</p>
<p>Editors can add and modify initiatives and projects, which each have their own screens. Initiatives have names, descriptions, year, and a few other fields. Projects have name, description, start and end dates, overall budget, dependencies on other portfolios, labor calculator (list of individuals and their bill rate, percentage engagement, number of days engagement, etc. with dynamically calculated labor rate total), deliverables (with name and delivery date) and a few other fields. Adding and removing labor rate line items and deliverables happens in-browser, without a screen refresh, and the deliverable dates are constrained to fall between the start and end dates of the project.</p>
<p>There is also a summary screen that is similar to the timeline page except it shows all the initiatives and projects for a portfolio in sort of a spreadsheet view where you can collapse and expand initiatives to show or hide their projects. The budget totals are also shown for initiatives and projects.</p>
<p>That&#8217;s the gist of this application. It is non-trivial and had a very rich design and interaction. My team had an excellent QA, an excellent front-end dev, and me, who was the only one who knew MarkLogic. The other team chose to implement theirs using a JavaScript front-end architecture communicating with CouchDB (later Java with MongoDB) on the backend. The two teams involved very skilled people. If these two technology approaches were going to go head-to-head, these were the people to do it.</p>
<h3>The results</h3>
<p>So how did it turn out? We deployed our implementation to a server behind SSL hooked up to our auth scheme in my organization. We implemented all the requirements, and then some. We had three experiences fully implemented: desktop/tablet, phone, and dumbphone (meaning no CSS or JavaScript and limited functionality), which the server would pick for you based on detected device abilities. We had SOAP/XML and REST/JSON services. We had the entire application translated into 9 languages: English, Spanish, Russian, Greek, Japanese, Korean, Chinese, Arabic, and something else (I can&#8217;t remember).</p>
<p>We also ran the application through a series of speed, load, penetration, and injection tests. Average HTML page size was 1 or 2 KB. Average response time for HTML was 250 ms (assets were all cached after the first hit). With 1000 concurrent users, there was no perceived change in performance. There were no failures or warnings with session hijacking, cross-site scripting, malicious code injection, malicious character injection, or any other security test. This was using an old-ish former Apache server which was not even half-power for a MarkLogic recommended setup. One E/D node, no caching.</p>
<p>We did this all in one week.</p>
<p>The other team did not finish. They had a few pages that showed some data, but it was not deployed, didn&#8217;t have authentication, you couldn&#8217;t create or change data, no phone or other experiences, at least based on what they showed. No other languages than English either. No security, penetration, load, or injection tests. For whatever reason, that&#8217;s how things ended up.</p>
<h3>What does it prove?</h3>
<p>To me, this demonstrated hands down how quickly you can implement non-trivial, fully featured, mature, enterprise-class, performant, and secure web applications: <em><strong>in only one week, with only three guys,</strong></em> sometimes.</p>
<p>So that application my team created is being used for real now, and it has had some bug fixes and other tweaks, especially with IE (did I mention it was cross-browser compatible?). Oh and did I mention the build and deployment time is less than 30 seconds?</p>
<p>It&#8217;s hard to argue with end results. People&#8217;s time is expensive, and the less time you need to spend on people creating frameworks, building stacks, adding in security and other enterprise stuff, the better. And with all the claims and arguments that get tossed around, all I really think is: scoreboard.</p>
]]></content:encoded>
			<wfw:commentRss>http://xquerywebappdev.wordpress.com/2011/08/20/marklogic-decidedly-wins-development-competition/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/94a1fe92ae42a403272231e6a67a9d49?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">xquerywebappdev</media:title>
		</media:content>
	</item>
		<item>
		<title>The 1 Millisecond Page</title>
		<link>http://xquerywebappdev.wordpress.com/2011/08/19/the-1-millisecond-page/</link>
		<comments>http://xquerywebappdev.wordpress.com/2011/08/19/the-1-millisecond-page/#comments</comments>
		<pubDate>Sat, 20 Aug 2011 02:26:53 +0000</pubDate>
		<dc:creator>Ryan Semerau</dc:creator>
				<category><![CDATA[commentary]]></category>

		<guid isPermaLink="false">http://xquerywebappdev.wordpress.com/?p=806</guid>
		<description><![CDATA[I had an epiphany today while writing some code. I was writing a utility for web apps to query the WURFL dataset to determine if the remote client is on a mobile device or not. The WURFL dataset has all the manufactured mobile devices with their user agents and capability profiles. There are about 15,000 [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=xquerywebappdev.wordpress.com&amp;blog=13094960&amp;post=806&amp;subd=xquerywebappdev&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I had an epiphany today while writing some code. I was writing a utility for web apps to query the <a href="http://wurfl.sourceforge.net/">WURFL</a> dataset to determine if the remote client is on a mobile device or not. The WURFL dataset has all the manufactured mobile devices with their user agents and capability profiles. There are about 15,000 devices and consequently about 15,000 device records. Not everyone may think it&#8217;s a good idea to query at runtime 15,000 records for every request, but I really wanted to make it work.</p>
<p>Everyone wants to get their query times low, and there&#8217;s a lot of satisfaction knocking time off and running tight and lean. After some tinkering, I got my query time for this mobile device detection down to 0 seconds. That&#8217;s Zero. It was a moment of Zen for me. Seeing the profiler say 0 seconds is just not the same as 0.001 seconds or anything else. In theory of course it took some time, but having the calculation come out to 0 for a very non-trivial feature got me thinking: how many other features can I get down to 0 seconds?</p>
<p>What if all the features of a web page took 0 seconds? Impossible? I have written several applications and the target server time for generating a page is around 250 milliseconds, or 1/4 of a second. This includes checking for authorization, querying the data, formatting the data, dynamically generating the HTML, and even translating it into a foreign language, all on the fly. So I know 250 milliseconds is possible because I&#8217;ve done it repeatedly. I also realize that going from 500 milliseconds to 250 milliseconds is easier than going from 250 milliseconds to 0 milliseconds.</p>
<p>But processors and RAM are getting faster, SSD drives may totally change the time profile of data retrieval, MarkLogic will itself get faster, as will web application code. So at some point, someone is going to write a web application where the server time to generate the HTML for it will take 1 millisecond. That will be faster than the TCP overhead of the computers trying to negotiate a connection to each other.</p>
<p>Gone are the days when a page could take 10 seconds to load before your visitor went away. Now the expectation is that server-side HTML response is sub-second. For my stuff, I aim for 250 milliseconds, and that&#8217;s very doable, and I often see times of 60 to 70 milliseconds (under load with 1000 concurrent users, on one box, checking auth for each request, etc.). But now I&#8217;m thinking that&#8217;s too long. I&#8217;m going to expect sub-100 millisecond pages for now, and ultimately I&#8217;ll be looking to cross the 1 millisecond boundary, though a lot of that will come from hardware improvements.</p>
<p>Maybe this is like the sound barrier or the 4 minute mile, where it sounds crazy until it is actually done by someone, and then it&#8217;s the new standard. For dynamic, non-trivial web applications, I don&#8217;t think any technology would be more likely than MarkLogic to be able to pull it off. And I don&#8217;t think it&#8217;s really that far off in the future.</p>
<p>Happy coding.</p>
]]></content:encoded>
			<wfw:commentRss>http://xquerywebappdev.wordpress.com/2011/08/19/the-1-millisecond-page/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/94a1fe92ae42a403272231e6a67a9d49?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">xquerywebappdev</media:title>
		</media:content>
	</item>
	</channel>
</rss>
