Moving to higher levels in the Capability Maturity Model for MarkLogic applications
The Capability Maturity Model
The Capability Maturity Model is a development model from Carnegie Mellon that describes five levels of maturity in the development processes of software applications. The first time I came across the CMM was when I worked for Andersen Consulting and they had posted the levels of the CMM in the restrooms (which I think was a very good idea, actually). At the time I was a novice programmer and was just concerned about getting my stuff to work. There were two levels in my world: works, and doesn’t work. I didn’t come to appreciate the CMM until many years later.
The mere fact that there are different levels of maturity might cause some people to realize for the first time that they really need to up their game. When my projects start feeling pain, I’ve found it helpful to review the CMM and try to see where we really are.
These are the levels in the CMM (lifted straight from Wikipedia):
Level 1 – Initial (Chaotic)
It is characteristic of processes at this level that they are (typically) undocumented and in a state of dynamic change, tending to be driven in an ad hoc, uncontrolled and reactive manner by users or events. This provides a chaotic or unstable environment for the processes.
Level 2 – Repeatable
It is characteristic of processes at this level that some processes are repeatable, possibly with consistent results. Process discipline is unlikely to be rigorous, but where it exists it may help to ensure that existing processes are maintained during times of stress.
Level 3 – Defined
It is characteristic of processes at this level that there are sets of defined and documented standard processes established and subject to some degree of improvement over time. These standard processes are in place (i.e., they are the AS-IS processes) and used to establish consistency of process performance across the organization.
Level 4 – Managed
It is characteristic of processes at this level that, using process metrics, management can effectively control the AS-IS process (e.g., for software development). In particular, management can identify ways to adjust and adapt the process to particular projects without measurable losses of quality or deviations from specifications. Process Capability is established from this level.
Level 5 – Optimizing
It is a characteristic of processes at this level that the focus is on continually improving process performance through both incremental and innovative technological changes/improvements.
Of course, some organizations and projects may never want to be at a very high level, because it can be costly and bureaucratic. If you are freewheeling, cutting-edge and don’t have much to lose, then you probably aren’t going to care much about these levels. But when the cost of failure, or the cost of staying at your current maturity level, exceeds the cost of moving up, then you are going to want to take it to that next level of maturity.
Your Technical Debt Is Increasing
Technical Debt almost always grows as a product increases in lifetime and complexity. Ward Cunningham wrote (again lifted straight from Wikipedia):
Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite… The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise.
My experience has been that technical debt is allowed to grow and reach a crisis point because the development process is at a low level in the Capability Maturity Model. Note that it’s OK to start off with technical debt: you are at Level 1 (“Chaotic”) precisely because things have not yet been defined, you are in an exploratory mode, and what you are developing is new. The problem comes when you don’t switch gears and adopt a more mature process to reduce technical debt as your product grows. Remember that every feature or improvement you ship is something you must make sure you do not break in each subsequent release.
CMM for MarkLogic applications
I am not referring to MarkLogic Server itself, but rather to implementations built using the server. So you, as a customer of MarkLogic, have decided to use MarkLogic Server to solve a business need in your organization. Although MarkLogic Server may provide you with technology to do things you’ve never been able to do before, it will not fix any broken or immature processes your organization uses to develop or maintain the custom applications you’ve written. And while MarkLogic is becoming a trendsetter and is a disruptive technology, not everyone is pleased with its success, and some would love to see efforts using this technology fail. So don’t let your immature processes be mistaken for faults of the technology.
Once you put your business, people’s lives, your reputation, or your money on the line because you chose to power your application with MarkLogic, you are going to need to be at a higher maturity level than you were when you first developed the application. MarkLogic seems to target fewer, big-ticket clients, so I’m guessing that the majority of MarkLogic customers have a lot riding on their MarkLogic implementations. Rather than spending a lot of effort on improving your developers’ skills at writing whizbang code, I would suggest that it is more valuable to have them develop the processes, techniques, and technologies that raise your maturity level, so that you can keep improving and maintaining your investment.
Descriptions of Levels for MarkLogic Development
Below is what I would describe for each level in the CMM for MarkLogic development.
Level 1 – Initial (Chaotic)
This is usually where new development on MarkLogic starts out. One or a few developers at an organization are tasked with figuring out how to leverage the technology for some new effort or offering. Usually the developers know little and must ask questions on the developer email distribution list, or use MarkLogic consultants. Whatever code first produces the desired results ends up as the production code.
What you need to know/be able to do at this level: Understand XQuery, XPath, and the xdmp functions. Be able to load your data. Be able to use the Admin Console on port 8001 to configure MarkLogic so that your implementation works.
Level 2 – Repeatable
After some time, the developers and the organization have some experience and knowledge that they share or review with each other. The “better” practices are known and settled on. Developers are knowledgeable about the XQuery syntax, XPath syntax, the xdmp functions, and features of MarkLogic Server. Developers and sys admins are confident in making changes, writing code, and running the software for business critical solutions. I would guess that most of the solutions that are 6 months or older are in this category.
What you need to know/be able to do at this level: Be well versed in the MarkLogic API. Be able to create the proper indexes for your data and queries. Know how to use cts:search and search:search. Be able to create a proper data model for your data. Be able to write HTML interfaces for your implementation. Be able to use and understand the logs. Know how to avoid creating deadlocks. Be comfortable with Functional Programming instead of Object Oriented Programming.
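As a rough illustration of the two search entry points named above, here is a minimal sketch. It assumes documents have already been loaded, and the search term "maturity" is a made-up value used only for the example.

```xquery
xquery version "1.0-ml";

(: search:search lives in the Search API library module, so it must be imported. :)
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical search term, chosen only for this example. :)
let $term := "maturity"

(: cts:search takes a cts:query and returns the matching nodes themselves. :)
let $docs := cts:search(fn:doc(), cts:word-query($term))[1 to 10]

(: search:search takes a query string and returns a search:response
   element that summarizes the results. :)
let $response := search:search($term)

return (
  fn:count($docs),
  fn:data($response/@total)
)
```

Roughly speaking, cts:search is the lower-level building block you compose queries with, while search:search wraps it with query-string parsing and a structured response, which is often the more convenient starting point for a search page.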
Level 3 – Defined
Knowledge about the technology is recorded, shared and maintained. Policies are established around deploying, backing up, and restoring the application, and scripts are created to automate many of these activities. Developers identify “best practices” for code on MarkLogic. Code and implementation are robust and mature. Standard build and deployment practices are implemented. Automated tests are used as part of every deployment.
What you need to know/be able to do at this level: Know how to set up E-nodes, D-nodes, and cache settings appropriately. Create a deployment mechanism so that the Admin Console does not need to be used. Set up automatic backups and be able to restore from them consistently. Be able to deploy to new boxes consistently. Create range indexes. Implement faceted search. Use a knowledge base for MarkLogic knowledge. Do unit, functional, and end-user testing using XQuery.
Level 4 – Managed
Metrics are recorded by logging statements in the code. Sys admins have configured the implementations to alert them after specified thresholds are reached for various metrics. Developers are skilled at estimating work effort, risk levels, and impact of code changes. Management is skilled at estimating staffing needs for given business requirements that will be addressed via MarkLogic implementations. Security analysts are skilled at evaluating the risk levels of MarkLogic implementations and recommending actions to mitigate them. Releases are consistently and automatically deployed from lane to lane with little or no human intervention. Restores are predictable, automated, and successful.
What you need to know/be able to do at this level: Write meaningful logging statements at appropriate levels. Set up system alerting based on application health (not just system or OS health). Know the functions that can potentially permit injected code to be evaluated. Know how to run the application under different users with differing roles and permissions. Use amps appropriately. Use code reviews to look for adherence to coding standards and to find faulty code.
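Two of the items above, leveled logging and awareness of functions that evaluate code, can be sketched together. This is only an illustration under stated assumptions: the $user-input value and the log messages are hypothetical, and xdmp:eval is used as one example of a function that evaluates a string as code.

```xquery
xquery version "1.0-ml";

(: Hypothetical value standing in for untrusted user input. :)
let $user-input := "maturity"

(: Risky pattern (not shown): building the code string by concatenating
   the raw input into it, which lets a caller inject XQuery. :)

(: Safer pattern: keep the code string constant and pass the input in
   as an external variable. :)
let $code := '
  xquery version "1.0-ml";
  declare variable $term as xs:string external;
  xdmp:estimate(cts:search(fn:doc(), cts:word-query($term)))
'
let $count := xdmp:eval($code, (xs:QName("term"), $user-input))

return (
  (: Routine detail goes to a low level so it can be filtered out. :)
  xdmp:log(fn:concat("search term matched ", $count, " fragments"), "debug"),

  (: Conditions someone may need to act on go to a higher level. :)
  if ($count eq 0)
  then xdmp:log("search returned no matches", "warning")
  else (),

  $count
)
```

Which conditions deserve "warning" rather than "debug" is a judgment call for each application; the point is that the choice is made deliberately so that alerting can key off the higher levels.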
Level 5 – Optimizing
System analysts review MarkLogic implementations using historical metrics to determine optimal use of resources. Quality Assurance engineers identify areas of performance improvement. High availability implementations are possible. Maintenance costs are low, and sys admins spend little time directly attending to the implementation. Developers spend little time on maintenance and most of their time on new features, new uses of the technology, and new techniques.
What you need to know/be able to do at this level: Train developers who are new to MarkLogic on best practices and coding standards. Create automation tools for testing, deploying, managing, and gathering metrics of MarkLogic implementations. Know how to use xdmp:plan, xdmp:estimate, and other profiling techniques to tune queries.
MarkLogic Server is a relatively new product, as is the community around it, which means that it does not yet have as many of the maturity tools that other platforms enjoy, such as automated deployment tools, static code analysis tools, and robust IDEs. As customers’ implementations mature, I would expect the tools from the community and from MarkLogic to mature as well.
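For readers who have not used the tuning functions mentioned above, here is a minimal sketch of how they might be called. The word query on "maturity" is a hypothetical example, and the output will depend entirely on your own data and index settings.

```xquery
xquery version "1.0-ml";

(: Hypothetical query, chosen only for this example. :)
let $query := cts:word-query("maturity")

return (
  (: xdmp:estimate resolves the count from the indexes alone, without
     fetching or filtering any fragments. :)
  xdmp:estimate(cts:search(fn:doc(), $query)),

  (: xdmp:plan returns an XML description of how the search would be
     resolved against the indexes. :)
  xdmp:plan(cts:search(fn:doc(), $query)),

  (: xdmp:elapsed-time reports the time elapsed since the request started. :)
  xdmp:elapsed-time()
)
```

One common use, as I understand it, is to compare the estimate with an fn:count over the same search: a large gap suggests the indexes are not fully resolving the query and that an index or query change may be worth investigating.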
I’m interested to know what you all think about this. Do you feel like your development processes around MarkLogic development are mature enough? What have you done to create a mature process with MarkLogic?