XCACORE - XML Interface to the CaTISSUE Core APIs
This interface was developed to communicate with the NCI CaBIG CaTissue Core application, and it will likely work (with only minor edits) for all CaBIG APIs.
XCACORE has been designed and written by Gunther Schadow under the CaTISSUE adoption program.
Copyright (c) 2006, 2007, Regenstrief Institute, Inc. All rights reserved
WHY DO I WANT TO USE THIS?
If you want to move data in and out of the CaTISSUE Core system programmatically, XCACORE is the easiest way to do that. It does not require you to turn into a little ant chasing along the data graph, leaving behind you a spaghetti-code trail of API calls. With XCACORE you simply transform your data into the structure defined by the CaTISSUE UML model and let a very small XSLT script take care of the API calls.
SO WHAT CAN I DO WITH IT?
At this time you can
- Create objects from the XML data,
- Retrieve XML data for existing objects.
PREREQUISITES
You need
- ant
- saxon XSLT processor.
For various reasons I am stuck with the saxon-8.1.1 release, but it should in principle work with subsequent releases.
HOW DO I GET STARTED?
First you want to download the system. We do not make a zip or tarball release, but you can get the code with anonymous subversion:
svn co http://aurora.regenstrief.org/svn/xcacore
If you just want to have a peek at the code, you can browse the source trunk online.
Check out the file
etc/example.xml
This file will show you many of the things you can do. You can run this example using the ant build.xml file included here. Just say
ant -Dusername=... -Dpassword=... example
which will then upload some test data. Be advised that this will load dummy data into your database, so you may want to do it on a test instance first or figure out a way to clean up afterwards (I don't know of an easy way to clean up test data.)
The example should be pretty self-documenting, but I'll explain a few things. The principle is that you write XML data according to the CaTISSUE Core UML model (or any other UML model of an application that has a similar API, such as most caBIG applications). Your XML tag names and structure are determined by that API and its data structure and are not our responsibility here. Whatever that structure, XCACORE adds to it a very few XML elements ("tags") and some XML attributes in a special namespace, and can then interpret your XML to communicate with the CaBIG service through its client API.
RULES OF THE XML REPRESENTATION
All UML properties (attributes and association-ends) are represented by XML elements. The rules are exactly like the API (and indeed depend on the API names.) For example, in an element of class User the "emailAddress" stands for the property of the same name. In the Java API these properties are reflected by accessor methods "getEmailAddress" and "setEmailAddress".
All XML elements are interpreted as such object properties.
Literal data is given in XML as
- a text node, e.g. <firstName>John</firstName>, or
- a @value attribute, e.g. <firstName value="John"/>.
These are equivalent forms. The type of this data can be specified with the @q:as attribute (see below). Nothing else needs to be specified for String data or any data convertible to Java data by saxon.
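As a minimal sketch of the two equivalent literal forms, the fragment below writes one property as a text node and another with @value. The Participant class and the firstName and lastName properties are borrowed from the @q:random example in this document. Wrapping them in a q:session element with q:action="create" is an assumption about how a complete input file is laid out, so compare with etc/example.xml before relying on it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the overall file layout is assumed; etc/example.xml is authoritative. -->
<q:session xmlns:q="http://regenstrief.org/XCACORE">
  <participant q:action="create" q:class="Participant">
    <!-- literal given as a text node -->
    <firstName>John</firstName>
    <!-- literal given in the @value attribute; equivalent to a text node -->
    <lastName value="Doe"/>
  </participant>
</q:session>
```

Like the ant example target, running something like this would create data in your database, so try it on a test instance.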
XCACORE Elements and Attributes
The namespace "http://regenstrief.org/XCACORE" is for controlling the interpretation and actions of the API.
- q:session - the toplevel element that opens a remote API session
- @q:class - specifies the class of object which you are describing
  - this is the only way to say what class of object you have or want
  - no other magic inference is done, especially
    - the XML element NEVER tells the name of the class
- @q:as - specifies the type of literal data
  - date - uses a date format to convert to a java.util.Date
  - string - saxon will usually get that right
  - number - saxon will usually get that right
- @q:action - tells what you want to do with the data:
  - create - create the object
  - search - search for an object like the one you're specifying
  - qbe - a query by example, like search but more powerful, see @q:op
  - update - update an object
You can nest actions. For instance, you can use q:action="search" to retrieve a User object to put as principalInvestigator on a CollectionProtocol.
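A nested action might look like the sketch below: the outer element creates a CollectionProtocol, and the nested principalInvestigator element searches for an existing User by emailAddress. The element names follow property names mentioned in this document. The e-mail address and short title are made-up values, the q:session wrapper is assumed, and a real CollectionProtocol will most likely need more properties than shown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: values are hypothetical and required properties are omitted. -->
<q:session xmlns:q="http://regenstrief.org/XCACORE">
  <collectionProtocol q:action="create" q:class="CollectionProtocol">
    <shortTitle>ExampleProtocol</shortTitle>
    <!-- nested action: look up an existing User instead of creating a new one -->
    <principalInvestigator q:action="search" q:class="User">
      <emailAddress>pi@example.org</emailAddress>
    </principalInvestigator>
  </collectionProtocol>
</q:session>
```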
- @q:collection - specifies the kind of collection
  - set - a java.util.Set
  - list - a java.util.List
  - bag - a bag (currently also java.util.List)
  - anything else is interpreted as a fully qualified name of a java Collection class.
- @q:op - allows specifying a comparison operator for a property when writing queries by example:
  - eq - equals
  - le - less or equal
  - lt - less than
  - gt - greater than
  - ge - greater or equal
  - ..Property - compares with other property (rather than constant)
  - like - use SQL LIKE operator
  - ilike - case insensitive LIKE
  (Check with the Hibernate Restrictions class for details on these comparison operations.)
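For instance, a query by example with a comparison operator could be sketched as follows. Placing @q:op directly on the property element, next to @q:as and @value, is an assumption based on how the other attributes in this document are attached to property elements. The q:session wrapper and the threshold value are likewise illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: attribute placement is assumed, not confirmed by the XCACORE source. -->
<q:session xmlns:q="http://regenstrief.org/XCACORE">
  <collectionProtocolEvent q:action="qbe" q:class="CollectionProtocolEvent">
    <!-- match events whose studyCalendarEventPoint is greater than or equal to 1 -->
    <studyCalendarEventPoint q:as="number" q:op="ge" value="1"/>
  </collectionProtocolEvent>
</q:session>
```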
A special operator is @q:op="join" which can be used to restrict a query with another related object. For example, one can find a collection protocol event by collection protocol name and studyCalendarEventPoint as in:
<collectionProtocolEvent q:action="qbe" q:class="CollectionProtocolEvent">
  <studyCalendarEventPoint q:as="number" value="1"/>
  <collectionProtocol q:op="join">
    <shortTitle>NormalSerPlasPeds</shortTitle>
  </collectionProtocol>
</collectionProtocolEvent>
- q:item - name of the elements inside a collection.
- @q:backLink - can be specified on an element instead of @q:class and @q:action, and will result in the 'parent' object. This is used to make bidirectional connections between two objects. The parent object is the object created in the parent XML element, but skipping a collection element. In the following example we create a specimen with an event collection, where the included events also refer back to the specimen so created.
<specimen q:action="create" q:class="FluidSpecimen">
  <specimenEventCollection q:collection="set">
    <q:item q:class="ReceivedEventParameters">
      <q:specimen q:backLink="."/>
      ...
    </q:item>
  </specimenEventCollection>
</specimen>
Presently the value of the q:backLink attribute makes no difference; it is recommended to use a single period ".".
- q:followUp - an element that can be nested as the final child of another element to perform actions with that element after the action on it has been performed. This can be used, for example, to create a specimen, add events, add aliquots, and then add a discard event:
<specimen q:action="create" q:class="FluidSpecimen">
  ...
  <q:followUp>
    <event q:action="create" q:class="FrozenEventParameters" .../>
    <child q:action="create" q:class="FluidSpecimen" .../>
    <child q:action="create" q:class="FluidSpecimen" .../>
    ...
    <child q:action="create" q:class="FluidSpecimen" .../>
    <event q:action="create" q:class="DiscardEventParameters" .../>
  </q:followUp>
</specimen>
Nested inside a q:followUp element, one can refer to the object that is the result of the action of the parent element using the @q:context attribute.
- @q:context - similar to @q:backLink, but refers to the object that contains q:followUp elements. The @q:context can be used in any nesting depth under a q:followUp element, and always refers to the object represented by the parent of the nearest q:followUp ancestor. In the following example we create a specimen, commit it to the system and then add events:
<specimen q:action="create" q:class="FluidSpecimen">
  ...
  <q:followUp>
    <event q:action="create" q:class="FrozenEventParameters">
      <q:specimen q:context="."/>
      ...
    </event>
    ...
  </q:followUp>
</specimen>
The value of the q:context attribute is "." to refer to the context object itself, but it can also be the name of any of that context object's properties to refer to that property. For example, we can create a specimen, commit it, then create a derived specimen and link it both to the original specimen and the specimen collection group:
<specimen q:action="create" q:class="FluidSpecimen">
  ...
  <q:followUp>
    <child q:action="create" q:class="FluidSpecimen">
      <q:parentSpecimen q:context="."/>
      <q:specimenCollectionGroup q:context="specimenCollectionGroup"/>
      ...
    </child>
  </q:followUp>
</specimen>
- @q:random - for testing, inserts a 'random' number into the value text of the element so marked: the text given in the q:random attribute is searched for in the element value and replaced with that random number. This number is unique across runs but the same everywhere inside a run. This allows testing with fields that have unique constraints. Example:
<participant q:action="create" q:class="Participant">
  <firstName q:random="#">John#</firstName>
  <lastName q:random="ething">Doe Something</lastName>
  ...
</participant>
This would create a participant with first name "John32484882489433" and last name "Doe Som32484882489433", with the same random number 32484882489433 used everywhere in the run.
See the example.xml file for examples. Contemplate it. Most things are obvious and running some experiments will give you the feel for it.
ARE WE DONE?
Not all necessary functions of the API are supported yet. As we will use this to populate CaTISSUE Core from other databases, spreadsheets, etc., we will discover more things to do and eventually get them done. However, we have used this function set for loading a significant amount of data into CaTISSUE Core and believe we have reached a steady state of features.
Notably one of our use cases is to export collection protocol data, enhance this data (in XML) and use that to drive automatic data collection using barcode scanners. In the process of this we will likely hit on every possible way the API can be used through XML.
An enhancement planned soon is to make @q:context and @q:backLink values behave in the same way, i.e., "." referring to the context or parent object and any other value referring to a property.
A known issue has to do with the output format. We serialize Java objects in XML but we have only a crude way of limiting how far we branch out in this serialization. Presently a maxdepth parameter is given to this XSLT transform to limit the distance from the object to be serialized. But this leads to incomplete data (data that looks like it is null, when in fact we simply chose not to output any more data at that distance from the main object).
IS THIS STABLE CODE?
The power of XML and XSLT is that changes in structure are extremely easy to accommodate. Since I am not done developing this, and as long as I develop this, I will seek ways to make it ever simpler and more intuitive. My goal is to drive out most "computerisms"; specifically, the q:session element will probably be turned into an attribute. I will also tinker with the way results are populated.
HOW DO I GET MY DATA INTO THIS FORMAT?
I have companion projects, notably db2xml, which can be used to easily format data from relational databases as XML. This will be added here once we begin to use it for our own migration. The last thing you want to do is use some DOM or other API to the XML data, because that degrades you into a little creeping bug again. I typically follow a two-step process:
- use SQL (views) to bring the data into a structure similar to the XML data.
- then use db2xml to export the data into XML.
Any value translations are best done on the database in SQL, but can be done using transforms too.
Sometimes it is useful to put additional XSLT transforms behind it or, if you do not like SQL (in which case you should reconsider your attitude towards SQL), you could dump the data in its original structure to XML and then use XSLT to transform.
MOTIVATION AND PHILOSOPHY
My motivation is that I hate crawling in Java beans, calling setters and getters, and navigating APIs. That all leads to spaghetti code, and I feel like a little bug that creeps through a maze of pipelines. Conversely, if I look at XML data, it is all clear, like I'm a grown-up human who explores the world on a map with one finger on Turkmenistan and another finger on Chile. This is much more conducive to productive work with data.
My philosophy in XML design is radical but extremely powerful and has been tested time and again in all kinds of applications. I have made similar interfaces to anything, especially generic relational database access (db2xml), XML interface to GALEN/Grail, my cellphone calendar, HL7, ICDO, Multum, RxNorm, SNOMED, and I am always thrilled by the fact that 150 lines of good XSLT code can accomplish the most complex of tasks and do so in a way that I can understand the design even many months after I wrote it.
What's radical about my use of XML is that I never use XML Schemas and I never use DOMs and I always use XSLT (sometimes Java/SAX directly for very big projects.) So, don't ask me for a schema. The point of XCACORE is that it can be used with any UML model. The UML model determines the XML schema in a pretty straight-forward manner, and the XCACORE functions can be added to any such schema.
