= XCACORE - XML Interface to the CaTISSUE Core APIs =
This interface has been developed to communicate with the NCI CaBIG
CaTissue Core application, and the same will likely work (with only
minor edits) for all CaBIG APIs.
XCACORE has been designed and written by Gunther Schadow under the
CaTISSUE adoption program.
Copyright (c) 2006, 2007, Regenstrief Institute, Inc. All rights reserved
[[PageOutline]]
== WHY DO I WANT TO USE THIS? ==
If you want to move data in and out of the CaTISSUE Core system
programmatically, XCACORE is the easiest way to do that. It does not
require you to turn into a little ant chasing along the data graph,
leaving behind you a spaghetti-code trail of API calls. With
XCACORE you simply transform your data into the structure defined by
the CaTISSUE UML model and let a very small XSLT script take care of
the API calls.
== SO WHAT CAN I DO WITH IT? ==
At this time you can
* Create objects from the XML data,
* Retrieve XML data for existing objects.
== PREREQUISITES ==
You need
* ant
* saxon XSLT processor.
For various reasons I am stuck with the saxon-8.1.1 release, but it
should in principle work with subsequent releases.
== HOW DO I GET STARTED? ==
First you want to download the system. We do not make a zip or tarball release, but you can get this with anonymous subversion.
{{{
svn co http://aurora.regenstrief.org/svn/xcacore
}}}
If you just want to have a peek at the code, you can [source:trunk browse here].
Check out the file
{{{
etc/example.xml
}}}
This file will show you many of the things you can do. You can run
this example using the ant build.xml file included here. Just say
{{{
ant -Dusername=... -Dpassword=... example
}}}
which will then upload some test data. Be advised that this will load
dummy data into your database, so you may want to do it on a test
instance first or figure out a way to clean up afterwards (I don't
know of an easy way to clean up test data.)
The example should be pretty self-documenting, but I'll explain a few
things. The principle is that you write XML data according to the
CaTISSUE Core UML model (or any other UML model of an application that
has a similar API, such as most caBIG applications.) Your XML tag
names and structure are determined by that API and its data structure
and are not our responsibility here. Whatever that structure, XCACORE
adds to it very few XML elements ("tags") and some XML attributes in
a special namespace, and then can interpret your XML to communicate
with the CaBIG service through its client API.
== RULES OF THE XML REPRESENTATION ==
All UML properties (attributes and association-ends) are represented
by XML elements. The rules are exactly like the API (and indeed depend
on the API names.) For example, in an element of class User the
"emailAddress" stands for the property of the same name. In the Java
API these properties are reflected by accessor methods
"getEmailAddress" and "setEmailAddress".
All XML elements are interpreted as such object properties.
Literal data is in XML
* text nodes, e.g. `<firstName>John</firstName>`, or
* @value attribute, e.g. `<firstName value="John"/>`;
these are equivalent forms. The type of this data can be specified
with the @q:as attribute (see below). Nothing else needs to be
specified for String data or any data convertible to Java data by
saxon.
== XCACORE Elements and Attributes ==
The namespace "http://regenstrief.org/XCACORE" is for controlling the
interpretation and actions of the API.
* q:session - the toplevel element that opens a remote API session
* @q:class - specifies the class of object which you are describing
  * this is the only way to say what class of object you have or want
  * no other magic inference is done; in particular, the XML element name NEVER tells the name of the class
* @q:as - specifies the type of literal data
  * date - uses a date format to convert to a java.util.Date
  * string - saxon will usually get that right
  * number - saxon will usually get that right
* @q:action - tells what you want to do with the data:
  * create - create the object
  * search - search for an object like the one you're specifying
  * qbe - a query by example, like search but more powerful, see @q:op
  * update - update an object
You can nest actions. For instance, you can use q:action="search" to
retrieve a User object to put as principalInvestigator on a
CollectionProtocol.
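As a rough sketch, such a nested action could look like the following. The q: elements, attributes and namespace are the ones described in this section, and emailAddress is the User property mentioned earlier. The outer element name, the title property, the fully qualified class names and the e-mail address are assumptions for illustration only, and any connection settings that q:session may need are left out. Compare with etc/example.xml before relying on it.
{{{
<q:session xmlns:q="http://regenstrief.org/XCACORE">
  <!-- element name "protocol" is an assumption; q:class names the class -->
  <protocol q:class="edu.wustl.catissuecore.domain.CollectionProtocol"
            q:action="create">
    <title>Example Protocol</title>
    <!-- nested action: search for an existing User and use the result
         as the value of the principalInvestigator property -->
    <principalInvestigator q:class="edu.wustl.catissuecore.domain.User"
                           q:action="search">
      <emailAddress>pi@example.org</emailAddress>
    </principalInvestigator>
  </protocol>
</q:session>
}}}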
* @q:collection - specifies the kind of collection
  * set - a java.util.Set
  * list - a java.util.List
  * bag - a bag (currently also java.util.List)
  * anything else is interpreted as a fully qualified name of a Java Collection class.
* @q:op - allows specifying a comparison operator for a property when writing queries by example:
  * eq - equals
  * le - less or equal
  * lt - less than
  * gt - greater than
  * ge - greater or equal
  * ..Property - compares with another property (rather than a constant)
  * like - use SQL LIKE operator
  * ilike - case insensitive LIKE
  (Check the Hibernate Restrictions class for details on these comparison operations.)
A special operator is @q:op="join" which can be used to restrict
a query with another related object. For example, one can find
a collection protocol event by collection protocol name and
studyCalendarEventPoint as in:
{{{
NormalSerPlasPeds
}}}
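A join query of this kind could be sketched roughly as below. Placing q:op="join" on the element of the related object is an assumption, as are the outer element name, the use of a title property for the collection protocol name, the class names and the event point value; only the attribute names and the protocol name come from this document.
{{{
<q:session xmlns:q="http://regenstrief.org/XCACORE">
  <event q:class="edu.wustl.catissuecore.domain.CollectionProtocolEvent"
         q:action="qbe">
    <!-- assumed: q:op="join" restricts the query by the related protocol -->
    <collectionProtocol q:class="edu.wustl.catissuecore.domain.CollectionProtocol"
                        q:op="join">
      <title>NormalSerPlasPeds</title>
    </collectionProtocol>
    <!-- a plain comparison of a property against a constant -->
    <studyCalendarEventPoint q:as="number" q:op="eq">1</studyCalendarEventPoint>
  </event>
</q:session>
}}}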
* q:item - name of the elements inside a collection.
* @q:backLink - can be specified on an element instead of @q:class
and @q:action, and will result in the 'parent' object. This is used
to make bidirectional connections between two objects. The parent
object is the object created in the parent XML element, but
skipping a collection element. In the following example we create
a specimen with an event collection, where the included events also
refer back to the specimen so created.
{{{
...
}}}
Presently the value of the q:backLink attribute makes no
difference; it is recommended to use a single period ".".
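A hypothetical sketch of such a back link follows. The property names specimenEventCollection and specimen, the class names, and the choice to put q:class (but no q:action) on the q:item are all assumptions to be checked against the CaTISSUE model and etc/example.xml.
{{{
<q:session xmlns:q="http://regenstrief.org/XCACORE">
  <specimen q:class="edu.wustl.catissuecore.domain.TissueSpecimen"
            q:action="create">
    <specimenEventCollection q:collection="set">
      <q:item q:class="edu.wustl.catissuecore.domain.CollectionEventParameters">
        <!-- resolves to the specimen being created above; the collection
             element in between is skipped when looking for the parent -->
        <specimen q:backLink="."/>
      </q:item>
    </specimenEventCollection>
  </specimen>
</q:session>
}}}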
* q:followUp - an element that can be nested as the final child
of another element to perform actions with that element after the
action on it has been performed. This can be used, for example, to
create a specimen, add events, add aliquots, and then add a
discard event:
{{{
...
...
}}}
Nested inside a q:followUp element, one can refer to the object
that is the result of the action of the parent element using the
@q:context attribute.
* @q:context - similar to @q:backLink, but refers to the object that
contains q:followUp elements. The @q:context can be used in any
nesting depth under a q:followUp element, and always refers to the
object represented by the parent of the nearest q:followUp
ancestor. In the following example we create a specimen, commit
it to the system and then add events:
{{{
...
...
...
}}}
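Spelled out with assumed names, a follow-up of this shape might look like the sketch below. The label property, the event class, the element name "event" and the class names are illustrative assumptions; q:followUp and q:context are used as described in this section.
{{{
<q:session xmlns:q="http://regenstrief.org/XCACORE">
  <specimen q:class="edu.wustl.catissuecore.domain.TissueSpecimen"
            q:action="create">
    <label>S-0001</label>
    <!-- final child: processed after the specimen itself has been created -->
    <q:followUp>
      <event q:class="edu.wustl.catissuecore.domain.CollectionEventParameters"
             q:action="create">
        <!-- "." refers to the specimen that owns the q:followUp element -->
        <specimen q:context="."/>
      </event>
    </q:followUp>
  </specimen>
</q:session>
}}}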
The value of the q:context attribute must be "." to refer to that
context object, but it can also be the name of any of that context
object's properties to refer to that property. For example, we can
create a specimen, commit it, then create a derived specimen and
link it both to the original specimen and the specimen collection
group:
{{{
...
...
}}}
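With a property name as the q:context value, the idea could be sketched as follows. The property names parentSpecimen and specimenCollectionGroup, the element name "derived" and the class names are assumptions taken from the CaTISSUE domain model as best understood, not from a verified run.
{{{
<q:session xmlns:q="http://regenstrief.org/XCACORE">
  <specimen q:class="edu.wustl.catissuecore.domain.TissueSpecimen"
            q:action="create">
    <q:followUp>
      <derived q:class="edu.wustl.catissuecore.domain.MolecularSpecimen"
               q:action="create">
        <!-- the original specimen itself -->
        <parentSpecimen q:context="."/>
        <!-- the specimenCollectionGroup property of the original specimen -->
        <specimenCollectionGroup q:context="specimenCollectionGroup"/>
      </derived>
    </q:followUp>
  </specimen>
</q:session>
}}}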
* @q:random - for testing, adds a 'random' number to the end of the
value text so marked, i.e., the text in the q:random attribute will
be searched in the text provided for the element value and replaced
with that random number. This number is unique across runs but the
same every time inside a run. This allows testing with fields that
have unique constraints. Example:
{{{
John#
Doe Something
...
}}}
This would create a participant with first name "John 32484882489433"
and last name "Doe Som32484882489433" with the same random number
32484882489433 used everywhere in the run.
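A minimal sketch of the markup, using "#" as the marker text in both fields: the outer element name, the class name and the sample values are assumptions, while the search-and-replace behaviour is the one described for @q:random.
{{{
<q:session xmlns:q="http://regenstrief.org/XCACORE">
  <participant q:class="edu.wustl.catissuecore.domain.Participant"
               q:action="create">
    <!-- the "#" in the text is replaced by the run's random number -->
    <firstName q:random="#">Jane #</firstName>
    <!-- the same number is substituted here, since it is fixed per run -->
    <lastName q:random="#">Tester #</lastName>
  </participant>
</q:session>
}}}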
See the example.xml file for examples. Contemplate it. Most things are
obvious and running some experiments will give you the feel for it.
== ARE WE DONE? ==
Not all necessary functions of the API are supported yet. As we will
use this to populate CaTISSUE Core from other databases and
spreadsheets etc., we will discover more things to do and eventually
get them done. However, we have used this function set for loading
a significant amount of data into CaTISSUE Core and believe we have
reached a steady state of features.
Notably one of our use cases is to export collection protocol data,
enhance this data (in XML) and use that to drive automatic data
collection using barcode scanners. In the process of this we will
likely hit on every possible way the API can be used through XML.
An enhancement planned soon is to make @q:context and @q:backLink
values behave in the same way, i.e., "." referring to the context
or parent object and any other value referring to a property.
A known issue has to do with the output format. We serialize Java
objects in XML but we have only a crude way of limiting how far
we branch out in this serialization. Presently a maxdepth parameter
is given to this XSLT transform to limit the distance from the
object to be serialized. But this leads to incomplete data (data
that looks like it is null, when in fact we simply chose not to
output any more data at that distance from the main object).
== IS THIS STABLE CODE? ==
The power of XML and XSLT is that changes in structure are extremely
easy to accommodate. Since I am not done developing this, and as long
as I develop this, I will seek ways to make it ever simpler and more
intuitive. My goal is to drive out most "computerisms"; specifically,
the q:session element will probably be turned into an attribute. I
will also tinker with the way results are populated.
== HOW DO I GET MY DATA INTO THIS FORMAT? ==
I have companion projects, notably db2xml, which can be used to easily
format data in relational databases as XML data. This will be added
here once we begin to use it for our own migration. The last thing you
want to do is use some DOM or other API to the XML data, because that
degrades you into a little creeping bug again. I typically follow a
two-step process:
1. use SQL (views) to bring the data into a structure similar to
the XML data.
1. then use db2xml to export the data into XML.
Any value translations are best done on the database in SQL, but can
be done using transforms too.
Sometimes it is useful to put additional XSLT transforms behind it or,
if you do not like SQL (in which case you should reconsider your
attitude towards SQL), you could dump the data in its original
structure to XML and then use XSLT to transform.
== MOTIVATION AND PHILOSOPHY ==
My motivation is that I hate crawling in Java beans, calling setters
and getters, navigating in APIs. That all leads to spaghetti code, and I
feel like a little bug that creeps through a maze of
pipelines. Conversely, if I look at XML data, it is all clear like I'm
a grown-up human who explores the world on a map with one finger on
Turkmenistan and another finger in Chile. This is much more conducive
for productive work with data.
My philosophy in XML design is radical but extremely powerful and has
been tested time and again in all kinds of applications. I have made
similar interfaces to anything, especially generic relational database
access (db2xml), XML interface to GALEN/Grail, my cellphone calendar,
HL7, ICDO, Multum, RxNorm, SNOMED, and I am always thrilled by the fact
that 150 lines of good XSLT code can accomplish the most complex of
tasks and do so in a way that I can understand the design even many
months after I wrote it.
What's radical about my use of XML is that I never use XML Schemas and
I never use DOMs and I always use XSLT (sometimes Java/SAX directly
for very big projects.) So, don't ask me for a schema. The point of
XCACORE is that it can be used with any UML model. The UML model
determines the XML schema in a pretty straightforward manner, and
the XCACORE functions can be added to any such schema.