The HL7XMLReader
The HL7XMLReader parses HL7 v2 into an XML format. It comes with a growing number of XSLT transforms for common tasks and demos of what one can do with it.
Class HL7XMLReader implements org.xml.sax.XMLReader
This is a very simple parser for HL7 messages that generates a very simple XML format. This is NOT the same format as the HL7 v2xml specification, for a number of reasons. The most important reason is that the v2xml format cannot be generated from HL7 instances alone without much help from the specification (e.g., it needs to know which data type a certain field has, and it requires groups to be identified as elements of their own).
<hl7>
<segment tag="MSH">
<field>SIMPLE CONTENT</field>
<field>FIRST COMPONENT<component>SECOND COMPONENT</component>
<component>THIRD COMPONENT</component>
</field>
<field>FIRST REPETITION<repeat>SECOND REPETITION</repeat>
<repeat>THIRD REPETITION</repeat>
</field>
<field>FIRST COMPONENT FIRST ITEM<component>SECOND COMPONENT</component>
<repeat>SECOND REPETITION</repeat>
</field>
<field>...<component>...<subcomponent>...</subcomponent>
<subcomponent>...</subcomponent>
</component>
<repeat>...</repeat>
</field>
</segment>
</hl7>
This is what I call "lazy structure": structural tags are used only at the point where they are really needed. This is the true spirit of the HL7 v2 encoding and backwards-compatibility rules, where an |a| can suddenly turn into an |a^b| without changing the meaning of the a.
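The lazy rule can be illustrated with a small Java sketch. This is not the HL7XMLReader implementation; the class LazyField and its method names are hypothetical, it assumes the conventional delimiters (~ for repetitions, ^ for components, & for subcomponents) instead of reading them from the MSH segment, and it ignores HL7 escape sequences. At each level the first item stays as plain content and only the following items get a structural tag.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of "lazy structure"; not the actual parser.
final class LazyField {
    // Assumed conventional delimiters, outermost level first.
    // Real messages declare their delimiters in the MSH segment.
    private static final char[] DELIMS = {'~', '^', '&'};
    private static final String[] TAGS = {"repeat", "component", "subcomponent"};

    // Renders one raw field value as a lazy-structure <field> element.
    static String toLazyXml(String field) {
        return "<field>" + render(field, 0) + "</field>";
    }

    // The first item of each level is emitted without a wrapper;
    // every further item is wrapped in the tag of that level.
    private static String render(String text, int level) {
        if (level == DELIMS.length) {
            return escape(text);
        }
        String delimiter = Pattern.quote(String.valueOf(DELIMS[level]));
        String[] parts = text.split(delimiter, -1); // -1 keeps empty trailing items
        StringBuilder out = new StringBuilder(render(parts[0], level + 1));
        for (int i = 1; i < parts.length; i++) {
            out.append('<').append(TAGS[level]).append('>')
               .append(render(parts[i], level + 1))
               .append("</").append(TAGS[level]).append('>');
        }
        return out.toString();
    }

    // Minimal XML escaping for text content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

With this sketch, LazyField.toLazyXml("a") yields <field>a</field>, and LazyField.toLazyXml("a^b") yields <field>a<component>b</component></field>. The a is at the same place in both results, so code that only reads the first text node keeps working.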
It is easy to use this structure in XSLT and XPath. The first node in a tag is the first field content. The next node, if any, is a structural tag that will tell you on what structural level the first text node was. Since HL7 has no mixed content models, there is never any ambiguity.
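As a sketch of such XPath access from Java, the following uses the standard javax.xml.xpath API on a hand-written lazy-structure document. The sample content (a PID segment with the values DOE, JOHN and SMITH) is made up for illustration, and the class name LazyXPathDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the sample data below is invented.
final class LazyXPathDemo {
    static final String SAMPLE =
        "<hl7><segment tag=\"PID\">"
        + "<field>DOE<component>JOHN</component><repeat>SMITH</repeat></field>"
        + "</segment></hl7>";

    // Parses the XML text and returns the string value of the XPath expression.
    static String eval(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        String field = "/hl7/segment[@tag='PID']/field[1]";
        // The first text node is the first item of the field.
        System.out.println(eval(SAMPLE, field + "/text()[1]"));
        // The next node is a structural tag; its name tells the level
        // on which the first text node sits.
        System.out.println(eval(SAMPLE, "name(" + field + "/*[1])"));
        // Later items are addressed through their structural tags.
        System.out.println(eval(SAMPLE, field + "/component[1]"));
        System.out.println(eval(SAMPLE, field + "/repeat[1]"));
    }
}
```

Run against the sample, the four expressions print DOE, component, JOHN and SMITH, one per line.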
The lazy spirit of HL7 v2 is followed not because the author believes that it is a good way of thinking about and handling information, but because the real world is just that messy. After having done all a person can do to produce a structure-anal HL7 parser (ProtoGen/HL7), the author has given up any hope that HL7 v2.x use will ever get there.
This class behaves like an XML SAX parser, i.e., upon reading an HL7 message it generates SAX events. It is extremely simple and extremely easy to use with standard XML tools in Java. One can simply run the HL7 message through an XSLT transform. And this is really the main purpose of this class: to open up HL7 v2.x messages, however ugly, to the world of powerful XSLT transforms. This can be used to drive message processors, or just message transformers that end up emitting the result of the transformation in HL7 v2 syntax.
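Because the class implements org.xml.sax.XMLReader, it should be usable wherever the standard Java transformation API accepts a SAXSource. The following is a minimal sketch under that assumption; the helper class ReaderTransform is hypothetical, and the method takes any XMLReader so that it does not depend on how HL7XMLReader is constructed.

```java
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper: feeds the SAX events of any XMLReader into an XSLT transform.
final class ReaderTransform {
    static String transform(XMLReader reader, InputSource input, Source stylesheet)
            throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer(stylesheet);
        StringWriter out = new StringWriter();
        // SAXSource pairs the reader with the input it should parse.
        transformer.transform(new SAXSource(reader, input), new StreamResult(out));
        return out.toString();
    }

    // Intended use with this class (assumes a public no-argument constructor
    // and that the files exist in the working directory):
    //
    //   XMLReader reader = new org.regenstrief.xhl7.HL7XMLReader();
    //   String result = ReaderTransform.transform(
    //       reader,
    //       new InputSource("file:test.hl7"),
    //       new javax.xml.transform.stream.StreamSource("deep-identity-transform.xsl"));
}
```

Passing the reader in as a parameter keeps the helper testable with any other SAX parser.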
Note also that there is no guarantee that the result is actually an HL7 message. It could be a batch or a continuation of a preceding message. That's why the top-level element isn't called "message" but simply "hl7".
Usage
You can invoke this parser in various ways according to the TrAX specification, as this class implements the SAX XMLReader interface. I recommend using Saxon v7 or higher as follows:
$ saxon7 -x org.regenstrief.xhl7.HL7XMLReader test.hl7 deep-identity-transform.xsl
For testing and simple tasks, the HL7XMLReader class can also be invoked directly to output the XML with indentation and no further transform.
$ java org.regenstrief.xhl7.HL7XMLReader file:test.hl7
(Notice that the argument is a URL that must begin with a URL scheme and a colon.)
Transforms
The lazy structure may not suit all circumstances. Hence there are two additional structure models, eager and normalized, and transforms to and from them.
- lazy2eager.xsl (Note: this transform may have bugs!)
- eager2lazy.xsl
- eager2normalized.xsl
- normalized2eager.xsl
Finally, there is a transform that produces HL7 in the traditional encoding format.
- lazy2traditional.xsl
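
Since each transform above converts one structure model into another, several of them can be applied in sequence, for example lazy2eager.xsl followed by eager2normalized.xsl. The following Java sketch chains stylesheets with the standard javax.xml.transform API, using a DOM tree as the intermediate result. The class TransformChain is hypothetical, and how the stylesheets behave when chained is not verified here.

```java
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;

// Hypothetical helper: applies stylesheets one after another.
final class TransformChain {
    static void run(Source input, Result output, Source... stylesheets)
            throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        if (stylesheets.length == 0) {
            // No stylesheet given: copy the input unchanged (identity transform).
            factory.newTransformer().transform(input, output);
            return;
        }
        Source current = input;
        for (int i = 0; i < stylesheets.length; i++) {
            boolean last = (i == stylesheets.length - 1);
            // Intermediate steps go to an in-memory DOM tree; the last step
            // writes to the caller's result.
            Result target = last ? output : new DOMResult();
            factory.newTransformer(stylesheets[i]).transform(current, target);
            if (!last) {
                current = new DOMSource(((DOMResult) target).getNode());
            }
        }
    }
}
```

A call such as TransformChain.run(source, result, new StreamSource("lazy2eager.xsl"), new StreamSource("eager2normalized.xsl")) would then go from the lazy to the normalized structure, assuming both files are in the working directory.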
