Enrichment of Medications Catalog using RxNorm
After a thorough study of RxNorm and its relationship with SPL we will tie RxNorm into our application until FDA has finished fixing their drug-listing process.
We have previously created RXN_ROLE_STAGE, and now we use that to move on.
We build a gigantic query from pieces using temporary views (WITH clauses).
Quantities Numerator and Denominator
WITH RXN_ROLE_STAGE1 AS (
SELECT SCOPERID, ID, SCOPERTYPE, SCOPERNAME, PLAYERID,
CASE WHEN PLAYERID = MOIETYID THEN MOIETYNAME ELSE PLAYERNAME END AS PLAYERNAME,
MOIETYID, MOIETYNAME, ROLENAME, ROLENAMESTEM, QUANTITYNUMBER, QUANTITYUNIT
FROM RXN_ROLE_STAGE r
), RXN_ROLE_STAGE2 AS (
SELECT r.*, INSTR(QUANTITYUNIT,'/') AS qty_slash_position
FROM RXN_ROLE_STAGE1 r
), RXN_ROLE_STAGE3 AS (
SELECT r.*,
quantitynumber AS qty_numerator_value,
CASE WHEN qty_slash_position > 0
THEN SUBSTR(quantityunit, 1, qty_slash_position-1)
ELSE quantityunit
END AS qty_numerator_unit,
1 AS qty_denominator_value,
CASE WHEN qty_slash_position > 0
THEN SUBSTR(quantityunit, qty_slash_position+1)
ELSE '1'
END AS qty_denominator_unit
FROM RXN_ROLE_STAGE2 r
) ...
Map Units to UCUM
Now using that we make a mapping of RxNorm units to UCUM units:
CREATE TABLE RXN_UNIT_UCUM_MAP ( RXN_UNIT VARCHAR2(64) PRIMARY KEY, RXN_COUNT INTEGER, UCUM_UNIT VARCHAR2(64) ) GO INSERT INTO RXN_UNIT_UCUM_MAP(RXN_UNIT, RXN_COUNT) WITH RXN_ROLE_STAGE2 AS ( ... ), RXN_ROLE_STAGE3 AS ( ... ), all_units AS ( SELECT qty_numerator_unit AS unit FROM RXN_ROLE_STAGE3 UNION ALL SELECT qty_denominator_unit FROM RXN_ROLE_STAGE3 ) SELECT unit, count(1) AS count FROM all_units GROUP BY unit ORDER BY count(1) DESC
Then this table is manually edited (do not delete) to fill in UCUM mappings.
| RXN_UNIT | RXN_COUNT | UCUM_UNIT |
| MG | 69837 | mg |
| 1 | 38162 | 1 |
| ML | 28793 | mL |
| UNT | 3622 | [iU] |
| MEQ | 792 | mEq |
| ACTUAT | 450 | {actuation} |
| X | 317 | {X} |
| % | 244 | % |
| HR | 241 | h |
| MIL | 51 | (null) |
| MMOL | 37 | mmol |
| IU | 23 | [iU] |
| GTT | 11 | [drp] |
There is a tail of garbage, which we will simply ignore.
Now we can actually map the units by adding
WITH RXN_ROLE_STAGE2 AS ( ...
), RXN_ROLE_STAGE3 AS ( ...
), RXN_ROLE_STAGE4 AS (
SELECT r.*,
nu.ucum_unit AS qty_numerator_ucum_unit,
du.ucum_unit AS qty_denominator_ucum_unit
FROM RXN_ROLE_STAGE3 r
INNER JOIN RXN_UNIT_UCUM_MAP nu ON(nu.rxn_unit = r.qty_numerator_unit)
INNER JOIN RXN_UNIT_UCUM_MAP du ON(du.rxn_unit = r.qty_denominator_unit)
) SELECT * FROM RXN_ROLE_STAGE4
Remove all drugs where the units could not be completely mapped
WITH RXN_ROLE_STAGE2 AS ( ...
), RXN_ROLE_STAGE3 AS ( ...
), RXN_ROLE_STAGE4 AS ( ...
), RXN_COMPLETE_DRUGS AS (
SELECT scoperid FROM RXN_ROLE_STAGE4
MINUS
SELECT scoperid FROM RXN_ROLE_STAGE4
WHERE qty_numerator_ucum_unit IS NULL
OR qty_denominator_ucum_unit IS NULL
), RXN_ROLE_STAGE5 AS (
SELECT * FROM RXN_ROLE_STAGE4 INNER JOIN RXN_COMPLETE_DRUGS USING(scoperid)
) SELECT * FROM RXN_ROLE_STAGE5
Remove overlap with SPL
Now we have to remove the overlap to SPL. We do this by using the NDC map very simply. Any SCD or SBD which has an NDC numbers known to RxNorm which is also in SPL is entirely deleted with all its sibling SBDs. We really need every kind of SPL label only one in the system and do not want overlapping RxNorm drugs. Also, we want to create SPL descriptions from RxNorm, so we will remove all SCDs. We will do our own classifications later.
WITH RXN_ROLE_STAGE2 AS ( ...
), RXN_ROLE_STAGE3 AS ( ...
), RXN_ROLE_STAGE4 AS ( ...
), RXN_COMPLETE_DRUGS AS ( ...
), RXN_ROLE_STAGE5 AS ( ...
), RXN_NDC_NO_SPL AS (
SELECT rxcui AS scoperid FROM RXN_NDC_CLEANUP
MINUS
SELECT rxcui AS scoperid FROM RXN_NDC_CLEANUP
INNER JOIN (
SELECT ndc2 AS ndc
FROM SPL_DRUGLISTING
INNER JOIN RXN_NDC_EQUIV ON(ndc1 = ndc)
) USING(ndc)
), RXN_ROLE_STAGE6 AS (
SELECT * FROM RXN_ROLE_STAGE5 INNER JOIN RXN_NDC_NO_SPL USING(scoperid)
WHERE scopertype = 'SBD'
) SELECT * FROM RXN_ROLE_STAGE6
Mappings to UNII/SRS Codes
We use our mapping if we can, and for the rest we make up fake UNII codes that are the RxCUIs padded with "RXNXXXX...".
WITH RXN_ROLE_STAGE2 AS ( ...
), RXN_ROLE_STAGE3 AS ( ...
), RXN_ROLE_STAGE4 AS ( ...
), RXN_COMPLETE_DRUGS AS ( ...
), RXN_ROLE_STAGE5 AS ( ...
), RXN_NDC_NO_SPL AS ( ...
), RXN_ROLE_STAGE6 AS ( ...
), RXN_ROLE_STAGE7 AS (
SELECT r.*,
COALESCE(um.code_code, LPAD(playerid,10,'RXNXXXXXXX')) AS moiety_unii_code,
COALESCE(up.code_code, LPAD(moietyid,10,'RXNXXXXXXX')) AS player_unii_code
FROM RXN_ROLE_STAGE6 r
LEFT OUTER JOIN UNII_RXN_MAP um ON(um.rxcui = r.moietyid)
LEFT OUTER JOIN UNII_RXN_MAP up ON(up.rxcui = r.playerid)
) SELECT * FROM RXN_ROLE_STAGE7
Tradename
WITH RXN_ROLE_STAGE2 AS ( ...
), RXN_ROLE_STAGE3 AS ( ...
), RXN_ROLE_STAGE4 AS ( ...
), RXN_COMPLETE_DRUGS AS ( ...
), RXN_ROLE_STAGE5 AS ( ...
), RXN_NDC_NO_SPL AS ( ...
), RXN_ROLE_STAGE6 AS ( ...
), RXN_ROLE_STAGE7 AS ( ...
), RXN_ROLE_STAGE8 AS (
SELECT r.*,
INSTR(scopername,'[',-1) AS tradename_start,
INSTR(scopername,']',-1) AS tradename_end
FROM RXN_ROLE_STAGE7 r
), RXN_ROLE_STAGE9 AS (
SELECT r.*,
SUBSTR(scopername, tradename_start+1, tradename_end-tradename_start-1) AS tradename
FROM RXN_ROLE_STAGE8 r
) SELECT * FROM RXN_ROLE_STAGE9
NDC code
NDC codes added on the product level only, no package information exists). Note how we have to make sure that we don't pick up more than one NDC code per product. We use MAX, not MIN, because MIN might put empty codes in (i.e., "-")
WITH RXN_ROLE_STAGE2 AS ( ...
), RXN_ROLE_STAGE3 AS ( ...
), RXN_ROLE_STAGE4 AS ( ...
), RXN_COMPLETE_DRUGS AS ( ...
), RXN_ROLE_STAGE5 AS ( ...
), RXN_NDC_NO_SPL AS ( ...
), RXN_ROLE_STAGE6 AS ( ...
), RXN_ROLE_STAGE7 AS ( ...
), RXN_ROLE_STAGE8 AS ( ...
), RXN_ROLE_STAGE9 AS ( ...
), RXN_PRODUCT_NDC AS (
SELECT rxcui AS scoperid,
MAX(SUBSTR(ndc, 1, 5)) AS ndc_labeler,
MAX(SUBSTR(ndc, 1, 5)||'-'||SUBSTR(ndc, 6, 4)) AS ndc_product
FROM RXN_NDC_CLEANUP
GROUP BY rxcui
), RXN_ROLE_STAGE10 AS (
SELECT *
FROM RXN_ROLE_STAGE9
INNER JOIN RXN_PRODUCT_NDC USING(scoperid)
LEFT OUTER JOIN NDC_FIRM ON(lblcode = ndc_labeler)
) SELECT * FROM RXN_ROLE_STAGE10
Form and route mapping
Mapping dose forms and routes between FDA to RxNorm is not so easy and we have to do it by hand. Let's go get that table started:
DROP TABLE rxn_spl_form_route_map
GO
CREATE TABLE rxn_spl_form_route_map (
doseform_rxcui VARCHAR2(16) PRIMARY KEY,
doseform_name VARCHAR2(128),
count INTEGER,
formCode_code VARCHAR2(32),
formCode_name VARCHAR2(128),
routeCode_code VARCHAR2(32),
routeCode_name VARCHAR2(128)
)
GO
INSERT INTO rxn_spl_form_route_map(
doseform_rxcui,
doseform_name,
count
)
WITH rxn_form AS (
SELECT r.rxcui1 AS rxcui, r.rxcui2 AS doseform_rxcui, c.str AS doseform_name
FROM umls.rxnconso c
INNER JOIN umls.rxnrel r ON(r.rxcui2=c.rxcui)
WHERE c.tty='DF' AND c.SAB='RXNORM' AND c.LAT='ENG'
) SELECT doseform_rxcui, doseform_name, count(1) as count
FROM rxn_form
GROUP BY doseform_rxcui, doseform_name
ORDER BY count DESC, doseform_name ASC
And edit it manually. Attached is a spreadsheet where the work was not too hard to do and to check. It also has SQL commands to recreate the data quickly. Now we can actually map:
WITH RXN_ROLE_STAGE2 AS ( ...
), RXN_ROLE_STAGE3 AS ( ...
), RXN_ROLE_STAGE4 AS ( ...
), RXN_COMPLETE_DRUGS AS ( ...
), RXN_ROLE_STAGE5 AS ( ...
), RXN_NDC_NO_SPL AS ( ...
), RXN_ROLE_STAGE6 AS ( ...
), RXN_ROLE_STAGE7 AS ( ...
), RXN_ROLE_STAGE8 AS ( ...
), RXN_ROLE_STAGE9 AS ( ...
), RXN_PRODUCT_NDC AS ( ...
), RXN_ROLE_STAGE10 AS ( ...
), RXN_FORM AS (
SELECT r.rxcui1 AS rxcui, r.rxcui2 AS doseform_rxcui, c.str AS doseform_name
FROM umls.rxnconso c
INNER JOIN umls.rxnrel r ON(r.rxcui2=c.rxcui)
WHERE c.tty='DF' AND c.SAB='RXNORM' AND c.LAT='ENG'
), RXN_ROLE_STAGE11 AS (
SELECT r.*, doseform_rxcui, df.doseform_name, m.formCode_code, m.formCode_name, m.routeCode_code, m.routeCode_name
FROM RXN_ROLE_STAGE10 r
INNER JOIN RXN_FORM df ON(df.rxcui = scoperid)
LEFT OUTER JOIN RXN_SPL_FORM_ROUTE_MAP m USING(doseform_rxcui)
) SELECT * FROM RXN_ROLE_STAGE11
But we still want the display names for these codes also.
Final Staging Table
And now, we should be able to create the HL7 XML for the drug descriptions all by ourselves. This is done using our db2xml templates from a predefined table that has (should have) everything.
CREATE TABLE RXN_ROLE_STAGE_FINAL AS WITH RXN_ROLE_STAGE1 AS ( ... ), RXN_ROLE_STAGE2 AS ( ... ), RXN_ROLE_STAGE3 AS ( ... ), RXN_ROLE_STAGE4 AS ( ... ), RXN_COMPLETE_DRUGS AS ( ... ), RXN_ROLE_STAGE5 AS ( ... ), RXN_NDC_NO_SPL AS ( ... ), RXN_ROLE_STAGE6 AS ( ... ), RXN_ROLE_STAGE7 AS ( ... ), RXN_ROLE_STAGE8 AS ( ... ), RXN_ROLE_STAGE9 AS ( ... ), RXN_PRODUCT_NDC AS ( ... ), RXN_ROLE_STAGE10 AS ( ... ), RXN_FORM AS ( ... ), RXN_ROLE_STAGE11 AS ( ... ) SELECT * FROM RXN_ROLE_STAGE11 ORDER BY scoperid
19361 records added.
Next Steps
From here we use db2xml to create the SPL descriptions from RxNorm and immediately load it into the database (mw target rxnspl). Now we can make the final classification. This is based on DrugBaseEvaluationClassification, but during the writing of the paper, we have cleaned up the queries to do so, so we rewrite the previous work again here. Turns out that the cleaner queries are also faster.
Here is the whole thing as one easy-to-replay query:
DROP TABLE RXN_ROLE_STAGE_FINAL
GO
CREATE TABLE RXN_ROLE_STAGE_FINAL AS
WITH RXN_ROLE_STAGE1 AS (
SELECT SCOPERID, ID, SCOPERTYPE, SCOPERNAME, PLAYERID,
CASE WHEN PLAYERID = MOIETYID THEN MOIETYNAME ELSE PLAYERNAME END AS PLAYERNAME,
MOIETYID, MOIETYNAME, ROLENAME, ROLENAMESTEM, QUANTITYNUMBER, QUANTITYUNIT
FROM RXN_ROLE_STAGE r
), RXN_ROLE_STAGE2 AS (
SELECT r.*, INSTR(QUANTITYUNIT,'/') AS qty_slash_position
FROM RXN_ROLE_STAGE1 r
), RXN_ROLE_STAGE3 AS (
SELECT r.*,
quantitynumber AS qty_numerator_value,
CASE WHEN qty_slash_position > 0
THEN SUBSTR(quantityunit, 1, qty_slash_position-1)
ELSE quantityunit
END AS qty_numerator_unit,
1 AS qty_denominator_value,
CASE WHEN qty_slash_position > 0
THEN SUBSTR(quantityunit, qty_slash_position+1)
ELSE '1'
END AS qty_denominator_unit
FROM RXN_ROLE_STAGE2 r
), RXN_ROLE_STAGE4 AS (
SELECT r.*,
nu.ucum_unit AS qty_numerator_ucum_unit,
du.ucum_unit AS qty_denominator_ucum_unit
FROM RXN_ROLE_STAGE3 r
INNER JOIN RXN_UNIT_UCUM_MAP nu ON(nu.rxn_unit = r.qty_numerator_unit)
INNER JOIN RXN_UNIT_UCUM_MAP du ON(du.rxn_unit = r.qty_denominator_unit)
), RXN_COMPLETE_DRUGS AS (
SELECT scoperid FROM RXN_ROLE_STAGE4
MINUS
SELECT scoperid FROM RXN_ROLE_STAGE4
WHERE qty_numerator_ucum_unit IS NULL
OR qty_denominator_ucum_unit IS NULL
), RXN_ROLE_STAGE5 AS (
SELECT * FROM RXN_ROLE_STAGE4 INNER JOIN RXN_COMPLETE_DRUGS USING(scoperid)
), RXN_NDC_NO_SPL AS (
SELECT rxcui AS scoperid FROM RXN_NDC_CLEANUP
MINUS
SELECT rxcui AS scoperid FROM RXN_NDC_CLEANUP
INNER JOIN (
SELECT ndc2 AS ndc
FROM SPL_DRUGLISTING
INNER JOIN RXN_NDC_EQUIV ON(ndc1 = ndc)
) USING(ndc)
), RXN_ROLE_STAGE6 AS (
SELECT * FROM RXN_ROLE_STAGE5 INNER JOIN RXN_NDC_NO_SPL USING(scoperid)
WHERE scopertype = 'SBD'
), RXN_ROLE_STAGE7 AS (
SELECT r.*,
COALESCE(um.code_code, LPAD(playerid,10,'RXNXXXXXXX')) AS moiety_unii_code,
COALESCE(up.code_code, LPAD(moietyid,10,'RXNXXXXXXX')) AS player_unii_code
FROM RXN_ROLE_STAGE6 r
LEFT OUTER JOIN UNII_RXN_MAP um ON(um.rxcui = r.moietyid)
LEFT OUTER JOIN UNII_RXN_MAP up ON(up.rxcui = r.playerid)
), RXN_ROLE_STAGE8 AS (
SELECT r.*,
INSTR(scopername,'[',-1) AS tradename_start,
INSTR(scopername,']',-1) AS tradename_end
FROM RXN_ROLE_STAGE7 r
), RXN_ROLE_STAGE9 AS (
SELECT r.*,
SUBSTR(scopername, tradename_start+1, tradename_end-tradename_start-1) AS tradename
FROM RXN_ROLE_STAGE8 r
), RXN_PRODUCT_NDC AS (
SELECT rxcui AS scoperid,
MAX(SUBSTR(ndc, 1, 5)) AS ndc_labeler,
MAX(SUBSTR(ndc, 1, 5)||'-'||SUBSTR(ndc, 6, 4)) AS ndc_product
FROM RXN_NDC_CLEANUP
GROUP BY rxcui
), RXN_ROLE_STAGE10 AS (
SELECT *
FROM RXN_ROLE_STAGE9
INNER JOIN RXN_PRODUCT_NDC USING(scoperid)
LEFT OUTER JOIN NDC_FIRM ON(lblcode = ndc_labeler)
), RXN_FORM AS (
SELECT r.rxcui1 AS rxcui, r.rxcui2 AS doseform_rxcui, c.str AS doseform_name
FROM umls.rxnconso c
INNER JOIN umls.rxnrel r ON(r.rxcui2=c.rxcui)
WHERE c.tty='DF' AND c.SAB='RXNORM' AND c.LAT='ENG'
), RXN_ROLE_STAGE11 AS (
SELECT r.*, doseform_rxcui, df.doseform_name, m.formCode_code, m.formCode_name, m.routeCode_code, m.routeCode_name
FROM RXN_ROLE_STAGE10 r
INNER JOIN RXN_FORM df ON(df.rxcui = scoperid)
LEFT OUTER JOIN RXN_SPL_FORM_ROUTE_MAP m USING(doseform_rxcui)
) SELECT * FROM RXN_ROLE_STAGE11
ORDER BY scoperid, ndc_product
GO
CREATE INDEX rxn_role_stage_final_idx ON rxn_role_stage_final(scoperId) TABLESPACE useridx
Attachments
- RXN_SPL_FORM_ROUTE_MAP.xls (288.0 kB) - added by gschadow on 03/29/07 18:12:28.
- RXN_SPL_FORM_ROUTE_MAP.2.xls (326.5 kB) -
next revision with some annotations, version sent to Clem
, added by gschadow on 11/13/07 18:25:27.
