Enrichment of Medications Catalog using RxNorm

After a thorough study of RxNorm and its relationship with SPL we will tie RxNorm into our application until FDA has finished fixing their drug-listing process.

We have previously created RXN_ROLE_STAGE, and now we use that to move on.

We build a gigantic query from pieces using temporary views (WITH clauses).

Quantities Numerator and Denominator

WITH RXN_ROLE_STAGE1 AS (
  SELECT SCOPERID, ID, SCOPERTYPE, SCOPERNAME, PLAYERID, 
         CASE WHEN PLAYERID = MOIETYID THEN MOIETYNAME ELSE PLAYERNAME END AS PLAYERNAME,
         MOIETYID, MOIETYNAME, ROLENAME, ROLENAMESTEM, QUANTITYNUMBER, QUANTITYUNIT
    FROM RXN_ROLE_STAGE r
), RXN_ROLE_STAGE2 AS (
  SELECT r.*, INSTR(QUANTITYUNIT,'/') AS qty_slash_position
    FROM RXN_ROLE_STAGE1 r
), RXN_ROLE_STAGE3 AS (
  SELECT r.*, 
    quantitynumber AS qty_numerator_value,
    CASE WHEN qty_slash_position > 0 
      THEN SUBSTR(quantityunit, 1, qty_slash_position-1)
      ELSE quantityunit
    END AS qty_numerator_unit,
    1 AS qty_denominator_value,
    CASE WHEN qty_slash_position > 0 
      THEN SUBSTR(quantityunit, qty_slash_position+1)
      ELSE '1'
    END AS qty_denominator_unit
    FROM RXN_ROLE_STAGE2 r
) ...

Map Units to UCUM

Now using that we make a mapping of RxNorm units to UCUM units:

CREATE TABLE RXN_UNIT_UCUM_MAP (
   RXN_UNIT VARCHAR2(64) PRIMARY KEY,
   RXN_COUNT INTEGER,
   UCUM_UNIT VARCHAR2(64)
)
GO
INSERT INTO RXN_UNIT_UCUM_MAP(RXN_UNIT, RXN_COUNT)
WITH RXN_ROLE_STAGE2 AS ( ...
), RXN_ROLE_STAGE3 AS ( ...
), all_units AS (
  SELECT qty_numerator_unit AS unit FROM RXN_ROLE_STAGE3
  UNION ALL SELECT qty_denominator_unit FROM RXN_ROLE_STAGE3
) 
SELECT unit, count(1) AS count FROM all_units 
GROUP BY unit ORDER BY count(1) DESC

Then this table is manually edited (do not delete) to fill in UCUM mappings.

RXN_UNITRXN_COUNTUCUM_UNIT
MG69837mg
1381621
ML28793mL
UNT3622[iU]
MEQ792mEq
ACTUAT450{actuation}
X317{X}
%244%
HR241h
MIL51(null)
MMOL37mmol
IU23[iU]
GTT11[drp]

There is a tail of garbage, which we will simply ignore.

Now we can actually map the units by adding

WITH RXN_ROLE_STAGE2 AS ( ...
), RXN_ROLE_STAGE3 AS ( ...
), RXN_ROLE_STAGE4 AS (
  SELECT r.*,
      nu.ucum_unit AS qty_numerator_ucum_unit,
      du.ucum_unit AS qty_denominator_ucum_unit
    FROM RXN_ROLE_STAGE3 r
      INNER JOIN RXN_UNIT_UCUM_MAP nu ON(nu.rxn_unit =  r.qty_numerator_unit)
      INNER JOIN RXN_UNIT_UCUM_MAP du ON(du.rxn_unit =  r.qty_denominator_unit)
) SELECT * FROM RXN_ROLE_STAGE4

Remove all drugs where the units could not be completely mapped

WITH RXN_ROLE_STAGE2 AS ( ...
), RXN_ROLE_STAGE3 AS ( ...
), RXN_ROLE_STAGE4 AS ( ...
), RXN_COMPLETE_DRUGS AS ( 
  SELECT scoperid FROM RXN_ROLE_STAGE4
  MINUS
  SELECT scoperid FROM RXN_ROLE_STAGE4
    WHERE qty_numerator_ucum_unit IS NULL
       OR qty_denominator_ucum_unit IS NULL
), RXN_ROLE_STAGE5 AS (
  SELECT * FROM RXN_ROLE_STAGE4 INNER JOIN RXN_COMPLETE_DRUGS USING(scoperid)
) SELECT * FROM RXN_ROLE_STAGE5

Remove overlap with SPL

Now we have to remove the overlap to SPL. We do this by using the NDC map very simply. Any SCD or SBD which has an NDC numbers known to RxNorm which is also in SPL is entirely deleted with all its sibling SBDs. We really need every kind of SPL label only one in the system and do not want overlapping RxNorm drugs. Also, we want to create SPL descriptions from RxNorm, so we will remove all SCDs. We will do our own classifications later.

WITH RXN_ROLE_STAGE2 AS ( ...
), RXN_ROLE_STAGE3 AS ( ...
), RXN_ROLE_STAGE4 AS ( ...
), RXN_COMPLETE_DRUGS AS ( ...
), RXN_ROLE_STAGE5 AS ( ...
), RXN_NDC_NO_SPL AS ( 
    SELECT rxcui AS scoperid FROM RXN_NDC_CLEANUP
  MINUS
    SELECT rxcui AS scoperid FROM RXN_NDC_CLEANUP
      INNER JOIN (
        SELECT ndc2 AS ndc
          FROM SPL_DRUGLISTING 
            INNER JOIN RXN_NDC_EQUIV ON(ndc1 = ndc)
      ) USING(ndc)
), RXN_ROLE_STAGE6 AS (
  SELECT * FROM RXN_ROLE_STAGE5 INNER JOIN RXN_NDC_NO_SPL USING(scoperid)
    WHERE scopertype = 'SBD'
) SELECT * FROM RXN_ROLE_STAGE6

Mappings to UNII/SRS Codes

We use our mapping if we can, and for the rest we make up fake UNII codes that are the RxCUIs padded with "RXNXXXX...".

WITH RXN_ROLE_STAGE2 AS ( ...
), RXN_ROLE_STAGE3 AS ( ...
), RXN_ROLE_STAGE4 AS ( ...
), RXN_COMPLETE_DRUGS AS ( ...
), RXN_ROLE_STAGE5 AS ( ...
), RXN_NDC_NO_SPL AS ( ...
), RXN_ROLE_STAGE6 AS ( ...
), RXN_ROLE_STAGE7 AS (
  SELECT r.*, 
         COALESCE(um.code_code, LPAD(playerid,10,'RXNXXXXXXX')) AS moiety_unii_code,
         COALESCE(up.code_code, LPAD(moietyid,10,'RXNXXXXXXX')) AS player_unii_code
      FROM RXN_ROLE_STAGE6 r
        LEFT OUTER JOIN UNII_RXN_MAP um ON(um.rxcui = r.moietyid)
        LEFT OUTER JOIN UNII_RXN_MAP up ON(up.rxcui = r.playerid)
) SELECT * FROM RXN_ROLE_STAGE7

Tradename

WITH RXN_ROLE_STAGE2 AS ( ...
), RXN_ROLE_STAGE3 AS ( ...
), RXN_ROLE_STAGE4 AS ( ...
), RXN_COMPLETE_DRUGS AS ( ...
), RXN_ROLE_STAGE5 AS ( ...
), RXN_NDC_NO_SPL AS ( ...
), RXN_ROLE_STAGE6 AS ( ...
), RXN_ROLE_STAGE7 AS ( ...
), RXN_ROLE_STAGE8 AS (
  SELECT r.*, 
         INSTR(scopername,'[',-1) AS tradename_start,
         INSTR(scopername,']',-1) AS tradename_end
    FROM RXN_ROLE_STAGE7 r
), RXN_ROLE_STAGE9 AS (
  SELECT r.*, 
         SUBSTR(scopername, tradename_start+1, tradename_end-tradename_start-1) AS tradename
    FROM RXN_ROLE_STAGE8 r
) SELECT * FROM RXN_ROLE_STAGE9

NDC code

NDC codes added on the product level only, no package information exists). Note how we have to make sure that we don't pick up more than one NDC code per product. We use MAX, not MIN, because MIN might put empty codes in (i.e., "-")

WITH RXN_ROLE_STAGE2 AS ( ...
), RXN_ROLE_STAGE3 AS ( ...
), RXN_ROLE_STAGE4 AS ( ...
), RXN_COMPLETE_DRUGS AS ( ...
), RXN_ROLE_STAGE5 AS ( ...
), RXN_NDC_NO_SPL AS ( ...
), RXN_ROLE_STAGE6 AS ( ...
), RXN_ROLE_STAGE7 AS ( ...
), RXN_ROLE_STAGE8 AS ( ...
), RXN_ROLE_STAGE9 AS ( ...
), RXN_PRODUCT_NDC AS (
  SELECT rxcui AS scoperid, 
         MAX(SUBSTR(ndc, 1, 5)) AS ndc_labeler,
         MAX(SUBSTR(ndc, 1, 5)||'-'||SUBSTR(ndc, 6, 4)) AS ndc_product
    FROM RXN_NDC_CLEANUP
    GROUP BY rxcui
), RXN_ROLE_STAGE10 AS (
  SELECT *
    FROM RXN_ROLE_STAGE9
      INNER JOIN RXN_PRODUCT_NDC USING(scoperid)
      LEFT OUTER JOIN NDC_FIRM ON(lblcode = ndc_labeler)
) SELECT * FROM RXN_ROLE_STAGE10

Form and route mapping

Mapping dose forms and routes between FDA to RxNorm is not so easy and we have to do it by hand. Let's go get that table started:

DROP TABLE rxn_spl_form_route_map
GO
CREATE TABLE rxn_spl_form_route_map (
  doseform_rxcui VARCHAR2(16) PRIMARY KEY,
  doseform_name VARCHAR2(128),
  count INTEGER,
  formCode_code VARCHAR2(32),
  formCode_name VARCHAR2(128),
  routeCode_code VARCHAR2(32),
  routeCode_name VARCHAR2(128)
)
GO
INSERT INTO rxn_spl_form_route_map(
  doseform_rxcui,
  doseform_name,
  count
)
WITH rxn_form AS (
  SELECT r.rxcui1 AS rxcui, r.rxcui2 AS doseform_rxcui, c.str AS doseform_name 
    FROM umls.rxnconso c
      INNER JOIN umls.rxnrel r ON(r.rxcui2=c.rxcui)
    WHERE c.tty='DF' AND c.SAB='RXNORM' AND c.LAT='ENG'
) SELECT doseform_rxcui, doseform_name, count(1) as count
    FROM rxn_form
    GROUP BY doseform_rxcui, doseform_name
    ORDER BY count DESC, doseform_name ASC

And edit it manually. Attached is a spreadsheet where the work was not too hard to do and to check. It also has SQL commands to recreate the data quickly. Now we can actually map:

WITH RXN_ROLE_STAGE2 AS ( ...
), RXN_ROLE_STAGE3 AS ( ...
), RXN_ROLE_STAGE4 AS ( ...
), RXN_COMPLETE_DRUGS AS ( ...
), RXN_ROLE_STAGE5 AS ( ...
), RXN_NDC_NO_SPL AS ( ...
), RXN_ROLE_STAGE6 AS ( ...
), RXN_ROLE_STAGE7 AS ( ...
), RXN_ROLE_STAGE8 AS ( ...
), RXN_ROLE_STAGE9 AS ( ...
), RXN_PRODUCT_NDC AS ( ...
), RXN_ROLE_STAGE10 AS ( ...
), RXN_FORM AS (
  SELECT r.rxcui1 AS rxcui, r.rxcui2 AS doseform_rxcui, c.str AS doseform_name
    FROM umls.rxnconso c
      INNER JOIN umls.rxnrel r ON(r.rxcui2=c.rxcui)
    WHERE c.tty='DF' AND c.SAB='RXNORM' AND c.LAT='ENG'
), RXN_ROLE_STAGE11 AS (
  SELECT r.*, doseform_rxcui, df.doseform_name, m.formCode_code, m.formCode_name, m.routeCode_code, m.routeCode_name
    FROM RXN_ROLE_STAGE10 r
      INNER JOIN RXN_FORM df ON(df.rxcui = scoperid)
      LEFT OUTER JOIN RXN_SPL_FORM_ROUTE_MAP m USING(doseform_rxcui)
) SELECT * FROM RXN_ROLE_STAGE11

But we still want the display names for these codes also.

Final Staging Table

And now, we should be able to create the HL7 XML for the drug descriptions all by ourselves. This is done using our db2xml templates from a predefined table that has (should have) everything.

CREATE TABLE RXN_ROLE_STAGE_FINAL AS
WITH RXN_ROLE_STAGE1 AS ( ...
), RXN_ROLE_STAGE2 AS ( ...
), RXN_ROLE_STAGE3 AS ( ...
), RXN_ROLE_STAGE4 AS ( ...
), RXN_COMPLETE_DRUGS AS ( ...
), RXN_ROLE_STAGE5 AS ( ...
), RXN_NDC_NO_SPL AS ( ...
), RXN_ROLE_STAGE6 AS ( ...
), RXN_ROLE_STAGE7 AS ( ...
), RXN_ROLE_STAGE8 AS ( ...
), RXN_ROLE_STAGE9 AS ( ...
), RXN_PRODUCT_NDC AS ( ...
), RXN_ROLE_STAGE10 AS ( ...
), RXN_FORM AS ( ...
), RXN_ROLE_STAGE11 AS ( ...
) SELECT * FROM RXN_ROLE_STAGE11
  ORDER BY scoperid

19361 records added.

Next Steps

From here we use db2xml to create the SPL descriptions from RxNorm and immediately load it into the database (mw target rxnspl). Now we can make the final classification. This is based on DrugBaseEvaluationClassification, but during the writing of the paper, we have cleaned up the queries to do so, so we rewrite the previous work again here. Turns out that the cleaner queries are also faster.


Here is the whole thing as one easy-to-replay query:

DROP TABLE RXN_ROLE_STAGE_FINAL
GO
CREATE TABLE RXN_ROLE_STAGE_FINAL AS
WITH RXN_ROLE_STAGE1 AS (
  SELECT SCOPERID, ID, SCOPERTYPE, SCOPERNAME, PLAYERID, 
         CASE WHEN PLAYERID = MOIETYID THEN MOIETYNAME ELSE PLAYERNAME END AS PLAYERNAME,
         MOIETYID, MOIETYNAME, ROLENAME, ROLENAMESTEM, QUANTITYNUMBER, QUANTITYUNIT
    FROM RXN_ROLE_STAGE r
), RXN_ROLE_STAGE2 AS (
  SELECT r.*, INSTR(QUANTITYUNIT,'/') AS qty_slash_position
    FROM RXN_ROLE_STAGE1 r
), RXN_ROLE_STAGE3 AS (
  SELECT r.*, 
    quantitynumber AS qty_numerator_value,
    CASE WHEN qty_slash_position > 0 
      THEN SUBSTR(quantityunit, 1, qty_slash_position-1)
      ELSE quantityunit
    END AS qty_numerator_unit,
    1 AS qty_denominator_value,
    CASE WHEN qty_slash_position > 0 
      THEN SUBSTR(quantityunit, qty_slash_position+1)
      ELSE '1'
    END AS qty_denominator_unit
    FROM RXN_ROLE_STAGE2 r
), RXN_ROLE_STAGE4 AS (
  SELECT r.*,
      nu.ucum_unit AS qty_numerator_ucum_unit,
      du.ucum_unit AS qty_denominator_ucum_unit
    FROM RXN_ROLE_STAGE3 r
      INNER JOIN RXN_UNIT_UCUM_MAP nu ON(nu.rxn_unit =  r.qty_numerator_unit)
      INNER JOIN RXN_UNIT_UCUM_MAP du ON(du.rxn_unit =  r.qty_denominator_unit)
), RXN_COMPLETE_DRUGS AS ( 
    SELECT scoperid FROM RXN_ROLE_STAGE4
  MINUS
    SELECT scoperid FROM RXN_ROLE_STAGE4
      WHERE qty_numerator_ucum_unit IS NULL
         OR qty_denominator_ucum_unit IS NULL
), RXN_ROLE_STAGE5 AS (
  SELECT * FROM RXN_ROLE_STAGE4 INNER JOIN RXN_COMPLETE_DRUGS USING(scoperid)
), RXN_NDC_NO_SPL AS ( 
    SELECT rxcui AS scoperid FROM RXN_NDC_CLEANUP
  MINUS
    SELECT rxcui AS scoperid FROM RXN_NDC_CLEANUP
      INNER JOIN (
        SELECT ndc2 AS ndc
          FROM SPL_DRUGLISTING 
            INNER JOIN RXN_NDC_EQUIV ON(ndc1 = ndc)
      ) USING(ndc)
), RXN_ROLE_STAGE6 AS (
  SELECT * FROM RXN_ROLE_STAGE5 INNER JOIN RXN_NDC_NO_SPL USING(scoperid)
    WHERE scopertype = 'SBD'
), RXN_ROLE_STAGE7 AS (
  SELECT r.*, 
         COALESCE(um.code_code, LPAD(playerid,10,'RXNXXXXXXX')) AS moiety_unii_code,
         COALESCE(up.code_code, LPAD(moietyid,10,'RXNXXXXXXX')) AS player_unii_code
      FROM RXN_ROLE_STAGE6 r
        LEFT OUTER JOIN UNII_RXN_MAP um ON(um.rxcui = r.moietyid)
        LEFT OUTER JOIN UNII_RXN_MAP up ON(up.rxcui = r.playerid)
), RXN_ROLE_STAGE8 AS (
  SELECT r.*, 
         INSTR(scopername,'[',-1) AS tradename_start,
         INSTR(scopername,']',-1) AS tradename_end
    FROM RXN_ROLE_STAGE7 r
), RXN_ROLE_STAGE9 AS (
  SELECT r.*, 
         SUBSTR(scopername, tradename_start+1, tradename_end-tradename_start-1) AS tradename
    FROM RXN_ROLE_STAGE8 r
), RXN_PRODUCT_NDC AS (
  SELECT rxcui AS scoperid, 
         MAX(SUBSTR(ndc, 1, 5)) AS ndc_labeler,
         MAX(SUBSTR(ndc, 1, 5)||'-'||SUBSTR(ndc, 6, 4)) AS ndc_product
    FROM RXN_NDC_CLEANUP
    GROUP BY rxcui
), RXN_ROLE_STAGE10 AS (
  SELECT *
    FROM RXN_ROLE_STAGE9
      INNER JOIN RXN_PRODUCT_NDC USING(scoperid)
      LEFT OUTER JOIN NDC_FIRM ON(lblcode = ndc_labeler)
), RXN_FORM AS (
  SELECT r.rxcui1 AS rxcui, r.rxcui2 AS doseform_rxcui, c.str AS doseform_name
    FROM umls.rxnconso c
      INNER JOIN umls.rxnrel r ON(r.rxcui2=c.rxcui)
    WHERE c.tty='DF' AND c.SAB='RXNORM' AND c.LAT='ENG'
), RXN_ROLE_STAGE11 AS (
  SELECT r.*, doseform_rxcui, df.doseform_name, m.formCode_code, m.formCode_name, m.routeCode_code, m.routeCode_name
    FROM RXN_ROLE_STAGE10 r
      INNER JOIN RXN_FORM df ON(df.rxcui = scoperid)
      LEFT OUTER JOIN RXN_SPL_FORM_ROUTE_MAP m USING(doseform_rxcui)
) SELECT * FROM RXN_ROLE_STAGE11
  ORDER BY scoperid, ndc_product
GO
CREATE INDEX rxn_role_stage_final_idx ON rxn_role_stage_final(scoperId) TABLESPACE useridx

Attachments