Allergies and Intolerances

Allergies are essentially Problems. So they will be very similar to problems and problem lists?. The only thing specific about allergies is that the Observation that details the problem will be a specific allergy observation.

Being an Observation we need to know

  1. what is the Observation.code?
  2. what are the Observation.values?
  3. are there any other associated data structures?

Observation.code

Comes from the HL7 ActCode? vocabulary. They have defined specific terms and codes. The OID for the ActCode? is (according to the etc/domain-oid-map.xml 2.16.840.1.113883.5.4). Now here are the kinds of intolerances and allergies that we can pick:

OINTintoleranceHypersensitivity resulting in an adverse reaction upon exposure to an agent.
- ALGallergyHypersensitivity to an agent caused by an immunologic response to an initial exposure.
- - DALGDrug AllergyAn allergy to a pharmaceutical product.
- - EALGEnvironmental AllergyAn allergy to a substance other than a drug or a food. E.g. Latex, pollen, etc.
- - FALGFood AllergyAn allergy to a substance generally consumed for nutritional purposes.
- DINTDrug IntoleranceHypersensitivity resulting in an adverse reaction upon exposure to a drug.
- - DALGDrug AllergyAn allergy to a pharmaceutical product.
- - DNAINTDrug Non-Allergy IntoleranceHypersensitivity to an agent caused by a mechanism other than an immunologic response to an initial exposure
- EINTEnvironmental IntoleranceHypersensitivity resulting in an adverse reaction upon exposure to environmental conditions.
- - EALGEnvironmental AllergyAn allergy to a substance other than a drug or a food. E.g. Latex, pollen, etc.
- - ENAINTEnvironmental Non-Allergy IntoleranceHypersensitivity to an agent caused by a mechanism other than an immunologic response to an initial exposure
- FINTFood IntoleranceHypersensitivity resulting in an adverse reaction upon exposure to food.
- - FALGFood AllergyAn allergy to a substance generally consumed for nutritional purposes.
- - FNAINTFood Non-Allergy IntoleranceHypersensitivity to an agent caused by a mechanism other than an immunologic response to an initial exposure
- NAINTNon-Allergy IntoleranceHypersensitivity to an agent caused by a mechanism other than an immunologic response to an initial exposure
- - DNAINTDrug Non-Allergy IntoleranceHypersensitivity to an agent caused by a mechanism other than an immunologic response to an initial exposure
- - ENAINTEnvironmental Non-Allergy IntoleranceHypersensitivity to an agent caused by a mechanism other than an immunologic response to an initial exposure
- - FNAINTFood Non-Allergy IntoleranceHypersensitivity to an agent caused by a mechanism other than an immunologic response to an initial exposure

This list seems to follow the following schema:

AllergyNon-Allergy Intolerance
Hypersensitivity to an agent caused by an immunologic response to an initial exposure.Hypersensitivity to an agent caused by a mechanism other than an immunologic response to an initial exposure
DrugDALGDNAINT
FoodFALGFNAINT
EnvironmentEALGENAINT

and then there are two ways of rolling it up: ALG vs. NAINT with D-, E-, or F- underneath or D-, E-, F- with ALG vs. NAINT under them.

For our purpose, we will do it much simpler. We will allow

  • allergy or intolerance
    • allergy (immunologic)
    • non-allergy intolerance

Most of the time people will just use the "allergy or intolerance" concept. Whether the allergen is a drug, food or environmental agent should be evident from something else.

Observation.value

The draft HL7 v3 specifications right now say that the allergen concept sits as a code in the Observation.value attribute. That is fine for hay fever, but it is not so fine for drugs. However, now we also have a "causative agent" (CAGNT) participation. It allows us to connect to a material

Second problem is that if there are specific allergy tests, they will specify the allergen in the Observation . code (e.g., "hordeum vulgare IgE.").

There are also SNOMED problems we would use for allergies, in that case we would just say "Medical Problem" = "hayfever" or some such.

SNOMED and Allergies

Ancestors

SELECT DISTINCT e.conceptId, e.fullySpecifiedName, c.distance, d.conceptId, d.fullySpecifiedName
  FROM UMLS.MRCONSO n
   INNER JOIN UMLS.MRCONSO s ON(s.CUI = n.CUI AND s.SAB='SNOMEDCT')
   INNER JOIN UMLS.SCT_ISA_CLOSURE c ON(c.conceptId1 = s.code)
   INNER JOIN UMLS.SCT_CONCEPT e ON(e.conceptId = s.code)
   INNER JOIN UMLS.SCT_CONCEPT d ON(d.conceptId = c.conceptId2)
   WHERE LOWER(n.str) = 'allergy' AND n.LAT='ENG'
     AND NOT EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE x
                       WHERE x.conceptId2 = 362955004 -- Inactive Concept
                         AND x.conceptId1 = e.conceptId)
     AND EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE f
                   WHERE f.conceptId2 = 404684003 -- Clinical finding
                     AND f.conceptId1 = e.conceptId)
     AND d.conceptId <> 138875005 -- SNOMED CT toplevel
   ORDER BY e.conceptId, c.distance
CONCEPTIDFULLYSPECIFIEDNAMEDISTANCECONCEPTIDFULLYSPECIFIEDNAME
106190000Allergic state (disorder)0106190000Allergic state (disorder)
106190000Allergic state (disorder)1420134006Propensity to adverse reactions (disorder)
106190000Allergic state (disorder)264572001Disease (disorder)
106190000Allergic state (disorder)3404684003Clinical finding (finding)
269284003Allergy, unspecified (disorder)0269284003Allergy, unspecified (disorder)
269284003Allergy, unspecified (disorder)1106190000Allergic state (disorder)
269284003Allergy, unspecified (disorder)2420134006Propensity to adverse reactions (disorder)
269284003Allergy, unspecified (disorder)364572001Disease (disorder)
269284003Allergy, unspecified (disorder)4404684003Clinical finding (finding)
421961002Hypersensitivity reaction (disorder)0421961002Hypersensitivity reaction (disorder)
421961002Hypersensitivity reaction (disorder)1281647001Adverse reaction (disorder)
421961002Hypersensitivity reaction (disorder)2116223007Complication (disorder)
421961002Hypersensitivity reaction (disorder)3362977000Sequela (disorder)
421961002Hypersensitivity reaction (disorder)464572001Disease (disorder)
421961002Hypersensitivity reaction (disorder)5404684003Clinical finding (finding)

Looks like 106190000 is what we really care for.

Descendants

SELECT DISTINCT c.distance, d.conceptId, d.fullySpecifiedName
  FROM UMLS.SCT_ISA_CLOSURE c
   INNER JOIN UMLS.SCT_CONCEPT d ON(d.conceptId = c.conceptId1)
   WHERE c.conceptId2 = 106190000 
     AND NOT EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE x
                       WHERE x.conceptId2 = 362955004 -- Inactive Concept
                         AND x.conceptId1 = d.conceptId)
   ORDER BY c.distance
DISTANCECONCEPTIDFULLYSPECIFIEDNAME
0106190000Allergic state (disorder)
18641003Jones-Mote type hypersensitivity (disorder)
121626009Cutaneous hypersensitivity (disorder)
1115665000Atopy (disorder)
1269284003Allergy, unspecified (disorder)
1419474003Allergy to mold (disorder)
1420881009Allergic disorder by allergen type (disorder)
1426232007Environmental allergy (disorder)

This first level makes sense, although it uses "Allergy" where we use "Intolerance" in HL7 (because "Atopy" is the better word for Non-Allergic Intolerance). "Allergy to mold" is obviously a mistaken oddball. The category "by allergen type" becomes interesting.

Allergic disorder by allergen type (420881009)

SELECT DISTINCT c.distance, d.conceptId, d.fullySpecifiedName
  FROM UMLS.SCT_ISA_CLOSURE c
   INNER JOIN UMLS.SCT_CONCEPT d ON(d.conceptId = c.conceptId1)
   WHERE c.conceptId2 = 420881009
     AND NOT EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE x
                       WHERE x.conceptId2 = 362955004 -- Inactive Concept
                         AND x.conceptId1 = d.conceptId)
   ORDER BY c.distance

1598 items under 420881009, these go 9 levels deep:

0420881009Allergic disorder by allergen type (disorder)138875005SNOMED CT Concept (SNOMED RT+CTV3)
17114006Hypersensitivity to foreign antigen (disorder)138875005SNOMED CT Concept (SNOMED RT+CTV3)
137545009Hypersensitivity to endogenous antigen (disorder)138875005SNOMED CT Concept (SNOMED RT+CTV3)
1419199007Allergy to substance (disorder)105590001Substance (substance)
1402594000Allergy to plant (disorder)31006001Kingdom Plantae (organism)
1294918007Enteral and supplement feeds allergy (disorder)346415002Enteral+supplement feeds (product)
1294919004Oral rehydration salts allergy (disorder)346599006Oral rehydration salts (product)
1294917002Multiple electrolyte infusion allergy (disorder)346354002Multiple electrolyte infusion (product)
1232350006House dust mite allergy (disorder)260147004House dust mite (organism)
1405649006Tape allergy (disorder)84756000Adhesive tape, device (physical object)
1418968001Allergy to gauze (disorder)350785008Gauzes (physical object)

There are strange concepts here, that seem to belong far lower in the tree:

  • Hypersensitivity to endogenous antigen (disorder)
  • Hypersensitivity to foreign antigen (disorder)
    • Allergy to substance (disorder)
      • Enteral and supplement feeds allergy (disorder)
        • Oral rehydration salts allergy (disorder)
          • Multiple electrolyte infusion allergy (disorder)
      • Allergy to gauze (disorder)
      • Allergy to tape (disorder)
    • Allergy to plant (disorder)
    • Allergy to animal (does not even exist)
      • House dust mite allergy (disorder)

Strange. Well, we did notice some of these misplaced items in the previous level too, e.g. mold. On the other hand, allergy to substance (419199007) is what we now really care about:

Allergy to substance (419199007)

SELECT DISTINCT c.distance, d.conceptId, d.fullySpecifiedName
  FROM UMLS.SCT_ISA_CLOSURE c
   INNER JOIN UMLS.SCT_CONCEPT d ON(d.conceptId = c.conceptId1)
   WHERE c.conceptId2 = 419199007
     AND NOT EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE x
                       WHERE x.conceptId2 = 362955004 -- Inactive Concept
                         AND x.conceptId1 = d.conceptId)
   ORDER BY c.distance

There are strange things in here, like "Allergy to anti-infective agent (disorder)" (91929009). Under this, of course, we find allergies to specific anti-infective agents. I think they might have created too many of these allergy concepts on all sorts of generalizations of substances which nobody really needs. Certainly an allergy to anti-infective agent is rather useless. We should not even admit this as it will cause in-actionable warnings.

SNOMED Allergies to NDF-RT

The idea is that all of the SNOMED allergy to substance concepts should indeed have such a substance (or even more than one) connected to them as the "causative agent". Lo and behold this is the same concept name as used by the HL7 participation for the same purpose.

SELECT DISTINCT c.distance, d.conceptId, d.fullySpecifiedName, a.conceptId, a.fullySpecifiedName
  FROM UMLS.SCT_ISA_CLOSURE c
   INNER JOIN UMLS.SCT_CONCEPT d ON(d.conceptId = c.conceptId1)
   INNER JOIN UMLS.SCT_RELATIONSHIP ar 
      ON(ar.conceptId1 = d.conceptId AND ar.relationshipType = 246075003/*causative agent*/)
   INNER JOIN UMLS.SCT_CONCEPT a ON(a.conceptId = ar.conceptId2)
   WHERE c.conceptId2 = 419199007 -- Allergy to substance
     AND NOT EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE x
                       WHERE x.conceptId2 = 362955004 -- Inactive Concept
                         AND x.conceptId1 = d.conceptId)
   ORDER BY c.distance

Now we can find all those allergies whose causative substance is a drug. First we have to find out what a "substance is a drug" actually means. Do we mean some chemical? Well, that would be a substance. We mean "not a natural product"? But then the chicken egg gets quickly excluded which we still need because of the vaccines that may contain traces of the chicken egg used to grow the virus material.

SELECT DISTINCT c.distance, d.conceptId, d.fullySpecifiedName, a.conceptId, a.fullySpecifiedName, aic.distance, ac.conceptId, ac.fullySpecifiedName
  FROM UMLS.SCT_ISA_CLOSURE c
   INNER JOIN UMLS.SCT_CONCEPT d ON(d.conceptId = c.conceptId1)
   INNER JOIN UMLS.SCT_RELATIONSHIP ar ON(ar.conceptId1 = d.conceptId AND ar.relationshipType = 246075003)
   INNER JOIN UMLS.SCT_CONCEPT a ON(a.conceptId = ar.conceptId2)
   INNER JOIN UMLS.SCT_ISA_CLOSURE aic ON(aic.conceptId1 = a.conceptId)
   INNER JOIN UMLS.SCT_CONCEPT ac ON(ac.conceptId = aic.conceptId2)
   WHERE c.conceptId2 = 419199007 -- Allergy to substance
     AND NOT EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE x
                       WHERE x.conceptId2 = 362955004 -- Inactive Concept
                         AND x.conceptId1 = d.conceptId)
   ORDER BY c.distance, d.conceptId, a.conceptId, aic.distance

Perhaps we can say, allergies whose causative agent exists in any drug. But that question cannot be answered with the material we have, because before the FDA makes people register inactive ingredients, no code system will be complete in that regard.

So, when we now connect to NDF-RT classes, we must be careful to interpret the results correctly. Those we cannot map, are not necessarily true misses, but they are also not necessarilt to be excluded.

Instead we should use NDF-RT classes, specifically chemical structures. NDF-RT classes for chemical structure are the generalizations among NDF-RT's Ingredient/Chemical kind of concepts. On this we can make a mapping from SNOMED CT to NDF-RT. The general SNOMED CT to NDF-RT map through the UMLS is:

WITH ndf_cui_map AS (
  SELECT nr.source_id AS ndf_id, nr.target_value AS cui
    FROM UMLS.NDFRT_ROLE nr 
    WHERE nr.role_def_id = 'P141'
), snomed_cui_map AS (
  SELECT sc.code AS sct_id, cui
    FROM UMLS.MRCONSO sc
    WHERE sc.SAB = 'SNOMEDCT'
), snomed_ndf_map AS (
  SELECT sct_id, ndf_id 
    FROM ndf_cui_map INNER JOIN snomed_cui_map USING(cui)
)

Now let's see the cardinalities and misses of this mapping for our purpose of chemical/ingredient.

WITH snomed_ndf_map AS (
...
), ndf_substance AS (
  SELECT id AS ndf_id
    FROM UMLS.NDFRT_CONCEPT WHERE kind = '11'
) SELECT count(DISTINCT ndf_id), count(DISTINCT sct_id), count(DISTINCT sct_id)/count(DISTINCT ndf_id)
     FROM ndf_substance  
       LEFT OUTER JOIN snomed_ndf_map USING(ndf_id)
  • NDF: 8421
  • SNOMED: 7462 (88.6%)

Not too bad. Let's go the inverse:

WITH snomed_ndf_map AS (
...
), snomed_substance AS (
  SELECT conceptId1 AS sct_id
    FROM UMLS.SCT_ISA_CLOSURE WHERE conceptId2='105590001'
) SELECT count(DISTINCT sct_id), count(DISTINCT ndf_id), count(DISTINCT ndf_id)/count(DISTINCT sct_id)
     FROM snomed_substance  
       LEFT OUTER JOIN snomed_ndf_map USING(sct_id)
  • SNOMED: 23368
  • NDF: 4321 (18.4%)

Bad, but it's unfair to go in with unrestricted SNOMED:

WITH snomed_ndf_map AS (
...
), snomed_allergen AS (
  SELECT sct_id FROM snomed_substance 
    WHERE EXISTS (SELECT 1 FROM UMLS.SCT_RELATIONSHIP ar 
                   INNER JOIN UMLS.SCT_ISA_CLOSURE a 
                     ON(a.conceptId1 = ar.conceptId1)
                   WHERE ar.relationshipType = 246075003/*causative agent*/
                     AND ar.conceptId2 = sct_id
                     AND a.conceptId2 = 419199007/*Allergy to substance*/)
) SELECT count(DISTINCT sct_id), count(DISTINCT ndf_id), count(DISTINCT ndf_id)/count(DISTINCT sct_id)
     FROM snomed_allergen
       LEFT OUTER JOIN snomed_ndf_map USING(sct_id)
  • SNOMED: 1507
  • NDF: 933 (61.9%)

Now inverse again and fully constrained:

SELECT count(DISTINCT ndf_id), count(DISTINCT b.sct_id), count(DISTINCT sct_id)/count(DISTINCT ndf_id)
     FROM ndf_substance  
       LEFT OUTER JOIN snomed_ndf_map a USING(ndf_id)
       LEFT OUTER JOIN snomed_allergen b ON(b.sct_id = a.sct_id)

  • NDF: 8421
  • SCT: 915 10.8%

What a loss by forcing allergen! It means that there are many potential allergies that have no code in SNOMED. One more reason to make an allergy structure rather than simple allergy code.

There may still be some items here which cannot be properly mapped. But it's good enough for a start.

What is Actually Required?

To see what people really enter for allergies, I am scanning the RMRS Allergy Hx observations (deidentified, no human subjects). I do:

create table gschadow.allergy_terms as
select vc.code AS code, vc.name AS name, VALUE_MODIFIER_TEXT AS freeText, count(1) AS count
  from rmrs.clinical_variable 
    inner join rmrs.concept vc ON(vc.sys_id = value_if_type_is_coded_sys_id AND vc.code = value_if_type_is_coded_code)
      where SERVICE_SYS_ID = 1
        and service_code = '7717'
  group by vc.code, vc.name, VALUE_MODIFIER_TEXT
  • 224375 instances of Allergy Hx
  • 1040 distinct values (including the write-ins)
  • only 57 different write-ins, very small number of instances (<100)
  • looks very doable to map

Top-30 values are here:

CODENAMEWRITE INFREQ
16404no known allergies 119360
93PENICILLINS 15296
4552none 13292
13316no known drug allergies 10400
144Codeine 8467
3557PCN allergy 6597
72Aspirin 3520
555Sulfamethoxazole 3008
6020sulfa 2923
476Penicillin V 2099
4455drug allergy other 2051
6126morphine 2041
568SULFONAMIDES 1989
2637Amoxicillin 1724
6588Cefaclor 1622
97Cephalexin 1621
1777Ibuprofen 1418
145Meperidine 1367
44Sodium SerPl? Qn 769
346Erythromycin 669
83Tetracycline 598
3706sulfa allergy 556
560Darvon-N 550
11147Ciprofloxacin 549
4872codeine allergy 525
119Prochlorperazine 514
4915no response 482
78Acetaminophen 435
12983Acetam/Hydrocodn 432
94Ampicillin 421
5917Penicillin 391

We can map this via NDC or ICD-9. 32 of these allergies (6690 instances) were using LOINC codes for plasma levels of the chemical in question.

WHATCOUNT CODECOUNT
no match345 168542
NDC 509 40713
ICD-9 only74 11181
LN only 27 3893

Most (143052) of the 168542 instances of no match is for no allergies:

  • no known allergies (119360)
  • none (13292) and
  • no known drug allergies (10400)

So we are not overly concerned about the 25490 occurrences, of which only 82 distinct terms have 10 or more instance.

Here is the query block to get to the above numbers:

WITH map AS (
SELECT source_code as code, target_sys_id as other_sys_id, target_code AS other_code
  FROM rmrs.concept_mapping WHERE source_sys_id = 1
UNION
SELECT target_code as code, source_sys_id as other_sys_id, source_code AS other_code
  FROM rmrs.concept_mapping WHERE target_sys_id = 1
), a AS ( 
SELECT * FROM Allergy_terms a
LEFT OUTER JOIN (
SELECT code, name, count, other_sys_id, other_sys_name, count(1) AS mcount
FROM (
SELECT a.code, a.name, a.count, other_sys_id, s.name AS other_sys_name, other_code, c.name AS other_name
  FROM Allergy_terms a
  INNER JOIN map m ON(m.code = a.code)
  INNER JOIN rmrs.code_system s ON(s.sys_id = m.other_sys_id)
  INNER JOIN rmrs.concept c ON(c.sys_id = m.other_sys_id and c.code = m.other_code)
  WHERE other_sys_id IN (357, 2, 41)
) u
  GROUP BY code, name, count, other_sys_id, other_sys_name
) USING(code, name, count)
--  WHERE count > 1
  ORDER BY count DESC, name ASC, other_sys_name ASC
)
-- SELECT * FROM a
SELECT 'no match', count(DISTINCT CODE), SUM(COUNT) FROM a WHERE other_sys_id IS NULL
UNION ALL
SELECT 'NDC', count(DISTINCT CODE), SUM(COUNT) FROM a WHERE other_sys_id = 357
UNION ALL
SELECT 'ICD-9 only', count(DISTINCT CODE), SUM(COUNT) FROM (
SELECT CODE, COUNT FROM a WHERE other_sys_id = 2
MINUS SELECT CODE, COUNT FROM a WHERE other_sys_id = 357
)
UNION ALL
SELECT 'LN only', count(DISTINCT CODE), SUM(COUNT) FROM (
SELECT CODE, COUNT FROM a WHERE other_sys_id = 41
MINUS SELECT CODE, COUNT FROM a WHERE other_sys_id = 2
MINUS SELECT CODE, COUNT FROM a WHERE other_sys_id = 357
)

Here is how we can map it to the maximum extent:

DROP TABLE  Allergy_term_mapping
GO
CREATE TABLE Allergy_term_mapping AS
WITH map AS (
SELECT source_code as code, target_sys_id as other_sys_id, target_code AS other_code
  FROM rmrs.concept_mapping WHERE source_sys_id = 1
UNION
SELECT target_code as code, source_sys_id as other_sys_id, source_code AS other_code
  FROM rmrs.concept_mapping WHERE target_sys_id = 1
), a AS ( 
SELECT a.code, a.name, a.freetext, a.count, ndc.other_code as ndc_code, icd.other_code as icd_code 
  FROM Allergy_terms a
  LEFT OUTER JOIN map ndc ON(ndc.code = a.code AND ndc.other_sys_id = 357)
  LEFT OUTER JOIN map icd ON(icd.code = a.code AND icd.other_sys_id = 2)
)
SELECT * FROM a
GO
SELECT a.*
  FROM Allergy_terms a
MINUS
SELECT a.*
  FROM Allergy_terms a
    INNER JOIN Allergy_term_mapping m ON(m.code = a.code)
    INNER JOIN SPL_DRUGLISTING l ON(l.ndc = m.ndc_code)
MINUS 
SELECT a.*
  FROM Allergy_terms a
    INNER JOIN umls.mrconso c1 ON(lower(c1.str) = lower(a.name) and lat='ENG')
    INNER JOIN umls.MeSH_Chemical u ON(u.cui = c1.cui)
MINUS 
SELECT a.*
  FROM Allergy_terms a
    INNER JOIN umls.mrconso c1 
       ON(lower(c1.str) = regexp_replace(lower(a.name), ' (allergy|serpl qn|meds|inj|tab|tabs|vag gel|impl|sr cap|pca|drugs|susc|tapes|strip|top sol|sodium|potassium|ont|tape|syrup|eqv|othr|syr|syrin|tn|po|1-6|rx|immun|op sol|oral|op|vacc|kit|intolerance|allergy|product|type|drug|dpi|meds|eye|sr|chew|susp|alrgy|order|op|abn|hcl|na|acid|vag cr|aci|pf inj|sl|est|elix|ds|scrub|opo|liq|inhaler|cr|bld qn|unconj serpl bld qn|top susc|neb sol|inhl)[''*]?$','')
          and lat='ENG')
    INNER JOIN umls.MeSH_Chemical u ON(u.cui = c1.cui)
MINUS 
SELECT a.*
  FROM Allergy_terms a
    INNER JOIN umls.mrconso c1 
       ON(lower(c1.str) = regexp_replace(lower(a.name), 's[''*]?$','')
          and lat='ENG')
    INNER JOIN umls.MeSH_Chemical u ON(u.cui = c1.cui)

and that leaves over 461 and very few actual allergies with high instance counts. We may not even need the NDC code mapping, because that only adds 70 more.

Here is the actual mapping now, we added sulfa -> sulfonamides manually.

CREATE TABLE Allergy_term_mesh_map AS
SELECT * FROM Allergy_terms a LEFT OUTER JOIN (
-- SELECT * FROM (
SELECT a.code, a.name, a.count, u.code as MeSH_code, u.name as mesh_name
  FROM Allergy_terms a
    INNER JOIN umls.mrconso c1 ON(lower(c1.str) = lower(a.name) and lat='ENG')
    INNER JOIN umls.MeSH_Chemical u ON(u.cui = c1.cui)
UNION 
SELECT a.code, a.name, a.count, u.code as MeSH_code, u.name as mesh_name
  FROM Allergy_terms a
    INNER JOIN umls.mrconso c1 
       ON(lower(c1.str) = regexp_replace(lower(a.name), ' (-n|allergy|serpl qn|meds|inj|tab|tabs|vag gel|impl|sr cap|pca|drugs|susc|tapes|strip|top sol|sodium|potassium|ont|tape|syrup|eqv|othr|syr|syrin|tn|po|1-6|rx|immun|op sol|oral|op|vacc|kit|intolerance|allergy|product|type|drug|dpi|meds|eye|sr|chew|susp|alrgy|order|op|abn|hcl|na|acid|vag cr|aci|pf inj|sl|est|elix|ds|scrub|opo|liq|inhaler|cr|bld qn|unconj serpl bld qn|top susc|neb sol|inhl)[''*]?$','')
          and lat='ENG')
    INNER JOIN umls.MeSH_Chemical u ON(u.cui = c1.cui)
UNION 
SELECT a.code, a.name, a.count, u.code as MeSH_code, u.name as mesh_name
  FROM Allergy_terms a
    INNER JOIN umls.mrconso c1 
       ON(lower(c1.str) = regexp_replace(lower(a.name), 's[''*]?$','')
          and lat='ENG')
    INNER JOIN umls.MeSH_Chemical u ON(u.cui = c1.cui)
UNION
SELECT a.code, a.name, a.count, 'D013449' as MeSH_code, 'Sulfonamides' as mesh_name
  FROM Allergy_terms a
  WHERE lower(a.name) in('sulfa', 'sulfa allergy')
) USING(code, name, count)

With this:

SELECT count(code), sum(count) FROM (
--SELECT code, count FROM allergy_term_mesh_map
--MINUS
SELECT code, count FROM allergy_term_mesh_map WHERE mesh_code is not null
)

we can match 521 terms for 72796 instances and we miss 490 terms with 152410 instances, of which 143052 are explicitly no allergies, so that makes only 9358 instances missed.

Conclusions from SNOMED

Next Step

For immediate purpose we can use SNOMED allergy concepts of the 420881009 descendents.

SELECT *
  FROM mw.SNOMED_FINDING_CLOSURE c 
   INNER JOIN mw.SNOMED_FINDING d USING(code)
   INNER JOIN mw.SNOMED_FINDING_NAME n USING(code)
  WHERE c.ancestor_or_self_code = '420881009'

which is done in db2xml as snomed-allergies-rim-template.xml.

But this doesn't help us if we want to do allergy checking, because we will eventually have NDF-RT chemical structures only. This, in fact, calls for us to create the NDF-RT chemical structures for our SPL drugs also.

  1. link SNOMED allergy agents to NDF-RT ingredient/chemicals (88.6% done)
  2. annotate SPL drugs with MeSH (was: NDF-RT ingredient/chemical) classes, see below
  3. describe SNOMED allergy observations in the HL7 model referring to the MeSH class as MaterialKind?.
  4. create the Composition which either picks up a defined allergy term (SNOMED) or a substance class.
  5. add empirical data about occurrence of allergies from various sources, including RMRS
  6. remove the detailed SNOMED allergy codes from the problem list codes?

The SNOMED allergies, we can describe thus:

  1. From SNOMED allergy finding we map to SNOMED allergen
  2. From SNOMED allergen we map to MeSH chemical class

Here is the SNOMED Allergy to MeSH Allergen Map:

SELECT --+FIRST_ROWS
       c.distance, d.conceptId, d.fullySpecifiedName, a.conceptId, a.fullySpecifiedName,
       sc.cui, mc.*, mch.*
  FROM UMLS.SCT_ISA_CLOSURE c
   INNER JOIN UMLS.SCT_CONCEPT d ON(d.conceptId = c.conceptId1)
   INNER JOIN UMLS.SCT_RELATIONSHIP ar 
      ON(ar.conceptId1 = d.conceptId AND ar.relationshipType = 246075003/*causative agent*/)
   INNER JOIN UMLS.SCT_CONCEPT a ON(a.conceptId = ar.conceptId2)
   INNER JOIN UMLS.MRCONSO sc ON(sc.code = a.conceptId AND sc.sab = 'SNOMEDCT')
   INNER JOIN unii_mesh_class mc ON(mc.cui = sc.cui OR mc.class_cui = sc.cui)
   INNER JOIN UMLS.MeSH_chemical mch ON(mch.code = mc.class_code)
   WHERE c.conceptId2 = 419199007 -- Allergy to substance
     AND NOT EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE x
                       WHERE x.conceptId2 = 362955004 -- Inactive Concept
                         AND x.conceptId1 = d.conceptId)

That's too big, only for review, here is a smaller version, same essential content:

CREATE TABLE sct_allergy_mesh_map AS
SELECT c.conceptId1 AS allergy_sct_id, ar.conceptId2 AS allergen_sct_id, mc.class_code AS mesh_class_code
  FROM UMLS.SCT_ISA_CLOSURE c
   LEFT OUTER JOIN UMLS.SCT_RELATIONSHIP ar 
      ON(ar.conceptId1 = c.conceptId1 AND ar.relationshipType = 246075003/*causative agent*/)
   LEFT OUTER JOIN UMLS.MRCONSO sc ON(sc.code = ar.conceptId2 AND sc.sab = 'SNOMEDCT')
   LEFT OUTER JOIN unii_mesh_class mc ON(mc.cui = sc.cui OR mc.class_cui = sc.cui)
   WHERE c.conceptId2 = 419199007 -- Allergy to substance
/*
     AND NOT EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE x
                       WHERE x.conceptId2 = 362955004 -- Inactive Concept
                         AND x.conceptId1 = c.conceptId1)
*/

How much does that cover?

SELECT COUNT(DISTINCT allergy_sct_id) FROM sct_allergy_mesh_map
UNION ALL
SELECT COUNT(DISTINCT allergy_sct_id) FROM sct_allergy_mesh_map
WHERE NOT EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE x
                       WHERE x.conceptId2 = 362955004 -- Inactive Concept
                         AND x.conceptId1 = allergy_sct_id)
UNION ALL
SELECT COUNT(DISTINCT allergy_sct_id) FROM sct_allergy_mesh_map WHERE allergen_sct_id IS NOT NULL
UNION ALL
SELECT COUNT(DISTINCT allergy_sct_id) FROM sct_allergy_mesh_map WHERE mesh_class_code IS NOT NULL
UNION ALL
SELECT COUNT(DISTINCT allergen_sct_id) FROM sct_allergy_mesh_map
UNION ALL
SELECT COUNT(DISTINCT mesh_class_code) FROM sct_allergy_mesh_map

That is 1587 allergy concepts (1509 allergens) of which 761 allergy concepts have a MeSH allergen class for 682 unique such classes.

This was using the UNII code allergy classes, we could, however, use the MeSH allergy class mapping we did for UNII and apply it to SNOMED directly, that gives us a better coverage here, and it will tell us clearer how good our selection of MeSH classes are. First the direct map to MeSH chemical:

INSERT INTO sct_allergy_mesh_map
SELECT allergy_sct_id, allergen_sct_id, mc.code
  FROM sct_allergy_mesh_map sa
--  INNER JOIN UMLS.SCT_CONCEPT c ON(c.conceptId = sa.allergen_sct_id)
  INNER JOIN UMLS.MRCONSO sc ON(sc.code = sa.allergen_sct_id AND sc.sab = 'SNOMEDCT')
  INNER JOIN umls.MeSH_Chemical mc ON(mc.cui = sc.cui)
  WHERE sa.MeSH_class_code IS NULL

brings us up to 1011 allergens mapped to 928 unique classes. Now delete the unmapped entries which are now mapped:

DELETE FROM sct_allergy_mesh_map d
  WHERE mesh_class_code IS NULL 
    AND EXISTS (SELECT 1 FROM sct_allergy_mesh_map x
                  WHERE x.allergen_sct_id = d.allergen_sct_id
                    AND x.mesh_class_code IS NOT NULL)

943 down

Next one relationship removed:

INSERT INTO sct_allergy_mesh_map
SELECT allergy_sct_id, allergen_sct_id, mc.code
  FROM sct_allergy_mesh_map sa
  INNER JOIN UMLS.MRCONSO sc ON(sc.code = sa.allergen_sct_id AND sc.sab = 'SNOMEDCT')
  INNER JOIN umls.mrrel r1 ON(r1.cui1 = sc.cui AND r1.sab='MSH' AND (r1.REL IN('PAR', 'RB') OR (r1.REL = 'RO' AND r1.RELA IS NULL))) 
  INNER JOIN umls.MeSH_Chemical mc ON(mc.cui = r1.cui2)
  WHERE sa.MeSH_class_code IS NULL

Takes far too long. Also, not sure we want this, because we should not be abstracting the SNOMED allergy concept.

DROP TABLE UMLS.SCT_ALLERGY_MESH_MAP
GO
CREATE TABLE UMLS.SCT_ALLERGY_MESH_MAP AS
SELECT DISTINCT sa.*, mc.name as MESH_CLASS_DISPLAYNAME
  FROM sct_allergy_mesh_map sa
    INNER JOIN UMLS.MESH_Chemical mc ON(Mesh_class_code = code)
GO
CREATE INDEX UMLS.SCT_ALLERGY_MESH_MAP_SCT ON UMLS.SCT_ALLERGY_MESH_MAP(ALLERGY_SCT_ID)

Anyway, we can proceed with

  1. exporting the SNOMED allergy definitions along with normal problem list
  2. add XML annotation to SPL labels

The enrichment of SPL labels from database content is something new for db2xml.

The VA/KP problem list subset is the one we use too, as the full SNOMED findings are too much.

create table umls.sct_vakp_subset as
select distinct scd_id AS conceptId from smantena.yatbu
go
create unique index umls.sct_vakp_subset_pk on umls.sct_vakp_subset(conceptId)

There are also very few essential allergies here.

Done.

Annotate SPL Drugs

The annotation of SPL drugs with allergens through NDF-RT is a big pain. This is described in UNII to NDF-RT. Instead we use MeSH directly. We now have a UNII to MeSH mapping (unii_mesh_class, created in UNII to MeSH) which gets 76% of the UNII codes mapped just by exact name match through UMLS.

HL7 Allergy RMIM

The HL7 allergy model allows us to specify the allergen in the following structure:

- Concern (Act)
  - subject
    - AllergyObservation
      - Participation "causative agent" (CAGNT)
        - Role (AdministrableMaterial)
          - MaterialKind

http://www.hl7.org/v3ballot/html/domains/uvpc/editable/images/REPC_RM000324UV.png