Allergies and Intolerances
Allergies are essentially Problems. So they will be very similar to problems and problem lists?. The only thing specific about allergies is that the Observation that details the problem will be a specific allergy observation.
Being an Observation we need to know
- what is the Observation.code?
- what are the Observation.values?
- are there any other associated data structures?
Observation.code
Comes from the HL7 ActCode? vocabulary. They have defined specific terms and codes. The OID for the ActCode? is (according to the etc/domain-oid-map.xml 2.16.840.1.113883.5.4). Now here are the kinds of intolerances and allergies that we can pick:
| OINT | intolerance | Hypersensitivity resulting in an adverse reaction upon exposure to an agent. |
| - ALG | allergy | Hypersensitivity to an agent caused by an immunologic response to an initial exposure. |
| - - DALG | Drug Allergy | An allergy to a pharmaceutical product. |
| - - EALG | Environmental Allergy | An allergy to a substance other than a drug or a food. E.g. Latex, pollen, etc. |
| - - FALG | Food Allergy | An allergy to a substance generally consumed for nutritional purposes. |
| - DINT | Drug Intolerance | Hypersensitivity resulting in an adverse reaction upon exposure to a drug. |
| - - DALG | Drug Allergy | An allergy to a pharmaceutical product. |
| - - DNAINT | Drug Non-Allergy Intolerance | Hypersensitivity to an agent caused by a mechanism other than an immunologic response to an initial exposure |
| - EINT | Environmental Intolerance | Hypersensitivity resulting in an adverse reaction upon exposure to environmental conditions. |
| - - EALG | Environmental Allergy | An allergy to a substance other than a drug or a food. E.g. Latex, pollen, etc. |
| - - ENAINT | Environmental Non-Allergy Intolerance | Hypersensitivity to an agent caused by a mechanism other than an immunologic response to an initial exposure |
| - FINT | Food Intolerance | Hypersensitivity resulting in an adverse reaction upon exposure to food. |
| - - FALG | Food Allergy | An allergy to a substance generally consumed for nutritional purposes. |
| - - FNAINT | Food Non-Allergy Intolerance | Hypersensitivity to an agent caused by a mechanism other than an immunologic response to an initial exposure |
| - NAINT | Non-Allergy Intolerance | Hypersensitivity to an agent caused by a mechanism other than an immunologic response to an initial exposure |
| - - DNAINT | Drug Non-Allergy Intolerance | Hypersensitivity to an agent caused by a mechanism other than an immunologic response to an initial exposure |
| - - ENAINT | Environmental Non-Allergy Intolerance | Hypersensitivity to an agent caused by a mechanism other than an immunologic response to an initial exposure |
| - - FNAINT | Food Non-Allergy Intolerance | Hypersensitivity to an agent caused by a mechanism other than an immunologic response to an initial exposure |
This list seems to follow the following schema:
| Allergy | Non-Allergy Intolerance | |
| Hypersensitivity to an agent caused by an immunologic response to an initial exposure. | Hypersensitivity to an agent caused by a mechanism other than an immunologic response to an initial exposure | |
| Drug | DALG | DNAINT |
| Food | FALG | FNAINT |
| Environment | EALG | ENAINT |
and then there are two ways of rolling it up: ALG vs. NAINT with D-, E-, or F- underneath or D-, E-, F- with ALG vs. NAINT under them.
For our purpose, we will do it much simpler. We will allow
- allergy or intolerance
- allergy (immunologic)
- non-allergy intolerance
Most of the time people will just use the "allergy or intolerance" concept. Whether the allergen is a drug, food or environmental agent should be evident from something else.
Observation.value
The draft HL7 v3 specifications right now say that the allergen concept sits as a code in the Observation.value attribute. That is fine for hay fever, but it is not so fine for drugs. However, now we also have a "causative agent" (CAGNT) participation. It allows us to connect to a material
Second problem is that if there are specific allergy tests, they will specify the allergen in the Observation . code (e.g., "hordeum vulgare IgE.").
There are also SNOMED problems we would use for allergies, in that case we would just say "Medical Problem" = "hayfever" or some such.
SNOMED and Allergies
Ancestors
SELECT DISTINCT e.conceptId, e.fullySpecifiedName, c.distance, d.conceptId, d.fullySpecifiedName
FROM UMLS.MRCONSO n
INNER JOIN UMLS.MRCONSO s ON(s.CUI = n.CUI AND s.SAB='SNOMEDCT')
INNER JOIN UMLS.SCT_ISA_CLOSURE c ON(c.conceptId1 = s.code)
INNER JOIN UMLS.SCT_CONCEPT e ON(e.conceptId = s.code)
INNER JOIN UMLS.SCT_CONCEPT d ON(d.conceptId = c.conceptId2)
WHERE LOWER(n.str) = 'allergy' AND n.LAT='ENG'
AND NOT EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE x
WHERE x.conceptId2 = 362955004 -- Inactive Concept
AND x.conceptId1 = e.conceptId)
AND EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE f
WHERE f.conceptId2 = 404684003 -- Clinical finding
AND f.conceptId1 = e.conceptId)
AND d.conceptId <> 138875005 -- SNOMED CT toplevel
ORDER BY e.conceptId, c.distance
| CONCEPTID | FULLYSPECIFIEDNAME | DISTANCE | CONCEPTID | FULLYSPECIFIEDNAME |
| 106190000 | Allergic state (disorder) | 0 | 106190000 | Allergic state (disorder) |
| 106190000 | Allergic state (disorder) | 1 | 420134006 | Propensity to adverse reactions (disorder) |
| 106190000 | Allergic state (disorder) | 2 | 64572001 | Disease (disorder) |
| 106190000 | Allergic state (disorder) | 3 | 404684003 | Clinical finding (finding) |
| 269284003 | Allergy, unspecified (disorder) | 0 | 269284003 | Allergy, unspecified (disorder) |
| 269284003 | Allergy, unspecified (disorder) | 1 | 106190000 | Allergic state (disorder) |
| 269284003 | Allergy, unspecified (disorder) | 2 | 420134006 | Propensity to adverse reactions (disorder) |
| 269284003 | Allergy, unspecified (disorder) | 3 | 64572001 | Disease (disorder) |
| 269284003 | Allergy, unspecified (disorder) | 4 | 404684003 | Clinical finding (finding) |
| 421961002 | Hypersensitivity reaction (disorder) | 0 | 421961002 | Hypersensitivity reaction (disorder) |
| 421961002 | Hypersensitivity reaction (disorder) | 1 | 281647001 | Adverse reaction (disorder) |
| 421961002 | Hypersensitivity reaction (disorder) | 2 | 116223007 | Complication (disorder) |
| 421961002 | Hypersensitivity reaction (disorder) | 3 | 362977000 | Sequela (disorder) |
| 421961002 | Hypersensitivity reaction (disorder) | 4 | 64572001 | Disease (disorder) |
| 421961002 | Hypersensitivity reaction (disorder) | 5 | 404684003 | Clinical finding (finding) |
Looks like 106190000 is what we really care for.
Descendants
SELECT DISTINCT c.distance, d.conceptId, d.fullySpecifiedName
FROM UMLS.SCT_ISA_CLOSURE c
INNER JOIN UMLS.SCT_CONCEPT d ON(d.conceptId = c.conceptId1)
WHERE c.conceptId2 = 106190000
AND NOT EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE x
WHERE x.conceptId2 = 362955004 -- Inactive Concept
AND x.conceptId1 = d.conceptId)
ORDER BY c.distance
| DISTANCE | CONCEPTID | FULLYSPECIFIEDNAME |
| 0 | 106190000 | Allergic state (disorder) |
| 1 | 8641003 | Jones-Mote type hypersensitivity (disorder) |
| 1 | 21626009 | Cutaneous hypersensitivity (disorder) |
| 1 | 115665000 | Atopy (disorder) |
| 1 | 269284003 | Allergy, unspecified (disorder) |
| 1 | 419474003 | Allergy to mold (disorder) |
| 1 | 420881009 | Allergic disorder by allergen type (disorder) |
| 1 | 426232007 | Environmental allergy (disorder) |
This first level makes sense, although it uses "Allergy" where we use "Intolerance" in HL7 (because "Atopy" is the better word for Non-Allergic Intolerance). "Allergy to mold" is obviously a mistaken oddball. The category "by allergen type" becomes interesting.
Allergic disorder by allergen type (420881009)
SELECT DISTINCT c.distance, d.conceptId, d.fullySpecifiedName
FROM UMLS.SCT_ISA_CLOSURE c
INNER JOIN UMLS.SCT_CONCEPT d ON(d.conceptId = c.conceptId1)
WHERE c.conceptId2 = 420881009
AND NOT EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE x
WHERE x.conceptId2 = 362955004 -- Inactive Concept
AND x.conceptId1 = d.conceptId)
ORDER BY c.distance
1598 items under 420881009, these go 9 levels deep:
| 0 | 420881009 | Allergic disorder by allergen type (disorder) | 138875005 | SNOMED CT Concept (SNOMED RT+CTV3) |
| 1 | 7114006 | Hypersensitivity to foreign antigen (disorder) | 138875005 | SNOMED CT Concept (SNOMED RT+CTV3) |
| 1 | 37545009 | Hypersensitivity to endogenous antigen (disorder) | 138875005 | SNOMED CT Concept (SNOMED RT+CTV3) |
| 1 | 419199007 | Allergy to substance (disorder) | 105590001 | Substance (substance) |
| 1 | 402594000 | Allergy to plant (disorder) | 31006001 | Kingdom Plantae (organism) |
| 1 | 294918007 | Enteral and supplement feeds allergy (disorder) | 346415002 | Enteral+supplement feeds (product) |
| 1 | 294919004 | Oral rehydration salts allergy (disorder) | 346599006 | Oral rehydration salts (product) |
| 1 | 294917002 | Multiple electrolyte infusion allergy (disorder) | 346354002 | Multiple electrolyte infusion (product) |
| 1 | 232350006 | House dust mite allergy (disorder) | 260147004 | House dust mite (organism) |
| 1 | 405649006 | Tape allergy (disorder) | 84756000 | Adhesive tape, device (physical object) |
| 1 | 418968001 | Allergy to gauze (disorder) | 350785008 | Gauzes (physical object) |
There are strange concepts here, that seem to belong far lower in the tree:
- Hypersensitivity to endogenous antigen (disorder)
- Hypersensitivity to foreign antigen (disorder)
- Allergy to substance (disorder)
- Enteral and supplement feeds allergy (disorder)
- Oral rehydration salts allergy (disorder)
- Multiple electrolyte infusion allergy (disorder)
- Oral rehydration salts allergy (disorder)
- Allergy to gauze (disorder)
- Allergy to tape (disorder)
- Enteral and supplement feeds allergy (disorder)
- Allergy to plant (disorder)
- Allergy to animal (does not even exist)
- House dust mite allergy (disorder)
- Allergy to substance (disorder)
Strange. Well, we did notice some of these misplaced items in the previous level too, e.g. mold. On the other hand, allergy to substance (419199007) is what we now really care about:
Allergy to substance (419199007)
SELECT DISTINCT c.distance, d.conceptId, d.fullySpecifiedName
FROM UMLS.SCT_ISA_CLOSURE c
INNER JOIN UMLS.SCT_CONCEPT d ON(d.conceptId = c.conceptId1)
WHERE c.conceptId2 = 419199007
AND NOT EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE x
WHERE x.conceptId2 = 362955004 -- Inactive Concept
AND x.conceptId1 = d.conceptId)
ORDER BY c.distance
There are strange things in here, like "Allergy to anti-infective agent (disorder)" (91929009). Under this, of course, we find allergies to specific anti-infective agents. I think they might have created too many of these allergy concepts on all sorts of generalizations of substances which nobody really needs. Certainly an allergy to anti-infective agent is rather useless. We should not even admit this as it will cause in-actionable warnings.
SNOMED Allergies to NDF-RT
The idea is that all of the SNOMED allergy to substance concepts should indeed have such a substance (or even more than one) connected to them as the "causative agent". Lo and behold this is the same concept name as used by the HL7 participation for the same purpose.
SELECT DISTINCT c.distance, d.conceptId, d.fullySpecifiedName, a.conceptId, a.fullySpecifiedName
FROM UMLS.SCT_ISA_CLOSURE c
INNER JOIN UMLS.SCT_CONCEPT d ON(d.conceptId = c.conceptId1)
INNER JOIN UMLS.SCT_RELATIONSHIP ar
ON(ar.conceptId1 = d.conceptId AND ar.relationshipType = 246075003/*causative agent*/)
INNER JOIN UMLS.SCT_CONCEPT a ON(a.conceptId = ar.conceptId2)
WHERE c.conceptId2 = 419199007 -- Allergy to substance
AND NOT EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE x
WHERE x.conceptId2 = 362955004 -- Inactive Concept
AND x.conceptId1 = d.conceptId)
ORDER BY c.distance
Now we can find all those allergies whose causative substance is a drug. First we have to find out what a "substance is a drug" actually means. Do we mean some chemical? Well, that would be a substance. We mean "not a natural product"? But then the chicken egg gets quickly excluded which we still need because of the vaccines that may contain traces of the chicken egg used to grow the virus material.
SELECT DISTINCT c.distance, d.conceptId, d.fullySpecifiedName, a.conceptId, a.fullySpecifiedName, aic.distance, ac.conceptId, ac.fullySpecifiedName
FROM UMLS.SCT_ISA_CLOSURE c
INNER JOIN UMLS.SCT_CONCEPT d ON(d.conceptId = c.conceptId1)
INNER JOIN UMLS.SCT_RELATIONSHIP ar ON(ar.conceptId1 = d.conceptId AND ar.relationshipType = 246075003)
INNER JOIN UMLS.SCT_CONCEPT a ON(a.conceptId = ar.conceptId2)
INNER JOIN UMLS.SCT_ISA_CLOSURE aic ON(aic.conceptId1 = a.conceptId)
INNER JOIN UMLS.SCT_CONCEPT ac ON(ac.conceptId = aic.conceptId2)
WHERE c.conceptId2 = 419199007 -- Allergy to substance
AND NOT EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE x
WHERE x.conceptId2 = 362955004 -- Inactive Concept
AND x.conceptId1 = d.conceptId)
ORDER BY c.distance, d.conceptId, a.conceptId, aic.distance
Perhaps we can say, allergies whose causative agent exists in any drug. But that question cannot be answered with the material we have, because before the FDA makes people register inactive ingredients, no code system will be complete in that regard.
So, when we now connect to NDF-RT classes, we must be careful to interpret the results correctly. Those we cannot map, are not necessarily true misses, but they are also not necessarilt to be excluded.
Instead we should use NDF-RT classes, specifically chemical structures. NDF-RT classes for chemical structure are the generalizations among NDF-RT's Ingredient/Chemical kind of concepts. On this we can make a mapping from SNOMED CT to NDF-RT. The general SNOMED CT to NDF-RT map through the UMLS is:
WITH ndf_cui_map AS (
SELECT nr.source_id AS ndf_id, nr.target_value AS cui
FROM UMLS.NDFRT_ROLE nr
WHERE nr.role_def_id = 'P141'
), snomed_cui_map AS (
SELECT sc.code AS sct_id, cui
FROM UMLS.MRCONSO sc
WHERE sc.SAB = 'SNOMEDCT'
), snomed_ndf_map AS (
SELECT sct_id, ndf_id
FROM ndf_cui_map INNER JOIN snomed_cui_map USING(cui)
)
Now let's see the cardinalities and misses of this mapping for our purpose of chemical/ingredient.
WITH snomed_ndf_map AS (
...
), ndf_substance AS (
SELECT id AS ndf_id
FROM UMLS.NDFRT_CONCEPT WHERE kind = '11'
) SELECT count(DISTINCT ndf_id), count(DISTINCT sct_id), count(DISTINCT sct_id)/count(DISTINCT ndf_id)
FROM ndf_substance
LEFT OUTER JOIN snomed_ndf_map USING(ndf_id)
- NDF: 8421
- SNOMED: 7462 (88.6%)
Not too bad. Let's go the inverse:
WITH snomed_ndf_map AS (
...
), snomed_substance AS (
SELECT conceptId1 AS sct_id
FROM UMLS.SCT_ISA_CLOSURE WHERE conceptId2='105590001'
) SELECT count(DISTINCT sct_id), count(DISTINCT ndf_id), count(DISTINCT ndf_id)/count(DISTINCT sct_id)
FROM snomed_substance
LEFT OUTER JOIN snomed_ndf_map USING(sct_id)
- SNOMED: 23368
- NDF: 4321 (18.4%)
Bad, but it's unfair to go in with unrestricted SNOMED:
WITH snomed_ndf_map AS (
...
), snomed_allergen AS (
SELECT sct_id FROM snomed_substance
WHERE EXISTS (SELECT 1 FROM UMLS.SCT_RELATIONSHIP ar
INNER JOIN UMLS.SCT_ISA_CLOSURE a
ON(a.conceptId1 = ar.conceptId1)
WHERE ar.relationshipType = 246075003/*causative agent*/
AND ar.conceptId2 = sct_id
AND a.conceptId2 = 419199007/*Allergy to substance*/)
) SELECT count(DISTINCT sct_id), count(DISTINCT ndf_id), count(DISTINCT ndf_id)/count(DISTINCT sct_id)
FROM snomed_allergen
LEFT OUTER JOIN snomed_ndf_map USING(sct_id)
- SNOMED: 1507
- NDF: 933 (61.9%)
Now inverse again and fully constrained:
SELECT count(DISTINCT ndf_id), count(DISTINCT b.sct_id), count(DISTINCT sct_id)/count(DISTINCT ndf_id)
FROM ndf_substance
LEFT OUTER JOIN snomed_ndf_map a USING(ndf_id)
LEFT OUTER JOIN snomed_allergen b ON(b.sct_id = a.sct_id)
- NDF: 8421
- SCT: 915 10.8%
What a loss by forcing allergen! It means that there are many potential allergies that have no code in SNOMED. One more reason to make an allergy structure rather than simple allergy code.
There may still be some items here which cannot be properly mapped. But it's good enough for a start.
What is Actually Required?
To see what people really enter for allergies, I am scanning the RMRS Allergy Hx observations (deidentified, no human subjects). I do:
create table gschadow.allergy_terms as
select vc.code AS code, vc.name AS name, VALUE_MODIFIER_TEXT AS freeText, count(1) AS count
from rmrs.clinical_variable
inner join rmrs.concept vc ON(vc.sys_id = value_if_type_is_coded_sys_id AND vc.code = value_if_type_is_coded_code)
where SERVICE_SYS_ID = 1
and service_code = '7717'
group by vc.code, vc.name, VALUE_MODIFIER_TEXT
- 224375 instances of Allergy Hx
- 1040 distinct values (including the write-ins)
- only 57 different write-ins, very small number of instances (<100)
- looks very doable to map
Top-30 values are here:
| CODE | NAME | WRITE IN | FREQ |
| 16404 | no known allergies | 119360 | |
| 93 | PENICILLINS | 15296 | |
| 4552 | none | 13292 | |
| 13316 | no known drug allergies | 10400 | |
| 144 | Codeine | 8467 | |
| 3557 | PCN allergy | 6597 | |
| 72 | Aspirin | 3520 | |
| 555 | Sulfamethoxazole | 3008 | |
| 6020 | sulfa | 2923 | |
| 476 | Penicillin V | 2099 | |
| 4455 | drug allergy other | 2051 | |
| 6126 | morphine | 2041 | |
| 568 | SULFONAMIDES | 1989 | |
| 2637 | Amoxicillin | 1724 | |
| 6588 | Cefaclor | 1622 | |
| 97 | Cephalexin | 1621 | |
| 1777 | Ibuprofen | 1418 | |
| 145 | Meperidine | 1367 | |
| 44 | Sodium SerPl? Qn | 769 | |
| 346 | Erythromycin | 669 | |
| 83 | Tetracycline | 598 | |
| 3706 | sulfa allergy | 556 | |
| 560 | Darvon-N | 550 | |
| 11147 | Ciprofloxacin | 549 | |
| 4872 | codeine allergy | 525 | |
| 119 | Prochlorperazine | 514 | |
| 4915 | no response | 482 | |
| 78 | Acetaminophen | 435 | |
| 12983 | Acetam/Hydrocodn | 432 | |
| 94 | Ampicillin | 421 | |
| 5917 | Penicillin | 391 |
We can map this via NDC or ICD-9. 32 of these allergies (6690 instances) were using LOINC codes for plasma levels of the chemical in question.
| WHAT | COUNT CODE | COUNT |
| no match | 345 | 168542 |
| NDC | 509 | 40713 |
| ICD-9 only | 74 | 11181 |
| LN only | 27 | 3893 |
Most (143052) of the 168542 instances of no match is for no allergies:
- no known allergies (119360)
- none (13292) and
- no known drug allergies (10400)
So we are not overly concerned about the 25490 occurrences, of which only 82 distinct terms have 10 or more instance.
Here is the query block to get to the above numbers:
WITH map AS ( SELECT source_code as code, target_sys_id as other_sys_id, target_code AS other_code FROM rmrs.concept_mapping WHERE source_sys_id = 1 UNION SELECT target_code as code, source_sys_id as other_sys_id, source_code AS other_code FROM rmrs.concept_mapping WHERE target_sys_id = 1 ), a AS ( SELECT * FROM Allergy_terms a LEFT OUTER JOIN ( SELECT code, name, count, other_sys_id, other_sys_name, count(1) AS mcount FROM ( SELECT a.code, a.name, a.count, other_sys_id, s.name AS other_sys_name, other_code, c.name AS other_name FROM Allergy_terms a INNER JOIN map m ON(m.code = a.code) INNER JOIN rmrs.code_system s ON(s.sys_id = m.other_sys_id) INNER JOIN rmrs.concept c ON(c.sys_id = m.other_sys_id and c.code = m.other_code) WHERE other_sys_id IN (357, 2, 41) ) u GROUP BY code, name, count, other_sys_id, other_sys_name ) USING(code, name, count) -- WHERE count > 1 ORDER BY count DESC, name ASC, other_sys_name ASC ) -- SELECT * FROM a SELECT 'no match', count(DISTINCT CODE), SUM(COUNT) FROM a WHERE other_sys_id IS NULL UNION ALL SELECT 'NDC', count(DISTINCT CODE), SUM(COUNT) FROM a WHERE other_sys_id = 357 UNION ALL SELECT 'ICD-9 only', count(DISTINCT CODE), SUM(COUNT) FROM ( SELECT CODE, COUNT FROM a WHERE other_sys_id = 2 MINUS SELECT CODE, COUNT FROM a WHERE other_sys_id = 357 ) UNION ALL SELECT 'LN only', count(DISTINCT CODE), SUM(COUNT) FROM ( SELECT CODE, COUNT FROM a WHERE other_sys_id = 41 MINUS SELECT CODE, COUNT FROM a WHERE other_sys_id = 2 MINUS SELECT CODE, COUNT FROM a WHERE other_sys_id = 357 )
Here is how we can map it to the maximum extent:
DROP TABLE Allergy_term_mapping
GO
CREATE TABLE Allergy_term_mapping AS
WITH map AS (
SELECT source_code as code, target_sys_id as other_sys_id, target_code AS other_code
FROM rmrs.concept_mapping WHERE source_sys_id = 1
UNION
SELECT target_code as code, source_sys_id as other_sys_id, source_code AS other_code
FROM rmrs.concept_mapping WHERE target_sys_id = 1
), a AS (
SELECT a.code, a.name, a.freetext, a.count, ndc.other_code as ndc_code, icd.other_code as icd_code
FROM Allergy_terms a
LEFT OUTER JOIN map ndc ON(ndc.code = a.code AND ndc.other_sys_id = 357)
LEFT OUTER JOIN map icd ON(icd.code = a.code AND icd.other_sys_id = 2)
)
SELECT * FROM a
GO
SELECT a.*
FROM Allergy_terms a
MINUS
SELECT a.*
FROM Allergy_terms a
INNER JOIN Allergy_term_mapping m ON(m.code = a.code)
INNER JOIN SPL_DRUGLISTING l ON(l.ndc = m.ndc_code)
MINUS
SELECT a.*
FROM Allergy_terms a
INNER JOIN umls.mrconso c1 ON(lower(c1.str) = lower(a.name) and lat='ENG')
INNER JOIN umls.MeSH_Chemical u ON(u.cui = c1.cui)
MINUS
SELECT a.*
FROM Allergy_terms a
INNER JOIN umls.mrconso c1
ON(lower(c1.str) = regexp_replace(lower(a.name), ' (allergy|serpl qn|meds|inj|tab|tabs|vag gel|impl|sr cap|pca|drugs|susc|tapes|strip|top sol|sodium|potassium|ont|tape|syrup|eqv|othr|syr|syrin|tn|po|1-6|rx|immun|op sol|oral|op|vacc|kit|intolerance|allergy|product|type|drug|dpi|meds|eye|sr|chew|susp|alrgy|order|op|abn|hcl|na|acid|vag cr|aci|pf inj|sl|est|elix|ds|scrub|opo|liq|inhaler|cr|bld qn|unconj serpl bld qn|top susc|neb sol|inhl)[''*]?$','')
and lat='ENG')
INNER JOIN umls.MeSH_Chemical u ON(u.cui = c1.cui)
MINUS
SELECT a.*
FROM Allergy_terms a
INNER JOIN umls.mrconso c1
ON(lower(c1.str) = regexp_replace(lower(a.name), 's[''*]?$','')
and lat='ENG')
INNER JOIN umls.MeSH_Chemical u ON(u.cui = c1.cui)
and that leaves over 461 and very few actual allergies with high instance counts. We may not even need the NDC code mapping, because that only adds 70 more.
Here is the actual mapping now, we added sulfa -> sulfonamides manually.
CREATE TABLE Allergy_term_mesh_map AS
SELECT * FROM Allergy_terms a LEFT OUTER JOIN (
-- SELECT * FROM (
SELECT a.code, a.name, a.count, u.code as MeSH_code, u.name as mesh_name
FROM Allergy_terms a
INNER JOIN umls.mrconso c1 ON(lower(c1.str) = lower(a.name) and lat='ENG')
INNER JOIN umls.MeSH_Chemical u ON(u.cui = c1.cui)
UNION
SELECT a.code, a.name, a.count, u.code as MeSH_code, u.name as mesh_name
FROM Allergy_terms a
INNER JOIN umls.mrconso c1
ON(lower(c1.str) = regexp_replace(lower(a.name), ' (-n|allergy|serpl qn|meds|inj|tab|tabs|vag gel|impl|sr cap|pca|drugs|susc|tapes|strip|top sol|sodium|potassium|ont|tape|syrup|eqv|othr|syr|syrin|tn|po|1-6|rx|immun|op sol|oral|op|vacc|kit|intolerance|allergy|product|type|drug|dpi|meds|eye|sr|chew|susp|alrgy|order|op|abn|hcl|na|acid|vag cr|aci|pf inj|sl|est|elix|ds|scrub|opo|liq|inhaler|cr|bld qn|unconj serpl bld qn|top susc|neb sol|inhl)[''*]?$','')
and lat='ENG')
INNER JOIN umls.MeSH_Chemical u ON(u.cui = c1.cui)
UNION
SELECT a.code, a.name, a.count, u.code as MeSH_code, u.name as mesh_name
FROM Allergy_terms a
INNER JOIN umls.mrconso c1
ON(lower(c1.str) = regexp_replace(lower(a.name), 's[''*]?$','')
and lat='ENG')
INNER JOIN umls.MeSH_Chemical u ON(u.cui = c1.cui)
UNION
SELECT a.code, a.name, a.count, 'D013449' as MeSH_code, 'Sulfonamides' as mesh_name
FROM Allergy_terms a
WHERE lower(a.name) in('sulfa', 'sulfa allergy')
) USING(code, name, count)
With this:
SELECT count(code), sum(count) FROM ( --SELECT code, count FROM allergy_term_mesh_map --MINUS SELECT code, count FROM allergy_term_mesh_map WHERE mesh_code is not null )
we can match 521 terms for 72796 instances and we miss 490 terms with 152410 instances, of which 143052 are explicitly no allergies, so that makes only 9358 instances missed.
Conclusions from SNOMED
Next Step
For immediate purpose we can use SNOMED allergy concepts of the 420881009 descendents.
SELECT * FROM mw.SNOMED_FINDING_CLOSURE c INNER JOIN mw.SNOMED_FINDING d USING(code) INNER JOIN mw.SNOMED_FINDING_NAME n USING(code) WHERE c.ancestor_or_self_code = '420881009'
which is done in db2xml as snomed-allergies-rim-template.xml.
But this doesn't help us if we want to do allergy checking, because we will eventually have NDF-RT chemical structures only. This, in fact, calls for us to create the NDF-RT chemical structures for our SPL drugs also.
- link SNOMED allergy agents to NDF-RT ingredient/chemicals (88.6% done)
- annotate SPL drugs with MeSH (was: NDF-RT ingredient/chemical) classes, see below
- describe SNOMED allergy observations in the HL7 model referring to the MeSH class as MaterialKind?.
- create the Composition which either picks up a defined allergy term (SNOMED) or a substance class.
- add empirical data about occurrence of allergies from various sources, including RMRS
- remove the detailed SNOMED allergy codes from the problem list codes?
The SNOMED allergies, we can describe thus:
- From SNOMED allergy finding we map to SNOMED allergen
- From SNOMED allergen we map to MeSH chemical class
Here is the SNOMED Allergy to MeSH Allergen Map:
SELECT --+FIRST_ROWS
c.distance, d.conceptId, d.fullySpecifiedName, a.conceptId, a.fullySpecifiedName,
sc.cui, mc.*, mch.*
FROM UMLS.SCT_ISA_CLOSURE c
INNER JOIN UMLS.SCT_CONCEPT d ON(d.conceptId = c.conceptId1)
INNER JOIN UMLS.SCT_RELATIONSHIP ar
ON(ar.conceptId1 = d.conceptId AND ar.relationshipType = 246075003/*causative agent*/)
INNER JOIN UMLS.SCT_CONCEPT a ON(a.conceptId = ar.conceptId2)
INNER JOIN UMLS.MRCONSO sc ON(sc.code = a.conceptId AND sc.sab = 'SNOMEDCT')
INNER JOIN unii_mesh_class mc ON(mc.cui = sc.cui OR mc.class_cui = sc.cui)
INNER JOIN UMLS.MeSH_chemical mch ON(mch.code = mc.class_code)
WHERE c.conceptId2 = 419199007 -- Allergy to substance
AND NOT EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE x
WHERE x.conceptId2 = 362955004 -- Inactive Concept
AND x.conceptId1 = d.conceptId)
That's too big, only for review, here is a smaller version, same essential content:
CREATE TABLE sct_allergy_mesh_map AS
SELECT c.conceptId1 AS allergy_sct_id, ar.conceptId2 AS allergen_sct_id, mc.class_code AS mesh_class_code
FROM UMLS.SCT_ISA_CLOSURE c
LEFT OUTER JOIN UMLS.SCT_RELATIONSHIP ar
ON(ar.conceptId1 = c.conceptId1 AND ar.relationshipType = 246075003/*causative agent*/)
LEFT OUTER JOIN UMLS.MRCONSO sc ON(sc.code = ar.conceptId2 AND sc.sab = 'SNOMEDCT')
LEFT OUTER JOIN unii_mesh_class mc ON(mc.cui = sc.cui OR mc.class_cui = sc.cui)
WHERE c.conceptId2 = 419199007 -- Allergy to substance
/*
AND NOT EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE x
WHERE x.conceptId2 = 362955004 -- Inactive Concept
AND x.conceptId1 = c.conceptId1)
*/
How much does that cover?
SELECT COUNT(DISTINCT allergy_sct_id) FROM sct_allergy_mesh_map
UNION ALL
SELECT COUNT(DISTINCT allergy_sct_id) FROM sct_allergy_mesh_map
WHERE NOT EXISTS (SELECT 1 FROM UMLS.SCT_ISA_CLOSURE x
WHERE x.conceptId2 = 362955004 -- Inactive Concept
AND x.conceptId1 = allergy_sct_id)
UNION ALL
SELECT COUNT(DISTINCT allergy_sct_id) FROM sct_allergy_mesh_map WHERE allergen_sct_id IS NOT NULL
UNION ALL
SELECT COUNT(DISTINCT allergy_sct_id) FROM sct_allergy_mesh_map WHERE mesh_class_code IS NOT NULL
UNION ALL
SELECT COUNT(DISTINCT allergen_sct_id) FROM sct_allergy_mesh_map
UNION ALL
SELECT COUNT(DISTINCT mesh_class_code) FROM sct_allergy_mesh_map
That is 1587 allergy concepts (1509 allergens) of which 761 allergy concepts have a MeSH allergen class for 682 unique such classes.
This was using the UNII code allergy classes, we could, however, use the MeSH allergy class mapping we did for UNII and apply it to SNOMED directly, that gives us a better coverage here, and it will tell us clearer how good our selection of MeSH classes are. First the direct map to MeSH chemical:
INSERT INTO sct_allergy_mesh_map SELECT allergy_sct_id, allergen_sct_id, mc.code FROM sct_allergy_mesh_map sa -- INNER JOIN UMLS.SCT_CONCEPT c ON(c.conceptId = sa.allergen_sct_id) INNER JOIN UMLS.MRCONSO sc ON(sc.code = sa.allergen_sct_id AND sc.sab = 'SNOMEDCT') INNER JOIN umls.MeSH_Chemical mc ON(mc.cui = sc.cui) WHERE sa.MeSH_class_code IS NULL
brings us up to 1011 allergens mapped to 928 unique classes. Now delete the unmapped entries which are now mapped:
DELETE FROM sct_allergy_mesh_map d
WHERE mesh_class_code IS NULL
AND EXISTS (SELECT 1 FROM sct_allergy_mesh_map x
WHERE x.allergen_sct_id = d.allergen_sct_id
AND x.mesh_class_code IS NOT NULL)
943 down
Next one relationship removed:
INSERT INTO sct_allergy_mesh_map
SELECT allergy_sct_id, allergen_sct_id, mc.code
FROM sct_allergy_mesh_map sa
INNER JOIN UMLS.MRCONSO sc ON(sc.code = sa.allergen_sct_id AND sc.sab = 'SNOMEDCT')
INNER JOIN umls.mrrel r1 ON(r1.cui1 = sc.cui AND r1.sab='MSH' AND (r1.REL IN('PAR', 'RB') OR (r1.REL = 'RO' AND r1.RELA IS NULL)))
INNER JOIN umls.MeSH_Chemical mc ON(mc.cui = r1.cui2)
WHERE sa.MeSH_class_code IS NULL
Takes far too long. Also, not sure we want this, because we should not be abstracting the SNOMED allergy concept.
DROP TABLE UMLS.SCT_ALLERGY_MESH_MAP
GO
CREATE TABLE UMLS.SCT_ALLERGY_MESH_MAP AS
SELECT DISTINCT sa.*, mc.name as MESH_CLASS_DISPLAYNAME
FROM sct_allergy_mesh_map sa
INNER JOIN UMLS.MESH_Chemical mc ON(Mesh_class_code = code)
GO
CREATE INDEX UMLS.SCT_ALLERGY_MESH_MAP_SCT ON UMLS.SCT_ALLERGY_MESH_MAP(ALLERGY_SCT_ID)
Anyway, we can proceed with
- exporting the SNOMED allergy definitions along with normal problem list
- add XML annotation to SPL labels
The enrichment of SPL labels from database content is something new for db2xml.
The VA/KP problem list subset is the one we use too, as the full SNOMED findings are too much.
create table umls.sct_vakp_subset as select distinct scd_id AS conceptId from smantena.yatbu go create unique index umls.sct_vakp_subset_pk on umls.sct_vakp_subset(conceptId)
There are also very few essential allergies here.
Done.
Annotate SPL Drugs
The annotation of SPL drugs with allergens through NDF-RT is a big pain. This is described in UNII to NDF-RT. Instead we use MeSH directly. We now have a UNII to MeSH mapping (unii_mesh_class, created in UNII to MeSH) which gets 76% of the UNII codes mapped just by exact name match through UMLS.
HL7 Allergy RMIM
The HL7 allergy model allows us to specify the allergen in the following structure:
- Concern (Act)
- subject
- AllergyObservation
- Participation "causative agent" (CAGNT)
- Role (AdministrableMaterial)
- MaterialKind
http://www.hl7.org/v3ballot/html/domains/uvpc/editable/images/REPC_RM000324UV.png
![(please configure the [header_logo] section in trac.ini)](/mw/chrome/site/logo100.png)