SNOMED Problem List Concepts and Synonyms from UMLS
Older method using the UMLS as input.
In order to create the SNOMED problem list we want all the codes underneath the SNOMED finding hierarchy. We had previously studied the VA/KP problem list subset and found that it too had all codes from the finding hierarchy but we also found that this subset was missing about 10% of relevant problems just for no other reason than that they were not included in the subset even though they exist in SNOMED.
Instead of using the UMLS as a source, we will now move to using original SNOMED distribution files, which are more straight forward, actually pretty simple. There are 3 tables:
- sct_concept (conceptId, fullySpecifiedName, ...)
- sct_relationship (conceptId1, relationshipTypeId, conceptId2, ...)
- sct_description (conceptId, term)
and that's it. Now the relationship types are also SNOMED CT concepts. So, we can simply say:
SELECT rc.conceptId, rc.FULLYSPECIFIEDNAME, count(1) FROM UMLS.SCT_RELATIONSHIP r INNER JOIN UMLS.SCT_CONCEPT rc ON(rc.conceptId = r.relationshipType) GROUP BY rc.conceptId, rc.FULLYSPECIFIEDNAME ORDER BY count(1) DESC
Which gives us:
| CONCEPTID | FULLYSPECIFIEDNAME | COUNT(1) |
| 116680003 | Is a (attribute) | 499781 |
| 363698007 | Finding site (attribute) | 86403 |
| 246456000 | Episodicity (attribute) | 68225 |
| 263502005 | Clinical course (attribute) | 68199 |
| 246112005 | Severity (attribute) | 68095 |
| 116676008 | Associated morphology (attribute) | 58667 |
| 260870009 | Priority (attribute) | 53546 |
| 260686004 | Method (attribute) | 52825 |
| 123005000 | Part of (attribute) | 47502 |
| 168666000 | SAME AS (attribute) | 40500 |
| 260507000 | Access (attribute) | 40289 |
| 405813007 | Procedure site - Direct (attribute) | 33447 |
| 149016008 | MAY BE A (attribute) | 28621 |
| 363714003 | Interprets (attribute) | 25336 |
| 246075003 | Causative agent (attribute) | 22371 |
| 127489000 | Has active ingredient (attribute) | 18961 |
| 272741003 | Laterality (attribute) | 16351 |
| 411116001 | Has dose form (attribute) | 9126 |
| 246093002 | Component (attribute) | 9120 |
| 246454002 | Occurrence (attribute) | 8940 |
| 418775008 | Finding method (attribute) | 8461 |
| 405814001 | Procedure site - Indirect (attribute) | 8225 |
| 363700003 | Direct morphology (attribute) | 7997 |
| 363705008 | Has definitional manifestation (attribute) | 5737 |
| 363704007 | Procedure site (attribute) | 5270 |
| 424226004 | Using device (attribute) | 5177 |
| 363713009 | Has interpretation (attribute) | 5055 |
| 363701004 | Direct substance (attribute) | 4674 |
| 370124000 | REPLACED BY (attribute) | 4433 |
| 363699004 | Direct device (attribute) | 3774 |
| 363703001 | Has intent (attribute) | 3636 |
| 408732007 | Subject relationship context (attribute) | 3415 |
| 408731000 | Temporal context (attribute) | 3412 |
| 260669005 | Approach (attribute) | 3217 |
| 363702006 | Has focus (attribute) | 3106 |
| 246090004 | Associated finding (attribute) | 2913 |
| 419066007 | Finding informer (attribute) | 2497 |
| 408729009 | Finding context (attribute) | 2227 |
| 255234002 | After (attribute) | 2200 |
| 42752001 | Due to (attribute) | 2158 |
| 116686009 | Has specimen (attribute) | 1962 |
| 47429007 | Associated with (attribute) | 1809 |
| 363589002 | Associated procedure (attribute) | 1579 |
| 424361007 | Using substance (attribute) | 1570 |
| 118169006 | Specimen source topography (attribute) | 1346 |
| 408730004 | Procedure context (attribute) | 1184 |
| 425391005 | Using access device (attribute) | 1169 |
| 370125004 | MOVED TO (attribute) | 1151 |
| 246513007 | Revision status (attribute) | 1127 |
| 118171006 | Specimen procedure (attribute) | 508 |
| 363709002 | Indirect morphology (attribute) | 478 |
| 424244007 | Using energy (attribute) | 399 |
| 370135005 | Pathological process (attribute) | 246 |
| 370133003 | Specimen substance (attribute) | 238 |
| 405815000 | Procedure device (attribute) | 191 |
| 370131001 | Recipient category (attribute) | 113 |
| 370130000 | Property (attribute) | 90 |
| 363710007 | Indirect device (attribute) | 67 |
| 118168003 | Specimen source morphology (attribute) | 64 |
| 405816004 | Procedure morphology (attribute) | 54 |
| 410675002 | Route of administration (attribute) | 51 |
| 370132008 | Scale type (attribute) | 49 |
| 118170007 | Specimen source identity (attribute) | 48 |
| 370129005 | Measurement method (attribute) | 40 |
| 159083000 | WAS A (attribute) | 12 |
| 370134009 | Time aspect (attribute) | 1 |
ISA Closure
Now let's make the SCT_ISA_CLOSURE. Here we employ a variant of our usual algorithm. The reason is that the SNOMED file seems to have redundant ISA relationships which leads to transitive shortcuts and weird levels in the ISA tree.
DROP TABLE UMLS.SCT_ISA_CLOSURE GO CREATE TABLE UMLS.SCT_ISA_CLOSURE AS SELECT conceptId1, conceptId2, 1 AS distance FROM UMLS.SCT_RELATIONSHIP WHERE relationshipType = 116680003
499781 record(s)
Repeat until the number of rows inserted does not increase any more
CREATE TABLE UMLS.SCT_ISA_CLOSURE2 AS
SELECT conceptId1, conceptId2, MAX(distance) AS distance
FROM (
SELECT r1.conceptId1, r2.conceptId2, r1.distance + r2.distance AS distance
FROM UMLS.SCT_ISA_CLOSURE r1
INNER JOIN UMLS.SCT_ISA_CLOSURE r2 ON(r2.conceptId1 = r1.conceptId2)
UNION ALL
SELECT * FROM UMLS.SCT_ISA_CLOSURE
) GROUP BY conceptId1, conceptId2
GO
DROP TABLE UMLS.SCT_ISA_CLOSURE
GO
ALTER TABLE UMLS.SCT_ISA_CLOSURE2 RENAME TO SCT_ISA_CLOSURE
- 1197540 record(s) affected 3781/ms
- 3074165 record(s) affected 13860/ms
- 5261624 record(s) affected 60312/ms
- 5544600 record(s) affected 114375/ms
- 5546792 record(s) affected 121219/ms
- 5546792 record(s) affected 121078/ms
I want to also note that this has been much faster than my usual algorithm, even with indexes available, it was going like that:
- 697759 record(s) affected
- 1876625 record(s) affected 94454/ms
- 2187459 record(s) affected 623984/ms
- 282976 record(s) affected 467609/ms
- 2192 record(s) affected 396000/ms
- 0 record(s) affected 356266/ms
Anyway, I am always happy seeing things converge so rapidly!
Finally make the closure reflexive too and add the indexes:
INSERT INTO UMLS.SCT_ISA_CLOSURE SELECT conceptId, conceptId, 0 AS distance FROM UMLS.SCT_CONCEPT GO CREATE UNIQUE INDEX UMLS.SCT_ISA_CLOSURE_IDX1 ON UMLS.SCT_ISA_CLOSURE(conceptId1, conceptId2) GO CREATE UNIQUE INDEX UMLS.SCT_ISA_CLOSURE_IDX2 ON UMLS.SCT_ISA_CLOSURE(conceptId2, conceptId1)
376046 record(s) affected 531/ms, 22500/ms, 20031/ms
Indeed, now the transitive shortcuts are gone. (It used to be 336 just for allergies.)
SELECT COUNT(1)
FROM UMLS.SCT_ISA_CLOSURE c1
INNER JOIN UMLS.SCT_ISA_CLOSURE c2
ON(c2.conceptId2 = c1.conceptId2 AND c2.distance >= c1.distance)
-- ON(c2.conceptId2 = c1.conceptId2 AND c2.distance = 1 AND c1.distance = 1)
INNER JOIN UMLS.SCT_ISA_CLOSURE ic
ON(ic.conceptId1 = c1.conceptId1 AND ic.conceptId2 = c2.conceptId1 AND ic.distance > 0)
In fact, that query will not run to completion before my temporary segment runs out of space. So we have to believe that it does not happen. Even when probing with some of those lesser alternatives (distance = 1, etc.) But there should be none, because if there were, only the longer path should have been chosen by our algorithm above.
Names
SNOMED has lots of terms and synonyms, but still why not use the UMLS source for that? We create Usable Concept Names.
Appendix
Eye Diseases
An attempt to get all eye diseases:
with eye_disease as (
select distinct conceptid, FULLYSPECIFIEDNAME, snomedid
from umls.sct_concept d
inner join umls.sct_isa_closure dg on(dg.CONCEPTID1=d.conceptid)
inner join umls.SCT_RELATIONSHIP r on(r.CONCEPTID1=d.conceptid)
-- inner join umls.sct_concept tc on(tc.conceptid = r.conceptid2)
inner join umls.sct_isa_closure sg on(sg.CONCEPTID1=r.conceptid2)
-- inner join umls.sct_concept sgc on(sgc.conceptid = sg.conceptid2)
where -- lower(d.FULLYSPECIFIEDNAME) like '%glaucom%'
dg.conceptid2 = 404684003 -- finding
and r.relationshiptype = 363698007 -- finding site
and sg.conceptid2 in(371398005, 119263007, 302151004, 280951005)
and not exists (select 1 from umls.sct_isa_closure x where x.conceptid1 = d.conceptid and x.conceptid2 = 362955004)
-- order by d.conceptid --, tc.conceptid, distance
), eye_disease_with_class as (
select dg.conceptid2 as class, conceptid as concept, fullyspecifiedname as name
from eye_disease d
inner join umls.sct_relationship dg on(dg.CONCEPTID1=d.conceptid and relationshipType = 116680003)
) select level, concept, lpad(' - ', 3*level)||name
from eye_disease_with_class d
connect by prior concept = class
start with not exists (select 1 from eye_disease x where x.conceptid = d.class)
the ordered output is not good yet
![(please configure the [header_logo] section in trac.ini)](/mw/chrome/site/logo100.png)