SNOMED Problem List Concepts and Synonyms from UMLS

Older method using the UMLS as input.

In order to create the SNOMED problem list we want all the codes underneath the SNOMED finding hierarchy. We had previously studied the VA/KP problem list subset and found that it too had all codes from the finding hierarchy but we also found that this subset was missing about 10% of relevant problems just for no other reason than that they were not included in the subset even though they exist in SNOMED.

Instead of using the UMLS as a source, we will now move to using original SNOMED distribution files, which are more straight forward, actually pretty simple. There are 3 tables:

  1. sct_concept (conceptId, fullySpecifiedName, ...)
  2. sct_relationship (conceptId1, relationshipTypeId, conceptId2, ...)
  3. sct_description (conceptId, term)

and that's it. Now the relationship types are also SNOMED CT concepts. So, we can simply say:

SELECT rc.conceptId, rc.FULLYSPECIFIEDNAME, count(1)
   FROM UMLS.SCT_RELATIONSHIP r
   INNER JOIN UMLS.SCT_CONCEPT rc ON(rc.conceptId = r.relationshipType)
GROUP BY rc.conceptId, rc.FULLYSPECIFIEDNAME
ORDER BY count(1) DESC

Which gives us:

CONCEPTIDFULLYSPECIFIEDNAMECOUNT(1)
116680003Is a (attribute)499781
363698007Finding site (attribute)86403
246456000Episodicity (attribute)68225
263502005Clinical course (attribute)68199
246112005Severity (attribute)68095
116676008Associated morphology (attribute)58667
260870009Priority (attribute)53546
260686004Method (attribute)52825
123005000Part of (attribute)47502
168666000SAME AS (attribute)40500
260507000Access (attribute)40289
405813007Procedure site - Direct (attribute)33447
149016008MAY BE A (attribute)28621
363714003Interprets (attribute)25336
246075003Causative agent (attribute)22371
127489000Has active ingredient (attribute)18961
272741003Laterality (attribute)16351
411116001Has dose form (attribute)9126
246093002Component (attribute)9120
246454002Occurrence (attribute)8940
418775008Finding method (attribute)8461
405814001Procedure site - Indirect (attribute)8225
363700003Direct morphology (attribute)7997
363705008Has definitional manifestation (attribute)5737
363704007Procedure site (attribute)5270
424226004Using device (attribute)5177
363713009Has interpretation (attribute)5055
363701004Direct substance (attribute)4674
370124000REPLACED BY (attribute)4433
363699004Direct device (attribute)3774
363703001Has intent (attribute)3636
408732007Subject relationship context (attribute)3415
408731000Temporal context (attribute)3412
260669005Approach (attribute)3217
363702006Has focus (attribute)3106
246090004Associated finding (attribute)2913
419066007Finding informer (attribute)2497
408729009Finding context (attribute)2227
255234002After (attribute)2200
42752001Due to (attribute)2158
116686009Has specimen (attribute)1962
47429007Associated with (attribute)1809
363589002Associated procedure (attribute)1579
424361007Using substance (attribute)1570
118169006Specimen source topography (attribute)1346
408730004Procedure context (attribute)1184
425391005Using access device (attribute)1169
370125004MOVED TO (attribute)1151
246513007Revision status (attribute)1127
118171006Specimen procedure (attribute)508
363709002Indirect morphology (attribute)478
424244007Using energy (attribute)399
370135005Pathological process (attribute)246
370133003Specimen substance (attribute)238
405815000Procedure device (attribute)191
370131001Recipient category (attribute)113
370130000Property (attribute)90
363710007Indirect device (attribute)67
118168003Specimen source morphology (attribute)64
405816004Procedure morphology (attribute)54
410675002Route of administration (attribute)51
370132008Scale type (attribute)49
118170007Specimen source identity (attribute)48
370129005Measurement method (attribute)40
159083000WAS A (attribute)12
370134009Time aspect (attribute)1

ISA Closure

Now let's make the SCT_ISA_CLOSURE. Here we employ a variant of our usual algorithm. The reason is that the SNOMED file seems to have redundant ISA relationships which leads to transitive shortcuts and weird levels in the ISA tree.

DROP TABLE UMLS.SCT_ISA_CLOSURE
GO
CREATE TABLE UMLS.SCT_ISA_CLOSURE AS
  SELECT conceptId1, conceptId2, 1 AS distance FROM UMLS.SCT_RELATIONSHIP
  WHERE relationshipType = 116680003

499781 record(s)

Repeat until the number of rows inserted does not increase any more

CREATE TABLE UMLS.SCT_ISA_CLOSURE2 AS
  SELECT conceptId1, conceptId2, MAX(distance) AS distance 
    FROM (
      SELECT r1.conceptId1, r2.conceptId2, r1.distance + r2.distance AS distance
        FROM UMLS.SCT_ISA_CLOSURE r1
          INNER JOIN UMLS.SCT_ISA_CLOSURE r2 ON(r2.conceptId1 = r1.conceptId2)
    UNION ALL
      SELECT * FROM UMLS.SCT_ISA_CLOSURE
  ) GROUP BY conceptId1, conceptId2
GO
DROP TABLE UMLS.SCT_ISA_CLOSURE
GO
ALTER TABLE UMLS.SCT_ISA_CLOSURE2 RENAME TO SCT_ISA_CLOSURE
  1. 1197540 record(s) affected 3781/ms
  2. 3074165 record(s) affected 13860/ms
  3. 5261624 record(s) affected 60312/ms
  4. 5544600 record(s) affected 114375/ms
  5. 5546792 record(s) affected 121219/ms
  6. 5546792 record(s) affected 121078/ms

I want to also note that this has been much faster than my usual algorithm, even with indexes available, it was going like that:

  1. 697759 record(s) affected
  2. 1876625 record(s) affected 94454/ms
  3. 2187459 record(s) affected 623984/ms
  4. 282976 record(s) affected 467609/ms
  5. 2192 record(s) affected 396000/ms
  6. 0 record(s) affected 356266/ms

Anyway, I am always happy seeing things converge so rapidly!

Finally make the closure reflexive too and add the indexes:

INSERT INTO UMLS.SCT_ISA_CLOSURE
  SELECT conceptId, conceptId, 0 AS distance FROM UMLS.SCT_CONCEPT
GO
CREATE UNIQUE INDEX UMLS.SCT_ISA_CLOSURE_IDX1 ON UMLS.SCT_ISA_CLOSURE(conceptId1, conceptId2)
GO
CREATE UNIQUE INDEX UMLS.SCT_ISA_CLOSURE_IDX2 ON UMLS.SCT_ISA_CLOSURE(conceptId2, conceptId1)

376046 record(s) affected 531/ms, 22500/ms, 20031/ms

Indeed, now the transitive shortcuts are gone. (It used to be 336 just for allergies.)

SELECT COUNT(1)
  FROM UMLS.SCT_ISA_CLOSURE c1
   INNER JOIN UMLS.SCT_ISA_CLOSURE c2 
      ON(c2.conceptId2 = c1.conceptId2 AND c2.distance >= c1.distance)
--    ON(c2.conceptId2 = c1.conceptId2 AND c2.distance = 1 AND c1.distance = 1)
   INNER JOIN UMLS.SCT_ISA_CLOSURE ic 
      ON(ic.conceptId1 = c1.conceptId1 AND ic.conceptId2 = c2.conceptId1 AND ic.distance > 0)

In fact, that query will not run to completion before my temporary segment runs out of space. So we have to believe that it does not happen. Even when probing with some of those lesser alternatives (distance = 1, etc.) But there should be none, because if there were, only the longer path should have been chosen by our algorithm above.

Names

SNOMED has lots of terms and synonyms, but still why not use the UMLS source for that? We create Usable Concept Names.

Appendix

Eye Diseases

An attempt to get all eye diseases:

with eye_disease as (
select distinct conceptid, FULLYSPECIFIEDNAME, snomedid
  from umls.sct_concept d
  inner join umls.sct_isa_closure dg on(dg.CONCEPTID1=d.conceptid)
  inner join umls.SCT_RELATIONSHIP r on(r.CONCEPTID1=d.conceptid)
--  inner join umls.sct_concept tc on(tc.conceptid = r.conceptid2)
  inner join umls.sct_isa_closure sg on(sg.CONCEPTID1=r.conceptid2)
--  inner join umls.sct_concept sgc on(sgc.conceptid = sg.conceptid2)
where -- lower(d.FULLYSPECIFIEDNAME) like '%glaucom%'
    dg.conceptid2 = 404684003 -- finding
    and r.relationshiptype = 363698007 -- finding site
    and sg.conceptid2 in(371398005, 119263007, 302151004, 280951005)
    and not exists (select 1 from umls.sct_isa_closure x where x.conceptid1 = d.conceptid and x.conceptid2 = 362955004)
--  order by d.conceptid --, tc.conceptid, distance
), eye_disease_with_class as (
select dg.conceptid2 as class, conceptid as concept, fullyspecifiedname as name
  from eye_disease d
    inner join umls.sct_relationship dg on(dg.CONCEPTID1=d.conceptid and relationshipType = 116680003)
) select level, concept, lpad(' - ', 3*level)||name 
  from eye_disease_with_class d
    connect by prior concept = class 
    start with not exists (select 1 from eye_disease x where x.conceptid = d.class)

the ordered output is not good yet