Original NDF-RT

NDF-RT comes in a distribution outside of UMLS. They use the XML format of Apelon, Inc.. This is a pretty simple format which the NCI-Thesaurus also uses, and I believe even SNOMED is in this format (if you can get the original sources outside the UMLS.)

Step 1 is get the NDF-RT original distribution:

wget ftp://ftp1.nci.nih.gov/pub/cacore/EVS/NDF-RT/NDF-RT_XML.zip
unzip NDF-RT_XML.zip
tail +3 NDFRT_Public_2007.04.10_TDE.xml >ndfrt.xml

The last item removes the DTD reference with some apelon URL which no longer works. Now we create an Oracle SQL*Loader control file, create the oracle database tables, load the resulting conrol file, and finally create the foreign key constraints.

java -jar saxon8.jar -o ndfrt.ctl ndfrt.xml ndfrt2oracle.xsl
sqlplus user/pass@dola.regenstrief.org <ndfrt.ddl
sqlldr control=ndfrt.ctl userid=user/pass@dola.regenstrief.org
sqlplus user/pass@dola.regenstrief.org <ndfrt.ref

Now we have the database loaded. T he database schema is described here.

Transitive and Reflexive Closure of ISA Relationsips

No other than ISA relationship is transitive here.

DROP TABLE umls.ndfrt_isa_closure
GO
CREATE TABLE umls.ndfrt_isa_closure NOLOGGING AS
SELECT SOURCE_ID, TARGET_ID, 1 AS DISTANCE
  FROM umls.ndfrt_role r
  WHERE role_def_id='D' AND target_id IS NOT NULL
GO
CREATE INDEX umls.ndfrt_isa_closure_source_idx ON umls.ndfrt_isa_closure(source_id, target_id)
GO
CREATE INDEX umls.ndfrt_isa_closure_target_idx ON umls.ndfrt_isa_closure(target_id, source_id)
GO

Execute this until no more is inserted:

INSERT INTO umls.ndfrt_isa_closure NOLOGGING
SELECT DISTINCT a.source_id, b.target_id, a.DISTANCE+b.DISTANCE
  FROM umls.ndfrt_isa_closure a
    INNER JOIN umls.ndfrt_isa_closure b ON(b.source_id = a.target_id)
  WHERE NOT EXISTS (SELECT 1 FROM umls.ndfrt_isa_closure c
                     WHERE c.source_id = a.source_id
                       AND c.target_id = b.target_id)

Now for ISA we also like the reflexive closure so that we can always join to it and get the concept or it's specializations.

INSERT INTO umls.ndfrt_isa_closure NOLOGGING
SELECT DISTINCT id, id, 0
  FROM umls.ndfrt_concept

135665 record(s) affected

Finally, add this view to combine the roles with the transitive closure:

CREATE VIEW umls.ndfrt_role_plus AS
SELECT role_def_id, 1 as distance, source_id, target_id, target_value
  FROM umls.ndfrt_role
UNION ALL
SELECT 'D+' AS role_def_id, distance, source_id, target_id, null
  FROM umls.ndfrt_isa_closure 
  WHERE distance > 1

Exploring

Now we can look at a sample of NDF-RT

SELECT c.id, c.name, r.role_def_id, r.distance, d.name, t.id, COALESCE(t.name, r.target_value), k.id, k.name, n.id, n.name
  FROM umls.NDFRT_CONCEPT SAMPLE(0.1) c
    INNER JOIN umls.NDFRT_ROLE_PLUS r ON(source_id = c.id)
    LEFT OUTER JOIN umls.NDFRT_ROLE_DEF d ON(d.id = r.role_def_id)
    LEFT OUTER JOIN umls.NDFRT_CONCEPT t ON(t.id = r.target_id)
    INNER JOIN umls.NDFRT_concept k ON(k.id = c.kind)
    INNER JOIN umls.NDFRT_concept n ON(n.id = c.namespace)
  ORDER BY c.id, r.role_def_id, r.distance DESC

Good summary of relationships:

SELECT sk.name AS source, x.d_id, x.d_name, tk.name AS target, freq
FROM (
SELECT s.kind AS s_kind, d.id AS d_id, d.name AS d_name, t.kind AS t_kind, count(1) AS freq
  FROM umls.ndfrt_concept s 
    INNER JOIN umls.ndfrt_role r ON(s.id = r.source_id)
    INNER JOIN umls.ndfrt_role_def d ON(d.id = r.role_def_id)
    INNER JOIN umls.ndfrt_concept t ON(t.id = r.target_id)
  GROUP BY s.kind, d.id, d.name, t.kind
) x
  INNER JOIN umls.ndfrt_concept sk ON(sk.id = x.s_kind)
  INNER JOIN umls.ndfrt_concept tk ON(tk.id = x.t_kind)
ORDER BY freq DESC
SOURCEROLE IDROLE NAMETARGETFREQ
NDF_KINDR27CI_ChemClassINGREDIENT_KIND283
NDF_KINDR31CI_MoAMECHANISM_OF_ACTION_KIND81
NDF_KINDR23CI_PEPHYSIOLOGIC_EFFECT_KIND61
NDF_KINDR37CI_withDISEASE_KIND4346
RXNORM_DOSE_FORM_KINDDdefiningConceptRXNORM_DOSE_FORM_KIND187
RECIP_STRUCTURES_KINDDdefiningConceptRECIP_STRUCTURES_KIND135
MECHANISM_OF_ACTION_KINDDdefiningConceptMECHANISM_OF_ACTION_KIND367
INGREDIENT_KINDDdefiningConceptINGREDIENT_KIND11262
PHYSIOLOGIC_EFFECT_KINDDdefiningConceptPHYSIOLOGIC_EFFECT_KIND1826
NDF_KINDDdefiningConceptNDF_KIND137936
PHARMACOKINETICS_KINDDdefiningConceptPHARMACOKINETICS_KIND58
DISEASE_KINDDdefiningConceptDISEASE_KIND6441
NDF_KINDR53effect_may_be_inhibited_byNDF_KIND1
RECIP_STRUCTURES_KINDR69gene_encodes_for_enzymeRECIP_STRUCTURES_KIND8
RECIP_STRUCTURES_KINDR78gene_has_variantRECIP_STRUCTURES_KIND33
NDF_KINDR49has_active_metabolitesINGREDIENT_KIND4
NDF_KINDR80has_Chemical_StructureINGREDIENT_KIND7
NDF_KINDR45has_DoseFormRXNORM_DOSE_FORM_KIND8491
NDF_KINDR25has_IngredientINGREDIENT_KIND4719
NDF_KINDR29has_MoAMECHANISM_OF_ACTION_KIND1951
NDF_KINDR21has_PEPHYSIOLOGIC_EFFECT_KIND3079
NDF_KINDR33has_PKPHARMACOKINETICS_KIND51
INGREDIENT_KINDA262779Heading_Mapped_ToINGREDIENT_KIND672
NDF_KINDR41inducesDISEASE_KIND132
NDF_KINDR43may_diagnoseDISEASE_KIND135
NDF_KINDR39may_preventDISEASE_KIND721
NDF_KINDR35may_treatDISEASE_KIND5590
NDF_KINDR59metabolized_by_recip_structureRECIP_STRUCTURES_KIND64
NDF_KINDR51site_of_metabolismPHARMACOKINETICS_KIND25
NDF_KINDA262778VA_Product_ComponentNDF_KIND30757
NDF_KINDA495VA_Secondary_Drug_ClassNDF_KIND12

Here are the kinds:

SELECT k.id, k.name, count(1) as freq
  FROM NDFRT_CONCEPT c
    INNER JOIN NDFRT_CONCEPT k ON(k.id = c.kind)
  GROUP BY k.id, k.name
  ORDER BY freq DESC
IDNAMEFREQ
9NDF_KIND120489
11INGREDIENT_KIND8174
17DISEASE_KIND4258
7PHYSIOLOGIC_EFFECT_KIND1722
13MECHANISM_OF_ACTION_KIND339
19RXNORM_DOSE_FORM_KIND175
3RECIP_STRUCTURES_KIND116
15PHARMACOKINETICS_KIND59

Here are the attributes (values rather than concept):

SELECT sk.name AS source, kf.kfreq, x.d_id, x.d_name, freq
FROM (
SELECT s.kind AS s_kind, d.id as d_id, d.name AS d_name, count(1) AS freq
  FROM umls.ndfrt_concept s 
    INNER JOIN umls.ndfrt_role r ON(s.id = r.source_id)
    INNER JOIN umls.ndfrt_role_def d ON(d.id = r.role_def_id)
  WHERE r.target_id IS NULL
  GROUP BY s.kind, d.id, d.name
) x
  INNER JOIN umls.ndfrt_concept sk ON(sk.id = x.s_kind)
  INNER JOIN (
     SELECT kind, count(1) AS kfreq
       FROM umls.ndfrt_concept
       GROUP BY kind
  ) kf ON(kf.kind = s_kind)
ORDER BY source ASC, freq DESC
SOURCEFREQROLE IDROLE NAMEFREQ
NDF_KIND120506P264Display_Name120506
MECHANISM_OF_ACTION_KIND359P264Display_Name359
RXNORM_DOSE_FORM_KIND174P264Display_Name174
PHARMACOKINETICS_KIND59P264Display_Name59
INGREDIENT_KIND8421P264Display_Name8421
PHYSIOLOGIC_EFFECT_KIND1763P264Display_Name1763
DISEASE_KIND4258P264Display_Name4258
RECIP_STRUCTURES_KIND116P264Display_Name116
MECHANISM_OF_ACTION_KIND359P289FDA_SPL_Name34
INGREDIENT_KIND8421P289FDA_SPL_Name13
NDF_KIND120506P289FDA_SPL_Name16
PHYSIOLOGIC_EFFECT_KIND1763P289FDA_SPL_Name4
MECHANISM_OF_ACTION_KIND359P139Included_MeSH_UIs106
NDF_KIND120506P99Level120452
MECHANISM_OF_ACTION_KIND359P143MeSH_Definition65
INGREDIENT_KIND8421P143MeSH_Definition7957
DISEASE_KIND4258P143MeSH_Definition4046
INGREDIENT_KIND8421P287MeSH_Name8414
RECIP_STRUCTURES_KIND116P287MeSH_Name21
MECHANISM_OF_ACTION_KIND359P287MeSH_Name65
DISEASE_KIND4258P287MeSH_Name4222
NDF_KIND120506P129MeSH_Pharmacologic_Action1272
MECHANISM_OF_ACTION_KIND359P135MeSH_UI66
DISEASE_KIND4258P135MeSH_UI4222
INGREDIENT_KIND8421P135MeSH_UI8414
RECIP_STRUCTURES_KIND116P135MeSH_UI21
RECIP_STRUCTURES_KIND116P130NCI_Preferred_Term47
DISEASE_KIND4258P262794NUI4258
INGREDIENT_KIND8421P262794NUI8421
PHYSIOLOGIC_EFFECT_KIND1763P262794NUI1763
MECHANISM_OF_ACTION_KIND359P262794NUI359
NDF_KIND120506P262794NUI120506
RXNORM_DOSE_FORM_KIND174P262794NUI174
RECIP_STRUCTURES_KIND116P262794NUI116
PHARMACOKINETICS_KIND59P262794NUI59
PHYSIOLOGIC_EFFECT_KIND1763P145Review_Status102
NDF_KIND120506P145Review_Status4679
DISEASE_KIND4258P145Review_Status46
MECHANISM_OF_ACTION_KIND359P145Review_Status56
RXNORM_DOSE_FORM_KIND174P817RxNorm_Code140
NDF_KIND120506P817RxNorm_Code12988
NDF_KIND120506P818RxNorm_CUI12988
RXNORM_DOSE_FORM_KIND174P818RxNorm_CUI140
RXNORM_DOSE_FORM_KIND174P819RxNorm_Name140
NDF_KIND120506P819RxNorm_Name12988
NDF_KIND120506P262704RxNorm_Preferred_Translation_Term12903
DISEASE_KIND4258P262795SNOMED_Mapping_ID4577
DISEASE_KIND4258P262796SPL_Problem_List_Mapping_ID2478
NDF_KIND120506P262790Status120616
DISEASE_KIND4258P137Synonym34665
PHYSIOLOGIC_EFFECT_KIND1763P137Synonym31
MECHANISM_OF_ACTION_KIND359P137Synonym53
INGREDIENT_KIND8421P137Synonym35386
RECIP_STRUCTURES_KIND116P137Synonym232
DISEASE_KIND4258P141UMLS_CUI4232
RECIP_STRUCTURES_KIND116P141UMLS_CUI9
MECHANISM_OF_ACTION_KIND359P141UMLS_CUI66
INGREDIENT_KIND8421P141UMLS_CUI8415
NDF_KIND120506P2627UMLS_Preferred_Translation_Term4297
RECIP_STRUCTURES_KIND116P55UMLS_Semantic_Type36
INGREDIENT_KIND8421P262793UNII434
NDF_KIND120506P131VA_Class_Code572
NDF_KIND120506P262718VA_Class_Description33
NDF_KIND120506P262716VA_Class_Type572
NDF_KIND120506P63VA_CS_Federal_Schedule3941
NDF_KIND120506P279VA_Manufacturer_Code95781
NDF_KIND120506P69VA_National_Formulary_Name17467
NDF_KIND120506P121VA_NDC95781
NDF_KIND120506P262792VA_Ndf_Type17424
NDF_KIND120506P281VA_Package_Code95781
NDF_KIND120506P115VA_Print_Name13385
NDF_KIND120506P280VA_Product_Code95781
NDF_KIND120506P71VA_RxOtc_Indicator8706
NDF_KIND120506P101VA_Strength8509
NDF_KIND120506P119VA_Trade_Name96042
NDF_KIND120506P103VA_Units8511
NDF_KIND120506P97VA_UPN432
NDF_KIND120506P262778VUID24671

Next: UsingNdfRt

Attachments