Original NDF-RT

NDF-RT comes in a distribution outside of UMLS. They use the XML format of Apelon, Inc.. This is a pretty simple format which the NCI-Thesaurus also uses, and I believe even SNOMED is in this format (if you can get the original sources outside the UMLS.)

Step 1 is get the NDF-RT original distribution:

wget ftp://ftp1.nci.nih.gov/pub/cacore/EVS/NDF-RT/NDF-RT_XML.zip
unzip NDF-RT_XML.zip
tail +3 NDFRT_Public_2007.04.10_TDE.xml >ndfrt.xml

The last item removes the DTD reference with some apelon URL which no longer works. Now we create an Oracle SQL*Loader control file, create the oracle database tables, load the resulting conrol file, and finally create the foreign key constraints.

java -jar saxon8.jar -o ndfrt.ctl ndfrt.xml ndfrt2oracle.xsl
sqlplus user/pass@dola.regenstrief.org <ndfrt.ddl
sqlldr control=ndfrt.ctl userid=user/pass@dola.regenstrief.org
sqlplus user/pass@dola.regenstrief.org <ndfrt.ref

Now we have the database loaded. T he database schema is described here.

Transitive and Reflexive Closure of ISA Relationsips

No other than ISA relationship is transitive here.

Note that just like in MeSH, the "heading mapped to" (A262779) relationship needs to be included in the isa relationship table,

because that's what it is.

DROP TABLE umls.ndfrt_isa_closure
GO
CREATE TABLE umls.ndfrt_isa_closure NOLOGGING AS
SELECT SOURCE_ID, TARGET_ID, 1 AS DISTANCE
  FROM umls.ndfrt_role r
  WHERE role_def_id IN ('D', 'A262779') AND target_id IS NOT NULL
GO
CREATE INDEX umls.ndfrt_isa_closure_source_idx ON umls.ndfrt_isa_closure(source_id, target_id)
GO
CREATE INDEX umls.ndfrt_isa_closure_target_idx ON umls.ndfrt_isa_closure(target_id, source_id)
GO

Execute this until no more is inserted:

INSERT INTO umls.ndfrt_isa_closure NOLOGGING
SELECT source_id, target_id, MIN(distance) FROM (
SELECT a.source_id, b.target_id, a.DISTANCE+b.DISTANCE as distance
  FROM umls.ndfrt_isa_closure a
    INNER JOIN umls.ndfrt_isa_closure b ON(b.source_id = a.target_id)
UNION ALL
SELECT * FROM umls.ndfrt_isa_closure
) GROUP BY source_id, target_id
MINUS
SELECT * FROM umls.ndfrt_isa_closure

Now for ISA we also like the reflexive closure so that we can always join to it and get the concept or it's specializations.

INSERT INTO umls.ndfrt_isa_closure NOLOGGING
SELECT DISTINCT id, id, 0
  FROM umls.ndfrt_concept

136062 record(s) affected

Finally, add this view to combine the roles with the transitive closure:

CREATE VIEW umls.ndfrt_role_plus AS
SELECT role_def_id, 1 as distance, source_id, target_id, target_value
  FROM umls.ndfrt_role
UNION ALL
SELECT 'D+' AS role_def_id, distance, source_id, target_id, null
  FROM umls.ndfrt_isa_closure 
  WHERE distance > 1

Exploring

Now we can look at a sample of NDF-RT

SELECT c.id, c.name, r.role_def_id, r.distance, d.name, t.id, COALESCE(t.name, r.target_value), k.id, k.name, n.id, n.name
  FROM umls.NDFRT_CONCEPT SAMPLE(0.1) c
    INNER JOIN umls.NDFRT_ROLE_PLUS r ON(source_id = c.id)
    LEFT OUTER JOIN umls.NDFRT_ROLE_DEF d ON(d.id = r.role_def_id)
    LEFT OUTER JOIN umls.NDFRT_CONCEPT t ON(t.id = r.target_id)
    INNER JOIN umls.NDFRT_concept k ON(k.id = c.kind)
    INNER JOIN umls.NDFRT_concept n ON(n.id = c.namespace)
  ORDER BY c.id, r.role_def_id, r.distance DESC

Good summary of relationships:

SELECT sk.name AS source, x.d_id, x.d_name, tk.name AS target, freq
FROM (
SELECT s.kind AS s_kind, d.id AS d_id, d.name AS d_name, t.kind AS t_kind, count(1) AS freq
  FROM umls.ndfrt_concept s 
    INNER JOIN umls.ndfrt_role r ON(s.id = r.source_id)
    INNER JOIN umls.ndfrt_role_def d ON(d.id = r.role_def_id)
    INNER JOIN umls.ndfrt_concept t ON(t.id = r.target_id)
  GROUP BY s.kind, d.id, d.name, t.kind
) x
  INNER JOIN umls.ndfrt_concept sk ON(sk.id = x.s_kind)
  INNER JOIN umls.ndfrt_concept tk ON(tk.id = x.t_kind)
ORDER BY freq DESC
SOURCEROLE IDROLE NAMETARGETFREQ
NDF_KINDR27CI_ChemClassINGREDIENT_KIND283
NDF_KINDR31CI_MoAMECHANISM_OF_ACTION_KIND81
NDF_KINDR23CI_PEPHYSIOLOGIC_EFFECT_KIND61
NDF_KINDR37CI_withDISEASE_KIND4346
RXNORM_DOSE_FORM_KINDDdefiningConceptRXNORM_DOSE_FORM_KIND187
RECIP_STRUCTURES_KINDDdefiningConceptRECIP_STRUCTURES_KIND135
MECHANISM_OF_ACTION_KINDDdefiningConceptMECHANISM_OF_ACTION_KIND367
INGREDIENT_KINDDdefiningConceptINGREDIENT_KIND11262
PHYSIOLOGIC_EFFECT_KINDDdefiningConceptPHYSIOLOGIC_EFFECT_KIND1826
NDF_KINDDdefiningConceptNDF_KIND137936
PHARMACOKINETICS_KINDDdefiningConceptPHARMACOKINETICS_KIND58
DISEASE_KINDDdefiningConceptDISEASE_KIND6441
NDF_KINDR53effect_may_be_inhibited_byNDF_KIND1
RECIP_STRUCTURES_KINDR69gene_encodes_for_enzymeRECIP_STRUCTURES_KIND8
RECIP_STRUCTURES_KINDR78gene_has_variantRECIP_STRUCTURES_KIND33
NDF_KINDR49has_active_metabolitesINGREDIENT_KIND4
NDF_KINDR80has_Chemical_StructureINGREDIENT_KIND7
NDF_KINDR45has_DoseFormRXNORM_DOSE_FORM_KIND8491
NDF_KINDR25has_IngredientINGREDIENT_KIND4719
NDF_KINDR29has_MoAMECHANISM_OF_ACTION_KIND1951
NDF_KINDR21has_PEPHYSIOLOGIC_EFFECT_KIND3079
NDF_KINDR33has_PKPHARMACOKINETICS_KIND51
INGREDIENT_KINDA262779Heading_Mapped_ToINGREDIENT_KIND672
NDF_KINDR41inducesDISEASE_KIND132
NDF_KINDR43may_diagnoseDISEASE_KIND135
NDF_KINDR39may_preventDISEASE_KIND721
NDF_KINDR35may_treatDISEASE_KIND5590
NDF_KINDR59metabolized_by_recip_structureRECIP_STRUCTURES_KIND64
NDF_KINDR51site_of_metabolismPHARMACOKINETICS_KIND25
NDF_KINDA262778VA_Product_ComponentNDF_KIND30757
NDF_KINDA495VA_Secondary_Drug_ClassNDF_KIND12

Here are the kinds:

SELECT k.id, k.name, count(1) as freq
  FROM NDFRT_CONCEPT c
    INNER JOIN NDFRT_CONCEPT k ON(k.id = c.kind)
  GROUP BY k.id, k.name
  ORDER BY freq DESC
IDNAMEFREQ
9NDF_KIND120489
11INGREDIENT_KIND8174
17DISEASE_KIND4258
7PHYSIOLOGIC_EFFECT_KIND1722
13MECHANISM_OF_ACTION_KIND339
19RXNORM_DOSE_FORM_KIND175
3RECIP_STRUCTURES_KIND116
15PHARMACOKINETICS_KIND59

Here are the attributes (values rather than concept):

SELECT sk.name AS source, kf.kfreq, x.d_id, x.d_name, freq
FROM (
SELECT s.kind AS s_kind, d.id as d_id, d.name AS d_name, count(1) AS freq
  FROM umls.ndfrt_concept s 
    INNER JOIN umls.ndfrt_role r ON(s.id = r.source_id)
    INNER JOIN umls.ndfrt_role_def d ON(d.id = r.role_def_id)
  WHERE r.target_id IS NULL
  GROUP BY s.kind, d.id, d.name
) x
  INNER JOIN umls.ndfrt_concept sk ON(sk.id = x.s_kind)
  INNER JOIN (
     SELECT kind, count(1) AS kfreq
       FROM umls.ndfrt_concept
       GROUP BY kind
  ) kf ON(kf.kind = s_kind)
ORDER BY source ASC, freq DESC
SOURCEFREQROLE IDROLE NAMEFREQ
NDF_KIND120506P264Display_Name120506
MECHANISM_OF_ACTION_KIND359P264Display_Name359
RXNORM_DOSE_FORM_KIND174P264Display_Name174
PHARMACOKINETICS_KIND59P264Display_Name59
INGREDIENT_KIND8421P264Display_Name8421
PHYSIOLOGIC_EFFECT_KIND1763P264Display_Name1763
DISEASE_KIND4258P264Display_Name4258
RECIP_STRUCTURES_KIND116P264Display_Name116
MECHANISM_OF_ACTION_KIND359P289FDA_SPL_Name34
INGREDIENT_KIND8421P289FDA_SPL_Name13
NDF_KIND120506P289FDA_SPL_Name16
PHYSIOLOGIC_EFFECT_KIND1763P289FDA_SPL_Name4
MECHANISM_OF_ACTION_KIND359P139Included_MeSH_UIs106
NDF_KIND120506P99Level120452
MECHANISM_OF_ACTION_KIND359P143MeSH_Definition65
INGREDIENT_KIND8421P143MeSH_Definition7957
DISEASE_KIND4258P143MeSH_Definition4046
INGREDIENT_KIND8421P287MeSH_Name8414
RECIP_STRUCTURES_KIND116P287MeSH_Name21
MECHANISM_OF_ACTION_KIND359P287MeSH_Name65
DISEASE_KIND4258P287MeSH_Name4222
NDF_KIND120506P129MeSH_Pharmacologic_Action1272
MECHANISM_OF_ACTION_KIND359P135MeSH_UI66
DISEASE_KIND4258P135MeSH_UI4222
INGREDIENT_KIND8421P135MeSH_UI8414
RECIP_STRUCTURES_KIND116P135MeSH_UI21
RECIP_STRUCTURES_KIND116P130NCI_Preferred_Term47
DISEASE_KIND4258P262794NUI4258
INGREDIENT_KIND8421P262794NUI8421
PHYSIOLOGIC_EFFECT_KIND1763P262794NUI1763
MECHANISM_OF_ACTION_KIND359P262794NUI359
NDF_KIND120506P262794NUI120506
RXNORM_DOSE_FORM_KIND174P262794NUI174
RECIP_STRUCTURES_KIND116P262794NUI116
PHARMACOKINETICS_KIND59P262794NUI59
PHYSIOLOGIC_EFFECT_KIND1763P145Review_Status102
NDF_KIND120506P145Review_Status4679
DISEASE_KIND4258P145Review_Status46
MECHANISM_OF_ACTION_KIND359P145Review_Status56
RXNORM_DOSE_FORM_KIND174P817RxNorm_Code140
NDF_KIND120506P817RxNorm_Code12988
NDF_KIND120506P818RxNorm_CUI12988
RXNORM_DOSE_FORM_KIND174P818RxNorm_CUI140
RXNORM_DOSE_FORM_KIND174P819RxNorm_Name140
NDF_KIND120506P819RxNorm_Name12988
NDF_KIND120506P262704RxNorm_Preferred_Translation_Term12903
DISEASE_KIND4258P262795SNOMED_Mapping_ID4577
DISEASE_KIND4258P262796SPL_Problem_List_Mapping_ID2478
NDF_KIND120506P262790Status120616
DISEASE_KIND4258P137Synonym34665
PHYSIOLOGIC_EFFECT_KIND1763P137Synonym31
MECHANISM_OF_ACTION_KIND359P137Synonym53
INGREDIENT_KIND8421P137Synonym35386
RECIP_STRUCTURES_KIND116P137Synonym232
DISEASE_KIND4258P141UMLS_CUI4232
RECIP_STRUCTURES_KIND116P141UMLS_CUI9
MECHANISM_OF_ACTION_KIND359P141UMLS_CUI66
INGREDIENT_KIND8421P141UMLS_CUI8415
NDF_KIND120506P2627UMLS_Preferred_Translation_Term4297
RECIP_STRUCTURES_KIND116P55UMLS_Semantic_Type36
INGREDIENT_KIND8421P262793UNII434
NDF_KIND120506P131VA_Class_Code572
NDF_KIND120506P262718VA_Class_Description33
NDF_KIND120506P262716VA_Class_Type572
NDF_KIND120506P63VA_CS_Federal_Schedule3941
NDF_KIND120506P279VA_Manufacturer_Code95781
NDF_KIND120506P69VA_National_Formulary_Name17467
NDF_KIND120506P121VA_NDC95781
NDF_KIND120506P262792VA_Ndf_Type17424
NDF_KIND120506P281VA_Package_Code95781
NDF_KIND120506P115VA_Print_Name13385
NDF_KIND120506P280VA_Product_Code95781
NDF_KIND120506P71VA_RxOtc_Indicator8706
NDF_KIND120506P101VA_Strength8509
NDF_KIND120506P119VA_Trade_Name96042
NDF_KIND120506P103VA_Units8511
NDF_KIND120506P97VA_UPN432
NDF_KIND120506P262778VUID24671

Next: UsingNdfRt

Attachments