Original NDF-RT
NDF-RT comes in a distribution outside of UMLS. They use the XML format of Apelon, Inc.. This is a pretty simple format which the NCI-Thesaurus also uses, and I believe even SNOMED is in this format (if you can get the original sources outside the UMLS.)
Step 1 is get the NDF-RT original distribution:
wget ftp://ftp1.nci.nih.gov/pub/cacore/EVS/NDF-RT/NDF-RT_XML.zip unzip NDF-RT_XML.zip tail +3 NDFRT_Public_2007.04.10_TDE.xml >ndfrt.xml
The last item removes the DTD reference with some apelon URL which no longer works. Now we create an Oracle SQL*Loader control file, create the oracle database tables, load the resulting conrol file, and finally create the foreign key constraints.
java -jar saxon8.jar -o ndfrt.ctl ndfrt.xml ndfrt2oracle.xsl sqlplus user/pass@dola.regenstrief.org <ndfrt.ddl sqlldr control=ndfrt.ctl userid=user/pass@dola.regenstrief.org sqlplus user/pass@dola.regenstrief.org <ndfrt.ref
Now we have the database loaded. T he database schema is described here.
Transitive and Reflexive Closure of ISA Relationsips
No other than ISA relationship is transitive here.
Note that just like in MeSH, the "heading mapped to" (A262779) relationship needs to be included in the isa relationship table,
because that's what it is.
DROP TABLE umls.ndfrt_isa_closure
GO
CREATE TABLE umls.ndfrt_isa_closure NOLOGGING AS
SELECT SOURCE_ID, TARGET_ID, 1 AS DISTANCE
FROM umls.ndfrt_role r
WHERE role_def_id IN ('D', 'A262779') AND target_id IS NOT NULL
GO
CREATE INDEX umls.ndfrt_isa_closure_source_idx ON umls.ndfrt_isa_closure(source_id, target_id)
GO
CREATE INDEX umls.ndfrt_isa_closure_target_idx ON umls.ndfrt_isa_closure(target_id, source_id)
GO
Execute this until no more is inserted:
INSERT INTO umls.ndfrt_isa_closure NOLOGGING
SELECT source_id, target_id, MIN(distance) FROM (
SELECT a.source_id, b.target_id, a.DISTANCE+b.DISTANCE as distance
FROM umls.ndfrt_isa_closure a
INNER JOIN umls.ndfrt_isa_closure b ON(b.source_id = a.target_id)
UNION ALL
SELECT * FROM umls.ndfrt_isa_closure
) GROUP BY source_id, target_id
MINUS
SELECT * FROM umls.ndfrt_isa_closure
Now for ISA we also like the reflexive closure so that we can always join to it and get the concept or it's specializations.
INSERT INTO umls.ndfrt_isa_closure NOLOGGING SELECT DISTINCT id, id, 0 FROM umls.ndfrt_concept
136062 record(s) affected
Finally, add this view to combine the roles with the transitive closure:
CREATE VIEW umls.ndfrt_role_plus AS SELECT role_def_id, 1 as distance, source_id, target_id, target_value FROM umls.ndfrt_role UNION ALL SELECT 'D+' AS role_def_id, distance, source_id, target_id, null FROM umls.ndfrt_isa_closure WHERE distance > 1
Exploring
Now we can look at a sample of NDF-RT
SELECT c.id, c.name, r.role_def_id, r.distance, d.name, t.id, COALESCE(t.name, r.target_value), k.id, k.name, n.id, n.name
FROM umls.NDFRT_CONCEPT SAMPLE(0.1) c
INNER JOIN umls.NDFRT_ROLE_PLUS r ON(source_id = c.id)
LEFT OUTER JOIN umls.NDFRT_ROLE_DEF d ON(d.id = r.role_def_id)
LEFT OUTER JOIN umls.NDFRT_CONCEPT t ON(t.id = r.target_id)
INNER JOIN umls.NDFRT_concept k ON(k.id = c.kind)
INNER JOIN umls.NDFRT_concept n ON(n.id = c.namespace)
ORDER BY c.id, r.role_def_id, r.distance DESC
Good summary of relationships:
SELECT sk.name AS source, x.d_id, x.d_name, tk.name AS target, freq
FROM (
SELECT s.kind AS s_kind, d.id AS d_id, d.name AS d_name, t.kind AS t_kind, count(1) AS freq
FROM umls.ndfrt_concept s
INNER JOIN umls.ndfrt_role r ON(s.id = r.source_id)
INNER JOIN umls.ndfrt_role_def d ON(d.id = r.role_def_id)
INNER JOIN umls.ndfrt_concept t ON(t.id = r.target_id)
GROUP BY s.kind, d.id, d.name, t.kind
) x
INNER JOIN umls.ndfrt_concept sk ON(sk.id = x.s_kind)
INNER JOIN umls.ndfrt_concept tk ON(tk.id = x.t_kind)
ORDER BY freq DESC
| SOURCE | ROLE ID | ROLE NAME | TARGET | FREQ |
| NDF_KIND | R27 | CI_ChemClass | INGREDIENT_KIND | 283 |
| NDF_KIND | R31 | CI_MoA | MECHANISM_OF_ACTION_KIND | 81 |
| NDF_KIND | R23 | CI_PE | PHYSIOLOGIC_EFFECT_KIND | 61 |
| NDF_KIND | R37 | CI_with | DISEASE_KIND | 4346 |
| RXNORM_DOSE_FORM_KIND | D | definingConcept | RXNORM_DOSE_FORM_KIND | 187 |
| RECIP_STRUCTURES_KIND | D | definingConcept | RECIP_STRUCTURES_KIND | 135 |
| MECHANISM_OF_ACTION_KIND | D | definingConcept | MECHANISM_OF_ACTION_KIND | 367 |
| INGREDIENT_KIND | D | definingConcept | INGREDIENT_KIND | 11262 |
| PHYSIOLOGIC_EFFECT_KIND | D | definingConcept | PHYSIOLOGIC_EFFECT_KIND | 1826 |
| NDF_KIND | D | definingConcept | NDF_KIND | 137936 |
| PHARMACOKINETICS_KIND | D | definingConcept | PHARMACOKINETICS_KIND | 58 |
| DISEASE_KIND | D | definingConcept | DISEASE_KIND | 6441 |
| NDF_KIND | R53 | effect_may_be_inhibited_by | NDF_KIND | 1 |
| RECIP_STRUCTURES_KIND | R69 | gene_encodes_for_enzyme | RECIP_STRUCTURES_KIND | 8 |
| RECIP_STRUCTURES_KIND | R78 | gene_has_variant | RECIP_STRUCTURES_KIND | 33 |
| NDF_KIND | R49 | has_active_metabolites | INGREDIENT_KIND | 4 |
| NDF_KIND | R80 | has_Chemical_Structure | INGREDIENT_KIND | 7 |
| NDF_KIND | R45 | has_DoseForm | RXNORM_DOSE_FORM_KIND | 8491 |
| NDF_KIND | R25 | has_Ingredient | INGREDIENT_KIND | 4719 |
| NDF_KIND | R29 | has_MoA | MECHANISM_OF_ACTION_KIND | 1951 |
| NDF_KIND | R21 | has_PE | PHYSIOLOGIC_EFFECT_KIND | 3079 |
| NDF_KIND | R33 | has_PK | PHARMACOKINETICS_KIND | 51 |
| INGREDIENT_KIND | A262779 | Heading_Mapped_To | INGREDIENT_KIND | 672 |
| NDF_KIND | R41 | induces | DISEASE_KIND | 132 |
| NDF_KIND | R43 | may_diagnose | DISEASE_KIND | 135 |
| NDF_KIND | R39 | may_prevent | DISEASE_KIND | 721 |
| NDF_KIND | R35 | may_treat | DISEASE_KIND | 5590 |
| NDF_KIND | R59 | metabolized_by_recip_structure | RECIP_STRUCTURES_KIND | 64 |
| NDF_KIND | R51 | site_of_metabolism | PHARMACOKINETICS_KIND | 25 |
| NDF_KIND | A262778 | VA_Product_Component | NDF_KIND | 30757 |
| NDF_KIND | A495 | VA_Secondary_Drug_Class | NDF_KIND | 12 |
Here are the kinds:
SELECT k.id, k.name, count(1) as freq
FROM NDFRT_CONCEPT c
INNER JOIN NDFRT_CONCEPT k ON(k.id = c.kind)
GROUP BY k.id, k.name
ORDER BY freq DESC
| ID | NAME | FREQ |
| 9 | NDF_KIND | 120489 |
| 11 | INGREDIENT_KIND | 8174 |
| 17 | DISEASE_KIND | 4258 |
| 7 | PHYSIOLOGIC_EFFECT_KIND | 1722 |
| 13 | MECHANISM_OF_ACTION_KIND | 339 |
| 19 | RXNORM_DOSE_FORM_KIND | 175 |
| 3 | RECIP_STRUCTURES_KIND | 116 |
| 15 | PHARMACOKINETICS_KIND | 59 |
Here are the attributes (values rather than concept):
SELECT sk.name AS source, kf.kfreq, x.d_id, x.d_name, freq
FROM (
SELECT s.kind AS s_kind, d.id as d_id, d.name AS d_name, count(1) AS freq
FROM umls.ndfrt_concept s
INNER JOIN umls.ndfrt_role r ON(s.id = r.source_id)
INNER JOIN umls.ndfrt_role_def d ON(d.id = r.role_def_id)
WHERE r.target_id IS NULL
GROUP BY s.kind, d.id, d.name
) x
INNER JOIN umls.ndfrt_concept sk ON(sk.id = x.s_kind)
INNER JOIN (
SELECT kind, count(1) AS kfreq
FROM umls.ndfrt_concept
GROUP BY kind
) kf ON(kf.kind = s_kind)
ORDER BY source ASC, freq DESC
| SOURCE | FREQ | ROLE ID | ROLE NAME | FREQ |
| NDF_KIND | 120506 | P264 | Display_Name | 120506 |
| MECHANISM_OF_ACTION_KIND | 359 | P264 | Display_Name | 359 |
| RXNORM_DOSE_FORM_KIND | 174 | P264 | Display_Name | 174 |
| PHARMACOKINETICS_KIND | 59 | P264 | Display_Name | 59 |
| INGREDIENT_KIND | 8421 | P264 | Display_Name | 8421 |
| PHYSIOLOGIC_EFFECT_KIND | 1763 | P264 | Display_Name | 1763 |
| DISEASE_KIND | 4258 | P264 | Display_Name | 4258 |
| RECIP_STRUCTURES_KIND | 116 | P264 | Display_Name | 116 |
| MECHANISM_OF_ACTION_KIND | 359 | P289 | FDA_SPL_Name | 34 |
| INGREDIENT_KIND | 8421 | P289 | FDA_SPL_Name | 13 |
| NDF_KIND | 120506 | P289 | FDA_SPL_Name | 16 |
| PHYSIOLOGIC_EFFECT_KIND | 1763 | P289 | FDA_SPL_Name | 4 |
| MECHANISM_OF_ACTION_KIND | 359 | P139 | Included_MeSH_UIs | 106 |
| NDF_KIND | 120506 | P99 | Level | 120452 |
| MECHANISM_OF_ACTION_KIND | 359 | P143 | MeSH_Definition | 65 |
| INGREDIENT_KIND | 8421 | P143 | MeSH_Definition | 7957 |
| DISEASE_KIND | 4258 | P143 | MeSH_Definition | 4046 |
| INGREDIENT_KIND | 8421 | P287 | MeSH_Name | 8414 |
| RECIP_STRUCTURES_KIND | 116 | P287 | MeSH_Name | 21 |
| MECHANISM_OF_ACTION_KIND | 359 | P287 | MeSH_Name | 65 |
| DISEASE_KIND | 4258 | P287 | MeSH_Name | 4222 |
| NDF_KIND | 120506 | P129 | MeSH_Pharmacologic_Action | 1272 |
| MECHANISM_OF_ACTION_KIND | 359 | P135 | MeSH_UI | 66 |
| DISEASE_KIND | 4258 | P135 | MeSH_UI | 4222 |
| INGREDIENT_KIND | 8421 | P135 | MeSH_UI | 8414 |
| RECIP_STRUCTURES_KIND | 116 | P135 | MeSH_UI | 21 |
| RECIP_STRUCTURES_KIND | 116 | P130 | NCI_Preferred_Term | 47 |
| DISEASE_KIND | 4258 | P262794 | NUI | 4258 |
| INGREDIENT_KIND | 8421 | P262794 | NUI | 8421 |
| PHYSIOLOGIC_EFFECT_KIND | 1763 | P262794 | NUI | 1763 |
| MECHANISM_OF_ACTION_KIND | 359 | P262794 | NUI | 359 |
| NDF_KIND | 120506 | P262794 | NUI | 120506 |
| RXNORM_DOSE_FORM_KIND | 174 | P262794 | NUI | 174 |
| RECIP_STRUCTURES_KIND | 116 | P262794 | NUI | 116 |
| PHARMACOKINETICS_KIND | 59 | P262794 | NUI | 59 |
| PHYSIOLOGIC_EFFECT_KIND | 1763 | P145 | Review_Status | 102 |
| NDF_KIND | 120506 | P145 | Review_Status | 4679 |
| DISEASE_KIND | 4258 | P145 | Review_Status | 46 |
| MECHANISM_OF_ACTION_KIND | 359 | P145 | Review_Status | 56 |
| RXNORM_DOSE_FORM_KIND | 174 | P817 | RxNorm_Code | 140 |
| NDF_KIND | 120506 | P817 | RxNorm_Code | 12988 |
| NDF_KIND | 120506 | P818 | RxNorm_CUI | 12988 |
| RXNORM_DOSE_FORM_KIND | 174 | P818 | RxNorm_CUI | 140 |
| RXNORM_DOSE_FORM_KIND | 174 | P819 | RxNorm_Name | 140 |
| NDF_KIND | 120506 | P819 | RxNorm_Name | 12988 |
| NDF_KIND | 120506 | P262704 | RxNorm_Preferred_Translation_Term | 12903 |
| DISEASE_KIND | 4258 | P262795 | SNOMED_Mapping_ID | 4577 |
| DISEASE_KIND | 4258 | P262796 | SPL_Problem_List_Mapping_ID | 2478 |
| NDF_KIND | 120506 | P262790 | Status | 120616 |
| DISEASE_KIND | 4258 | P137 | Synonym | 34665 |
| PHYSIOLOGIC_EFFECT_KIND | 1763 | P137 | Synonym | 31 |
| MECHANISM_OF_ACTION_KIND | 359 | P137 | Synonym | 53 |
| INGREDIENT_KIND | 8421 | P137 | Synonym | 35386 |
| RECIP_STRUCTURES_KIND | 116 | P137 | Synonym | 232 |
| DISEASE_KIND | 4258 | P141 | UMLS_CUI | 4232 |
| RECIP_STRUCTURES_KIND | 116 | P141 | UMLS_CUI | 9 |
| MECHANISM_OF_ACTION_KIND | 359 | P141 | UMLS_CUI | 66 |
| INGREDIENT_KIND | 8421 | P141 | UMLS_CUI | 8415 |
| NDF_KIND | 120506 | P2627 | UMLS_Preferred_Translation_Term | 4297 |
| RECIP_STRUCTURES_KIND | 116 | P55 | UMLS_Semantic_Type | 36 |
| INGREDIENT_KIND | 8421 | P262793 | UNII | 434 |
| NDF_KIND | 120506 | P131 | VA_Class_Code | 572 |
| NDF_KIND | 120506 | P262718 | VA_Class_Description | 33 |
| NDF_KIND | 120506 | P262716 | VA_Class_Type | 572 |
| NDF_KIND | 120506 | P63 | VA_CS_Federal_Schedule | 3941 |
| NDF_KIND | 120506 | P279 | VA_Manufacturer_Code | 95781 |
| NDF_KIND | 120506 | P69 | VA_National_Formulary_Name | 17467 |
| NDF_KIND | 120506 | P121 | VA_NDC | 95781 |
| NDF_KIND | 120506 | P262792 | VA_Ndf_Type | 17424 |
| NDF_KIND | 120506 | P281 | VA_Package_Code | 95781 |
| NDF_KIND | 120506 | P115 | VA_Print_Name | 13385 |
| NDF_KIND | 120506 | P280 | VA_Product_Code | 95781 |
| NDF_KIND | 120506 | P71 | VA_RxOtc_Indicator | 8706 |
| NDF_KIND | 120506 | P101 | VA_Strength | 8509 |
| NDF_KIND | 120506 | P119 | VA_Trade_Name | 96042 |
| NDF_KIND | 120506 | P103 | VA_Units | 8511 |
| NDF_KIND | 120506 | P97 | VA_UPN | 432 |
| NDF_KIND | 120506 | P262778 | VUID | 24671 |
Next: UsingNdfRt
Attachments
- UNII-NDFRT.xls (0.6 MB) - added by gschadow 2 years ago.
![(please configure the [header_logo] section in trac.ini)](/mw/chrome/site/logo100.png)