Original NDF-RT
NDF-RT comes in a distribution outside of UMLS. They use the XML format of Apelon, Inc.. This is a pretty simple format which the NCI-Thesaurus also uses, and I believe even SNOMED is in this format (if you can get the original sources outside the UMLS.)
Step 1 is get the NDF-RT original distribution:
wget ftp://ftp1.nci.nih.gov/pub/cacore/EVS/NDF-RT/NDF-RT_XML.zip unzip NDF-RT_XML.zip tail +3 NDFRT_Public_2007.04.10_TDE.xml >ndfrt.xml
The last item removes the DTD reference with some apelon URL which no longer works. Now we create an Oracle SQL*Loader control file, create the oracle database tables, load the resulting conrol file, and finally create the foreign key constraints.
java -jar saxon8.jar -o ndfrt.ctl ndfrt.xml ndfrt2oracle.xsl sqlplus user/pass@dola.regenstrief.org <ndfrt.ddl sqlldr control=ndfrt.ctl userid=user/pass@dola.regenstrief.org sqlplus user/pass@dola.regenstrief.org <ndfrt.ref
Now we have the database loaded. T he database schema is described here.
Transitive and Reflexive Closure of ISA Relationsips
No other than ISA relationship is transitive here.
DROP TABLE umls.ndfrt_isa_closure GO CREATE TABLE umls.ndfrt_isa_closure NOLOGGING AS SELECT SOURCE_ID, TARGET_ID, 1 AS DISTANCE FROM umls.ndfrt_role r WHERE role_def_id='D' AND target_id IS NOT NULL GO CREATE INDEX umls.ndfrt_isa_closure_source_idx ON umls.ndfrt_isa_closure(source_id, target_id) GO CREATE INDEX umls.ndfrt_isa_closure_target_idx ON umls.ndfrt_isa_closure(target_id, source_id) GO
Execute this until no more is inserted:
INSERT INTO umls.ndfrt_isa_closure NOLOGGING
SELECT DISTINCT a.source_id, b.target_id, a.DISTANCE+b.DISTANCE
FROM umls.ndfrt_isa_closure a
INNER JOIN umls.ndfrt_isa_closure b ON(b.source_id = a.target_id)
WHERE NOT EXISTS (SELECT 1 FROM umls.ndfrt_isa_closure c
WHERE c.source_id = a.source_id
AND c.target_id = b.target_id)
Now for ISA we also like the reflexive closure so that we can always join to it and get the concept or it's specializations.
INSERT INTO umls.ndfrt_isa_closure NOLOGGING SELECT DISTINCT id, id, 0 FROM umls.ndfrt_concept
135665 record(s) affected
Finally, add this view to combine the roles with the transitive closure:
CREATE VIEW umls.ndfrt_role_plus AS SELECT role_def_id, 1 as distance, source_id, target_id, target_value FROM umls.ndfrt_role UNION ALL SELECT 'D+' AS role_def_id, distance, source_id, target_id, null FROM umls.ndfrt_isa_closure WHERE distance > 1
Exploring
Now we can look at a sample of NDF-RT
SELECT c.id, c.name, r.role_def_id, r.distance, d.name, t.id, COALESCE(t.name, r.target_value), k.id, k.name, n.id, n.name
FROM umls.NDFRT_CONCEPT SAMPLE(0.1) c
INNER JOIN umls.NDFRT_ROLE_PLUS r ON(source_id = c.id)
LEFT OUTER JOIN umls.NDFRT_ROLE_DEF d ON(d.id = r.role_def_id)
LEFT OUTER JOIN umls.NDFRT_CONCEPT t ON(t.id = r.target_id)
INNER JOIN umls.NDFRT_concept k ON(k.id = c.kind)
INNER JOIN umls.NDFRT_concept n ON(n.id = c.namespace)
ORDER BY c.id, r.role_def_id, r.distance DESC
Good summary of relationships:
SELECT sk.name AS source, x.d_id, x.d_name, tk.name AS target, freq
FROM (
SELECT s.kind AS s_kind, d.id AS d_id, d.name AS d_name, t.kind AS t_kind, count(1) AS freq
FROM umls.ndfrt_concept s
INNER JOIN umls.ndfrt_role r ON(s.id = r.source_id)
INNER JOIN umls.ndfrt_role_def d ON(d.id = r.role_def_id)
INNER JOIN umls.ndfrt_concept t ON(t.id = r.target_id)
GROUP BY s.kind, d.id, d.name, t.kind
) x
INNER JOIN umls.ndfrt_concept sk ON(sk.id = x.s_kind)
INNER JOIN umls.ndfrt_concept tk ON(tk.id = x.t_kind)
ORDER BY freq DESC
| SOURCE | ROLE ID | ROLE NAME | TARGET | FREQ |
| NDF_KIND | R27 | CI_ChemClass | INGREDIENT_KIND | 283 |
| NDF_KIND | R31 | CI_MoA | MECHANISM_OF_ACTION_KIND | 81 |
| NDF_KIND | R23 | CI_PE | PHYSIOLOGIC_EFFECT_KIND | 61 |
| NDF_KIND | R37 | CI_with | DISEASE_KIND | 4346 |
| RXNORM_DOSE_FORM_KIND | D | definingConcept | RXNORM_DOSE_FORM_KIND | 187 |
| RECIP_STRUCTURES_KIND | D | definingConcept | RECIP_STRUCTURES_KIND | 135 |
| MECHANISM_OF_ACTION_KIND | D | definingConcept | MECHANISM_OF_ACTION_KIND | 367 |
| INGREDIENT_KIND | D | definingConcept | INGREDIENT_KIND | 11262 |
| PHYSIOLOGIC_EFFECT_KIND | D | definingConcept | PHYSIOLOGIC_EFFECT_KIND | 1826 |
| NDF_KIND | D | definingConcept | NDF_KIND | 137936 |
| PHARMACOKINETICS_KIND | D | definingConcept | PHARMACOKINETICS_KIND | 58 |
| DISEASE_KIND | D | definingConcept | DISEASE_KIND | 6441 |
| NDF_KIND | R53 | effect_may_be_inhibited_by | NDF_KIND | 1 |
| RECIP_STRUCTURES_KIND | R69 | gene_encodes_for_enzyme | RECIP_STRUCTURES_KIND | 8 |
| RECIP_STRUCTURES_KIND | R78 | gene_has_variant | RECIP_STRUCTURES_KIND | 33 |
| NDF_KIND | R49 | has_active_metabolites | INGREDIENT_KIND | 4 |
| NDF_KIND | R80 | has_Chemical_Structure | INGREDIENT_KIND | 7 |
| NDF_KIND | R45 | has_DoseForm | RXNORM_DOSE_FORM_KIND | 8491 |
| NDF_KIND | R25 | has_Ingredient | INGREDIENT_KIND | 4719 |
| NDF_KIND | R29 | has_MoA | MECHANISM_OF_ACTION_KIND | 1951 |
| NDF_KIND | R21 | has_PE | PHYSIOLOGIC_EFFECT_KIND | 3079 |
| NDF_KIND | R33 | has_PK | PHARMACOKINETICS_KIND | 51 |
| INGREDIENT_KIND | A262779 | Heading_Mapped_To | INGREDIENT_KIND | 672 |
| NDF_KIND | R41 | induces | DISEASE_KIND | 132 |
| NDF_KIND | R43 | may_diagnose | DISEASE_KIND | 135 |
| NDF_KIND | R39 | may_prevent | DISEASE_KIND | 721 |
| NDF_KIND | R35 | may_treat | DISEASE_KIND | 5590 |
| NDF_KIND | R59 | metabolized_by_recip_structure | RECIP_STRUCTURES_KIND | 64 |
| NDF_KIND | R51 | site_of_metabolism | PHARMACOKINETICS_KIND | 25 |
| NDF_KIND | A262778 | VA_Product_Component | NDF_KIND | 30757 |
| NDF_KIND | A495 | VA_Secondary_Drug_Class | NDF_KIND | 12 |
Here are the kinds:
SELECT k.id, k.name, count(1) as freq
FROM NDFRT_CONCEPT c
INNER JOIN NDFRT_CONCEPT k ON(k.id = c.kind)
GROUP BY k.id, k.name
ORDER BY freq DESC
| ID | NAME | FREQ |
| 9 | NDF_KIND | 120489 |
| 11 | INGREDIENT_KIND | 8174 |
| 17 | DISEASE_KIND | 4258 |
| 7 | PHYSIOLOGIC_EFFECT_KIND | 1722 |
| 13 | MECHANISM_OF_ACTION_KIND | 339 |
| 19 | RXNORM_DOSE_FORM_KIND | 175 |
| 3 | RECIP_STRUCTURES_KIND | 116 |
| 15 | PHARMACOKINETICS_KIND | 59 |
Here are the attributes (values rather than concept):
SELECT sk.name AS source, kf.kfreq, x.d_id, x.d_name, freq
FROM (
SELECT s.kind AS s_kind, d.id as d_id, d.name AS d_name, count(1) AS freq
FROM umls.ndfrt_concept s
INNER JOIN umls.ndfrt_role r ON(s.id = r.source_id)
INNER JOIN umls.ndfrt_role_def d ON(d.id = r.role_def_id)
WHERE r.target_id IS NULL
GROUP BY s.kind, d.id, d.name
) x
INNER JOIN umls.ndfrt_concept sk ON(sk.id = x.s_kind)
INNER JOIN (
SELECT kind, count(1) AS kfreq
FROM umls.ndfrt_concept
GROUP BY kind
) kf ON(kf.kind = s_kind)
ORDER BY source ASC, freq DESC
| SOURCE | FREQ | ROLE ID | ROLE NAME | FREQ |
| NDF_KIND | 120506 | P264 | Display_Name | 120506 |
| MECHANISM_OF_ACTION_KIND | 359 | P264 | Display_Name | 359 |
| RXNORM_DOSE_FORM_KIND | 174 | P264 | Display_Name | 174 |
| PHARMACOKINETICS_KIND | 59 | P264 | Display_Name | 59 |
| INGREDIENT_KIND | 8421 | P264 | Display_Name | 8421 |
| PHYSIOLOGIC_EFFECT_KIND | 1763 | P264 | Display_Name | 1763 |
| DISEASE_KIND | 4258 | P264 | Display_Name | 4258 |
| RECIP_STRUCTURES_KIND | 116 | P264 | Display_Name | 116 |
| MECHANISM_OF_ACTION_KIND | 359 | P289 | FDA_SPL_Name | 34 |
| INGREDIENT_KIND | 8421 | P289 | FDA_SPL_Name | 13 |
| NDF_KIND | 120506 | P289 | FDA_SPL_Name | 16 |
| PHYSIOLOGIC_EFFECT_KIND | 1763 | P289 | FDA_SPL_Name | 4 |
| MECHANISM_OF_ACTION_KIND | 359 | P139 | Included_MeSH_UIs | 106 |
| NDF_KIND | 120506 | P99 | Level | 120452 |
| MECHANISM_OF_ACTION_KIND | 359 | P143 | MeSH_Definition | 65 |
| INGREDIENT_KIND | 8421 | P143 | MeSH_Definition | 7957 |
| DISEASE_KIND | 4258 | P143 | MeSH_Definition | 4046 |
| INGREDIENT_KIND | 8421 | P287 | MeSH_Name | 8414 |
| RECIP_STRUCTURES_KIND | 116 | P287 | MeSH_Name | 21 |
| MECHANISM_OF_ACTION_KIND | 359 | P287 | MeSH_Name | 65 |
| DISEASE_KIND | 4258 | P287 | MeSH_Name | 4222 |
| NDF_KIND | 120506 | P129 | MeSH_Pharmacologic_Action | 1272 |
| MECHANISM_OF_ACTION_KIND | 359 | P135 | MeSH_UI | 66 |
| DISEASE_KIND | 4258 | P135 | MeSH_UI | 4222 |
| INGREDIENT_KIND | 8421 | P135 | MeSH_UI | 8414 |
| RECIP_STRUCTURES_KIND | 116 | P135 | MeSH_UI | 21 |
| RECIP_STRUCTURES_KIND | 116 | P130 | NCI_Preferred_Term | 47 |
| DISEASE_KIND | 4258 | P262794 | NUI | 4258 |
| INGREDIENT_KIND | 8421 | P262794 | NUI | 8421 |
| PHYSIOLOGIC_EFFECT_KIND | 1763 | P262794 | NUI | 1763 |
| MECHANISM_OF_ACTION_KIND | 359 | P262794 | NUI | 359 |
| NDF_KIND | 120506 | P262794 | NUI | 120506 |
| RXNORM_DOSE_FORM_KIND | 174 | P262794 | NUI | 174 |
| RECIP_STRUCTURES_KIND | 116 | P262794 | NUI | 116 |
| PHARMACOKINETICS_KIND | 59 | P262794 | NUI | 59 |
| PHYSIOLOGIC_EFFECT_KIND | 1763 | P145 | Review_Status | 102 |
| NDF_KIND | 120506 | P145 | Review_Status | 4679 |
| DISEASE_KIND | 4258 | P145 | Review_Status | 46 |
| MECHANISM_OF_ACTION_KIND | 359 | P145 | Review_Status | 56 |
| RXNORM_DOSE_FORM_KIND | 174 | P817 | RxNorm_Code | 140 |
| NDF_KIND | 120506 | P817 | RxNorm_Code | 12988 |
| NDF_KIND | 120506 | P818 | RxNorm_CUI | 12988 |
| RXNORM_DOSE_FORM_KIND | 174 | P818 | RxNorm_CUI | 140 |
| RXNORM_DOSE_FORM_KIND | 174 | P819 | RxNorm_Name | 140 |
| NDF_KIND | 120506 | P819 | RxNorm_Name | 12988 |
| NDF_KIND | 120506 | P262704 | RxNorm_Preferred_Translation_Term | 12903 |
| DISEASE_KIND | 4258 | P262795 | SNOMED_Mapping_ID | 4577 |
| DISEASE_KIND | 4258 | P262796 | SPL_Problem_List_Mapping_ID | 2478 |
| NDF_KIND | 120506 | P262790 | Status | 120616 |
| DISEASE_KIND | 4258 | P137 | Synonym | 34665 |
| PHYSIOLOGIC_EFFECT_KIND | 1763 | P137 | Synonym | 31 |
| MECHANISM_OF_ACTION_KIND | 359 | P137 | Synonym | 53 |
| INGREDIENT_KIND | 8421 | P137 | Synonym | 35386 |
| RECIP_STRUCTURES_KIND | 116 | P137 | Synonym | 232 |
| DISEASE_KIND | 4258 | P141 | UMLS_CUI | 4232 |
| RECIP_STRUCTURES_KIND | 116 | P141 | UMLS_CUI | 9 |
| MECHANISM_OF_ACTION_KIND | 359 | P141 | UMLS_CUI | 66 |
| INGREDIENT_KIND | 8421 | P141 | UMLS_CUI | 8415 |
| NDF_KIND | 120506 | P2627 | UMLS_Preferred_Translation_Term | 4297 |
| RECIP_STRUCTURES_KIND | 116 | P55 | UMLS_Semantic_Type | 36 |
| INGREDIENT_KIND | 8421 | P262793 | UNII | 434 |
| NDF_KIND | 120506 | P131 | VA_Class_Code | 572 |
| NDF_KIND | 120506 | P262718 | VA_Class_Description | 33 |
| NDF_KIND | 120506 | P262716 | VA_Class_Type | 572 |
| NDF_KIND | 120506 | P63 | VA_CS_Federal_Schedule | 3941 |
| NDF_KIND | 120506 | P279 | VA_Manufacturer_Code | 95781 |
| NDF_KIND | 120506 | P69 | VA_National_Formulary_Name | 17467 |
| NDF_KIND | 120506 | P121 | VA_NDC | 95781 |
| NDF_KIND | 120506 | P262792 | VA_Ndf_Type | 17424 |
| NDF_KIND | 120506 | P281 | VA_Package_Code | 95781 |
| NDF_KIND | 120506 | P115 | VA_Print_Name | 13385 |
| NDF_KIND | 120506 | P280 | VA_Product_Code | 95781 |
| NDF_KIND | 120506 | P71 | VA_RxOtc_Indicator | 8706 |
| NDF_KIND | 120506 | P101 | VA_Strength | 8509 |
| NDF_KIND | 120506 | P119 | VA_Trade_Name | 96042 |
| NDF_KIND | 120506 | P103 | VA_Units | 8511 |
| NDF_KIND | 120506 | P97 | VA_UPN | 432 |
| NDF_KIND | 120506 | P262778 | VUID | 24671 |
Next: UsingNdfRt
Attachments
- UNII-NDFRT.xls (0.6 MB) - added by gschadow on 10/23/07 21:03:38.
