@@ -1,11 +1,18 @@
 This project takes a MySQL Unified Medical Language System (UMLS) database and converts the ontologies to RDF using OWL and SKOS as the main schemas.
 
-Virtual Appliance users can review the [documentation in the OntoPortal Administration Guide}(https://ontoportal.github.io/documentation/administration/ontologies/handling_umls).
+Virtual Appliance users can review the [documentation in the OntoPortal Administration Guide](https://ontoportal.github.io/documentation/administration/ontologies/handling_umls).
 
-To use it:
+Recommended workflow:
 
-* Specify your database connection conf.py
-* Specify the SAB ontologies to export in umls.conf
+* Install Python dependencies with <code>pip install -r requirements.txt</code>
+* Configure <code>conf.py</code>
+* Specify the SAB ontologies to export in <code>umls.conf</code>
+* Run the full resumable import/export pipeline with <code>python run_umls_pipeline.py</code>
+
+Generated TTL files are written under a versioned output directory based on
+<code>OUTPUT_FOLDER</code> from <code>conf.py</code>. A common pattern is
+<code>OUTPUT_FOLDER = "output/%s" % UMLS_VERSION.upper()</code>, which writes to
+<code>output/2025AB</code>.
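The version-based naming described here can be checked with a few lines of Python. This is a minimal sketch: only the `OUTPUT_FOLDER` expression comes from the README text, and the `UMLS_VERSION` value is an assumed example.

```python
# Sketch of the conf.py pattern described in the README text.
# UMLS_VERSION = "2025ab" is an assumed example value, not taken from the repository.
UMLS_VERSION = "2025ab"

# The expression quoted in the README: upper-case the version and use it as a subfolder.
OUTPUT_FOLDER = "output/%s" % UMLS_VERSION.upper()

print(OUTPUT_FOLDER)  # output/2025AB
```

Upper-casing the version keeps the folder name stable whether the version is configured as `2025ab` or `2025AB`.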
 
 The umls.conf configuration file must contain one ontology per line. The lines are comma separated tuples where the elements are:
 
@@ -23,11 +30,59 @@ umls2rdf.py is designed to be an offline, run-once process.
 It's memory intensive and exports all of the default ontologies in umls.conf in 3h 30min.
 The ontologies listed in umls.conf are the UMLS ontologies accessible in [BioPortal](https://bioportal.bioontology.org/).
 
-If you get an error when installing the MySQL-python python library, https://stackoverflow.com/questions/12218229/my-config-h-file-not-found-when-intall-mysql-python-on-osx-10-8 may be of help.
+To download the full UMLS release archive outside the full pipeline, run:
+
+<pre>
+python download_umls.py
+</pre>
+
+The downloader returns the local path to the downloaded archive. This step only
+fetches and extracts the pre-built UMLS release; you still need to load the
+UMLS tables into MySQL before running <code>umls2rdf.py</code>. The script uses
+<code>UMLS_VERSION</code> and <code>UMLS_API_KEY</code> from <code>conf.py</code>.
+If <code>UMLS_DOWNLOAD_DIR</code> is set, the zip archive is stored under that
+directory. If it is not set, the library default <code>~/.data/bio/umls</code>
+is used. By default, the archive is extracted into an
+<code>extracted</code> subdirectory next to the downloaded zip. You can override
+that location with <code>UMLS_EXTRACT_DIR</code>.
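The directory fallback rules for the downloader can be sketched as a small function. This is an illustration only: `resolve_umls_dirs` is a hypothetical helper, not a function from `download_umls.py`, and it assumes the zip sits directly inside the download directory.

```python
from pathlib import Path

# Library default named in the README text.
DEFAULT_DOWNLOAD_DIR = Path.home() / ".data" / "bio" / "umls"


def resolve_umls_dirs(download_dir=None, extract_dir=None):
    """Hypothetical helper: return (download_dir, extract_dir) as Path objects.

    download_dir mirrors UMLS_DOWNLOAD_DIR and extract_dir mirrors
    UMLS_EXTRACT_DIR; None means the setting is not present in conf.py.
    """
    base = Path(download_dir).expanduser() if download_dir else DEFAULT_DOWNLOAD_DIR
    # Assumption: "next to the downloaded zip" means a sibling folder of the zip file.
    target = Path(extract_dir).expanduser() if extract_dir else base / "extracted"
    return base, target


print(resolve_umls_dirs("/srv/umls"))
print(resolve_umls_dirs("/srv/umls", "/scratch/umls"))
```

With neither setting present, the sketch falls back to `~/.data/bio/umls` and `~/.data/bio/umls/extracted`.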
+
+To create the target MySQL database with explicit UTF-8 settings outside the
+full pipeline, run:
+
+<pre>
+python create_mysql_db.py
+</pre>
+
+The script creates or updates <code>DB_NAME</code> from <code>conf.py</code>
+with <code>utf8mb4</code> character set and
+<code>utf8mb4_unicode_ci</code> collation.
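For readers who want to see what "creates or updates" could mean in SQL terms, the sketch below builds the two statements. It is an assumption about the approach, not the contents of `create_mysql_db.py`; `build_create_db_sql` is a hypothetical helper and the database name is an example.

```python
def build_create_db_sql(db_name):
    """Hypothetical helper: SQL that creates a utf8mb4 database or updates an existing one."""
    # Backtick-quote the identifier; doubling an embedded backtick escapes it in MySQL.
    quoted = "`%s`" % db_name.replace("`", "``")
    options = "CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci"
    return [
        # Creates the database when it is missing.
        "CREATE DATABASE IF NOT EXISTS %s %s" % (quoted, options),
        # Applies the same defaults when the database already existed.
        "ALTER DATABASE %s %s" % (quoted, options),
    ]


for statement in build_create_db_sql("umls2025ab"):  # example name, assumed
    print(statement + ";")
```

Running both statements covers the two cases: a fresh database gets the settings at creation, and an existing one has its defaults changed. Note that `ALTER DATABASE` changes defaults for new tables only; existing tables keep their own character set.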
+
+To run the full UMLS pipeline end-to-end, use:
+
+<pre>
+python run_umls_pipeline.py
+</pre>
+
+The pipeline performs these stages:
+
+* Download the configured UMLS full release archive
+* Extract the release only when the extracted <code>META</code> directory is not already present
+* Recreate the configured <code>DB_NAME</code> and load it with the extracted <code>META/populate_mysql_db.sh</code> script
+* Run <code>umls2rdf.py</code>
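The stage order listed here, including the conditional extraction, can be sketched as follows. The stage names and `plan_stages` are hypothetical, and the sketch assumes `META` sits directly under the extraction directory; the real `run_umls_pipeline.py` may organize this differently.

```python
import tempfile
from pathlib import Path


def plan_stages(extract_dir):
    """Hypothetical helper: return the stage names one run would execute, in order."""
    stages = ["download"]
    # Extraction is skipped when an extracted META directory is already present.
    if not (Path(extract_dir) / "META").is_dir():
        stages.append("extract")
    stages += ["load_mysql", "umls2rdf"]
    return stages


with tempfile.TemporaryDirectory() as tmp:
    print(plan_stages(tmp))        # META missing, so "extract" is included
    (Path(tmp) / "META").mkdir()
    print(plan_stages(tmp))        # META present, so "extract" is skipped
```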
 
-If running a Windows 10 OS with MySQL, the following tips may be of help.
+The pipeline patches loader settings from <code>conf.py</code> into a generated
+copy of <code>populate_mysql_db.sh</code>, and it patches
+<code>META/mysql_tables.sql</code> in place to replace
+<code>@LINE_TERMINATION@</code>. Pipeline state is stored under
+<code>PIPELINE_WORK_DIR</code> (default:
+<code>data/pipeline/<UMLS_VERSION></code>) and reruns skip completed steps
+after validating the extracted files, MySQL tables, and RDF output. Add
+<code>MYSQL_HOME</code> to <code>conf.py</code>; if your MySQL client is at
+<code>/usr/bin/mysql</code>, set <code>MYSQL_HOME = "/usr"</code>. Pipeline
+stdout and stderr are appended to <code>PIPELINE_LOG_FILE</code> when set, or
+to <code>data/pipeline/<UMLS_VERSION>/pipeline.log</code> by default.
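The defaults named in this paragraph can be summarized in a short sketch. `pipeline_paths` is a hypothetical helper, the version string is an example, and the `MYSQL_HOME/bin/mysql` layout is inferred from the `/usr/bin/mysql` example rather than confirmed from the pipeline source.

```python
import posixpath


def pipeline_paths(umls_version, mysql_home, work_dir=None, log_file=None):
    """Hypothetical helper: resolve pipeline locations from conf.py-style settings.

    work_dir mirrors PIPELINE_WORK_DIR and log_file mirrors PIPELINE_LOG_FILE;
    None means the setting is absent and the documented default applies.
    """
    default_dir = posixpath.join("data", "pipeline", umls_version)
    return {
        "work_dir": work_dir or default_dir,
        # The README states the log default as data/pipeline/<UMLS_VERSION>/pipeline.log.
        "log_file": log_file or posixpath.join(default_dir, "pipeline.log"),
        # Assumption: the client binary lives in the bin folder under MYSQL_HOME.
        "mysql_client": posixpath.join(mysql_home, "bin", "mysql"),
    }


paths = pipeline_paths("2025AB", "/usr")  # "2025AB" is an example version
print(paths["work_dir"])      # data/pipeline/2025AB
print(paths["log_file"])      # data/pipeline/2025AB/pipeline.log
print(paths["mysql_client"])  # /usr/bin/mysql
```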
 
-- Install [MySQL 5.5](https://dev.mysql.com/downloads/mysql/5.5.html#downloads) to avoid the InnoDB space [disclaimer](https://www.nlm.nih.gov/research/umls/implementation_resources/scripts/README_RRF_MySQL_Output_Stream.html) by NLM.
--[Python 2.7.x](https://www.python.org/downloads/) should be used to avoid syntax errors on 'raise Attribute'
-- For installtion of the MySQLdb module <pre>python -m pip install MySQLdb</pre> is error prone. Install with executable [MySQL-python-1.2.3.win-amd64-py2.7](http://www.codegood.com/archives/129) (last known location).
-- Create your RRF subset(s) using mmsys with the MySQL load option, load your database, edit conf.py and umls.py to specifications, run umsl2rdf.py
+If <code>PROCESS_ONLY_CURRENT_UMLS_VERSION</code> is set to <code>True</code>,
+the exporter only processes ontologies whose <code>MRSAB.IMETA</code> exactly
+matches <code>UMLS_VERSION</code>. Ontologies with a different value are skipped
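The version filter described here amounts to an exact string comparison per source. The sketch below uses made-up source names and a hypothetical `select_current_sources` helper; the row shape `(sab, imeta)` is an assumption and not how `umls2rdf.py` necessarily reads `MRSAB`.

```python
def select_current_sources(mrsab_rows, umls_version, only_current=True):
    """Hypothetical helper: return the SABs to export.

    mrsab_rows is an iterable of (sab, imeta) pairs, an assumed shape.
    only_current mirrors PROCESS_ONLY_CURRENT_UMLS_VERSION.
    """
    rows = list(mrsab_rows)
    if not only_current:
        return [sab for sab, _ in rows]
    # Exact match only: no case folding or prefix matching.
    return [sab for sab, imeta in rows if imeta == umls_version]


# Made-up rows for illustration; these are not real MRSAB contents.
rows = [("SRC_A", "2025AB"), ("SRC_B", "2024AA"), ("SRC_C", "2025ab")]
print(select_current_sources(rows, "2025AB"))         # ['SRC_A']
print(select_current_sources(rows, "2025AB", False))  # ['SRC_A', 'SRC_B', 'SRC_C']
```

Because the match is exact, a lower-case `2025ab` does not match `2025AB` in this sketch.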