[nSLUG] XML to database conversion

Ron Dewar Ron.Dewar at ccns.nshealth.ca
Thu Oct 23 10:00:32 ADT 2003


Much as I'd like to join in on the math rant, I have a question that
exposes my complete ignorance of things XML, and I would far rather do
that than be exposed as a mathematician who graduated (lo! these many
years ago) and has differentiated a function maybe twice since then.

A supplier of standards has produced (and promises to maintain) a
lengthy set of standards in the form of a series of .xml files.  There
are schema and .dtd files that seem to be part of this.  However, they
are releasing the standards as a series of about 1800 .html files that
contain information such as 'use this code in this situation'.  These
would form the core of documentation for vendors wishing to build
software to implement coding using these standards[1].  It will be some
time before the vendors come up with a solution to address our needs.
We need the coding standards to be available as Oracle tables, so that
our coders can code conditions according to those standards into our
database, which is Oracle (text-based), and have some hope that the
codes used are validated against the published standard.  Having the
coders view the .html documents and then code into the database is a
solution (so is having a printed copy on their desk), but it does not
allow the database designers any way of checking to see if the code
entered is valid in the given situation.

In order to import these standards into our database, a team of people
is beavering away, cutting and pasting tables from browser displays of
these .html files into Excel, then clearing out formatting, merging
cells and generally cleaning things up so they can be imported into the
waiting Oracle tables.  Error-prone process aside, if the maintainer
modifies a few lines in the underlying .xml system, one suspects that
many (perhaps all) resulting .html files could be different, and need
re-capturing.

Surely there is a better way to do this!  The standards maintainer will
not agree to publish the standards in any other way than the .xml or
.html files.

I have done some searching for tools to convert .xml data into other
forms (comma-delimited text files would be ideal), but have run up
against my ignorance of nomenclature.  It may be that what I am looking
at is the tool I need, but the language is so meta that I don't
understand what the documentation is telling me.  I need some direction.
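
To make the target concrete, here is a rough sketch of the kind of
conversion I have in mind, written with Python's standard library.  The
element names (standard, table, row, code, description) and the sample
values are made up purely for illustration; the real files will have
their own structure, presumably described by the schema and .dtd files,
so the row_tag and fields arguments would need adjusting.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample only: the real standards files will use different
# element names and nesting, as defined by their schema/.dtd files.
SAMPLE_XML = """\
<standard>
  <table name="example">
    <row><code>10</code><description>Example description A</description></row>
    <row><code>20</code><description>Example description B, with a comma</description></row>
  </table>
</standard>
"""


def xml_rows_to_csv(xml_text, row_tag="row", fields=("code", "description")):
    """Return CSV text with one line per <row_tag> element found in the XML."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)  # header line naming the columns
    for row in root.iter(row_tag):
        # A missing or empty child element becomes an empty CSV cell.
        writer.writerow(
            [(row.findtext(name, default="") or "").strip() for name in fields]
        )
    return out.getvalue()


if __name__ == "__main__":
    print(xml_rows_to_csv(SAMPLE_XML), end="")
```

Output of this shape could be loaded into the waiting tables and
regenerated whenever the maintainer changes the .xml, instead of
re-capturing the .html pages by hand.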

Does anyone have an idea of the way forward here?  A solution involving
a Linux platform would be cool, but is not a requirement.

Ron

[1] The standards being published relate to how one describes the
severity of cancer at the time of diagnosis, called staging.  See
http://www.cancerstaging.org/collab.html#1
