Structure of the CFR XML & XSD
The GPO website provides the XSD-Schema and documentation:
• CFRMergedXML.xsd
• CFR-XML_User-Guide_v1.pdf
The XSD Schema is generic for all Federal Regulations. It is a good example of the Semantic Web Layers discussed previously: we get the machine-readable syntax, but no semantics for the particular regulation. The header contains the identifying information about the document.
The regulation text is structured in sections. Each section has a section number with the paragraph reference, the subject of the section, and the individual paragraphs.
<SECTION>
<SECTNO>§ 275.203(m)-1</SECTNO>
<SUBJECT>Private fund adviser exemption.</SUBJECT>
<P>(a) <E T="03">United States investment advisers.</E> For purposes of section 203(m) of the Act (15 U.S.C. 80b-3(m)), an investment adviser with its principal office and place of business in the United States is exempt from the requirement to register under section 203 of the Act if the investment adviser:</P>
<P>(1) Acts solely as an investment adviser to one or more qualifying private funds; and</P>
<P>(2) Manages private fund assets of less than $150 million.</P>
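To make the section structure concrete, here is a minimal Python sketch that reads the section number, the subject, and the paragraphs from one SECTION element using only the standard library. The SAMPLE string is an abbreviated copy of the excerpt with a closing tag added so that it parses on its own, and the function name read_section is hypothetical; this is an illustration, not part of the GPO tooling.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the excerpt, closed so that it parses on its own.
SAMPLE = """<SECTION>
<SECTNO>§ 275.203(m)-1</SECTNO>
<SUBJECT>Private fund adviser exemption.</SUBJECT>
<P>(a) <E T="03">United States investment advisers.</E> For purposes of section 203(m) of the Act, an investment adviser is exempt if the investment adviser:</P>
<P>(1) Acts solely as an investment adviser to one or more qualifying private funds; and</P>
<P>(2) Manages private fund assets of less than $150 million.</P>
</SECTION>"""


def read_section(xml_text):
    """Return the number, subject, and paragraph texts of one SECTION element."""
    section = ET.fromstring(xml_text)
    return {
        "number": section.findtext("SECTNO").strip(),
        "subject": section.findtext("SUBJECT").strip(),
        # itertext() flattens inline markup such as <E> into plain text.
        "paragraphs": ["".join(p.itertext()).strip() for p in section.findall("P")],
    }


section = read_section(SAMPLE)
print(section["number"], "-", section["subject"])
for paragraph in section["paragraphs"]:
    print("  ", paragraph[:60])
```

The parser recovers the syntax only: it can say that a paragraph exists and what its text is, but not that paragraph (2) states an asset threshold.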
Generating OWL classes from CFR XML
TopBraid Composer Maestro Edition [11] (TBC) is our main ontology editor.
- First, we import the XSD-Schema.
There are various options to customize the import. The tool creates an OWL file, CFRMergedXML.ttl, with classes for the XSD elements.
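The idea behind the import can be sketched in a few lines of Python: collect the element names declared in a schema and emit one OWL class per name in Turtle. This is not how TopBraid Composer works internally, and it ignores types, attributes, and nesting. The miniature schema, the cfr prefix, the example.org namespace, and the function name xsd_to_turtle are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical miniature schema; the real CFRMergedXML.xsd is much larger.
MINI_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="SECTION"/>
  <xs:element name="SECTNO"/>
  <xs:element name="SUBJECT"/>
  <xs:element name="P"/>
</xs:schema>"""


def xsd_to_turtle(xsd_text, prefix="cfr", namespace="http://example.org/cfr#"):
    """Emit one owl:Class declaration per named xs:element in the schema."""
    root = ET.fromstring(xsd_text)
    names = sorted(
        {el.get("name") for el in root.iter(XS + "element") if el.get("name")}
    )
    lines = [
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        f"@prefix {prefix}: <{namespace}> .",
        "",
    ]
    lines += [f"{prefix}:{name} a owl:Class ." for name in names]
    return "\n".join(lines)


turtle = xsd_to_turtle(MINI_XSD)
print(turtle)
```

The output contains a class per element name and nothing else, which is why the generated file is best read as a mirror of the schema rather than as a designed ontology.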
Validating the generated OWL classes
<SECTION>
<SECTNO>§ 275.203(m)-1</SECTNO>
<SUBJECT>Private fund adviser exemption.</SUBJECT>
<P>(1) Acts solely as an investment adviser to one or more qualifying private funds; and</P>
<P>(2) Manages private fund assets of less than $150 million.</P>
• We define a parent class, CFRpart275, and make all generated classes subclasses of it. The list to the right shows the 50 generated classes and their number of instances. We should consider this a set of dumb Staging classes rather than ontology design. The core classes are SECTION, with 44 instances, and paragraph (P), with 757 instances. Most of the other classes become data properties.
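One way to cross-check the instance counts is to count element occurrences directly in the source XML and compare them with the numbers shown in the editor. The Python sketch below does this for a tiny made-up document. It assumes that each XML element becomes exactly one instance of its generated class, which may not hold for every import option; the PART wrapper, the sample content, and the function name count_elements are illustrative only.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Tiny hypothetical document; the wrapper element and its contents are made up.
SAMPLE = """<PART>
  <SECTION><SECTNO>§ 1</SECTNO><P>(a) First.</P><P>(b) Second.</P></SECTION>
  <SECTION><SECTNO>§ 2</SECTNO><P>(a) Third.</P></SECTION>
</PART>"""


def count_elements(xml_text):
    """Count how often each element name occurs in the document."""
    return Counter(element.tag for element in ET.fromstring(xml_text).iter())


counts = count_elements(SAMPLE)
for tag, n in counts.most_common():
    print(f"{tag}: {n}")
```

Run against the full regulation file, a mismatch between these counts and the class list would point to elements that the import skipped or merged.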
Validating SECTION § 275.203(m)-1
The TopBraid Composer Resource form shows the properties of our “Private fund adviser exemption” generated as an instance of SECTION.
The import generates properties “composite:” for nested elements in the XML:
•“composite:index” is the data property for the element position
The paragraph in turn has a composite child for the actual text, instantiated as an “sxml:TextNode”.
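The shape of this staging data can be imitated with a short Python sketch that flattens XML into triples: one node per element, one node per text run, a child link for nesting, and an index for position. The names composite:child, sxml:text, and sxml:TextNode are written here as assumptions about the generated vocabulary, and the functions stage and paragraph_texts are hypothetical. It is a simplified model, not the output of the actual import.

```python
import itertools
import xml.etree.ElementTree as ET

SAMPLE = (
    "<SECTION><SECTNO>§ 275.203(m)-1</SECTNO>"
    "<P>(2) Manages private fund assets of less than $150 million.</P></SECTION>"
)


def stage(xml_text):
    """Flatten XML into staging triples (text after inline children is ignored)."""
    ids = itertools.count(1)
    triples = []

    def visit(element, parent=None, index=None):
        node = f"_:n{next(ids)}"
        triples.append((node, "rdf:type", element.tag))
        if parent is not None:
            triples.append((parent, "composite:child", node))
            triples.append((node, "composite:index", index))
        position = 0
        if element.text and element.text.strip():
            # The text itself becomes a separate child node of the element.
            text_node = f"_:n{next(ids)}"
            triples.append((text_node, "rdf:type", "sxml:TextNode"))
            triples.append((text_node, "sxml:text", element.text.strip()))
            triples.append((node, "composite:child", text_node))
            triples.append((text_node, "composite:index", position))
            position += 1
        for child in element:
            visit(child, node, position)
            position += 1

    visit(ET.fromstring(xml_text))
    return triples


def paragraph_texts(triples):
    """Recover paragraph text: find P nodes, follow child links, read the text."""
    types = {s: o for s, p, o in triples if p == "rdf:type"}
    texts = {s: o for s, p, o in triples if p == "sxml:text"}
    return [
        texts[o]
        for s, p, o in triples
        if p == "composite:child" and types.get(s) == "P" and o in texts
    ]


triples = stage(SAMPLE)
print(len(triples), "triples")
print(paragraph_texts(triples))
```

Even in this reduced model, reading the text of a paragraph takes a type lookup, a child hop, and a text lookup, and nothing in the data says which paragraph belongs to which rule.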
To use the data warehousing ETL analogy, this is just an Extract into simple Staging classes rather than ontology design.
There are no semantics, and querying the data with SPARQL is hard and error-prone.
We need to Transform the structure into a true ontology design.