Structure of the CFR XML & XSD
The GPO website provides the XSD-Schema
The XSD Schema is generic for all Federal
The header contains identifying
The regulation text is structured in sections. Each section consists of:
•Section number with the paragraph reference
•Subject of the section
•Individual paragraphs
<SUBJECT>Private fund adviser exemption.</SUBJECT>
<P>(a) <E T="03">United States investment advisers.</E> For purposes of section 203(m) of the Act (15 U.S.C. 80b-3(m)), an investment adviser with its principal office and place of business in the United States is exempt from the requirement to register under section 203 of the Act if the investment adviser:</P>
<P>(1) Acts solely as an investment adviser to one or more qualifying private funds; and</P>
<P>(2) Manages private fund assets of less than $150 million.</P>
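As a rough illustration of this structure, the fragment can be read with Python's standard xml.etree.ElementTree module. This is a minimal sketch, not part of the tooling described here: the SECTION wrapper and the SECTNO element name are assumptions based on the list above, and the first paragraph is abbreviated.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample. The SECTION wrapper and SECTNO element name are
# assumptions made for this sketch, not taken from the XSD.
SAMPLE = """
<SECTION>
  <SECTNO>§ 275.203(m)-1</SECTNO>
  <SUBJECT>Private fund adviser exemption.</SUBJECT>
  <P>(a) <E T="03">United States investment advisers.</E> For purposes of section 203(m) of the Act ...</P>
  <P>(1) Acts solely as an investment adviser to one or more qualifying private funds; and</P>
  <P>(2) Manages private fund assets of less than $150 million.</P>
</SECTION>
"""


def read_section(xml_text):
    """Return the section number, subject, and paragraph texts of one SECTION."""
    section = ET.fromstring(xml_text.strip())
    return {
        "number": section.findtext("SECTNO"),
        "subject": section.findtext("SUBJECT"),
        # itertext() flattens inline markup such as <E> into plain text.
        "paragraphs": ["".join(p.itertext()).strip() for p in section.findall("P")],
    }


section = read_section(SAMPLE)
print(section["number"], section["subject"], len(section["paragraphs"]))
```

Flattening the inline <E> emphasis element keeps each paragraph as one string, which is usually enough for a first inspection of the text.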
Generating OWL classes from CFR XML
TopBraid Composer Maestro Edition (TBC) is our main ontology editor.
- First we import the XSD-Schema.
There are various options to customize the import. The tool creates an OWL file, CFRMergedXML.ttl, with classes for the XSD elements.
- The second step is to open CFR-2012-title17-vol3-part275.xml with TBC’s “Semantic XML”. This shows instances of the classes with the actual XML text fragments. Note that this is only a semantic rendition. The underlying file is still XML.
- Finally we export the graph to save it as OWL. This is for convenience and performance, so we don’t have to repeat the import steps.
Validating the generated OWL classes
<SUBJECT>Private fund adviser exemption.</SUBJECT>
<P>(1) Acts solely as an investment adviser to one or more qualifying private funds; and</P>
<P>(2) Manages private fund assets of less than $150 million.</P>
•Above, our Private Fund Exemption rule section. Below, the generated OWL.
•The XSD elements became classes. The XML content has been generated as instances of the schema classes.
•The namespace prefix fo-fr-cfr stands for Fund Ontology – Fund Regulation – Code of Federal Regulations.
•We define a parent class, CFRpart275, and make all generated classes subclasses of it. The list to the right shows the 50 generated classes and their number of instances. We should consider this a set of dumb Staging classes rather than ontology design. The core classes are section with 44 instances and paragraph with 757 instances. Most of the other classes become data properties.
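The instance counts can be cross-checked independently of the import tool by counting element names in the source XML. The following is a minimal sketch using only the Python standard library. The tiny sample document and its PART wrapper are made up for illustration, and the check assumes that the import creates exactly one instance per XML element.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(xml_text):
    """Count how often each element name occurs in an XML document."""
    root = ET.fromstring(xml_text)
    # iter() walks the root and all of its descendants.
    return Counter(element.tag for element in root.iter())


# Made-up sample; the real input would be the part 275 XML file.
SAMPLE = (
    "<PART>"
    "<SECTION><SUBJECT>S1</SUBJECT><P>a</P><P>b</P></SECTION>"
    "<SECTION><SUBJECT>S2</SUBJECT><P>c</P></SECTION>"
    "</PART>"
)

counts = count_elements(SAMPLE)
for tag, number in counts.most_common():
    print(tag, number)
```

Run against the real file, the counts for the section and paragraph elements would be expected to line up with the 44 and 757 instances listed above, if the one-instance-per-element assumption holds.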
Validating SECTION § 275.203(m)-1
The TopBraid Composer Resource form shows the properties of our “Private fund adviser exemption” generated as an instance of SECTION.
The import generates “composite:” properties for nested elements in the XML:
•“composite:index” is the data property for the element position
The paragraph in turn has a composite child for the actual text, instantiated as “sxml:TextNode”.
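The shape of this generated structure can be approximated in a few lines of Python. This is a sketch of the idea only, not TBC's actual import algorithm: the node identifiers, the "composite:child" property name, the "text" property, the TEXT_CLASS stand-in, and the zero-based index are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import count

TEXT_CLASS = "TextNode"  # hypothetical stand-in for the generated text class


def to_staging_triples(xml_text):
    """Flatten an XML tree into (subject, predicate, object) triples."""
    ids = count(1)
    triples = []

    def add_node(node_type, parent, index):
        node = f"n{next(ids)}"
        triples.append((node, "rdf:type", node_type))
        if parent is not None:
            triples.append((parent, "composite:child", node))
            triples.append((node, "composite:index", index))
        return node

    def add_text(text, parent, index):
        """Add a text child if the text is non-blank; return the next free index."""
        if text and text.strip():
            node = add_node(TEXT_CLASS, parent, index)
            triples.append((node, "text", text.strip()))
            return index + 1
        return index

    def visit(element, parent, index):
        node = add_node(element.tag, parent, index)
        position = add_text(element.text, node, 0)
        for child in element:
            visit(child, node, position)
            # Text that follows a child element also becomes a text child.
            position = add_text(child.tail, node, position + 1)

    visit(ET.fromstring(xml_text), None, 0)
    return triples


triples = to_staging_triples(
    "<SECTION><SUBJECT>Private fund adviser exemption.</SUBJECT>"
    "<P>(1) Acts solely ...</P></SECTION>"
)
for triple in triples:
    print(triple)
```

Every element and every text fragment becomes its own node, so even this two-element section produces five nodes linked only by generic child and index properties.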
To use the data warehousing ETL analogy, this is just an Extract into simple Staging classes rather than ontology design. There are no semantics, and querying the data with SPARQL is hard and error-prone. We need to Transform the structure into a true ontology design.
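To make the querying problem concrete, the sketch below looks up a section's subject and paragraphs in a hand-written set of staging triples and copies them into a shape with direct, named properties. It is only an illustration in plain Python rather than SPARQL: the triples, the node identifiers, the "composite:child" and "text" property names, and the Section class are hypothetical, and the target shape is not the ontology design itself.

```python
from dataclasses import dataclass, field

# Hand-written staging triples for one section (hypothetical identifiers).
STAGING = [
    ("s1", "rdf:type", "SECTION"),
    ("s1", "composite:child", "p2"),
    ("s1", "composite:child", "sub1"),
    ("s1", "composite:child", "p1"),
    ("sub1", "rdf:type", "SUBJECT"),
    ("sub1", "composite:index", 0),
    ("sub1", "composite:child", "t1"),
    ("t1", "text", "Private fund adviser exemption."),
    ("p1", "rdf:type", "P"),
    ("p1", "composite:index", 1),
    ("p1", "composite:child", "t2"),
    ("t2", "text", "(1) Acts solely ..."),
    ("p2", "rdf:type", "P"),
    ("p2", "composite:index", 2),
    ("p2", "composite:child", "t3"),
    ("t3", "text", "(2) Manages private fund assets ..."),
]


def objects(subject, predicate):
    """All objects of triples matching the subject and predicate."""
    return [o for s, p, o in STAGING if s == subject and p == predicate]


def child_texts(parent, child_type):
    """Texts under children of a given type, ordered by composite:index."""
    children = [c for c in objects(parent, "composite:child")
                if child_type in objects(c, "rdf:type")]
    children.sort(key=lambda c: objects(c, "composite:index")[0])
    # Two more hops: child element -> text node -> text value.
    return [text
            for c in children
            for t in objects(c, "composite:child")
            for text in objects(t, "text")]


@dataclass
class Section:
    """Hypothetical target shape with direct, named properties."""
    subject: str
    paragraphs: list = field(default_factory=list)


section = Section(subject=child_texts("s1", "SUBJECT")[0],
                  paragraphs=child_texts("s1", "P"))
print(section)
```

Each lookup against the staging data needs a type filter, an explicit sort on the index, and two further hops to reach the text, whereas the transformed shape exposes the subject and the ordered paragraphs directly.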