Code of Federal Regulations (CFR)

 Structure of the CFR XML & XSD

The GPO website provides the XSD schema and documentation:

  • CFRMergedXML.xsd
  • CFR-XML_User-Guide_v1.pdf

The XSD schema is generic for all Federal Regulations. It is a good example of the Semantic Web layers discussed previously: we get the machine-readable syntax, but no semantics for the particular regulation.

The header contains the identifying information about the document. The regulation text is structured in sections, each with a section number (including the paragraph reference), the subject of the section, and the individual paragraphs.

<SECTION>

<SECTNO>§ 275.203(m)-1</SECTNO>

<SUBJECT>Private fund adviser exemption.</SUBJECT>

<P>(a) <E T="03">United States investment advisers.</E> For purposes of section 203(m) of the Act (15 U.S.C. 80b-3(m)), an investment adviser with its principal office and place of business in the United States is exempt from the requirement to register under section 203 of the Act if the investment adviser:</P>

<P>(1) Acts solely as an investment adviser to one or more qualifying private funds; and</P>

<P>(2) Manages private fund assets of less than $150 million.</P>
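To make the section layout concrete, here is a minimal Python sketch that reads the section number, subject and paragraphs out of one SECTION element using only the standard library. The sample string is an abridged copy of the fragment above, and the function name `read_section` is a hypothetical helper, not part of any GPO tooling.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the SECTION fragment shown above (paragraph (a) shortened).
SAMPLE = """<SECTION>
<SECTNO>§ 275.203(m)-1</SECTNO>
<SUBJECT>Private fund adviser exemption.</SUBJECT>
<P>(a) <E T="03">United States investment advisers.</E> For purposes of section 203(m) of the Act, an investment adviser is exempt if the investment adviser:</P>
<P>(1) Acts solely as an investment adviser to one or more qualifying private funds; and</P>
<P>(2) Manages private fund assets of less than $150 million.</P>
</SECTION>"""


def read_section(xml_text):
    """Return the section number, subject and paragraph texts of one SECTION."""
    section = ET.fromstring(xml_text)
    return {
        "sectno": section.findtext("SECTNO", default="").strip(),
        "subject": section.findtext("SUBJECT", default="").strip(),
        # itertext() flattens inline markup such as <E> into plain text.
        "paragraphs": ["".join(p.itertext()).strip() for p in section.findall("P")],
    }


section = read_section(SAMPLE)
print(section["sectno"], "-", section["subject"])
for paragraph in section["paragraphs"]:
    print("  ", paragraph[:60])
```

The sketch only recovers the syntax: nothing in the markup says that paragraphs (1) and (2) are conditions of paragraph (a), which is exactly the missing semantics noted above.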


Generating OWL classes from CFR XML

TopBraid Composer Maestro Edition (TBC) is our main ontology editor.

  1. First, we import the XSD-Schema.
CFR XML import options

There are various options to customize the import. The tool creates an OWL file, CFRMergedXML.ttl, with classes for the XSD elements.

  2. The second step is to open CFR-2012-title17-vol3-part275.xml with TBC’s “Semantic XML”. This shows instances of the classes with the actual XML text fragments. Note that this is only a semantic rendition; the underlying file is still XML.
  3. Finally, we export the graph and save it as OWL. This is for convenience and performance, so we don’t have to repeat the import steps.

CFR files in the ontology editor
CFR XML schema definition saved as ontology file.
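As a rough illustration of the first import step, the sketch below lists the element names declared in an XSD and prints one class declaration per element in Turtle-like form. This is not how TopBraid Composer works internally; it only shows the idea of "one class per XSD element". The tiny inline schema, the helper names `element_names` and `as_turtle`, and the use of the fo-fr-cfr prefix for the output are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Invented miniature schema; the real input would be CFRMergedXML.xsd.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="SECTION"/>
  <xs:element name="SECTNO" type="xs:string"/>
  <xs:element name="SUBJECT" type="xs:string"/>
  <xs:element name="P"/>
</xs:schema>"""


def element_names(xsd_text):
    """Return the sorted names of all named xs:element declarations."""
    root = ET.fromstring(xsd_text)
    names = {el.get("name") for el in root.iter(f"{XS}element") if el.get("name")}
    return sorted(names)


def as_turtle(names, prefix="fo-fr-cfr"):
    """Render one owl:Class statement per element name (prefix is illustrative)."""
    return "\n".join(f"{prefix}:{name} a owl:Class ." for name in names)


names = element_names(SAMPLE_XSD)
print(as_turtle(names))
```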

Validating the generated OWL classes

<SECTION>

<SECTNO>§ 275.203(m)-1</SECTNO>

<SUBJECT>Private fund adviser exemption.</SUBJECT>

<P>(1) Acts solely as an investment adviser to one or more qualifying private funds; and</P>

<P>(2) Manages private fund assets of less than $150 million.</P>

Above is our Private Fund Exemption rule section; below is the generated OWL. The XSD elements became classes, and the XML content has been generated as instances of the schema classes. The namespace prefix fo-fr-cfr stands for Fund Ontology – Fund Regulation – Code of Federal Regulations.

We define a parent class, CFRpart275, and make all generated classes subclasses of it. The list to the right shows the 50 generated classes and their number of instances. We should consider this a set of dumb Staging classes rather than ontology design. The core classes are section, with 44 instances, and paragraph, with 757 instances. Most of the other classes become data properties.
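One simple way to sanity-check instance counts like these is to count element occurrences directly in the source XML. The sketch below does this with the Python standard library on an invented miniature document; the `<PART>` wrapper, the sample section numbers and the helper name `count_elements` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented miniature document; the real input would be the part 275 XML file.
SAMPLE = """<PART>
<SECTION><SECTNO>§ 0.1</SECTNO><SUBJECT>First sample section.</SUBJECT>
<P>(a) First paragraph.</P><P>(b) Second paragraph.</P></SECTION>
<SECTION><SECTNO>§ 0.2</SECTNO><SUBJECT>Second sample section.</SUBJECT>
<P>(a) Only paragraph.</P></SECTION>
</PART>"""


def count_elements(root):
    """Count occurrences of each element name at and below root."""
    return Counter(element.tag for element in root.iter())


counts = count_elements(ET.fromstring(SAMPLE))
for tag, n in counts.most_common():
    print(f"{tag:10} {n}")
```

For the real file, `ET.parse(path).getroot()` would replace the inline sample. One would expect the SECTION and P counts to be comparable to the instance counts reported in the editor, although the chosen import options may cause differences.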


Validating SECTION § 275.203(m)-1

The TopBraid Composer Resource form shows the properties of our “Private fund adviser exemption” generated as an instance of SECTION.

CFR section Private Fund Exemption resource form

The import generates “composite:” properties for nested elements in the XML:

  • “composite:child” is the object property generated for all XML groups and elements within the SECTION: section number, subject, note, citation and many paragraphs.
  • “composite:index” is the data property for the element position.

CFR paragraph Private Fund Exemption resource form

The paragraph in turn has a composite child for the actual text, instantiated as “sxml:TextNode”.
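The shape of this staging graph can be sketched in a few lines of Python. The code walks an XML fragment and emits (subject, predicate, object) tuples using the property names mentioned above. It is not the TopBraid import itself: the `ex:` node identifiers, the zero-based index, the `sxml:text` property name and the helper `to_triples` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from itertools import count

SAMPLE = (
    "<SECTION><SUBJECT>Private fund adviser exemption.</SUBJECT>"
    "<P>(2) Manages private fund assets of less than $150 million.</P></SECTION>"
)


def to_triples(xml_text):
    """Emit staging-style triples: one node per element and per text fragment."""
    counter = count(1)
    triples = []

    def add_child(parent, child, index):
        triples.append((parent, "composite:child", child))
        triples.append((child, "composite:index", index))

    def add_text(parent, text, index):
        """Attach non-empty text as a text node; return the next free index."""
        if text and text.strip():
            node = f"ex:TextNode_{next(counter)}"
            triples.append((node, "rdf:type", "sxml:TextNode"))
            triples.append((node, "sxml:text", text.strip()))
            add_child(parent, node, index)
            return index + 1
        return index

    def visit(element):
        node = f"ex:{element.tag}_{next(counter)}"
        triples.append((node, "rdf:type", f"fo-fr-cfr:{element.tag}"))
        index = add_text(node, element.text, 0)
        for child in element:
            add_child(node, visit(child), index)
            index = add_text(node, child.tail, index + 1)
        return node

    visit(ET.fromstring(xml_text))
    return triples


triples = to_triples(SAMPLE)
for triple in triples:
    print(triple)
```

Even in this toy form, reaching the text of a paragraph means hopping from the section to a child, checking its type and position, and then hopping again to a text node. A query over such a graph has to spell out every one of those hops.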

To use the data warehousing ETL analogy, this is just an Extract into simple Staging classes rather than ontology design. There are no semantics, and it is hard and error-prone to query the data with SPARQL.

We need to Transform the structure into true ontology design.