Code of Federal Regulations (CFR)

 Structure of the CFR XML & XSD

The GPO website provides the XSD-Schema


The XSD schema is generic for all federal regulations. It is a good example of the Semantic Web layers discussed previously: we get the machine-readable syntax, but no semantics for the particular regulation.

The header contains identifying information about the document.

The regulation text is structured in sections.





Section number with the paragraph reference
Subject of the section
And individual paragraphs
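
As a rough illustration of the three parts listed above, the following Python sketch reads them from a single section using only the standard library. The element names SECTION, SECTNO, SUBJECT and P are taken from the excerpts in this article; the fragment itself is abbreviated and the function name `read_section` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Abbreviated, illustrative SECTION fragment modeled on § 275.203(m)-1.
SECTION_XML = """
<SECTION>
  <SECTNO>§ 275.203(m)-1</SECTNO>
  <SUBJECT>Private fund adviser exemption.</SUBJECT>
  <P>(1) Acts solely as an investment adviser to one or more qualifying private funds; and</P>
  <P>(2) Manages private fund assets of less than $150 million.</P>
</SECTION>
"""


def read_section(xml_text):
    """Return the section number, subject, and paragraph texts of one SECTION."""
    root = ET.fromstring(xml_text)
    return {
        "number": root.findtext("SECTNO", default="").strip(),
        "subject": root.findtext("SUBJECT", default="").strip(),
        # itertext() also collects text nested in inline markup such as <E>.
        "paragraphs": ["".join(p.itertext()).strip() for p in root.findall("P")],
    }


section = read_section(SECTION_XML)
print(section["number"], section["subject"])
for paragraph in section["paragraphs"]:
    print("  ", paragraph)
```

This only recovers the syntax of the section. Nothing in the result says what an exemption or a qualifying private fund is.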



Private fund adviser exemption

<P>(a) <E T="03">United States investment advisers.</E> For purposes of section 203(m) of the Act (15 U.S.C. 80b-3(m)), an investment adviser with its principal office and place of business in the United States is exempt from the requirement to register under section 203 of the Act if the investment adviser:</P>

Acts solely as an investment adviser to one or more qualifying private funds;

Manages private fund assets of less than $150 million.

Generating OWL classes from CFR XML

TopBraid Composer Maestro Edition (TBC) is our main ontology editor.

  1. First we import the XSD-Schema.
    CFR XML import options

    XML import options

    There are various options to customize the import. The tool creates an OWL file, CFRMergedXML.ttl, with classes for the XSD elements.

  2. The second step is to open CFR-2012-title17-vol3-part275.xml with TBC's “Semantic XML”. This shows instances of the classes with the actual XML text fragments. Note that this is only a semantic rendition; the underlying file is still XML.
  3. Finally, we export the graph and save it as OWL. This is for convenience and performance, so we don't have to repeat the import steps.
    CFR files in the ontology editor

    CFR XML schema definition saved as ontology file.

    Validating the generated OWL classes


    <SECTNO>§ 275.203(m)-1</SECTNO>

    <SUBJECT>Private fund adviser exemption.</SUBJECT>

    <P>(1) Acts solely as an investment adviser to one or more qualifying private funds; and</P>

    <P>(2) Manages private fund assets of less than $150 million.</P>

    Above is our Private Fund Exemption rule section; below is the generated OWL.
    The XSD elements became classes, and the XML content was generated as instances of those schema classes.
    The namespace prefix fo-fr-cfr stands for Fund Ontology, Fund Regulation, Code of Federal Regulations.

    We define a parent class, CFRpart275, and make all generated classes subclasses of it. The list to the right shows the 50 generated classes and their number of instances. We should consider this a set of dumb staging classes rather than ontology design. The core classes are section, with 44 instances, and paragraph, with 757 instances. Most of the other classes become data properties.
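
One way to cross-check instance counts like these outside the editor is to count element names directly in the XML. The sketch below assumes that the import creates one instance per element occurrence, which this article does not confirm. The sample document and the PART wrapper element are made up for illustration; the real input would be the part 275 XML file.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(root):
    """Count how often each element name occurs in the tree, root included."""
    return Counter(element.tag for element in root.iter())


# Hypothetical miniature document standing in for the real CFR file.
SAMPLE = """
<PART>
  <SECTION><SECTNO>§ 1</SECTNO><SUBJECT>First.</SUBJECT><P>(a) One.</P><P>(b) Two.</P></SECTION>
  <SECTION><SECTNO>§ 2</SECTNO><SUBJECT>Second.</SUBJECT><P>(a) Three.</P></SECTION>
</PART>
"""

counts = count_elements(ET.fromstring(SAMPLE))
for tag, n in counts.most_common():
    print(f"{tag:10} {n}")
```

For the real file, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`. If the tool's counts differ from the element counts, that would point to import options that merge or skip elements.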

Validating SECTION § 275.203(m)-1

The TopBraid Composer Resource form shows the properties of our “Private fund adviser exemption” generated as an instance of SECTION.

CFR section Private Fund Exemption resource form

The import generates “composite:” properties for nested elements in the XML:

“composite:child” is the object property generated for all XML groups and elements within the SECTION: section number, subject, note, citation, and many paragraphs.

“composite:index” is the data property for the element position.
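
To make the shape of this staging data concrete, here is a minimal Python sketch that flattens an XML fragment into triples using the two property names above. It is not TopBraid's import algorithm. The instance naming scheme, the zero-based index, and the function name `stage` are assumptions for illustration, and text nodes are left out for brevity.

```python
import xml.etree.ElementTree as ET
from itertools import count


def stage(root):
    """Flatten an XML tree into (subject, predicate, object) staging triples."""
    ids = count(1)
    triples = []

    def visit(element):
        node = f"{element.tag}_{next(ids)}"  # hypothetical instance name
        triples.append((node, "rdf:type", element.tag))
        # Position among siblings; zero-based numbering is an assumption.
        for index, child in enumerate(element):
            child_node = visit(child)
            triples.append((node, "composite:child", child_node))
            triples.append((child_node, "composite:index", index))
        return node

    visit(root)
    return triples


XML_TEXT = """
<SECTION>
  <SECTNO>§ 275.203(m)-1</SECTNO>
  <SUBJECT>Private fund adviser exemption.</SUBJECT>
  <P>(1) Acts solely as an investment adviser to one or more qualifying private funds; and</P>
</SECTION>
"""

triples = stage(ET.fromstring(XML_TEXT))
for triple in triples:
    print(triple)
```

Every relationship in the output is the same generic child link plus a position, so a query has to know which position holds what before it can ask anything about the rule itself.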


CFR paragraph Private Fund Exemption resource form

The paragraph in turn has a composite child for the actual text instantiated in “sxml:TextNote”.

To use the data warehousing ETL analogy, this is just an Extract into simple staging classes rather than ontology design. There are no semantics, and querying the data with SPARQL is hard and error-prone.

We need to Transform the structure into a true ontology design.