U.S. patent application number 11/413051 was filed with the patent office on 2006-04-27 and published on 2007-02-15 for method and system for an architecture for the processing of structured documents.
Invention is credited to Rakesh S. Bhakta, Daniel M. Cermak, Russell Davoli, John E. Derrick, Bryan Dobbs, Clifford L. Hall, Udi Kalekin, Howard Liu, Avinash C. Palaniswamy, Richard Trujillo.
Application Number | 20070038930 11/413051 |
Document ID | / |
Family ID | 37215515 |
Publication Date | 2007-02-15 |
United States Patent
Application |
20070038930 |
Kind Code |
A1 |
Derrick; John E. ; et
al. |
February 15, 2007 |
Method and system for an architecture for the processing of
structured documents
Abstract
Embodiments of systems, methods and apparatuses for an
architecture for the processing of structured documents are
disclosed. More specifically, embodiments of the architecture may
comprise hardware circuitry operable to parse a structured document
and transform the document according to a set of transformation
instructions to produce an output document.
Inventors: |
Derrick; John E.; (Austin,
TX) ; Trujillo; Richard; (Austin, TX) ;
Cermak; Daniel M.; (Austin, TX) ; Dobbs; Bryan;
(Round Rock, TX) ; Liu; Howard; (Plano, TX)
; Bhakta; Rakesh S.; (Austin, TX) ; Kalekin;
Udi; (Austin, TX) ; Davoli; Russell; (Austin,
TX) ; Hall; Clifford L.; (Austin, TX) ;
Palaniswamy; Avinash C.; (Austin, TX) |
Correspondence
Address: |
SPRINKLE IP LAW GROUP
1301 W. 25TH STREET
SUITE 408
AUSTIN
TX
78705
US
|
Family ID: |
37215515 |
Appl. No.: |
11/413051 |
Filed: |
April 27, 2006 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
60675349 | Apr 27, 2005 | |
60675347 | Apr 27, 2005 | |
60675167 | Apr 27, 2005 | |
60675115 | Apr 27, 2005 | |
Current U.S.
Class: |
715/236 |
Current CPC
Class: |
G06F 40/221 20200101;
G06F 40/143 20200101; G06F 40/154 20200101 |
Class at
Publication: |
715/523 ;
715/513 |
International
Class: |
G06F 17/00 20060101
G06F017/00 |
Claims
1. An apparatus, comprising a parser circuit operable to parse a
structured document to create a first set of data structures; a
pattern expression processor circuit operable to create a second
set of data structures based on an output of the parser circuit; a
transformation engine circuit operable to generate a set of results
corresponding with an output document utilizing the first data
structure or second data structure, wherein the output document
corresponds to a transformation of the structured document
according to a set of transformation instructions; and an output
generator circuit, operable to create a set of output document
structures, associate the set of results generated by the
transformation engine with the set of output data structures and
assemble the output document from the output data structures.
2. The apparatus of claim 1, wherein the transformation engine
circuit is operable to execute a set of instructions generated from
the transformation instructions.
3. The apparatus of claim 2, comprising a host interface circuit
wherein the host interface circuit is operable to receive the
structured document and pass the structured document to the
parser.
4. The apparatus of claim 3, comprising a memory interface coupled
to each of the parser circuit, the pattern expression processor
circuit, the transformation engine circuit and the output generator
circuit, and operable to interface between each of the parser
circuit, the pattern expression processor circuit, the
transformation engine circuit, the output generator circuit and a
memory.
5. The apparatus of claim 4, wherein the first data structure, the
second data structure and the set of instructions are in the
memory.
6. The apparatus of claim 5, comprising a first bus coupled to each
of the parser circuit, the pattern expression processor circuit,
the transformation engine circuit and the output generator
circuit.
7. The apparatus of claim 6, comprising a second bus coupling the
parser to the pattern expression processor.
8. The apparatus of claim 7, comprising a bus coupling the
transformation engine to the output generator circuit.
9. A system, comprising: a compiler operable to generate a set of
instructions from a set of transformation instructions
corresponding to a structured document; and a document processor
circuit operable to execute the set of instructions to generate an
output document corresponding to a transformation of the structured
document according to a set of transformation instructions, the
document processor circuit comprising: a parser circuit operable to
parse the structured document to create a first set of data
structures; a pattern expression processor circuit operable to
create a second set of data structures based on an output of the
parser circuit; a transformation engine circuit operable to execute
the set of instructions to generate a set of results corresponding
with the output document utilizing the first data structure or
second data structure; and an output generator circuit, operable to
assemble the output document from the set of results.
10. The system of claim 9, wherein the transformation engine
circuit executes the set of instructions utilizing the first set of
data structures or second set of data structures.
11. The system of claim 10, wherein the document processor circuit
comprises a host interface circuit wherein the host interface
circuit is operable to receive a structured document and pass the
structured document to the parser.
12. The system of claim 11, further comprising a memory, wherein
the document processor circuit comprises a memory interface coupled
to each of the parser circuit, the pattern expression processor
circuit, the transformation engine circuit and the output generator
circuit, and operable to interface between each of the parser
circuit, the pattern expression processor circuit, the
transformation engine circuit, the output generator circuit and the
memory.
13. The apparatus of claim 12, wherein the first data structure,
the second data structure and the set of instructions are in the
memory.
14. The system of claim 13, wherein the document processor circuit
comprises a first bus coupled to each of the parser circuit, the
pattern expression processor circuit, the transformation engine
circuit and the output generator circuit.
15. The system of claim 14, wherein the document processor circuit
comprises a second bus coupling the parser to the pattern
expression processor.
16. The system of claim 15, wherein the document processor circuit
comprises a third bus coupling the transformation engine to the
output generator circuit.
17. A method, comprising: parsing a first structured document to
create a first data structure representative of the first
structured document; generating a second set of data structures,
each of the second set of data structures comprising a related set
of data associated with the first structured document; executing a
first set of instructions to generate a first set of results
associated with a first output document corresponding to a
transformation of the first structured document according to a
first set of transformation instructions, wherein the first set of
instructions were generated from the first set of transformation
instructions; and generating a first output document from the first
set of results, wherein generating the first output document
comprises assembling the first set of results in an order
corresponding to the first output document.
18. The method of claim 17, comprising: generating a first set of
output data structures; and associating the first set of results
with the first output data structures.
19. The method of claim 18, wherein executing the first set of
instructions further comprises accessing the first data structure
and the second set of data structures.
20. The method of claim 19, wherein generating the first output
document comprises formatting the set of results according to a
type of the first output document.
21. The method of claim 17, comprising executing a second set of
instructions to generate a second set of results associated with a
second output document corresponding to a transformation of the
second structured document according to a second set of
transformation instructions, wherein the second set of instructions
were generated from the first set of transformation instructions,
and the second set of instructions is executed substantially
simultaneously with the first set of instructions.
22. The method of claim 21, comprising generating a second output
document from the second set of results, wherein generating the
second output document comprises assembling the second set of
results in an order corresponding to the second output document,
and the second output document is generated substantially
simultaneously with the second.
23. The method of claim 22, wherein the first output document and
the second output document are output substantially as they are
generated.
Description
RELATED APPLICATIONS
[0001] This application claims a benefit of priority under 35
U.S.C. § 119(e) to U.S. Provisional Patent Application Nos.
60/675,349, by inventors Howard Tsoi, Daniel Cermak, Richard
Trujillo, Trenton Grale, Robert Corley, Bryan Dobbs and Russell
Davoli, entitled "Output Generator for Use with System for Creation
of Multiple, Hierarchical Documents", filed on Apr. 27, 2005;
60/675,347, by inventors Daniel Cermak, Howard Tsoi, John Derrick,
Richard Trujillo, Udi Kalekin, Bryan Dobbs, Ying Tong, Brendon
Cahoon and Jack Matheson, entitled "Transformation Engine for Use
with System for Creation of Multiple, Hierarchical Documents",
filed on Apr. 27, 2005; 60/675,167, by inventors Richard Trujillo,
Bryan Dobbs, Rakesh Bhakta, Howard Tsoi, Jack Randall, Howard Liu,
Yongjian Zhou and Daniel Cermak, entitled "Parser for Use with
System for Creation of Multiple, Hierarchical Documents", filed on
Apr. 27, 2005 and 60/675,115, by inventors John Derrick, Richard
Trujillo, Daniel Cermak, Bryan Dobbs, Howard Liu, Rakesh Bhakta,
Udi Kalekin, Russell Davoli, Clifford Hall and Avinash Palaniswamy,
entitled "General Architecture for a System for Creation of
Multiple, Hierarchical Documents", filed on Apr. 27, 2005, the
entire contents of which are hereby expressly incorporated by
reference for all purposes.
TECHNICAL FIELD OF THE INVENTION
[0002] The invention relates in general to methods and systems for
processing structured documents, and more particularly, to the
design and implementation of efficient architectures for the
processing, transformation or rendering of structured
documents.
BACKGROUND OF THE INVENTION
[0003] Electronic data, entertainment and communications
technologies are growing increasingly prevalent with each passing
day. In the past, the vast majority of these electronic documents
were in a proprietary format. In other words, a particular
electronic document could only be processed or understood by the
application that created that document. Up until relatively
recently this has not been especially troublesome.
[0004] This situation became progressively more problematic with
the advent of networking technologies, however. These networking
technologies allowed electronic documents to be communicated
between different and varying devices, and as these network
technologies blossomed, so did users' desires to use these
networked devices to share electronic data.
[0005] Much to the annoyance of many users, however, the
proprietary formats of the majority of these electronic documents
prevented them from being shared between different platforms: if a
document was created by one type of platform it usually could not
be processed, or rendered, by another type of platform.
[0006] To that end, data began to be placed in structured
documents. Structured documents may be loosely defined as any type
of document that adheres to a set of rules. Because the structured
document conforms to a set of rules it enables the cross-platform
distribution of data, as an application or platform may process or
render a structured document based on the set of rules, no matter
the application that originally created the structured
document.
[0007] The use of structured documents to facilitate the
cross-platform distribution of data is not without its own set of
problems, however. In particular, in many cases the structured
document does not itself define how the data it contains is to be
rendered, for example for presentation to a user. Exacerbating the
problem is the size of many of these structured documents. To
facilitate the organization of data intended for generic
consumption these structured documents may contain a great deal of
meta-data, and thus may be larger than similar proprietary
documents, in some cases up to twenty times larger or more.
[0008] In many cases, instructions may be provided for how to
transform or render a particular structured document. For example,
one mechanism implemented as a means to facilitate processing XML
is the extensible stylesheet language (XSL) and stylesheets written
using XSL. Stylesheets may be written to transform XML documents
from one markup definition (or "vocabulary") defined within XML to
another vocabulary, from XML markup to another structured or
unstructured document form (such as plain text, word processor,
spreadsheet, database, pdf, HTML, etc.), or from another structured
or unstructured document form to XML markup. Thus, stylesheets may
be used to transform a document's structure from its original form
to a form expected by a given user (output form).
[0009] Typically, structured documents are transformed or rendered
with one or more software applications. However, as many
definitions for these structured languages were designed and
implemented without taking into account conciseness or efficiency
of parsing and transformation, the use of software applications to
transform or render these structured documents may be prohibitively
inefficient.
[0010] Thus, as can be seen, there is a need for methods and
systems for an architecture for the efficient processing of
structured documents.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The drawings accompanying and forming part of this
specification are included to depict certain aspects of the
invention. A clearer impression of the invention, and of the
components and operation of systems provided with the invention,
will become more readily apparent by referring to the exemplary,
and therefore nonlimiting, embodiments illustrated in the drawings,
wherein identical reference numerals designate the same components.
Note that the features illustrated in the drawings are not
necessarily drawn to scale.
[0012] FIG. 1 depicts an embodiment of an architecture for the
implementation of web services.
[0013] FIG. 2 depicts one embodiment of the processing of
structured documents using a document processor.
[0014] FIG. 3 depicts one embodiment of an architecture for a
device for the processing of structured documents.
[0015] FIG. 4 depicts one embodiment of an architecture for the
processing of structured documents utilizing an embodiment of the
device depicted in FIG. 3.
DETAILED DESCRIPTION
[0016] Embodiments of the invention and the various features and
advantageous details thereof are explained more fully with
reference to the nonlimiting embodiments that are illustrated in
the accompanying drawings and detailed in the following
description. Descriptions of well known starting materials,
processing techniques, components and equipment are omitted so as
not to unnecessarily obscure the invention in detail. Skilled
artisans should understand, however, that the detailed description
and the specific examples, while disclosing preferred embodiments
of the invention, are given by way of illustration only and not by
way of limitation. Various substitutions, modifications, additions
or rearrangements within the scope of the underlying inventive
concept(s) will become apparent to those skilled in the art after
reading this disclosure.
[0017] Reference is now made in detail to the exemplary embodiments
of the invention, examples of which are illustrated in the
accompanying drawings. Wherever possible, the same reference
numbers will be used throughout the drawings to refer to the same
or like parts (elements).
[0018] Before describing embodiments of the present invention it
may be useful to describe an exemplary architecture for a web
service. Although web services are known in the art, a description
of such an architecture may be helpful in better explaining the
embodiments of the invention depicted herein.
[0019] FIG. 1 depicts an embodiment of one such architecture for
implementing a web service. Typically, web services provide a
standard means of interoperating between different software
applications running on a variety of platforms and/or frameworks. A
web service provider 110 may provide a set of web services 112.
Each web service 112 may have a described interface, such that a
requestor may interact with the web service 112 according to that
interface.
[0020] For example, a user at a remote machine 120 may wish to use
a web service 112 provided by web service provider 110. To that end
the user may use a requester agent to communicate message 130 to a
service agent associated with the desired web service 112, where
the message is in a format prescribed by the definition of the
interface of the desired web service 112. In many cases, the
definition of the interface describes the message formats, data
types, transport protocols, etc. that are to be used between a
requester agent and a provider agent.
[0021] The message 130 may comprise data to be operated on by the
requested web service 112. More particularly, message 130 may
comprise a structured document and instructions for transforming
the structured document. For example, message 130 may be a SOAP
(i.e. Simple Object Access Protocol) message comprising an
eXtensible Markup Language (XML) document and an XSL Transformation
(XSLT) stylesheet associated with the XML document. It should be
noted that, in some cases, transformation instructions (e.g. a DTD,
schema, or stylesheet) may be embedded in a structured document,
for example, either directly or as a pointer. In such cases the
transformation instructions may be extracted from the document
before being utilized in any subsequent method or process.
[0022] Thus, in some cases the provider agent associated with a
particular web service 112 may receive message 130; web service 112
may process the structured document of message 130 according to the
instructions for transforming the structured document included in
message 130; and the result 140 of the transformation may be
returned to the requester agent.
[0023] In some cases, many structured documents may be sent to a
particular web service 112 with one set of transformation
instructions, so that each of these documents may be transformed
according to the identical set of instructions. Conversely, one
structured document may be sent to a particular web service 112
with multiple sets of transformation instructions to be applied to
the structured document.
[0024] Hence, as can be seen from this brief overview of the
architecture for implementing web services 112, it may be highly
desired to process these structured documents as efficiently as
possible such that web services 112 may be used on many data sets
and large data sets without creating a bottleneck during the
processing of the structured documents and processing resources of
web service provider 110 may be effectively utilized.
[0025] Attention is now directed to embodiments of systems, methods
and apparatuses for a general architecture for the efficient
transformation or processing of structured documents. Embodiments
of the present invention may allow a transformation to be performed
on a structured document according to transformation instructions.
To this end, embodiments of the architecture may comprise logical
components including a parser, a pattern expression processor, a
transformation engine and an output generator, one or more of which
may be implemented in hardware circuitry, for example a hardware
processing device such as an Application Specific Integrated
Circuit (ASIC) which comprises all the above mentioned logical
components.
[0026] More particularly, embodiments of the invention may compile
the transformation instructions to create instruction code and a
set of data structures. The parser parses the structured document
associated with the transformation instructions to generate
structures representative of the structured document. The pattern
expression processor (PEP) identifies data in the structured
document corresponding to definitions in the transformation
instructions. The transformation engine transforms the parsed
document or identified data according to the transformation
instructions and the output generator assembles this transformed
data into an output document.
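The flow of stages described above can be sketched in software. The following is a minimal illustration only, assuming Python's standard `xml.etree.ElementTree` module as a stand-in for the parser and its limited path syntax as a stand-in for pattern expressions. The function names (`compile_instructions`, `match_patterns`, `transform`, `generate_output`) and the sample templates are hypothetical and do not describe the hardware circuitry of any embodiment.

```python
import xml.etree.ElementTree as ET

def compile_instructions(templates):
    # Hypothetical "compiler": split (pattern, action) pairs into a list of
    # patterns (data structures) and a list of callables (instruction code).
    patterns = [pattern for pattern, _ in templates]
    code = [action for _, action in templates]
    return patterns, code

def parse(document_text):
    # Parser stage: build a tree representative of the structured document.
    return ET.fromstring(document_text)

def match_patterns(root, patterns):
    # PEP stage: for each pattern, list the document nodes that match it.
    return [root.findall(pattern) for pattern in patterns]

def transform(matches, code):
    # TE stage: run each action on the nodes matched for its pattern.
    results = []
    for index, nodes in enumerate(matches):
        for node in nodes:
            results.append(code[index](node))
    return results

def generate_output(results):
    # OG stage: assemble the transformed data into an output document.
    return "\n".join(results)

# Illustrative inputs (assumptions, not taken from the disclosure).
templates = [
    (".//title", lambda n: "TITLE: " + n.text),
    (".//author", lambda n: "BY: " + n.text),
]
doc = "<book><title>Dune</title><author>Herbert</author></book>"

patterns, code = compile_instructions(templates)
output = generate_output(transform(match_patterns(parse(doc), patterns), code))
print(output)
```

The sketch keeps compilation separate from document processing, so the same `patterns` and `code` could be reused for any number of documents.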
[0027] By compiling transformation instructions corresponding to
the structured document, and processing the structured document
accordingly, certain efficiency advantages may be attained by
embodiments of the present invention.
[0028] Specifically, the transformation instructions may be
analyzed to determine which of the transformation instructions may
be executed substantially simultaneously, or in parallel, to speed
the transformation of a structured document (it will be understood
for purposes of this disclosure that the occurrence of two
events substantially simultaneously indicates that each of the two
events may at least partially occur before the completion of the
other event). Similarly, by analyzing a structured document before
the transformation takes place, similar content in a structured
document may be identified such that any transformations on this
content may also be done substantially in parallel. Likewise, by
producing instruction code from transformation instructions where
the code is executable to transform at least a portion of a
structured document, multiple sets of instruction code
corresponding to various jobs may also be executed in
parallel.
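As a rough software analogy for executing multiple sets of instruction code substantially simultaneously, the sketch below runs several independent jobs on a thread pool. It assumes Python's standard `concurrent.futures` module. The job representation (an action paired with a document) and the names `run_job` and `run_jobs_in_parallel` are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def run_job(job):
    # Hypothetical job: apply one compiled action to one document.
    action, document = job
    return action(document)

def run_jobs_in_parallel(jobs, workers=4):
    # The jobs may overlap in time; executor.map still returns results
    # in the order in which the jobs were submitted.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(run_job, jobs))

# Three independent jobs; none depends on the result of another.
jobs = [(str.upper, "alpha"), (str.title, "beta gamma"), (len, "delta")]
results = run_jobs_in_parallel(jobs)
print(results)
```

Because no job reads another job's result, each may begin before the others complete, which is the sense of "substantially simultaneously" used above.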
[0029] Certain other advantages may also accrue to the architecture
described according to embodiments of the present invention. As
mentioned above, in one embodiment the compiler may be implemented
in software and the logical components for the architecture
implemented in hardware. In many cases, transformation instructions
(e.g. stylesheets and/or schemas, etc.) may change relatively
infrequently as compared to the number of documents being
processed. For example, a given stylesheet may be applied to
multiple documents before any changes to a stylesheet are made
(e.g. to an updated stylesheet or to apply a different stylesheet
altogether). Accordingly, capturing the relatively invariant
information from the transformation instructions in data structures
that may be efficiently accessed by dedicated, custom hardware
(e.g. logical components) may provide a high performance solution
to the transformation of structured documents. Additionally, having
compilation of transformation instructions performed in software
provides the flexibility to accommodate different formats for
transformation instructions and to implement changes in the
language specifications for these transformation instructions
without having to change the custom hardware. For example, XSLT,
XPath, and XML schema may evolve and new features may be added to these
languages in the future. The compiler may be adapted to handle
these new features.
[0030] While the advantages discussed above have been discussed
with respect to a compiler implemented in software and logical
components implemented in hardware, in other embodiments, the
compiler may be implemented in hardware; one or more of the logical
components may be implemented in software; or both the logical
components and compiler may be implemented in a combination of
hardware and software.
[0031] Turning to FIG. 2, a block diagram for the transformation of
structured documents using embodiments of the present invention is
depicted. A structured document may be received at a web service
112 from a variety of sources such as a file server, database,
internet connection, etc. Additionally, a set of transformation
instructions, for example an XSLT stylesheet, may also be received.
Document processor 210 may apply the transformation instructions to
the structured document to generate an output document which may be
returned to the requesting web service 112, which may, in turn,
pass the output document to the requestor.
[0032] In one embodiment, compiler 220, which may comprise software
(i.e. a plurality of instructions) executed on one or more
processors (e.g. distinct from document processor 210) may be used
to compile the transformation instructions to generate data
structures and instruction code in memory 270 for use by document
processor 210. Document processor 210 may be one or more ASICs
operable to utilize the data structures and instruction code
generated by compiler 220 to generate an output document.
[0033] FIG. 3 depicts a block diagram of one embodiment of an
architecture for a document processor operable to produce an output
document from a structured document. Document processor 210
comprises Host Interface Unit (HIU) 310, Parser 320, PEP 330,
Transformation Engine (TE) 340, Output Generator (OG) 350, each of
which is coupled to memory interface 360, to Local Command Bus
(LCB) 380 and, in some embodiments, to one another through signal
lines or shared memory 270 (e.g. a source unit may write
information to be communicated to a destination unit to the shared
memory and the destination unit may read the information from the
shared memory), or both. Shared memory 270 may be any type of
storage known in the art, such as RAM, cache memory, hard-disk
drives, tape devices, etc.
[0034] HIU 310 may serve to couple document processor 210 to one or
more host processors (not shown). This coupling may be
accomplished, for example, using a Peripheral Component
Interconnect eXtended (PCI-X) bus. HIU 310 also may provide an
Applications Programming Interface (API) through which document
processor 210 can receive jobs. Additionally, HIU 310 may interface
with LCB 380 such that various tasks associated with these jobs may
be communicated to components of document processor 210.
[0035] In one embodiment, these jobs may comprise context data,
including a structured document and the data structures and
instruction code generated from the transformation instructions by
the compiler. Thus, the API may allow the context data to be passed
directly to HIU 310, or, in other embodiments, may allow references
to one or more locations in shared memory 270 where context data
may be located to be provided to HIU 310. HIU 310 may maintain a
table of the various jobs received through this API and direct the
processing of these jobs by document processor 210. By allowing
multiple jobs to be maintained by HIU 310, these jobs may be
substantially simultaneously processed (e.g. processed in parallel)
by document processor 210, allowing document processor 210 to be
more efficiently utilized (e.g. higher throughput of jobs and lower
latency).
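One way to picture the table of jobs maintained by the HIU is the sketch below. It is an assumption-laden illustration: the class names `Job` and `JobTable`, the state strings, and the memory reference strings are all hypothetical, and the disclosure does not specify the layout of the table.

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class Job:
    job_id: int
    document_ref: str   # hypothetical reference to a document in shared memory
    code_ref: str       # hypothetical reference to compiled instruction code
    state: str = "pending"

class JobTable:
    def __init__(self):
        self._ids = count(1)
        self._jobs = {}

    def submit(self, document_ref, code_ref):
        # Record a new job and hand back its identifier.
        job = Job(next(self._ids), document_ref, code_ref)
        self._jobs[job.job_id] = job
        return job.job_id

    def set_state(self, job_id, state):
        self._jobs[job_id].state = state

    def in_flight(self):
        # Jobs that have been received but are not yet complete.
        return [j.job_id for j in self._jobs.values() if j.state != "done"]

table = JobTable()
first = table.submit("mem:doc-a", "mem:code-1")
second = table.submit("mem:doc-b", "mem:code-1")
table.set_state(first, "done")
print(table.in_flight())
```

Keeping several entries in the table at once is what would allow more than one job to be in progress at a time.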
[0036] Parser 320 may receive and parse a structured document,
identifying data in the structured document for PEP 330 and
generating data structures comprising data from the structured
document by, for example, creating data structures in shared memory
270 for use by TE 340 or OG 350. An exemplary embodiment of parser
320 is illustrated in Appendix A.
[0037] PEP 330 receives data from parser 320 identifying data of
the structured document being processed and compares data
identified by the parser 320 against expressions identified in the
transformation instructions. PEP 330 may also create one or more
data structures in shared memory 270, where the data structures
comprise a list of data in the structured document which match
expressions. An exemplary embodiment of PEP 330 is illustrated in
Appendix A.
[0038] Transformation engine 340 may access the data structures
built by parser 320 and PEP 330 and execute instruction code
generated by compiler 220 and stored in memory 270 to generate
results for the output document. In some embodiments, one or more
instructions of the instruction code generated by compiler 220 may
be operable to be independently executed (e.g. execution of one
instruction does not depend directly on the result of the output of
the execution of another instruction), and thus execution of the
instruction code by transformation engine 340 may occur in
substantially any order. An exemplary embodiment of a
transformation engine is illustrated in Appendix A.
[0039] Output generator 350 may assemble the results generated by
transformation engine 340 in an order specified by the
transformation instructions or corresponding to the structured
document and provide the output document to the initiating web
service 112 through HIU 310, for example, by signaling the web
service 112 or a host processor that the job is complete and
providing a reference to a location in memory 270 where an output
document exists. An exemplary embodiment of an output generator is
illustrated in Appendix A.
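Because results may be produced in substantially any order, the assembly step must restore the order of the output document. The sketch below illustrates this under the assumption that each result carries a position index; the function name `assemble_output` and the `(position, fragment)` pairing are hypothetical and are not taken from the disclosure.

```python
def assemble_output(results):
    # results: iterable of (position, fragment) pairs in arrival order.
    # Sorting by position restores the order of the output document.
    ordered = sorted(results, key=lambda item: item[0])
    return "".join(fragment for _, fragment in ordered)

# Fragments arriving out of order, as might happen when instructions
# are executed independently of one another.
arrived = [(2, "</p>"), (0, "<p>"), (1, "hello")]
document = assemble_output(arrived)
print(document)
```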
[0040] While it should be understood that embodiments of the
present invention may be applied with respect to almost any
structured document (e.g. a document having a defined structure
that can be used to interpret the content) whether the content is
highly structured (such as an XML document, HTML document, .pdf
document, word processing document, database, etc.) or loosely
structured (such as a plain text document whose structure may be,
e.g., a stream of characters) and associated transformation
instructions (a term used generally to refer to a file which may
be used with reference to a structured document, e.g. document type
definitions (.dtd), schema such as .xsd files, XSL transformation
files, etc.) for the structured document, it may be helpful to
illustrate various embodiments of the present invention with
respect to a particular example of a structured document and
transformation instructions.
[0041] Generally, an XML document is a structured document which
has a hierarchical tree structure, where the root of the tree
identifies the document as a whole and each other node in the
document is a descendant of the root. Various elements, attributes,
and document content form the nodes of the tree. The elements
define the structure of the content that the elements contain. Each
element has an element name, and the element delimits content using
a start tag and an end tag that each include the element name. An
element may have other elements as sub-elements, which may further
define the structure of the content. Additionally, elements may
include attributes (included in the start tag, following the
element name), which are name/value pairs that provide further
information about the element or the structure of the element
content. XML documents may also include processing instructions
that are to be passed to the application reading the XML document,
comments, etc.
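The tree structure described above can be seen by parsing a small document. This sketch assumes Python's standard `xml.etree.ElementTree` module; the sample document and the helper name `describe` are illustrative only. Note that `ET.fromstring` returns the top-level element rather than a separate node for the document as a whole.

```python
import xml.etree.ElementTree as ET

xml_text = (
    '<order id="42">'
    '<item sku="A1">Widget</item>'
    '<item sku="B2">Gadget</item>'
    '</order>'
)
root = ET.fromstring(xml_text)

def describe(element, depth=0):
    # One line per element: its name and its attribute name/value pairs,
    # indented by depth to show the parent/child hierarchy.
    lines = ["  " * depth + f"{element.tag} {element.attrib}"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines

tree_lines = describe(root)
print("\n".join(tree_lines))
```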
[0042] An XSLT stylesheet is a set of transformation instructions
which may be viewed as a set of templates. Each template may
include: (i) an expression that identifies nodes in a document's
tree structure; and (ii) a body that specifies a corresponding
portion of an output document's structure for nodes of the source
document identified by the expression. Applying a stylesheet to a
source document may comprise attempting to find a matching template
for one or more nodes in the source document, and instantiating the
structures corresponding to the body of the matching template in an
output document.
[0043] The body of a template may include one or more of: (i)
literal content to be instantiated in the output document; (ii)
instructions for selection of content from the matching nodes to be
copied into the output document; and (iii) statements that are to
be evaluated, with the result of the statements being instantiated
in the output document. Together, the content to be instantiated
and the statements to be evaluated may be referred to as "actions"
to be performed on the nodes that match the template.
[0044] The body of a template may include one or more "apply
templates" statements, which include an expression for selecting
one or more nodes and causing the templates in the stylesheet to be
applied to the selected nodes, thus effectively nesting the
templates. If a match to the apply templates statement is found,
the resulting template is instantiated within the instantiation of
the template that includes the apply templates statement. Other
statements in the body of the template may also include expressions
to be matched against nodes (and the statements may be evaluated on
the matching nodes).
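The template mechanism described in the preceding paragraphs may be illustrated, in simplified form, by the following software sketch. It is not an XSLT processor and is not the hardware architecture described herein: the match expression is reduced to an element name, each template body is a Python function, and the names apply_templates, order_body, item_body and TEMPLATES are hypothetical. The fallback used when no template matches is likewise an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def apply_templates(node, templates):
    """Find the template matching `node` and instantiate its body."""
    body = templates.get(node.tag)
    if body is None:
        # Assumed fallback for this sketch: emit text and recurse into children.
        text = (node.text or "").strip()
        return text + "".join(apply_templates(c, templates) for c in node)
    return body(node, templates)


def order_body(node, templates):
    # Literal content plus a nested "apply templates" over the child nodes.
    inner = "".join(apply_templates(child, templates) for child in node)
    return f"<ul>{inner}</ul>"


def item_body(node, templates):
    # Content selected from the matching node and copied to the output.
    return f"<li>{node.text}</li>"


# Hypothetical stylesheet: match expression (element name) -> template body.
TEMPLATES = {"order": order_body, "item": item_body}

source = ET.fromstring("<order><item>Widget</item><item>Gadget</item></order>")
output = apply_templates(source, TEMPLATES)
print(output)
```

The output of the item template is instantiated inside the output of the order template, which mirrors the nesting produced by an "apply templates" statement.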
[0045] The expressions used in a stylesheet may generally comprise
node identifiers and/or values of nodes, along with operators on
the node identifiers to specify parent/child (or
ancestor/descendant) relationships among the node identifiers
and/or values. Expressions may also include predicates, which may
be extra condition(s) for matching a node. A predicate is an
expression that is evaluated with the associated node as the
context node (defined below), where the result of the expression is
either true (and the node may match the expression node) or false
(and the node does not match the expression). Thus, an expression
may be viewed as a tree of nodes to be matched against a document's
tree.
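As a hedged illustration of such expressions, the following sketch uses the limited path syntax supported by Python's standard xml.etree.ElementTree module, which covers only a subset of XPath. The sample document is hypothetical. The sketch shows a parent/child step, a descendant step, and a predicate on an attribute value.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only to exercise the expressions below.
doc = ET.fromstring(
    "<library>"
    "<book lang='en'><title>A</title></book>"
    "<book lang='fr'><title>B</title></book>"
    "<shelf><book lang='en'><title>C</title></book></shelf>"
    "</library>"
)

# Parent/child relationship: title children of book children of the root.
child_titles = [t.text for t in doc.findall("./book/title")]

# Ancestor/descendant relationship: books at any depth below the root.
descendant_titles = [t.text for t in doc.findall(".//book/title")]

# Predicate: an extra condition evaluated with each book as the context node.
english_titles = [t.text for t in doc.findall(".//book[@lang='en']/title")]

print(child_titles, descendant_titles, english_titles)
```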
[0046] A given document node may satisfy an expression if the given
document node is selected via evaluation of the expression. That
is, the expression node identifiers in the expression match the
given document node's identifier or document node identifiers
having the same relationship to the given document node as
specified in the expression, and any values used in the expression
are equal to corresponding values related to the given document
node.
[0047] A document node may also be referred to as a "matching node"
for a given expression if the node satisfies the given expression.
In some cases in the remainder of this discussion, it may be
helpful for clarity to distinguish nodes in expression trees from
nodes in a structured document. Thus, a node may be referred to as
an "expression node" if the node is part of an expression tree, and
a node may be referred to as a "document node" if the node is part
of the document being processed. A node identifier may comprise a
name (e.g. element name, attribute name, etc.) or may comprise an
expression construct that identifies a node by type (e.g. a node
test expression may match any node, or a text test expression may
match any text node). In some cases, a name may belong to a
specific namespace. In such cases, the node identifier may be a
name associated with a namespace. In XML, the namespace provides a
method of qualifying element and attribute names by associating
them with namespace names. Thus, the node identifier may be the
qualified name (the optional namespace prefix, followed by a colon,
followed by the name). A name, as used herein (e.g. element name,
attribute name, etc.) may include a qualified name. Again, while
XSLT stylesheets may be used in one example herein of
transformation instructions, generally "transformation
instructions" may comprise any specification for transforming a
source document to an output document, which may encompass, for
example, statements intended to identify data of the source
document or statements for how to transform data of the source
document. The source and output documents may be in the same
language (e.g. the source and output documents may be different XML
vocabularies), or may differ (e.g. XML to PDF, etc.).
[0048] Moving now to FIG. 4, an example application of one
embodiment of the present invention to an XML document and an XSLT
stylesheet is illustrated. It is noted that, while the description
herein may include examples in which transformation instructions
are applied to a single source document, other examples may include
applying multiple sets of transformation instructions to a source
document (either concurrently or serially, as desired) or applying
a set of transformation instructions to multiple source documents
(either concurrently with context switching or serially, as
desired).
[0049] Returning to the example of FIG. 4, an XML document and an
associated XSL stylesheet may be received by web service 112. Web
service 112 may invoke embodiments of the present invention to
transform the received document according to the received
stylesheet. More specifically, in one embodiment, compiler 220 may
be used to compile the XSL stylesheet to generate data structures
and instruction code for use by document processor 210. Compiler
220 may assign serial numbers to node identifiers in the stylesheet
so that expression evaluation may be performed by document
processor 210 by comparing numbers, rather than node identifiers
(which would involve character string comparisons).
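The assignment of serial numbers to node identifiers may be sketched in software as follows. This is a minimal illustration under stated assumptions, not the implementation of compiler 220: the class name SymbolTable, the function compile_path, the use of 0 for an unknown identifier, and the restriction to simple slash-separated paths are all hypothetical.

```python
class SymbolTable:
    """Hypothetical symbol table mapping node identifiers to serial numbers."""

    def __init__(self):
        self._serials = {}

    def intern(self, name):
        """Return the serial number for `name`, assigning the next one if new."""
        if name not in self._serials:
            self._serials[name] = len(self._serials) + 1
        return self._serials[name]

    def lookup(self, name):
        """Return the serial for `name`, or 0 (assumed 'unknown' value)."""
        return self._serials.get(name, 0)


def compile_path(path, symbols):
    """Turn a simple 'a/b/c' path expression into a tuple of serial numbers."""
    return tuple(symbols.intern(step) for step in path.split("/"))


symbols = SymbolTable()
compiled = compile_path("order/item", symbols)

# At parse time, document node names are converted through the same table,
# so matching reduces to comparing integers rather than character strings.
document_path = tuple(symbols.lookup(name) for name in ["order", "item"])
print(compiled, compiled == document_path)
```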
[0050] Compiler 220 may also store a mapping of these node
identifiers to serial numbers in one or more symbol tables 410 in
memory 270. Additionally, compiler 220 may extract the expressions
from the stylesheet and generate expression tree data structures in
memory 270 to be used by the document processor 210 for expression
matching (e.g. one or more parse-time expression trees 420
comprising expression nodes). Still further, compiler 220 may
generate an instruction table 430 in memory 270 with instructions
to be executed for one or more matching expressions. The
instructions in the instruction table 430 may be executable by
document processor 210 that, when executed by the document
processor 210, may result in performing the actions defined when an
expression associated with the instruction is matched. In some
embodiments, the instructions may comprise the actions to be
performed (i.e. there may be a one-to-one correspondence between
instructions and actions). In other embodiments, at least some
actions may be realized by executing two or more instructions. The
compiler may also generate whitespace tables 440 defining how
various types of whitespace in the source document are to be
treated (e.g. preserved, stripped, etc.), an expression list table
450, a template list table 460 and one or more DTD tables 462 to
map entity references to values or specify default values for
attributes.
[0051] At this point, processing of the source document by document
processor 210 may begin. Parser 320 receives the structured
document and accesses the symbol tables 410, whitespace tables 440,
or DTD tables 462 in memory 270 to parse the structured document,
identify document nodes, and generate events (e.g. to identify
document nodes parsed from the document) to PEP 330. More
particularly, parser 320 converts node identifiers in the source
document to corresponding serial numbers in the symbol tables 410,
and transmits these serial numbers as part of the events to the PEP
330. Additionally, parser 320 may generate a parsed document tree
470 representing the structure of the source document in memory.
Nodes of the parsed document tree may reference corresponding
values stored in one or more parsed content tables 472 created in
memory by parser 320. PEP 330 receives events from the parser 320
and compares identified document nodes (e.g. based on their serial
numbers) against parse-time expression tree(s) 420 in memory 270.
Matching document nodes are identified and recorded in template or
expression match lists 480 in memory 270.
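The flow of events from a parser to an expression matcher may be approximated in software as shown below. The sketch is illustrative only and does not represent the circuitry of parser 320 or PEP 330: it relies on Python's standard xml.etree.ElementTree.iterparse to produce start and end events, and the table contents, the compiled expression, and the function names parse_events and match_nodes are assumptions.

```python
import io
import xml.etree.ElementTree as ET

SYMBOLS = {"order": 1, "item": 2}  # hypothetical symbol table contents
EXPRESSION = (1, 2)                # assumed compiled form of "order/item"


def parse_events(xml_text, symbols):
    """Yield (event, serial) pairs; names absent from the table map to 0."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, element in ET.iterparse(source, events=("start", "end")):
        yield event, symbols.get(element.tag, 0)


def match_nodes(events, expression):
    """Return indexes (in document order) of nodes whose path equals expression."""
    stack, matches, node_index = [], [], -1
    for event, serial in events:
        if event == "start":
            node_index += 1
            stack.append(serial)  # path of serials from the root to this node
            if tuple(stack) == expression:
                matches.append(node_index)
        else:
            stack.pop()
    return matches


XML = "<order><item>Widget</item><note/><item>Gadget</item></order>"
match_list = match_nodes(parse_events(XML, SYMBOLS), EXPRESSION)
print(match_list)
```

In this sketch the match list holds the document-order indexes of the matching nodes; both item elements match, while the note element, which has no entry in the symbol table, does not.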
[0052] Transformation engine 340 executes instructions from
instruction table 430. When executing these instructions,
transformation engine 340 may access the template or expression
match lists 480, the parsed document tree 470, the parsed content
tables 472 or the instruction table 430 in memory 270. These
instructions may, in turn, be associated with one or more templates
of a stylesheet. Transformation engine 340 may execute the
instructions on each of the document nodes that matches the
expression associated with the template, for example to transform
or format document nodes according to the template. Transformation
engine 340 may request that the results of the execution of these
instructions be stored in one or more output data structures 490
in memory 270. Thus, as transformation engine 340 executes
instructions of instruction table 430, a set of output data
structures 490 are created in memory 270 representing the structure
of an output document, and content for the output document is placed
in, or associated with, these output data structures 490.
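A simplified software analogue of executing instructions against matched nodes is sketched below. The opcodes, the table layouts, and the function name run_instructions are hypothetical and are chosen only to illustrate the relationship between an instruction table, a match list, parsed content, and output data structures; they are not the instruction set of transformation engine 340.

```python
# Hypothetical opcodes for this sketch.
EMIT_LITERAL = "emit_literal"  # copy literal content to the output
EMIT_VALUE = "emit_value"      # copy the matched node's parsed content

# Assumed layout: template name -> list of (opcode, operand) instructions.
INSTRUCTION_TABLE = {
    "item-template": [
        (EMIT_LITERAL, "<li>"),
        (EMIT_VALUE, None),
        (EMIT_LITERAL, "</li>"),
    ],
}
PARSED_CONTENT = {1: "Widget", 3: "Gadget"}  # node index -> content
MATCH_LIST = {"item-template": [1, 3]}       # template -> matching nodes


def run_instructions(instruction_table, match_list, content):
    """Execute each template's instructions once per matching node."""
    output = []  # stands in for the output data structures
    for template, nodes in match_list.items():
        for node in nodes:
            for opcode, operand in instruction_table[template]:
                if opcode == EMIT_LITERAL:
                    output.append(operand)
                elif opcode == EMIT_VALUE:
                    output.append(content[node])
                else:
                    raise ValueError(f"unknown opcode: {opcode}")
    return output


fragments = run_instructions(INSTRUCTION_TABLE, MATCH_LIST, PARSED_CONTENT)
print("".join(fragments))
```

Joining the fragments at the end corresponds loosely to a separate assembly step performed after, or while, the instructions execute.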
[0053] Output generator 350 may receive results from transformation
engine 340 for storing in output data structures 490 in memory 270.
Output generator 350 may access these output data structures 490 or
data structures 410, 420, 450, 460, 470, 472 created by parser 320
or PEP 330 to assemble an output document. In some embodiments,
output generator 350 may access a set of formatting parameters for
the assembly of the output document. After the output document is
assembled, or as the output document is being assembled, the output
document (or portions thereof) may be returned to the proper web
service 112.
[0054] In the foregoing specification, the invention has been
described with reference to specific embodiments. However, one of
ordinary skill in the art appreciates that various modifications
and changes can be made without departing from the scope of the
invention as set forth in the claims below. Accordingly, the
specification and figures are to be regarded in an illustrative
rather than a restrictive sense, and all such modifications are
intended to be included within the scope of the invention. For example,
it will be apparent to those of skill in the art that although the
present invention has been described with respect to a protocol
controller in a routing device, the inventions and methodologies
described herein may be applied in any context which requires the
determination of the protocol of a bit stream.
[0055] Benefits, other advantages, and solutions to problems have
been described above with regard to specific embodiments. However,
the benefits, advantages, solutions to problems, and any
component(s) that may cause any benefit, advantage, or solution to
occur or become more pronounced are not to be construed as a
critical, required, or essential feature or component of any or all
the claims.
* * * * *