U.S. patent application number 11/413070 was filed with the patent office on 2006-04-27 and published on 2007-01-18 for method, system and apparatus for an output generator for use in the processing of structured documents.
Invention is credited to Daniel M. Cermak, Robert A. Corley, Russell Davoli, Bryan Dobbs, Trenton J. Grale, Richard Trujillo, Howard Tsoi.
Application Number | 20070012601 11/413070 |
Family ID | 37215515 |
Filed Date | 2006-04-27 |
United States Patent
Application |
20070012601 |
Kind Code |
A1 |
Tsoi; Howard ; et
al. |
January 18, 2007 |
Method, system and apparatus for an output generator for use in the
processing of structured documents
Abstract
Embodiments of systems, methods and apparatuses for an output
generator for generating output from processed content of a
structured document are disclosed. More specifically, embodiments
of an output generator may comprise hardware circuitry operable to
order data resulting from the transformation of a structured
document as it is generated and format this data according to a
format of a corresponding output document to generate output
corresponding to the output document.
Inventors: |
Tsoi; Howard; (Austin,
TX) ; Cermak; Daniel M.; (Austin, TX) ;
Trujillo; Richard; (Austin, TX) ; Grale; Trenton
J.; (Austin, TX) ; Corley; Robert A.; (Cedar
Park, TX) ; Dobbs; Bryan; (Round Rock, TX) ;
Davoli; Russell; (Austin, TX) |
Correspondence
Address: |
Blakely, Sokoloff, Taylor & Zafman LLP
1279 Oakmead Parkway
Sunnyvale
CA
94085-4040
US
|
Family ID: |
37215515 |
Appl. No.: |
11/413070 |
Filed: |
April 27, 2006 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
60675349 | Apr 27, 2005 | |
60675347 | Apr 27, 2005 | |
60675167 | Apr 27, 2005 | |
60675115 | Apr 27, 2005 | |
Current U.S. Class: |
209/534 |
Current CPC Class: |
G06F 40/154 20200101; G06F 40/221 20200101; G06F 40/143 20200101 |
Class at Publication: |
209/534 |
International Class: |
B07C 5/00 20060101 B07C005/00 |
Claims
1. An apparatus, comprising a hardware circuit operable to
determine that a first output document data stored in an output
data structure associated with the first output document is in an
order corresponding to the first output document, obtain content
for the first output document based on the first output document
data, format the content according to a type of the first output
document and generate output comprising the content, wherein the
output comprises a first portion of the output document.
2. The apparatus of claim 1, wherein the hardware circuit comprises
an order control circuit operable to create the output data
structure associated with the first output document, receive the
first output document data to be stored in the output data
structure and store the first output document data to the output
data structure, wherein the first output document data may be
received in an order differing from an order corresponding to the
first output document.
3. The apparatus of claim 2, comprising an output walker circuit
operable to traverse the first output document and obtain a
reference related to the first output document data based on a walk
event command from the order control unit.
4. The apparatus of claim 3, comprising a value extractor circuit
operable to obtain content for the first output document based on a
command from the output walker circuit, wherein the first command
comprises the reference and format the content according to a type
of the first output document to generate a set of characters
comprising the first portion of the output document.
5. The apparatus of claim 4, comprising an output formatter circuit
operable to encode the set of characters according to an encoding
scheme.
6. The apparatus of claim 5, wherein the hardware circuit is
operable to generate output comprising a second portion of the
first output document and a first portion of a second output
document, wherein the first document is distinct from the second
document and the output comprising the first portion of the second
output document is generated after the output comprising the first
portion of the first document and before the second portion of the
first document.
7. The apparatus of claim 6, wherein the type of the first output
document and a type of the second output document is eXtensible
Markup Language (XML), Hypertext Markup Language (HTML) or
text.
8. The apparatus of claim 7, wherein the type of the first output
document is different from the type of the second output
document.
9. A system, comprising: a transformation engine circuit operable
to generate first output document data associated with a first
output document, where the first output document data is generated
in an order that does not correspond to an order for the first
output document and store this first output document data to a
first output data structure associated with the first output
document; and an output generator circuit operable to determine
that the first output document data stored in the first output data
structure associated with the first output document is in an order
corresponding to the first output document, obtain content for the
first output document based on the first output document data,
format the content according to a type of the first output document
and generate output comprising the content, wherein the output
comprises a first portion of the first output document.
10. The system of claim 9, wherein the transformation engine
circuit is operable to send a first command and a second command to
the output generator circuit, wherein the first command is
executable to create the first output data structure and the second
command is executable to store the first output document data to
the first output data structure and the output generator circuit is
operable to execute the first command and second command to create
the first output data structure associated with the first output
document and store the first output document data to the first
output data structure.
11. The system of claim 9, wherein the transformation engine
circuit is operable to generate second output document data
associated with a second output document which does not correspond
to an order for the second output document and store this second
output document data to a second output data structure associated
with a second output document; and the output generator circuit is
operable to generate output comprising a first portion of the
second output document.
12. The system of claim 11, wherein the transformation engine
circuit is operable to generate third output document data
associated with the first output document and store this third
output document data to the first output data structure associated
with the first output document and the output generator circuit is
operable to generate output comprising a second portion of the
first output document.
13. The system of claim 12, wherein the first output document is
distinct from the second output document.
14. The system of claim 13, wherein the output comprising the first
portion of the second output document is generated after the output
comprising the first portion of the first output document and
before the output comprising the second portion of the first output
document.
15. The system of claim 14, wherein the type of the first output
document and a type of the second output document is XML, HTML or
text.
16. The system of claim 15, wherein the type of the first output
document is different from the type of the second output
document.
17. A method, comprising: in an output generator circuit,
determining that first output document data associated with a first
output document in a first output data structure associated with a
first output document is in an order corresponding to the first
output document; generating content based on the first output
document data; formatting the content according to a type of the
first output document; and generating output comprising the
content, wherein the output comprises a first portion of the first
output document.
18. The method of claim 17, comprising receiving the first output
document data, wherein the first output document data is received
in an order that does not correspond to the order for the first
output document; and storing this first output document data to the
first output document data structure.
19. The method of claim 17, comprising: generating content based on
second output document data; formatting the content according to a
type of the second output document; and generating output
comprising the content, wherein the output comprises a first
portion of a second output document.
20. The method of claim 19, wherein the second output document is
distinct from the first output document.
21. The method of claim 20, comprising generating output comprising
a third portion of a second document, wherein the output comprising
the first portion of the second output document is generated after
the output comprising the first portion of the first output
document and before the output comprising the second portion of the
first output document.
22. The method of claim 21, wherein the type of the first output
document and a type of the second output document is XML, HTML or
text.
23. The method of claim 22, wherein the type of the first output
document is different from the type of the second output document.
Description
RELATED APPLICATIONS
[0001] This application claims a benefit of priority under 35
U.S.C. § 119(e) to U.S. Provisional Patent Application Nos.
60/675,349, by inventors Howard Tsoi, Daniel Cermak, Richard
Trujillo, Trenton Grale, Robert Corley, Bryan Dobbs and Russell
Davoli, entitled "Output Generator for Use with System for Creation
of Multiple, Hierarchical Documents", filed on Apr. 27, 2005;
60/675,347, by inventors Daniel Cermak, Howard Tsoi, John Derrick,
Richard Trujillo, Udi Kalekin, Bryan Dobbs, Ying Tong, Brendon
Cahoon and Jack Matheson, entitled "Transformation Engine for Use
with System for Creation of Multiple, Hierarchical Documents",
filed on Apr. 27, 2005; 60/675,167, by inventors Richard Trujillo,
Bryan Dobbs, Rakesh Bhakta, Howard Tsoi, Jack Randall, Howard Liu,
Yongjian Zhou and Daniel Cermak, entitled "Parser for Use with
System for Creation of Multiple, Hierarchical Documents", filed on
Apr. 27, 2005 and 60/675,115, by inventors John Derrick, Richard
Trujillo, Daniel Cermak, Bryan Dobbs, Howard Liu, Rakesh Bhakta,
Udi Kalekin, Russell Davoli, Clifford Hall and Avinash Palaniswamy,
entitled "General Architecture for a System for Creation of
Multiple, Hierarchical Documents", filed on Apr. 27, 2005, the
entire contents of which are hereby expressly incorporated by
reference for all purposes.
TECHNICAL FIELD OF THE INVENTION
[0002] The invention relates in general to methods, systems and
apparatuses for processing structured documents, and more
particularly, to efficiently generating output resulting from
the processing, transformation or rendering of structured
documents.
BACKGROUND OF THE INVENTION
[0003] Electronic data, entertainment and communications
technologies are growing increasingly prevalent with each passing
day. In the past, the vast majority of electronic documents used
with these technologies were in a proprietary format. In other
words, a particular electronic document could only be processed or
understood by the application that created that document. Until
relatively recently, this was not especially troublesome.
[0004] This situation became progressively more problematic with
the advent of networking technologies, however. These networking
technologies allowed electronic documents to be communicated
between different and varying devices, and as these network
technologies blossomed, so did users' desire to use these
networked devices to share electronic data.
[0005] Much to the annoyance of many users, however, the
proprietary formats of the majority of these electronic documents
prevented them from being shared between different platforms: if a
document was created by one type of platform it usually could not
be processed, or rendered, by another type of platform.
[0006] To that end, data began to be placed in structured
documents. Structured documents may be loosely defined as any type
of document that adheres to a set of rules. Because the structured
document conforms to a set of rules it enables the cross-platform
distribution of data, as an application or platform may process or
render a structured document based on the set of rules, no matter
the application that originally created the structured
document.
[0007] The use of structured documents to facilitate the
cross-platform distribution of data is not without its own set of
problems, however. In particular, in many cases the structured
document does not itself define how the data it contains is to be
rendered, for example for presentation to a user. Exacerbating the
problem is the size of many of these structured documents. To
facilitate the organization of data intended for generic
consumption these structured documents may contain a great deal of
meta-data, and thus may be larger than similar proprietary
documents, in some cases up to twenty times larger or more.
[0008] In many cases, instructions may be provided for how to
transform or render a particular structured document. For example,
one mechanism implemented as a means to facilitate processing XML
is the Extensible Stylesheet Language (XSL) and stylesheets written
using XSL. Stylesheets may be written to transform XML documents
from one markup definition (or "vocabulary") defined within XML to
another vocabulary, from XML markup to another structured or
unstructured document form (such as plain text, word processor,
spreadsheet, database, PDF, HTML, etc.), or from another structured
or unstructured document form to XML markup. Thus, stylesheets may
be used to transform a document's structure from its original form
to a form expected by a given user (output form).
[0009] Typically, structured documents are transformed or rendered
with one or more software applications. However, as many
definitions for these structured languages were designed and
implemented without taking into account conciseness or efficiency
of parsing and transformation, the use of software applications to
transform or render these structured documents may be prohibitively
inefficient.
[0010] Thus, as can be seen, there is a need for methods and
systems for an architecture for the efficient processing of
structured documents and the generation of output from the
processing of these structured documents.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The drawings accompanying and forming part of this
specification are included to depict certain aspects of embodiments
of the invention. A clearer impression of embodiments of the
invention, and of the components and operation of systems provided
with embodiments of the invention, will become more readily
apparent by referring to the exemplary, and therefore nonlimiting,
embodiments illustrated in the drawings, wherein identical
reference numerals designate the same components. Note that the
features illustrated in the drawings are not necessarily drawn to
scale.
[0012] FIG. 1 depicts an embodiment of an architecture for the
implementation of web services.
[0013] FIG. 2 depicts an embodiment of an architecture for an
output generator.
[0014] FIG. 3 depicts one embodiment for the processing of
structured documents utilizing a document processor.
[0015] FIG. 4 depicts one embodiment of an architecture for a
device for the processing of structured documents.
[0016] FIG. 5 depicts one embodiment of an architecture for the
processing of structured documents utilizing an embodiment of the
device depicted in FIG. 4.
[0017] FIG. 6 depicts an embodiment of the interface between a
transformation engine and an output generator.
[0018] FIG. 7 depicts one embodiment of communications between a
transformation engine and an output generator.
[0019] FIG. 8 depicts one embodiment of an order control unit and
an output walker.
[0020] FIG. 9 depicts one embodiment of an output data
structure.
[0021] FIG. 10 depicts one embodiment of a value extractor.
[0022] FIG. 11 depicts one embodiment of a value extractor and an
output formatter.
DETAILED DESCRIPTION
[0023] Embodiments of the invention and the various features and
advantageous details thereof are explained more fully with
reference to the nonlimiting embodiments that are illustrated in
the accompanying drawings and detailed in the following
description. Descriptions of well known starting materials,
processing techniques, components and equipment are omitted so as
not to unnecessarily obscure the invention in detail. Skilled
artisans should understand, however, that the detailed description
and the specific examples, while disclosing preferred embodiments
of the invention, are given by way of illustration only and not by
way of limitation. Various substitutions, modifications, additions
or rearrangements within the scope of the underlying inventive
concept(s) will become apparent to those skilled in the art after
reading this disclosure.
[0024] Reference is now made in detail to the exemplary embodiments
of the invention, examples of which are illustrated in the
accompanying drawings. Wherever possible, the same reference
numbers will be used throughout the drawings to refer to the same
or like parts (elements).
[0025] Before describing embodiments of the present invention it
may be useful to describe an exemplary architecture for a web
service. Although web services are known in the art, a description
of such an architecture may be helpful in better explaining the
embodiments of the invention depicted herein.
[0026] FIG. 1 depicts an embodiment of one such architecture for
implementing a web service. Typically, web services provide a
standard means of interoperating between different software
applications running on a variety of platforms and/or frameworks. A
web service provider 110 may provide a set of web services 112.
Each web service 112 may have a described interface, such that a
requestor may interact with the web service 112 according to that
interface.
[0027] For example, a user at a remote machine 120 may wish to use
a web service 112 provided by web service provider 110. To that end,
the user may use a requestor agent to communicate message 130 to a
service agent associated with the desired web service 112, where
the message is in a format prescribed by the definition of the
interface of the desired web service 112. In many cases, the
definition of the interface describes the message formats, data
types, transport protocols, etc. that are to be used between a
requestor agent and a provider agent.
[0028] The message 130 may comprise data to be operated on by the
requested web service 112. More particularly, message 130 may
comprise a structured document and instructions for transforming
the structured document. For example, message 130 may be a SOAP
(Simple Object Access Protocol) message comprising an
eXtensible Markup Language (XML) document and an eXtensible
Stylesheet Language Transformations (XSLT) stylesheet associated
with the XML document. It should be noted that, in some cases,
transformation instructions (e.g. a Document Type Definition (DTD),
schema, or stylesheet) may be embedded in a structured document,
for example, either directly or as a pointer. In such cases the
transformation instructions may be extracted from the document
before being utilized in any subsequent method or process.
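By way of illustration only, a message of the kind described above may be sketched as follows. The SOAP 1.1 envelope namespace and the XSLT namespace are standard; the element names TransformRequest, Document and Stylesheet, the function build_message and the sample payloads are assumptions introduced for this example and are not part of the disclosure or of any SOAP-defined interface.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ET.register_namespace("soap", SOAP_NS)


def build_message(document_xml: str, stylesheet_xml: str) -> bytes:
    """Wrap a structured document and its stylesheet in one SOAP envelope.

    The TransformRequest/Document/Stylesheet element names are hypothetical.
    """
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, "TransformRequest")
    ET.SubElement(request, "Document").append(ET.fromstring(document_xml))
    ET.SubElement(request, "Stylesheet").append(ET.fromstring(stylesheet_xml))
    return ET.tostring(envelope, encoding="utf-8")


document = "<order><item>widget</item></order>"
stylesheet = (
    '<xsl:stylesheet version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    '<xsl:output method="text"/>'
    "</xsl:stylesheet>"
)
message = build_message(document, stylesheet)
print(message.decode("utf-8"))
```

A receiving agent could then extract the document and the transformation instructions from the message body before any subsequent processing.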
[0029] Thus, in some cases the provider agent associated with a
particular web service 112 may receive message 130; web service 112
may process the structured document of message 130 according to the
instructions for transforming the structured document included in
message 130; and the result 140 of the transformation may be
returned to the requestor agent.
[0030] In some cases, many structured documents may be sent to a
particular web service 112 with one set of transformation
instructions, so that each of these documents may be transformed
according to the identical set of instructions. Conversely, one
structured document may be sent to a particular web service 112
with multiple sets of transformation instructions to be applied to
the structured document.
[0031] Hence, as can be seen from this brief overview of the
architecture for implementing web services 112, it may be highly
desirable to process these structured documents as efficiently as
possible, such that web services 112 may be used on many data sets
and large data sets without creating a bottleneck during the
processing of the structured documents, and such that processing
resources of web service provider 110 may be effectively utilized.
[0032] More particularly, after processing or transforming content
of structured documents it may be necessary to generate an output
document comprising this processed content, where the output
document is in one or more formats similar or dissimilar to the
format of the original structured document, so that an output
document may be provided to a requestor. This output document may
have a certain order; in other words, the makeup of the output
document (e.g. markup, content, etc.) may need to have a certain
order if the output document is to be valid or usable. Furthermore,
it may
be desired to generate the output document quickly, as requestors
or applications may be awaiting the arrival of such an output
document.
[0033] In certain cases, however, the transformation or processing
of content of a structured document may occur, and portions of the
transformation completed, in an order differing from an order
corresponding with the output document, and the transformation of
content corresponding to multiple structured documents may occur
substantially simultaneously. Thus, it is desirable to both
associate transformed content with an output document (which may,
in turn, correspond to a particular original structured document)
and to assemble transformed or processed content for an output
document in an order corresponding with an output document as that
content is processed or transformed. By associating processed
content with an output document, and assembling content for an
output document according to an order of the output document
substantially as it is generated, output documents may be assembled
quickly, and many output documents may be assembled substantially
simultaneously.
[0034] Attention is now directed to embodiments of systems and
methods for an architecture for the efficient generation of output
associated with a transformation process applied to a structured
document. Embodiments of the present invention may provide an
output generator which comprises hardware circuitry, for example a
hardware processing device such as an Application Specific
Integrated Circuit (ASIC), for generating output from processed
content of a structured document. In other words, embodiments of
the present invention may provide hardware with the capability to
process content resulting from the transformation of the content of
a structured document to provide an output stream (e.g.
substantially without the use of software). This hardware may
utilize data structures storing the resulting content, some of
which may reflect the structure of a desired output document. The
content in these data structures may then be formatted and output
according to a desired structure, format or encoding for an output
document.
[0035] More specifically, embodiments of the present invention may
provide an output generator circuit operable to allocate data
structures corresponding with an output document as a
transformation process is applied to a structured document. Content
resulting from the transformation of the content of a structured
document (which will be collectively referred to herein as
transformed content) may be placed in these data structures in an
order which does not conform to the desired order of the output
document. In other words, data from the transformation process of
an original structured document may be placed in these data
structures as it is transformed or otherwise processed, resulting
in the filling of these data structures in an arbitrary order (e.g.
an order that conforms with the processing of the content of the
structured document and not an order of the output document).
Consequently, the output generator may ascertain when data
corresponding with the order of the output document has been placed
in these data structures, and as data conforming with the order of
the output document is placed in these data structures it may be
formatted according to a format of a desired output document and
output corresponding with an output document may be generated
substantially as the data corresponding with that output document
is generated and before the generation of all the transformed
content for the output document has been completed. Furthermore,
the process of filling data structures associated with an output
document, formatting this data and outputting portions of the
output document may occur substantially simultaneously across
multiple output documents; thus multiple output documents, in
multiple formats, may be created from one or more original
structured documents substantially in parallel.
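By way of illustration only, the release of content in document order while the data structures are filled in an arbitrary order may be modeled in software as follows. This is a minimal sketch and not the hardware circuitry of any embodiment; the class OutputTable, its integer positions and its store method are assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OutputTable:
    """Hypothetical stand-in for the data structure of one output document."""

    slots: Dict[int, str] = field(default_factory=dict)  # position -> content
    next_pos: int = 0  # first document position not yet output

    def store(self, pos: int, content: str) -> List[str]:
        """Store content at a document position and return what became ready."""
        self.slots[pos] = content
        ready = []
        # Release the contiguous run starting at next_pos, if it is now filled.
        while self.next_pos in self.slots:
            ready.append(self.slots.pop(self.next_pos))
            self.next_pos += 1
        return ready


table = OutputTable()
emitted = []
# Content arrives in transformation order (2, 0, 1, 3), not document order.
for pos, text in [(2, "<b/>"), (0, "<root>"), (1, "<a/>"), (3, "</root>")]:
    emitted.extend(table.store(pos, text))

print("".join(emitted))  # prints "<root><a/><b/></root>"
```

In this sketch the first portion of the document ("<root>") is released as soon as position 0 is filled, before the content for position 3 has been generated.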
[0036] Moreover, by allowing dynamic creation of output structures,
the filling of these output structures as the content of a
structured document is transformed, and the processing of this
transformed content to form an output stream corresponding to an
output document, the processing of a structured document to create
an output document may be made substantially more efficient, as the
operations of transforming the structured document can be
accomplished without concern to the order of these operations or
the order desired for an output document. As the transformed
content is generated for the output document, however, it may be
formatted into an output stream according to the order of a desired
output document. Thus, portions of an output document resulting
from a transformation of a structured document may be output even
before the transformation of the original structured document has
completed processing.
[0037] Embodiments of the output generator of the present invention
may have a set of logical units. One of the logical units may be
responsible for allocating data structures for an output document
and placing data in these data structures, another logical unit may
be operable to traverse these data structures to obtain references
to data in the data structures corresponding with an output
document, another logical unit may locate the transformed content
associated with those references and format the transformed content
according to a format for the output document to generate a
character stream comprising a portion of an output document which
may then be encoded by another logical unit into a suitable
character encoding format.
[0038] One particular embodiment of an output generator is depicted
in FIG. 2. Output generator 350 may have one or more interfaces 202
through which output generator 350 can receive a request for the
creation of a data structure in memory 270 corresponding with an
output document or content to place in a data structure
corresponding to an output document. Utilizing the data structures
in memory 270, output generator 350 outputs a character stream
corresponding to one or more output documents, or messages
associated with the transformation of a structured document,
through output interface 204.
[0039] Output generator 350 comprises a set of logical units 212,
222, 232, 242. Order control unit 212 is operable to receive
requests to create data structures corresponding to an output
document in memory 270, allocate these data structures and return a
reference or handle to these data structures to the requester.
Additionally, order control unit 212 is operable to receive data to
be placed in a data structure in memory 270 and place this data in
the appropriate data structure in memory 270. Order control unit
212 may signal (e.g. send a command, known as a walk event) to
output walker 222 when data associated with an output document has
arrived which may be output (e.g. when data for an output document
has been placed in a data structure where substantially all
transformed content for that output document which would precede
the data in the output document has been, or is in the process of
being, output).
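By way of illustration only, the behavior described for order control unit 212 (creating a structure and returning a handle, storing data, and signalling walk events) may be modeled in software as follows. The class OrderControl, its methods, the integer handles and positions, and the walk event queue are assumptions introduced for the example; the embodiment described herein is hardware circuitry.

```python
from collections import deque
from itertools import count


class OrderControl:
    """Illustrative software model of an order control unit (hypothetical API)."""

    def __init__(self):
        self._handles = count(1)
        self._tables = {}            # handle -> {position: data}
        self._next = {}              # handle -> next position to signal
        self.walk_events = deque()   # (handle, position) pairs for a walker

    def create(self) -> int:
        """Allocate a structure for a new output document; return its handle."""
        handle = next(self._handles)
        self._tables[handle] = {}
        self._next[handle] = 0
        return handle

    def store(self, handle: int, position: int, data: str) -> None:
        """Place data in a document's structure and signal a walk event once
        everything preceding it in that document has already been signalled."""
        table = self._tables[handle]
        table[position] = data
        while self._next[handle] in table:
            self.walk_events.append((handle, self._next[handle]))
            self._next[handle] += 1

    def data(self, handle: int, position: int) -> str:
        """Look up stored data by handle and position."""
        return self._tables[handle][position]


oc = OrderControl()
doc_a, doc_b = oc.create(), oc.create()
oc.store(doc_a, 1, "world")   # held back: position 0 of document A is missing
oc.store(doc_b, 0, "<p>")     # first portion of document B is ready
oc.store(doc_a, 0, "hello ")  # releases positions 0 and 1 of document A
print(list(oc.walk_events))   # prints [(2, 0), (1, 0), (1, 1)]
```

Note that in this sketch a portion of the second document is signalled between the arrival and the release of data for the first document, so output for multiple documents may be interleaved.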
[0040] Output walker 222 is operable to receive these walk events
and traverse data structures in memory 270 to locate data
corresponding to the output document which is to be output and pass
one or more commands to value extractor 232 indicating the location
of this data in data structures in memory 270.
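By way of illustration only, a traversal of the kind performed by output walker 222 may be sketched as follows. The layout of the data structures in memory 270 is not specified at this point in the description, so the Node type, its fields and the ("start", "value", "end") command tuples are purely hypothetical and chosen only to make the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical node of an output data structure."""

    name: str
    ref: Optional[int] = None                 # reference to transformed content
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Tuple[str, object]]:
    """Depth-first traversal yielding commands for a value extractor."""
    yield ("start", node.name)
    if node.ref is not None:
        yield ("value", node.ref)             # tells the extractor where to look
    for child in node.children:
        yield from walk(child)
    yield ("end", node.name)


tree = Node("doc", children=[Node("title", ref=7), Node("body", ref=9)])
commands = list(walk(tree))
for command in commands:
    print(command)
```

Each "value" command carries only a reference, so locating and formatting the referenced content is left to a later stage.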
[0041] Value extractor 232 may locate and retrieve data referenced
by commands from output walker 222 and mark up the data according to
a format of the output document (e.g. XML, HTML, text, etc.) to
form a character stream. Thus, value extractor 232 may add
characters to, or remove characters from, the data referenced by
command(s) to format the data according to a format for the output
document.
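By way of illustration only, formatting content according to the type of the output document may be sketched as follows. The functions format_text_node and format_element and the type names "xml", "html" and "text" are assumptions introduced for the example; the escaping itself uses the Python standard library rather than any circuit described herein.

```python
from html import escape as html_escape
from xml.sax.saxutils import escape as xml_escape


def format_text_node(content: str, doc_type: str) -> str:
    """Mark up raw character data for the given output document type."""
    if doc_type == "xml":
        return xml_escape(content)                # escapes &, < and >
    if doc_type == "html":
        return html_escape(content, quote=False)  # escapes &, < and >
    if doc_type == "text":
        return content                            # plain text is passed through
    raise ValueError(f"unsupported output type: {doc_type}")


def format_element(name: str, content: str, doc_type: str) -> str:
    """Add start and end tags for markup types; emit bare content for text."""
    body = format_text_node(content, doc_type)
    return body if doc_type == "text" else f"<{name}>{body}</{name}>"


print(format_element("title", "Q&A <draft>", "xml"))   # <title>Q&amp;A &lt;draft&gt;</title>
print(format_element("title", "Q&A <draft>", "text"))  # Q&A <draft>
```

The same referenced content thus yields different character streams depending on the type of the output document with which it is associated.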
[0042] Output formatter 242 may take this character stream and
transcode each of these characters from an internal encoding scheme
to an encoding scheme according to a desired character encoding of
the output document. Thus, data may arrive at output formatter 242
in an internal encoding scheme and be converted to one of a variety
of encoding schemes such as Unicode Transformation Format (UTF),
Big Endian, Little Endian, International Organization for
Standardization (ISO) 8859, Windows (Win) 1252, etc. to produce an
output stream
corresponding to at least a portion of an output document which may
be delivered through output interface 204.
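By way of illustration only, the transcoding step may be sketched as follows, with a Python str standing in for the internal encoding scheme. The mapping of the encoding schemes named above onto Python codec names is an assumption made for the example (in particular, "Big Endian" and "Little Endian" are taken here to mean the two byte orders of UTF-16, and ISO 8859 is represented by ISO 8859-1); the function transcode is hypothetical.

```python
# Python codec names standing in for the encoding schemes named above.
ENCODINGS = {
    "UTF-8": "utf-8",
    "UTF-16 Big Endian": "utf-16-be",
    "UTF-16 Little Endian": "utf-16-le",
    "ISO 8859-1": "iso-8859-1",
    "Windows 1252": "cp1252",
}


def transcode(chars: str, target: str) -> bytes:
    """Convert an internal character stream to bytes in the target encoding."""
    return chars.encode(ENCODINGS[target])


stream = "<p>café</p>"
for name in ENCODINGS:
    # Show the resulting bytes in hexadecimal for each target encoding.
    print(f"{name:22} {transcode(stream, name).hex(' ')}")
```

The character stream is identical in every case; only the byte sequence delivered at the output differs with the desired character encoding of the output document.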
[0043] While it should be understood that embodiments of the
present invention may be applied with respect to producing an
output document associated with the transformation of almost any
structured document (e.g. a document having a defined structure
that can be used to interpret the content) whether the content of
the original document is highly structured (such as an XML
document, Hypertext Markup Language (HTML) document, .pdf document,
word processing document, database, etc.) or loosely structured
(such as a plain text document whose structure may be, e.g., a
stream of characters), it may be useful to illustrate one
particular embodiment of an output generator in conjunction with an
architecture for transforming XML or other structured documents
utilizing a set of transformation instructions for the XML document
(e.g. a stylesheet). While this illustration of such an exemplary
architecture uses one embodiment of an output generator such as
that described herein it will be apparent that, as discussed above,
embodiments of an output generator may be utilized in a wide
variety of other architectures and may be applied to generate
output documents with or without the use of transformation
instructions, preparsed data, etc.
[0044] Attention is now directed to an architecture for the
efficient transformation or processing of structured documents in
which an embodiment of an output generator may be utilized.
Embodiments of the architecture may comprise an embodiment of the
aforementioned output generator along with other logical components
including a pattern expression processor, a transformation engine
and a parser, one or more of which may be implemented in hardware
circuitry, for example a hardware processing device such as an
Application Specific Integrated Circuit (ASIC) which comprises all
the above mentioned logical components, including the output
generator.
[0045] More particularly, transformation instructions associated
with a structured document may be compiled to generate instruction
code and a set of data structures. The parser parses the structured
document associated with the transformation instructions to
generate data structures representative of the structured document.
The pattern expression processor (PEP) identifies data in the
structured document corresponding to definitions in the
transformation instructions. The transformation engine transforms
the parsed document according to the transformation instructions
and the output generator assembles this transformed data into an
output document.
[0046] Turning to FIG. 3, a block diagram for the transformation of
structured documents using embodiments of the present invention is
depicted. A structured document may be received at a web service
112 from a variety of sources such as a file server, database,
internet connection, etc. Additionally, a set of transformation
instructions, for example an XSLT stylesheet, may also be received.
Document processor 210 may apply the transformation instructions to
the structured document to generate an output document which may be
returned to the requesting web service 112, which may, in turn,
pass the output document to the requester.
[0047] In one embodiment, compiler 220, which may comprise software
(i.e. a plurality of instructions) executed on one or more
processors (e.g. distinct from document processor 210) may be used
to compile the transformation instructions to generate data
structures and instruction code in memory 270 for use by document
processor 210. Document processor 210 may be one or more ASICs
operable to utilize the data structures and instruction code
generated by compiler 220 to generate an output document.
[0048] FIG. 4 depicts a block diagram of one embodiment of an
architecture for a document processor operable to produce an output
document from a structured document. Document processor 210
comprises Host Interface Unit (HIU) 310, Parser 320, PEP 330,
Transformation Engine (TE) 340, Output Generator (OG) 350, each of
which is coupled to memory interface 360, to Local Command Bus
(LCB) 380 and, in some embodiments, to one another through signal
lines or shared memory 270 (e.g. a source unit may write
information to be communicated to a destination unit to the shared
memory and the destination unit may read the information from the
shared memory), or both. Shared memory 270 may be any type of
storage known in the art, such as RAM, cache memory, hard-disk
drives, tape devices, etc.
[0049] HIU 310 may serve to couple document processor 210 to one or
more host processors (not shown). This coupling may be
accomplished, for example, using a Peripheral Component
Interconnect eXtended (PCI-X) bus. HIU 310 also may provide an
Applications Programming Interface (API) through which document
processor 210 can receive jobs. Additionally, HIU 310 may interface
with LCB 380 such that various tasks associated with these jobs may
be communicated to components of document processor 210.
[0050] In one embodiment, these jobs may comprise context data,
including a structured document, data structures, and instruction
code generated from transformation instructions by the compiler.
Thus, the API may allow the context data to be passed directly to
HIU 310, or, in other embodiments, may allow references to one or
more locations in shared memory 270 where context data may be
located to be provided to HIU 310. HIU 310 may maintain a table of
the various jobs received through this API and direct the
processing of these jobs by document processor 210. By allowing
multiple jobs to be maintained by HIU 310, these jobs may be
substantially simultaneously processed (e.g. processed in parallel)
by document processor 210, allowing document processor 210 to be
more efficiently utilized (e.g. higher throughput of jobs and lower
latency).
[0051] Parser 320 may receive and parse a structured document,
identifying data in the structured document for PEP 330 and
generating data structures comprising data from the structured
document by, for example, creating data structures in shared memory
270 for use by transformation engine 340 or output generator
350.
[0052] PEP 330 receives data from parser 320 identifying data of
the structured document being processed and compares data
identified by the parser 320 against expressions identified in the
transformation instructions. PEP 330 may also create one or more
data structures in shared memory 270, where the data structures
comprise a list of data in the structured document which match
expressions.
[0053] Transformation engine 340 may access the data structures
built by parser 320 and PEP 330 and execute instruction code
generated by compiler 220 and stored in memory 270 to generate data
for an output document. In some embodiments, one or more
instructions of the instruction code generated by compiler 220 may
be operable to be independently executed (e.g. execution of one
instruction does not depend directly on the result of the output of
the execution of another instruction), and thus execution of the
instruction code by transformation engine 340 may occur in
substantially any order.
[0054] Output generator 350 may assemble the results generated by
transformation engine 340 in an order corresponding to an output
document to form one or more character streams corresponding to an
output document. The output document may then be provided to the
initiating web service 112 through HIU 310, for example, by
signaling the web service 112 or a host processor that the job is
complete and providing a reference to a location in memory 270
where an output document exists, or by streaming the output
document as it is produced.
[0055] Moving now to FIG. 5, an example application of one
embodiment of a document processor to an XML document and an XSLT
stylesheet is illustrated. It is noted that, while the description
herein may include examples in which transformation instructions
are applied to a single source document, other examples may include
applying multiple sets of transformation instructions to a source
document (either concurrently or serially, as desired) or applying
a set of transformation instructions to multiple source documents
(either concurrently with context switching or serially, as
desired). Generally, an XML document is a structured document which
has a hierarchical tree structure, where the root of the tree
identifies the document as a whole and each other node in the
document is a descendent of the root. Various elements, attributes,
and document content form the nodes of the tree. The elements
define the structure of the content that the elements contain. Each
element has an element name, and the element delimits content using
a start tag and an end tag that each include the element name. An
element may have other elements as sub-elements, which may further
define the structure of the content. Additionally, elements may
include attributes (included in the start tag, following the
element name), which are name/value pairs that provide further
information about the element or the structure of the element
content. XML documents may also include processing instructions
that are to be passed to the application reading the XML document,
comments, etc.
[0056] An XSLT stylesheet is a set of transformation instructions
which may be viewed as a set of templates. Each template may
include: (i) an expression that identifies nodes in a document's
tree structure; and (ii) a body that specifies a corresponding
portion of an output document's structure for nodes of the source
document identified by the expression. Applying a stylesheet to a
source document may comprise attempting to find a matching template
for one or more nodes in the source document, and instantiating the
structures corresponding to the body of the matching template in an
output document.
[0057] Again, while XSLT stylesheets may be used in one example
herein of transformation instructions, generally "transformation
instructions" may comprise any specification for transforming a
source document to an output document, which may encompass, for
example, statements intended to identify data of the source
document or statements for how to transform data of the source
document. The source and output documents may be in the same
language (e.g. the source and output documents may be different XML
vocabularies), or may differ (e.g. XML to pdf, etc.).
[0058] Referring still to FIG. 5, an XML document and an associated
XSL stylesheet may be received by web service 112. Web service 112
may invoke embodiments of the present invention to transform the
received document according to the received stylesheet. More
specifically, in one embodiment, compiler 220 may be used to
compile the XSL stylesheet to generate data structures and
instruction code for use by document processor 210. Compiler 220
may assign serial numbers to node identifiers in the stylesheet so
that expression evaluation may be performed by document processor
210 by comparing numbers, rather than node identifiers (which would
involve character string comparisons).
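The serial-number scheme described above can be sketched in a few
lines of Python. This is only an illustration under stated
assumptions: the function names, the choice of starting serial
numbers at 1, and the use of 0 for identifiers unknown to the
stylesheet are all hypothetical and are not taken from the
application.

```python
# Hypothetical sketch; names and numbering choices are illustrative only.

def build_symbol_table(node_identifiers):
    """Assign a serial number to each distinct node identifier,
    in first-seen order (serial numbers start at 1 by assumption)."""
    table = {}
    for name in node_identifiers:
        if name not in table:
            table[name] = len(table) + 1
    return table


def to_serial(name, table):
    """Convert a node identifier to its serial number.
    Unknown identifiers map to 0 here (an assumption)."""
    return table.get(name, 0)


# Node identifiers as they might appear in a stylesheet.
symbols = build_symbol_table(["order", "item", "price", "item"])

# An expression selecting "item" nodes becomes a single integer, so
# matching a parsed node is an integer comparison, not a string one.
target = symbols["item"]
parsed = [to_serial(n, symbols) for n in ["order", "item", "note", "item"]]
matches = [i for i, serial in enumerate(parsed) if serial == target]
```

Here the parser side would perform the `to_serial` step once per
node, and every later comparison operates on integers.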
[0059] Compiler 220 may also store a mapping of these node
identifiers to serial numbers in one or more symbol tables 410 in
memory 270. Additionally, compiler 220 may extract the expressions
from the stylesheet and generate expression tree data structures in
memory 270 to be used by the document processor 210 for expression
matching (e.g. one or more parse-time expression trees 420
comprising expression nodes). Still further, compiler 220 may
generate an instruction table 430 in memory 270 with instructions
to be executed for one or more matching expressions. The
instructions in the instruction table 430, when executed by
document processor 210, may result in performing the actions
defined when an expression associated with the instruction is
matched. In some embodiments, the instructions may comprise the
actions to be performed (i.e. there may be a one-to-one
correspondence between instructions and actions). The compiler may
also generate whitespace tables 440 defining how various types of
whitespace in the source document are to be treated (e.g.
preserved, stripped, etc.), an expression list table 450, a
template list table 460 and one or more DTD tables 462 to map
entity references to values or specify default values for
attributes.
[0060] At this point, processing of the source document by document
processor 210 may begin. Parser 320 receives the structured
document and accesses the symbol tables 410, whitespace tables 440,
or DTD tables 462 in memory 270 to parse the structured document,
identify document nodes, and generate events (e.g. to identify
document nodes parsed from the document) to PEP 330. More
particularly, parser 320 converts node identifiers in the source
document to corresponding serial numbers in the symbol tables 410,
and transmits these serial numbers as part of the events to the PEP
330. Additionally, parser 320 may generate a parsed document tree
470 representing the structure of the source document in memory.
Nodes of the parsed document tree may reference corresponding
values stored in one or more parsed content tables 472 created in
memory by parser 320.
[0061] PEP 330 receives events from the parser 320 and compares
identified document nodes (e.g. based on their serial numbers)
against parse-time expression tree(s) 420 in memory 270. Matching
document nodes are identified and recorded in template or
expression match lists 480 in memory 270.
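As a rough illustration of the matching step just described, the
following Python sketch consumes parser events carrying serial
numbers and records matching nodes in per-expression match lists. It
is a deliberate simplification: each expression is modeled as a flat,
root-anchored path of serial numbers rather than as an expression
tree, and the event format and all names are assumptions made for
this example.

```python
# Hypothetical sketch; the event format and names are assumptions.

def match_events(events, expressions):
    """events: ("start", serial, node_id) or ("end",) tuples.
    expressions: expression id -> tuple of serial numbers forming a
    root-anchored path. Returns expression id -> list of node ids."""
    stack = []  # serial numbers of the currently open elements
    match_lists = {expr_id: [] for expr_id in expressions}
    for event in events:
        if event[0] == "start":
            _, serial, node_id = event
            stack.append(serial)
            for expr_id, path in expressions.items():
                if tuple(stack) == path:  # integer comparisons only
                    match_lists[expr_id].append(node_id)
        else:
            stack.pop()
    return match_lists


# <order><item/><item/></order> with order -> 1 and item -> 2.
events = [
    ("start", 1, 0),
    ("start", 2, 1), ("end",),
    ("start", 2, 2), ("end",),
    ("end",),
]
match_lists = match_events(events, {"e0": (1, 2)})
```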
[0062] Transformation engine 340 accesses the template or
expression match lists 480, the parsed document tree 470, the
parsed content tables 472 or the instruction table 430. The
transformation engine 340 executes instructions from the
instruction table 430 in memory 270. These instructions may be
associated with one or more expressions. Transformation engine 340
may execute the instructions on each of the document nodes that
matches the expression associated with the instruction.
Transformation engine 340 may request the construction of one or
more output data structures from output generator 350 and send
commands to output generator 350 requesting that data resulting
from the execution of these instructions be stored in one or more
of these data structures.
[0063] This may be illustrated more clearly with reference to FIGS.
6 and 7. FIG. 6 illustrates one embodiment of logical interfaces
between transformation engine 340 and output generator 350. In one
embodiment, transformation engine 340 may comprise a set of
application engines, an event generator, a hardware accelerator or
other logical units which are operable to process events associated
with the transformation instructions generated by compiler 220. The
execution of these events may result in a request to generate a
data structure (e.g. output table) from a logical unit of the
transformation engine to the output generator 350, a request for a
reference to a created data structure or a command to place data in
a location in a previously created data structure. More
particularly, in one embodiment, two buses 610, 620 may facilitate
communications between transformation engine 340 and output
generator 350. Bus 610 is a bi-directional bus allowing logical
components of transformation engine 340 to send a request (e.g.
command) that a data structure be built by output generator 350,
and that a reference, or handle, to that data structure be returned
to transformation engine 340. The request sent by a logical
component of transformation engine 340 may, in one embodiment, be
received and processed by order control unit 212.
[0064] Bus 620 may be a unidirectional bus which allows logical
components of transformation engine 340 to send a request to output
generator 350 to place data in a previously created data structure.
Requests received on bus 620 may be placed at the tail of FIFO
queue 622 associated with order control unit 212. Order control
unit 212 obtains requests from the head of queue 622 and processes
these requests.
[0065] The communications between transformation engine 340 and
output generator 350 may be better explained with reference to FIG.
7, which illustrates one embodiment of these communications in more
detail. Transformation engine 340, or logical units thereof, may
send request 712 over bus 610, where request 712 is a request to
build an output table with a number of entries and return a handle
to that output table. Order control unit 212 of output generator
350 may receive request 712, create the requested output table in
memory 270 and return a reference (e.g. memory handle) to that
output table to transformation engine 340 through communication 714
over bus 610.
[0066] During the execution of transformation instructions, then,
if a logical component of transformation engine 340 desires to put
data corresponding to an output document (which may be referred to
as output document data) in an entry of an output table in memory
270, the logical component may issue one or more requests 722 over
bus 620 to output generator 350, where the request includes a
reference to a particular output table, and the data to place in
the output table. As mentioned with respect to FIG. 6, these
requests 722 may be placed in FIFO queue 622 associated with order
control unit 212, which may process these requests in the order
they are placed in queue 622.
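The two request paths described above (a build request that returns
a handle, and queued requests that place data in an existing table)
can be modeled in software as follows. This Python sketch is
illustrative only; the class and method names are hypothetical, and
integer handles stand in for memory references.

```python
# Hypothetical software model; names are assumptions, not the hardware.
from collections import deque


class OrderControlUnit:
    def __init__(self):
        self.tables = {}     # handle -> list of entries (None = unwritten)
        self.next_handle = 1
        self.fifo = deque()  # pending "set entry" requests, oldest first

    def build_table(self, num_entries):
        """Analogous to a request/response pair on the bi-directional
        bus: allocate a table and return a handle to it."""
        handle = self.next_handle
        self.next_handle += 1
        self.tables[handle] = [None] * num_entries
        return handle

    def enqueue_set(self, handle, index, data):
        """Analogous to a request on the unidirectional bus: the
        request is placed at the tail of the FIFO queue."""
        self.fifo.append((handle, index, data))

    def process_all(self):
        """Take requests from the head of the queue, in arrival order."""
        while self.fifo:
            handle, index, data = self.fifo.popleft()
            self.tables[handle][index] = data


ocu = OrderControlUnit()
root = ocu.build_table(3)
ocu.enqueue_set(root, 2, "</a>")  # entries need not arrive in document order
ocu.enqueue_set(root, 0, "<a>")
ocu.process_all()
```

Note that entry 1 is still unwritten after both requests have been
processed, which is the situation the ordering logic must handle.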
[0067] In one embodiment, requests 712 and 722 may serve to create
and fill output tables associated with one or more output documents.
For each output document at least one root table may be requested
by transformation engine 340 and allocated by output generator 350,
where the root table comprises a number of entries.
[0068] Linked tables for each output document may also be
constructed in substantially the same manner, and these linked tables
may be linked to entries in the root table such that the entry in the root
table references the linked table and the linked table references
the next entry (i.e. the entry in the root table following the
entry that references the linked table) in the root table (tables
linked to the root table may be referred to as first level linked
tables). It will be apparent that linked tables may, in turn, be
linked to other linked tables to create a set of linked data
structures of arbitrary depth, for example a link table (e.g.
second level linked table) may be created and linked to an entry in
a first level linked table, such that the entry in the first level
table references the linked table and the second level linked table
references the next entry in the first level table.
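A minimal Python sketch of such linked tables follows, assuming each
table is a list and a link is a small wrapper object (both are
modeling choices made for this example, not structures taken from the
application). The return from a linked table to the next entry of its
parent is modeled by the generator resuming the parent loop.

```python
# Hypothetical model of linked output tables; names are illustrative.

class Link:
    """An entry that references a linked table instead of a value."""
    def __init__(self, table):
        self.table = table


def walk(table):
    """Yield values in output-document order, descending into linked
    tables and then continuing with the next entry of the parent."""
    for entry in table:
        if isinstance(entry, Link):
            yield from walk(entry.table)
        else:
            yield entry


second_level = ["b1", "b2"]
first_level = ["a1", Link(second_level), "a2"]
root_table = ["r0", Link(first_level), "r2"]
order = list(walk(root_table))
```

The nesting depth is limited only by how many tables are linked, which
mirrors the arbitrary depth mentioned above.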
[0069] Thus, by initiating the construction of various output
tables, and setting the entries of the various output tables,
including the linking of the various output tables, transformation
engine 340 may serve to create a set of linked output tables in
memory 270 representative of the structure of an output document
and comprising, or referencing, at least a portion of the content
comprising the output document through communication with order
control unit 212. However, from the above description it may be
gleaned that these output tables may be neither created nor filled
in the same order as the output document (i.e. data belonging at
one place in the output document may be placed in an output table
before data which will come before it in the document, etc.).
[0070] To allow an output document to be output in a streaming
manner, as data associated with the output document is generated by
transformation engine 340, order control unit 212 may generate a
walk event to output walker 222 to indicate an output table or
entry in an output table has been completed. This walk event may be
placed in a walk event queue for output walker 222, which obtains
walk events from this queue. Output walker 222 may traverse data
structures in memory 270 based on these walk events to assemble
commands to value extractor 232 operable to produce output which may
be in an order corresponding with the order of a desired output
document. When output walker 222 has reached the end of the output
tables for a particular output document, output walker 222 may send
a command signaling that an output document has been completed.
[0071] FIG. 8 illustrates one embodiment of order control unit 212
and output walker 222 in more detail. Order control unit 212
creates, maintains and updates output data structures 810 in memory
270, as discussed above. As order control unit 212 generates and
updates output data structures 810, order control unit 212 may
generate events to output walker 222 identifying content of an
output document which is available to be output, where previous
data of that output document may have been output (or may be in the
process of being output), and place these events in a queue for
output walker 222. Each event may identify an entry in an output
data structure 810, such that output walker 222 can begin
traversing the output data structures 810 at that entry, and
continue traversing the output data structures 810 until entries,
tables or other structures 810 which have not been completed or
filled are encountered. The next walk event can then be obtained
from the walk event queue.
[0072] In one embodiment, a set of bits may be associated with
entries in each output data structure 810 and are utilized by order
control unit 212 and output walker 222 to generate events and
traverse the output tables 810. This may be better explained with
reference to FIG. 9 which depicts one embodiment of one output data
structure 810. Output structure 810a may be a table comprising a
set of entries 920, each entry having two bits 912, 914. One bit
912 is a valid bit which may be set when a value of the entry is
written. Progress bit 914 may be set by either order control unit
212 or output walker 222. When a valid bit 912 or progress bit 914
of an entry is written by order control unit 212, both bits may be
checked and, if both bits 912, 914 are set, a walk event may be
generated to output walker 222.
[0073] Output walker 222 may then traverse output table 810a
starting with the entry 920 referenced in the walk event. Thus, the
entry 920 may comprise a value, which may be a reference to another
memory location in a previously created data structure 820 (e.g. a
data structure created by compiler 220, parser 320 or PEP 330)
comprising content from a corresponding original structured
document. Output walker 222 may output a command and an associated
reference corresponding to this entry 920 to value extractor 232,
at which point output walker 222 may set the progress bit 914 of
the next entry 920 in the output structure 810a, and, if both bits
associated with the entry 920 are set, this entry 920 may too be
processed by output walker 222. If both bits 912, 914 of the next
entry 920 are not set, output walker 222 may obtain the next walk
event from the walk event queue.
[0074] After reading the above description, however, it may be
noticed that an entry of an output structure 810 may reference
another output structure 810 (and entries in this linked output
structure 810 may, in turn, reference another linked output
structure 810, etc.). Thus, if the referenced entry is a link to
another output structure 810, output walker 222 may process each of
the entries of the linked output structure 810 (and any output
structures 810 referenced by entries in this linked output
structure 810, etc.) until each output structure 810 at a deeper
level than the original entry has been processed. During this
traversal process, output walker 222 may output one or more
commands and associated references corresponding to the traversed
entries in output structures 810.
[0075] A particular example may be helpful in illustrating the use
of progress bit 914 and valid bit 912. In one embodiment, when
output structure 810a is created by order control unit 212,
progress bit 914 and valid bit 912 corresponding with each entry
may be cleared. Initially, order control unit 212 may receive a
command from transformation engine 340 to set entry 920a to a
value. Order control unit 212 may set the value of entry 920a,
valid bit 912a and progress bit 914a, and since both valid bit 912a
and progress bit 914a are set, a walk event with the address of
entry 920a may be sent to output walker 222. Output walker 222 may
receive this walk event and process entry 920a, after which output
walker 222 may then set progress bit 914b associated with entry
920b. However, as valid bit 912b is not yet set, output walker 222
may stop traversing output table 810a and obtain another walk event
from the walk event queue.
[0076] Suppose now that order control unit 212 sets the value of
entries 920b and 920c (setting valid bits 912b and 912c associated
with entries 920b and 920c). Order control unit 212 may then check
both valid bit 912b and progress bit 914b, and since both valid bit
912b and progress bit 914b are set, a walk event with the address
of entry 920b may be sent to output walker 222. Output walker 222
may receive this walk event and process entry 920b, after which
output walker 222 may then set progress bit 914c associated with entry
920c. Output walker 222 may then check progress bit 914c and valid
bit 912c, determine that entry 920c should be processed, and
process entry 920c. While the above description illustrates one way
to order the processing of output structures 810 by output walker
222, it will be apparent that other methods of ordering entries in
output structures 810 may be utilized, such as registers which
hold addresses of entries 920 of output structures 810, etc.
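The interplay of the valid and progress bits in the example above can
be simulated with a short Python sketch. It models a single,
unlinked output table in one thread of control; the class and method
names are hypothetical, and the hardware units would of course
operate concurrently rather than by explicit calls.

```python
# Hypothetical single-table simulation; names are assumptions.
from collections import deque


class OutputTableSim:
    def __init__(self, size):
        self.values = [None] * size
        self.valid = [False] * size     # set when an entry is written
        self.progress = [False] * size  # set when output has reached it
        self.walk_events = deque()
        self.emitted = []

    def order_control_set(self, i, value):
        """Order control unit: write the entry, set its valid bit, and
        generate a walk event if both bits are now set."""
        self.values[i] = value
        self.valid[i] = True
        if i == 0:
            self.progress[0] = True  # the first entry has no predecessor
        if self.valid[i] and self.progress[i]:
            self.walk_events.append(i)

    def run_walker(self):
        """Output walker: start at each event's entry and continue
        until an entry that is not yet valid (or the table end)."""
        while self.walk_events:
            i = self.walk_events.popleft()
            while True:
                self.emitted.append(self.values[i])
                i += 1
                if i >= len(self.values):
                    break
                self.progress[i] = True
                if not self.valid[i]:
                    break


sim = OutputTableSim(3)
sim.order_control_set(0, "<a>")   # like entry 920a: event generated
sim.run_walker()                  # emits entry 0, sets progress of entry 1
first_pass = list(sim.emitted)
sim.order_control_set(1, "text")  # like entry 920b: event generated
sim.order_control_set(2, "</a>")  # like entry 920c: no event yet
sim.run_walker()                  # emits entries 1 and 2
```

In this model an entry is emitted exactly once, because a walk event
is generated only for an entry at which the walker has already
stopped.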
[0077] When processing each of the entries 920 of an output
structure 810, output walker 222 may generate a command for value
extractor 232 based on the entry 920 being processed. In one
embodiment, a command for value extractor 232 comprises an opcode
derived from a type of entry 920 in the output table 810 being
processed and associated information for the entry 920. This
associated information may comprise one or more memory references
or serial numbers which correspond to values associated with that
entry. In certain cases, a memory reference or serial number
corresponding to a value may be obtained by accessing one or more
data structures 820 previously created in memory 270 during
previous processing of the original structured document, or the
transformation of the structured document (e.g. by parser 320, PEP
330, TE 340, etc.). These data structures 820 may include a data
structure 820 representing the structure of the original document
and comprising values, or serial numbers corresponding to values,
of the original structured document or data structures 820
comprising a list of nodes of the original document, etc., as
discussed previously. To obtain data from these data structures
820, output walker 222 may traverse one or more of these data
structures 820 in conjunction with the processing of an entry 920
of an output structure 810. This traversal of data structures 820
may be done by a logical unit, which may be separate hardware
circuitry, of output walker 222.
[0078] In one embodiment, while processing entries 920 in output
structures 810, output walker 222 may encounter an entry which
corresponds to a message associated with a set of transformation
instructions associated with the output document being assembled.
This message may correspond to a transformation instruction
indicating that a message is to be generated to the output during
the transformation or processing of a structured document (e.g.
xsl:message command). If output walker 222 encounters such an entry
920 in an output table, and this entry has been completed, output
walker 222 may substantially immediately transmit a command to
value extractor 232 to output the host bound message to HIU 310
before continuing traversal of data structures 810.
[0079] Value extractor 232 may receive commands from output walker
222, and based on these commands may retrieve content or values
from memory 270 and generate text output corresponding with an
output document. More particularly, value extractor 232 may utilize
the reference or serial number associated with a command received
from output walker, and may access data structures in memory 270
utilizing the reference or serial number to obtain a character
string (e.g. the value of content for the output document). Markup
may be added to the character string (i.e. appended to the
character string, prepended to the character string or inserted
within the character string) depending on the desired type of the
output document, indentation of the character stream accomplished,
attributes of the output document merged and the resulting
character string streamed or output to output formatter 242.
[0080] A block diagram for one embodiment of value extractor 232 is
depicted in FIG. 10. As detailed above, value extractor 232 may
receive commands from output walker 222 and produce a stream of
characters for an output document to output formatter 242. In one
embodiment, value extractor 232 comprises a five stage logic
pipeline, which may be implemented substantially completely in
hardware. This logic pipeline may comprise decode stage 1010,
compare stage 1020, update stage 1030, fetch stage 1040, markup
stage 1050 and, in one embodiment, may comprise control logic 1060
operable to control the operation of stages 1010, 1020, 1030, 1040,
1050. Each stage 1010, 1020, 1030, 1040, 1050 may perform specific
functions to ensure that well-formed constraints corresponding with
a type of the output document (e.g. XML, HTML, text, etc.) are
placed on the character stream generated for the output document.
In case of violations, the transformation may either be terminated
or continue, depending upon the severity of the error. In one embodiment,
an error may be reported to HIU 310 in such a situation.
[0081] More particularly, decode stage 1010 retrieves commands from
a queue where commands from output walker 222 may have been placed.
These commands may comprise an opcode and, in some cases,
associated data such as the type of the output document, reference
to a location in memory 270, or a serial number corresponding with
a value in a data structure 820 in memory 270. The decode stage
1010 may initiate a fetch of a value associated with the memory
reference or serial number and generate a command for fetch stage
1040. Additionally, decode stage 1010 may generate a command for
compare stage 1020 to obtain data to identify if a particular node
is unique, or a command for update stage 1030 to update one or more
data structures in memory 270. Decode stage 1010 may stall until
results are received in response to commands issued to compare stage
1020.
[0082] Compare stage 1020 may be operable to identify unique or
duplicate nodes in the output document or merge attributes in the
output document based on commands received from decode stage 1010.
These nodes may include elements, attributes, namespaces, etc. To
determine if a node is a duplicate or unique, references or serial
numbers associated with nodes may be compared with data structures
820 or 1032 corresponding to the node type being analyzed. The
results of the comparison performed by compare stage 1020 may be
passed to the update stage 1030 or the decode stage 1010. More
particularly, compare stage 1020 may set match flags
(indicating if a node has been matched) for decode stage 1010 and
send commands to update stage 1030.
[0083] Update stage 1030 maintains stacks and tables or other data
structures 1032 used during output generation. It may interface
directly with these memory structures 1032 in memory 270 (e.g.
through memory interface 360) and may provide busy indication when
a data structure 1032 which is being accessed is being fetched or
updated. More particularly, update stage 1030 may maintain stacks
and tables 1032 which may be used for providing output scope
context, generating namespace declarations, and eliminating attribute
duplication. These stacks may include an Element Declaration Stack
(EDS), an Output Namespace Scope Stack (ONSS), and an Exclude
Results Prefix Stack (ERPS). The tables may include an Attribute
Merge Content Table (AMCT). The EDS may be used for determining the
scope range of a namespace. The ONSS is used to determine whether a
new namespace has been declared in the current scope or within an
ancestor element. The AMCT is used to remove duplicate attribute
names and attribute set names.
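The role of a namespace scope stack can be illustrated with the
following Python sketch, which decides whether a namespace
declaration must be emitted for the current element or is already in
scope from an ancestor. It is a software analogy only: the class, its
methods and the dictionary-per-element layout are assumptions and do
not describe the actual organization of the ONSS.

```python
# Hypothetical analogy for a namespace scope stack; names are assumptions.

class NamespaceScopeStack:
    def __init__(self):
        self.scopes = []  # one dict (prefix -> URI) per open element

    def push_element(self):
        self.scopes.append({})

    def pop_element(self):
        self.scopes.pop()

    def needs_declaration(self, prefix, uri):
        """True unless the nearest enclosing binding of this prefix
        already maps it to the same URI."""
        for scope in reversed(self.scopes):
            if prefix in scope:
                return scope[prefix] != uri
        return True

    def declare(self, prefix, uri):
        self.scopes[-1][prefix] = uri


onss = NamespaceScopeStack()
onss.push_element()                             # outer element
outer_needed = onss.needs_declaration("x", "urn:one")
onss.declare("x", "urn:one")
onss.push_element()                             # child element
inherited = onss.needs_declaration("x", "urn:one")
rebinding = onss.needs_declaration("x", "urn:two")
onss.pop_element()
onss.pop_element()
```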
[0084] Fetch stage 1040 receives commands from decode stage 1010.
These commands may comprise an opcode, the type of the output
document and a pointer or reference to a location in memory. Based
on the opcode of a command, fetch stage 1040 may retrieve the
character strings that make up node names, targets, values, text
etc. using pointer values supplied in the command, and insert
markup characters for the appropriate type of the output document.
Pointers associated with a received command may reference data
structures 810 or 820 comprising a set of tables with the character
strings comprising content of the output document, and fetch stage
1040 may obtain these character strings by accessing these data
structures 810, 820 using the reference associated with the
command. Fetch stage 1040 may insert markup characters into, or
append or prepend markup characters to, the character string,
before sending the character string to markup stage 1050.
[0085] Markup stage 1050 may receive the character string and check
to make sure the character string is well-formed according to a
standard for the type of the output document, and, if possible, fix
any errors detected. If it is not possible to fix the error
according to a standard for the type of the output document markup
stage 1050 may generate an error, for example to HIU 310. Markup
stage 1050 may also perform output escaping and character escaping
on this character stream, based on the type of the output document
associated with this character stream.
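Escaping that depends on the type of the output document can be
sketched in Python as below. The sketch uses the standard library
function `xml.sax.saxutils.escape`, which replaces `&`, `<` and `>`;
treating HTML output exactly like XML output is a simplifying
assumption, and the function name and type strings are hypothetical.

```python
# Hypothetical sketch; real output methods have more detailed rules.
from xml.sax.saxutils import escape


def escape_for_output(text, output_type):
    """Escape character data according to the output document type."""
    if output_type in ("xml", "html"):
        # Simplification: HTML is handled the same way as XML here.
        return escape(text)
    if output_type == "text":
        return text  # plain text output is passed through unchanged
    raise ValueError("unknown output type: " + output_type)


xml_out = escape_for_output("a < b & c", "xml")
text_out = escape_for_output("a < b & c", "text")
```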
[0086] Thus, a character stream comprising at least a portion of an
output document, corresponding to a command received at value
extractor 232 and corresponding to a type of the output document
(e.g. HTML, XML, text, etc.) may be produced by stages 1010, 1020,
1030, 1040, 1050. Before outputting this character stream, however,
value extractor 232 (e.g. markup stage 1050) may perform output
escaping and character escaping on this character stream, based on
the type of the output document associated with this character
stream. As the escaping is performed, this stream of
characters may be delivered to output formatter 242.
[0087] FIG. 11 depicts one embodiment of value extractor 232 and
output formatter 242. As just discussed, value extractor 232
produces a stream of characters associated with a particular output
document to output formatter 242. Output formatter 242 may receive
this stream of characters and convert each of these received
characters from an internal format or representation (e.g. UTF-16)
to an encoding desired for the characters of an output document
(e.g. UTF-8, WIN 1252, Big Endian, Little Endian, etc.).
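
The conversion performed by the output formatter can be sketched with the incremental codecs of the Python standard library. This is an illustration only; it assumes the internal representation is little-endian UTF-16, and the function name convert_stream is hypothetical.

```python
import codecs


def convert_stream(chunks, target_encoding, internal_encoding="utf-16-le"):
    """Convert a stream of byte chunks from the internal representation to
    the encoding desired for the output document, yielding bytes as they
    become available."""
    decoder = codecs.getincrementaldecoder(internal_encoding)()
    encoder = codecs.getincrementalencoder(target_encoding)()
    for chunk in chunks:
        # The decoder buffers incomplete code units across chunk boundaries.
        text = decoder.decode(chunk)
        if text:
            yield encoder.encode(text)
    # Flush both codecs; an incomplete trailing code unit raises an error here.
    tail = encoder.encode(decoder.decode(b"", final=True), final=True)
    if tail:
        yield tail
```

Passing "utf-8", "cp1252", "utf-16-be" or "utf-16-le" as target_encoding selects among encodings of the kinds listed above. Because the codecs are incremental, a character split across two chunks is still converted correctly.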
[0088] As the characters in the character stream are converted,
output generator 350 may return each of the characters to the
HIU 310 by associating each of the characters with a job ID and
placing the character in a FIFO queue for reception by HIU 310.
Thus, output streams comprising portions of different output
documents may be interleaved in the output of output generator 350.
In other words, a portion of one output document may be output by
output generator 350, followed by a portion of another output
document, followed by another portion of the first output
document. By tagging each of the output streams with a job ID, HIU
310 may receive these various portions of output documents and
assemble the portions into a complete output document, or may
output each portion of an output document to a host substantially
as it arrives.
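
The interleaving and reassembly just described can be sketched as follows. This is a minimal software illustration assuming the FIFO queue holds (job ID, data) pairs; the function names emit and assemble are hypothetical.

```python
from collections import defaultdict, deque


def emit(fifo, job_id, data):
    """Tag a portion of output with its job ID and place it in the FIFO."""
    fifo.append((job_id, data))


def assemble(fifo):
    """Drain the FIFO in arrival order and rebuild one complete output
    document per job ID from the interleaved portions."""
    documents = defaultdict(list)
    while fifo:
        job_id, data = fifo.popleft()
        documents[job_id].append(data)
    return {job_id: b"".join(parts) for job_id, parts in documents.items()}
```

For example, if portions b"<a>" for job 1, b"<b/>" for job 2 and b"</a>" for job 1 are placed in the queue in that order, assemble returns b"<a></a>" for job 1 and b"<b/>" for job 2. A receiver that forwards each portion as it arrives would instead pass each (job ID, data) pair on without buffering.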
[0089] In the foregoing specification, the invention has been
described with reference to specific embodiments. However, one of
ordinary skill in the art appreciates that various modifications
and changes can be made without departing from the scope of the
invention as set forth in the claims below. Accordingly, the
specification and figures are to be regarded in an illustrative
rather than a restrictive sense, and all such modifications are
intended to be included within the scope of the invention. For example,
it will be apparent to those of skill in the art that although the
present invention has been described with respect to a protocol
controller in a routing device, the inventions and methodologies
described herein may be applied in any context which requires the
determination of the protocol of a bit stream.
[0090] Benefits, other advantages, and solutions to problems have
been described above with regard to specific embodiments. However,
the benefits, advantages, solutions to problems, and any
component(s) that may cause any benefit, advantage, or solution to
occur or become more pronounced are not to be construed as a
critical, required, or essential feature or component of any or all
the claims.
* * * * *