U.S. patent application number 09/884230 was filed with the patent office on 2001-06-19 and published on 2001-11-22 for automatic query and transformative process.
This patent application is currently assigned to Vignette Corporation. Invention is credited to Nasr, Roger I., Webber, Neil.
Application Number | 09/884230 |
Publication Number | 20010044794 |
Family ID | 22462540 |
Filed Date | 2001-06-19 |
United States Patent Application | 20010044794 |
Kind Code | A1 |
Nasr, Roger I. ; et al. | November 22, 2001 |
Automatic query and transformative process
Abstract
A computer-implemented method of retrieving information in a
first markup language through a query engine and presenting the
information in any required markup language. A user inputs a query
and may invoke a number of transformative sequences. These
sequences contain a markup language pattern and an action, which
may include transforming the tags in the first markup language to
tags in a different markup language. The appropriate transformative
sequence is selected and the pattern from the transformative
sequence is compiled. The compiled pattern is used to perform rapid
and efficient searches of documents in the database. A predicate
check using the binary coding of the node as well as ancestor
information confirms the node. The leaf information associated with
a confirmed node is then stored. If necessary, the action from the
transformative sequence is applied to change the markup language of
the leaf information to that of the user.
Inventors: | Nasr, Roger I.; (Austin, TX) ; Webber, Neil; (Round Rock, TX) |
Correspondence Address: | GRAY, CARY, WARE & FREIDENRICH LLP, 1221 SOUTH MOPAC EXPRESSWAY, SUITE 400, AUSTIN, TX 78746-6875, US |
Assignee: | Vignette Corporation |
Family ID: | 22462540 |
Appl. No.: | 09/884230 |
Filed: | June 19, 2001 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
09884230 | Jun 19, 2001 | |
09134263 | Aug 14, 1998 | 6263332 |
Current U.S. Class: | 1/1 ; 707/999.004; 707/999.005; 707/E17.006 |
Current CPC Class: | Y10S 707/99935 20130101; Y10S 707/956 20130101; Y10S 707/99933 20130101; G06F 16/258 20190101; Y10S 707/99943 20130101; Y10S 707/99945 20130101 |
Class at Publication: | 707/4 ; 707/5 |
International Class: | G06F 017/30 |
Claims
We claim:
1. A query engine for searching for an electronic document,
comprising: (a) means for accepting a query; (b) means for
compiling the query into query engine instructions, such
instructions including tags and attributes instructions; (c) means
for conducting a partial search for nodes based on tags and
attributes; (d) means for conducting a predicate check on the nodes
found by the partial search to ensure that the node conforms to the
query; and (e) means for generating an output based on the
predicate check.
2. The query engine of claim 1 wherein the means for generating an
output includes storing the search in a continuation state.
3. The query engine of claim 2 wherein the means for compiling is
reactivating the continuation state.
4. The query engine of claim 1 wherein the predicate check
comprises the analysis of at least the following: (a) the
identification of the immediate parent of each node; (b) the
absolute depth of each node; and (c) binary encoding.
5. The query engine of claim 1 wherein, in response to at least one
successful result obtained by the means for conducting a predicate
check, the means for generating an output is activated.
6. The query engine of claim 5 wherein the means for conducting a
predicate check is reactivated by the user after each output.
7. The query engine of claim 1 further comprising a means for the
user to conduct another partial search after the predicate check
through the means for conducting a partial search.
8. The query engine of claim 1 further comprising a means for the
user to conduct another partial search after a generated output
through the means for conducting a partial search.
9. The query engine of claim 1 further comprising a means for
conducting a complete search of the nodes based on tags and
attributes.
10. The query engine of claim 9 wherein, if the engine is searching
a document in memory, the query engine implements the means for
conducting a complete search of the nodes of document
in-memory.
11. A method of searching, comprising: accepting a query; compiling
the query into query engine instructions, such instructions
including tags and attributes instructions; conducting a partial
search for nodes based on tags and attributes; conducting a
predicate check on the nodes found by the partial search; and
generating an output based on the predicate check.
12. The method of searching of claim 11 wherein the partial search
is based on tags and attributes of the nodes.
13. The method of searching of claim 11 wherein the partial search
is conducted on a document parse tree.
14. The method of claim 11 wherein during the step of conducting a
predicate check if a result fulfilling the query is determined,
said result is moved to the next step of generating an output.
15. The method of claim 14 wherein when said result is determined,
the rest of the partial search is stored in a continuation
state.
16. The method of claim 15 further comprising the step of
reactivating the partial search in the continuation step after
generating an output.
17. A computer program product for conducting searches on a
database, the computer program product comprising: a computer
usable medium having computer readable program code means embodied
in said medium for searching, said computer readable program code
means comprising; (a) means for accepting a search query (b) means
for compiling the query into query engine instructions; said means
for compiling the query comprising; means for identifying the
language of the query; means for associating the language of the
query with a correct compiler; and means for using the correct
compiler to generate the query engine instructions (c) means for
conducting a partial search for nodes based on tags and attributes;
(d) means for conducting a predicate check on the nodes found by
the partial search; and (e) means for generating an output.
18. The computer program product of claim 17 wherein the means for
generating an output comprises; means for identifying data on the
node; means for identifying the correct compiler for the data;
means for compiling the data into the language of the query; and
means for providing the compiled data to the user.
19. A computer-implemented method of retrieving information in a
first markup language through a query engine comprising: a.
receiving a query in a patterned format b. determining an
appropriate transformative sequence from the query c. compiling a
pattern from the transformative sequence d. assigning the compiled
pattern a keyword; e. searching the databases for nodes with the
keyword; f. performing a predicate check on a keyword node; g.
obtaining the leaf information associated with a checked keyword
node; and h. presenting the leaf information to the user.
20. The method of claim 19 wherein the transformative sequence
comprises: the pattern; and an action, wherein the action defines
changing the leaf information from the first markup language to a
second markup language.
21. The method of claim 20 wherein in presenting the leaf, the
transformative sequence is applied to the leaf information changing
it from the first markup language to the second markup
language.
22. The method of claim 19 wherein performing the predicate check
on the keyword comprises: obtaining selected information of the
keyword node; and comparing selective information to confirm the
keyword node with the search request.
23. The method of claim 22 wherein the selected information is (i)
the identification of the immediate parent of the keyword node,
(ii) the depth of the keyword node, and (iii) the binary coding of
the keyword node.
24. The method of claim 23 wherein the binary coding is the binary
code assigned to node clusters, including leaves with a depth of
three nodes.
25. A query and transformative engine for residing on a server
comprising: a. means for determining the browser format; b. means
for accepting query from the browser; c. means for conducting query
search; d. means for transforming results from the query search
into the browser format; and e. means for delivering the
transformed results to the browser.
26. A computer program product, for allowing query and
transformative functions on a server, comprising: a computer
application processable by a computer for causing the server to:
receive a query request from a web browser; process the query
request to identify suitable portions of Web documents; and present
the identified portions to the web browser; and apparatus from
which the computer program is accessible by the computer.
27. The product of claim 26 wherein to process the query requests
comprises: identifying the markup language format of the web
browser; if the web browser markup language is different from the
document markup language, altering the query request to the
document markup language; and conducting the query request in the
document markup language.
28. The product of claim 27 wherein to present the identified
portions to the web browser comprises: obtaining results to the
query request in the document markup language, if the web browser
markup language is different from the document markup language,
altering the results to the web browser markup format; and
presenting the results in the web browser markup format.
29. The product of claim 27 wherein the document markup language is
XML.
30. The product of claim 27 wherein the web browser markup format
is HTML.
Description
TECHNICAL FIELD
[0001] This patent application is related, in general, to
information retrieval and in particular to a query and
transformative engine applicable to XML documentation.
BACKGROUND
[0002] As society becomes increasingly computerized and as
greater access is allowed to information stored on computers, it
has become increasingly important to find such information as
efficiently as possible.
[0003] For example, the development of computerized information
resources, such as the Internet, and various on-line services, such
as Compuserve, America Online, Prodigy, and other services, has led
to a proliferation of electronically available information. In
fact, this electronic information is increasingly displacing more
conventional means of information transmission, such as newspapers,
magazines, and even television. The World Wide Web consists of a
number of Web sites located on numerous servers, most of which are
accessible through global computer networks. The primary issue in
all of these resources is filtering the vast amount of information
available so that a user obtains the information of interest
and receives it in an acceptable format. To
assist in searching information available on the Internet, a number
of search techniques have been devised to find information
requested by the user.
[0004] These search techniques are based upon a node by node
search. When the node does not contain "speech" (defined as
viewable material for the reader), the search will navigate to the
first child of the node and keep on navigating down each node
string until speech is found. By being forced into examining each
node separately, such searches are time and resource consuming.
[0005] In addition, none of these search techniques incorporate a
transformative sequence for adjusting the information to the
requirements of the user.
[0006] There is a need in the art to develop a query system that is
easy to use and intuitive. There is an additional need to combine
such a query engine with a transformative sequence to allow
documents to be presented to users in the format they require.
SUMMARY OF THE INVENTION
[0007] A computer-implemented method of retrieving information in a
first markup language through a query engine and presenting the
information in any required markup language is shown. A user inputs
a query to achieve one of two possible outputs: In the first usage,
a query stands alone and the output of the engine is the
information matching the query. In the second usage, transformative
sequences are combined with queries. These sequences contain a
markup language pattern and an action; the action may include
transforming the tags in the first markup language to tags in a
different markup language. The output of the engine in this second
case is information matching the queries and transformed by the
sequences specified. In either usage, the query is compiled from
its source format into a sequence of instructions for the query
engine. The compiled query is assigned tags and attributes. The
database is then searched node by node for the corresponding tags
and attributes. A predicate check using the binary coding of the
node as well as ancestor and descendant information confirms the
node. The leaf information associated with a confirmed node is then
stored. If necessary, the action from the transformative sequence
is applied to change the markup language of the leaf information to
that of the user.
[0008] A primary object of the invention is to provide a query
engine capable of making partial searches and conducting predicate
checks on such searches.
[0009] Yet another object of the present invention is to provide an
abstract engine with both query and transformative capabilities to
access a document and transform it to a requisite format.
[0010] It is still another object of the invention to provide a
query engine that can produce more than one result on demand.
[0011] It is another object of the invention for the query engine
to be state-preserving so that the engine can reactivate a prior
search.
[0012] An object of the invention is to execute XML tag-level
search and retrieval.
[0013] Furthermore, another object of the invention is to provide
an engine that can both process a query and validate the results
efficiently.
[0014] Still another object of the invention is to provide natural
language query.
[0015] A further object of the invention is for the transformative
engine to present the XML scripted document in HTML, XSML, HDML,
and other presentation formats.
[0016] Another object of the invention is to access XML tag-level
scripting and perform XSL-ready transformation on such
scripting.
BRIEF DESCRIPTION OF THE FIGURES
[0017] For a more complete understanding of the present invention
and the advantages thereof, reference should be made to the
following Detailed Description taken in connection with the
accompanying drawings in which:
[0018] FIG. 1A is a diagram illustrating the prior art
implementation of conducting searches;
[0019] FIG. 1B is a diagram illustrating the implementation of
conducting a search using an abstract engine;
[0020] FIG. 1 is a relationship diagram showing the Query Engine
components;
[0021] FIG. 2 is a detailed flowchart of the Query Engine;
[0022] FIG. 3 is a relationship diagram showing the Query Engine
incorporated into a Transformation Processing Engine;
[0023] FIG. 4 is an illustration of a document tree with binary
coding assignments;
[0024] FIG. 5 is a block diagram of a computer network;
[0025] FIG. 6 is an example page of a Web site;
[0026] FIG. 7 is a process for searching and displaying a Web
document; and
[0027] FIG. 8 is an example program of an extensible style language
(XSL) transformation.
DETAILED DESCRIPTION
[0028] In the context of an electronic environment, a document is
stored using a markup language. A markup language defines the
descriptions of the structure and content of different types of
electronic documents. There is a need to be able to search such
electronic documents to obtain needed information. In the prior
art, as shown in FIG. 1A, a single query engine 1 would not be able
to handle query requests in a number of differing languages. It
would take a number of query engines 1a, 1b, 1c, and 1d receiving
similar search requests 5, in a number of differing languages, 5a,
5b, 5c and 5d, to compile and generate a number of differing
searches, 10a, 10b, 10c, and 10d, in order to obtain a search
result 15. In its preferred embodiment, if the present invention,
as shown in FIG. 1B, received a number of similar search requests
in a number of differing languages, 5a, 5b, 5c, and 5d, it would
compile the search request 20 into the abstract engine language 25
and then have the abstract engine 30 run the search to obtain
search result 15. The advantage is that the abstract engine can
support any number of query languages. The prior art cannot support
a number of query languages and would have to implement separate
search engines for the separate languages. This provides the user
of the abstract engine with a memory advantage. The abstract engine
can be used in a network in an electronic environment or on a
stand-alone console.
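The arrangement of FIG. 1B can be pictured as several per-language compilers feeding one shared instruction form. The following is a minimal sketch under stated assumptions, not the implementation of the abstract engine: the language names `under-syntax` and `path-syntax`, the `MATCH_TAG` instruction tuples, and all function names are hypothetical and do not appear in the application.

```python
# Hypothetical sketch: several query languages compile to one shared
# instruction form, so a single abstract engine can serve all of them.

def compile_under_syntax(query: str) -> list[tuple[str, str]]:
    # "title under chapter" already lists the innermost tag first.
    tags = [part.strip() for part in query.split(" under ")]
    return [("MATCH_TAG", tag) for tag in tags]

def compile_path_syntax(query: str) -> list[tuple[str, str]]:
    # "chapter/title" lists the outermost tag first, so reverse it.
    tags = [part.strip() for part in query.split("/")]
    return [("MATCH_TAG", tag) for tag in reversed(tags)]

# One compiler per supported query language (names are assumptions).
COMPILERS = {
    "under-syntax": compile_under_syntax,
    "path-syntax": compile_path_syntax,
}

def compile_query(language: str, query: str) -> list[tuple[str, str]]:
    """Pick the compiler for the query language and emit engine instructions."""
    return COMPILERS[language](query)

a = compile_query("under-syntax", "title under chapter")
b = compile_query("path-syntax", "chapter/title")
```

Both calls yield the same instruction list, which is the point of the design: only the compilers vary with the query language, while the engine that consumes the instructions is shared.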
[0029] FIG. 1 is a relationship diagram 100 showing the primary
elements of the search engine of the present patent application.
The Query Engine Abstract Machine 140 takes as input the following:
Query Engine Instructions 130 and a Document Parse Tree 150
representation of a document. The query engine instructions tell
the query engine what parts of the document parse tree to select
and return as Query Results 160. In addition to Query Results 160,
the other output of the query engine is the Continuation State 170.
In cases where multiple query results would be produced by the
query engine by following the query engine instructions, the query
engine only produces the first result and outputs the intermediate
engine state as the Continuation State 170. At a later time, the
Continuation State may be supplied back to the engine to cause it
to resume operation at the saved state and produce the next
result.
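The pairing of a single result with a Continuation State can be illustrated as follows. This is a minimal sketch, assuming the saved state is nothing more than a position in a flat list of nodes; the application does not specify the contents of the engine state, and `run_engine` and the node dictionaries are hypothetical.

```python
# Hypothetical sketch of the result/continuation pair described above.
# The "state" here is just an index into a flat node list.

def run_engine(nodes, wanted_tag, continuation=0):
    """Return (first matching node, continuation state), or (None, None)."""
    for i in range(continuation, len(nodes)):
        if nodes[i]["tag"] == wanted_tag:
            return nodes[i], i + 1   # resume just past this match
    return None, None

nodes = [
    {"tag": "title", "text": "Hamlet"},
    {"tag": "chapter", "text": ""},
    {"tag": "title", "text": "Prologue"},
]

# Each call produces one result; the returned state is fed back in to resume.
first, state = run_engine(nodes, "title")
second, state = run_engine(nodes, "title", state)
third, state = run_engine(nodes, "title", state)
```

The caller decides whether and when to resume, so no work is spent on later results that are never requested.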
[0030] FIG. 2 is a flowchart 200 showing the query engine in more
detail. The process can start with a new query, or with the
Continuation State of a previous query. There are two different
paths 210 for these two cases. If this is a new query, the user
inputs a Query 211 in one of the Query Languages understood by the
engine. A typical query might look like:
<title> under <chapter> under <play name="hamlet">
[0031] Such a typical query would, for example, be addressed to an
electronic database containing the works of a number of authors.
The objective of the query is to find all the chapter title
headings for any plays entitled "Hamlet."
[0032] As noted earlier, the engine can support any number of query
languages. Because the processing steps are the same for all
languages, this description uses "L" as a generic variable
indicating any query language understood by the engine.
[0033] The engine compiles the query language into query engine
instructions 220. In the next step 221, specific tag names and
attributes are attached to the instructions as required to
correctly describe the query. In the example query shown above, the
tags are <title>, <chapter> and <play>, "name" is an
attribute name, and "hamlet" is an attribute value. An initialized
query engine internal state is then created at step 222.
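The extraction of tags, attribute names, and attribute values from the example query can be sketched with two regular expressions. This is an illustration only: the dictionary-based instruction format and the name `compile_under_query` are assumptions, and the actual instruction set of the query engine is not described at this level.

```python
import re

# A tag name followed by optional attribute text, e.g. <play name="hamlet">.
TAG_RE = re.compile(r'<(\w+)([^>]*)>')
# One name="value" pair inside the attribute text.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')

def compile_under_query(query: str) -> list[dict]:
    """Turn each <tag ...> in the query into a tag/attribute instruction."""
    instructions = []
    for tag, attr_text in TAG_RE.findall(query):
        instructions.append({"tag": tag, "attrs": dict(ATTR_RE.findall(attr_text))})
    return instructions

instructions = compile_under_query(
    '<title> under <chapter> under <play name="hamlet">'
)
```

The result lists the innermost tag first, with `name` and `hamlet` attached to the `play` instruction as an attribute name and value.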
[0034] If instead of being a new query this is a resumption of a
previously run query, the query is resumed using the Continuation
State 212 from the previously processed query. The appropriate
query engine internal state is then reactivated 230.
[0035] In either the new or resumed query case, the engine now
determines 240 if the user desires to search documents in a
relational database, or in memory.
[0036] When searching a relational database, the engine performs a
coarse search 250 of the database, executing query engine
instructions and looking for matches based on the
tags/attributes/values assigned to the instructions in step 221.
This produces a candidate list of possible matches for the query.
In this search, the engine does not search the entire database, but
rather stops once it has accumulated a partial set of results. This
method is more efficient because it allows the query engine to use
less memory when searching. For illustrative purposes, FIG. 4 shows
an example of a document tree 400. As the search engine travels
from node to node of the document tree, the search engine
determines whether the contents of the node may partially fulfill
the search requirement based on the coarse search criteria 251.
This is determined based only on the tags and attributes in the
instructions obtained during the compilation 221. In this
particular example, the tag is <title>. For example, in FIG.
4, there are multiple instances of <title> 402 and 409.
During the coarse search the search engine may find any of these
<title> nodes based on a tag match. However, <title>
node 402 will be checked (as explained later) and discarded because
it is not a <title> under a <chapter> under a
<play>; instead, it is a <title> directly under a
<play> 401. The search engine will continue its search until
it encounters node 409, which satisfies all the tag and attribute
criteria and additionally satisfies the predicate checks, as will
be described later. The text information for node 409 is "Prologue,"
which is the leaf information 412.
[0037] If no candidates at all are found 251, the engine is
finished 298 and no more results are returned. Otherwise, the
candidate list is further refined using predicate checks 252,
details of which will be described later. If the refinement finds
no matching candidates 253, then the engine returns to the database
and searches for additional candidates 250.
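The two-stage flow of a coarse search followed by refinement can be sketched as below. The sketch makes several assumptions: the rows are a toy stand-in for nodes stored in a relational table, the node ids are arbitrary (only loosely modeled on FIG. 4), "under" is read as "has this ancestor somewhere above", and the refinement walks the parent chain rather than using the binary encoding described later. All names are hypothetical.

```python
# Rows stand in for stored nodes: id -> (parent id, tag, attributes).
ROWS = {
    1: (None, "play", {"name": "hamlet"}),
    2: (1, "title", {}),      # title directly under the play
    3: (1, "chapter", {}),
    4: (3, "title", {}),      # title under a chapter under the play
}

def coarse_search(rows, tag):
    """Tag-only match: cheap, but may return false candidates."""
    return [node_id for node_id, (_, t, _) in rows.items() if t == tag]

def predicate_check(rows, node_id, ancestors):
    """Confirm each required (tag, attrs) ancestor is found, in order, going up."""
    current = rows[node_id][0]            # start at the immediate parent
    for tag, attrs in ancestors:
        while current is not None:
            _, ptag, pattrs = rows[current]
            if ptag == tag and all(pattrs.get(k) == v for k, v in attrs.items()):
                break
            current = rows[current][0]
        if current is None:
            return False
        current = rows[current][0]        # continue above the matched ancestor
    return True

required = [("chapter", {}), ("play", {"name": "hamlet"})]
candidates = coarse_search(ROWS, "title")
matches = [n for n in candidates if predicate_check(ROWS, n, required)]
```

Here the coarse search returns both title nodes, and the refinement discards the one that sits directly under the play.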
[0038] If the refinement finds a match 253, the engine is ready to
generate its two outputs: the Query Results 271 and the
Continuation State 270. As noted earlier, the Continuation State
describes the current state of the engine, so that a later
invocation may resume the search at the point where the current
operation left off. For example, in FIG. 4, the search engine can
return the correct <title> node 409 as well as any additional
<title> nodes found under the Chapter nodes 405 and 406
(which are not fully elaborated in the FIGURE). The first result
will be presented first, and the user indicates when to resume
processing 280, at which time the entire process begins again at
step 230, with the Continuation State supplied as input 212.
[0039] Returning to step 240, the other method of searching is for
documents that are not stored in a relational database and instead
are contained completely in memory. These documents can be searched
much more efficiently than database documents, and so the query
engine uses a different path. A simplified search for the proper
query results is performed 260 on the document directly in memory.
As with the database case, only the first results are used. If no
results are found 265, the query engine is finished. Otherwise, the
engine proceeds directly to create the Continuation State 270 and
the query results 271.
[0040] The benefit of the tag, attribute, and attribute value
checking mechanism is that it provides a less memory-intensive
manner of conducting a query, since the search is merely looking for
simple word associations as opposed to placement of the node in
relation to other nodes. This partial checking mechanism 250 allows
a much more efficient implementation when searching documents
stored in a relational database or in any non-memory resident form,
which is important for large documents. To complete the search
query, however, the engine must refine the coarse results to
eliminate incorrect matches such as the case of a <title> 402
directly under a <play> 401. This requires a descendant
predicate check. Typically, such a check on a number of documents
and a large number of nodes would consume a great deal of time and
resources, especially in an electronic environment. It therefore
becomes preferable to devise a constant time method to determine if
an element is a descendant of another. The preferred embodiment is
a unique binary encoding mechanism and corresponding descendant
predicate algorithm to perform such a predicate check operation. In
order to determine whether node A is a descendant of node B, this
operation will require three pieces of information: (1) the
identification of the immediate parent, (2) the absolute depth of
the node, and (3) the binary encoding.
[0041] To explain the preferred embodiment of the binary coding
mechanism used by the query engine, the following terms must be
defined: newcode(), subtree depth, and absolute depth.
[0042] C=newcode(Cp) creates a new binary code, C, from the code,
Cp, of the parent, P. The new code must have the property that for
any two nodes, A and B, with codes Ca=code of node A and Cb=code of
node B, the following relationship
(Ca & Cb) == Cb
[0043] where "==" indicates equality and "&" indicates bitwise
binary AND, is true IF AND ONLY IF node A is a descendant of B,
"descendant" being meant in the most general sense, not limited
only to immediate descendants.
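One way to obtain codes with this property is for each new code to keep every bit of the parent's code and set one previously unused bit. The application does not say how newcode() is implemented, so the allocator below is an assumption made purely for illustration; only the test (Ca & Cb) == Cb is taken from the text.

```python
from itertools import count

_next_bit = count()  # hypothetical allocator: one fresh bit per new code

def newcode(parent_code: int) -> int:
    """Assumed scheme: keep all of the parent's bits and set one unused bit."""
    return parent_code | (1 << next(_next_bit))

def is_descendant_code(ca: int, cb: int) -> bool:
    """(Ca & Cb) == Cb: every bit set in B's code is also set in A's code."""
    return (ca & cb) == cb

root = 0
play = newcode(root)       # 0b001
chapter = newcode(play)    # 0b011
other = newcode(root)      # 0b100, a sibling branch
```

The check is a single AND and a comparison, so its cost does not grow with the size of the tree. Note that the test also passes when the two codes are equal, so on its own it cannot separate nodes that carry the same code.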
[0044] The subtree depth of a tag node is defined as follows:
[0045] the subtree depth of a leaf tag, meaning a tag node with no
descendants (only its own value node), is zero.
[0046] the subtree depth of a node, P, with immediate descendants
D1, D2, . . . is equal to the maximum subtree depth of any
descendant, plus 1.
[0047] FIG. 4 illustrates the assignment of subtree depths notated
as "sd=" in the Figure. Note that subtree depths are only assigned
to tags, not to their values.
[0048] The absolute depth of a node is defined as follows:
[0049] the absolute depth of the root of the tree is zero.
[0050] the absolute depth of any node, D, with parent P, is equal
to the absolute depth of the parent node, plus 1.
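The two depth definitions above translate directly into code. The sketch below assumes a simple in-memory tree of tag nodes; the `TagNode` class and the sample tree are hypothetical and are not taken from FIG. 4.

```python
from dataclasses import dataclass, field

@dataclass
class TagNode:
    tag: str
    children: list["TagNode"] = field(default_factory=list)

def subtree_depth(node: TagNode) -> int:
    """0 for a leaf tag, otherwise 1 plus the deepest child subtree."""
    if not node.children:
        return 0
    return 1 + max(subtree_depth(child) for child in node.children)

def absolute_depths(root: TagNode) -> dict[int, int]:
    """Map id(node) to its absolute depth, with the root at depth 0."""
    depths = {}
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        depths[id(node)] = depth
        stack.extend((child, depth + 1) for child in node.children)
    return depths

chapter_title = TagNode("title")
chapter = TagNode("chapter", [chapter_title])
play = TagNode("play", [TagNode("title"), chapter])
depths = absolute_depths(play)
```

Subtree depth is measured upward from the leaves and absolute depth downward from the root, so a leaf title always has subtree depth 0 regardless of how deep it sits in the document.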
[0051] Given these definitions, the method used by the query engine
for assigning codes to a tree is as follows:
[0052] 1) Assign code zero to the root node.
[0053] 2) Start with the children of the root node, descend the
tree in depth-first, left-to-right order.
[0054] 3) For each node visited, N, with parent P and parent's code
Cp:
[0055] 3a) If the subtree depth of N is greater than 2 then assign
a new code, Cn=newcode(Cp) to this node N.
[0056] 3b) If the subtree depth of N equals 2 then assign a new
code, Cn=newcode(Cp) to this node N, and all descendants of N,
recursively.
[0057] 3c) If the subtree depth of N is less than 2 and this is the
first subtree of depth less than 2 encountered under parent P, then
assign a new code Cpshared=newcode(Cp) to serve as a "shared code"
for this parent. Then assign Cpshared as the code for N, and all
descendants of N.
[0058] 3d) If the subtree depth of N is less than 2 and this is not
the first subtree of depth less than 2 encountered under parent P,
then a code, Cpshared, for parent P already exists. Assign Cpshared
as the code for N, and all descendants of N.
[0059] This method results in codes being assigned such that:
[0060] All nodes in any single subtree of subtree depth 2 or less
share a single common code generated as a new code based on the
parent's code. This is illustrated as the circled nodes 430 in FIG.
4.
[0061] Furthermore, in a collection of related subtrees of depth 1
or 0, being related by having a common parent, all nodes in those
subtrees share a single common code generated as a new code based
on the common parent's code. This is illustrated as the circled
nodes 440 in FIG. 4.
[0062] Using these encoding procedures allows the element encodings
to be presented as packets of information nearly 100 times
smaller than prior techniques, since each node will not
require separate binary numbers, thereby improving speed and
performance during the searches.
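Steps 1 through 3d of the code assignment method can be sketched as a single depth-first pass. The sketch rests on assumptions: the `Node` class is hypothetical, and newcode() is taken to add one fresh bit to the parent's code, which the application does not specify. The step labels in the comments refer to the numbered steps above.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class Node:
    tag: str
    children: list["Node"] = field(default_factory=list)
    code: int = 0

def subtree_depth(n):
    return 1 + max(map(subtree_depth, n.children)) if n.children else 0

def assign_codes(root):
    bits = count()

    def newcode(cp):            # assumed: parent's bits plus one fresh bit
        return cp | (1 << next(bits))

    def paint(n, code):         # give n and every descendant the same code
        n.code = code
        for c in n.children:
            paint(c, code)

    def visit(parent):
        shared = None           # shared code for shallow subtrees under parent
        for n in parent.children:          # step 2: left-to-right
            sd = subtree_depth(n)
            if sd > 2:                     # step 3a: own code, keep descending
                n.code = newcode(parent.code)
                visit(n)
            elif sd == 2:                  # step 3b: one code for the cluster
                paint(n, newcode(parent.code))
            else:                          # steps 3c and 3d: shared code
                if shared is None:
                    shared = newcode(parent.code)
                paint(n, shared)

    root.code = 0                          # step 1
    visit(root)

a3 = Node("a3"); a1 = Node("a1", [Node("a2", [a3])])
a = Node("a", [a1])                        # subtree depth 3
b = Node("b")                              # subtree depth 0
c1 = Node("c1"); c = Node("c", [c1])       # subtree depth 1
d2 = Node("d2"); d = Node("d", [Node("d1", [d2])])  # subtree depth 2
root = Node("root", [a, b, c, d])
assign_codes(root)
```

In this sample tree only four codes are generated for eleven nodes: one for `a`, one for the cluster under `a`, one shared by `b`, `c` and `c1`, and one for the cluster rooted at `d`.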
[0063] FIG. 3 is a relationship diagram 300 showing the query
engine incorporated into a transformative sequence processor. The
user will supply a transformative sequence 310 in the form of an
XSL specification. XSL is a standard in development by the World
Wide Web Consortium (W3C). FIG. 8 is an example of an XSL
transformation specification. First, the XSL tag is defined 800.
Within the XSL tag, a rule tag is defined 810. The rule tag is
composed of two elements, a Pattern 820 and an Action 830. The
Pattern defines a set of items at which the transformative function
implements the Action. In FIG. 8, the rule specifies that a title
tag 840, when it occurs under a chapter tag 850 which itself occurs
under a book tag 860, should be transformed into an <H4> tag 870
when a document (or subdocument) containing it is rendered.
[0064] Note that XSL specifications may contain multiple rules,
patterns, and actions; in this simple example only one rule with
one pattern and one action is shown.
[0065] Referring back to FIG. 3, the XSL specification 310 is
compiled by Query Compiler 320 into Query Engine Instructions 330.
During compilation, only the Pattern 820 of the XSL rule is
compiled. In FIG. 8, the Pattern is compiled with the
<title> tag 840 becoming a tag value in the query engine
instruction as previously described for step 221 in FIG. 2.
[0066] The Action 830 of the XSL transformation rule is not
compiled during this sequence, and instead is supplied directly 335
to the transformative engine 380, along with the compiled query
engine instructions 330.
[0067] The transformative engine consists of a Query Engine
Abstract Machine 340 and a Rendering Algorithm 345. The
Continuation State 370 produced by the query engine abstract
machine is also held within the transformative engine.
[0068] The transformative engine uses the query engine to determine
which nodes match the patterns in the XSL specification. As
incremental results are supplied by the query engine, the
transformative engine applies the appropriate matching
transformation actions (830) to the query engine results.
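The division of labor between pattern matching and action can be sketched for the single rule of FIG. 8. The rule representation below is an assumption made for illustration and is not XSL syntax; `RULE`, `matches`, and `render` are hypothetical names, and a node is represented simply by the list of tags from the root down to itself.

```python
# Hypothetical rule shape: a pattern (tag plus required ancestor tags,
# innermost first) and an action (the replacement tag).
RULE = {"pattern": ("title", ["chapter", "book"]), "action": "H4"}

def matches(path, pattern):
    """path lists tags from the root down to the node itself."""
    tag, ancestors = pattern
    if path[-1] != tag:
        return False
    remaining = iter(reversed(path[:-1]))          # walk upward from the parent
    # Each "in" consumes the iterator, so ancestors must appear in order.
    return all(a in remaining for a in ancestors)

def render(path, text, rule):
    """Apply the action to a matching node; otherwise keep its own tag."""
    tag = rule["action"] if matches(path, rule["pattern"]) else path[-1]
    return f"<{tag}>{text}</{tag}>"

chapter_title = render(["book", "chapter", "title"], "Prologue", RULE)
book_title = render(["book", "title"], "Hamlet", RULE)
```

Only the title under a chapter under a book is rewritten to the H4 tag; the title directly under the book passes through unchanged.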
WORLD WIDE WEB EXAMPLE
[0069] An example of the preferred embodiment of the query and
transformation sequence can be viewed in the context of the World
Wide Web and the various markup languages that are associated with
the Web, although other embodiments cover non-networked
computer databases. A `web browser` is traditionally defined as a
computer program which supports the displaying of documents,
presently most of which include Hypertext Markup Language (HTML)
formatting markup tags (discussed further below), and hyperlinking
to other documents, or phrases in documents, across a network. In
particular, web browsers are used to access documents across the
Internet's World Wide Web. The discussion of the present invention
defines both `web browser` and `browser` to include browser
programs that enable accessing hyperlinked information over the
Internet and other networks, as well as from magnetic disk, CD-ROM,
or other memory, and does not limit web browsers to just use over
the Internet. A number of web browsers are available, some of them
commercially. Any viewer of the World Wide Web will typically use a
web browser. Indeed, a viewer viewing documents created by the
present invention normally uses a web browser to access the
documents that a database provider may make available on the
network. Web browsers allow clicking on "hot areas" (generated by
source anchors containing a document reference name and a hyperlink
to that document so that clicking on the hot area causes the
specified document to be downloaded over the network and displayed
for the viewer). Most web browsers also maintain a history of
previously used source anchors and display a hot area which allows
hyperlinking back to the database provider's home page (or back
through the locations the viewer has previously "visited") so the
viewer can always go back to a familiar place.
[0070] A viewer and a server, which is where web documents are
contained, communicate using the functionality provided by
Hypertext Transfer Protocol (HTTP). The Web includes all the
servers adhering to this standard which are accessible to clients
via Uniform Resource Locators (URL's). For example, communication
can be provided over a communication medium. In some embodiments,
the client and server may be coupled via Serial Line Internet
Protocol (SLIP) or TCP/IP connections for high-capacity
communication. The web browser is active within the client and
presents information to the user.
[0071] One way of organizing information on the Internet in order
to minimize download time has been to provide users with an
overview interface, called a `home page,` to the information.
Although a home page is often merely used as a visually interesting
trademark, the home page typically contains a key topic summary of
the information provided by one author or database provider, and
hyperlinks that take a viewer to the information the viewer has
chosen.
[0072] A `hyperlink` is defined as a point-and-click mechanism
implemented on a computer which allows a viewer to link (or jump)
from one screen display where a topic is referred to (called the
`hyperlink source`), to other screen displays where more
information about that topic exists (called the `hyperlink
destination`). These hyperlinked screen displays can be portions of
the media data (media data can include, e.g., text, graphics, audio,
video, etc.) from a single data file, or can be portions of a
plurality of different data files; these can be stored in a single
location, or at a plurality of separate locations. A hyperlink thus
provides a computer-assisted way for a human user to efficiently
jump between various locations containing information.
[0073] Finally, to support the Internet and the World Wide Web, a
markup language called Hypertext Markup Language (HTML) was
developed. HTML has two major objectives. First, HTML provides a
way to specify the structural elements of text (e.g., this is a
heading, this is a body of text, this is a list, etc.) using tags
which are independent of the content of the text. A web browser
uses these tags to format the displayed text for the particular
display device of a particular viewer. So, for example, HTML allows
an author to specify up to six levels of heading information
bracketed by six different heading-tag pairs. Applications (e.g.,
web browsers) on different computers then process the HTML
documents for visual presentation in a manner customized for
particular display devices. An application on one computer could
display a level 1 heading as 10 point bold Courier while an
application on another computer could display it as a 20 point
italic Times Roman. A level 1 heading is opened with the
sequence token <h1> and closed with </h1>. Thus, a heading might be
written as:
<h1> This is a level 1 heading </h1>
[0074] for a level one heading or
<h4> this is a level 4 heading </h4>
[0075] for a level 4 heading. As a markup language, HTML enables a
document to be displayed within the capabilities of any particular
display system even though that display system does not support
italic, or bold, color, or any particular typeface or size. Thus
HTML supports writing documents so they can be output to everything
from simple monospaced, single-size fonts to proportional-spaced,
multiple-size, multiple-style fonts. Each computer program that
accesses an HTML document can translate that HTML document into a
display format supported by the hardware running the program.
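By way of illustration only, the device-dependent presentation described above can be sketched in a few lines of Python. The style table, the device names and the function `render_headings` are hypothetical and are not part of HTML or of any browser; the sketch merely shows the same heading tag being mapped to a different display format on each device.

```python
import re

# Hypothetical per-device style tables (illustrative values only).
STYLES = {
    "device_a": {"h1": "10 point bold Courier"},
    "device_b": {"h1": "20 point italic Times Roman"},
}


def render_headings(html, device):
    """Return (style, text) pairs for every heading-tag pair found in html."""
    rendered = []
    # Match <h1>...</h1> through <h6>...</h6>; \1 requires a matching close tag.
    for tag, text in re.findall(r"<(h[1-6])>(.*?)</\1>", html, flags=re.S):
        style = STYLES[device].get(tag, "device default")
        rendered.append((style, text.strip()))
    return rendered


sample = "<h1> This is a level 1 heading </h1>"
on_a = render_headings(sample, "device_a")
on_b = render_headings(sample, "device_b")
```

The markup is identical in both calls; only the style chosen by the rendering application differs.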
[0076] On the World Wide Web, documents are typically
written in HTML. HTML defines hypertext structure within
basic limits. It allows a programmer to define a link but it does
not allow for differentiation between links or sublinks. An HTML
document cannot be parsed into a multi-stage tree. In addition,
differing tags cannot be defined in HTML, which reduces its
flexibility.
[0077] These limitations to HTML are presently being addressed. One
of the options is the Standard Generalized Markup Language
("SGML"). HTML can actually be viewed as a subset of SGML. SGML
defines a language for use with presenting any form of information.
However, SGML presents so many options for defining tags and
presenting information that it is very difficult to use in
standardizing a way for defining and presenting documents and their
contents.
[0078] The difficulties in using SGML have led to the development
of a hybrid, which would contain the advantages of SGML and HTML.
This new language for establishing documents on the World Wide Web
is the "Extensible Markup Language" (known as "XML"), which is
termed extensible because it is not a fixed format like HTML. XML
is designed to allow the use of SGML on the World Wide Web but with
some limitations on the options that SGML provides. Basically, XML
allows a programmer to devise his or her own set of markup
elements. XML documents can be accessed through DTD or DTD-less
operations. A DTD (document type definition) is usually a file that
contains a formal definition of a particular type of document. It
sets out what names can be used for elements, where they may occur
and how they all fit together. Basically, a DTD is a formal language
that allows processors to parse a document and define the
interrelations of the elements within an XML document. However, an
XML document has additional flexibility, since its markup elements
can be inferred from the existence and location of the elements as
created, thereby allowing DTD-less reading. Pure SGML documents typically would
require a DTD file to assist in the translation.
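DTD-less reading can be illustrated with a short Python sketch. It uses the standard `xml.etree.ElementTree` parser, which builds a tree from the tags alone and does not validate against a DTD. The sample document and the helper `outline` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# A small well-formed document with author-defined tags and no DTD.
DOC = (
    "<MYDOC><SEC>Section 1"
    "<PAR>Paragraph 1</PAR><PAR>Paragraph 2</PAR>"
    "</SEC></MYDOC>"
)


def outline(elem, depth=0):
    """Return (depth, tag) pairs in document order, derived from the tags only."""
    rows = [(depth, elem.tag)]
    for child in elem:
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(DOC)
structure = outline(root)
```

The element interrelations are recovered purely from where the elements occur in the document.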
[0079] Even for XML documents, the reader must have the ability to
efficiently find and retrieve more information about any particular
item in a document. Presently, the query engines that exist for XML
are comparatively slow. As noted earlier, these search engines rely
on a node by node search ("node travel") of an XML document that
consists of examining the nodes. If the node has a leaf with the
requested information, the engine will access the information. If
the node does not have the information, the search will then move
down to the node child and perform the same analysis. This type of
search is time-consuming. In addition, these search engines do not
have the capability to accept directions from non-XML compatible
web browsers or present the information in a format compatible to
such a web browser.
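The node-by-node search described above can be sketched as follows. This is only an illustration of the slower prior-art "node travel" approach, not of the query engine of the present invention; the sample document and the function `node_travel` are hypothetical.

```python
import xml.etree.ElementTree as ET


def node_travel(root, wanted_tag):
    """Examine nodes one at a time, parent before children.

    Returns (leaf_text, nodes_examined); leaf_text is None if nothing matched.
    """
    examined = 0
    stack = [root]
    while stack:
        node = stack.pop()
        examined += 1
        if node.tag == wanted_tag:
            return (node.text or "").strip(), examined
        # Push children in reverse so they are popped in document order.
        stack.extend(reversed(list(node)))
    return None, examined


play = ET.fromstring(
    "<play><title>Hamlet</title><act><title>Prologue</title></act></play>"
)
found = node_travel(play, "title")
missing = node_travel(play, "scene")
```

When the requested tag is absent, every node in the document is examined before the search gives up, which is why this type of search is time-consuming on large documents.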
[0080] FIG. 5 is a block diagram of a system, indicated generally
at 500, according to the illustrative embodiment. System 500
includes a Transmission Control Protocol/Internet Protocol (TCP/IP)
network 510, a real media server computer 512 for executing a real
media server process and a web server computer 516 for executing a
Web server process. Web server 516 contains multiple web sites
518a-n, as shown in FIG. 5.
[0081] Moreover, as shown in FIG. 5, each of servers 512, 514 and
516 is coupled through TCP/IP network 510 to each of clients 502,
504, 506 and 508. Through TCP/IP network 510, information is
communicated by servers 512, 514 and 516, and by clients 502, 504,
506 and 508 to one another.
[0082] Clients 502, 504, 506 and 508 are substantially identical to
one another. Client 502 is a representative one of clients 502,
504, 506 and 508. Client 502 includes a user 520, input devices
522, media devices 524, speakers 526, a display device 528, a print
device 530 and a client computer 532. Client computer 532 is
connected to input devices 522, media devices 524, speakers 526,
display device 528 and print device 530. Display device 528 is, for
example, a conventional electronic cathode ray tube. Print device
530 is, for example, a conventional electronic printer or
plotter.
[0083] User 520 and client computer 532 operate in association with
one another. For example, in response to signals from client
computer 532, display device 528 displays visual images, and user
520 views such visual images. Also, in response to signals from
client computer 532, print device 530 prints visual images on
paper, and user 520 views such visual images. Further, in response
to signals from client computer 532, speakers 526 output audio
frequencies, and user 520 listens to such audio frequencies.
Moreover, user 520 operates input devices 522 and media devices 524
in order to output information to client computer 532, and client
computer 532 receives such information from input devices 522 and
media devices 524.
[0084] Input devices 522 include, for example, a conventional
electronic keyboard and a pointing device such as a conventional
electronic "mouse", rollerball or light pen. User 520 operates the
keyboard to output alphanumeric text information to client computer
532, and client computer 532 receives such alphanumeric text
information from the keyboard. User 520 operates the pointing
device to output cursor-control information to client computer 532,
and client computer 532 receives such cursor-control information
from the pointing device.
[0085] User 520 operates media devices 524 in order to output
information to client computer 532 in the form of media signals,
and client computer 532 receives such media signals from media
devices 524. Media signals include for example video signals and
audio signals. Media devices 524 include, for example, a
microphone, a video camera, a videocassette player, a CD-ROM
(compact disc, read-only memory) player, and an electronic scanner
device.
[0086] A web browser typically is loaded onto a client computer and
is launched by the client computer when accessing the World Wide
Web. The web browser is used for accessing Web sites 518 (a-n)
through the web server 516.
[0087] An advantage of a web browser on a network such as the
Internet is that any of the documents viewed with the program may
be located (or scattered in pieces) on any computer connected to
network 510. The viewer can use a mouse 522, or other pointing
device, to click on a hot area, such as highlighted text or a
button, and cause the relevant portion of the referenced document
to be downloaded to the viewer's computer 532 for viewing. These
downloaded documents in turn can contain hyperlinks to other
documents on the same or other computers. `Downloading` is defined
as the transmitting of a document or other information from the
database provider 518 over a network 510 to the viewer's computer
532.
[0088] As noted earlier, information is presented to World Wide Web
viewers as a collection of `documents` and `pages`. As mentioned
above, a `document` is defined in a broad sense to indicate text,
pictorial, audio, video and other information stored in one or more
computer files. Viewing such multimedia files can be much like
watching television. Documents include everything from simple short
text documents to large computer multi-media databases.
[0089] A `page` is defined as any discrete file, which can be
downloaded as a single download segment. Technically, a web browser
does not recognize or access documents per se, but instead accesses
pages. Typically, a web browser downloads one page as the result of
clicking on a hot area. A page often has several source anchors
with hyperlinks to various other pages or to specific locations
within pages.
[0090] One problem with accessing documents over the Internet is
that many documents are quite long, and thus can take quite some
time to download over the network. This means that viewers are
often reluctant to access a document unless they know it will be
useful. FIG. 6 shows the typical information available at a web
site. A web site 600 might contain a number of internal lines 610
and/or sections with multiple pages. The presentation of text
and/or graphics 620 on a web site 600 is defined by a markup language.
A page is thus a document, which contains a portion of a source
document.
[0091] FIG. 7 shows a process for displaying/searching a web
document using a web browser. A session typically commences when
the HTTP server detects a request for a client connect. After
connection, a simple query can be implemented through the web
browser. In the prior art, such a query would usually just include
a term to be found in the Web document. Then, the requested page,
typically the home page, is displayed on the client browser. As
noted above, the client and server may be coupled via a TCP/IP
connection. Active within the client 532 is the web browser 510,
which establishes the connection with the web server 516. The web
server 516 executes the corresponding server software which
presents information to the client in the form of HTTP responses
720. The HTTP responses correspond to Web pages represented using
markup language. In this embodiment, the markup language is XML.
The web browser will activate the search engine 730 on the web
server.
[0092] The XML versions of articles are searched for the presence
of specified search terms, if the web browser is compatible. If the
web browser is not compatible, the XML results are converted to a
compatible format. The XML results of these search requests can
then be displayed on the client's console.
[0093] The transformative process on a server is called a
server-side transformation. If the browser is XML/XSL-enabled (MS
IE4 is an example), then server-side transformations need not be
implemented on the server since the browser has XML/XSL
capabilities. If the browser is not XML/XSL-enabled, and there are
commands that can be provided to transform information, then
server-side transformation is implemented. As a matter of fact,
there may be multiple transformation (XSL) specifications for a
variety of formats on each server. The server will enable the
appropriate XSL specification given the available browser
information; i.e., if the browser is not XML-enabled but is CSS
(cascading style sheets)-enabled, the server-side transformations
using the "CSS" XSL specification will be implemented, and if the
browser is not even CSS-enabled then a "raw HTML" XSL specification
can be used, and so forth.
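The selection logic just described can be sketched as a small Python function. The function name, the two capability flags and the returned labels are hypothetical; how a server actually detects browser capabilities is not specified here.

```python
def select_xsl_specification(xml_xsl_enabled, css_enabled):
    """Pick the server-side XSL specification for a browser.

    Returns None when the browser can transform XML itself, so no
    server-side transformation is needed.
    """
    if xml_xsl_enabled:
        return None
    if css_enabled:
        return "CSS"
    return "raw HTML"


choices = [
    select_xsl_specification(True, True),
    select_xsl_specification(False, True),
    select_xsl_specification(False, False),
]
```

Further formats could be supported by adding more capability checks ahead of the final fallback.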
[0094] These capabilities are very "back end" oriented, in the
sense that they constitute implementation details of commands on
the server, as opposed to having graphical manifestation on the GUI
of the client computer. The following is an example of the
transformation and query process using the following XML
document:
[0095] <MYDOC>
[0096] <SEC>
[0097] Section 1 content . . .
[0098] <PAR>
[0099] Paragraph 1 content . . .
[0100] </PAR>
[0101] <PAR>
[0102] Paragraph 2 content . . .
[0103] </PAR>
[0104] etc.
[0105] </SEC>
[0106] <SEC color="blue">
[0107] Section 2 content . . .
[0108] etc.
[0109] </SEC>
[0110] </MYDOC>
[0111] The corresponding example query expressions are:
[0112] "<SEC>(1) WHERE (COLOR="BLUE") UNDER
<MYDOC>"
[0113] which fetches the first section whose color attribute is
blue and which is located under MYDOC . . . and
[0114] "<PAR>(2) 2 LEVELS UNDER <MYDOC>"
[0115] which fetches the second paragraph, which must be exactly
two levels under MYDOC.
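The meaning of the two query expressions can be illustrated with the following Python sketch. It is a plain tree walk, not the compiled-pattern engine of the present invention. It assumes a well-formed version of the sample document in which each section is closed and the attribute is written as color="blue". The helper names and the case-insensitive comparison of attribute values are likewise assumptions.

```python
import xml.etree.ElementTree as ET

DOC = """<MYDOC>
  <SEC>Section 1 content
    <PAR>Paragraph 1 content</PAR>
    <PAR>Paragraph 2 content</PAR>
  </SEC>
  <SEC color="blue">Section 2 content</SEC>
</MYDOC>"""


def elements_at(root, levels):
    """Return the elements exactly `levels` below root, in document order."""
    current = [root]
    for _ in range(levels):
        current = [child for node in current for child in node]
    return current


def nth_under(root, tag, n, levels=None, where=None):
    """Return the n-th (1-based) element with `tag` under root, or None."""
    if levels is None:
        # UNDER: any depth below root.
        candidates = [e for e in root.iter(tag) if e is not root]
    else:
        # n LEVELS UNDER: exactly that depth below root.
        candidates = [e for e in elements_at(root, levels) if e.tag == tag]
    if where:
        candidates = [
            e for e in candidates
            if all(e.get(k, "").lower() == v.lower() for k, v in where.items())
        ]
    return candidates[n - 1] if 0 < n <= len(candidates) else None


mydoc = ET.fromstring(DOC)
blue_section = nth_under(mydoc, "SEC", 1, where={"color": "BLUE"})
second_par = nth_under(mydoc, "PAR", 2, levels=2)
```

Here `blue_section` corresponds to the first query and `second_par` to the second. Asking for a paragraph one level under MYDOC returns nothing, because the paragraphs sit two levels down.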
[0116] Therefore, in a preferred server side embodiment, the server
does not have to depend on XML DTDs with the preferred query and
transformative engine in order to present information to a user
either in an HTML, XML or other markup format.
[0117] In such a preferred embodiment, the XML query and
transformative engine is located on the server to perform
server-side transformations. The XML query engine allows
XML/XSL-enabled browsers to access the XML documents on the server,
whereas those browsers not enabled with XML will have the XML
documents on the server transformed into a presentation format
acceptable by the browser.
[0118] This is a unique approach, which allows a Web site user to
control the content through queries, with the presentation based on
the user's browser and client computer. This server side embodiment
therefore allows for access to XML documents for many of the web
browsers on the market.
[0119] Refer again to FIG. 4, which depicts the potential
tree ordering of an XML document. In this tree, each leaf contains
presentable material. Each individual leaf is defined as a child of
a certain number of branches. These branches are labeled as tags.
The title for the play Hamlet would be a leaf. The Hamlet leaf
would be a child of the "Title" branch of the "Play" branch.
Therefore, a user requesting a search for the title of the play
[<title> under <play>] would receive the term Hamlet in
node 408 and would not receive the term Prologue from node 412. The
convenience of XML is that it allows a user to define a
number of custom tags and therefore categorize leaves with a
greater level of detail.
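The relationship between leaves and their branches can be sketched in Python as follows. The sample tree only approximates FIG. 4: the intermediate "act" branch and the helper `leaf_paths` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def leaf_paths(elem, path=()):
    """Return (branch_path, leaf_text) pairs for every leaf with text."""
    path = path + (elem.tag,)
    rows = []
    text = (elem.text or "").strip()
    if text:
        rows.append(("/".join(path), text))
    for child in elem:
        rows.extend(leaf_paths(child, path))
    return rows


tree = ET.fromstring(
    "<play><title>Hamlet</title><act><title>Prologue</title></act></play>"
)
leaves = leaf_paths(tree)
# [<title> under <play>], read here as a title directly beneath the play branch.
play_titles = [text for branches, text in leaves if branches == "play/title"]
```

Both leaves sit under a "title" tag, but only one has "play" as its immediate parent branch, so the request for the title of the play yields Hamlet and not Prologue.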
[0120] The implementation of XML documents on a Web site does lead
to a number of potential problems. With HTML as the primary
language of use on Web sites and with a majority of web browsers,
many users with such browsers will not be able to access
information coded in XML.
[0121] In order to allow such access by HTML based web browsers, a
transformative sequence is integrated with the query engine so that
based on the web browser used to access the Web site, a certain
transformative sequence will be implemented. The transformative
sequence will then access a set of Extensible Style Language (XSL)
transformative rules that will convert the XML information into
the necessary display format.
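A much simplified stand-in for such a transformative sequence is sketched below in Python. It is not XSL. The rule table mapping the sample tags to HTML tags and the function `transform` are hypothetical, and the sketch only shows tags of one markup language being replaced by tags of another while the leaf content is preserved.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: tag in the source markup -> tag in the target markup.
RULES = {"MYDOC": "html", "SEC": "div", "PAR": "p"}


def transform(elem):
    """Rebuild the tree with each tag renamed according to RULES."""
    out = ET.Element(RULES.get(elem.tag, "span"))
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(transform(child))
    return out


source = ET.fromstring(
    "<MYDOC><SEC><PAR>Paragraph 1 content</PAR></SEC></MYDOC>"
)
html_text = ET.tostring(transform(source), encoding="unicode")
```

A server could keep one such rule table per target format and apply the table that matches the requesting browser.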
[0122] It should be appreciated by those skilled in the art that
the specific embodiments disclosed above may be readily utilized as
a basis for modifying or designing other methods for carrying out
the same purposes of the present invention. It should also be
realized by those skilled in the art that such equivalent
constructions do not depart from the spirit and scope of the
invention as set forth in the appended claims.
* * * * *