U.S. patent application number 13/491459 was filed with the patent office on 2012-06-07 and published on 2012-11-29 for accessing deep web information using a search engine.
This patent application is currently assigned to DEEP WEB, LLC. Invention is credited to John R. Kasha, JR., Michael Z. Moricz.
Application Number | 13/491459 |
Publication Number | 20120303607 |
Family ID | 46150493 |
Filed Date | 2012-06-07 |
Publication Date | 2012-11-29 |
United States Patent Application | 20120303607 |
Kind Code | A1 |
Moricz; Michael Z.; et al. | November 29, 2012 |
ACCESSING DEEP WEB INFORMATION USING A SEARCH ENGINE
Abstract
Methods, apparatuses, and articles for receiving a search
request associated with a transportation service from a client
device, the search request including a plurality of search
criteria, are described herein. Additionally, the methods,
apparatuses, and articles further return to the client device an
answer page having a plurality of answers potentially associated
with the transportation service, the plurality of answers
identifying a plurality of information locations having information
potentially associated with at least one of the plurality of
search criteria, where at least one of the answers includes at
least one input field of a query answer page for entry of at least
one feature of the transportation service, the query answer page to
be dynamically generated by one of the information locations in
response to a query.
Inventors: | Moricz; Michael Z. (Bellevue, WA); Kasha, JR.; John R. (North Potomac, MD) |
Assignee: | DEEP WEB, LLC (Woodinville, WA) |
Family ID: | 46150493 |
Appl. No.: | 13/491459 |
Filed: | June 7, 2012 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
12768621 | Apr 27, 2010 | |
13491459 | | |
11595090 | Nov 8, 2006 | 7716303 |
12768621 | | |
10397992 | Mar 25, 2003 | 7693956 |
11595090 | | |
60366817 | Mar 25, 2002 | |
Current U.S. Class: | 707/711; 707/E17.11 |
Current CPC Class: | G06F 16/951 20190101; G06F 16/955 20190101; G06F 16/9535 20190101 |
Class at Publication: | 707/711; 707/E17.11 |
International Class: | G06F 17/30 20060101 G06F017/30 |
Claims
1. A method comprising: storing, by one or more computing servers,
a first index effectively indexing a first query answer web page to
be dynamically generated by a first website responsive to a first
received query having a first set of associated query parameters,
the first index to be provided to a client in response to the
client submitting a set of search criteria; receiving from a client
device, by the one or more computing servers, a search request
submitting the search criteria; dynamically generating in response,
by the one or more computing servers, a search answer web page
comprising said first index; and returning, by the one or more
computing servers, the search answer web page to the client
device.
2. The method of claim 1, wherein the method further comprises
processing, by the one or more computing servers, a proxy web page
specification corresponding to the first query answer web page to
be dynamically generated by the first website responsive to the
first query having the first set of associated search criteria, to
create the first index.
3. The method of claim 2, wherein the proxy web page specification
comprises the first query and the set of search criteria,
identified as keywords of the proxy web page specification, the
first query comprises a location of the first website and the first
set of associated query parameters, and the created first index
comprises the first query.
4. The method of claim 2, wherein the proxy web page specification
comprises the first query and the set of search criteria,
identified as keywords of the first proxy web page specification,
the first query comprises a location of the first website, with the
first set of associated query parameters to be separately provided
to the first website, and the created first index comprises a proxy
request for a proxy server to submit the first query and its first
set of associated query parameters to the first website
accordingly.
5. The method of claim 4, wherein the proxy server comprises the
first set of associated query parameters, and instructions to
pre-fill a form associated with the query, and post the form to the
first website, on request.
6. The method of claim 1, wherein the method further comprises
storing, by the one or more computing servers, a second index
effectively indexing a second query answer web page to be
dynamically generated by a second website responsive to a second
received query having a second set of associated query parameters,
the second index to be provided to a client in response to the
client submitting the set of search criteria; and said dynamically
generated search answer web page further comprises said second
index.
7. The method of claim 1, wherein the method further comprises
processing, by the one or more computing servers, a first proxy web
page specification corresponding to the first query answer web page
to be dynamically generated by the first website responsive to the
first query having the first set of associated search criteria, to
create the first index, and processing, by the one or more
computing servers, a second proxy web page specification
corresponding to the second query answer web page to be dynamically
generated by the second website responsive to the second query
having the second set of associated query parameters, to create the
second index.
8. The method of claim 7, wherein the first proxy web page
specification comprises the first query and the set of search
criteria, identified as keywords of the first proxy web page
specification, the first query comprises a location of the first
website and the first set of associated query parameters, the
created first index comprises the first query, the second proxy web
page specification comprises the second query and the set of search
criteria, identified as keywords of the second proxy web page
specification, the second query comprises a location of the second
website and the second set of associated query parameters, and the
created second index comprises the second query.
9. The method of claim 7, wherein the first proxy web page
specification comprises the first query and the set of search
criteria, identified as keywords of the first proxy web page
specification, the first query comprises a location of the first
website and the first set of associated query parameters, the
created first index comprises the first query, the second proxy web
page specification comprises the second query and the set of search
criteria, identified as keywords of the second proxy web page
specification, the second query comprises a location of the second
website, with the second set of associated query parameters to be
separately provided to the second website, and the created second
index comprises a proxy request for a proxy server to submit the
second query and its second set of associated query parameters to
the second website accordingly.
10. The method of claim 7, wherein the first proxy web page
specification comprises the first query and the set of search
criteria, identified as keywords of the first proxy web page
specification, the first query comprises a location of the first
website, with the first set of associated query parameters to be
separately provided to the first website, and the created first
index comprises a first proxy request for a first proxy server to
submit the first query and its first set of associated query
parameters to the first website accordingly, the second proxy web
page specification comprises the second query and the set of search
criteria, identified as keywords of the second proxy web page
specification, the second query comprises a location of the second
website, with the second set of associated query parameters to be
separately provided to the second website, and the created second
index comprises a proxy request for a proxy server to submit the
second query and its second set of associated query parameters to
the second website accordingly.
11. The method of claim 10, wherein the first and second proxy
servers are the same proxy server.
12. The method of claim 6, wherein the first and second websites
are the same website.
13. The method of claim 1, wherein the method further comprises
automatically selecting, by the one or more computing servers, the
first index to obtain the first query answer web page; and
including, by the one or more computing servers, said first query
answer web page with said search result web page in lieu of said
first index.
14. An apparatus comprising: storage medium having stored therein a
first index effectively indexing a first query answer web page to
be dynamically generated by a first website responsive to a first
received query having a first set of associated query parameters,
the first index to be provided to a client in response to the
client submitting a set of search criteria, and a web server to
receive from a client a search request submitting the search
criteria, dynamically generate in response a search answer web page
comprising said first index, and return the search answer web page
to the client; and at least one processor coupled to the storage
medium to execute the web server.
15. The apparatus of claim 14, wherein the storage medium further
has stored therein an index generator to process a proxy web page
specification corresponding to the first query answer web page to
be dynamically generated by the first website responsive to the
first query having the first set of associated search criteria, to
create the first index, and the at least one processor is further
employed to execute the index generator.
16. The apparatus of claim 15, wherein the proxy web page
specification comprises the first query and the set of search
criteria, identified as keywords of the proxy web page
specification, the first query comprises a location of the first
website and the first set of associated query parameters, and
the created first index comprises the first query.
17. The apparatus of claim 15, wherein the proxy web page
specification comprises the first query and the set of search
criteria, identified as keywords of the first proxy web page
specification, the first query comprises a location of the first
website, with the first set of associated query parameters to be
separately provided to the first website, and the created first
index comprises a proxy request for a proxy server to submit the
first query and its first set of associated query parameters to the
first website accordingly.
18. The apparatus of claim 14, wherein the storage medium further
has stored therein a second index effectively indexing a second
query answer web page to be dynamically generated by a second
website responsive to a second received query having a second set
of associated query parameters, the second index to be provided to
a client in response to the client submitting the set of search
criteria; said search answer web page dynamically generated by the
web server further comprises said second index.
19. The apparatus of claim 14, wherein the storage medium further
has stored therein an index generator to process a first proxy web
page specification corresponding to the first query answer web page
to be dynamically generated by the first website responsive to the
first query having the first set of associated search criteria, to
create the first index, and process a second proxy web page
specification corresponding to the second query answer web page to
be dynamically generated by the second website responsive to the
second query having the second set of associated query parameters,
to create the second index; and the at least one processor is
further employed to execute the index generator.
20. The apparatus of claim 19, wherein the first proxy web page
specification comprises the first query and the set of search
criteria, identified as keywords of the first proxy web page
specification, the first query comprises a location of the first
website and the first set of associated query parameters, the
created first index comprises the first query, the second proxy web
page specification comprises the second query and the set of search
criteria, identified as keywords of the second proxy web page
specification, the second query comprises a location of the second
website and the second set of associated query parameters, and the
created second index comprises the second query.
21. The apparatus of claim 19, wherein the first proxy web page
specification comprises the first query and the set of search
criteria, identified as keywords of the first proxy web page
specification, the first query comprises a location of the first
website and the first set of associated query parameters, the
created first index comprises the first query, the second proxy web
page specification comprises the second query and the set of search
criteria, identified as keywords of the second proxy web page
specification, the second query comprises a location of the second
website, with the second set of associated query parameters to be
separately provided to the second website, and the created second
index comprises a proxy request for a proxy server to submit the
second query and its second set of associated query parameters to
the second website accordingly.
22. The apparatus of claim 19, wherein the first proxy web page
specification comprises the first query and the set of search
criteria, identified as keywords of the first proxy web page
specification, the first query comprises a location of the first
website, with the first set of associated query parameters to be
separately provided to the first website, and the created first
index comprises a first proxy request for a first proxy server to
submit the first query and its first set of associated query
parameters to the first website accordingly, the second proxy web
page specification comprises the second query and the set of search
criteria, identified as keywords of the second proxy web page
specification, the second query comprises a location of the second
website, with the second set of associated query parameters to be
separately provided to the second website, and the created second
index comprises a proxy request for a proxy server to submit the
second query and its second set of associated query parameters to
the second website accordingly.
23. The apparatus of claim 14, wherein the web server is further
designed to automatically select the first index to obtain the
first query answer web page; and include said first query answer
web page with said search result web page in lieu of said first
index.
24. In a proxy server, a method of operation comprising: storing,
by the proxy server, a first set of associated query parameters of
a first query of a first website to pre-fill a first form, which
when posted to the first website, causes a first query answer web
page of the first website to be dynamically generated by the first
website for a recipient; receiving, by the proxy server, a first
proxy request from a first client to submit the first query to the
first website for the first client; and pre-filling, by the proxy
server, the first form, and posting, by the proxy server, the first
pre-filled form to the first website for the first client, to cause
the first query answer web page of the first website to be
dynamically generated by the first website, and be provided to the
first client.
25. The method of claim 24, wherein the method further comprises
storing, by the proxy server, a second set of associated query
parameters of a second query of a second website to pre-fill a
second form, which when posted to the second website, causes a
second query answer web page of the second website to be
dynamically generated by the second website for a recipient;
receiving, by the proxy server, a second proxy request from a
second client to post the second query to the second website for
the second client; and pre-filling, by the proxy server, the second
form, and posting, by the proxy server, the second pre-filled form
to the second website for the second client, to cause the second
query answer web page of the second website to be dynamically
generated by the second website, and be provided to the second
client.
26. The method of claim 25, wherein the first and second clients
are the same client.
27. The method of claim 25, wherein the first and second websites
are the same website.
28. An apparatus comprising: storage medium having stored therein a
first set of associated query parameters of a first query of a
first website to pre-fill a first form, which when posted to the
first website, causes a first query answer web page of the first
website to be dynamically generated by the first website for a
recipient; a proxy function to receive a first proxy request from a
first client to submit the first query to the first website for the
first client, and in response to pre-fill the first form, and post
to the first website, for the first client, the first pre-filled
form, to cause the first query answer web page of the first website
to be dynamically generated by the first website, and be provided
to the first client; and at least one processor coupled to the
storage medium to execute the proxy function.
29. The apparatus of claim 28, wherein the storage medium further
has stored therein a second set of associated query parameters of a
second query of a second website to pre-fill a second form, which
when posted to the second website, causes a second query answer web
page of the second website to be dynamically generated by the
second website for a recipient; and the proxy function is further
used to receive a second proxy request from a second client to
submit the second query to the second website for the second
client, and in response, to pre-fill the second form, and post to
the second website, for the second client, the second pre-filled
form, to cause the second query answer web page of the second
website to be dynamically generated by the second website, and be
provided to the second client.
30. The apparatus of claim 29, wherein the first and second clients
are the same client.
31. The apparatus of claim 29, wherein the first and second
websites are the same website.
Description
RELATED APPLICATION
[0001] The present invention is a continuation application of U.S.
patent application Ser. No. 12/768,621, filed 27 Apr. 2010, which
is a continuation of U.S. patent application Ser. No. 11/595,090,
filed 8 Nov. 2006, which is a continuation-in-part application of
non-provisional application Ser. No. 10/397,992, filed Mar. 25,
2003, entitled "Accessing deep web information using a search
engine", which claims priority to provisional application
60/366,817, filed Mar. 25, 2002. The present application claims
priority to said Ser. No. 12/768,621, Ser. No. 11/595,090 and Ser.
No. 10/397,992 non-provisional applications, and said 60/366,817
provisional application, and incorporates their specifications in
their entirety by reference.
FIELD OF THE INVENTION
[0002] The present invention relates to the field of data
processing. More specifically, the present invention is related to
accessing deep web pages potentially associated with transportation
services using a search engine.
BACKGROUND OF THE INVENTION
[0003] Currently, Internet search engines index the information
found on static web pages on public websites. Static web pages are
web pages that exist before and after a user accesses them. A
search engine saves two pieces of information from each static web
page it indexes. The first piece of information is the location of
the web page, commonly referred to as the uniform resource locator
(URL). The second piece of information is one or more keywords from
the text of the web page. A search engine is then, essentially, a
collection of URLs and associated keywords.
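The notion of a search engine as a collection of URLs and associated
keywords can be illustrated with a minimal sketch. The function names,
the example URLs, and the page text below are assumptions made for
illustration only; they do not appear in the specification, and a
production search engine would be considerably more elaborate.

```python
from collections import defaultdict


def build_index(pages):
    """Map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, criteria):
    """Return the URLs associated with every keyword in the criteria."""
    words = criteria.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical static web pages (URLs and text are illustrative only).
PAGES = {
    "http://airline.example/deals": "flight deals and promotions",
    "http://travel.example/home": "search flight deals hotels cars",
}
INDEX = build_index(PAGES)
```

Because only text already present on a static web page is recorded, a
page that does not yet exist (such as a query answer web page for a
particular pair of cities) has no keywords to contribute to such an
index.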
[0004] Deep web information, however, is not contained on static
web pages. Instead, it is produced in response to a specific query
by a user and results in dynamic web pages. Search engines
themselves are deep websites. A dynamic web page only exists after
a user's query and disappears after the user exits the dynamic web
page. As a result, deep web information or the dynamic web pages
that can be generated from the deep web information are not
typically available from search engines.
[0005] As a result, for certain types of information searching, a
user using a search engine to locate information often either fails
to find the right information, or has to re-enter the search
criteria as query parameters of queries to generate the dynamic web
pages having the desired information.
[0006] FIGS. 8a-8d illustrate one example of this problem. For the
illustrated example, a user uses a search service/engine (such as
Google) to search for "Flights from Seattle to San Jose". FIG. 8a
illustrates selected portions of an example search answer web page
returned from the search service/engine. The example search answer
web page includes indices indexing flight deals on Priceline's and
Orbitz's websites respectively, and indices indexing static
promotional pages on Southwest's and Alaska Airlines' websites
respectively.
[0007] FIG. 8b illustrates the example static web page indexed by
an index. It is essentially a form for facilitating a user to
search for specific flight deals available on Priceline's website.
In other words, because the flight deal web page for flights
between Seattle and San Jose, if available, is dynamically
generated based on flight deal information in a database, the
proper query answer web page is not indexed by the search
service/engine. Consequently it is not available. Only the generic
static root web page (which could have been retrieved with merely
the search criteria of "flight deals") is indexed, resulting in the
user essentially having to start the search over again, once he/she
enters Priceline's website.
[0008] Similar disadvantages exist, if the Orbitz website is
entered via an index.
[0009] FIGS. 8c-8d illustrate the static promotional web pages
indexed by indices. They are merely promotional web pages, and not
the flight schedule or availability web pages that the user is
looking for. Again, the reason is that the flight schedule and
availability web pages are dynamically generated based on the
flight schedule and availability information in the databases of
Southwest and Alaska Airlines.
[0010] FIGS. 9a-9b illustrate the end results that the user
desires. That is, in response to the user providing a search engine
with the search criteria "flights between Seattle and San Jose", the
user is provided with a search answer page having indices that index
not only the relevant static web pages, but also the relevant
dynamic web pages to be generated, such as would be provided using
this invention. Further, on selection of one of such indices, the
user can be taken to the indexed dynamic web page, with the search
criteria preserved, i.e. the from and to cities of Seattle and San
Jose filled in, without requiring the user to re-enter or re-select
the search criteria.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The present invention will be described by way of exemplary
embodiments, but not limitations, illustrated in the accompanying
drawings in which like references denote similar elements, and in
which:
[0012] FIG. 1 illustrates a network view of a computing environment
suitable for practicing the present invention, in accordance with
one embodiment;
[0013] FIG. 2 illustrates an overview of the method of the present
invention, in accordance with one embodiment;
[0014] FIGS. 3a-3b illustrate a number of reconciled queries with
their associated query parameters, of a number of websites, for a
number of example search criteria, created in accordance with the
teachings of the present invention;
[0015] FIGS. 4a-4b illustrate a number of example proxy web page
specifications, created in accordance with the teachings of the
present invention;
[0016] FIGS. 5a-5b illustrate two example indices created using
proxy web page specifications of the present invention;
[0017] FIGS. 6a-6d illustrate a source view and an end-user view
each of two example pre-filled forms for two example queries
employing the POST method to post its associated query parameters
to the corresponding websites;
[0018] FIG. 7 illustrates an example computer system suitable for
use as either a client or to host a proxy function, a search engine
or a web server, to practice the present invention, in accordance
with one embodiment;
[0019] FIGS. 8a-8d illustrate the inadequacies of the prior art in
facilitating a user in accessing deep web information using a
search engine, for an example information search;
[0020] FIGS. 9a-9b illustrate two user desired results of the
example information search of FIGS. 8a-8d;
[0021] FIGS. 10a-10d illustrate four example search result pages
associated with car rental service queries, in accordance with an
embodiment of the present invention;
[0022] FIG. 11 illustrates an exemplary fragment of source code of
a search result page, the page associated with a car rental service
query;
[0023] FIGS. 12a-12d illustrate four example search result pages
associated with train reservation service queries, in accordance
with an embodiment of the present invention;
[0024] FIG. 13 illustrates an exemplary fragment of source code of
a search result page, the page associated with a train reservation
service query;
[0025] FIGS. 14a-14d illustrate four example search result pages
associated with flight status information service queries, in
accordance with an embodiment of the present invention; and
[0026] FIG. 15 illustrates an exemplary fragment of source code of
a search result page, the page associated with a flight status
service query.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
[0027] The present invention includes search engines enhanced to
effectively index dynamic web pages, companion proxy servers, and
related methods and apparatuses.
[0028] Part of the description will be presented in terms commonly
employed by those skilled in the art to convey the substance of
their work to others skilled in the art, including but not
limited to terms such as websites, static and dynamic web pages,
queries, query parameters, query answer pages, search criteria,
search answer web pages and forms. On a computing device, these
logical entities are represented and manipulated in the form of
electrical and/or optical signals.
[0029] In the following description, various aspects of the present
invention will be described. However, it will be apparent to those
skilled in the art that the present invention may be practiced with
only some or all aspects of the present invention. For purposes of
explanation, specific numbers, materials and configurations are set
forth in order to provide a thorough understanding of the present
invention. However, it will be apparent to one skilled in the art
that the present invention may be practiced without the specific
details. In other instances, well-known features are omitted or
simplified in order not to obscure the present invention.
[0030] Various operations will be described as multiple discrete
steps in turn, in a manner that is most helpful in understanding
the present invention. However, the order of description should not
be construed to imply that these operations are necessarily order
dependent. In particular, these operations need not be performed in
the order of presentation.
[0031] The use of headings and sections is to facilitate
understanding of the present invention, and they are not to be
limiting on the present invention. The phrase "in one embodiment"
is used repeatedly. The phrase generally does not refer to the same
embodiment; however, it may. The terms "comprising", "having" and
"including" are synonymous, unless the context dictates
otherwise.
Overview
[0032] Referring now to FIG. 1, a network view of a computing
environment suitable for practicing the present invention is
shown. As illustrated, computing environment 100
includes a number of websites 108 offering services and/or
information. In particular, each website 108 includes dynamic web
page generator 118 to generate various dynamic web pages to
facilitate delivery of the services and/or information offered.
[0033] Typically, dynamic web page generator 118 generates the
various dynamic web pages using information stored in various
databases (not shown) of websites 108, which may be co-resident or
remotely disposed from the server(s) hosting websites 108 (but
communicatively coupled). Each dynamic web page is typically
generated in response to a query having a particular set of
associated query parameters. Accordingly, these dynamic web pages
will also be referred to as query answer web pages.
[0034] Further, the query answer web pages may be generated using
Common Gateway Interface (CGI) scripts, in conjunction with
instructions of one or more programming languages (Perl, C, C++,
Basic, Fortran, Pascal, etc.), or a proprietary dynamic web page
generator, such as Microsoft's Active Server Pages (ASP).
[0035] For the illustrated embodiment, communication with websites
108 is conducted using the Hypertext Transfer Protocol (HTTP) over
Transmission Control Protocol/Internet Protocol (TCP/IP), and the
dynamically generated web pages are specified using the Hypertext
Markup Language (HTML). For some queries, the associated query
parameters are submitted to websites 108 using a GET method, i.e.
the associated query parameters of the queries are submitted in
conjunction with the corresponding URLs, identifying the locations
of the corresponding websites 108. For other queries, the
associated query parameters are submitted to websites 108 using a
POST method, i.e. the associated query parameters of the queries
are submitted separately (typically subsequently), after
corresponding connections have been established with the websites
108, using their corresponding URLs. Thus, in the POST method, the
URL is disassociated from the input parameters. Associating the
input parameters with the URL would make the resulting URL unique.
Without a unique URL, that is, without the association of the FORM
URL with the specific input parameters, the completed FORM cannot
be indexed by a search engine or even bookmarked by a browser.
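The difference between the two submission methods can be seen in a
short sketch using Python's standard library. The website location
and the parameter names "from" and "to" are hypothetical and chosen
only for illustration; the requests are constructed but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical website location and query parameters (illustrative only).
FORM_URL = "http://flights.example/search"
PARAMS = {"from": "Seattle", "to": "San Jose"}


def build_get_request(url, params):
    """GET: the query parameters are carried in the URL itself."""
    return Request(url + "?" + urlencode(params), method="GET")


def build_post_request(url, params):
    """POST: the query parameters travel in the request body, not the URL."""
    body = urlencode(params).encode("ascii")
    return Request(url, data=body, method="POST")


get_req = build_get_request(FORM_URL, PARAMS)
post_req = build_post_request(FORM_URL, PARAMS)

print(get_req.full_url)   # URL is unique to this particular query
print(post_req.full_url)  # URL is the same for every query to this form
print(post_req.data)      # parameters are carried separately
```

With GET, the URL alone identifies the query and could be stored or
bookmarked. With POST, every query to the same form shares one URL,
which is the difficulty the paragraph above describes.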
[0036] In alternate embodiments, the present invention may be
practiced with communication with websites 108 being conducted
using other communication protocols, including but not limited to
HTTP Secure (HTTPS), File Transfer Protocol (FTP), Secure FTP
(SFTP), Telnet, Secure Shell, Secure Copy (SCP), Gopher, and so
forth.
[0037] Continuing to refer to FIG. 1, computing environment 100
further includes a number of clients 102, and search services 104,
coupled to each other and to websites 108 via network 110. Clients
102, assisted by search services 104, selectively consume the
services and/or information offered by websites 108.
[0038] For the embodiment, clients 102 include browsers 112, and
use browsers 112 to consume the service and/or information offered
by websites 108 in the form of web pages. Clients 102 further
include operating system services (not shown) in support of
browsers 112. Browsers 112, supported by the operating system
services, are equipped to communicate with search services 104 and
websites 108 in one or more of the earlier described protocols.
[0039] In alternate embodiments, browsers 112 and the underlying
operating system services may be fully integrated. In other
embodiments, some functions of browsers 112 and the underlying
operating system services may be offered by an intermediate
"middle" layer of services instead.
[0040] For the embodiment, search services 104 include search
engine 114, web page indices 115 indexing web pages of websites 108
offering services and/or information, and index generator 113 to
generate indices 115. As will be described in more detail below,
under the present invention, web page indices 115 also
advantageously index web pages of websites 108 that are dynamically
generated in response to queries having associated query
parameters.
[0041] Similar to clients 102, search services 104 further include
operating system services (not shown) in support of elements
113-115. Search engine 114, supported by the operating system
services, is equipped to communicate with clients 102 and websites
108 in the appropriate protocols.
[0042] In alternate embodiments, search engine 114 and the
underlying operating system services may be fully integrated. In
other embodiments, some functions of search engine 114 and the
underlying operating system services may be offered by an
intermediate "middle" layer of services instead.
[0043] For the embodiment, search services 104 are complemented by
at least one proxy server 106, which is coupled to websites 108,
clients 102, and search services 104 via network 110. Proxy server
106 includes in particular proxy function 116, and associated query
parameter values 117 for pre-filling query forms to be provided to
their websites 108. More specifically, for the earlier described
HTML and HTTP oriented embodiment, pre-filled query forms
facilitate posting of the associated query parameters to their
websites 108 via the POST method.
[0044] The term "form" as used herein in the specification and in
the claims, in general, refers to an electronic form which may be
any one of a number of form types, including but not
limited to, HTML FORMs, Windows FORMs, ASP generated FORMs, Java
FORMs, JavaScript FORMs, DirectX FORMs, and so forth, as well as
form-like documents and/or data structures/organizations, unless
the context of a particular usage instance specifically narrows the
meaning for the particular usage instance to a particular form
type.
[0045] As illustrated, indices 115 that effectively index the
dynamically generated query answer web pages, and if applicable,
associated query parameter values 117 for pre-filling query forms
are generated in accordance with proxy web page specifications 119
of the dynamically generated query answer web pages. Except for
indices 115 that effectively index the dynamically generated query
answer web pages, associated query parameter values 117 for
pre-filling query forms (if applicable) to facilitate submission of
associated query parameters of queries of websites 108, and proxy
web page specifications 119, clients 102 (including browsers 112),
search services 104 (including search engine 114 and indices 115
that index static web pages), proxy server 106 (including proxy
function 116), websites 108 (including dynamic web page generators
118 and their databases (not shown)), and network 110 represent a
broad range of these respective elements known in the art or to be
designed. In particular, network 110 may include one or more
private and/or public networks, such as the Internet. Their
functions and constitutions are known, and accordingly will not be
further described.
[0046] Otherwise, the method of the present invention, proxy web
page specifications 119, indices 115 that effectively index the
dynamically generated query answer web pages, and associated query
parameter values 117 for pre-filling forms (if applicable) to
facilitate submission of associated query parameters of queries of
websites 108, will be described in turn, in the description to
follow.
Initialization/Set up
[0047] FIG. 2 illustrates an overview of a method of the present
invention, in accordance with one embodiment. More specifically,
FIG. 2 illustrates the initialization or set up operations for
setting up search services 104, and if applicable, proxy servers
106, to facilitate clients 102 to access deep web information of
websites 108.
[0048] As illustrated, for each set of search criteria of interest,
block 202, the queries including the associated query parameters,
and the manner they are submitted to websites 108, e.g. whether
through the GET method or the POST method, are identified, block
204. The identified queries are then reconciled, block 206.
[0049] The above operations are best illustrated by a couple of
examples. The first example illustrates how queries of various used
car websites of interest for various used car searches of a
HTML/HTTP based embodiment are identified and reconciled, and the
second example illustrates how queries of various airline websites
of interest for various used flight schedule/availability searches
of a HTML/HTTP based embodiment are identified and reconciled.
Used Car Sites
[0050] Suppose two used car deep web sites 108 are of interest.
These sites are Automall.com and JoesCars.com, and for ease of
understanding, their dynamically generated web pages are assumed to
be HTML based. Moreover, communication with Automall.com and
JoesCars.com is assumed to be HTTP based.
[0051] Further, the queries of their query answer web pages have
similar query parameters. More specifically, Automall.com's query
parameters are car make, car model, and user's zip code.
JoesCars.com's query parameters are car make, car model, user's zip
code and car year or years.
[0052] The first step in identifying and reconciling the queries of
these sites is to find a minimum number of query parameters for
this group of queries. In this example case, JoesCars.com has one
more query parameter, which is the year or years. Since
Automall.com does not have this parameter, it can be eliminated. If
JoesCars.com requires the year or years, the largest range of years
may be selected. For example, if JoesCars.com provides support for
years between 1982 and 2002, then the year range for its query will
always be 1982-2002.
[0053] Choosing a default constant value for the car year leaves
three query parameters for both sites to be specified. This,
however, may be reduced further. If it is assumed that every model
name is unique, then the user may be able to make a query with just
two pieces of information. These are the car model and user's zip
code. If the car make is required by both sites, this will be
provided in the query. The user will not, however, be required to
enter it in the search engine query.
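The reconciliation just described can be sketched as follows. This
is a minimal illustration under stated assumptions: the
lower-case parameter names and the model-to-make table are
hypothetical, and the sketch assumes, as the text does, that every
model name is unique.

```python
# Query parameters accepted by each site, per the example above.
site_params = {
    "Automall.com": {"make", "model", "zip"},
    "JoesCars.com": {"make", "model", "zip", "year"},
}

# Constant default for a parameter that is not common to all sites.
defaults = {"year": "1982-2002"}

# Hypothetical lookup table; assumes every model name is unique.
make_of_model = {"corvette": "chevrolet", "focus": "ford"}

# Parameters shared by every site in the group.
common = set.intersection(*site_params.values())

# The make can be derived from the model, so the user need not enter it.
user_params = common - {"make"}


def build_site_query(site, model, zip_code):
    """Expand the two user-supplied values into a full site query."""
    query = {"model": model, "zip": zip_code, "make": make_of_model[model]}
    # Fill any site-specific extra parameter with its constant default.
    for name in site_params[site] - common:
        query[name] = defaults[name]
    return query


print(sorted(user_params))
print(build_site_query("JoesCars.com", "corvette", "22043"))
```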
[0054] In various embodiments, during this stage of query
identification and reconciliation, the user's options for query
parameters may also be expanded. For example, if a list exists
which associates a zip code to a city and state, the process may
make plans to allow for a user to substitute a city or state in the
search engine query. The query that is sent to each deep web site
108 will contain the zip code even though the user entered a city
or state. If the list of associated cities and states for zip codes
is used, the minimum number of query parameters for used car sites
is still two. However, the parameters are now car model and user
location.
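A sketch of the location substitution follows. The one-entry lookup
table is a placeholder for the list associating zip codes with
cities and states; the function name and the table contents are
assumptions made for illustration.

```python
# Placeholder for the list associating zip codes with city and state.
zip_by_city = {"falls church, va": "22043"}


def resolve_zip(location):
    """Return a five-digit zip code for a zip code or a known city."""
    text = location.strip().lower()
    if len(text) == 5 and text.isdigit():
        return text  # the user already entered a zip code
    # A KeyError here means the location is not in the list.
    return zip_by_city[text]


print(resolve_zip("22043"))
print(resolve_zip("Falls Church, VA"))
```

Either form of user location yields the zip code that is actually
sent to the deep web site 108.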
[0055] Once the minimum number of query parameters is found, the
next step is to identify how the query of each deep web site 108 in
the group is invoked. This involves determining the query syntax
and method of the HTTP request and the number and type of
parameters required. Most of this information can be obtained by
examining the HTML within the FORM statement of the dynamically
generated query answer page. Any additional information that is
required may be found by executing some sample queries.
[0056] Suppose that in the example given, it is found that
Automall.com uses a GET HTTP request method and JoesCars.com uses a
POST request method.
[0057] Furthermore, Automall.com's URL syntax is
http://www.automall.com/used.html?make=chevrolet&model=corvette&zip=22043
[0058] and JoesCars.com's POST parameters are:
[0059] Make, Model, Year and Zip
[0060] The request method and query syntax for the deep web sites
108 are then known. What remains to be understood is how the GET
and POST parameters need to be added. For example, if the make
parameter for Automall.com is entered through a text field, then it
may be insensitive to the case of a make name. However, if the make
parameter is obtained from a pulldown, there is only one way of
representing each make name.
[0061] Assume that Automall.com and JoesCars.com list car make and
model in user pulldowns which translate to HTML SELECT statements.
All of the possible combinations of make and model can then be
obtained from the HTML. Assume that the Year input for JoesCars.com
is a TEXT field and "1982-2002" is an acceptable range syntax.
Finally, assume that both sites accept a five-digit zip code
through a TEXT field. The method for invoking the deep web sites is
then well understood.
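Obtaining the possible values from the SELECT statements might be
sketched as below, using the HTML parser of the Python standard
library. The sample FORM markup is invented for illustration; the
markup of an actual site will differ.

```python
from html.parser import HTMLParser


class SelectOptionParser(HTMLParser):
    """Collect the OPTION values of each named SELECT in a page."""

    def __init__(self):
        super().__init__()
        self.options = {}     # SELECT name -> list of OPTION values
        self._current = None  # name of the SELECT being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self._current = attrs.get("name")
            self.options[self._current] = []
        elif tag == "option" and self._current is not None:
            value = attrs.get("value")
            if value:
                self.options[self._current].append(value)

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None


# Invented sample markup standing in for a site's query FORM.
sample_form = """
<form method="post" action="/search">
  <select name="Make">
    <option value="Ford">Ford</option>
    <option value="Chevrolet">Chevrolet</option>
  </select>
  <select name="Model"><option value="Focus">Focus</option></select>
  <input type="text" name="Zip">
</form>
"""

parser = SelectOptionParser()
parser.feed(sample_form)
print(parser.options)
```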
[0062] After defining the minimum set of query parameters and
understanding the methods of invocation of each deep web site 108,
the next step is to develop a list of site queries and parameter
values.
[0063] For example, an excerpt from Automall.com's list is
illustrated in FIG. 3a. The list contains the query URL followed by
a set of query parameter values that a user might enter as a set of
search criteria.
[0064] Note that the query URL contains the car make, but the set
of query values does not. This is because the minimum set of query
parameters was chosen. Note also that some query values contain the
city name instead of the zip code, but the zip code was entered in
the URL query. This is an example of the expansion of the possible
query parameter values by using an additional file that associates
zip codes and city names.
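One way such a list might be produced is sketched below. The layout
of FIG. 3a is not reproduced; the (URL, values) tuple layout and
both lookup tables are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Hypothetical lookup tables (illustration only).
make_of_model = {"corvette": "chevrolet"}
city_by_zip = {"22043": "falls church"}


def automall_entries(model, zip_code):
    """Pair the query URL with each set of values a user might enter."""
    url = "http://www.automall.com/used.html?" + urlencode(
        {"make": make_of_model[model], "model": model, "zip": zip_code}
    )
    # The make appears in the URL but not in the user-facing values.
    entries = [(url, [model, zip_code])]
    # Expanded entry: same URL, with a city name in place of the zip code.
    city = city_by_zip.get(zip_code)
    if city is not None:
        entries.append((url, [model, city]))
    return entries


for url, values in automall_entries("corvette", "22043"):
    print(url, values)
```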
[0065] An excerpt from JoesCars.com list of site queries and
parameter values 302-304 is also shown in FIG. 3a.
[0066] The first noticeable difference between Automall.com's list
and JoesCars.com's list is the URL destination. Instead of
JoesCars.com's URL, the destination is a proxy server's URL
(outside.com). This is because of the HTTP request method used on
the JoesCars.com site. A search engine cannot execute a POST
request from one of its links. As a result, the link given to the
search engine must be a GET request to a third party proxy site
that can execute the post and return the JoesCars.com results page
to the user.
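The translation performed by the proxy site can be sketched as
follows. Only the host name outside.com comes from the example; the
proxy path, the JoesCars.com endpoint, and the function names are
hypothetical. The sketch builds the POST request without sending
it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit
from urllib.request import Request

PROXY_URL = "http://outside.com/proxy"         # path is hypothetical
TARGET_URL = "http://www.joescars.com/search"  # hypothetical endpoint


def proxy_link(params):
    """Build the GET link that is given to the search engine."""
    return PROXY_URL + "?" + urlencode(params)


def to_post_request(link):
    """At the proxy: turn the GET link into a POST for the target."""
    params = parse_qsl(urlsplit(link).query)
    body = urlencode(params).encode("ascii")
    return Request(TARGET_URL, data=body, method="POST")


link = proxy_link(
    {"Make": "Ford", "Model": "Focus", "Year": "1982-2002", "Zip": "22043"}
)
request = to_post_request(link)
print(link)
print(request.get_method(), request.full_url, request.data)
```

In a working proxy, the request would then be sent and the results
page returned to the user.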
[0067] Another difference is the additional Year parameter. As
mentioned above, all queries are given the default range of
"1982-2002" for this additional parameter. Also, since each site
might span a different range of years the list of possible query
parameter values might be different. For example, JoesCars.com
might show Ford Granada while this model might be too old for
Automall.com. Remember that in this example the car models were
found by searching the SELECT statements of the HTML.
[0068] Note that while these lists might contain all possible
inputs for the minimum set of query parameters, not all items in
the list may be submitted to the search engine. Instead, each item
on the list might be compared to previous search engine queries for
relevance. If such a query is never entered, it may be excluded
from the information given to the search engine.
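A possible form of this relevance check is sketched below. The
matching rule (every parameter value must appear, ignoring case, in
at least one previous query) is an assumption; other measures of
relevance may be used.

```python
def filter_by_query_log(entries, past_queries):
    """Keep entries whose values all appear in some previous query."""
    lowered = [q.lower() for q in past_queries]
    kept = []
    for url, values in entries:
        if any(all(v.lower() in q for v in values) for q in lowered):
            kept.append((url, values))
    return kept


# Hypothetical list entries and query log (illustration only).
entries = [
    ("url-1", ["corvette", "22043"]),
    ("url-2", ["granada", "22043"]),
]
past_queries = ["used Corvette 22043"]
print(filter_by_query_log(entries, past_queries))
```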
[0069] Airline Reservation Sites
[0070] Suppose that two deep web airline reservations sites 108 are
of interest for various flight schedule and availability searches.
One is a general travel site that lists the reservations of many
different airlines. This site is called GeneralTravel.com. The
other is a site for a specific airline. This site is called
FastJet.com.
[0071] Again, for ease of understanding, the dynamically generated
web pages of GeneralTravel.com and FastJet.com are assumed to be
HTML based, and communications with the two sites are HTTP
based.
[0072] As with the used cars, the first step in identifying and
reconciling the queries of the websites is to identify the minimum
set of query parameters. For airline reservations the minimum set
of parameters is four. These are the "departure date", "from
location", "return date" and "to location". These parameters may
also be used for hotel and car rental reservations. However, the
"from location" is not needed for these types of reservations.
[0073] Suppose GeneralTravel.com also requires the number of people
traveling. Since this is not part of the minimum set of parameters,
the default can be used. In this case the default is one
person.
[0074] The next step is to understand the method of query
invocation on each site. Suppose that both GeneralTravel.com and
FastJet.com use the GET HTTP request method. Also suppose that the
URL syntax for each of these sites is as follows.
[0075]
http://www.generaltravel.com/res.html?persons=1&from=bwi&to=sfo&leave=0504&return=0506
[0076]
http://www.fastjet.com/res.html?fdest=bwi&tdest=sfo&start=0504&end=0506
[0077] The third step in the process is then to create the lists of
site queries and parameter values. In the case of used car sites,
the car models were limited by the values placed in the HTML select
statements. Even the number of user zip codes was a finite set.
However, the number of possible departure dates and return dates is
infinite. As a result, the list of possible parameter values must
be limited by some other means.
[0078] Suppose that most reservations are booked no more than one
year in advance. Also, suppose that most travel takes place within
one month. The possible parameter values are now considerably
limited. First of all, the departure dates are then the next 365
days. For each departure date, there will be 30 return dates
covering the following 30 days. Thus, there will be a total of 365
times 30, or 10,950 possible combinations of dates.
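The count can be checked with a short enumeration. The sketch
assumes that "the next 365 days" begins on the day after the
starting date and that the return date falls one to thirty days
after the departure date; the starting date itself is arbitrary.

```python
from datetime import date, timedelta


def date_pairs(today):
    """Enumerate (departure, return) pairs under the stated limits."""
    pairs = []
    for d in range(1, 366):        # departure: each of the next 365 days
        depart = today + timedelta(days=d)
        for r in range(1, 31):     # return: each of the following 30 days
            pairs.append((depart, depart + timedelta(days=r)))
    return pairs


pairs = date_pairs(date(2003, 3, 25))  # arbitrary starting date
print(len(pairs))  # 10950
```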
[0079] To complete the list of values, a list of all possible
destinations must be available. Every possible combination of to
and from destination can then be generated. Furthermore, the
destinations can be specified as cities or airport codes. The
complete list of parameter values is then 10,950 possible
combinations of dates for each possible combination of to and from
destinations.
[0080] An excerpt from GeneralTravel.com's list of site queries and
parameter values 312-314 is also shown in FIG. 3b.
[0081] Note that the URL contains information not in the query. In
this case, this is the "number of persons traveling." Also, note
that the query may contain the city or the airport, but the URL only
contains the airport code. This implies that a list correlating
airport codes and city names was used to generate the list of
parameter values.
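The generation of such list entries might be sketched as follows.
The parameter names follow the GeneralTravel.com URL syntax given
above; the (URL, values) tuple layout and the two-entry table
correlating airport codes and city names are assumptions made for
illustration.

```python
from urllib.parse import urlencode

# Illustrative table correlating airport codes and city names.
city_by_airport = {"bwi": "baltimore", "sfo": "san francisco"}


def general_travel_entries(origin, dest, leave, ret):
    """Pair one query URL with every city/airport wording of it."""
    url = "http://www.generaltravel.com/res.html?" + urlencode(
        {
            "persons": 1,  # default for a parameter the user does not enter
            "from": origin,
            "to": dest,
            "leave": leave,
            "return": ret,
        }
    )
    entries = []
    for o in (origin, city_by_airport[origin]):
        for d in (dest, city_by_airport[dest]):
            entries.append((url, [o, d, leave, ret]))
    return entries


for url, values in general_travel_entries("bwi", "sfo", "0504", "0506"):
    print(url, values)
```

All four entries share one URL, which contains only the airport
codes and the default number of persons.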
[0082] Referring back to FIG. 2, once the queries of the websites
108 of interest have been identified and reconciled, corresponding
proxy web page specifications 119 for the dynamically generated
query answer web pages are created, block 208, and submitted to
search services 104 for processing to create indices 115 to index
the dynamically generated query answer web pages.
[0083] More specifically, as earlier described, corresponding
proxy web page specifications 119 are submitted to index generator
113, which generates, in response, the earlier described indices
115.
[0084] As described earlier, index generator 113 may be any one of
a number of such elements known in the art or to be designed. Its
functions and constitution are known, and except for its usage, the
manner it is constituted is not essential to the present
invention.
[0085] FIGS. 4a-4b each illustrate an example proxy web page
specification 119a-119b for the queries of Automall.com and
GeneralTravel.com. In alternate embodiments, the information may be
provided to search services 104 via other formats and/or
approaches.
[0086] FIGS. 5a-5b illustrate the corresponding example indices
that may be generated to effectively index the dynamic query answer
pages for the earlier described search examples, i.e. used car
search, and flight schedule and availability search. As
illustrated, example indices 500a may comprise indices 502a-502b
that advantageously index the dynamically generated query answer
pages for the example used car search, and example indices 500b may
comprise indices 502c-502d that advantageously index the
dynamically generated query answer pages for the example flight
schedule/availability search, thereby overcoming the prior art
deficiency of not having any index that indexes dynamically
generated web pages.
[0087] Referring again to FIG. 2, and recalling from earlier
descriptions, for queries of the HTML and HTTP embodiment employing
the FORM and the POST method to post their associated query
parameters to their websites 108, associated query parameter values
117 for pre-filling query forms are created and pre-stored in proxy
server 106 to facilitate the form filling and parameter value
posting. Accordingly, upon creation of appropriate proxy web page
specifications 119 and their provision to index generator 113 to
generate indices 115, proxy web page specifications 119 are
employed to create the associated query parameter values 117 and
pre-filled forms, which are pre-stored with proxy server 106 for use
during operation.
[0088] Associated query parameter values 117 for pre-filling query
forms may be stored on proxy servers 106 using any one of a number
of data structures and/or organizations known in the art or to be
designed, including but not limited to hash tables, flat files,
relational databases, and so forth.
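As one illustration of the hash table option, the sketch below
keeps the parameter values in a dictionary and renders a pre-filled
form from them. The key format, the form action URL, and the use of
plain text inputs for every parameter are simplifying assumptions.

```python
from html import escape

# Hash table of pre-stored parameter values, keyed by a hypothetical
# identifier of the indexed query.
prefill_store = {
    "joescars/focus/22043": {
        "action": "http://www.joescars.com/search",  # hypothetical
        "values": {
            "Make": "Ford",
            "Model": "Focus",
            "Year": "1982-2002",
            "Zip": "22043",
        },
    },
}


def render_prefilled_form(key):
    """Render an HTML form whose fields carry the stored values."""
    record = prefill_store[key]
    lines = ['<form method="post" action="%s">' % escape(record["action"])]
    for name, value in record["values"].items():
        lines.append(
            '  <input type="text" name="%s" value="%s">'
            % (escape(name), escape(value))
        )
    lines.append('  <input type="submit" value="Search">')
    lines.append("</form>")
    return "\n".join(lines)


print(render_prefilled_form("joescars/focus/22043"))
```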
[0089] FIGS. 6a and 6c illustrate one each, a source view of an
example pre-filled form 117a for the above described used car
search example, and a source view of an example pre-filled form
117b for the above described flight reservation example, to
facilitate posting of associated query parameters of queries
corresponding to two dynamically generated query answer pages.
[0090] As illustrated, pre-filled form 117a has the values "Ford"
and "Focus" pre-selected for the Make and Model parameters.
Further, pre-filled form 117a has the zip code value "22043"
pre-set for the zip code parameter.
[0091] Similarly, pre-filled form 117b has the parameter values
for the departing city, destination city, number of passengers,
whether it is a round trip reservation, departure date, return
date, and so forth pre-filled.
[0092] For ease of understanding, some of the less illustrative
source codes (in the "blank" areas) are omitted.
[0093] FIGS. 6b and 6d illustrate the end user view of the
corresponding example pre-filled forms 117a and 117b for the above
described used car and flight reservation search examples, when a
user selects the index that effectively indexes the corresponding
dynamically generated query answer page.
[0094] Those skilled in the art will appreciate that the examples
have been purposely kept simple for ease of understanding.
Nevertheless, they convey the substance of the present invention,
even though in practice, typically more query parameters and query
values are involved.
Operation
[0095] Thus, during operation, on receipt of a set of search
criteria from a client 102, search service 104 is able to return a
search answer page comprising indices indexing query answer web
pages that are not persistent, and are to be dynamically
generated.
[0096] As described earlier, example search result web pages for
the earlier described examples, i.e. used car search and flight
schedule and availability search may comprise indices 502a-502b and
502c-502d that advantageously index dynamically generated web
pages, thereby overcoming the prior art deficiency of not having
any index that indexes dynamically generated web pages.
[0097] Further, on selection, the search criteria will be carried
forward, and incorporated into the dynamically generated answer web
page, thereby also overcoming the prior art disadvantage of
requiring the user to re-enter the search criteria. As described
earlier, this is effectuated through the use of proxy server 106
and the complementary associated query parameter values 117 for
pre-filling query forms pre-generated and pre-stored on proxy
server 106.
[0098] In alternate embodiments, in lieu of retrieving the relevant
forms in real time from the applicable websites 108, the present
invention may be practiced with the forms pre-fetched, pre-filled
and cached.
[0099] For the illustrated embodiment, preservation and transfer of
the search criteria or query parameters are accomplished without
resorting to the use of cookies (and client approval of such use).
However, in alternate embodiments, some or all of the query
parameters may be preserved and/or transferred involving the use of
cookies (and client approval of such use).
[0100] Moreover, the present invention may be practiced with the
preservation and transfer of other state information, in addition
to the search criteria. Furthermore, the present invention may also
be practiced in conjunction with the employment of one or more
security techniques.
Example Computer System
[0101] FIG. 7 illustrates an example computing system suitable for
use as either a client or to host the proxy function, the search
engine or a web server to practice the present invention. As a
client system, computing system 700 may be a desktop computer
system, a laptop computer system, a tablet computing device, a palm
sized computing device, a wireless mobile phone, a set-top box, an
Internet appliance and the like. As a server, computing system 700
may be a single or a cluster of computer systems.
[0102] As shown, exemplary computing system 700 includes one or
more processors 702 and system memory 704. Additionally, system 700
may include mass storage devices 706 (such as diskette, hard drive,
CDROM and so forth), input/output devices 708 (such as keyboard,
cursor control and so forth) and communication interfaces 710 (such
as network interface cards, modems and so forth). The elements are
coupled to each other via system bus 712, which represents one or
more buses. In the case of multiple buses, the buses are bridged by
one or more bus bridges (not shown).
[0103] Each of these elements performs its conventional functions
known in the art. In particular, system memory 704 and mass storage
706 are employed to store a working copy and a permanent copy of
the programming instructions and/or data, implementing the
teachings of the present invention (e.g. the enhanced indices of
the search engine, the complementary pre-filled forms, and so
forth). The programming instructions may be instructions of any one
of a number of known or to be designed languages, including but
not limited to C, C++, Perl, Java, JavaScript, XML, HTML and so
forth. All or a portion of the permanent copy of the programming
instructions/data may be loaded into mass storage 706 in the
factory, in the field (via a distribution medium) or through
communication interface 710 (from a distribution server (not
shown)).
[0104] The constitution of these elements 702-712 is known, and
accordingly will not be further described.
Additional Embodiments
[0105] In additional embodiments, illustrated by FIGS. 10-15, the
search services and the proxy server may be effectively combined,
by having the search services automatically select the indices that
index the dynamically generated web pages, and cause the query
result web pages to be embedded as part of the search result page.
As can be seen from FIGS. 10, 12, and 14, embedded query result
pages may be included as answers of the search results page. Such
answers may include one or more input fields. In some embodiments,
one or more of these fields may have values entered corresponding
to one or all of the search criteria of the user search request.
Other fields may be left empty or contain default values, and may
thus enable a user to enter new values or alter default or
previously entered values.
[0106] In one embodiment, the query results may be embedded into
the search result page as other "hits"/answers, and placed as a
group above, below, or among the other "normal" hits/answers.
Example search result pages 1002a, 1202a, and 1402a of FIGS. 10a,
12a, and 14a, respectively, illustrate such an embodiment. In
another embodiment, the query results may be embedded into the
search result page as hits/answers, and may be interspersed among
the other "normal" hits/answers. Example search result pages 1002b,
1202b, and 1402b of FIGS. 10b, 12b, and 14b, respectively,
illustrate such an embodiment. In a third embodiment, the query
results may be embedded into the search result page as a single
hit/answer, the single hit/answer providing an embedded query to
serve as a proxy for a plurality of query result pages. Example
search result pages 1002c, 1202c, and 1402c of FIGS. 10c, 12c, and
14c, respectively, illustrate such an embodiment. In yet another
embodiment, the query results may be embedded into the search
result page as hits/answers, and may be placed along the side of
the other "normal" hits/answers. Example search result pages 1002d,
1202d, and 1402d of FIGS. 10d, 12d, and 14d, respectively,
illustrate such an embodiment.
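The placement options that order the answers in a single list can
be sketched as follows. The function name, the mode names, and the
strict alternation used for the interspersed case are assumptions;
the side-by-side and single-answer arrangements are matters of page
layout and are not modeled here.

```python
from itertools import zip_longest


def place_answers(embedded, normal, mode):
    """Order embedded query answers relative to the normal answers."""
    if mode == "above":
        return embedded + normal
    if mode == "below":
        return normal + embedded
    if mode == "interspersed":
        merged = []
        # Alternate normal and embedded answers until both run out.
        for n, e in zip_longest(normal, embedded):
            if n is not None:
                merged.append(n)
            if e is not None:
                merged.append(e)
        return merged
    raise ValueError("unknown placement mode: %s" % mode)


print(place_answers(["E1", "E2"], ["N1", "N2", "N3"], "interspersed"))
```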
[0107] As is shown by FIGS. 10a-10d, embodiments of the present
invention may make available deep web information associated with
vehicular rental services to users by embedding query result pages
of information locations associated with vehicular rentals as
hits/answers of a search results page, the search results page
having a plurality of answers. Each hit/answer embedding a query
result page may include an input field for a rental location,
fields for a pick-up date and time, fields for a return date and
time, and a field for a vehicle type. As shown, the times and
vehicle type may be selected from a pre-determined choice list of
options using a select box. In alternate embodiments, however, any
of the fields may be implemented as input text fields, select box
lists, radio buttons, or any other form elements known in the art.
Also, as illustrated, each hit/answer embedding a query result page
may include a form element or hyperlink (in FIGS. 10a-10d, shown as
the "Check" box) effectively indexing to the information location
having the query result page that is embedded by the search results
page. When actuated by a user, the values entered into the input
elements of the form may be transmitted to the information
location, and the information location may dynamically generate the
query results page, including the transmitted values in its fields.
Also, as is discussed above, the hits/answers embedding query
result pages may be included as a group above the other normal
hits/answers (FIG. 10a), may be interspersed among the other normal
hits/answers (FIG. 10b), may be presented as a single hit/answer
serving as a proxy for a plurality of query result pages (FIG.
10c), or may be presented alongside the other normal
hits/answers (FIG. 10d). Also, FIG. 11 illustrates an exemplary
fragment of source code of a search result page, the page
associated with a car rental service query.
[0108] As illustrated by FIGS. 12a-12d, embodiments of the present
invention may make available deep web information associated with
train reservation services to users by embedding query result pages
of information locations associated with train reservations as
hits/answers of a search results page, the search results page
having a plurality of answers. Each hit/answer embedding a query
result page may include an input text field for a departure
location, an input text field for an arrival location, and an input
text field for a travel date. In alternate embodiments, any of the
fields may be implemented as input text fields, select box lists,
radio buttons, or any other form elements known in the art. Also,
as illustrated, each hit/answer embedding a query result page may
include a form element or hyperlink (in FIGS. 12a-12d, shown as the
"Check" box) effectively indexing to the information location
having the query result page that is embedded by the search results
page. When actuated by a user, the values entered into the input
elements of the form may be transmitted to the information
location, and the information location may dynamically generate the
query results page, including the transmitted values in its fields.
Also, as is discussed above, the hits/answers embedding query
result pages may be included as a group above the other normal
hits/answers (FIG. 12a), may be interspersed among the other normal
hits/answers (FIG. 12b), may be presented as a single hit/answer
serving as a proxy for a plurality of query result pages (FIG.
12c), or may be presented alongside the other normal
hits/answers (FIG. 12d). Also, FIG. 13 illustrates an exemplary
fragment of source code of a search result page, the page
associated with a train reservation service query.
[0109] As illustrated by FIGS. 14a-14d, embodiments of the present
invention may make available deep web information associated with
flight status information services to users by embedding query
result pages of information locations associated with flight status
information as hits/answers of a search results page, the search
results page having a plurality of answers. Each hit/answer
embedding a query result page may include an input text field for a
flight number and an input text field for a flight date. In
alternate embodiments, any of the fields may be implemented as
input text fields, select box lists, radio buttons, or any other
form elements known in the art. Also, as illustrated, each
hit/answer embedding a query result page may include a form element
or hyperlink (in FIGS. 14a-14d, shown as the "Check" box)
effectively indexing to the information location having the query
result page that is embedded by the search results page. When
actuated by a user, the values entered into the input elements of
the form may be transmitted to the information location, and the
information location may dynamically generate the query results
page, including the transmitted values in its fields. Also, as is
discussed above, the hits/answers embedding query result pages may
be included as a group above the other normal hits/answers (FIG.
14a), may be interspersed among the other normal hits/answers (FIG.
14b), may be presented as a single hit/answer serving as a proxy
for a plurality of query result pages (FIG. 14c), or may be
presented alongside the other normal hits/answers (FIG. 14d).
Also, FIG. 15 illustrates an exemplary fragment of source code of a
search result page, the page associated with a flight status
service query.
[0110] In one embodiment, rather than embedding the exact query
result web page in the search answer page, an approximation of the
query result page may be embedded. For example, the exact query
result page may include separate fields for a month and a date,
while the embedded entry may include one combination field for the
month and date. In such an embodiment, a transformation program may
be applied to the search answer page's embedded submission,
splitting, for example, the submitted month-date combination. The
split submission may then be further submitted to the query result
web site. Such a transformation program may be a JavaScript program
or a program of any programming language known in the art. Such a
transformation program may be a proxy between the search engine of
the search results page and the target web site of the query result
web page. In such a case, where the transformation program is a
proxy, the HTTP request that a browser of the user displaying the
search answer page generates may be redirected through the proxy,
the proxy converting the combined form value, such as the combined
month-date, of the HTTP request to the desired format, such as a
split month and date, and sending the converted HTTP request to the
target site. Also, the same (or different) proxy may receive the
resulting web page from the target site and forward it back to the
browser.
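A transformation of this kind might be sketched as below, here in
Python rather than JavaScript. The field names "date", "month", and
"day" and the "MM/DD" format of the combined value are assumptions
made for illustration.

```python
from urllib.parse import parse_qsl, urlencode


def split_month_date(query_string, combined="date",
                     month="month", day="day"):
    """Replace one combined month/date field with two separate fields."""
    converted = []
    for name, value in parse_qsl(query_string):
        if name == combined:
            m, d = value.split("/")  # assumes an "MM/DD" combined value
            converted.extend([(month, m), (day, d)])
        else:
            converted.append((name, value))
    return urlencode(converted)


# Submission from the embedded entry, converted for the target site.
print(split_month_date("flight=123&date=05/04"))
```

A proxy applying this conversion would then send the converted
request on to the target site and forward the resulting page back
to the browser.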
[0111] Also, embodiments of the present invention may make
available deep web information associated with one or more of
limousine reservation services, taxi reservation services, bus
reservation/schedule services, both local and long distance, boat
reservation/schedule services, ship reservation/schedule services,
car rental location services, train status information services,
ferry reservation/schedule services, and cruise liner
reservation/schedule services to users by embedding query result
pages of information locations associated with the service(s) as
hits/answers of a search results page.
Alternate Embodiments
[0112] While the present invention has been described in terms of
the foregoing embodiments, those skilled in the art will recognize
that the invention is not limited to the embodiments described. The
present invention can be practiced with modification and alteration
within the spirit and scope of the appended claims.
[0113] In particular, while for ease of understanding, the search
services and the proxy server have been described as two separate
entities, in alternate embodiments, the two may be combined, or one
or more functions of one or more of search services and proxy
server may be distributed among additional implementation
entities.
[0114] Further, in alternate embodiments, the present invention may
also be suitably implemented, at least in part, in firmware or
hardware, or in some suitable combination of at least two of
software, firmware, and hardware. Such firmware or hardware
embodiments may include
programmable logic devices, such as programmable array logic (PALs)
and programmable logic arrays (PLAs), field programmable gate
arrays (FPGAs), application specific integrated circuits (ASICs),
large scale integrated circuits (LSIs), very large scale integrated
circuits (VLSIs), or the like, to form the various types of modules,
circuitry, controllers, routines and systems described and claimed
herein.
CONCLUSION AND EPILOGUE
[0115] Thus, it can be seen from the above descriptions that a novel
method, including associated apparatuses, for accessing deep web
information associated with transportation services has been
described. The description is to be regarded as illustrative
rather than restrictive of the present invention, the scope of which
is set forth in the claims to follow.
* * * * *
References