U.S. patent application number 12/200648 was published by the patent office on 2009-11-19 for multi-modal query generation.
This patent application is currently assigned to MICROSOFT CORPORATION. Invention is credited to Yun-Cheng Ju, Bongshin Lee, Christopher A. Meek, Timothy Seung Yoon Paek, Bo Thiesson.
Application Number: 20090287626 (Appl. No. 12/200648)
Family ID: 41317081
Publication Date: 2009-11-19
United States Patent Application 20090287626
Kind Code: A1
Paek; Timothy Seung Yoon; et al.
November 19, 2009
MULTI-MODAL QUERY GENERATION
Abstract
A multi-modal search system (and corresponding methodology) is
provided. The system employs text, speech, touch and gesture input
to establish a search query. Additionally, a subset of the
modalities can be used to obtain search results based upon exact or
approximate matches to a search result. For example, wildcards,
which can either be triggered by the user or inferred by the
system, can be employed in the search.
Inventors: Paek; Timothy Seung Yoon; (Sammamish, WA); Thiesson; Bo; (Woodinville, WA); Ju; Yun-Cheng; (Bellevue, WA); Lee; Bongshin; (Issaquah, WA); Meek; Christopher A.; (Kirkland, WA)
Correspondence Address: LEE & HAYES, PLLC, 601 W. RIVERSIDE AVENUE, SUITE 1400, SPOKANE, WA 99201, US
Assignee: MICROSOFT CORPORATION, Redmond, WA
Family ID: 41317081
Appl. No.: 12/200648
Filed: August 28, 2008
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
61053214 | May 14, 2008 |
Current U.S. Class: 706/46; 707/999.005; 707/E17.014
Current CPC Class: G06F 16/3322 20190101; G10L 15/26 20130101
Class at Publication: 706/46; 707/5; 707/E17.014
International Class: G06F 7/06 20060101 G06F007/06; G06F 17/30 20060101 G06F017/30; G06N 5/02 20060101 G06N005/02
Claims
1. A system that facilitates multi-modal search, comprising: a
query administration component that converts a multi-modal input
into a wildcard search query; and a search engine component that
employs the wildcard search query to retrieve a list of query
suggestion results.
2. The system of claim 1, further comprising: a query generation
component that employs a plurality of modalities to generate the
wildcard search query; and an analysis component that evaluates the
wildcard search query and renders the list of query suggestion
results as a function of the search query.
3. The system of claim 2, wherein the plurality of modalities
includes at least two of text, touch or speech.
4. The system of claim 2, wherein the query generation component
facilitates generation of the wildcard search query based upon at
least a portion of the list of query suggestion results.
5. The system of claim 1, wherein the list of query suggestion
results includes one of an n-best list or alternates list from a
speech recognizer and a list of supplementary results that includes
at least one of an `exact` match via a wildcard expression or an
`approximate` match via an information retrieval algorithm.
6. The system of claim 5, wherein a wildcard expression is
generated from at least part of the n-best list obtained from a
speech recognizer and used to retrieve items in an index or
database which match the substrings of the wildcard search
query.
7. The system of claim 5, wherein at least part of the n-best list
obtained from the speech recognizer is submitted as a query to an
information retrieval algorithm that is indifferent to the order of
words in the wildcard search query.
8. The system of claim 7, wherein the information retrieval
algorithm is a Term Frequency Inverse Document Frequency (TFIDF)
algorithm.
9. The system of claim 2, wherein the query generation component
employs user generated text to constrain speech recognition upon
generating the wildcard search query.
10. The system of claim 2, wherein the query generation component
dynamically converts a user input into a wildcard, and wherein the
analysis component employs the wildcard to retrieve a subset of the
suggested query results.
11. The system of claim 10, wherein a user conveys uncertainty, and
wherein the wildcard search query is a regular expression
query.
12. The system of claim 1, further comprising an artificial
intelligence (AI) component that employs at least one of a
probabilistic and a statistical-based analysis that infers an
action that a user desires to be automatically performed.
13. A computer-implemented method of multi-modal search,
comprising: receiving a multi-modal input from a user; establishing
a wildcard query based upon portions of the multi-modal input; and
rendering a plurality of suggested query results based upon the
wildcard query.
14. The computer-implemented method of claim 13, wherein the
multi-modal input includes at least two of text, speech, touch or
gesture input.
15. The computer-implemented method of claim 13, further
comprising: converting a portion of the multi-modal input into a
wildcard; and retrieving a subset of the query suggestion results
based upon the wildcard.
16. The computer-implemented method of claim 13, further comprising
analyzing the input as a function of an algorithm irrespective of
word order.
17. The computer-implemented method of claim 16, wherein the
algorithm is a TFIDF algorithm.
18. The computer-implemented method of claim 13, wherein the
multi-modal input includes at least a text hint coupled with a
spoken input.
19. A computer-executable system that facilitates generation of a
wildcard search query based upon a multi-modal input, comprising:
means for receiving the multi-modal input from a user, wherein the
multi-modal input includes at least two of text, speech, touch or
gesture input; means for analyzing the multi-modal input
irrespective of order or portions of the order; and means for
generating the wildcard search query based upon the analysis.
20. The computer-executable system of claim 19, further comprising
means for generating a wildcard based at least in part upon a
portion of the multi-modal input, wherein the wildcard search query
employs the wildcard to match zero or more characters.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional
Patent application Ser. No. 61/053,214 entitled "MULTI-MODALITY
SEARCH INTERFACE" and filed May 14, 2008. This application is
related to pending U.S. patent application Ser. No. _______
entitled "MULTI-MODAL QUERY REFINEMENT" filed on ______ and to
pending U.S. patent application Ser. No. ______ entitled
"MULTI-MODAL SEARCH WILDCARDS" filed on ______. The entireties of
the above-noted applications are incorporated by reference
herein.
BACKGROUND
[0002] The Internet continues to make available ever-increasing
amounts of information which can be stored in databases and
accessed therefrom. With the proliferation of mobile and portable
terminals (e.g., cellular telephones, personal data assistants
(PDAs), smartphones and other devices), users are becoming more
mobile, and hence, more reliant upon information accessible via the
Internet. Accordingly, users often search network sources such as
the Internet from their mobile device.
[0003] There are essentially two phases in an Internet search.
First, a search query is constructed that can be submitted to a
search engine. Second the search engine matches this search query
to actual search results. Conventionally, these search queries were
constructed merely of keywords that were matched to a list of
results based upon factors such as relevance, popularity,
preference, etc.
[0004] The Internet and the World Wide Web continue to evolve
rapidly with respect to both volume of information and number of
users. As a whole, the Web provides a global space for
accumulation, exchange and dissemination of information. As mobile
devices become more and more commonplace to access the Web, the
number of users continues to increase.
[0005] In some instances, a user knows the name of a site, server
or URL (uniform resource locator) to the site or server that is
desired for access. In such situations, the user can access the
site by simply typing the URL in an address bar of a browser to
connect to the site. Oftentimes, the user does not know the URL and
therefore has to `search` the Web for relevant sources and/or
URL's. To maximize likelihood of locating relevant information
amongst an abundance of data, Internet or web search engines are
regularly employed.
[0006] Traditionally, to locate a site or corresponding URL of
interest, the user can employ a search engine to facilitate
locating and accessing sites based upon alphanumeric keywords
and/or Boolean operators. In aspects, these keywords are text- or
speech-based queries, although speech is not always reliable.
Essentially, a search engine is a tool that facilitates web
navigation based upon textual (or speech-to-text) entry of a search
query usually comprising one or more keywords. Upon receipt of a
search query, the search engine retrieves a list of websites,
typically ranked based upon relevance to the query. To enable this
functionality, the search engine must generate and maintain a
supporting infrastructure.
[0007] Upon textual entry of one or more keywords as a search
query, the search engine retrieves indexed information that matches
the query from an indexed database, generates a snippet of text
associated with each of the matching sites and displays the results
to the user. The user can thereafter scroll through a plurality of
returned sites to attempt to determine if the sites are related to
the interests of the user. However, this can be an extremely
time-consuming and frustrating process as search engines can return
a substantial number of sites. More often than not, the user is
forced to narrow the search iteratively by altering and/or adding
keywords and Boolean operators to obtain the identity of websites
including relevant information, again by typing (or speaking) the
revised query.
[0008] Conventional computer-based search, in general, is extremely
text-centric (pure text or speech-to-text) in that search engines
typically analyze content of alphanumeric search queries in order
to return results. These traditional search engines merely parse
alphanumeric queries into `keywords` and subsequently perform
searches based upon a defined number of instances of each of the
keywords in a reference.
[0009] Currently, users of mobile devices, such as smartphones,
often attempt to access or `surf` the Internet using keyboards or
keypads such as, a standard numeric phone keypad, a soft or
miniature QWERTY keyboard, etc. Unfortunately, these input
mechanisms are not always efficient for entering the textual input
needed to search the Internet. As described above, conventional
mobile devices are limited to text input to establish search
queries, for example, Internet search queries. Text input can be a
very inefficient way to search, particularly for long periods of
time and/or for very long queries.
SUMMARY
[0010] The following presents a simplified summary of the
innovation in order to provide a basic understanding of some
aspects of the innovation. This summary is not an extensive
overview of the innovation. It is not intended to identify
key/critical elements of the innovation or to delineate the scope
of the innovation. Its sole purpose is to present some concepts of
the innovation in a simplified form as a prelude to the more
detailed description that is presented later.
[0011] The innovation disclosed and claimed herein, in one aspect
thereof, comprises a search system and corresponding methodologies
that can couple speech, text and touch for search interfaces and
engines. In other words, rather than being completely dependent
upon conventional textual input, the innovation can combine speech,
text, and touch to enhance usability and efficiency of search
mechanisms. Accordingly, it can be possible to locate more
meaningful and comprehensive results as a function of a search
query.
[0012] In aspects, a multi-modal search management system employs a
query administration component to analyze multi-modal input (e.g.,
text, speech, touch) and to generate appropriate search criteria.
Accordingly, comprehensive and meaningful search results can be
gathered. The features of the innovation can be incorporated into a
search engine or, alternatively, can work in conjunction with a
search engine.
[0013] In other aspects, the innovation can be incorporated or
retrofitted into existing search engines and/or interfaces. Yet
other aspects employ the features, functionalities and benefits of
the innovation in mobile search applications, which has strategic
importance given the increasing usage of mobile devices as a
primary computing device. As described above, mobile devices are
not always configured or equipped with full-function keyboards,
thus, the multi-modal functionality of the innovation can be
employed to greatly enhance comprehensiveness of search.
[0014] In yet another aspect thereof, machine learning and
reasoning is provided that employs a probabilistic and/or
statistical-based analysis to prognose or infer an action that a
user desires to be automatically performed.
[0015] To the accomplishment of the foregoing and related ends,
certain illustrative aspects of the innovation are described herein
in connection with the following description and the annexed
drawings. These aspects are indicative, however, of but a few of
the various ways in which the principles of the innovation can be
employed and the subject innovation is intended to include all such
aspects and their equivalents. Other advantages and novel features
of the innovation will become apparent from the following detailed
description of the innovation when considered in conjunction with
the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] FIG. 1 illustrates an example block diagram of a system that
establishes a query from a multi-modal input in accordance with
aspects of the innovation.
[0017] FIG. 2 illustrates an example user interface in accordance
with an aspect of the innovation.
[0018] FIG. 3 illustrates an example of a typical speech
recognition system in accordance with an aspect of the
innovation.
[0019] FIG. 4 illustrates an alternative example block diagram of a
speech recognition system in accordance with an aspect of the
innovation.
[0020] FIG. 5 illustrates an example flow chart of procedures that
facilitate generating a query from a multi-modal input in
accordance with an aspect of the innovation.
[0021] FIG. 6 illustrates an example flow chart of procedures that
facilitate analyzing a multi-modal input in accordance with an
aspect of the innovation.
[0022] FIG. 7 illustrates an example block diagram of a query
administration component in accordance with an aspect of the
innovation.
[0023] FIG. 8 illustrates an example analysis component in
accordance with an aspect of the innovation.
[0024] FIG. 9 illustrates a block diagram of a computer operable to
execute the disclosed architecture.
[0025] FIG. 10 illustrates a schematic block diagram of an
exemplary computing environment in accordance with the subject
innovation.
DETAILED DESCRIPTION
[0026] The innovation is now described with reference to the
drawings, wherein like reference numerals are used to refer to like
elements throughout. In the following description, for purposes of
explanation, numerous specific details are set forth in order to
provide a thorough understanding of the subject innovation. It may
be evident, however, that the innovation can be practiced without
these specific details. In other instances, well-known structures
and devices are shown in block diagram form in order to facilitate
describing the innovation.
[0027] As used in this application, the terms "component" and
"system" are intended to refer to a computer-related entity, either
hardware, a combination of hardware and software, software, or
software in execution. For example, a component can be, but is not
limited to being, a process running on a processor, a processor, an
object, an executable, a thread of execution, a program, and/or a
computer. By way of illustration, both an application running on a
server and the server can be a component. One or more components
can reside within a process and/or thread of execution, and a
component can be localized on one computer and/or distributed
between two or more computers.
[0028] As used herein, the terms "infer" and "inference" refer
generally to the process of reasoning about or inferring states of
the system, environment, and/or user from a set of observations as
captured via events and/or data. Inference can be employed to
identify a specific context or action, or can generate a
probability distribution over states, for example. The inference
can be probabilistic--that is, the computation of a probability
distribution over states of interest based on a consideration of
data and events. Inference can also refer to techniques employed
for composing higher-level events from a set of events and/or data.
Such inference results in the construction of new events or actions
from a set of observed events and/or stored event data, whether or
not the events are correlated in close temporal proximity, and
whether the events and data come from one or several event and data
sources.
[0029] While certain ways of displaying information to users are
shown and described with respect to certain figures as screenshots,
those skilled in the relevant art will recognize that various other
alternatives can be employed. The terms "screen," "web page," and
"page" are generally used interchangeably herein. The pages or
screens are stored and/or transmitted as display descriptions, as
graphical user interfaces, or by other methods of depicting
information on a screen (whether personal computer, PDA, mobile
telephone, or other suitable device, for example) where the layout
and information or content to be displayed on the page is stored in
memory, database, or another storage facility.
[0030] Conventional voice-enabled search applications encourage
users to "just say what you want" in order to obtain useful content
such as automated directory assistance (ADA) via a mobile device.
Unfortunately, when users only remember part of what they are
looking for, they are forced to guess, even though what they know
may be sufficient to retrieve the desired information.
Additionally, oftentimes, quality of the voice recognition is
impaired by background noise, speaker accents, speaker clarity,
quality of recognition applications or the like.
[0031] The innovation discloses systems (and corresponding
methodologies) that expand the conventional capabilities of
voice-activated search to allow users to explicitly constrain the
recognition results to match the query by supplementing the speech
with additional criteria, for example, to provide partial knowledge
in the form of text hints. In doing so, a multi-modal approach is
presented which incorporates voice with text, touch, etc. This
multi-modal functionality enables users to access desired
information more accurately.
[0032] In aspects, the innovation discloses a multi-modal search
interface that tightly couples speech, text and touch by utilizing
regular expression queries that employ `wildcards,` where parts of
the query can be input via different modalities. For instance,
modalities such as speech, text, touch and gestures can be used at
any point in the query construction process. In other aspects, the
innovation can represent uncertainty in a spoken recognized result
as wildcards in a regular expression query. In yet other aspects,
the innovation allows users to express their own uncertainty about
parts of their utterance using expressions such as "something" or
"whatchamacallit" which can then be translated into or interpreted
as wildcards.
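By way of illustration only (this sketch is not part of the claimed subject matter; the function names and the set of uncertainty words are hypothetical), translating uncertainty expressions such as "something" into regular-expression wildcards and matching them against listings might look like the following:

```python
import re

# Hypothetical uncertainty markers a speaker might utter.
UNCERTAINTY_WORDS = {"something", "whatchamacallit", "whatever"}

def to_wildcard_regex(spoken_query: str) -> str:
    """Translate uncertainty words in a spoken query into regex wildcards.

    Each uncertainty word becomes a pattern matching one or more words;
    all other tokens are matched literally.
    """
    tokens = spoken_query.lower().split()
    parts = [r"\w+(?: \w+)*" if t in UNCERTAINTY_WORDS else re.escape(t)
             for t in tokens]
    return "^" + " ".join(parts) + "$"

def match_listings(spoken_query: str, listings: list[str]) -> list[str]:
    """Return the listings whose names match the wildcard query exactly."""
    pattern = re.compile(to_wildcard_regex(spoken_query))
    return [l for l in listings if pattern.match(l.lower())]
```

For instance, the spoken query "le something spa" would match both "Le Soleil Tanning and Spa" and "Le Sol Spa" while excluding unrelated listings.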
[0033] Referring initially to the drawings, FIG. 1 illustrates an
example block diagram of a system 100 that employs a multi-modal
search management system 102 to construct meaningful search results
based upon a multi-modal input query. It is to be understood that,
as used herein, `multi-modal` can refer to most any combination of
text, voice, touch, gestures, etc. While examples described herein
are directed to a specific multi-modal example that employs text,
voice and touch only, it is to be understood that other examples
exist that employ a subset of these identified modalities. As well,
it is to be understood that other examples exist that employ
disparate modalities in combination with or separate from those
described herein. For instance, other examples can employ gesture
input, pattern recognition, among others to establish a search
query. Similarly, while the examples are directed to mobile device
implementations, it is to be understood that the features,
functions and benefits of the innovation can be applied to most any
computing experience, platform and/or device without departing from
the spirit and scope of this disclosure and claims appended
hereto.
[0034] As shown, the multi-modal search management component 102 can
include a query administration component 104 and a search engine
component 106. Essentially, these subcomponents (104, 106) enable a
user to establish a query using multiple modalities and to search
for data and other resources using the multi-modal query, for
example, a query constructed using text, voice, touch, gestures,
etc. Features, functions and benefits of the innovation will be
described in greater detail below.
[0035] Internet usage, especially via mobile devices, continues to
grow as users seek anytime, anywhere access to information. Because
users frequently search for businesses, directory assistance has
recently been the focus of conventional voice search applications
utilizing speech as the primary input modality. Unfortunately,
mobile scenarios often contain noise which degrades performance of
speech recognition functionalities. Thus, the innovation presents a
multi-modal search management system 102 which can employ user
interfaces (UIs) that not only can facilitate touch and text
whenever speech fails, but also allows users to assist the speech
recognizer via text hints.
[0036] Continuing with the ADA example from above, in generating a
search query, the innovation can also take advantage of most any
partial knowledge users may have about a business listing by
letting them express their uncertainty in a simple, intuitive
manner. In simulation experiments conducted on real voice search
data, leveraging multi-modal refinement resulted in a 28% relative
reduction in error rate. Providing text hints along with the spoken
utterance resulted in even greater relative reduction, with
dramatic gains in recovery for each additional character.
[0037] As can be appreciated, according to market research, mobile
devices are believed to be poised to rival desktop and laptop PCs
(personal computers) as a dominant Internet platform, providing
users with anytime, anywhere access to information. One common
request for information is the telephone number or address of local
businesses. Because perusing a large index of business listings can
be a cumbersome affair using existing mobile text and touch input
mechanisms, directory assistance has emerged as a focus of
voice search applications, which utilize speech as the primary
input modality. Unfortunately, mobile environments pose problems
for speech recognition, even for native speakers. First, mobile
settings often contain non-stationary noise which cannot be easily
cancelled or filtered. Second, speakers tend to adapt to
surrounding noise in acoustically unhelpful ways. Under such
adverse conditions, task completion for voice search is less than
stellar, especially in the absence of an effective correction user
interface for dealing with speech recognition errors.
[0038] In operation, the query administration component 104 can
receive multi-modal input(s), generate an appropriate query and
instruct the search engine component 106 accordingly. As will be
understood upon a review of the figures and discussions that
follow, the query administration component 104 enables one modality
to be supplemented with another thereby enhancing interpretation
and ease of use in locating meaningful search results. In one
example, speech input can be supplemented with textual hints (e.g.,
a beginning letter of a word) to enhance recognition accuracy.
Similarly, textual input can be supplemented with speech to enhance
scope of a search query. Still further, system generated and user
prompted wildcards can be used to facilitate, improve, increase or
boost functionality.
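Purely as an illustrative sketch (the function name and the example n-best list are invented; an actual recognizer integration would differ), using a typed text hint to constrain the hypotheses returned by a speech recognizer could be expressed as:

```python
def filter_nbest(nbest: list[str], hint: str) -> list[str]:
    """Keep only recognition hypotheses consistent with a typed prefix hint.

    `nbest` is a hypothetical n-best list from a speech recognizer;
    `hint` is the first few characters the user typed (e.g., the
    beginning letter of the first word).
    """
    hint = hint.lower()
    return [h for h in nbest if h.lower().startswith(hint)]

# Made-up n-best list for a noisy utterance of "Le Soleil Tanning and Spa".
nbest = ["the soleil tanning and spa",
         "le soleil tanning and spa",
         "less oil tanning spa"]
```

Here, typing the hint "le" would discard the top-ranked but incorrect hypothesis beginning with "the", leaving only candidates consistent with the user's partial knowledge.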
[0039] In view of the challenges of conventional voice search
approaches, especially mobile voice search, the multi-modal search
management system 102 can generate (or otherwise employ) a UI as
illustrated in FIG. 2. The multi-modal UI tightly couples speech
with touch and text in at least two directions; users can not only
use touch and text to clarify, supplement or generate their queries
whenever recognition of speech is not sufficiently reliable, but
they can also use speech whenever text entry becomes burdensome.
Additionally, the innovation enables leverage of this tight
coupling by transforming a typical n-best list, or a list of phrase
alternates from the recognizer, into a palette of words with which
users can compose and refine queries, e.g., as described in the
Related Application identified above.
[0040] The innovation can also take advantage of most any partial
knowledge users may have about the words, e.g., of the business
listing. For example, a user may only remember that the listing
starts with an "s" and also contains the word "avenue". Likewise,
the user may only remember "Saks something," where the word
"something" is used to express uncertainty about what words follow.
While the word `something` is used in the aforementioned example,
it is to be appreciated that most any desired word or indicator can
be used without departing from the spirit/scope of the innovation
and claims appended hereto. The innovation represents this
uncertainty as wildcards in an enhanced regular expression search
of the listings, which exploits the popularity of the listings.
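Exploiting listing popularity during wildcard retrieval could, purely as a sketch with invented popularity counts (the patent does not specify a ranking function), look like:

```python
import re

def popular_matches(query_regex: str,
                    listings_with_popularity: list[tuple[str, int]],
                    k: int = 3) -> list[str]:
    """Rank regular-expression matches by listing popularity.

    `listings_with_popularity` pairs each listing name with a
    hypothetical lookup count; more popular matches rank first.
    """
    pat = re.compile(query_regex, re.IGNORECASE)
    hits = [(name, pop) for name, pop in listings_with_popularity
            if pat.fullmatch(name)]
    return [name for name, _ in sorted(hits, key=lambda x: -x[1])][:k]
```

Under this sketch, when several listings satisfy the same wildcard expression, the one users request most often is surfaced first.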
[0041] This disclosure is focused on three phases. First, a
description of the system 100 architecture together with a contrast
against a typical architecture of conventional voice search
applications. The specification also details the backend search
infrastructure deployed for fast and efficient retrieval. Second,
the disclosure presents an example UI that highlights the
innovation's tightly coupled multi-modal generation capabilities
and support of partial knowledge with several user scenarios.
[0042] It is to be understood that the ADA examples described
herein are included to provide perspective to the features,
functions and benefits of the innovation and are not intended to
limit the scope of the disclosure and appended claims in any
manner. The following ADA example references an implementation
where users can request telephone or address information of
residential and business listings using speech recognition via a
network (e.g., Internet) equipped mobile device (e.g., smartphone,
cell phone, personal digital assistant, personal media player,
navigation system, pocket PC . . . ). As will be appreciated, with
increased use of Internet-capable mobile communication devices, ADA
is a growing industry with over 30 million U.S. callers per month.
Many voice search applications focus exclusively on telephony-based
ADA. However, more recent applications have migrated onto other
mobile devices, providing users with a rich client experience which
includes, among other services, maps and driving directions in
addition to ADA. Whether users call ADA or use a data channel to
send utterances, the speech recognition task is most always
dispatched to speech servers, due to the fact that decoding
utterances for large domains with many choices (e.g., high
perplexity domains) requires sufficient computational power, which
to date does not exist on mobile devices. However, it is to be
appreciated that the features, functions and benefits of the
innovation can be employed in connection with any data or
electronic search including, but not limited to, Internet and
intranet searching embodiments.
[0043] Returning to the ADA example, because there are currently
over 18 million listings in the U.S. Yellow Pages alone, and users
frequently may not use the exact name of the listing as found in
the directory (e.g., "Maggiano's Italian Restaurant" instead of
"Maggiano's Little Italy"), grammar-based recognition approaches
that rely on lists fail to scale properly. As such, approaches to
ADA have focused on combining speech recognition with information
retrieval techniques.
[0044] As described supra, voice search applications encourage
users to "just say what you want" in order to obtain useful mobile
content such as ADA. Unfortunately, when users only remember part
of what they are looking for, they are forced to guess, even though
what they know may be sufficient to retrieve the desired
information. In this disclosure, it is proposed to expand the
capabilities of voice search to allow users to explicitly express
their uncertainties as part of their queries, and as such, to
provide partial knowledge. Applied to ADA, the disclosure
highlights the enhanced user experience that uncertain expressions
afford and delineates how to perform language modeling and
information retrieval.
[0045] Voice search applications encourage users to "just say what
you want" in order to obtain useful mobile content such as business
listings, driving directions, movie times, etc. Because certain
types of information require recognition of a large database of
choices, voice search is often formulated as both a recognition and
information retrieval (IR) task, where a spoken utterance is first
converted into text and then used as a search query for IR. ADA
exemplifies the challenges of voice search. Not only are there
millions of possible listings (e.g., 18 million in the US alone),
but users also do not frequently know, remember, or say the exact
business names as listed in the directory. As illustrated in FIG.
2, in some cases, users think they know but are mistaken (e.g., "Le
Sol Spa" for the listing "Le Soleil Tanning and Spa"). In other
cases, they remember only part of the name with certainty (e.g.,
listing starts with "Le" and contains the word "Spa"). In these
cases, what they remember may actually be sufficient to find the
listing. Unfortunately, in current voice search applications, users
are forced to guess, and whatever partial knowledge they could have
provided is lost.
[0046] In this specification, the innovation enables expansion of
the capabilities of voice search to enable users to explicitly
express their uncertainties as part of their queries, and as such,
to allow systems to leverage most any partial knowledge contained
in those queries.
[0047] Voice search applications with a UI as shown in FIG. 2 can
offer even richer user experiences. In accordance with the example
multi-modal interface, the innovation displays not only the top
matches for uncertain expressions, but also the query itself for
users to edit, for example, in case they wanted to refine their
queries using text as set forth in the Related Application
identified above. FIG. 2 illustrates a screenshot of results for
the spoken utterance "Le S Something Spa", from the previous
example, as well as the more general expression "Le Something Spa".
Note that the system not only retrieved exact matches for the
utterances as a regular expression query, but also approximate
matches.
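The combination of exact wildcard matching with order-indifferent approximate matching can be sketched as follows (an illustrative reconstruction only; the scoring shown is a simplified TFIDF-style weighting, and the function names are hypothetical):

```python
import math
import re
from collections import Counter

def exact_matches(query_regex: str, listings: list[str]) -> list[str]:
    """Listings that match the wildcard query as a regular expression."""
    pat = re.compile(query_regex, re.IGNORECASE)
    return [l for l in listings if pat.fullmatch(l)]

def tfidf_scores(query_terms: list[str],
                 listings: list[str]) -> list[tuple[str, float]]:
    """Order-indifferent approximate match: score each listing by the
    summed TF-IDF weight of the query terms it shares, ignoring word
    order entirely."""
    docs = [l.lower().split() for l in listings]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequencies
    scored = []
    for listing, terms in zip(listings, docs):
        tf = Counter(terms)
        score = sum(tf[t] * math.log(n / df[t])
                    for t in query_terms if t in tf)
        scored.append((listing, score))
    return sorted(scored, key=lambda x: x[1], reverse=True)
```

Because the approximate score is a bag-of-words sum, "Le Something Spa" and "Spa Something Le" would retrieve the same candidates, consistent with the order-indifference recited in claims 7 and 16.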
[0048] As discussed earlier, the innovation's approach to voice
search involves recognition plus IR. For ADA recognition, n-gram
statistical language models are typically used to compress and
generalize across listings as well as their observed user
variations. In order to support n-gram recognition of uncertain
expressions, The training data can be modified. Given that not
enough occurrences of the word "something" appeared in the training
sentences for it to be accurately recognized (e.g., 88), that
number was boosted artificially by creating pseudo-listings from
the original data. For every listing which was not a single word
(e.g., "Starbucks"), the innovation adds new listings with "*" and
"i-*" replacing individual words, where i denotes the initial
letter of the word being replaced. For listings with more than two
words, because people tend to remember either the first or last
word of a listing, the innovation can focus on replacing interior
words. Furthermore, to preserve counts for priors, 4 new listings
(and 4 duplicates for single word listings) were added. For
example, for the listing "Le Soleil Tanning and Spa", "Le *", "Le
S*", "* Spa", and "T* Spa" were generated. Although this approach
of adding new listings with words replaced by "*" and "i-*" is a
heuristic, it was found that it facilitated adequate bigram
coverage. Finally, the pronunciation dictionary was modified so
that "*" could be recognized as "something".
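The pseudo-listing heuristic above can be sketched as follows. This is one plausible reading of the heuristic: the stopword set and the skipping of filler words such as "and" when choosing initials are assumptions made so the sketch reproduces the "Le Soleil Tanning and Spa" example, not details confirmed by the specification.

```python
STOPWORDS = {"and", "the", "of"}  # assumed filler words, skipped for initials

def make_pseudo_listings(listing):
    """Create 4 wildcard pseudo-listings for a multi-word listing
    (or 4 duplicates for a single-word listing, preserving priors)."""
    words = listing.split()
    if len(words) < 2:
        return [listing] * 4
    # Candidate words whose initial letters seed the "i-*" forms.
    content = [w for w in words[1:] if w.lower() not in STOPWORDS]
    interior = [w for w in words[1:-1] if w.lower() not in STOPWORDS]
    first, last = words[0], words[-1]
    nxt = content[0]                        # seeds "i-*" after the first word
    prev = interior[-1] if interior else nxt  # seeds "i-*" before the last word
    return [f"{first} *", f"{first} {nxt[0]}*",
            f"* {last}", f"{prev[0]}* {last}"]
```

For the listing "Le Soleil Tanning and Spa" this yields "Le *", "Le S*", "* Spa", and "T* Spa", matching the example given above.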
[0049] The advantage of this approach is at least two-fold. First,
because the innovation replaces words with "*" and "i-*" instead of
the word "something", it avoids conflicts with businesses that have
"something" as part of their name (only 9 in the Seattle area).
Second, by having the recognition produce wildcards, it is possible
to treat the recognized result directly as a regular expression for
search.
[0050] Turning to a discussion of information retrieval, after
obtaining a regular expression from the recognizer (e.g., "Le *
Spa"), an index and retrieval algorithm can be used that could
quickly find likely matches for the regular expression. This is
accomplished by encoding the directory listing as a k-best suffix
array. Because a k-best suffix array is sorted by both
lexicographic order and most any figure of merit, such as the
popularity of listings in the call logs, it is a convenient data
structure for finding the most likely, or in this case, the most
popular matches for a substring, especially when there could be
many matches. For example, for the query "H* D*", the k-best suffix
array would quickly bring up "Home Depot" as the top match.
Furthermore, because lookup time for finding the k most popular
matches is close to O(log N) for most practical situations with a
worst case guarantee of O(sqrt N), where N is the number of
characters in the listings, user experience did not suffer from any
additional retrieval latencies. Note that before any regular
expression was submitted as a search query, a few simple heuristics
were applied to clean it up (e.g., consecutive wildcards were
collapsed into 1 wildcard).
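The cleanup heuristic noted above, collapsing consecutive wildcards into a single wildcard, might be sketched as:

```python
import re

def clean_wildcard_query(query):
    """Collapse runs of consecutive wildcards (e.g. '* *' or '**')
    into a single '*' and normalize whitespace before submission."""
    query = re.sub(r"\*(\s*\*)+", "*", query)
    return re.sub(r"\s+", " ", query).strip()
```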
[0051] Besides regular expression queries using a k-best suffix
array, which provides popular exact matches to the listings, it is
also useful to obtain approximate matches. For this purpose, an
improved term frequency-inverse document frequency (TFIDF)
algorithm can be implemented. Because statistical language
models can produce garbled output, voice search typically utilizes
approximate search techniques, such as TFIDF, because they treat
the output as just a bag of words. This is advantageous when users
either incorrectly remember the order of words in a listing, or add
spurious words. In some ways, the two IR methods are flip sides of
each other. The strength of finding exact matches is that the
innovation can leverage most any partial knowledge users may have
about their queries (e.g., word order) as well as the popularity of
any matches. Its weakness is that it assumes users are correct
about their partial knowledge. On the other hand, this is the
strength of finding approximate matches; it is indifferent to word
order and other mistakes users often make.
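The approximate-match side can be illustrated with a minimal bag-of-words TFIDF cosine ranking. This is plain TFIDF, not the improved algorithm the specification mentions; function names are illustrative.

```python
import math
from collections import Counter

def tfidf_rank(query, listings):
    """Rank listings against a bag-of-words query by TFIDF cosine
    similarity; word order in the query is deliberately ignored."""
    docs = [l.lower().split() for l in listings]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log(n / df[w]) for w in df}

    def vec(words):
        tf = Counter(words)
        return {w: tf[w] * idf.get(w, 0.0) for w in tf}

    def cos(a, b):
        dot = sum(a[w] * b.get(w, 0.0) for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    qv = vec(query.lower().split())
    return sorted(listings, key=lambda l: -cos(qv, vec(l.lower().split())))
```

Note that a query with the words in the wrong order, such as "depot home", still ranks "home depot" first, which is exactly the indifference to word order described above.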
[0052] FIG. 3 displays an example architecture for typical voice
search applications. As illustrated, first, an utterance can be
recognized using an n-gram statistical language model (SLM) that
compresses and generalizes across training sentences. In the case
of ADA, the training sentences comprise not only the exact listings
and business categories but also alternative expressions for those
listings. Because an n-gram is based on word collocation
probabilities, the output of the speech recognizer is an n-best
list containing phrases that may or may not match any of the
training sentences. This is often acceptable if the phrases are
submitted to an information retrieval (IR) engine that utilizes
techniques which treat the phrases as just bags of words.
[0053] The IR engine (or search engine) retrieves matches from an
index, which is typically a subset of the language model training
sentences, such as the exact listings along with their categories.
In the example architecture, if an utterance is recognized with
high confidence, it is immediately sent to the IR engine to
retrieve the best matching listing. However, if an utterance is
ambiguous in any way, as indicated for example by medium to low
confidence scores, voice search applications with a graphical UI
very often display an n-best list to users for selection, at which
point users can either select a result (e.g., phrase) or retry
their utterance.
[0054] In contrast to the voice search architecture of FIG. 3, FIG.
4 illustrates an alternative example system architecture in
accordance with the innovation. It is to be understood that the
`Search Vox` component illustrated in FIG. 4 is analogous to the
multi-modal management system 102 of FIG. 1. As shown in FIG. 4,
first, high confidence results immediately go to the IR engine.
Second, users are shown the n-best list, though the interaction
dynamics are fundamentally different from those of conventional
systems. In accordance with the innovation, if subsequent
refinement is desired, e.g., as set forth in the Related
Application referenced above, users can not only select a phrase
from the n-best list, but also the individual words which make up
those phrases thereby refining search results by way of effectively
drilling into a set of search results.
[0055] The n-best list is essentially treated as a sort of word
palette or `bag of words` from which users can select those words
that the speech recognizer heard or interpreted correctly,
though they may appear in a different phrase. For example, suppose
a user says "home depot," but because of background noise, the
phrase does not occur in the n-best list. Suppose, however, that
the phrase "home office design" appears in the list. With typical
(or conventional) voice search applications, the user would have to
start over.
[0056] In accordance with the innovation, the user can simply
select the word "home" and invoke the backend which finds the most
popular listings that contain the word. For instance, the system
can measure popularity by the frequency with which a business
listing appears in the ADA call logs, for example, for Live Local
Search. In order to retrieve the most popular listings that contain
a particular word or substring, regular expressions can be
used.
[0057] Because, in aspects, much of the effectiveness of the
innovation's interface rests on its ability to retrieve listings
using a wildcard query--or a regular expression query containing
wildcards--a discussion follows that describes implementation of a
RegEx engine followed by further details about wildcard queries
constructed in the RegEx generator. Essentially, in operation, the
RegEx generator and RegEx engine facilitate an ability to employ
wildcards in establishing search queries.
[0058] It will be understood that the components of FIG. 4 can be
deployed within the higher level components of FIG. 1, e.g.,
multi-modal search management system 102, query administration
component 104 and search engine component 106. Three other
sub-components of the system architecture are discussed below: the
IR engine, the supplement generator, and the list filter (FIG.
4).
[0059] Turning first to a discussion of the RegEx engine, the index
data structure chosen to use for regular expression matching can be
based upon k-best suffix arrays. Similar to traditional suffix
arrays, k-best suffix arrays arrange all suffixes of the listings
into an array. While traditional suffix arrays arrange the suffixes
in lexicographical order only, the k-best suffix arrays of the
innovation arrange the suffixes according to two alternating
orders--a lexicographical ordering and an ordering based on a
figure of merit, such as popularity, preference, etc. The
arrangement of the array borrows from ideas seen in the
construction of KD-trees.
[0060] Because the k-best suffix array is sorted by both
lexicographic order and popularity, it is a convenient structure
for finding the most popular matches for a substring, especially
when there are many matches. In an aspect, the k most popular
matches can be found in time close to O(log N) for most practical
situations, and with a worst case guarantee of O(sqrt N), where N
is the number of characters in the listings. In contrast, a
standard suffix array enables locating most all matches to a
substring in O(log N) time, but does not impose any popularity
ordering on the matches. To find the most popular matches, the user
would have to traverse them all.
[0061] Consider a simple example which explains why this subtle
difference is important to the application. The standard suffix
array may be sufficiently fast when searching for the k-best
matches to a large substring since there will not be many matches
to traverse in this case. The situation is, however, completely
different for a short substring such as, for example, `a`. In this
case, a user would have to traverse all dictionary entries
containing an `a`, which is not much better than traversing all
suffixes in the listings--in O(N) time. With a clever
implementation, it is possible to continue a search in a k-best
suffix array from the position it was previously stopped. A simple
variation of k-best suffix matching will therefore allow look up of
the k-best (most popular) matches for an arbitrary wildcard query,
such as, for instance `f* m* ban*`. The approach proceeds as the
k-best suffix matching above for the largest substring without a
wildcard (`ban`). At each match, the innovation now evaluates the
full wildcard query against the full listing entry for the suffix
and continues the search until k valid expansions to the wildcard
query are found.
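The k-best lookup described above can be approximated, without the suffix-array index itself, by scanning a popularity-sorted list and stopping once k valid expansions are found. This sketch captures only the "continue until k valid expansions" behavior; the suffix-array anchoring on the largest wildcard-free substring is elided, and all names are illustrative.

```python
import re

def k_best_wildcard_matches(query, listings_by_popularity, k=3):
    """Return the k most popular listings matching a wildcard query
    such as 'f* m* ban*'. The real engine walks a k-best suffix
    array; this sketch scans a popularity-sorted list (most popular
    first) and stops once k valid expansions are found."""
    # Each '*' may expand to any run of characters.
    pattern = ".*".join(re.escape(part) for part in query.split("*"))
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for listing in listings_by_popularity:
        if rx.search(listing):
            hits.append(listing)
            if len(hits) == k:
                break
    return hits
```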
[0062] The k-best suffix array can also be used to exclude words in
the same way by continuing the search until expansions without the
excluded words are found. The querying process is an iterative
process, which gradually eliminates the wildcards in the text
string. Whenever the largest substring in the wildcard query does
not change between iterations, there is an opportunity to further
improve the computational efficiency of the expansion algorithm. In
this case, the k-best suffix matching can just be continued from
the point where the previous iteration ended.
[0063] With an efficient k-best suffix array matching algorithm
available for the RegEx engine, it can be deployed, for example,
directly onto a mobile device, thereby avoiding the latencies
associated with sending information back and forth along a wireless
data channel. Speech recognition for ADA already takes several
seconds to return an n-best list. It is desirable to provide short
latencies for wildcard queries--the innovation is capable of
enhancing (or shortening) the latencies.
[0064] Turning now to a discussion of the IR engine, besides
wildcard queries, which provide exact matches to the listings, it
is useful to also retrieve approximate matches to the listings. For
at least this purpose, the innovation implements an IR engine based
on an improved term frequency--inverse document frequency (TFIDF)
algorithm. What is important to note about the IR engine is that it
can treat queries and listings as bags of words. This is
advantageous when users either incorrectly remember the order of
words in a listing, or add additional words that do not actually
appear in a listing. This is not the case for the RegEx engine
where order and the presence of suffixes in the query matter.
[0065] Referring now to the RegEx generator, returning to the
example in which a user selects the word "home" for "home depot"
from a word palette, once the user invokes the backend, the word is
sent as a query to a RegEx generator which transforms it into a
wildcard query. For single phrases, the generator can simply insert
wildcards before spaces, as well as at the end of the entire query.
For example, for the query "home", the generator could produce the
regular expression "home*".
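The single-phrase rule might be sketched as:

```python
def single_phrase_to_wildcard(query):
    """Insert a wildcard before each space and append one at the
    end of the query, per the single-phrase rule above."""
    return "* ".join(query.split()) + "*"
```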
[0066] For a list of phrases, such as an n-best list from the
recognizer, the RegEx or wildcard generator uses minimal edit
distance (with equal edit operation costs) to align the phrases at
the word level. Once words are aligned, minimal edit distance is
again applied to align the characters. Whenever there is
disagreement between any aligned words or characters, a wildcard is
substituted in its place. For example, for an n-best list
containing the phrases "home depot" and "home office design," the
RegEx generator would produce "home * de*". After an initial query
is formulated, the RegEx generator applies heuristics to clean up
the regular expression (e.g., in an aspect, no word can have more
than one wildcard) before it is used to retrieve k-best matches
from the RegEx engine. The RegEx generator is invoked in this form
whenever speech is utilized, such as for leveraging partial
knowledge.
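The alignment step can be sketched for a pair of phrases. The real generator aligns a whole n-best list, and this sketch uses a common-prefix simplification of the character-level alignment; the tie-breaking of equal-cost alignments is an implementation choice, so outputs on other inputs may differ.

```python
import os

def align(a, b):
    """Minimal edit-distance alignment (equal edit-operation costs)
    of two token sequences; gaps are reported as None."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    pairs, i, j = [], n, m
    while i or j:
        if i and j and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            pairs.append((a[i - 1], b[j - 1])); i -= 1; j -= 1
        elif i and d[i][j] == d[i - 1][j] + 1:
            pairs.append((a[i - 1], None)); i -= 1
        else:
            pairs.append((None, b[j - 1])); j -= 1
    return pairs[::-1]

def wildcard_from_nbest(p1, p2):
    """Merge two n-best phrases into one wildcard query: words that
    agree are kept; a disagreement becomes the common character
    prefix plus '*' (or a bare '*' when a word aligns to a gap)."""
    out = []
    for x, y in align(p1.split(), p2.split()):
        if x == y:
            out.append(x)
        elif x is None or y is None:
            out.append("*")
        else:
            cp = os.path.commonprefix([x, y])
            out.append(cp + "*" if cp else "*")
    return " ".join(out)
```

On the phrases "home depot" and "home office design" this produces "home * de*", matching the example above.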
[0067] Turning now to the supplement generator of FIG. 4, as
discussed earlier, the innovation's interface treats a list of
phrases as a word palette. Because the word palette is most useful
when it is filled with words to choose from, whenever the
recognizer produces a short n-best list with fewer phrases than can
appear in the user interface (which for a pocket PC interface is
most often 8 items as shown in FIG. 2), or whenever a no-speech
query has been submitted (e.g., "home*" in the previous example),
it is the job of the supplement generator (FIG. 4) to retrieve
matches from the backend for the UI.
[0068] Currently, the supplement generator attempts to find exact
matches from the RegEx engine first since it will be obvious to
users why they were retrieved. Space permitting, approximate
matches are also retrieved from the IR engine. This can be
accomplished in the following manner: If any exact matches have
already been found, the supplement generator will use those exact
matches as queries to the IR engine until enough matches have been
retrieved. If there are no exact matches, the supplement generator
will use whatever query was submitted to the RegEx generator as the
query.
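The supplement generator's flow described above might be sketched as follows, with the IR engine abstracted as a callable; all names are illustrative.

```python
def fill_palette(query, exact, ir_search, capacity=8):
    """Sketch of the supplement generator: exact matches fill the
    palette first; if space remains, approximate matches come from
    the IR engine, seeded by the exact matches when any exist, else
    by the query that was submitted to the RegEx generator."""
    results = list(exact)[:capacity]
    seeds = results if results else [query]
    for seed in seeds:
        for match in ir_search(seed):
            if len(results) >= capacity:
                return results
            if match not in results:
                results.append(match)
    return results
```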
[0069] Finally, the List filter simply uses a wildcard query to
filter out an n-best list obtained from the speech recognizer. In
operation, the List filter is used primarily for text hints, which
are discussed infra.
[0070] As discussed in the previous section, the innovation can
display an n-best list to the user, making an interface (e.g., UI
of FIG. 2) appear, at least at first blush, like any other voice
search application. This aspect facilitates a default correction
mechanism users may expect of speech applications; namely, that
when their utterances fail to be correctly recognized, they may
still select from a list of choices, provided that their utterance
exists among these choices. However, because re-speaking does not
generally increase the likelihood that the utterance will be
recognized correctly, and furthermore, because mobile usage poses
distinct challenges not encountered in desktop settings, the
interface endows users with a larger arsenal of recovery
strategies--for example, text hints, word selection from a word
palette or bag of words, etc.
[0071] FIG. 5 illustrates a methodology of generating a multi-modal
query in accordance with an aspect of the innovation. While, for
purposes of simplicity of explanation, the one or more
methodologies shown herein, e.g., in the form of a flow chart, are
shown and described as a series of acts, it is to be understood and
appreciated that the subject innovation is not limited by the order
of acts, as some acts may, in accordance with the innovation, occur
in a different order and/or concurrently with other acts from that
shown and described herein. For example, those skilled in the art
will understand and appreciate that a methodology could
alternatively be represented as a series of interrelated states or
events, such as in a state diagram. Moreover, not all illustrated
acts may be required to implement a methodology in accordance with
the innovation.
[0072] At 502, a multi-modal input is received, for example, by way
of the UI of FIG. 2. In operation, multi-modal input can include
text, speech, touch, gesture, and the like. While examples are
described herein, it is to be understood that the multi-modal input
can render and employ UIs that are capable of receiving most any
protocol combination. Additionally, the inputs can be received at
different timings as appropriate.
[0073] At 504, the multi-modal input is analyzed to interpret the
input. For instance, text can be parsed, speech can be converted,
etc. An appropriate search query can be generated at 506. In other
words, as a result of the analysis, a search query can be
established to increase accuracy and meaningfulness of results. As
shown, results in accordance with the search query can be obtained
at 508.
[0074] Referring now to FIG. 6, there is illustrated a methodology
of generating a search query in accordance with the innovation. At
602, a multi-modal input is received, for example, text, speech,
touch, gesture, etc. At 604, a determination is made as to whether
the input includes text data. If so, at 606, the data can be
parsed and analyzed to determine keywords, text hints and/or
context of the text. Additionally, a determination can be made if
wildcards should be used to effect a query.
[0075] Similarly, at 608, a determination can be made as to
whether the input includes audible data. If the input includes
audible data, at 610, speech recognition (or other suitable sound
analysis) mechanisms can be used to establish keywords associated
with the audible data and subsequently the context of the keywords
in view of the other input(s) as appropriate.
[0076] Still further, at 612, a determination is made as to
whether the input contains gesture-related data. As with text and
sound described above, if gestures were used as input, an
evaluation can
be effected at 614. For instance, if the gesture was intended to
identify a specific number of words, this criterion can be
established at 614.
[0077] Once the data is analyzed (e.g., 604-614), at 616, a search
query can be generated. Here, wildcards can be used as appropriate
to establish a comprehensive search query. Additionally, as
described above, TFIDF algorithms can be applied where appropriate.
Still further, other logic and inferences can be made to establish
user intent based upon the multi-modal input thereby establishing a
comprehensive query that can be used to fetch meaningful search
results.
[0078] Turning now to FIG. 7, an example block diagram of query
administration component 104 is shown. Generally, the query
administration component 104 can include a query generation
component 702 and an analysis component 704. Together these
sub-components (702, 704) facilitate transformation of a
multi-modal input into a comprehensive search query.
[0079] The query generation component 702 employs input from the
analysis component 704 to establish an ample and comprehensive
search query that will produce results in line with intentions of
the user input. As described in connection with the aforementioned
methodologies, the innovation can evaluate the multi-modal input.
In operation, the analysis component 704 can be employed to effect
this evaluation. Logic can be employed in connection with the
analysis component 704 to effect the evaluation.
[0080] FIG. 8 illustrates an example block diagram of an analysis
component 704. As shown, the analysis component 704 can include a
text evaluation component 802, a speech evaluation component 804
and a gesture evaluation component 806, all of which are capable of
evaluating multi-modal input in efforts to establish comprehensive
search queries. While specific modality evaluation components are
shown in FIG. 8 (802, 804, 806), it is to be understood that
alternative aspects can include other evaluation components without
departing from the spirit and/or scope of the innovation.
[0081] As illustrated, a logic component 808 can be employed to
effect the evaluation and/or interpretation of the input. In
aspects, logic component 808 can include rules-based and/or
inference-based (e.g., machine learning and reasoning (MLR)) logic.
This logic essentially enables the multi-modal input to be
interpreted or construed to align with the intent of the raw input
(or portions thereof).
[0082] As stated above, the innovation can employ MLR which
facilitates automating one or more features in accordance with the
subject specification. The subject innovation (e.g., in connection
with input interpretation or query generation) can employ various
MLR-based schemes for carrying out various aspects thereof. For
example, a process for determining an intention or interpretation
based upon a speech input can be facilitated via an automatic
classifier system and process.
[0083] A classifier is a function that maps an input attribute
vector, x=(x1, x2, x3, x4, . . . , xn), to a confidence that the
input belongs to a class, that is, f(x)=confidence(class). Such
classification can employ a probabilistic and/or statistical-based
analysis (e.g., factoring into the analysis utilities and costs) to
prognose or infer an action that a user desires to be automatically
performed.
[0084] A support vector machine (SVM) is an example of a classifier
that can be employed. The SVM operates by finding a hypersurface in
the space of possible inputs that attempts to split the triggering
criteria from the non-triggering events.
Intuitively, this makes the classification correct for testing data
that is near, but not identical to training data. Other directed
and undirected model classification approaches that can be employed
include, e.g., naive Bayes, Bayesian networks, decision trees,
neural networks, fuzzy logic models, and probabilistic
classification models providing different patterns of independence.
Classification
as used herein also is inclusive of statistical regression that is
utilized to develop models of priority.
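The mapping f(x)=confidence(class) can be illustrated with a toy logistic model; the weights here are fixed for illustration rather than learned, and the specification does not prescribe this particular form.

```python
import math

def classify(x, w, b):
    """Toy classifier f(x) = confidence(class): maps an input
    attribute vector x to a confidence in (0, 1) via a logistic
    function over a weighted sum. Weights w and bias b would
    normally come from a training phase."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))
```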
[0085] As will be readily appreciated from the subject
specification, the subject innovation can employ classifiers that
are explicitly trained (e.g., via a generic training data) as well
as implicitly trained (e.g., via observing user behavior, receiving
extrinsic information). For example, SVMs are configured via a
learning or training phase within a classifier constructor and
feature selection module. Thus, the classifier(s) can be used to
automatically learn and perform a number of functions, including
but not limited to determining according to a predetermined
criteria how to interpret an input, how to establish a query,
etc.
[0086] Below, user scenarios are highlighted that demonstrate two
concepts: first, tight coupling of speech with touch and text, so
that whenever one of the three modalities fails or becomes
burdensome, users may switch to another modality in a complementary
way; second, leveraging of most any partial knowledge a user may
have about the constituent words of their intended query.
[0087] Turning to a discussion of query generation using a word
palette, in accordance with the innovation, a user can select words
by way of a touch screen, thereby establishing a search query.
Additionally, the selected words can be chosen (or otherwise
identified) for inclusion or alternatively, exclusion, from a set
of search results. In other words, a selection can be used as a
filter to screen out results that contain a particular word or set
of words. Moreover, a selection can be supplemented with speech (or
other modality) thereby enhancing the searching capability of the
innovation. While many of the examples described herein are
directed to selection of words from an n-best list, it is to be
understood that the innovation can treat most any display rendering
as a bag of words thereby enabling selection to enhance
comprehensive searching and query construction.
[0088] As stated supra, the innovation can support query generation
via multi-modal input by combining speech with text hints. Just in
the way that users can resort to touch and text when speech fails,
they can also resort to speech whenever typing becomes burdensome,
or when they feel they have provided enough text hints for the
recognizer to identify their query.
[0089] In an example, the user starts typing "m" for the intended
query "mill creek family practice," but because the query is too
long to type, the user utters the intended query after pressing a
trigger or specific functional soft key button. After the query
returns from the backend, all choices displayed in the list start
with an "m" and may indeed include the user's intended utterance.
[0090] The innovation can achieve this functionality by first
converting the text hint in the textbox into a wildcard query and
then using that to filter the n-best list as well as to retrieve
additional matches from the RegEx engine. In principle, the
innovation acknowledges that the query should be used to bias the
recognition of the utterance in the speech engine itself.
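The text-hint filtering just described might be sketched as follows; treating each hint token as a word prefix is an assumption, and the names are illustrative.

```python
import re

def filter_nbest(hint, nbest):
    """Sketch of the List filter: the text hint (e.g. 'm') becomes a
    wildcard query anchored at the start of the phrase ('m*'), and
    only matching n-best phrases are kept."""
    wildcard = "* ".join(hint.split()) + "*"
    # Assumes each '*' expands within a word; case is ignored.
    rx = re.compile("^" + wildcard.replace("*", r"\w*"), re.IGNORECASE)
    return [phrase for phrase in nbest if rx.match(phrase)]
```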
[0091] Referring now to FIG. 9, there is illustrated a block
diagram of a computer operable to execute the disclosed
architecture. In order to provide additional context for various
aspects of the subject innovation, FIG. 9 and the following
discussion are intended to provide a brief, general description of
a suitable computing environment 900 in which the various aspects
of the innovation can be implemented. While the innovation has been
described above in the general context of computer-executable
instructions that may run on one or more computers, those skilled
in the art will recognize that the innovation also can be
implemented in combination with other program modules and/or as a
combination of hardware and software.
[0092] Generally, program modules include routines, programs,
components, data structures, etc., that perform particular tasks or
implement particular abstract data types. Moreover, those skilled
in the art will appreciate that the inventive methods can be
practiced with other computer system configurations, including
single-processor or multiprocessor computer systems, minicomputers,
mainframe computers, as well as personal computers, hand-held
computing devices, microprocessor-based or programmable consumer
electronics, and the like, each of which can be operatively coupled
to one or more associated devices.
[0093] The illustrated aspects of the innovation may also be
practiced in distributed computing environments where certain tasks
are performed by remote processing devices that are linked through
a communications network. In a distributed computing environment,
program modules can be located in both local and remote memory
storage devices.
[0094] A computer typically includes a variety of computer-readable
media. Computer-readable media can be any available media that can
be accessed by the computer and includes both volatile and
nonvolatile media, removable and non-removable media. By way of
example, and not limitation, computer-readable media can comprise
computer storage media and communication media. Computer storage
media includes both volatile and nonvolatile, removable and
non-removable media implemented in any method or technology for
storage of information such as computer-readable instructions, data
structures, program modules or other data. Computer storage media
includes, but is not limited to, RAM, ROM, EEPROM, flash memory or
other memory technology, CD-ROM, digital versatile disk (DVD) or
other optical disk storage, magnetic cassettes, magnetic tape,
magnetic disk storage or other magnetic storage devices, or any
other medium which can be used to store the desired information and
which can be accessed by the computer.
[0095] Communication media typically embodies computer-readable
instructions, data structures, program modules or other data in a
modulated data signal such as a carrier wave or other transport
mechanism, and includes any information delivery media. The term
"modulated data signal" means a signal that has one or more of its
characteristics set or changed in such a manner as to encode
information in the signal. By way of example, and not limitation,
communication media includes wired media such as a wired network or
direct-wired connection, and wireless media such as acoustic, RF,
infrared and other wireless media. Combinations of any of the
above should also be included within the scope of computer-readable
media.
[0096] With reference again to FIG. 9, the exemplary environment
900 for implementing various aspects of the innovation includes a
computer 902, the computer 902 including a processing unit 904, a
system memory 906 and a system bus 908. The system bus 908 couples
system components including, but not limited to, the system memory
906 to the processing unit 904. The processing unit 904 can be any
of various commercially available processors. Dual microprocessors
and other multi-processor architectures may also be employed as the
processing unit 904.
[0097] The system bus 908 can be any of several types of bus
structure that may further interconnect to a memory bus (with or
without a memory controller), a peripheral bus, and a local bus
using any of a variety of commercially available bus architectures.
The system memory 906 includes read-only memory (ROM) 910 and
random access memory (RAM) 912. A basic input/output system (BIOS)
is stored in a non-volatile memory 910 such as ROM, EPROM, EEPROM,
which BIOS contains the basic routines that help to transfer
information between elements within the computer 902, such as
during start-up. The RAM 912 can also include a high-speed RAM such
as static RAM for caching data.
[0098] The computer 902 further includes an internal hard disk
drive (HDD) 914 (e.g., EIDE, SATA), which internal hard disk drive
914 may also be configured for external use in a suitable chassis
(not shown), a magnetic floppy disk drive (FDD) 916 (e.g., to read
from or write to a removable diskette 918) and an optical disk
drive 920 (e.g., to read a CD-ROM disk 922, or to read from or
write to other high capacity optical media such as the DVD). The
hard disk drive 914, magnetic disk drive 916 and optical disk drive
920 can be connected to the system bus 908 by a hard disk drive
interface 924, a magnetic disk drive interface 926 and an optical
drive interface 928, respectively. The interface 924 for external
drive implementations includes at least one or both of Universal
Serial Bus (USB) and IEEE 1394 interface technologies. Other
external drive connection technologies are within contemplation of
the subject innovation.
[0099] The drives and their associated computer-readable media
provide nonvolatile storage of data, data structures,
computer-executable instructions, and so forth. For the computer
902, the drives and media accommodate the storage of any data in a
suitable digital format. Although the description of
computer-readable media above refers to a HDD, a removable magnetic
diskette, and a removable optical media such as a CD or DVD, it
should be appreciated by those skilled in the art that other types
of media which are readable by a computer, such as zip drives,
magnetic cassettes, flash memory cards, cartridges, and the like,
may also be used in the exemplary operating environment, and
further, that any such media may contain computer-executable
instructions for performing the methods of the innovation.
[0100] A number of program modules can be stored in the drives and
RAM 912, including an operating system 930, one or more application
programs 932, other program modules 934 and program data 936. All
or portions of the operating system, applications, modules, and/or
data can also be cached in the RAM 912. It is appreciated that the
innovation can be implemented with various commercially available
operating systems or combinations of operating systems.
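As an illustrative sketch of program modules being loaded from a drive and cached in memory (the module name and its contents are hypothetical; Python's import machinery stands in here for a generic operating-system module loader):

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

# Write a tiny "program module" to drive-backed storage.
module_dir = Path(tempfile.mkdtemp())
source = module_dir / "app_module.py"
source.write_text("GREETING = 'hello from the drive'\n")

# Load the module from the drive; the resulting module object lives
# in RAM and is cached in sys.modules for subsequent lookups.
spec = importlib.util.spec_from_file_location("app_module", source)
module = importlib.util.module_from_spec(spec)
sys.modules["app_module"] = module
spec.loader.exec_module(module)

print(module.GREETING)               # hello from the drive
print("app_module" in sys.modules)   # True
```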
[0101] A user can enter commands and information into the computer
902 through one or more wired/wireless input devices, e.g., a
keyboard 938 and a pointing device, such as a mouse 940. Other
input devices (not shown) may include a microphone, an IR remote
control, a joystick, a game pad, a stylus pen, a touch screen, or
the like. These and other input devices are often connected to the
processing unit 904 through an input device interface 942 that is
coupled to the system bus 908, but can be connected by other
interfaces, such as a parallel port, an IEEE 1394 serial port, a
game port, a USB port, an IR interface, etc.
[0102] A monitor 944 or other type of display device is also
connected to the system bus 908 via an interface, such as a video
adapter 946. In addition to the monitor 944, a computer typically
includes other peripheral output devices (not shown), such as
speakers, printers, etc.
[0103] The computer 902 may operate in a networked environment
using logical connections via wired and/or wireless communications
to one or more remote computers, such as a remote computer(s) 948.
The remote computer(s) 948 can be a workstation, a server computer,
a router, a personal computer, a portable computer, a
microprocessor-based entertainment appliance, a peer device, or
other common network node, and typically includes many or all of
the elements described relative to the computer 902, although, for
purposes of brevity, only a memory/storage device 950 is
illustrated. The logical connections depicted include
wired/wireless connectivity to a local area network (LAN) 952
and/or larger networks, e.g., a wide area network (WAN) 954. Such
LAN and WAN networking environments are commonplace in offices and
companies, and facilitate enterprise-wide computer networks, such
as intranets, all of which may connect to a global communications
network, e.g., the Internet.
[0104] When used in a LAN networking environment, the computer 902
is connected to the local network 952 through a wired and/or
wireless communication network interface or adapter 956. The
adapter 956 may facilitate wired or wireless communication to the
LAN 952, which may also include a wireless access point disposed
thereon for communicating with the wireless adapter 956.
[0105] When used in a WAN networking environment, the computer 902
can include a modem 958, or is connected to a communications server
on the WAN 954, or has other means for establishing communications
over the WAN 954, such as by way of the Internet. The modem 958,
which can be internal or external and a wired or wireless device,
is connected to the system bus 908 via the input device interface
942. In a networked environment, program modules depicted relative
to the computer 902, or portions thereof, can be stored in the
remote memory/storage device 950. It will be appreciated that the
network connections shown are exemplary and other means of
establishing a communications link between the computers can be
used.
[0106] The computer 902 is operable to communicate with any
wireless devices or entities operatively disposed in wireless
communication, e.g., a printer, scanner, desktop and/or portable
computer, portable data assistant, communications satellite, any
piece of equipment or location associated with a wirelessly
detectable tag (e.g., a kiosk, news stand, restroom), and
telephone. This includes at least Wi-Fi and Bluetooth™ wireless
technologies. Thus, the communication can be a predefined structure
as with a conventional network or simply an ad hoc communication
between at least two devices.
[0107] Wi-Fi, or Wireless Fidelity, allows connection to the
Internet from a couch at home, a bed in a hotel room, or a
conference room at work, without wires. Wi-Fi is a wireless
technology similar to that used in a cell phone that enables such
devices, e.g., computers, to send and receive data indoors and out;
anywhere within the range of a base station. Wi-Fi networks use
radio technologies called IEEE 802.11 (a, b, g, etc.) to provide
secure, reliable, fast wireless connectivity. A Wi-Fi network can
be used to connect computers to each other, to the Internet, and to
wired networks (which use IEEE 802.3 or Ethernet). Wi-Fi networks
operate in the unlicensed 2.4 and 5 GHz radio bands, at an 11 Mbps
(802.11b) or 54 Mbps (802.11a) data rate, for example, or with
products that contain both bands (dual band), so the networks can
provide real-world performance similar to the basic 10BaseT wired
Ethernet networks used in many offices.
[0108] Referring now to FIG. 10, there is illustrated a schematic
block diagram of an exemplary computing environment 1000 in
accordance with the subject innovation. The system 1000 includes
one or more client(s) 1002. The client(s) 1002 can be hardware
and/or software (e.g., threads, processes, computing devices). The
client(s) 1002 can house cookie(s) and/or associated contextual
information by employing the innovation, for example.
[0109] The system 1000 also includes one or more server(s) 1004.
The server(s) 1004 can also be hardware and/or software (e.g.,
threads, processes, computing devices). The servers 1004 can house
threads to perform transformations by employing the innovation, for
example. One possible communication between a client 1002 and a
server 1004 can be in the form of a data packet adapted to be
transmitted between two or more computer processes. The data packet
may include a cookie and/or associated contextual information, for
example. The system 1000 includes a communication framework 1006
(e.g., a global communication network such as the Internet) that
can be employed to facilitate communications between the client(s)
1002 and the server(s) 1004.
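The client–server exchange described above, in which a data packet carrying a cookie and associated contextual information passes between two computer processes, can be sketched as follows (the cookie and context values are hypothetical; Python's standard socket module stands in for the communication framework 1006):

```python
import json
import socket
import threading

# Set up the server socket first so the client cannot race the listener.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))           # ephemeral port on the loopback interface
srv.listen(1)
port = srv.getsockname()[1]

def serve_once():
    """Server 1004: accept one connection and echo back the cookie
    found in the client's data packet."""
    conn, _ = srv.accept()
    packet = json.loads(conn.recv(4096).decode("utf-8"))
    conn.sendall(json.dumps({"echoed_cookie": packet["cookie"]}).encode("utf-8"))
    conn.close()

server = threading.Thread(target=serve_once)
server.start()

# Client 1002: a data packet with a cookie and contextual information.
packet = {"cookie": "session-42", "context": {"locale": "en-US"}}
cli = socket.create_connection(("127.0.0.1", port))
cli.sendall(json.dumps(packet).encode("utf-8"))
response = json.loads(cli.recv(4096).decode("utf-8"))
cli.close()
server.join()
srv.close()

print(response["echoed_cookie"])     # session-42
```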
[0110] Communications can be facilitated via a wired (including
optical fiber) and/or wireless technology. The client(s) 1002 are
operatively connected to one or more client data store(s) 1008 that
can be employed to store information local to the client(s) 1002
(e.g., cookie(s) and/or associated contextual information).
Similarly, the server(s) 1004 are operatively connected to one or
more server data store(s) 1010 that can be employed to store
information local to the servers 1004.
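A client data store holding information local to the client, such as a cookie and associated context, can be sketched with Python's standard shelve module standing in for the data store 1008 (the stored keys and values are hypothetical):

```python
import os
import shelve
import tempfile

# A stand-in for a client data store 1008 on local storage.
store_path = os.path.join(tempfile.mkdtemp(), "client_store")

# The client persists a cookie and associated contextual information.
with shelve.open(store_path) as store:
    store["cookie"] = {"session": "42", "locale": "en-US"}

# A later session reopens the store and finds the information intact.
with shelve.open(store_path) as store:
    cached = store["cookie"]

print(cached["session"])   # 42
```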
[0111] What has been described above includes examples of the
innovation. It is, of course, not possible to describe every
conceivable combination of components or methodologies for purposes
of describing the subject innovation, but one of ordinary skill in
the art may recognize that many further combinations and
permutations of the innovation are possible. Accordingly, the
innovation is intended to embrace all such alterations,
modifications and variations that fall within the spirit and scope
of the appended claims. Furthermore, to the extent that the term
"includes" is used in either the detailed description or the
claims, such term is intended to be inclusive in a manner similar
to the term "comprising" as "comprising" is interpreted when
employed as a transitional word in a claim.
* * * * *