U.S. patent application number 11/838943 was filed with the patent office on 2007-08-15 and published on 2007-11-29 for marking and annotating electronic documents.
Invention is credited to Mukul Madhular Joshi, Mukesh Kumar Mohania.
Application Number | 20070277093 11/838943 |
Family ID | 35944914 |
Filed Date | 2007-08-15 |
United States Patent Application | 20070277093 |
Kind Code | A1 |
Joshi; Mukul Madhular; et al. | November 29, 2007 |
MARKING AND ANNOTATING ELECTRONIC DOCUMENTS
Abstract
A user can highlight text and provide accompanying annotations.
Highlighted text, accompanying annotations, and time-stamp
information are stored in a user profile that is maintained locally
with a web browser, at the client side. A retrieved web page is
presented to a user with annotations of some form, based upon the
user profile. The retrieved web page may typically be annotated
through marked or highlighted portions of text, so that the user
can readily locate this information in the web page, and assess the
relevance of the retrieved page.
Inventors: | Joshi; Mukul Madhular (Pune, IN); Mohania; Mukesh Kumar (New Delhi, IN) |
Correspondence Address: |
Frederick W. Gibb, III; McGinn & Gibb, PLLC
Suite 304
2568-A Riva Road
Annapolis, MD 21401
US
|
Family ID: | 35944914 |
Appl. No.: | 11/838943 |
Filed: | August 15, 2007 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10924447 | Aug 24, 2004 | |
11838943 | Aug 15, 2007 | |
Current U.S. Class: | 715/230 |
Current CPC Class: | G06F 40/169 20200101 |
Class at Publication: | 715/512 |
International Class: | G06F 15/00 20060101 G06F015/00 |
Claims
1. A method for marking and annotating an electronic document, said
method comprising: at a client side, a user highlighting text, and
providing comments accompanying the highlighted text in order to
provide a user context of said highlighted text, relating to user
interests in a first electronic document; said user annotating said
text relating to said user interests, wherein said annotating
comprises using said user context of said highlighted text to
distinguish between different possible meanings of said highlighted
text; storing the annotations, the highlighted text, and the
accompanying comments at said client side as a user profile based
upon said annotations; said user receiving a second electronic
document; comparing said annotations and said highlighted text with
text given in said second electronic document; deleting said
annotations and said highlighted text from memory on said client
side upon an expiry of a predetermined amount of time; and
automatically displaying said text relating to said user interests
and that are relevant to said user context in said second
electronic document based upon said user profile.
2. The method of claim 1, further comprising determining user
interface events that indicate said annotations.
3. The method of claim 1, further comprising determining text that
relates to said annotations.
4. The method of claim 1, further comprising inserting a
computerized tag on said annotations.
5. The method of claim 4, wherein said computerized tag is viewable
to said user.
6. The method of claim 5, wherein said computerized tag is viewable
to said user upon a computer mouse rolling over said
annotations.
7. The method of claim 1, further comprising: assigning a unique
combination of colors for each annotation appearing in said first
electronic document; and corresponding said unique combination of
colors with said text in said second electronic document.
8. A computer program product for annotating an electronic document
comprising computer software recorded on a computer-readable medium
for performing a method for marking and annotating an electronic
document, said method comprising: at a client side, a user
highlighting text, and providing comments accompanying the
highlighted text in order to provide a user context of said
highlighted text, relating to user interests in a first electronic
document; said user annotating said text relating to said user
interests, wherein said annotating comprises using said user
context of said highlighted text to distinguish between different
possible meanings of said highlighted text; storing the
annotations, the highlighted text, and the accompanying comments at
said client side as a user profile based upon said annotations;
said user receiving a second electronic document; comparing said
annotations and said highlighted text with text given in said
second electronic document; deleting said annotations and said
highlighted text from memory on said client side upon an expiry of
a predetermined amount of time; and automatically displaying said
text relating to said user interests and that are relevant to said
user context in said second electronic document based upon said
user profile.
9. The computer program product of claim 8, wherein said method
further comprises determining user interface events that indicate
said annotations.
10. The computer program product of claim 8, wherein said method
further comprises determining text that relates to said
annotations.
11. The computer program product of claim 8, wherein said method
further comprises inserting a computerized tag on said
annotations.
12. The computer program product of claim 11, wherein said
computerized tag is viewable to said user.
13. The computer program product of claim 12, wherein said
computerized tag is viewable to said user upon a computer mouse
rolling over said annotations.
14. The computer program product of claim 8, wherein said method
further comprises: assigning a unique combination of colors for
each annotation appearing in said first electronic document; and
corresponding said unique combination of colors with said text in
said second electronic document.
15. The computer system of claim 8, wherein said computerized tag
is viewable to said user.
16. The computer system of claim 15, wherein said computerized tag
is viewable to said user upon a computer mouse rolling over said
annotations.
17. A computer system for annotating electronic documents
comprising computer software recorded on a computer-readable
medium, said computer system comprising: a client side computer
adapted to allow a user to highlight text, and provide comments
accompanying the highlighted text in order to provide a user
context of said highlighted text, relating to user interests in a
first electronic document; an annotator adapted to allow said user
to (i) annotate said text relating to said user interests, and (ii)
use said user context of said highlighted text to distinguish
between different possible meanings of said highlighted text; a
profile manager adapted to store the annotations, the highlighted
text, and the accompanying comments at said client side as a user
profile based upon said annotations, wherein said profile manager
is further adapted to delete said annotations and said highlighted
text from memory on said client side computer upon an expiry of a
predetermined amount of time; a second electronic document received
by said user; a pattern locator adapted to compare said annotations
and said highlighted text with text given in said second electronic
document; a text marker adapted to automatically display said text
relating to said user interests and that are relevant to said user
context in said second electronic document based upon said user
profile.
18. The computer system of claim 17, further comprising a user
interface window adapted to determine user interface events that
indicate said annotations.
19. The computer system of claim 17, further comprising a page
composer adapted to insert a computerized tag on said
annotations.
20. The computer system of claim 17, further comprising a page
composer adapted to: assign a unique combination of colors for each
annotation appearing in said first electronic document; and
correspond said unique combination of colors with said text in said
second electronic document.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. application Ser.
No. 10/924,447 filed Aug. 24, 2004, the complete disclosure of
which, in its entirety, is herein incorporated by reference.
FIELD OF THE INVENTION
[0002] The present invention relates to marking and annotating
electronic documents, such as Web pages, based on a user's
highlighted preferences history.
BACKGROUND
[0003] Web personalization involves tailoring Web content directly
to a specific user. This can be accomplished by having the user
provide information to the Web site directly, or through tracking
of the user's behavior on the site. The software on the Web site
then can modify the content to suit the particular user's needs.
That is, all the personalization is done at the Web site.
[0004] Typically, a Web site maintains profiles of the users that
visit the site, and analyzes the information gathered. Based on
this analysis, information of interest to each user is
delivered.
[0005] Explicit or implicit profiling techniques can be used to
collect user information, either alone or in combination. Explicit
profiling involves asking each user to complete a questionnaire or
similar, while implicit profiling involves tracking the behavior of
each user, and drawing inferences from such observed behavior.
[0006] One form of implicit profiling involves the use of "cookies"
that are stored at the browser and updated at each visit, and
record browsing patterns.
[0007] To present appropriate content to the user and make proper
recommendations, rule-based techniques or filtering techniques can
be used. Filtering techniques may involve simple filtering,
content-based filtering and collaborative filtering.
Collaborative-filtering software compares the information gained
about one user's behavior against data about other users with
similar interests.
[0008] None of the techniques described above are entirely
satisfactory. Consequently, techniques are sought that have
application in navigating electronic content.
SUMMARY
[0009] A user's interest in a recently viewed web page can be
determined automatically from that user's highlighted text and
annotation history profiled from the previously viewed web pages.
Such annotations typically constitute marked or highlighted
portions of text, accompanying comments, or other forms of
annotation. This annotation information is maintained in a user
profile at the client side. A retrieved web page is presented to a
user with annotations of some form, based upon the viewer's user
profile. The retrieved web page may typically be annotated through
marked or highlighted portions of text, so that the user can
readily locate this information in the web page, and assess the
relevance of the retrieved page.
[0010] A context for the highlighted information is obtained by
annotating the text, and can be presented to the user along with
the text. A web page presented to the user is marked to indicate
the information of interest. When a user rolls the mouse over this
text, the annotation is shown at the mouse position. This is the
context that applies to the text, which indicates to the user the
broad topic to which the marked text relates. An ontology, which
represents relationships between words, can be represented in any
form and can be stored as a database. A word-net can be used to
enhance this gathered information. Information concerning the
user's interest can be presented to the user without the need for
server-side processing.
[0011] Highlighted text, accompanying comments, and time-stamp
information are stored in a user profile that is maintained locally
with the browser, at the client side. The user profile is updated
as the user visits new pages and annotates these web pages. When a
user accesses a new web page, text in this page that is, for
example, similar to the text stored in the user profile, is
automatically marked. Other annotations can also be assigned. Since
the marking occurs at the client side, profiles can be shared and
used across different sites the user visits.
DESCRIPTION OF DRAWINGS
[0012] FIG. 1 is a schematic representation of a system
architecture for annotating electronic documents.
[0013] FIG. 2 is a schematic representation of components of a
preference-enabled text marker.
[0014] FIG. 3 is a schematic representation of a computer interface
having a dialog box for prompting a user to annotate text.
[0015] FIG. 4 is Javascript code that can be added to a Web page to
facilitate annotation of the highlighted text on the Web page.
[0016] FIG. 5 is a schematic representation of a computer interface
indicating information of interest to a user.
[0017] FIG. 6 is a schematic representation of a computer system
suitable for operating the described computer interfaces.
DETAILED DESCRIPTION
[0018] FIG. 1 schematically represents an architecture of the
described system. A Web Browser 100 is enabled to create and update
a User Profile 130 based on highlighted and annotated text of
previous web pages. The Web Browser 100 then marks the text in the
retrieved Web Page 150 based on the User Profile 130. The marked
Web Page 150 is presented to the user by the Web Browser 100.
[0019] The system architecture of FIG. 1 has five components that
supplement the functionality of a conventional web browser, namely,
User Manager 110, Event Listener and Text Extractor 112, Annotator 114,
Profile Manager 116 and Preference Enabled Text Marker 118. Each of these
components is described in turn below. FIG. 2 schematically
represents components of the Preference Enabled Text Marker 118,
described in further detail below.
[0020] User Manager: The main function of the User Manager 110 is
to identify the "userid" of the "logged on" user. The User Manager
110 also creates a User Profile 130, if a User Profile 130 does not
already exist. Each user has their own User Profile 130 that stores
their preferences. The User Manager 110 maintains the privacy of
each user. The security and privacy of the User Profile 130 can be
maintained using the file system user privileges provided by the
operating system. If the operating system does not provide a way to
identify the user, then the system can maintain a single User
Profile 130 that is common to all the users of that particular
instance of the Web Browser 100.
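The behavior of the User Manager described above can be sketched as follows. This is a minimal illustration only, not the patented implementation: the function names, the JSON file format, the file naming scheme, and the "shared" fallback id are all assumptions introduced for the example.

```python
import getpass
import os
from pathlib import Path


def current_userid() -> str:
    """Return the logged-on user's id, or a shared fallback id."""
    try:
        return getpass.getuser()
    except Exception:
        # Assumption: if the operating system cannot identify the user,
        # a single profile common to all users of this browser is used.
        return "shared"


def ensure_profile(base_dir: Path, userid: str) -> Path:
    """Create the user's profile file only if it does not already exist."""
    base_dir.mkdir(parents=True, exist_ok=True)
    profile = base_dir / f"{userid}.profile.json"  # hypothetical naming scheme
    if not profile.exists():
        profile.write_text("[]", encoding="utf-8")  # a new profile starts empty
        # Rely on file system privileges for privacy: owner read/write only.
        os.chmod(profile, 0o600)
    return profile
```

Calling ensure_profile a second time for the same user returns the existing file without overwriting its contents.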
[0021] Event Listener and Text Extractor: The Event Listener and
Text Extractor 112 listens to the "mouse-dragged" event. This
operating system event gets fired whenever a user highlights
particular text in a web page. When the event is fired the Event
Listener and Text Extractor 112 extracts the highlighted text from
the web page and sends the text string to the Annotator 114, which
is described directly below.
[0022] Annotator: The Annotator 114 enables the user to annotate
the user's highlighted text. This allows the user to associate the
context with the highlighted text. When the user annotates a text,
the user's annotating comments may either summarize the text or
disambiguate the text. This is what is meant by giving a context to
the text of interest to the user. For example, a particular text
about networking might be of interest to one user because the text
is about wireless networking, while for another user the text may be
interesting because the text is about security. So when the user
annotates the text with the words "network security", this is the
context in which the user is looking for further information.
Therefore, a particular text might be relevant within different
contexts, i.e., the text can have different meanings depending upon
how the text is used. Thus, annotation helps the system to
distinguish between different possible meanings of the text. The
second use of annotation is to summarize a text. For example, the
user visits page "A" and highlights the text "optimized to manage
large collections of smaller objects such as statements and reports
and checks" appearing on page "A". The system does not in this case
receive any information concerning what the highlighted text is about.
Annotator 114, however, allows the user to mark this information
with the annotation "content manager". Now the system can make
use of the annotation to find information of interest to the user
in pages that talk about "content manager". Once the user has
annotated the highlighted text, if at all, the Annotator 114
passes the highlighted text along with any accompanying annotation to
the Profile Manager 116.
[0023] Profile Manager: Profile Manager 116 receives the annotated
highlighted text from the Annotator 114. Profile Manager 116
obtains the file system location of the User Profile 130 from the
User Manager 110. Profile Manager 116 then stores this highlighted
text along with the associated annotation in the User Profile 130.
Table 1 below presents the format of the User Profile 130.
TABLE 1: Format of User Profile 130
Highlighted Text | Annotation | Timestamp of highlighting | Timestamp of Expiry |
. . . | . . . | . . . | . . . |
. . . | . . . | . . . | . . . |
[0024] As shown in Table 1 above, the User Profile 130 stores the
time when the user highlighted and annotated the relevant text. The
fourth column indicates the "life" of each entry in the User Profile
130 (that is, each highlighted text and its annotation). An "Expiry
date" can be used to avoid maintaining the history beyond a certain
time.
[0025] The user can set a system parameter that controls how much
past history is considered when marking a Web Page 150. The Profile
Manager 116 uses this parameter to compute the Timestamp of Expiry
for the User Profile 130 entry. Suppose the user sets the
parameter to indicate that the user is interested in keeping the
history for 30 days. If an entry is made in the user profile for
page A on 1st Jan at 12 p.m., 30 days is added to the time
when the entry is inserted. This is the time of expiry, namely
31st Jan at 12 p.m. Now suppose that on 10th Jan the user
changes the parameter to keep 20 days of history. The time
of expiry for page A is updated by adding 20 days to the timestamp of
highlighting. Alternatively, after highlighting and annotating the
user can be prompted to provide the time duration for which the
information is to persist in the User Profile 130. The expiry time
for the information is then calculated and stored in the User
Profile 130 along with the other information. Profile Manager 116
runs a maintenance algorithm that removes entries in the User
Profile 130 that are expired.
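The expiry arithmetic and the maintenance pass described above can be sketched as follows. This is an illustrative sketch only: the class and function names are assumptions, the four fields mirror the columns of Table 1, and the year used in the usage note is arbitrary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ProfileEntry:
    highlighted_text: str
    annotation: str
    highlighted_at: datetime  # Timestamp of highlighting
    expires_at: datetime      # Timestamp of Expiry


def make_entry(text, annotation, now, history_days):
    """Create an entry whose expiry is the insertion time plus the history window."""
    return ProfileEntry(text, annotation, now, now + timedelta(days=history_days))


def apply_history_setting(entries, history_days):
    """Recompute every expiry from its highlighting timestamp after a settings change."""
    for entry in entries:
        entry.expires_at = entry.highlighted_at + timedelta(days=history_days)


def purge_expired(entries, now):
    """Maintenance pass: keep only entries that have not yet expired."""
    return [entry for entry in entries if entry.expires_at > now]
```

With an entry created at datetime(2007, 1, 1, 12) and a 30-day window, expires_at is 31st Jan at 12 p.m.; after apply_history_setting(entries, 20) it becomes 21st Jan at 12 p.m., matching the worked example in the text.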
[0026] Preference-Enabled Text Marker: The Preference Enabled Text
Marker 118 receives web pages from the HTTP client 119 in the Web
Browser 100, which in turn retrieves web pages from the Web Server
120. The Preference Enabled Text Marker 118 presents web pages to
the user in such a way that the information is highlighted and
annotated automatically. This highlighting and annotation is based
upon the User Profile 130, which contains the history of the
highlighted text and annotations from the previously browsed pages.
FIG. 2 schematically represents different components of the
Preference Enabled Text Marker 118.
[0027] Various steps performed by the Preference Enabled Text
Marker 118 are now described with reference to the components of
the Preference Enabled Text Marker 118 depicted in FIG. 2. Let LA
be a list of all annotations (List of Annotations) in the User
Profile 130, let T_i be the list of all highlighted text
available in the user profile for annotation a_i, and let
S_i be the list of synonyms of annotation a_i. Table
2 below presents an algorithm performed by the Preference Enabled
Text Marker 118.
TABLE 2
1. A page W retrieved by the HTTP client 119 from the Web Server 120 is provided to the Profiler 216.
2. The Profiler 216 then reads the User Profile 130 of that user and retrieves LA.
3. For each annotation a_i in the list LA:
   a. Profiler 216 retrieves a list of all the corresponding highlighted text entries T_i.
   b. Profiler 216 queries an Ontology Plug-in 140 to get S_i.
   c. Profiler 216 passes S_i, T_i, a_i and W to the Match Finder 212.
   d. Match Finder 212 passes T_i and W to Pattern Locator 210.
   e. Pattern Locator 210 finds the position of each text element of T_i in W, and returns a list denoted by P_i of position pairs <b_i, e_i> providing beginning and ending positions of the sentence in the retrieved web page in which the strings were approximately matched.
   f. Match Finder 212 stores this list P_i and then passes S_i, a_i and W to the Pattern Locator 210.
   g. Pattern Locator 210 performs exact string matching in W for each of the strings in S_i and for a_i. Pattern Locator 210 returns a list denoted P_j of position pairs <b_i, e_i> of the beginning and ending positions of the sentence in the retrieved web page in which the strings were exactly matched.
   h. Match Finder 212 now merges P_i and P_j and removes duplicates, if there are any. For each entry <b_i, e_i> in this merged list, Match Finder 212 adds the annotation a_i and stores the resulting triplet <b_i, e_i, a_i> in pattern list LP.
   i. Match Finder 212 sends a signal to Profiler 216 that Match Finder 212 has updated the pattern list for annotation a_i.
4. Profiler 216 sends a signal to Match Finder 212 that all the annotations have been processed and sends it W and LA.
5. Match Finder 212 then sends W, LA and pattern list LP to the Page Composer 214.
6. Page Composer 214 performs the following steps:
   a. For each a_i in LA, Page Composer 214 assigns a unique combination of foreground and background colors.
   b. For each triplet <b_i, e_i, a_i> in LP, the Page Composer 214 obtains the starting position b_i and ending position e_i of the sentence in the Web page and then inserts Hypertext Markup Language (HTML) tags at the starting and ending position so that the text of the sentence appears in bold with the foreground and background colors corresponding to a_i. This operation performs the marking for the text matching with the user's preferences. Also, Page Composer 214 inserts a special tag so that the annotation a_i is shown as a "tip" when the user rolls the mouse over the sentence text.
7. The Page Composer 214 then presents this modified page to the user.
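The flow of the algorithm in Table 2 can be sketched in a few functions. This is a simplified illustration under several assumptions that are not in the source: the page is treated as plain text rather than HTML, sentences are split on ".", "!" and "?", approximate matching is replaced by a word-overlap ratio, exact matching ignores case, and the small color palette cycles (so it is not truly unique beyond three annotations). All function names are hypothetical.

```python
import html
import re

# Placeholder (foreground, background) pairs; a real palette would need
# one distinct pair per annotation.
PALETTE = [("#000000", "#ffff66"), ("#ffffff", "#336699"), ("#000000", "#99ff99")]


def sentence_spans(page):
    """Return (begin, end) offsets of each sentence in a plain-text page."""
    spans = []
    for m in re.finditer(r"[^.!?]+[.!?]?", page):
        text = m.group()
        if text.strip():
            lead = len(text) - len(text.lstrip())
            spans.append((m.start() + lead, m.end()))
    return spans


def approx_match(needle, sentence, threshold=0.6):
    """Stand-in for approximate matching: share of needle words found in the sentence."""
    words = re.findall(r"\w+", needle.lower())
    have = set(re.findall(r"\w+", sentence.lower()))
    return bool(words) and sum(w in have for w in words) / len(words) >= threshold


def find_patterns(page, profile, synonyms):
    """Build the pattern list LP of (begin, end, annotation) triplets (step 3)."""
    lp = []
    spans = sentence_spans(page)
    for annotation, texts in profile.items():
        terms = [annotation, *synonyms.get(annotation, [])]
        matched = set()  # a set merges both match lists and drops duplicates (3h)
        for b, e in spans:
            sentence = page[b:e]
            if any(approx_match(t, sentence) for t in texts):            # step 3e
                matched.add((b, e))
            if any(term.lower() in sentence.lower() for term in terms):  # step 3g
                matched.add((b, e))
        lp.extend((b, e, annotation) for b, e in sorted(matched))
    return lp


def compose(page, lp):
    """Wrap each matched sentence in bold, colored markup with a tooltip (step 6)."""
    colors, out, pos = {}, [], 0
    for b, e, annotation in sorted(lp):
        if b < pos:  # sentence already marked for another annotation
            continue
        fg, bg = colors.setdefault(annotation, PALETTE[len(colors) % len(PALETTE)])
        out.append(html.escape(page[pos:b]))
        out.append(
            f'<b style="color:{fg};background:{bg}" '
            f'title="{html.escape(annotation)}">{html.escape(page[b:e])}</b>'
        )
        pos = e
    out.append(html.escape(page[pos:]))
    return "".join(out)
```

Here profile maps each annotation to its list of highlighted texts (T_i) and synonyms maps each annotation to its synonym list (S_i). The title attribute is one way to obtain the mouse-over "tip"; the source does not say which tag is used.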
[0028] The Pattern Locator 210 used by the Preference Enabled Text
Marker 118 uses a module to perform approximate string matching in
step 3e of Table 2, using any suitable approximate string matching
algorithm. A suitable algorithm is described in Cole, R.,
Hariharan, R., "Approximate String Matching: A simpler faster
algorithm", SIAM Journal on Computing, Volume 31, Number 6, pages
1761-1782, 2002, the content of which is hereby incorporated by
reference.
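For readers unfamiliar with approximate string matching, the following sketch shows the textbook dynamic-programming formulation of the problem (find where a pattern occurs in a text with at most k character edits). It is not the Cole and Hariharan algorithm cited above, which is considerably faster; the function name and return convention are assumptions made for illustration.

```python
def approx_find(pattern, text, k):
    """Return 1-based end offsets in text where pattern matches with at most k edits."""
    m = len(pattern)
    # prev[i] = fewest edits to match pattern[:i] against a substring ending
    # at the previous text position; before any text is read this is i.
    prev = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        cur = [0]  # a match may start anywhere, so the empty prefix costs nothing
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitute
                           prev[i] + 1,         # extra character in the text
                           cur[i - 1] + 1))     # pattern character missing
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends
```

With k set to 0 the function reduces to exact matching, so the same routine could in principle serve both step 3e and step 3g of Table 2.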
[0029] FIG. 3 schematically represents a typical user experience
while using the system. A user interface window 310 displays text
340. When a user highlights a portion of text 350, Event Listener
and Text Extractor 112 is activated and extracts the highlighted
text, which is passed to the Annotator 114. Annotator 114 then
prompts the user to provide an accompanying comment 330 for the
highlighted text 350 using a dialog box 320.
Web Browser Implementations
[0030] A web browser having the functionality described herein can
be constructed by adding appropriate components to a conventional
browser. The browser needs to read the User Profile 130, which is
created by the user. The user appropriately creates the User
Profile 130 in the right (system) directory structure with the
right schema so that the browser can read the User Profile 130, and
take appropriate action in marking and annotating documents
automatically.
[0031] Alternatively, an implementation can be achieved without
adding components to a Web browser, but by achieving equivalent
functionality using code embedded in the actual Web pages. FIG. 4
presents Javascript code, which is interpreted by compatible
browsers, and which can be used for this purpose.
[0032] The web page is downloaded, and the User Manager 110 is
invoked to identify the user and the appropriate user profile
location. Javascript code can be added to the web page to provide
the simulation for the Event Listener and Text Extractor 112,
Annotator 114 and Profile Manager 116. The "Preference Enabled Text
Marker" algorithm described above is then applied to the page, and
the page is presented to the user through a web browser.
[0033] A maintenance algorithm, which removes entries in the User
Profile 130, is activated by the Profile Manager 116 and runs as a
daemon in the background. To understand the working of the
simulator, assume that the user "xyz" starts using the simulated
system for the first time. The User Manager 110 identifies the user
and creates the User Profile 130. Initially, the User Profile 130
is empty. If the user wants to browse the page www.abc.com the
browser downloads the relevant page. The Javascript code of FIG. 4
is added to the downloaded page, either by including appropriate
Javascript in the downloaded web page, or by using a suitable
browser plugin.
[0034] Since the user is using the system for the first time, the
code for the "Preference Enabled Text Marker" presents the page to
the user without alteration. When the user highlights and annotates
information in this presented page, these annotations are stored in
the User Profile 130. When the user sends a request to the
simulator to browse another page, the same steps as mentioned above
are carried out for this requested page. When the page is passed to
Preference Enabled Text Marker 118, this page is passed to its
various components and the simulator presents the final composed
page to the user.
Annotations to Browsed Documents
[0035] FIG. 5 represents a page in which annotations are made based
upon a user profile. In this example, when the user moves the
mouse over the first line of text, an entry "Data Warehouse" is
displayed, because the highlighted line was annotated with "Data Warehouse"
in FIG. 3, as recorded in the User Profile 130. In FIG. 3, the user
annotates this same text, which is associated with this annotation
in the User Profile 130. When a new page is fetched, the User
Profile 130 is automatically applied on the fetched page and the
text is automatically highlighted and annotated.
Computer Hardware
[0036] FIG. 6 is a schematic representation of a computer system
600 of a type that is suitable for executing computer software for
annotating electronic documents in the manner described herein.
Computer software executes under a suitable operating system
installed on the computer system 600, and may be thought of as
comprising various software code means for achieving particular
steps.
[0037] The components of the computer system 600 include a computer
620, a keyboard 610 and mouse 615, and a video display 690. The
computer 620 includes a processor 640, a memory 650, input/output
(I/O) interfaces 660, 665, a video interface 645, and a storage
device 655.
[0038] The processor 640 is a central processing unit (CPU) that
executes the operating system and the computer software executing
under the operating system. The memory 650 includes random access
memory (RAM) and read-only memory (ROM), and is used under
direction of the processor 640.
[0039] The video interface 645 is connected to video display 690
and provides video signals for display on the video display 690.
User input to operate the computer 620 is provided from the
keyboard 610 and mouse 615. The storage device 655 can include a
disk drive or any other suitable storage medium.
[0040] Each of the components of the computer 620 is connected to
an internal bus 630 that includes data, address, and control buses,
to allow components of the computer 620 to communicate with each
other via the bus 630.
[0041] The computer system 600 can be connected to one or more
other similar computers via an input/output (I/O) interface 665
using a communication channel 685 to a network, represented as the
Internet 680.
[0042] The computer software may be recorded on a portable storage
medium, in which case, the computer software program is accessed by
the computer system 600 from the storage device 655. Alternatively,
the computer software can be accessed directly from the Internet
680 by the computer 620. In either case, a user can interact with
the computer system 600 using the keyboard 610 and mouse 615 to
operate the programmed computer software executing on the computer
620.
[0043] Other configurations or types of computer systems can be
equally well used to execute computer software that assists in
implementing the techniques described herein.
[0044] Various alterations and modifications can be made to the
techniques and arrangements described herein, as would be apparent
to one skilled in the relevant art.
* * * * *