U.S. patent application number 13/114780, for a method and apparatus for correlating multiple cookies as having originated from the same device using device fingerprinting, was filed with the patent office on 2011-05-24 and published on 2011-11-24.
Invention is credited to Peter H. HORADAN, Matthew R. Shanahan, Mark B. Upson.
Application Number | 13/114780 |
Publication Number | 20110288940 |
Family ID | 44971731 |
Filed Date | 2011-05-24 |
United States Patent Application | 20110288940 |
Kind Code | A1 |
HORADAN; Peter H.; et al. | November 24, 2011 |
Method and Apparatus for Correlating Multiple Cookies as Having
Originated from the Same Device Using Device Fingerprinting
Abstract
Information that is useful to distinguish between two or more
computer devices (a "device fingerprint") is collected and stored
in a database with corresponding state-management tokens such as
HTTP cookies. The database is searched for a fingerprint, and if
the fingerprint is found, the corresponding stored token is
delivered to a computer device for the device's use in making
subsequent requests for resources or services.
Inventors: |
HORADAN; Peter H.; (Redmond, WA) ;
Shanahan; Matthew R.; (Seattle, WA) ;
Upson; Mark B.; (Mercer Island, WA) |
Family ID: | 44971731 |
Appl. No.: | 13/114780 |
Filed: | May 24, 2011 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
61347734 | May 24, 2010 | |
Current U.S. Class: | 705/14.69 ; 707/769; 707/E17.108; 709/203; 709/224 |
Current CPC Class: | G06Q 30/0273 20130101; F16K 27/029 20130101; F16K 31/0634 20130101 |
Class at Publication: | 705/14.69 ; 709/224; 709/203; 707/769; 707/E17.108 |
International Class: | G06Q 30/00 20060101 G06Q030/00; G06F 17/30 20060101 G06F017/30; G06F 15/16 20060101 G06F015/16 |
Claims
1. A method for recognizing repeat visitors to a website among a
plurality of visitors to the website, comprising: if a visitor to
the website fails to present a cookie, issuing a new cookie and
collecting device fingerprint information about the visitor's
computer; storing the new cookie and the device fingerprint
information in a database; tracking activities of each tracked
visitor of the plurality of visitors by the tracked visitor's
cookie; and computing a number of unique visitors to the website by
reducing a count of tracked visitors to the website by a number of
cookie-clearing visitors having different cookies but similar
device fingerprints.
2. The method of claim 1, further comprising: offering advertising
impressions on the website to an advertiser at a price computed
based on the number of unique visitors to the website.
3. The method of claim 1, further comprising: soliciting a
discounted advertising rate based on the number of cookie-clearing
visitors.
4. The method of claim 1 wherein collecting device fingerprint
information comprises: transmitting an executable program to the
visitor's computer; and receiving data collected by the executable
program about the visitor's computer.
5. The method of claim 1 wherein reducing the count of tracked
visitors comprises: selecting records from the database having
different cookies and similar device fingerprints; and reducing the
count of tracked visitors by a count of the selected records having
distinct device fingerprints.
6. The method of claim 1, further comprising: analyzing the
database to estimate a time at which each cookie-clearing visitor
lost its cookie; and computing an average lifetime of a cookie
based on the estimated times.
7. The method of claim 1, further comprising: identifying unique
visitors who experienced at least one cookie-clearing event; and
filtering the tracked activities of the tracked visitors to remove
tracked activities of unique visitors who were not identified as
having experienced at least one cookie-clearing event.
8. A method comprising: transmitting an executable program to a web
browser at a client computer, the executable program to cause the
web browser to collect information about the client computer;
receiving identifying information about the client computer that
was collected by the executable program; correlating the
identifying information about the client computer with
previously-collected identifying information about a plurality of
computers; and associating a first browser activity sequence linked
with a first persistent activity token with a second browser
activity sequence linked with a second, different persistent
activity token.
9. The method of claim 8, further comprising: before the
transmitting operation, receiving a request from the web browser at
the client computer to retrieve a resource, the request lacking a
persistent activity token; and after the receiving operation,
transmitting a message to cause the web browser to associate the
second, different persistent activity token with a subsequent
request from the web browser.
10. The method of claim 8 wherein the executable program comprises
at least one of a JavaScript program, a Java program or a Flash
program.
11. The method of claim 8 wherein the identifying information
comprises at least one of an operating system version, a browser
software version, a browser plugin list, or a font list.
12. The method of claim 8 wherein the associating operation
produces a plurality of tentative associations, the method further
comprising: collecting distinguishing information about a plurality
of browser activities associated with the second, different
persistent activity token; comparing the distinguishing information
with the first browser activity sequence; and selecting one of the
plurality of tentative associations based on similarity between the
distinguishing information and the first browser activity
sequence.
13. A system comprising: a web server to receive requests from
clients and deliver requested digital content to the clients; a
database to record information about the requests and the clients;
and client correlation means to collect distinguishing information
from the clients and assign unique identifiers to the clients.
14. The system of claim 13 wherein the client correlation means is
to cause a client to collect and transmit information about the
client to the web server.
15. The system of claim 13 wherein the client correlation means is
to transmit an executable program to the client, said executable
program to cause the client to report one of an operating system of
the client, a browser software version of the client, a list of
browser plugins of the client or a list of display fonts of the
client.
16. The system of claim 13, further comprising: an analysis server
to report a synthetic history of client activities, wherein the
synthetic history of at least one client is constructed by
combining a first history of requests associated with a first
unique identifier and a second history of requests associated with
a second, different unique identifier.
17. A computer-readable medium containing instructions to cause a
programmable processor to perform operations comprising: receiving
a device fingerprint from a client computer; locating a similar
device fingerprint in a database; extracting a persistent token
corresponding to the similar device fingerprint in the database;
and transmitting a message to cause the client computer to adopt
the persistent token for a future sequence of requests for digital
resources.
18. The computer-readable medium of claim 17 wherein the device
fingerprint comprises information about the client computer.
19. The computer-readable medium of claim 17 wherein the device
fingerprint comprises information about an Internet Protocol ("IP")
address of the client computer.
20. The computer-readable medium of claim 17 wherein the device
fingerprint comprises an approximate geographic location of the
client computer.
21. The computer-readable medium of claim 17 containing additional
instructions to cause the programmable processor to perform
operations comprising: receiving a request from the client
computer; preparing a response to the client computer based on the
request and on historical data in the database keyed to the
persistent token; and transmitting the response to the client
computer.
22. The computer-readable medium of claim 17 containing additional
instructions to cause the programmable processor to perform
operations comprising: comparing a pair of device fingerprints to
estimate a likelihood that the device fingerprints were received
from the same client computer.
23. The computer-readable medium of claim 17 containing additional
instructions to cause the programmable processor to perform
operations comprising: searching the database of device
fingerprints to locate a device fingerprint that is most similar to
the device fingerprint received from the client computer.
Description
CLAIM OF PRIORITY
[0001] This application claims the benefit of U.S. provisional
patent application No. 61/347,734, filed 24 May 2010.
FIELD
[0002] The invention relates to user tracking in online services.
More specifically, the invention relates to techniques for
improving the accuracy of cookie-based tracking schemes.
BACKGROUND
[0003] Those who deliver products or services (or, more generally,
information) over the Internet have a strong interest--financially
and otherwise--in tracking and analyzing visitors, visits, page
views, browsing histories and other characteristics of their
customers. For example, a publisher may provide a content site and
wish to analyze the reach and frequency of advertising delivered to
individual visitors. To do this they must have a reliable and
long-lasting way to recognize repeat visitors. Providers of digital
products or services primarily use HTTP cookies as a tracking
mechanism to determine whether the current visitor is the same
visitor that was seen before, or is a new visitor. (HTTP cookies
are described in detail in Internet Engineering Task Force ("IETF")
Request for Comments ("RFC") document RFC2965, published October
2000.) A publisher's web infrastructure may build up a significant
amount of interesting information about a visitor over the course
of his many page views. This information is tracked and correlated
to that visitor by the means of a cookie issued to the visitor's
device. A publisher gets great value from the information it is
able to collect about visitors--for example, in estimating user
counts, or in selling advertisements to a targeted market, and so
on--and thus there is considerable value in being able to build a
lasting record of a visitor.
[0004] Unfortunately, cookies are easily and often deleted. When
this happens, all of the collected information about a visitor may
be lost. After cookie deletion, a new cookie will be issued to that
visitor on his next visit, and the process of collecting
information starts again. The system no longer has any way to know
that the current visitor is the same as the previous visitor,
because the original cookie was deleted. Any analysis system
relying on cookies may mistakenly believe that there are two
different visitors (one from before the cookie deletion, and a new
visitor after the cookie deletion)--when in fact these are the same
visitor. This causes errors in analysis--for example in this case
an analytics system would report two unique users, when in fact
there was only one. Significantly increased accuracy of analysis
would be achieved if the system were able to "stitch together"
those two cookies and understand that they both represent the same
visitor.
SUMMARY
[0005] Embodiments of the invention correlate multiple unique
cookies as having originated from the same device using device
fingerprinting. The general method is to collect and record
characteristics of a device (the "device fingerprint") that, taken
together, may uniquely identify the device, or at least narrow the
set of possible devices from which the fingerprint could have come.
The fingerprint is recorded when a cookie is issued or referenced, and
is compared later, when cookies are analyzed, to determine whether
those cookies were originally issued to the same device. By
"stitching together" unique cookies that were
issued to the same device, and conflating them to a single virtual
cookie, analytics and other systems that make use of cookies to
identify users or devices can produce much more accurate
results.
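As an illustrative, non-limiting sketch, the conflation of several cookies into a single virtual cookie can be modeled as a union-find (disjoint-set) structure. The names `CookieStitcher` and `stitch_by_fingerprint` are hypothetical, and the exact-match grouping on fingerprint is a simplifying assumption; an embodiment may instead compare fingerprints for similarity.

```python
from collections import defaultdict


class CookieStitcher:
    """Union-find over cookie IDs; each set stands for one virtual cookie."""

    def __init__(self):
        self._parent = {}

    def add(self, cookie):
        # Register a cookie as its own singleton set if not seen before.
        self._parent.setdefault(cookie, cookie)

    def find(self, cookie):
        # Walk to the set representative, then compress the path.
        self.add(cookie)
        root = cookie
        while self._parent[root] != root:
            root = self._parent[root]
        while self._parent[cookie] != root:
            self._parent[cookie], cookie = root, self._parent[cookie]
        return root

    def stitch(self, cookie_a, cookie_b):
        # Record that two cookies were judged to come from the same device.
        root_a, root_b = self.find(cookie_a), self.find(cookie_b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def unique_visitors(self):
        # One virtual cookie per distinct set representative.
        return len({self.find(c) for c in self._parent})


def stitch_by_fingerprint(records):
    """records: iterable of (cookie, fingerprint) pairs (exact match only)."""
    stitcher = CookieStitcher()
    by_fingerprint = defaultdict(list)
    for cookie, fingerprint in records:
        stitcher.add(cookie)
        by_fingerprint[fingerprint].append(cookie)
    for cookies in by_fingerprint.values():
        for other in cookies[1:]:
            stitcher.stitch(cookies[0], other)
    return stitcher
```

In this sketch, three tracked cookies of which two share a fingerprint yield a unique-visitor count of two rather than three.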
BRIEF DESCRIPTION OF DRAWINGS
[0006] Embodiments of the invention are illustrated by way of
example and not by way of limitation in the figures of the
accompanying drawings in which like references indicate similar
elements. It should be noted that references to "an" or "one"
embodiment in this disclosure are not necessarily to the same
embodiment, and such references mean "at least one."
[0007] FIG. 1 shows a distributed computing environment where an
embodiment of the invention may be deployed.
[0008] FIG. 2 shows how cookies may be used in a traditional
sequence of Hypertext Transfer Protocol ("HTTP") requests.
[0009] FIG. 3 shows how an embodiment of the invention collects and
applies device fingerprint data.
[0010] FIG. 4 shows how an embodiment of the invention operates
when a previously-seen client issues a request without an HTTP
cookie.
[0011] FIG. 5 shows another distributed computing environment where
multiple servers cooperate to preserve information for correlating
client identities.
DETAILED DESCRIPTION
[0012] Embodiments of the present invention are believed to be
superior to prior-art cookie-based content-tracking systems for
several reasons, including:
[0013] Not requiring any specialized software to be deployed to
users of the content.
[0014] Not relying on a file that can be deleted by the visitor.
[0015] Not requiring product or service providers to switch to a new
form of tracking (providers can use all of their current
cookie-based tools with a slight modification to call the cookie
stitching algorithm at certain times).
[0016] Not requiring product or service providers to change their
data storage schema.
[0017] Capable of detecting when a previous cookie was
deleted--which is itself interesting analytical information.
[0018] Capable of correlating the previous cookie(s) to a newly
generated cookie.
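By way of a hedged example, once cookie deletions can be detected, an average cookie lifetime (as recited in claim 6) might be estimated as sketched here. The `CookieSpan` record and the midpoint rule for estimating when a cookie was lost are assumptions made for illustration; the disclosure does not prescribe a particular estimator.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CookieSpan:
    cookie: str
    first_seen: float  # seconds since some epoch
    last_seen: float


def estimated_lifetimes(spans):
    """spans: all cookies stitched to ONE device, in any order."""
    ordered = sorted(spans, key=lambda s: s.first_seen)
    lifetimes = []
    for old, new in zip(ordered, ordered[1:]):
        # Assumption: the old cookie was lost midway between its last use
        # and the first appearance of its replacement.
        lost_at = (old.last_seen + new.first_seen) / 2
        lifetimes.append(lost_at - old.first_seen)
    return lifetimes


def average_cookie_lifetime(devices):
    """devices: iterable of span lists, one list per stitched device."""
    all_lifetimes = [t for spans in devices for t in estimated_lifetimes(spans)]
    return mean(all_lifetimes) if all_lifetimes else None
```

The cookie currently held by a device contributes no lifetime in this sketch, because its deletion has not yet been observed.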
[0019] One embodiment of our invention involves the analysis of
users of Internet-based web publication using HTTP cookies. FIG. 1
shows some of the entities and interactions involved in such
publication: a user 100 operates a web browser 110 executing on a
computer 120. The user directs the browser to retrieve some desired
information from a web server 130 executing at a remote computer
140. Communication between computers 120 and 140 may occur over a
distributed data network 150 such as the Internet. As described
below, according to an embodiment of the invention, web server 130
may send an executable program 190 to operate within web browser
110; program 190 may send additional information to web server 130,
or otherwise interact with it.
[0020] Web server 130 also interacts with an analysis server 160
which (in the environment depicted here) is executing on another
computer 170. A database 180 is provided for storing information
used by analysis server 160 to perform its role in the operations
detailed below.
[0021] FIG. 2 is a flow chart outlining some important aspects of
the communication between browser 110 and web server 130 in an
ordinary Hypertext Transfer Protocol ("HTTP") interaction. The
interaction is quite a bit more complicated than this flow chart
suggests, but the details are well known in the art, and are
clearly explained in various IETF RFCs, including RFC2616 and
RFC2965.
[0022] At 210, a browser sends a request to a web server to cause
the server to provide information. If this is the first request
from the browser to the server, the request will not include a
cookie. The server receives the request and prepares an appropriate
response (220). If the request does not include a cookie (230),
then a "Set-Cookie" header will be added to the response (250). The
response is sent back to the browser (260) and presented to the
user (270). The user may cause the browser to make another request
for information (280). This request (and subsequent requests) will
include the cookie, so the server will skip step 250 in preparing
subsequent responses.
[0023] However, on occasion, the browser's cookie may be cleared or
deleted (290), so a subsequent request is again made without a
cookie (210), and the server will prepare a response (220)
including a new Set-Cookie header (250).
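The cookie handling of FIG. 2 can be sketched roughly as follows, using Python's standard `http.cookies` module. The cookie name `visitor_id` and the function `handle_request` are hypothetical, and the sketch omits the rest of response preparation.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def handle_request(request_headers):
    """Return (visitor_id, response_headers) for one HTTP request."""
    jar = SimpleCookie(request_headers.get("Cookie", ""))
    response_headers = []
    if COOKIE_NAME in jar:
        # Returning browser: reuse its cookie and skip step 250.
        visitor_id = jar[COOKIE_NAME].value
    else:
        # First visit, or the cookie was cleared (290): issue a new one (250).
        visitor_id = uuid.uuid4().hex
        fresh = SimpleCookie()
        fresh[COOKIE_NAME] = visitor_id
        fresh[COOKIE_NAME]["path"] = "/"
        response_headers.append(
            ("Set-Cookie", fresh[COOKIE_NAME].OutputString())
        )
    return visitor_id, response_headers
```

A request that arrives after a cookie-clearing event takes the same branch as a first visit, which is why the server alone cannot tell the two cases apart.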
[0024] A web server operating according to known, prior-art HTTP
state-maintenance protocols (e.g., cookies) may be unable to
distinguish between two series of requests from two different
browsers that have never visited the server before, and a single
series of requests from one browser, where the single series of
requests is interrupted by a cookie-clearing event.
[0025] FIG. 3 shows a similar browser-web server interaction
according to an embodiment of the invention. As in FIG. 2, the
browser's first request lacks a cookie (310). The web server
replies with the requested material. The response includes a
"Set-Cookie" header and code to cause the browser to collect
information about its computer, environment and user ("device
fingerprint data") (320). This code may be JavaScript, Flash,
Silverlight, or any other appropriate browser-based coding
technology. The browser displays the response (330), sets the
cookie (333) and runs the code to collect device fingerprint data
(336). The device fingerprint data may include information such as
the operating system type and version, screen size, system default
colors, available fonts and browser plug-ins, and other similar
data.
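As one possible server-side treatment of the collected data, the reported attributes may be normalized and hashed so that identical fingerprints can be looked up quickly. The function name `fingerprint_digest` and the attribute keys are assumptions for illustration only. An exact digest of this kind cannot recognize a fingerprint that has changed slightly.

```python
import hashlib
import json


def fingerprint_digest(attributes):
    """Hash a dict of reported device attributes into a stable hex digest."""
    normalized = {}
    for key, value in attributes.items():
        # Lists such as fonts or plug-ins are order-insensitive, so sort them.
        if isinstance(value, (list, tuple, set)):
            value = sorted(str(v) for v in value)
        normalized[key] = value
    # Sorted keys and fixed separators give one canonical serialization.
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```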
[0026] Next, the browser transmits the cookie and device
information to the web server (340). The web server forwards the
fingerprint data, and other information about the HTTP request
(such as the HTTP headers and source IP address) to an analysis
server (350). The analysis server stores the information for future
use (360) and may reply to the web server that the browser has not
been seen previously (370).
[0027] Subsequent requests from the browser proceed as usual: the
browser sends its assigned cookie, and the web server can correlate
these requests with previous requests, collecting information of
interest to the publisher about the resources the browser
references. The web server may continue to transmit
fingerprint-data collection code, and changes to the device's
fingerprint can be detected and monitored. For example, the user
may install a new display device, so the device fingerprint might
show a different screen resolution or color depth. All this
information can be kept exclusively within the web server, or
shared with the analysis server.
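Detecting a change to a known device's fingerprint may be as simple as comparing the stored attributes with the newly reported ones. The following is a minimal sketch; the function name and attribute keys are hypothetical.

```python
def fingerprint_changes(old, new):
    """Return {attribute: (old_value, new_value)} for attributes that differ."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            # An attribute missing on one side is reported as None.
            changes[key] = (before, after)
    return changes
```

For the display-device example above, only the screen resolution and color depth entries would be reported as changed, while the operating system and other attributes would not appear in the result.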
[0028] FIG. 4 shows what happens according to an embodiment of the
invention if the browser loses its cookie: steps 310 through 350
are identical to those described in reference to FIG. 3 (only
steps 310 and 350 are shown in FIG. 4; for 320, 330 and 340, refer
to the preceding Figure). However, after the fingerprint, HTTP
header and other data are sent to the analysis server, the
earlier-saved record is located and the "old" cookie is recovered
(410). The analysis server responds to the web server that this is
an existing client (420) (i.e., that it has been seen previously,
and has already had a cookie assigned). The web server may issue a
new response to change the client's cookie to the old value (430),
or may simply note internally that the two different cookies are
associated with only one client (440). In either scenario, the web
server and analysis server continue to collect and analyze data,
and the analysis can include both the information that the cookie
was lost or cleared, and that the two otherwise apparently
unrelated request sequences were issued by a single client.
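The analysis server's role in FIGS. 3 and 4 can be sketched with an in-memory stand-in for database 180. The class `AnalysisServer`, its `report` method, and the use of an exact-match fingerprint key are illustrative assumptions, not a description of any particular implementation.

```python
class AnalysisServer:
    """In-memory stand-in for analysis server 160 and database 180."""

    def __init__(self):
        self._cookie_by_fingerprint = {}
        self.aliases = {}  # new cookie -> original cookie

    def report(self, cookie, fingerprint):
        """Return (seen_before, cookie_to_use) for a reported request."""
        original = self._cookie_by_fingerprint.get(fingerprint)
        if original is None:
            # Steps 360/370: store the record; the browser is new.
            self._cookie_by_fingerprint[fingerprint] = cookie
            return False, cookie
        if original != cookie:
            # Steps 410/420/440: recover the old cookie and note the link.
            self.aliases[cookie] = original
        return True, original
```

The web server may use the returned cookie either to reset the client's cookie to the old value (430) or merely to record the association (440).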
[0029] The information available to the analysis server in the
device fingerprint and HTTP headers may include some or all of the
following data:
[0030] Client IP address
[0031] Network connection details (e.g., proxy server)
[0032] Browser software type and version
[0033] Browser plug-in software modules & versions
[0034] Computer operating system type and version
[0035] Time zone
[0036] Display pixel dimensions and color depth
[0037] Fonts available
Future developments in browser-side code execution
environments may make additional information available to the web
server, and this additional information may be useful to an
embodiment of the invention. For example, the JavaScript language
might be extended so that JavaScript programs can access a list of
Universal Serial Bus ("USB") peripherals that are presently
attached to the client computer, or even a list of all USB
peripherals that have ever been attached to the client computer.
This list may provide an additional basis for distinguishing
between two different computers, so it may be useful information
for an embodiment of the invention to collect and incorporate into
a device fingerprint. Generally speaking, any information that can
be collected under the direction and/or control of the web server,
and that is useful in distinguishing between two or more different
computers, can be employed by an embodiment of the invention as
part of the device fingerprint.
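Because individual attributes may change over time, fingerprints may be compared for similarity rather than equality (compare claims 22 and 23). The sketch below uses a weighted average of per-attribute scores; the attribute keys, the weights, and the 0.8 threshold are hypothetical values chosen only for illustration.

```python
# Hypothetical weights; a real deployment would tune these on observed data.
WEIGHTS = {
    "ip_address": 1.0,
    "browser": 1.0,
    "plugins": 2.0,
    "os": 1.0,
    "time_zone": 0.5,
    "display": 1.0,
    "fonts": 2.0,
}


def attribute_similarity(a, b):
    """1.0 for equal scalars; Jaccard overlap for list-like values."""
    if a is None or b is None:
        return 0.0
    if isinstance(a, (list, set, tuple)) and isinstance(b, (list, set, tuple)):
        set_a, set_b = set(a), set(b)
        if not set_a and not set_b:
            return 1.0
        return len(set_a & set_b) / len(set_a | set_b)
    return 1.0 if a == b else 0.0


def fingerprint_similarity(fp_a, fp_b, weights=WEIGHTS):
    """Weighted average of per-attribute similarities, in [0, 1]."""
    total = sum(weights.values())
    score = sum(
        w * attribute_similarity(fp_a.get(k), fp_b.get(k))
        for k, w in weights.items()
    )
    return score / total


def most_similar(candidate, stored, threshold=0.8):
    """Return the key of the best stored match at or above threshold, else None."""
    best_key, best_score = None, threshold
    for key, fingerprint in stored.items():
        score = fingerprint_similarity(candidate, fingerprint)
        if score >= best_score:
            best_key, best_score = key, score
    return best_key
```

With these weights, a device whose only change is a new display still scores above the threshold against its earlier record.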
[0038] It is appreciated that the dynamic pattern of interaction
between a browser and a web server also yields identifying
information, and, to the extent that this information is captured
and stored, it may be available to the analysis server to assist in
identifying a client without a cookie, or with a newly-assigned
cookie, as a previously-seen client whose old cookie was associated
with an earlier browsing history. Thus, in some embodiments, the
determination that a client with a newly-initialized browsing
history is actually the same as an earlier client may be emergent,
developing over a series of browsing interactions. Roughly
speaking, the bare fingerprint data may suggest that a "new" client
is the same as an earlier client, but continued browsing activity
may provide an increased level of confidence in the identification.
Alternatively, continued browsing may show that the browser
tentatively identified as the same as an earlier client based on
the fingerprint data, is in fact more likely to be a new client
after all.
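One way to model such an emergent determination, offered only as a sketch, is to treat the fingerprint match as a prior probability and to update it with each browsing observation using likelihood ratios. The function `update_confidence` is hypothetical, and the multiplication of ratios assumes the observations are independent of one another.

```python
def update_confidence(prior, likelihood_ratios):
    """Combine a prior match probability with per-observation likelihood ratios.

    Each ratio is P(observation | same client) / P(observation | different
    client); values above 1 support the match, values below 1 undermine it.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        # Bayes' rule in odds form: posterior odds = prior odds * ratio.
        odds *= ratio
    return odds / (1.0 + odds)
```

Observations consistent with the earlier browsing history raise the confidence, while inconsistent ones can lower it below the level suggested by the bare fingerprint.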
[0039] Embodiments of the invention may incorporate a number of
variants to accomplish the goal of correlating multiple series of
web interactions:
[0040] The cookie is not an HTTP cookie, but is a Flash Local Shared
Object, or an HTML 5 locally stored object, or any other technology
for storing data in a browser.
[0041] The cookie is not issued by the web server, but is instead
issued by an outside analytics system. The cookie that is put back
in place is the cookie originally issued by the outside analytics
system.
[0042] Flexible, selectable and/or plug-in module-based methods for
doing device fingerprint comparison.
[0043] The client device described in the foregoing as a "web
browser" is not a traditional web browser, and the interaction does
not happen over the Internet, but instead it is a series of
loosely-correlated interactions between a client and a server on any
distributed system that attempts to track users.
[0044] Other alternate embodiments include:
[0045] A system to correlate cookies in an advertising-delivery
network.
[0046] A system where cookies can be correlated among different
providers (publishers) (see discussion of FIG. 5 below).
[0047] A system where cookies can be correlated across data sets.
[0048] A system to correlate cookies for a Software As A Service
("SaaS") provider.
[0049] A system where cookies can be correlated across digital
products and services.
[0050] A system where the original cookie is provided in real-time
rather than issuing a new cookie, followed by a cookie replacement.
[0051] A system to create an alternate persistent ID based on a
collection of cookies.
[0052] FIG. 5 shows an environment similar to FIG. 1, but here, two
data publishers cooperate to stitch cookies together into a
unified, reliable client identity. The web browser on client
computer 120 displays information in an on-screen window 110. An
enlarged sample window is shown at 530. The window contains a main
text document 533 and a graphic image 536. The main document was
retrieved from a remote server 140, while the graphic was retrieved
from a different remote server, 550. Servers 140 and 550 exchange
information 560. According to an embodiment of the invention, even
if client computer 120 loses or deletes a cookie that server 140 had
been using to identify the computer, it may not have lost or deleted
the cookie that identifies computer 120 to server 550. Thus,
information reported over inter-server channel
560 can be used by server 140 to stitch pre-cookie-loss history for
client 120 together with post-cookie-loss history. In this case,
the main text document 533 from server 140 causes client 120 to
report information about itself to server 550 when it requests
image 536. Server 550 then reports identical or related information
to server 140. In some embodiments, server 550 may be part of an
advertisement delivery network, where ads are delivered for
inclusion with web-page resources from a variety of publishers. The
ad delivery network functions (in part) as a repository of
information about clients, and can provide information to the
publishers that allows the publishers to identify return visitors
who--due to cookie loss or deletion--appear to be unrelated to
prior visitors.
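The cooperation of FIG. 5 can be sketched as follows, assuming that the information the client reports to server 550 includes the cookie issued by server 140. The class `PartnerServer`, its methods, and the shape of the report sent over channel 560 are hypothetical.

```python
from collections import defaultdict


class PartnerServer:
    """Stand-in for server 550, which keeps its own cookie for each client."""

    def __init__(self):
        self._publisher_cookies = defaultdict(list)

    def image_request(self, partner_cookie, publisher_cookie):
        # The page from server 140 causes the client to report the
        # publisher's cookie when it requests image 536.
        seen = self._publisher_cookies[partner_cookie]
        if publisher_cookie not in seen:
            seen.append(publisher_cookie)

    def report(self):
        # Channel 560: groups of publisher cookies seen under one partner cookie.
        return [
            list(group)
            for group in self._publisher_cookies.values()
            if len(group) > 1
        ]


def stitch_from_report(groups):
    """Map every later publisher cookie to the earliest one in its group."""
    alias = {}
    for group in groups:
        for cookie in group[1:]:
            alias[cookie] = group[0]
    return alias
```

Here the cookie surviving at server 550 serves as the link between the publisher's pre-loss and post-loss cookies, so no fingerprint comparison is needed for clients that retained it.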
[0053] An embodiment of the invention may be a machine-readable
medium having stored thereon data and instructions to cause a
programmable processor to perform operations as described above. In
other embodiments, the operations might be performed by specific
hardware components that contain hardwired logic. Those operations
might alternatively be performed by any combination of programmed
computer components and custom hardware components.
[0054] Instructions for a programmable processor may be stored in a
form that is directly executable by the processor ("object" or
"executable" form), or the instructions may be stored in a
human-readable text form called "source code" that can be
automatically processed by a development tool commonly known as a
"compiler" to produce executable code. Instructions may also be
specified as a difference or "delta" from a predetermined version
of a basic source code. The delta (also called a "patch") can be
used to prepare instructions to implement an embodiment of the
invention, starting with a commonly-available source code package
that does not contain an embodiment.
[0055] In some embodiments, the instructions for a programmable
processor may be treated as data and used to modulate a carrier
signal, which can subsequently be sent to a remote receiver, where
the signal is demodulated to recover the instructions, and the
instructions are executed to implement the methods of an embodiment
at the remote receiver. In the vernacular, such modulation and
transmission are known as "serving" the instructions, while
receiving and demodulating are often called "downloading." In other
words, one embodiment "serves" (i.e., encodes and sends) the
instructions of an embodiment to a client, often over a distributed
data network like the Internet. The instructions thus transmitted
can be saved on a hard disk or other data storage device at the
receiver to create another embodiment of the invention, meeting the
description of a machine-readable medium storing data and
instructions to perform some of the operations discussed above.
Compiling (if necessary) and executing such an embodiment at the
receiver may result in the receiver performing operations according
to a third embodiment.
[0056] In the preceding description, numerous details were set
forth. It will be apparent, however, to one skilled in the art,
that the present invention may be practiced without some of these
specific details. In some instances, well-known structures and
devices are shown in block diagram form, rather than in detail, in
order to avoid obscuring the present invention.
[0057] Some portions of the detailed descriptions may have been
presented in terms of algorithms and symbolic representations of
operations on data bits within a computer memory. These algorithmic
descriptions and representations are the means used by those
skilled in the data processing arts to most effectively convey the
substance of their work to others skilled in the art. An algorithm
is here, and generally, conceived to be a self-consistent sequence
of steps leading to a desired result. The steps are those requiring
physical manipulations of physical quantities. Usually, though not
necessarily, these quantities take the form of electrical or
magnetic signals capable of being stored, transferred, combined,
compared, and otherwise manipulated. It has proven convenient at
times, principally for reasons of common usage, to refer to these
signals as bits, values, elements, symbols, characters, terms,
numbers, or the like.
[0058] It should be borne in mind, however, that all of these and
similar terms are to be associated with the appropriate physical
quantities and are merely convenient labels applied to these
quantities. Unless specifically stated otherwise as apparent from
the preceding discussion, it is appreciated that throughout the
description, discussions utilizing terms such as "processing" or
"computing" or "calculating" or "determining" or "displaying" or
the like, refer to the action and processes of a computer system or
similar electronic computing device, that manipulates and
transforms data represented as physical (electronic) quantities
within the computer system's registers and memories into other data
similarly represented as physical quantities within the computer
system memories or registers or other such information storage,
transmission or display devices.
[0059] The present invention also relates to apparatus for
performing the operations herein. This apparatus may be specially
constructed for the required purposes, or it may comprise a general
purpose computer selectively activated or reconfigured by a
computer program stored in the computer. Such a computer program
may be stored in a computer readable storage medium, including
without limitation any type of disk including floppy disks, optical
disks, compact disc read-only memory ("CD-ROM"), and
magnetic-optical disks, read-only memories (ROMs), random access
memories (RAMs), erasable, programmable read-only memories
("EPROMs"), electrically-erasable programmable read-only memories ("EEPROMs"),
magnetic or optical cards, or any type of media suitable for
storing computer instructions.
[0060] The algorithms and displays presented herein are not
inherently related to any particular computer or other apparatus.
Various general purpose systems may be used with programs in
accordance with the teachings herein, or it may prove convenient to
construct more specialized apparatus to perform the required method
steps. The required structure for a variety of these systems will
be recited in the claims below. In addition, the present invention
is not described with reference to any particular programming
language. It will be appreciated that a variety of programming
languages may be used to implement the teachings of the invention
as described herein.
[0061] The applications of the present invention have been
described largely by reference to specific examples and in terms of
particular allocations of functionality to certain hardware and/or
software components. However, those of skill in the art will
recognize that client correlation based on device fingerprints can
also be produced by software and hardware that distribute the
functions of embodiments of this invention differently than herein
described. Such variations and implementations are understood to be
captured according to the following claims.
* * * * *