U.S. patent application number 16/470234 was published on 2021-06-10 for map matching and trajectory analysis.
The applicant listed for this patent is Dataspark Pte Ltd. Invention is credited to The Anh DANG, Ying LI, Shixin LUO.
Application Number | 20210172759 16/470234 |
Family ID | 1000005460534 |
Publication Date | 2021-06-10 |
United States Patent Application | 20210172759 |
Kind Code | A1 |
LI; Ying ; et al. | June 10, 2021 |
Map Matching and Trajectory Analysis
Abstract
A trajectory may be derived from noisy location data by mapping
candidate locations for a user, then finding a match between
successive locations. Location data may come from various sources,
including telecommunications networks. Telecommunications networks
may give location data based on observations of users in a network,
and such data may have many inaccuracies. The observations may be
mapped to physical constraints, such as roads, pathways, train
lines, and the like, and physical rules such as speed analysis may
be applied to smooth the data and identify outlier data points.
A trajectory may be resampled or interpolated to generate a
detailed set of trajectory points from a sparse and otherwise
ambiguous dataset.
Inventors: | LI; Ying; (Singapore, SG) ; DANG; The Anh; (Singapore, SG) ; LUO; Shixin; (Singapore, SG) |
Applicant: |
Name | City | State | Country | Type |
Dataspark Pte Ltd | Singapore | | SG | |
Family ID: | 1000005460534 |
Appl. No.: | 16/470234 |
Filed: | September 27, 2017 |
PCT Filed: | September 27, 2017 |
PCT NO: | PCT/SG2017/050484 |
371 Date: | June 16, 2019 |
Current U.S. Class: | 1/1 |
Current CPC Class: | G01C 21/343 20130101; G01C 21/3815 20200801; H04W 4/029 20180201 |
International Class: | G01C 21/00 20060101 G01C021/00; H04W 4/029 20060101 H04W004/029; G01C 21/34 20060101 G01C021/34 |
Foreign Application Data
Date | Code | Application Number |
Feb 17, 2017 | IB | PCT/IB2017/050891 |
Feb 17, 2017 | IB | PCT/IB2017/050892 |
Sep 27, 2017 | SG | PCT/SG2017/050484 |
Sep 27, 2017 | SG | PCT/SG2017/050485 |
Jan 5, 2018 | SG | PCT/SG2018/050006 |
Feb 14, 2018 | SG | PCT/SG2018/050068 |
Feb 14, 2018 | SG | PCT/SG2018/050070 |
Claims
1. A method performed on at least one computer processor, said
method comprising: receiving a trajectory segment comprising a
sequence of locations for a device, said locations comprising a
timestamp and a coordinate location, said locations further
comprising an accuracy; for each of said sequence of locations,
mapping a plurality of physical locations within said coordinate
location and said accuracy; determining a transportation route
comprising a physical location for each of said sequence of
locations.
2. The method of claim 1, said mapping comprising determining a
plurality of geophysical locations within said accuracy of said
coordinate location, said geophysical locations being defined in a
transportation graph.
3. The method of claim 2, said transportation graph comprising a
road system.
4. The method of claim 2, said transportation graph comprising a
railway system.
5. The method of claim 2, said transportation graph comprising a
bus system.
6. The method of claim 2, said transportation graph comprising a
ferry system.
7. The method of claim 2, said transportation graph comprising a
plurality of nodes and edges, at least one of said nodes being an
intersection.
8. The method of claim 1 further comprising: identifying a
trajectory comprising a start point and end point, said trajectory
being comprised of said sequence of locations; determining said
trajectory segment from said trajectory, said trajectory segment
being a portion of said trajectory being traveled on a first
transportation mode.
9. The method of claim 8, said transportation route comprising a
sequence of passageways within said transportation graph.
10. The method of claim 9 further comprising: determining a first
speed between a first location at a first time and a second
location at a second time; and interpolating at least one
intermediate location at an intermediate time between said first
location at said first time and said second location at said second
time.
11. A system comprising: at least one processor; a map matching
analyzer operating on said at least one processor and configured to
perform a method comprising: receiving a trajectory segment
comprising a sequence of locations for a device, said locations
comprising a timestamp and a coordinate location, said locations
further comprising an accuracy; for each of said sequence of
locations, mapping a plurality of physical locations within said
coordinate location and said accuracy; determining a transportation
route comprising a physical location for each of said sequence of
locations.
12. The system of claim 11, said mapping comprising determining a
plurality of geophysical locations within said accuracy of said
coordinate location, said geophysical locations being defined in a
transportation graph.
13. The system of claim 12, said transportation graph comprising a
road system.
14. The system of claim 12, said transportation graph comprising a
railway system.
15. The system of claim 12, said transportation graph comprising a
bus system.
16. The system of claim 12, said transportation graph comprising a
ferry system.
17. The system of claim 12, said transportation graph comprising a
plurality of nodes and edges, at least one of said nodes being an
intersection.
18. The system of claim 11, said method further comprising:
identifying a trajectory comprising a start point and end point,
said trajectory being comprised of said sequence of locations;
determining said trajectory segment from said trajectory, said
trajectory segment being a portion of said trajectory being
traveled on a first transportation mode.
19. The system of claim 18, said transportation route comprising a
sequence of passageways within said transportation graph.
20. The system of claim 19, said method further comprising:
determining a first speed between a first location at a first time
and a second location at a second time; and interpolating at least
one intermediate location at an intermediate time between said
first location at said first time and said second location at said
second time.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims benefit of and priority to
PCT/IB2017/050891 filed 17 Feb. 2017 by DataSpark, PTE, LTD
entitled "Mobility Gene for Trajectory Data," PCT/IB2017/050892
filed 17 Feb. 2017 by DataSpark, PTE, LTD entitled "Mobility Gene
for Visit Data," PCT/SG2017/050485 filed 27 Sep. 2017 by DataSpark,
PTE, LTD entitled "Trajectory Analysis With Mode Of Transport
Analysis," PCT/SG2017/050484 filed 27 Sep. 2017 by DataSpark,
PTE, LTD entitled "Map Matching and Trajectory Analysis,"
PCT/SG2018/050006 filed 5 Jan. 2018 by DataSpark, PTE, LTD entitled
"Trajectory Analysis Through Fusion of Multiple Data Sources,"
PCT/SG2018/050068 filed 14 Feb. 2018 entitled "Stay And Trajectory
Identification From Historical Analysis of Communications Network
Observations," and PCT/SG2018/050070 filed 14 Feb. 2018 by DataSpark,
PTE, LTD entitled "Real Time Trajectory Identification From
Communications Network Observations," the entire contents of which
are hereby expressly incorporated by reference for all they teach
and disclose.
BACKGROUND
[0002] Mobility data is being gathered on a tremendous scale. Every
cellular telephone connection to every mobile device generates some
data about a user's location. These observations are being
generated at an astonishing rate, but the sheer volume of the
observations makes the data difficult to analyze.
[0003] Mobility data can be generated by merely observing a
location for a device connected to a wireless network. The wireless
network may be a cellular network, but also may be any other
network from which a device may be observed. For example, a WiFi
router or Bluetooth device may passively observe nearby devices,
and may note each device's electronic identification or other
signatures. In many cases, a device may establish a
communications session with various network access points, which
may indicate the device's location.
[0004] Many interesting uses come from analyzing mobility data. As
merely one example, traffic congestion may be observed from
aggregating mobility observations from cellular telephones.
[0005] As more and more uses for mobility data are developed, the
complexities of analyzing and managing these large data sets are
exploding. One issue is that the sources of the data, such as the
telecommunications companies, may have obligations of privacy and
anonymity, but there may be a large number of consumers of the
data. The consumers may be a wide range of companies which may use
the data in countless ways.
SUMMARY
[0006] A trajectory may be derived from noisy location data by
mapping candidate locations for a user, then finding a match
between successive locations. Location data may come from various
sources, including telecommunications networks. Telecommunications
networks may give location data based on observations of users in a
network, and such data may have many inaccuracies. The observations
may be mapped to physical constraints, such as roads, pathways,
train lines, and the like, and physical rules such as speed
analysis may be applied to smooth the data and identify outlier data
points. A trajectory may be resampled or interpolated to generate a
detailed set of trajectory points from a sparse and otherwise
ambiguous dataset.
[0007] Mobility observations may be analyzed to create so-called
mobility genes, which may be intermediate data forms from which
various analyses may be performed. The mobility genes may include a
trajectory gene, which may describe a trajectory through which a
user may have travelled. The trajectory gene may be analyzed from
raw location observations and processed into a form that may be
more easily managed. The trajectory genes may be made available to
third parties for analysis, and may represent a large number of
location observations that may have been condensed, smoothed, and
anonymized. By analyzing only trajectories, a third party may
forego having to analyze huge numbers of individual observations,
and may have valuable data from which to make decisions.
[0008] A visit mobility gene may be generated from analyzing raw
location observations and may be made available for further
analysis. The visit mobility gene may include summarized statistics
about a certain location or location type, and in some cases may
include ingress and egress travel information for visitors. The
visit mobility gene may be made available to third parties for
further analysis, and may represent a concise, rich, and
standardized dataset that may be generated from several sources of
mobility data.
[0009] This Summary is provided to introduce a selection of
concepts in a simplified form that are further described below in
the Detailed Description. This Summary is not intended to identify
key features or essential features of the claimed subject matter,
nor is it intended to be used to limit the scope of the claimed
subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] In the drawings,
[0011] FIG. 1 is a diagram illustration of an example embodiment
showing an ecosystem with mobility genes.
[0012] FIG. 2 is a diagram illustration of an embodiment showing a
network environment with systems for generating mobility genes.
[0013] FIG. 3 is a flowchart illustration of an embodiment showing
a method for collecting data by a telecommunications network.
[0014] FIG. 4 is a flowchart illustration of an embodiment showing
a method for requesting and responding to a customized mobility
gene order.
[0015] FIG. 5 is a flowchart illustration of an embodiment showing
a method for generating and responding to a standardized mobility
gene order.
[0016] FIG. 6 is a flowchart illustration of an embodiment showing
a method for generating a trajectory mobility gene.
[0017] FIG. 7 is a flowchart illustration of an embodiment showing
a method for preparing trajectory mobility genes for
transmittal.
[0018] FIG. 8 is a flowchart illustration of an embodiment showing
a method for processing trajectories into visit mobility genes.
[0019] FIG. 9 is a flowchart illustration of an embodiment showing
a method for processing raw location observations into visit
mobility genes.
[0020] FIG. 10 is a diagram illustration of an embodiment showing
steps to create a path associated with a trajectory.
[0021] FIG. 11 is a diagram illustration of an embodiment showing a
path generated from a set of locations.
[0022] FIG. 12 is a diagram illustration of an embodiment showing a
network environment with a system that calculates a physical path
from a trajectory.
[0023] FIG. 13 is a flowchart illustration of an embodiment showing
a method for generating a transportation graph.
[0024] FIG. 14 is a diagram illustration of an example method for
creating candidate locations and determining an optimized path
through the candidates.
DETAILED DESCRIPTION
[0025] Trajectory Analysis from Sparse Location Data
[0026] A detailed trajectory may be derived from a sparse and noisy
set of sequential location points. For each location point in time,
a set of candidate physical locations may be generated from a map
of the physical area, then the candidate physical locations may be
connected to form a trajectory or path for a user.
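The two steps described above (generating candidates, then connecting them) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the method of this specification: coordinates are planar meters, the map is reduced to a flat list of node coordinates, the function names `candidates_for` and `best_path` are hypothetical, and the connection step uses a simple dynamic program that minimizes total straight-line distance between successive candidates.

```python
import math

def candidates_for(point, accuracy_m, graph_nodes):
    """Return the map nodes that lie within the accuracy radius of an observation."""
    px, py = point
    return [n for n in graph_nodes
            if math.hypot(n[0] - px, n[1] - py) <= accuracy_m]

def best_path(candidate_sets):
    """Choose one candidate per observation so the total distance is minimal.

    Illustrative dynamic program over successive candidate sets. Observations
    with no candidates are skipped (an assumption of this sketch).
    """
    layers = [s for s in candidate_sets if s]
    if not layers:
        return []
    # best[c] = (shortest total distance of a chain ending at c, that chain)
    best = {c: (0.0, [c]) for c in layers[0]}
    for layer in layers[1:]:
        nxt = {}
        for c in layer:
            dist, chain = min(
                ((d + math.dist(prev, c), p) for prev, (d, p) in best.items()),
                key=lambda item: item[0],
            )
            nxt[c] = (dist, chain + [c])
        best = nxt
    return min(best.values(), key=lambda item: item[0])[1]
```

A fuller implementation would likely measure distance along the edges of a transportation graph and weigh each candidate by how close it is to the observed coordinate. The sketch only shows the overall shape of the computation.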
[0027] A location dataset may include a location and timestamp for
a specific device or user. In many cases, the set of location data
points may be noisy. For example, location data supplied from a
telecommunications or cellular network may have a high degree of
inaccuracy. One such example may be Location Based Service (LBS)
location data.
[0028] Some telecommunications networks may provide a location data
point as merely the location of the cellular tower to which a user
may be connected, even though the user may be located a large
distance from the tower. Such datasets may have a large accuracy
tolerance, and the actual physical location may be anywhere within
the covered area of a cellular tower.
[0029] Further compounding the inaccuracies of the data, cellular
networks may have various rollover or handoff mechanisms that may
be deployed for load balancing. For example, a user may attempt to
connect to a network with a mobile device, but the closest cellular
tower may be nearing capacity. In such a case, the user's device
may be connected to a more distant tower with available capacity.
Such a situation may result in a user's location data reflecting a
more distant tower.
[0030] In another example of inaccuracies in location data, many
cellular networks may support several different communication bands
and communication technologies. A user may have an older device
that may not support the newest communication protocols, so their
connection may be supplied by one set of towers while another user
with a more advanced mobile device may be connected to a different
set of towers, even though both users may be in the same physical
location. In such a situation, both users are physically located in
the same space, but their location data may be different.
[0031] The analysis of such noisy and ambiguous location data may
begin by identifying candidate physical locations for each location
data point. The physical locations may be locations on streets,
sidewalks, roads, highways, train tracks, train stations, bus
stations, and other physical locations. Once candidate physical
locations have been mapped, an analysis may be performed to find a
logical sequence of physical locations that a user may have
traversed. In a simple example, a logical physical sequence may be
to have traversed a roadway in a car or on a bicycle.
[0032] The analysis may further refine a sequence of physical
locations into a trajectory by identifying any outliers or
inconsistent location points. Such inconsistencies may be
identified by impractical or physically impossible changes in speed
or direction, by illogical traffic routing, or other
inconsistencies. In such cases, inconsistent data may be removed
from the trajectory. In some cases, the location sequence may be
recalculated with the inconsistent data removed or
de-emphasized.
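As one illustration of the speed-based consistency check described above, the following Python sketch drops any point that could not be reached from the last accepted point without exceeding a maximum speed. The planar-meter coordinates, the `(t, x, y)` tuple layout, the function name `drop_speed_outliers`, and the choice to trust the first point are all assumptions of this sketch.

```python
import math

def drop_speed_outliers(points, max_speed_mps):
    """Filter (t_seconds, x_m, y_m) points, sorted by time, by implied speed.

    A point is kept only if travelling to it from the last kept point would
    require a speed at or below max_speed_mps. The first point is trusted.
    """
    if not points:
        return []
    kept = [points[0]]
    for t, x, y in points[1:]:
        t0, x0, y0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp: skip the point
        speed = math.hypot(x - x0, y - y0) / dt
        if speed <= max_speed_mps:
            kept.append((t, x, y))
    return kept
```

Because each point is compared with the last kept point, a single bad observation is removed without discarding the valid points that follow it. If the first observation is itself an outlier, this simple approach would misbehave, so a practical system might also check in the reverse direction.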
[0033] Once a trajectory has been established, the trajectory may be
resampled or interpolated between the established data points. Such
a process may add location data points to a trajectory to make the
trajectory more useful for subsequent analyses.
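The resampling step can be sketched with plain linear interpolation in time, which amounts to assuming constant speed between two established points. The `(t, x, y)` layout in planar meters and the names `interpolate` and `resample` are assumptions of this sketch. A map-aware version would interpolate along the matched path instead of along a straight line.

```python
def interpolate(p0, p1, t):
    """Linearly interpolate a position at time t between two (t, x, y) points."""
    t0, x0, y0 = p0
    t1, x1, y1 = p1
    f = (t - t0) / (t1 - t0)  # fraction of the interval elapsed at time t
    return (t, x0 + f * (x1 - x0), y0 + f * (y1 - y0))

def resample(points, step_s):
    """Insert interpolated points every step_s seconds between known points.

    Assumes strictly increasing timestamps and step_s > 0.
    """
    if len(points) < 2:
        return list(points)
    out = []
    for p0, p1 in zip(points, points[1:]):
        out.append(p0)
        t = p0[0] + step_s
        while t < p1[0]:
            out.append(interpolate(p0, p1, t))
            t += step_s
    out.append(points[-1])
    return out
```

For example, two points 40 seconds and 400 meters apart, resampled with a 10 second step, gain three intermediate points spaced 100 meters apart.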
[0034] Mobility Genes as Representations of Location
Observations
[0035] Mobility genes may condense large numbers of location
observations into a compact, meaningful, and easily digestible
dataset for subsequent analyses. The mobility genes may be one
way for telecommunications service providers to aggregate and
process their location observations into various formats that may
be sold and consumed by other companies to provide meaningful and
useful analyses.
[0036] The mobility genes may be a second tier of data derived from
raw location data. Raw location data may come in enormous quantities, the volume
of which may be overwhelming. By condensing the raw location data
into different mobility genes, the subsequent analyses may be much
more achievable, while also maintaining anonymity of the users
whose observations may be protected by convention or law.
[0037] Raw location data may be produced in enormous volumes. In
modern society, virtually every person has at least one cellular
telephone or other connected device. The devices continually ping
a cellular access point or tower, where each ping may be
considered a location observation. In a single day in a medium
sized city, billions of location observations may be collected.
[0038] Making meaningful judgments from these enormous datasets can
be computationally expensive. In many cases, small samples of the
larger dataset may be used to estimate various factors from the
data.
[0039] By pre-processing the raw location observations into a set
of mobility genes, a data provider may make these enormous datasets
available for further analysis without the huge computational
complexities. In many cases, the mobility genes may be anonymized,
smoothed, augmented with additional data, and may be succinct
enough and rich enough to make meaningful analyses without
violating a telecommunications network's obligation of privacy to
their customers. Further, the pre-processing of the data into
mobility genes may transfer much of the computational cost to the
data provider, which may unburden the data consumers from expensive
data handling.
[0040] Mobility Gene for Trajectory Data
[0041] Location observations may be condensed into trajectory data
that may be made available for various secondary analyses. Location
observations may come from many different sources, including
location observations made by telecommunications companies, such as
cellular telephony providers, wireless access providers, and other
communications providers.
[0042] The trajectory data may be useful for many different
analyses, such as traffic patterns, behavioral studies, customer
profiling, commercial real estate analyses, anomaly detection, and
others. The trajectory mobility gene may condense millions or
billions of location observations into a form that may be easily
digested into meaningful analyses and decisions.
[0043] The mobility gene may represent a mechanism by which a data
supplier may digest large numbers of observations into a dense,
useful, and anonymous format that may be consumed by a third party.
The third party may be a separate company that may further process
the mobility gene into a decision-making tool for various
applications.
[0044] By using a mobility gene, a data provider, such as a
telecommunications service provider, may be able to pre-process
large volumes of data into an intermediate format for further
analysis. The mobility gene may be a format for making data
available through an application programming interface (API) or
some other mechanism.
[0045] The trajectory mobility gene condenses many location
observations into a series of points or trajectories where a device
was observed. This pre-processing may increase the value of the
trajectory data, as well as make the trajectory data easier to
analyze and digest. In many cases, the pre-processing may also
attach various demographic information about the users associated
with the trajectories.
[0046] The trajectories may be smoothed, which may be useful in
cases where the observations may have location or time variations
or tolerances. For example, many location observations may be made
using an access point location or some form of triangulation
between multiple access points. Such location observations may have
an inherent level of tolerance or uncertainty, which may lead to
trajectories that may be physically impossible, as the speed
between each point may be unattainable using conventional
transportation mechanisms.
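The specification does not name a particular smoothing technique. As one possible illustration, the Python sketch below applies a centered moving average to the coordinates of a trajectory while leaving timestamps unchanged. The `(t, x, y)` layout in planar meters, the function name `smooth`, and the default window size are assumptions of this sketch.

```python
def smooth(points, window=3):
    """Centered moving average over x and y of (t, x, y) points.

    Near the ends of the trajectory the window shrinks to the points that
    are actually available, so the output has the same length as the input.
    """
    half = window // 2
    out = []
    for i, (t, _, _) in enumerate(points):
        lo = max(0, i - half)
        hi = min(len(points), i + half + 1)
        xs = [p[1] for p in points[lo:hi]]
        ys = [p[2] for p in points[lo:hi]]
        out.append((t, sum(xs) / len(xs), sum(ys) / len(ys)))
    return out
```

Averaging pulls an isolated jump back toward its neighbors, which reduces the implied speed between points. It does not remove the jump entirely, so it may be combined with an explicit speed check.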
[0047] Demographic information about the users may be added to the
trajectory data. In many cases, a data provider may have secondary
information about a user, such as the user's gender, actual or
approximate age, home and work locations, actual or approximate
income, family demographics, and other information. Such
demographics may be associated with each trajectory, and may be
used for supplying subsets of trajectories for third party
analysis.
[0048] Trajectories may be anonymized in some cases. A user's
trajectory may reveal certain personally identifiable information
(PII) about a user. For example, a user's commuting trajectory may
identify the user's home and work locations. With such information,
a specific user may be identified. Anonymization of this data may
be performed in several different ways.
[0049] One way to anonymize a trajectory may be to truncate the
trajectory to omit an origin, destination, or both, while keeping a
portion of a trajectory of interest. For example, a set of
trajectories may be truncated to only show movement trajectories
through a specific portion of a road or train station. Such
truncations may omit the user's origin and destinations, but may
give a third party traffic analysis service meaningful and useful
trajectories from which the service may show local traffic
patterns.
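A minimal Python sketch of truncation follows, assuming the portion of interest is an axis-aligned bounding box in planar meters. The function name `truncate_to_region` and the `(t, x, y)` layout are hypothetical, and a real region such as a road segment or station would more likely be a polygon.

```python
def truncate_to_region(points, bbox):
    """Keep only the (t, x, y) points that fall inside a bounding box.

    bbox is (min_x, min_y, max_x, max_y). Points outside the box, which may
    include the origin and destination, are omitted from the result.
    """
    min_x, min_y, max_x, max_y = bbox
    return [
        (t, x, y)
        for t, x, y in points
        if min_x <= x <= max_x and min_y <= y <= max_y
    ]
```

If a trajectory enters the region more than once, this sketch returns all of the inside points as a single list. Splitting them into separate passes is left out for brevity.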
[0050] Another way to anonymize a trajectory may be to generalize
or randomize an origin or destination of a trajectory. In many
cases, a trajectory may have location observations with a certain
accuracy range or tolerance. Such accuracy may help identify a
person's home or other destination very specifically. One way to
anonymize the trajectory may be to identify an origin or
destination with a general area, such as a centroid of a housing
district. All trajectories beginning or ending in the housing
district may have that endpoint assigned to the centroid of the
housing district, so that an individual trajectory cannot be used to
identify a specific resident of the housing district.
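The centroid approach can be sketched in Python as below. The sketch makes several assumptions: coordinates are planar `(x, y)` pairs, a district centroid is approximated by the mean of a set of sample coordinates, and each endpoint is assigned to the nearest centroid. The names `mean_point` and `generalize_endpoints` are hypothetical.

```python
import math

def mean_point(coords):
    """Approximate a district centroid as the mean of sample coordinates."""
    xs, ys = zip(*coords)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def generalize_endpoints(trajectory, district_centroids):
    """Replace the first and last (x, y) of a trajectory with the nearest centroid."""
    if not trajectory:
        return []

    def nearest(p):
        return min(district_centroids, key=lambda c: math.dist(c, p))

    out = list(trajectory)
    out[0] = nearest(out[0])
    out[-1] = nearest(out[-1])
    return out
```

With this approach, two trajectories that start at different homes in the same district report the same origin. Intermediate points close to the true endpoint may still narrow down its location, so generalization might be combined with truncation.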
[0051] Mobility Gene for Visit Data
[0052] A mobility gene for visits may be one mechanism to aggregate
and condense location observations into an intermediate form for
further analysis. A visit gene may represent summarized location
data that reflect user behavior with respect to a certain location
or location type.
[0053] The visit mobility gene may be derived from
telecommunications observations and other sources, and may be an
intermediate form of processed data that may be made available to
third parties for analysis. In many cases, the visit mobility gene,
as well as other mobility genes, may be made available for sale or
consumption by third parties, and may be a revenue source for
telecommunications companies and other companies that may gather
location observations.
[0054] A visit mobility gene may represent a rich set of data that
may be derived from location observations. In many cases, a visit
mobility gene may represent movements relating to a specific
location, such as a train station, store, recreational location, or
some other specific location. In some cases, a visit mobility gene
may represent an aggregation of visits to a specific type of
location, such as a user's home, work, or recreational
location.
[0055] A visit may be determined by a user's location observations
being constant or within a certain radius for a period of time. In
some cases, a visit may be derived by analyzing location
observations to find all location observations that may be within a
specific area, then analyzing users' behavior to determine if the
users remained in the area for a period of time. In other cases, a
visit may be derived by computing a user's trajectory and analyzing
the trajectory for periods where the user's movements have stopped
or remain within a small area. In such cases, a visit mobility gene
may be a secondary analysis of a trajectory mobility gene.
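The rule that a visit occurs when observations stay within a certain radius for a period of time can be sketched as follows. The sketch assumes `(t, x, y)` points in planar meters sorted by time, measures the radius from the first point of each candidate stay, and uses the hypothetical function name `detect_visits`. The thresholds are parameters, not values from this specification.

```python
import math

def detect_visits(points, radius_m, min_duration_s):
    """Find stays in a time-sorted list of (t, x, y) points.

    Returns (start_t, end_t, anchor_x, anchor_y) tuples, where the anchor is
    the first point of the stay and every later point of the stay lies
    within radius_m of it.
    """
    visits = []
    i, n = 0, len(points)
    while i < n:
        t0, x0, y0 = points[i]
        j = i
        # Extend the stay while the next point remains inside the radius.
        while j + 1 < n and math.hypot(points[j + 1][1] - x0,
                                       points[j + 1][2] - y0) <= radius_m:
            j += 1
        if points[j][0] - t0 >= min_duration_s:
            visits.append((t0, points[j][0], x0, y0))
            i = j + 1  # continue after the stay
        else:
            i += 1
    return visits
```

Anchoring on the first point keeps the logic short but makes the result sensitive to noise in that one observation. Using the running mean of the stay as the anchor is a common alternative.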
[0056] A visit gene may include time of day, length of stay, and
various other statistics. A visit gene may also include information
before and after a person's visit. For example, a visit gene may
include trajectories before and after a person's visit to a
location. A visit gene may be supplemented with demographic
information about visitors, such as actual or approximate age,
gender, actual or approximate home and work locations, actual or
approximate income, as well as hobbies, common other locations
visited, and other information.
[0057] Throughout this specification, like reference numbers
signify the same elements throughout the description of the
figures.
[0058] In the specification and claims, references to "a processor"
include multiple processors. In some cases, a process that may be
performed by "a processor" may be actually performed by multiple
processors on the same device or on different devices. For the
purposes of this specification and claims, any reference to "a
processor" shall include multiple processors, which may be on the
same device or different devices, unless expressly specified
otherwise.
[0059] When elements are referred to as being "connected" or
"coupled," the elements can be directly connected or coupled
together or one or more intervening elements may also be present.
In contrast, when elements are referred to as being "directly
connected" or "directly coupled," there are no intervening elements
present.
[0060] The subject matter may be embodied as devices, systems,
methods, and/or computer program products. Accordingly, some or all
of the subject matter may be embodied in hardware and/or in
software (including firmware, resident software, micro-code, state
machines, gate arrays, etc.). Furthermore, the subject matter may
take the form of a computer program product on a computer-usable or
computer-readable storage medium having computer-usable or
computer-readable program code embodied in the medium for use by or
in connection with an instruction execution system. In the context
of this document, a computer-usable or computer-readable medium may
be any medium that can contain, store, communicate, propagate, or
transport the program for use by or in connection with the
instruction execution system, apparatus, or device.
[0061] The computer-usable or computer-readable medium may be, for
example but not limited to, an electronic, magnetic, optical,
electromagnetic, infrared, or semiconductor system, apparatus,
device, or propagation medium. By way of example, and not
limitation, computer readable media may comprise computer storage
media and communication media.
[0062] Computer storage media includes volatile and nonvolatile,
removable and non-removable media implemented in any method or
technology for storage of information such as computer readable
instructions, data structures, program modules or other data.
Computer storage media includes, but is not limited to, RAM, ROM,
EEPROM, flash memory or other memory technology, CD-ROM, digital
versatile disks (DVD) or other optical storage, magnetic cassettes,
magnetic tape, magnetic disk storage or other magnetic storage
devices, or any other medium which can be used to store the desired
information and which can be accessed by an instruction execution
system. Note that the computer-usable or computer-readable medium
could be paper or another suitable medium upon which the program is
printed, as the program can be electronically captured, via, for
instance, optical scanning of the paper or other medium, then
compiled, interpreted, or otherwise processed in a suitable manner,
if necessary, and then stored in a computer memory.
[0063] When the subject matter is embodied in the general context
of computer-executable instructions, the embodiment may comprise
program modules, executed by one or more systems, computers, or
other devices. Generally, program modules include routines,
programs, objects, components, data structures, etc. that perform
particular tasks or implement particular abstract data types.
Typically, the functionality of the program modules may be combined
or distributed as desired in various embodiments.
[0064] FIG. 1 is an illustration showing an example embodiment 100
of an ecosystem with mobility genes. A mobile device 102 may
connect to various access points 104, which may be managed by a
network operator 106. Each communication with the mobile device 102
may be stored as raw location data 108.
[0065] A location data processor 110 may analyze the raw location
data 108 to generate a set of mobility genes 112. The mobility
genes 112 may be transferred to various analyzers 114, 116, and 118
for subsequent analysis.
[0066] The location data processor 110 may process the raw location
observations into mobility genes 112, which may be sold or
transferred to third parties who may perform various analyses. The
mobility genes 112 may be a condensed, succinct, and useful
intermediate data format that may be consumed by third parties
while keeping user anonymity. In many cases, the location data
processor 110 may augment the raw location data with secondary data
sources, as well as provide smoothing and other processing that may
increase data usefulness and, in some cases, improve data
accuracy.
[0067] The various mobility genes 112 may be a standardized
mechanism by which third party data analyzers may access a very
rich and very detailed set of location data 108. A location data
processor 110 may analyze billions of raw location observations and
distill the data into mobility genes 112 that may be easily
consumed without the high data handling costs and high data
processing costs of analyzing enormous numbers of location
observations.
[0068] The mobility genes 112 may be an industry standard format
that may preserve user anonymity yet may increase the value of
specific data that may be used by third party analyzers. The
mobility genes 112 may come in many formats, including trajectories
and visits.
[0069] The mobility genes 112 may come in historical and real time
data formats. A historical data format may include mobility genes
that may have been derived over a relatively long period of time,
such as a week, month, or year. A real time format may present
mobility genes that may be occurring currently, or over a
relatively short period of time, such as over a minute, hour, or
day. Each use case and each system may have a different definition
for "historical" and "real time." For example, in some systems,
real time may be mobility genes derived in the last several
seconds, while another system may define real time as data
collected in the last week.
[0070] Real time data formats may be useful for providing alerts,
providing current data, or making real time decisions about
people's mobility. One use for real time data may be to display
traffic congestion on a road or to estimate travel time through a
city. Another use of real time data may be to predict the number of
travelers that may be at a taxi stand in the next several minutes
or in the next hour.
[0071] Real time data formats may be used to compare current events
to historical behaviors. Historical analysis may provide an
estimate for events that may happen today or some period in the
future, and by comparing historical estimates with real time data,
an anomaly may be detected or an estimate for future traffic may be
increased or decreased accordingly.
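
As an illustration only, the comparison of a real time value against historical behavior may be sketched in Python as follows. The function name detect_anomaly, the use of a z-score, and the default threshold of 3.0 are assumptions made for this sketch and are not taken from the embodiments described here.

```python
from statistics import mean, stdev


def detect_anomaly(historical_counts, current_count, threshold=3.0):
    """Compare a real-time count against historical counts for the same period.

    Returns (is_anomaly, z_score). The z-score test and the default
    threshold are assumptions made for this sketch.
    """
    mu = mean(historical_counts)
    sigma = stdev(historical_counts)  # needs at least two historical values
    if sigma == 0:
        # History never varied, so any departure from it is flagged.
        return current_count != mu, 0.0
    z = (current_count - mu) / sigma
    return abs(z) > threshold, z


# Hypothetical traveler counts at a taxi stand for the same hour on past days.
is_anomaly, z_score = detect_anomaly([100, 110, 90, 105, 95], 160)
```

A system of this kind may raise an alert when the flag is set, or may scale a historical estimate for future traffic up or down in proportion to the observed deviation.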
[0072] FIG. 2 is a diagram of an embodiment 200 showing components
that may analyze raw location data and provide mobility genes for
subsequent analyses. The example of embodiment 200 is merely one
topology that may be used to analyze raw location data.
[0073] The diagram of FIG. 2 illustrates functional components of a
system. In some cases, the component may be a hardware component, a
software component, or a combination of hardware and software. Some
of the components may be application level software, while other
components may be execution environment level components. In some
cases, the connection of one component to another may be a close
connection where two or more components are operating on a single
hardware platform. In other cases, the connections may be made over
network connections spanning long distances. Each embodiment may
use different hardware, software, and interconnection architectures
to achieve the functions described.
[0074] Embodiment 200 illustrates a device 202 that may have a
hardware platform 204 and various software components. The device
202 as illustrated represents a conventional computing device,
although other embodiments may have different configurations,
architectures, or components.
[0075] In many embodiments, the device 202 may be a server
computer. In some embodiments, the device 202 may also be a
desktop computer, laptop computer, netbook computer, tablet or
slate computer, wireless handset, cellular telephone, game console
or any other type of computing device. In some embodiments, the
device 202 may be implemented on a cluster of computing devices,
which may be a group of physical or virtual machines.
[0076] The hardware platform 204 may include a processor 208,
random access memory 210, and nonvolatile storage 212. The hardware
platform 204 may also include a user interface 214 and network
interface 216.
[0077] The random access memory 210 may be storage that contains
data objects and executable code that can be quickly accessed by
the processors 208. In many embodiments, the random access memory
210 may have a high-speed bus connecting the memory 210 to the
processors 208.
[0078] The nonvolatile storage 212 may be storage that persists
after the device 202 is shut down. The nonvolatile storage 212 may
be any type of storage device, including hard disk, solid state
memory devices, magnetic tape, optical storage, or other type of
storage. The nonvolatile storage 212 may be read only or read/write
capable. In some embodiments, the nonvolatile storage 212 may be
cloud based, network storage, or other storage that may be accessed
over a network connection.
[0079] The user interface 214 may be any type of hardware capable
of displaying output and receiving input from a user. In many
cases, the output display may be a graphical display monitor,
although output devices may include lights and other visual output,
audio output, kinetic actuator output, as well as other output
devices. Conventional input devices may include keyboards and
pointing devices such as a mouse, stylus, trackball, or other
pointing device. Other input devices may include various sensors,
including biometric input devices, audio and video input devices,
and other sensors.
[0080] The network interface 216 may be any type of connection to
another computer. In many embodiments, the network interface 216
may be a wired Ethernet connection. Other embodiments may include
wired or wireless connections over various communication
protocols.
[0081] The software components 206 may include an operating system
218 on which various software components and services may
operate.
[0082] A raw location receiver 220 may receive raw location data
from one or more networks 242 or other sources. The raw location
receiver 220 may have a push or pull communication model with a raw
location data source, and may receive real time or historical data
for analysis. The raw location receiver 220 may store information
in a raw location database 222.
[0083] A batch analysis engine 224 or a real time analysis engine
226 may route the raw location data 222 into various analyzers for
processing. The analyzers may include a trajectory analyzer 228, a
visit analyzer 230, and a statistics generator 232. The analysis
may result in mobility genes 234, which may be served to various
analyzers through a real time analysis portal 236 or a batch level
analysis portal 238.
[0084] In the example of embodiment 200, a batch analysis engine
224 may analyze historical data to create historical mobility
genes. The results of batch-level analysis may be available through
a batch level analysis portal 238, where other analyzers may
download and use mobility genes. A batch-level analysis may be an
analysis that may not have a real-time use case. For example, a
commercial developer may wish to know the demographics of people
who travel near a commercial shopping mall. Such an analysis may be
performed in batch mode because the data may not be changing
rapidly.
[0085] A real time analysis engine 226 may perform real-time
analysis of location observations, and may be tuned to process data
quickly. In many cases, the real time analysis engine 226 may
generate comparison versions of a mobility gene. A comparison
version may be a difference or comparison between a set of real
time observations and a predefined, historical mobility gene. This
difference may be useful for generating alerts, for example. In
some cases, the difference information may be much more compact
than having to access an entire set of mobility genes.
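
One possible form of a comparison version may be sketched in Python as follows. The sketch assumes, purely for illustration, that a mobility gene can be reduced to a mapping from a key (such as a road segment name) to a count. The function name comparison_gene and the min_delta parameter are hypothetical.

```python
def comparison_gene(realtime, baseline, min_delta=0):
    """Return only the entries whose real-time value differs from the baseline.

    Both inputs map a key (for example, a road segment) to a count.
    Keys missing from either side are treated as a count of zero.
    """
    diff = {}
    for key in sorted(set(realtime) | set(baseline)):
        delta = realtime.get(key, 0) - baseline.get(key, 0)
        if abs(delta) > min_delta:
            diff[key] = delta
    return diff


# Hypothetical counts per road segment.
historical = {"A": 100, "B": 50, "C": 20}
current = {"A": 100, "B": 65, "D": 5}
changes = comparison_gene(current, historical)
```

Because unchanged entries are omitted, the result may be considerably smaller than the full set of values, which is the compactness the preceding paragraph describes.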
[0086] A trajectory analyzer 228 may create trajectories from raw
location data 222. The trajectories may include sequences of
locations traveled by a user, including timestamps for each of the
observed locations. The trajectories may be processed into a
useable form by scrubbing and smoothing the data, as well as
removing duplicate or superfluous observations.
[0087] A visit analyzer 230 may identify visits for a given
location. In some cases, the visits may be inferred or determined
from subsequent analysis of trajectories. In other cases, visits
may be identified by finding all location observations for a given
location, then finding data associated with those visits.
[0088] A statistics generator 232 may generate various statistics
for a given mobility gene. In some cases, the statistics generator
232 may access various static data sources 256 or real time or
dynamic data sources 258 to augment a mobility gene.
[0089] The real time analysis portal 236 and batch level analysis
portal 238 may be a computer or web interface through which data
may be queried and received. In a typical use case, a third party
analyzer may send a request to one of the portals 236 or 238 for a
set of mobility genes. After verifying the requestor's credentials,
the portal may cause the data to be generated if the mobility genes
have not been calculated, then the mobility genes may be
transmitted to the requestor.
[0090] The system 202 may be connected to various other devices and
services through a network 240.
[0091] One or more telecommunications networks 242 may supply raw
location data to the system 202. The telecommunications networks
242 may be cellular telephony networks, wireless data networks,
networks of passive wireless sniffers, or any other network that
may supply location information.
[0092] In a typical network, a wireless mobile device 244, which
may have a Global Positioning System (GPS) receiver 246, may
connect with a telecommunications network 248 through a series
of access points. Various location data 250 may be generated from
the mobile device interactions, including GPS location data that
may be generated by the mobile device 244 and transmitted across
the telecommunications network 242.
[0093] The location data 250 may be cleaned and scrubbed with a
data scrubber 252 to provide raw location data 254 that may be
processed by the system 202. In many cases, the location data 250
may include device identifiers and other potentially personally
identifiable information. The data scrubber 252 may replace device
identifiers with other, non-traceable identifiers and perform other
pre-processing of the location data.
[0094] One form of telecommunications location data may include
location data that may be gathered from monitoring a device
location in a cellular telephony system. In some such systems, the
location data may include the location coordinates of an access
point, which may be close to but not exactly the location of the
device. Some cellular networks may have cells that span large
distances, such as multiple kilometers or miles, and the accuracy
of the location information may be very poor. Other
telecommunications systems may use triangulation between two,
three, or more access points to determine location with a higher
degree of accuracy.
[0095] In some cases, a GPS receiver in a mobile device may
generate coordinates and may transmit the coordinates as part of a
data message from the mobile device 244. Such GPS coordinates may
be much more accurate than other location mechanisms, but GPS
coordinates may not be transmitted as often as other location
mechanisms. In some systems, some location observations may have
different degrees of accuracy, such that some observations may be
generated by GPS and other observations may be determined through
triangulation or merely access point locations. Such accuracy
differences may be used during mobility gene calculations.
[0096] Static data sources 256 and dynamic data sources 258 may
represent any type of supplemental data sources that may be used to
generate mobility genes. An example of a static data source 256 may
be a map of highways, roads, train systems, bus systems, pedestrian
paths, bicycle paths, and other transportation routes. Another
example may be the name and location of various places of
interest, such as shopping malls, parks, stores, train stations,
bus stops, restaurants, housing districts, factories, offices, and
other physical locations.
[0097] Another set of static data sources 256 may be demographic
information about people. Such information may be known by a
telecommunications network 242 because the network may have name,
address, credit card, and other information about each of its
subscribers. In some cases, a telecommunications network 242 may
augment its raw location data 254 with demographic information.
[0098] An example of dynamic data sources 258 may be current train,
bus, airplane, or ferry schedule, the current number of taxis
available, or any other data source.
[0099] The static and dynamic data sources 256 and 258 may augment
a mobility gene. For example, a data analyzer may request mobility
gene information for fast food restaurants in a specific city. The
system 202 may identify each of the fast food restaurants from a
secondary data source, then identify visits and trajectories that
may relate to each of the fast food restaurants.
[0100] A set of data consumers 260 may be third party organizations
that may consume the mobility gene data. The data consumers 260 may
have a hardware platform 260 on which various analysis applications
262 may execute. In some cases, the data consumers 260 may be third
party services that may consume the mobility genes and provide
location-based services, such as traffic monitoring and a host of
other services.
[0101] FIG. 3 is a flowchart illustration of an embodiment 300
showing a method of generating location observations. Embodiment
300 is a simplified example for a sequence of generating location
observations that may be performed by a telecommunications
network.
[0102] Other embodiments may use different sequencing, additional
or fewer steps, and different nomenclature or terminology to
accomplish similar functions. In some embodiments, various
operations or sets of operations may be performed in parallel with
other operations, either in a synchronous or asynchronous manner.
The steps selected here were chosen to illustrate some principles
of operation in a simplified form.
[0103] Embodiment 300 illustrates two ways of determining a
location observation, along with a way to scrub the observations
from device-specific identifiers.
[0104] One way to create a location observation may be to detect a
device on the network in block 302. A location for the device may
be determined in block 304, along with a timestamp in block 306.
The resultant location observation may be stored in block 308.
[0105] Each location may be determined by the network. In some
cases, a network may establish an approximate location for the
device, which may be sufficient for managing the traffic on the
network. However, in many cases, such location coordinates may be
inaccurate. For example, some networks may provide a location as
the location of the access point, cell tower, or other fixed node
on the network. Any device detected by that node may be located
anywhere within the range of the access point, which may be several
kilometers or miles. Such location information may have a large
tolerance or variation from the actual location.
[0106] Some networks may provide a location estimate based on
triangulation of a device with two, three, or more access points or
other receivers. Such a location may be more accurate than the
example of providing merely the access point physical location, but
may not be as accurate as GPS location.
[0107] In block 310, a network may detect that GPS location
information may be transmitted over the network. Such information
may be captured, a timestamp generated in block 312, and a location
observation may be stored in block 314. Such an example may be one
method by which GPS information may be captured and stored as a
location observation.
[0108] In some systems, certain applications may execute on a
device and may generate GPS location information. For example,
navigation applications typically send a stream of GPS location
data to a server, which may update directions for a user. Such
applications may be detected, and the GPS locations may be used as
highly accurate location observations.
[0109] A typical location observation may include a device
identifier, a set of location coordinates, and a timestamp. The
device identifier used in a wireless network may depend on the
network. Typically, a device may have some type of electronic
identification, such as a Media Access Control (MAC) address,
Electronic Identification Number (EIN), or other device identifier.
In many cases, such identifiers may be a mechanism by which other
systems may also identify the device.
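
The three parts of a typical location observation may be represented as a small record type, sketched here in Python. The class name LocationObservation, the field names, and the optional source field (which records how the location was obtained) are assumptions for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LocationObservation:
    """One observation: device identifier, coordinates, and timestamp."""

    device_id: str       # identifier as assigned by the network, or a substitute
    lat: float           # latitude in decimal degrees
    lon: float           # longitude in decimal degrees
    timestamp: datetime  # time at which the observation was made
    # Hypothetical label for how the location was obtained, for example
    # "gps", "triangulation", or "access_point".
    source: str = "access_point"


example = LocationObservation(
    device_id="a1b2c3",
    lat=1.3521,
    lon=103.8198,
    timestamp=datetime(2017, 9, 27, 8, 0, tzinfo=timezone.utc),
    source="gps",
)
```

Recording the source alongside each observation is one way that later processing may weight more accurate observations differently from less accurate ones.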
[0110] A device identifier may be one mechanism by which a mobility
gene may be directly linked to a specific user. In general, the raw
data for mobility genes may be collected by one group of actors who
may have strict privacy regulations to which they have to adhere,
but may sell mobility genes to a third party. A device identifier
may be one way that a third party may connect specific mobility
data to specific users.
[0111] In order to obfuscate identifiable information from the
location observations, each observation may be analyzed in block
316, and a unique identifier for the device may be generated in
block 318 and substituted for the actual device identifier in block
320. The location observation may be updated in block 322.
[0112] The unique identifier may be the same identifier for that
device in the particular dataset being analyzed. In some cases, a
lookup table may be created that may have the device identifier and
its unique replacement. Such a system may use the same substituted
device identifier for observations over a long period of time.
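
The substitution of device identifiers using a lookup table may be sketched in Python as follows. The function name pseudonymize, the dictionary-based observation format, and the use of random UUID values as replacement identifiers are assumptions for this sketch; other identifier schemes may be used.

```python
import uuid


def pseudonymize(observations, lookup=None):
    """Replace each device identifier with a stable, random substitute.

    observations: iterable of dicts that each contain a "device_id" key.
    lookup: optional existing table of real identifier -> substitute, so
            that the same substitute can be reused across datasets.
    Returns (scrubbed_observations, lookup).
    """
    lookup = {} if lookup is None else lookup
    scrubbed = []
    for obs in observations:
        real_id = obs["device_id"]
        if real_id not in lookup:
            lookup[real_id] = uuid.uuid4().hex  # random, not derived from real_id
        # Copy the observation so the input is left unmodified.
        scrubbed.append({**obs, "device_id": lookup[real_id]})
    return scrubbed, lookup


raw = [
    {"device_id": "AA:BB:CC:00:11:22", "lat": 1.30, "lon": 103.80, "t": 0},
    {"device_id": "AA:BB:CC:00:11:22", "lat": 1.31, "lon": 103.81, "t": 60},
    {"device_id": "DD:EE:FF:33:44:55", "lat": 1.35, "lon": 103.90, "t": 30},
]
scrubbed, table = pseudonymize(raw)
```

In this sketch the lookup table is the only link between real and substituted identifiers, so the table itself would need to be protected or discarded by whoever performs the substitution.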
[0113] After updating all of the observations, the updates may be
sent to a mobility gene analyzer in block 324.
[0114] FIG. 4 is a flowchart illustration of an embodiment 400
showing interactions between a mobility gene provider 402 and a
data consumer 404. The operations of the mobility gene provider 402
are illustrated in the left hand column, while the operations of
the data consumer 404 are illustrated in the right hand column.
[0115] Other embodiments may use different sequencing, additional
or fewer steps, and different nomenclature or terminology to
accomplish similar functions. In some embodiments, various
operations or sets of operations may be performed in parallel with
other operations, either in a synchronous or asynchronous manner.
The steps selected here were chosen to illustrate some principles
of operation in a simplified form.
[0116] Embodiment 400 is one method by which a mobility gene may be
requested and provided. A mobility gene provider 402 may be a
system that may process raw location observations into a set of
mobility genes. The mobility genes may be consumed by the data
consumer 404. In many situations, the mobility genes may be a
compact form of location observations that may be ready for further
processing by a data consumer 404.
[0117] The mobility genes may represent many thousands, millions,
billions, or even trillions of individual observations that may be
condensed into various mobility genes. By pre-processing the
location observations into a set of mobility genes, the high cost
and complexity of analyzing enormous numbers of observations may be
avoided. Further, a set of mobility genes may be anonymized or
summarized such that the data may be handled without worry of
disclosing personally identifiable information. Such restrictions
may be imposed by law or convention, and the cost of implementing
the restrictions may be borne by the mobility gene provider 402 and
may not be passed to the data consumer 404.
[0118] In the example of embodiment 400, a data consumer 404 may
define a mobility gene in block 406, then transmit that definition
in block 408 to the mobility gene provider 402.
[0119] The mobility gene provider 402 may receive the definition in
block 410, analyze raw location data in block 412, and create the
mobility genes in block 414 and store the mobility genes in block
416.
[0120] In many cases, the mobility gene may be processed from
historical data. Such mobility genes may be processed in a batch
mode. Some requests may be for real time data, and such mobility
genes may be continually processed and updated.
[0121] In the example of embodiment 400, a data consumer 404 may
request data in block 418, which may be received in block 420 by
the mobility gene provider 402. The mobility gene
provider 402 may transmit the mobility genes in block 422, which
may be received by the data consumer in block 424. The mobility
genes may be analyzed in block 426 to provide various location
based services in block 428.
[0122] The example of embodiment 400 in blocks 418-428 may be one
example of a pull-style communication protocol, where the data
consumer 404 may initiate a request. Other systems may use a
push-style communication protocol, where the mobility gene provider
402 may initiate a data transfer. Still other systems may use other
types of communication protocols for transferring mobility genes
from a mobility gene provider 402 to a data consumer 404.
[0123] FIG. 5 is a flowchart illustration of an embodiment 500
showing interactions between a mobility gene provider 502 and a
data consumer 504. The operations of the mobility gene provider 502
are illustrated in the left hand column, while the operations of
the data consumer 504 are illustrated in the right hand column.
[0124] Other embodiments may use different sequencing, additional
or fewer steps, and different nomenclature or terminology to
accomplish similar functions. In some embodiments, various
operations or sets of operations may be performed in parallel with
other operations, either in a synchronous or asynchronous manner.
The steps selected here were chosen to illustrate some principles
of operation in a simplified form.
[0125] Embodiment 500 is an example of an interaction where a data
consumer 504 may use a standard, pre-computed mobility gene. A
mobility gene provider 502 may analyze raw location data in block
506, create a standardized set of mobility genes in block 508, and
store the mobility genes in block 510. Such a process may loop over
and over as new data may be received.
[0126] A standardized set of mobility genes may be pre-defined and
may be ready to use. One form of such genes may be a subscription
service or a data marketplace, where many different data consumers
504 may purchase or consume a pre-defined set of mobility
genes.
[0127] Such a system may compare with the example of embodiment
400, where a data consumer may define various parameters about a
requested mobility gene.
[0128] A data consumer 504 may determine a standard mobility gene
for an application in block 512. In many cases, a mobility gene
provider 502 may provide a catalog of mobility genes that may be
useful for various applications. Such mobility genes may be
standardized and may be offered on a subscription or other basis to
one or more data consumers.
[0129] The data consumer 504 may request mobility genes in block
514, and the request may be received in block 516 by the mobility
gene provider 502. The mobility genes may be transmitted in block
518 and received in block 520. A data consumer 504 may analyze the
mobility genes in block 522 and provide a location based service in
block 524.
[0130] FIG. 6 is a flowchart illustration of an embodiment 600
showing a method for creating trajectory mobility genes. The method
of embodiment 600 may be merely one example of how trajectories may
be created from raw location observations.
[0131] Other embodiments may use different sequencing, additional
or fewer steps, and different nomenclature or terminology to
accomplish similar functions. In some embodiments, various
operations or sets of operations may be performed in parallel with
other operations, either in a synchronous or asynchronous manner.
The steps selected here were chosen to illustrate some principles
of operation in a simplified form.
[0132] Embodiment 600 is one example of how trajectory mobility
genes may be generated. A trajectory gene may define a path that a
user may have traveled. In many cases, a trajectory gene may
include a transportation mode.
[0133] Trajectory genes may be smoothed. In many cases, location
observations may not be very precise. For example, some raw
location data may give a user's location as the location of an
access point, which may be a large distance from the actual
location. In some cases, such variation may be on the order of tens
or hundreds of feet, or in some cases miles or kilometers of
inaccuracies.
[0134] A smoothing algorithm may adjust a trajectory such that the
movement may make physical sense. Some such smoothing algorithms
may increase a trajectory's accuracy.
[0135] Some smoothing or post processing algorithms may adjust a
trajectory as part of an anonymizing process. Trajectories can
contain information that may identify people specifically. For
example, a trajectory from a person's home address to their work
address may indicate exactly who the person may be. By obfuscating
one or both of the origin or destination, the trajectory may be
made anonymous, while preserving useful portions of the trajectory
for analysis.
[0136] Many mobility genes may include demographic information
about a user. The demographic information may be any type of
descriptor or categorization of the user. Many systems may classify
users by gender, age or age group, income, race, education, and so
on. Some systems may include demographics that may be derived from
location observation data, such as predominant mode of transport,
recreational sites visited, types of restaurants visited, and the
like.
[0137] Raw location observations may be received in block 602.
[0138] A timeframe of interest may be determined in block 604. In
some analyses, a time frame may be defined by trajectories in the
last hour, day, or week. In other analyses, a time frame may be
defined by trajectories at a specific recurring time, such as
between 9:15-9:30 am on Tuesdays that are not holidays. Location
observations meeting the timeframe of interest may be gathered for
the analysis.
[0139] The observations may be sorted by device identification in
block 606. For each device identification in block 608, a subset of
observations may be retrieved in block 610 that have the device
identification. The subset may be sorted by timestamp in block 612
and a raw trajectory may be created by the sequence of location
observations in block 614.
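
The grouping and ordering steps of blocks 606 through 614 may be sketched in Python as follows. The function name build_raw_trajectories and the tuple layout of an observation are assumptions for this sketch.

```python
from collections import defaultdict


def build_raw_trajectories(observations):
    """Group observations by device and order each group by timestamp.

    observations: iterable of (device_id, timestamp, lat, lon) tuples.
    Returns a dict of device_id -> list of (timestamp, lat, lon) in time order.
    """
    by_device = defaultdict(list)
    for device_id, timestamp, lat, lon in observations:
        by_device[device_id].append((timestamp, lat, lon))
    # Sort each device's observations by timestamp to form a raw trajectory.
    return {
        device_id: sorted(points, key=lambda p: p[0])
        for device_id, points in by_device.items()
    }


observations = [
    ("dev1", 120, 1.31, 103.81),
    ("dev2", 30, 1.35, 103.90),
    ("dev1", 0, 1.30, 103.80),
    ("dev1", 60, 1.305, 103.805),
]
raw_trajectories = build_raw_trajectories(observations)
```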
[0140] For each sequence in block 616, the trajectory may be broken
into segments based on the trajectory speed in block 618. In other
words, a trajectory segment may be created by identifying locations
where the trajectory may have paused for an extended time. An
example may be a trajectory that may pause while a person is at
work, at home, at a recreational event, or visiting some
location.
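
One way to break a trajectory at extended pauses may be sketched in Python as follows. The sketch assumes that positions have already been projected to planar coordinates in meters. The function name split_at_pauses and the default thresholds (0.5 meters per second and 600 seconds) are hypothetical values chosen for illustration.

```python
import math


def split_at_pauses(points, min_speed=0.5, min_pause=600.0):
    """Split a time-ordered trajectory wherever it pauses for an extended time.

    points: list of (t_seconds, x_m, y_m), sorted by time.
    A run of consecutive hops slower than min_speed that lasts at least
    min_pause seconds is treated as a pause and separates two segments.
    """
    segments = []
    seg_start = 0      # index of the first point of the current segment
    slow_from = None   # index of the point where the current slow run began
    for i in range(1, len(points)):
        t0, x0, y0 = points[i - 1]
        t1, x1, y1 = points[i]
        dt = t1 - t0
        speed = math.hypot(x1 - x0, y1 - y0) / dt if dt > 0 else 0.0
        if speed < min_speed:
            if slow_from is None:
                slow_from = i - 1
            continue
        if slow_from is not None:
            # Movement resumed: check how long the slow run lasted.
            if t0 - points[slow_from][0] >= min_pause:
                if slow_from > seg_start:
                    segments.append(points[seg_start:slow_from + 1])
                seg_start = i - 1
            slow_from = None
    end = len(points)
    if slow_from is not None and points[-1][0] - points[slow_from][0] >= min_pause:
        end = slow_from + 1  # trajectory ends in a pause; drop the idle tail
    if end - seg_start >= 2:
        segments.append(points[seg_start:end])
    return segments


# Travel, a 20-minute pause, then travel again.
trip = [(0, 0, 0), (60, 600, 0), (120, 1200, 0),
        (1320, 1200, 0), (1380, 1800, 0), (1440, 2400, 0)]
trip_segments = split_at_pauses(trip)
```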
[0141] For each segment in block 620, a transportation mode may be
determined in block 622 and an average speed determined in block
624. The transportation mode may be inferred by the specifics of a
trajectory. For example, a person who progresses slowly at a
walking pace to a train station, then moves quickly at a train's
speed may be assumed to have walked to the train station and ridden
a train. Another person who lingers at a bus stop for a period of
time, then travels at a common speed of vehicular traffic may be
assumed to be riding a bus. Yet another person who travels on a
motorway but begins and ends a journey away from bus stops may be
assumed to travel by car or taxi.
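
A very simple rule-based inference of transportation mode, in the spirit of the examples above, may be sketched in Python as follows. The function name infer_mode, the two boolean inputs, and the 7 kilometers per hour walking threshold are assumptions for this sketch; a practical system may use many more signals.

```python
def infer_mode(avg_speed_kmh, on_rail_line=False, starts_near_bus_stop=False):
    """Guess a transportation mode for one trajectory segment.

    avg_speed_kmh: average speed of the segment in kilometers per hour.
    on_rail_line: True if the segment follows a known train line.
    starts_near_bus_stop: True if the user lingered at a bus stop first.
    The threshold below is an illustrative assumption.
    """
    if avg_speed_kmh < 7.0:
        return "walk"
    if on_rail_line:
        return "train"
    if starts_near_bus_stop:
        return "bus"
    return "car_or_taxi"


modes = [
    infer_mode(4.5),
    infer_mode(60.0, on_rail_line=True),
    infer_mode(25.0, starts_near_bus_stop=True),
    infer_mode(70.0),
]
```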
[0142] In some embodiments, a user's previous history may be used
as an indicator for their preferred transportation mode. Some
systems may look back to previous transportation analyses for hints
or indicators as to whether a specific user often uses a car or
train.
[0143] The following several steps may be one way to smooth the
trajectory and, in some cases, increase its accuracy. Some location
observations may have positional data that may be highly
inaccurate. The inaccuracies may come from the method used to
determine a user's location, which may include giving only the
coordinates of an access point or cell tower, even though the user
may be a long distance away from the access point or cell tower. In
such cases, the trajectory information may give unrealistic
movements, such as lingering for a period of time at one access
point, then instantaneously moving a long distance to a second
access point. Such movements are not physically possible, so by
smoothing the trajectory, the trajectory may become more accurate
and more useful for further analyses.
[0144] Once a transportation mode is determined in block 622, an
average speed may be determined in block 624. The average speed may
be calculated from the end points of a trajectory segment.
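
The end point calculation may be sketched in Python as follows, using the haversine great-circle distance between the first and last observations of a segment. The function names and the tuple layout are assumptions for this sketch, and the mean Earth radius used is an approximation.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # approximate mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def average_speed_mps(segment):
    """Average speed in meters per second from a segment's two end points.

    segment: time-ordered list of (t_seconds, lat, lon).
    """
    t0, lat0, lon0 = segment[0]
    t1, lat1, lon1 = segment[-1]
    elapsed = t1 - t0
    if elapsed <= 0:
        return 0.0
    return haversine_m(lat0, lon0, lat1, lon1) / elapsed


# One degree of latitude covered in one hour.
speed = average_speed_mps([(0, 1.0, 103.8), (1800, 1.5, 103.8), (3600, 2.0, 103.8)])
```

Because only the end points are used, the result is the straight-line speed and will understate the speed along a winding path.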
[0145] A baseline speed range for the travel segment may be
determined from historical data in block 626. The baseline speed
may be used as a comparison to determine whether the observed
speeds appear appropriate. For each observation in block 628, a
speed comparison may be made in block 630. If the speed appears
appropriate in block 630, no changes may be made. If the speed does
not appear to be appropriate in block 630, the observed location
may be adjusted in block 632 to meet the speed limits determined
from the historical data.
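
One possible adjustment for block 632 may be sketched in Python as follows. The sketch assumes planar coordinates in meters and enforces only the upper bound of the baseline speed range. When a hop implies a speed above that bound, the observed location is pulled back along the straight line toward the previous point. The function name clamp_to_speed is hypothetical, and other adjustment strategies may be used.

```python
import math


def clamp_to_speed(points, max_speed):
    """Adjust locations so that no hop exceeds max_speed (meters per second).

    points: time-ordered list of (t_seconds, x_m, y_m).
    Each point is compared against the previous adjusted point, so a single
    outlier does not drag the following points with it at full distance.
    """
    if not points:
        return []
    smoothed = [points[0]]
    for t, x, y in points[1:]:
        prev_t, prev_x, prev_y = smoothed[-1]
        dist = math.hypot(x - prev_x, y - prev_y)
        allowed = max_speed * max(t - prev_t, 0)
        if dist > allowed:
            # dist is strictly positive here, so the division is safe.
            scale = allowed / dist
            x = prev_x + (x - prev_x) * scale
            y = prev_y + (y - prev_y) * scale
        smoothed.append((t, x, y))
    return smoothed


# A jump of 1000 m in 10 s is limited to 20 m/s, or 200 m per hop.
adjusted = clamp_to_speed([(0, 0, 0), (10, 1000, 0), (20, 1100, 0)], max_speed=20.0)
```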
[0146] After analyzing each segment in block 620, descriptors may
be added to each segment in block 634. The descriptors may include
transportation mode, average speed, and other metadata.
Demographic information may be added in block 636 describing the
user.
[0147] After analyzing each sequence in block 616, the trajectories
may be stored in block 638.
[0148] FIG. 7 is a flowchart illustration of an embodiment 700
showing a method for preparing trajectory mobility genes for
transmittal. The method of embodiment 700 may be merely one example
of how trajectories may be prepared for use.
[0149] Other embodiments may use different sequencing, additional
or fewer steps, and different nomenclature or terminology to
accomplish similar functions. In some embodiments, various
operations or sets of operations may be performed in parallel with
other operations, either in a synchronous or asynchronous manner.
The steps selected here were chosen to illustrate some principles
of operation in a simplified form.
[0150] Embodiment 700 may illustrate one method by which a request
for trajectory mobility genes may be fulfilled. The fulfillment
method may ensure that there may be a sufficient number of
trajectories such that individual trajectories may not be
separately identifiable. In some cases, the trajectories may also
be obfuscated.
[0151] A request for trajectory genes may be received in block
702.
[0152] The request may define a physical area of interest in block
704. The physical area of interest may be a specific physical
location, such as people traveling along a highway or people
traveling towards a sporting event. In some cases, the physical
area of interest may be a category, such as people going out to
eat, where the category may define the destination as any
restaurant.
[0153] A time frame of interest may be defined in block 704. The
number of available trajectories that meet the physical location
and time frame criteria may be determined in block 706. If the
number is below a predefined minimum number of trajectories in
block 708, the search parameters may be adjusted in block 710 to
include additional trajectories.
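
The check against a minimum number of trajectories, with adjustment of the search parameters, may be sketched in Python as follows. The sketch assumes that each trajectory is indexed by a single planar point in meters and that the only parameter adjusted is a search radius. The function name select_with_minimum and the growth factor are hypothetical.

```python
import math


def select_with_minimum(trajectories, center, radius, min_count, max_radius, growth=2.0):
    """Widen a search radius until at least min_count trajectories match.

    trajectories: list of dicts with planar "x" and "y" keys in meters.
    Returns (matches, radius_used). If min_count cannot be reached within
    max_radius, an empty list is returned rather than a small, possibly
    identifiable set.
    """
    if radius <= 0 or growth <= 1:
        raise ValueError("radius must be positive and growth must exceed 1")
    cx, cy = center
    while True:
        matches = [
            t for t in trajectories
            if math.hypot(t["x"] - cx, t["y"] - cy) <= radius
        ]
        if len(matches) >= min_count:
            return matches, radius
        if radius >= max_radius:
            return [], radius
        radius = min(radius * growth, max_radius)


candidates = [{"x": 10, "y": 0}, {"x": 50, "y": 0}, {"x": 120, "y": 0}, {"x": 300, "y": 0}]
found, used_radius = select_with_minimum(candidates, (0, 0), 20, 3, 500)
```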
[0154] The minimum number of trajectories may be selected for any
of many reasons. In some cases, a minimum number of trajectories
may allow a mobility gene to anonymize the data such that a single
trajectory may not be individually identified. In many cases, a
summarized demographic profile may be provided with the
trajectories, and when a low number of trajectories may be
provided, it may be possible to single out a trajectory as possibly
belonging to an outlier in the demographic profile.
[0155] Another reason for using a minimum number of trajectories
may be to ensure relatively accurate subsequent analyses. A small
set of trajectories may give highly skewed results in some cases,
and by having larger datasets, more meaningful results may be
calculated with higher confidence.
[0156] The trajectories meeting the criteria may be retrieved in
block 714. For each trajectory in block 716, the trajectory origins
or destinations may be obfuscated in block 718, and demographic
data may be collected in block 720.
[0157] The obfuscation of the trajectory may be accomplished by
several different methods. One way to obfuscate a trajectory may be
by truncating a trajectory. One use case may be to use trajectories
to determine the density of riders on a subway system. The density
may be derived from the number of trajectories from one train
station to the next, but the analysis does not need to include
origin and destination. By truncating the trajectories to just the
portion from one train station to the next, anonymity may be
preserved.
[0158] One way to obfuscate a trajectory may be to summarize an
origin or destination. A person may be personally identified when
that person begins or ends their journey from their home address.
In such cases, a trajectory may be anonymized by using a
centralized location as a substitute for a home address. For
example, a centralized location in a housing district may be
substituted for a user's home address in their trajectory. Such a
substitution may be made with a work address or some other origin
or destination.
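A minimal sketch of origin summarization, assuming the centralized location is taken as the arithmetic mean of a set of reference points in the district. The averaging choice and both function names are assumptions; any other representative point could be substituted.

```python
def centroid(points):
    """Arithmetic mean of (lat, lon) pairs; a reasonable approximation only over a small area."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return (sum(lats) / len(lats), sum(lons) / len(lons))


def summarize_origin(trajectory, district_points):
    """Return a copy of the trajectory whose first point is replaced by the district centroid."""
    return [centroid(district_points)] + list(trajectory[1:])
```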
[0159] Another way to obfuscate a trajectory may be to truncate a
trajectory at a common location near the origin or destination. For
example, a person who may travel by subway to their home may have
their trajectory truncated at the train station where they
alight.
[0160] After analyzing all of the trajectories in block 716, the
demographic data may be summarized for the group of trajectories in
block 722. The mobility genes may be transmitted in block 724.
[0161] FIG. 8 is a flowchart illustration of an embodiment 800
showing a method for creating visit mobility genes from trajectory
genes. The method of embodiment 800 may be merely one example of
how visit genes may be created.
[0162] Other embodiments may use different sequencing, additional
or fewer steps, and different nomenclature or terminology to
accomplish similar functions. In some embodiments, various
operations or sets of operations may be performed in parallel with
other operations, either in a synchronous or asynchronous manner.
The steps selected here were chosen to illustrate some principles
of operation in a simplified form.
[0163] Embodiment 800 may be one example of how to create a visit
mobility gene. A visit mobility gene may give various information
and statistics about people's visits to certain locations. In some
cases, a data consumer may wish to find information about people's
visits to a specific location, such as a shopping mall,
recreational venue, a specific coffee shop, or other location.
[0164] In other cases, a data consumer may wish to find information
about people's visits to certain classes of locations, such as fast
food restaurants, grocery stores, or some other category.
[0165] Embodiment 800 may be one way to identify visits from
trajectories. In this method, places where a person's trajectory
pauses or remains within a certain area may be considered visits.
Once a visit may be identified, the visit may be matched to a known
physical location, then the visit may be classified, and
demographics may be added.
[0166] The operations of embodiment 800 may be an example of an
analysis that may be performed any time a trajectory may be
generated. In some systems, trajectory mobility genes may be
constantly generated from recently generated data. As each
trajectory may be created, a visit analysis such as embodiment 800
may be performed to identify, classify, and store visits in a
database.
[0167] Trajectories may be received in block 802. For each
trajectory in block 804, a period of little movement may be
identified in block 806. The period of little movement may be
analyzed in block 808 to determine a length of visit. If the visit
does not exceed a minimum threshold in block 810, the visit may be
ignored in block 812.
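The identification of a period of little movement and the minimum-length test may be sketched as below. The sketch assumes points are (t, x, y) tuples with time in seconds and planar coordinates in meters, and it uses a simple anchor-and-radius rule; the radius, the threshold, and the function name are hypothetical.

```python
import math


def find_visits(points, radius, min_duration):
    """Find periods of little movement in a time-ordered list of (t, x, y).

    A period starts at an anchor point and extends while the following points
    stay within `radius` of that anchor. Periods whose duration does not
    exceed `min_duration` are ignored. Returns (start_time, end_time) tuples.
    """
    visits = []
    i = 0
    while i < len(points):
        t0, x0, y0 = points[i]
        j = i
        while j + 1 < len(points) and math.hypot(
            points[j + 1][1] - x0, points[j + 1][2] - y0
        ) <= radius:
            j += 1
        if points[j][0] - t0 > min_duration:
            visits.append((t0, points[j][0]))
        i = j + 1
    return visits
```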
[0168] When the visit exceeds a threshold in block 810, an attempt
may be made to identify a home or work location in block 814. The
home or work location of a person may be visited very frequently,
typically every day.
[0169] The home and work location of a person may be a special
category of locations for several reasons. For example, many
movement studies may involve people's movements to and from work or
home. As another example, home and work locations may be a way to
identify a trajectory as belonging to a specific person.
[0170] If a match for home or work is made in block 816, the visit
may be marked as home or work in block 818. When the visit is not
to home or work, an attempt may be made in block 820 to match the
visit to a known location. If there is a match in block 822, the
visit may be marked with the location in block 824.
[0171] The matching in block 820 may attempt to match a visit
to a business, organization, physical feature such as a park, or
some other metadata about a location. Such metadata may enrich the
data stored for a visit. For example, a visit near a grocery store
that takes 20 minutes or so may be classified as a visit to the
grocery store. Such grocery store visits may be searched and
aggregated into a visit mobility gene for further analysis.
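One way to sketch the matching of a visit to a known location is a nearest-neighbor lookup with a distance cutoff. The planar coordinates, the cutoff, and the function name are assumptions for illustration.

```python
import math


def match_visit(visit_xy, known_locations, max_distance):
    """Return the name of the nearest known location within max_distance, or None.

    `known_locations` maps a location name to its (x, y) position.
    """
    best_name, best_dist = None, max_distance
    for name, (x, y) in known_locations.items():
        dist = math.hypot(visit_xy[0] - x, visit_xy[1] - y)
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name
```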
[0172] The visit type and duration may be classified in block 826
and demographic information may be added in block 828. The visit
mobility gene information may be stored in block 830.
[0173] FIG. 9 is a flowchart illustration of an embodiment 900
showing a second method for creating visit mobility genes. The
method of embodiment 900 may be merely one example of how visit
mobility genes may be created from raw location observations.
[0174] Other embodiments may use different sequencing, additional
or fewer steps, and different nomenclature or terminology to
accomplish similar functions. In some embodiments, various
operations or sets of operations may be performed in parallel with
other operations, either in a synchronous or asynchronous manner.
The steps selected here were chosen to illustrate some principles
of operation in a simplified form.
[0175] Embodiment 900 may be another way of identifying and
classifying visits as part of a visit mobility gene. In this
method, a set of locations is given, and the raw observation data
may be searched to find occasions where the location was visited.
From these data points, various aspects of a visit mobility gene
may be derived.
[0176] Raw location observations may be received in block 902, as
well as a set of locations of interest in block 904.
[0177] For each location of interest in block 906, raw location
observations meeting the location criteria may be found in block
908. The user identifications for those observations may be found
in block 910.
[0178] For each user identification in block 912, a length of stay
may be determined in block 914. If the stay does not exceed a
minimum value in block 916, the visit may be ignored in block
918.
[0179] When the visit does exceed the minimum value in block 916,
the demographic information about the user may be gathered in block
918.
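The search of raw observations for a location of interest, followed by the length-of-stay test, may be sketched as follows. The sketch assumes observations are (user_id, t, x, y) tuples and takes the stay as the span between a user's first and last observation near the location, which is a simplification; all names and parameters are hypothetical.

```python
import math
from collections import defaultdict


def stays_at_location(observations, center, radius, min_stay):
    """Compute per-user length of stay near a location of interest.

    Observations farther than `radius` from `center` are discarded. Stays
    that do not exceed `min_stay` are dropped from the result.
    """
    times = defaultdict(list)
    for user_id, t, x, y in observations:
        if math.hypot(x - center[0], y - center[1]) <= radius:
            times[user_id].append(t)
    stays = {user: max(ts) - min(ts) for user, ts in times.items()}
    return {user: stay for user, stay in stays.items() if stay > min_stay}
```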
[0180] An inbound trajectory may be calculated in block 920 and an
outbound trajectory may be determined in block 922. The inbound and
outbound trajectories may be useful to help understand a visitor's
movements before and after the visit.
[0181] In some cases, the visit information may be anonymized. For
example, inbound and outbound trajectories may be truncated or
otherwise obfuscated. The visit data may be stored in block
928.
[0182] FIG. 10 is a diagram illustration of an embodiment 1000
showing a method for determining a trajectory pathway. Embodiment
1000 illustrates high level steps for calculating a movement
trajectory that may be mapped to physical thoroughfares.
[0183] A physical map 1002 may contain a street map 1004, which may
show various roads and highways 1006.
[0184] From the physical map 1002, a graph 1008 of transport
pathways may be generated. The graph 1008 may be a description of
the physical map 1002 that may be interpreted and analyzed by a
computer. In many cases, the graph 1008 may have edges representing
segments of thoroughfares, and the nodes may be intersections
between the segments. The graph may be a computer-readable
structure that may be traversed and analyzed by a computer when
attempting to determine a user's physical path through the map
1002.
[0185] For each time period in a trajectory segment, candidate
locations may be determined 1010. A location 1012 may be received
from a telecommunications network, which may be represented by
several candidate locations 1014. The location 1012 may have an
accuracy or tolerance, which may be used to select several
candidate locations 1014. The trajectory path may be calculated
1016 by finding an optimized route 1018 through the sequence of
candidate locations.
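The selection of candidate locations from a reported location and its tolerance may be sketched as a radius filter over points taken from the transport graph. The planar coordinates and the function name are assumptions.

```python
import math


def candidate_locations(observed_xy, tolerance, graph_points):
    """Return the graph points lying within `tolerance` of an observed location."""
    ox, oy = observed_xy
    return [
        p for p in graph_points
        if math.hypot(p[0] - ox, p[1] - oy) <= tolerance
    ]
```

A larger tolerance, as with tower-level data, yields more candidates per time period, while a small tolerance, as with GPS data, yields few.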
[0186] FIG. 11 is a diagram illustration of an embodiment 1100
showing a calculated trajectory 1112 generated from a sequence of
locations.
[0187] A map 1102 may illustrate the position of several locations
1104, 1106, 1108, and 1110. Each of the locations 1104, 1106, 1108,
and 1110 may represent individual positions captured by a
telecommunications network in successive time intervals. From these
positions, a calculated trajectory 1112 may be generated that maps
to the user traversing the highway 1114.
[0188] The locations 1104, 1106, 1108, and 1110 may be located away
from the highway 1114. Such offsets may arise from the
inaccuracies of the location data that may be provided by a
telecommunications network.
[0189] A telecommunications network may provide location data
through several different mechanisms. Some networks may capture
Global Positioning System (GPS) data that may originate at a user's
device, then may be transmitted to the telecommunications network.
Such GPS data may tend to be more accurate than other forms of
location data. In such systems, each time period may have a much
smaller set of candidate physical locations for analysis than with
other location data.
[0190] Some networks may not capture data generated at a user
device, but may capture data that may be detected through the
network itself. In the coarsest type of location data, a network
may merely capture the location of the tower to which a user device
may be connected. The tower location may cover individual cells
that may be many kilometers in size, yielding highly inaccurate
location data. Other networks may have various methods of
triangulating a user's location using signals from two, three, or
more towers.
[0191] With each type of location data, different levels of
inaccuracies or tolerance may be assumed. For GPS-generated
location data, the accuracies may be relatively high. In such
cases, a user's candidate positions for a given set of location
coordinates may be tightly focused around the coordinates. Some
systems may assign a probability factor for candidate positions
closest to the coordinate locations, with a higher probability
factor allocated for closer candidate positions and lower
probability factors for further candidates.
[0192] For location data that may identify merely a tower location
or a cell serviced by a tower, the candidate locations may be any
location within the area serviced by the tower or the cell. In some
cases, each of the candidate locations may be assigned the same
probability.
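The two weighting schemes may be sketched in one function. The Gaussian falloff used for the distance-based case is an assumed choice, since the description says only that closer candidates receive a higher probability factor; `sigma` and the function name are hypothetical.

```python
import math


def candidate_weights(observed_xy, candidates, sigma=None):
    """Assign normalized weights to candidate positions.

    With `sigma` set, closer candidates get a higher weight (Gaussian falloff).
    With `sigma=None`, every candidate gets the same weight, as may be done
    for tower-level or cell-level data.
    """
    if not candidates:
        return []
    if sigma is None:
        raw = [1.0] * len(candidates)
    else:
        raw = [
            math.exp(
                -((c[0] - observed_xy[0]) ** 2 + (c[1] - observed_xy[1]) ** 2)
                / (2 * sigma ** 2)
            )
            for c in candidates
        ]
    total = sum(raw)
    if total == 0:
        # All weights underflowed to zero; fall back to equal weights.
        return [1.0 / len(candidates)] * len(candidates)
    return [w / total for w in raw]
```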
[0193] From the analysis of embodiment 1100, the various locations
1104, 1106, 1108, and 1110 may represent cell towers or cells in
which the user may have been traveling, yet the most likely
calculated trajectory 1112 may be along the highway 1114. Such a
situation often occurs in trajectories generated from
telecommunications networks.
[0194] A calculated trajectory 1112 may be useful for further
analysis. For example, traffic density and speeds along the highway
1114 may be measured. Such analysis may not have been previously
possible with the highly inaccurate and sparse data that may come
from a telecommunications network.
[0195] FIG. 12 is a diagram of an embodiment 1200 showing
components that may analyze raw location data and provide analyzed
trajectories for subsequent analyses. The example of embodiment
1200 is merely one topology that may be used to analyze raw
location data.
[0196] The diagram of FIG. 12 illustrates functional components of
a system. In some cases, the component may be a hardware component,
a software component, or a combination of hardware and software.
Some of the components may be application level software, while
other components may be execution environment level components. In
some cases, the connection of one component to another may be a
close connection where two or more components are operating on a
single hardware platform. In other cases, the connections may be
made over network connections spanning long distances. Each
embodiment may use different hardware, software, and
interconnection architectures to achieve the functions
described.
[0197] Embodiment 1200 illustrates a device 1202 that may have a
hardware platform 1204 and various software components. The device
1202 as illustrated represents a conventional computing device,
although other embodiments may have different configurations,
architectures, or components.
[0198] In many embodiments, the device 1202 may be a server
computer. In some embodiments, the device 1202 may also be a
desktop computer, laptop computer, netbook computer, tablet or
slate computer, wireless handset, cellular telephone, game console
or any other type of computing device. In some embodiments, the
device 1202 may be implemented on a cluster of computing devices,
which may be a group of physical or virtual machines.
[0199] The hardware platform 1204 may include a processor 1208,
random access memory 1210, and nonvolatile storage 1212. The
hardware platform 1204 may also include a user interface 1214 and
network interface 1216.
[0200] The random access memory 1210 may be storage that contains
data objects and executable code that can be quickly accessed by
the processors 1208. In many embodiments, the random access memory
1210 may have a high-speed bus connecting the memory 1210 to the
processors 1208.
[0201] The nonvolatile storage 1212 may be storage that persists
after the device 1202 is shut down. The nonvolatile storage 1212
may be any type of storage device, including hard disk, solid state
memory devices, magnetic tape, optical storage, or other type of
storage. The nonvolatile storage 1212 may be read only or
read/write capable. In some embodiments, the nonvolatile storage
1212 may be cloud based, network storage, or other storage that may
be accessed over a network connection.
[0202] The user interface 1214 may be any type of hardware capable
of displaying output and receiving input from a user. In many
cases, the output display may be a graphical display monitor,
although output devices may include lights and other visual output,
audio output, kinetic actuator output, as well as other output
devices. Conventional input devices may include keyboards and
pointing devices such as a mouse, stylus, trackball, or other
pointing device. Other input devices may include various sensors,
including biometric input devices, audio and video input devices,
and other sensors.
[0203] The network interface 1216 may be any type of connection to
another computer. In many embodiments, the network interface 1216
may be a wired Ethernet connection. Other embodiments may include
wired or wireless connections over various communication
protocols.
[0204] The software components 1206 may include an operating system
1218 on which various software components and services may
operate.
[0205] A map processor 1220 may create graphs from a map of a
physical transportation system. The graphs may be stored in a graph
database 1222 and may be used by a map matcher 1224 to associate a
location trajectory to a set of physical locations. In many cases,
a map matcher 1224 may generate a sequence of roads, highways,
pathways, train lines, or other thoroughfares that may correspond
with user trajectories. The map processor 1220 may retrieve various
maps from a remote map database 1234, which may be accessible over
a network 1232.
[0206] The graphs may be computer-searchable representations of
maps of physical transportation networks. A graph may be created
for different modes of transportation, such as a graph of a train
network, a bus network, a road system, a ferry system, a bicycle
path network, pedestrian walkways, and any other transportation
mode. The graphs may represent the physical world by matching the
graph nodes to intersections and the graph edges to a thoroughfare.
Once represented as a graph, the map matcher 1224 may be able to
find a sequence of physical positions that correspond with
locations that may have been observed for a device.
[0207] The map matcher 1224 may attempt to find a physically
logical sequence for a trajectory. The physically logical sequence
may involve finding a path that makes sense based on the speed,
mode of transportation, or other factors in the data. For example,
a trajectory that results in a user moving at three times the speed
limit of a side street may be impractical or impossible, so the
sequence may be recomputed with a trajectory that traverses a
highway with a much faster speed limit.
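The speed test in the example above may be sketched as a comparison of the implied speed against the speed limit of the matched thoroughfare. The tolerance multiplier and the function names are assumptions.

```python
def implied_speed_kmh(distance_m, seconds):
    """Speed implied by covering distance_m meters in the given number of seconds."""
    return (distance_m / seconds) * 3.6


def is_plausible(distance_m, seconds, speed_limit_kmh, tolerance=1.5):
    """True when the implied speed does not exceed the speed limit times `tolerance`."""
    return implied_speed_kmh(distance_m, seconds) <= speed_limit_kmh * tolerance
```

For instance, 1500 meters covered in 60 seconds implies roughly 90 km/h, which may be implausible on a side street with a 30 km/h limit but plausible on a highway with a 90 km/h limit.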
[0208] A trajectory generator 1226 may generate a trajectory from a
sequence of location data. The location data may come from a
telecommunications network 1236, which may provide a set of device
locations 1238. The device locations 1238 may be constructed into
trajectories, which may contain a sequence of location coordinates
for individual devices. The location coordinates may be timestamped
so that the coordinates may be arranged by time sequence.
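Constructing trajectories from timestamped device locations may be sketched as a group-and-sort step. The record layout (device_id, timestamp, lat, lon) and the function name are assumptions.

```python
from collections import defaultdict


def build_trajectories(device_locations):
    """Group (device_id, timestamp, lat, lon) records into time-ordered trajectories."""
    by_device = defaultdict(list)
    for device_id, timestamp, lat, lon in device_locations:
        by_device[device_id].append((timestamp, lat, lon))
    # Tuples sort by their first element, so each list ends up in time sequence.
    return {device: sorted(points) for device, points in by_device.items()}
```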
[0209] The trajectory generator 1226 may store trajectories in a
trajectory database 1228. A map matcher 1224 may analyze
trajectories from the trajectory database 1228 to create analyzed trajectories 1230. The
analyzed trajectories may include a sequence of thoroughfares
traveled by a user.
[0210] A user device 1240 may represent a device that may operate
within a telecommunications network or other network and from which
a sequence of locations may be generated. A typical user device
1240 may be a cellular telephone, but other devices may be portable
laptop computers, tablet computers, wearable computer devices, or
any other mobile device that may operate on any type of hardware
platform 1242. In some cases, the device may have an internal
location detector 1244, which may generate a location history 1246.
The location history 1246 may be used by a trajectory generator
1226 to create trajectories.
[0211] In some cases, the user device 1240 may be recognized and
tracked by a telecommunications network 1236 without the device
having a location detector 1244. In such a mode, the
telecommunications network 1236 may detect a device and its
location by identifying the device within the network 1236.
[0212] FIG. 13 is a flowchart illustration of an embodiment 1300
showing a method for creating transportation graphs. The method of
embodiment 1300 is merely one example of how to convert a
geographic map to a graph that may be used for analyzing
trajectories.
[0213] Other embodiments may use different sequencing, additional
or fewer steps, and different nomenclature or terminology to
accomplish similar functions. In some embodiments, various
operations or sets of operations may be performed in parallel with
other operations, either in a synchronous or asynchronous manner.
The steps selected here were chosen to illustrate some principles
of operation in a simplified form.
[0214] A map may be retrieved in block 1302. Thoroughfares may be
identified in block 1304, along with intersections of the
thoroughfares in block 1306. For each intersection in block 1308, a
node may be created in the graph in block 1310. For each
thoroughfare in block 1312, a graph edge may be created in block
1314.
[0215] Various characteristics of the thoroughfare may be
determined in block 1316 and stored with the edge in block 1318.
Characteristics may include information such as the speed limit,
direction of travel, transportation modes, as well as distance and
other information.
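The graph construction may be sketched with an adjacency mapping in which intersections are nodes and thoroughfare segments are edges carrying their characteristics. The dictionary keys ('from', 'to', 'length_m', 'speed_limit_kmh', 'one_way') and the function name are hypothetical.

```python
def build_graph(thoroughfares):
    """Build a graph from thoroughfare segments.

    Returns (nodes, edges), where `nodes` is a set of intersection ids and
    `edges[a][b]` holds the characteristics of the segment from a to b.
    Segments not marked one-way are added in both directions.
    """
    nodes = set()
    edges = {}
    for seg in thoroughfares:
        a, b = seg["from"], seg["to"]
        nodes.update((a, b))
        attrs = {
            "length_m": seg["length_m"],
            "speed_limit_kmh": seg["speed_limit_kmh"],
        }
        edges.setdefault(a, {})[b] = attrs
        if not seg.get("one_way", False):
            edges.setdefault(b, {})[a] = attrs
    return nodes, edges
```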
[0216] The graph may be stored in block 1320. In many cases, a
different graph may be created for each mode of transportation. For
example, a graph may be created for pedestrian travel, and separate
graphs for travel by car, bus, train, ferry, bicycle, or other
transportation mode.
[0217] FIG. 14 is a flowchart illustration of an embodiment 1400
showing a method for analyzing trajectory paths. The method of
embodiment 1400 may be merely one example of how to map a set of
location observations to actual, physical locations that a user may
have traversed in their trajectory.
[0218] Other embodiments may use different sequencing, additional
or fewer steps, and different nomenclature or terminology to
accomplish similar functions. In some embodiments, various
operations or sets of operations may be performed in parallel with
other operations, either in a synchronous or asynchronous manner.
The steps selected here were chosen to illustrate some principles
of operation in a simplified form.
[0219] A trajectory may be received in block 1401. From the
trajectory, a transportation mode may be determined in block 1402,
and a corresponding transportation graph may be retrieved in block
1404.
[0220] For each element in the trajectory segment in block 1406, a
set of candidate physical locations may be determined in block
1408. Weights may be applied to each candidate location in block
1410.
[0221] For some systems, a weighted probability of the candidate
locations may be applied. For example, some systems may apply a
probability factor for candidate locations that may be a high
probability near the location coordinates with a lower probability
for candidate locations further away from the location coordinates.
Such a system may be used when analyzing GPS location data.
[0222] The candidate locations may be stored in a time sequence in
block 1412.
[0223] A path through the candidate locations may be generated in
block 1414. The path may be generated by finding an optimized route
through the candidate locations, such as optimizing by time,
distance, minimum number of turns, or some other optimization
function.
[0224] Once the candidate locations may be coalesced into a path in
block 1414, a sequence of thoroughfares may be determined in block
1416. The sequence may be analyzed in block 1418 for
inconsistencies. The inconsistencies may be items such as large
changes in speed, excessive speed for certain thoroughfares, large
directional changes, or some other factor that may be inconsistent
with basic physics or otherwise unlikely to occur.
[0225] For each inconsistency in block 1420, a determination may be
made whether the inconsistency may be physically impossible in
block 1422. If the inconsistency is physically impossible in block
1424, the candidate locations causing the impossibility may be
removed from consideration in block 1426. If the inconsistency may
be physically possible in block 1424, the candidate points may be
de-emphasized in block 1428. One mechanism for de-emphasizing may
be to give the candidate location a lower probability, for
example.
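The handling of inconsistencies may be sketched as removing candidates flagged as physically impossible and lowering the weight of candidates flagged as merely unlikely. The data layout, the penalty factor, and the function name are assumptions.

```python
def adjust_candidates(weights, inconsistencies, penalty=0.5):
    """Return a copy of `weights` adjusted for flagged candidates.

    `weights` maps a candidate id to its probability weight. `inconsistencies`
    maps a candidate id to True when physically impossible (candidate removed)
    or False when possible but unlikely (weight multiplied by `penalty`).
    """
    adjusted = dict(weights)
    for candidate, impossible in inconsistencies.items():
        if candidate not in adjusted:
            continue
        if impossible:
            del adjusted[candidate]
        else:
            adjusted[candidate] *= penalty
    return adjusted
```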
[0226] After analyzing the inconsistencies in block 1420, if there
are existing inconsistencies in block 1430, the process may return
to block 1414 to re-calculate a path. Once the inconsistencies have
been eliminated in block 1430, intermediate locations may be
interpolated into the trajectory path in block 1432 and stored as
part of a trajectory segment in block 1434.
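Interpolation of intermediate locations may be sketched with straight-line interpolation between two consecutive points. This is a simplifying assumption; an implementation could instead interpolate along the matched thoroughfare. The (t, x, y) layout and the function name are hypothetical.

```python
def interpolate(p0, p1, steps):
    """Return `steps` evenly spaced intermediate (t, x, y) points between p0 and p1."""
    points = []
    for i in range(1, steps + 1):
        fraction = i / (steps + 1)
        points.append(tuple(a + fraction * (b - a) for a, b in zip(p0, p1)))
    return points
```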
[0227] FIG. 15 is a diagram illustration of an embodiment 1500
showing a theoretical view of a location analysis.
[0228] Three different time periods may be illustrated as time T
1502, time T+1 1504, and time T+2 1506. Time T 1502 may have a set
of candidate locations 1508, time T+1 1504 may have a set of
candidate locations 1510, and time T+2 1506 may have a set of
candidate locations 1512.
[0229] A minimizing function or other optimization technique may be
used to find an optimum path 1514 through the candidate locations.
The optimum path 1514 may represent the best fit of a path through
the candidate locations for each time period. Once the optimum path
1514 may be determined and verified for any inconsistencies, path
1514 may represent the most likely path that a user may have
traversed for the data in a trajectory.
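One possible minimizing function may be sketched with dynamic programming over the time periods, choosing one candidate per period so that the total straight-line distance is smallest. The distance objective, the planar coordinates, and the function name are assumptions; the description also permits optimizing by time, number of turns, or other functions.

```python
import math


def optimum_path(layers):
    """Pick one candidate per time period so that total travelled distance is minimal.

    `layers` is a list of candidate lists, one per time period, where each
    candidate is an (x, y) tuple. Returns the chosen candidates in time order.
    """
    if not layers or any(not layer for layer in layers):
        return []
    # cost[i]: best total distance of any path ending at candidate i of the current layer.
    cost = [0.0] * len(layers[0])
    # back[t][i]: index in layer t of the best predecessor of candidate i in layer t + 1.
    back = []
    for prev, cur in zip(layers, layers[1:]):
        new_cost, pointers = [], []
        for c in cur:
            options = [cost[k] + math.dist(p, c) for k, p in enumerate(prev)]
            best = min(range(len(prev)), key=options.__getitem__)
            new_cost.append(options[best])
            pointers.append(best)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest candidate in the final layer.
    idx = min(range(len(cost)), key=cost.__getitem__)
    path = [layers[-1][idx]]
    for t in range(len(back) - 1, -1, -1):
        idx = back[t][idx]
        path.append(layers[t][idx])
    return path[::-1]
```

Candidate weights could be folded into the cost as an additional term, so that de-emphasized candidates are less likely to be selected when the path is recalculated.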
[0230] The foregoing description of the subject matter has been
presented for purposes of illustration and description. It is not
intended to be exhaustive or to limit the subject matter to the
precise form disclosed, and other modifications and variations may
be possible in light of the above teachings. The embodiment was
chosen and described in order to best explain the principles of the
invention and its practical application to thereby enable others
skilled in the art to best utilize the invention in various
embodiments and various modifications as are suited to the
particular use contemplated. It is intended that the appended
claims be construed to include other alternative embodiments except
insofar as limited by the prior art.
* * * * *