U.S. patent application number 17/069166 was filed with the patent office on 2020-10-13 and published on 2021-01-28 for system and method for identifying contacts of a target user in a social network.
The applicant listed for this patent is Verint Systems Ltd. Invention is credited to Tal Beja, Zahy Bnaya, Yuval Elovici, Ariel Felner, Rami Puzis, Lior Rokach, Liron Samama, Roni Stern.
Application Number | 17/069166
Publication Number | 20210026909
Family ID | 1000005147127
Filed Date | 2020-10-13
Publication Date | 2021-01-28
United States Patent Application | 20210026909
Kind Code | A1
Puzis; Rami; et al. | January 28, 2021
SYSTEM AND METHOD FOR IDENTIFYING CONTACTS OF A TARGET USER IN A
SOCIAL NETWORK
Abstract
When using Web intelligence ("Webint") to collect information
regarding a target social network user, one of the most valuable
pieces of information is the target user's List-Of-Friends (LOF).
In some cases, however, the LOF of the target user is not
accessible in his profile. Herein are described methods and systems
for identifying the LOF of a target user. An analysis system crawls
the profiles of social network users, other than the target user,
and reconstructs the LOF of the target user from the crawled
profiles.
Inventors:
Puzis; Rami (Ashdod, IL)
Stern; Roni (Modi'in, IL)
Rokach; Lior (Omer, IL)
Elovici; Yuval (Arugot, IL)
Beja; Tal (Ashdod, IL)
Felner; Ariel (Yad Binyamin, IL)
Bnaya; Zahy (Tel Aviv, IL)
Samama; Liron (Beer Sheva, IL)
Applicant:
Name | City | State | Country | Type
Verint Systems Ltd. | Herzliya Pituach | | IL |
Family ID: 1000005147127
Appl. No.: 17/069166
Filed: October 13, 2020
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
15454357 | Mar 9, 2017 | 10866998
17069166 | |
14065505 | Oct 29, 2013 | 9646245
15454357 | |
Current U.S. Class: 1/1
Current CPC Class: G06F 16/24575 20190101; G06F 16/9535 20190101; G06N 20/00 20190101; G06F 16/2379 20190101; G06Q 50/01 20130101; H04L 67/306 20130101; G06N 5/003 20130101
International Class: G06F 16/9535 20060101 G06F016/9535; G06N 5/00 20060101 G06N005/00; G06N 20/00 20060101 G06N020/00; G06F 16/23 20060101 G06F016/23; G06F 16/2457 20060101 G06F016/2457; H04L 29/08 20060101 H04L029/08; G06Q 50/00 20060101 G06Q050/00
Foreign Application Data
Date | Code | Application Number
Oct 29, 2012 | IL | 222743
Claims
1. A method for determining friends of a target user in a social
network, the method comprising: receiving an identity of the target
user in the social network; receiving a first list of friends of
the target user, wherein the first list of friends comprises one or
more friends of the target user in the social network; for each
friend in the first list of friends: obtaining, from a profile
associated with the friend in the social network, one or more
friends of the friend in the social network; and adding the one or
more friends of the friend in the social network to a second list
of friends; selecting a friend from the second list of friends using
a heuristic; determining, from a profile associated with the
selected friend, that the selected friend is a friend of the target
user in the social network; and in response to the determination,
adding the selected friend to the first list of friends.
2. The method of claim 1, further comprising repeating the
selecting, determining, and adding using a new friend selected from
the second list of friends.
3. The method of claim 2, further comprising repeating until a
termination condition is met.
4. The method of claim 2, wherein the termination condition is a
number of iterations.
5. The method of claim 4, wherein the number of iterations is below
a value that triggers the social network to utilize throttling and
anti-scraping schemes.
6. The method of claim 1, wherein determining, from the profile
associated with the selected friend, that the selected friend is a
friend of the target user in the social network comprises: querying
the profile associated with the selected friend for the target user
and finding the target user.
7. The method according to claim 1, wherein the heuristic comprises
a similarity in attribute values between a profile associated with
the target user and profiles associated with each friend in the
second list of friends.
8. The method of claim 1, wherein the heuristic comprises a number
of other contacts of the target user that each friend in the second
list of friends is connected to.
9. The method of claim 1, further comprising: updating a
currently-known graph (CKG) describing the relationships of
profiles in the social network using the first list of friends.
10. The method of claim 1, further comprising: determining, from
the profile associated with the selected friend, that the selected
friend is not a friend of the target user in the social network;
and in response to the determination: removing the selected friend
from the second list of friends; and selecting another friend from
the second list of friends using the heuristic.
11. A system for determining friends of a target user in a social
network, the system comprising: an interface, communicating with a
social network; and a processor communicatively coupled to the
interface and executing computer-readable instructions that cause
the processor to: receive an identity of the target user in the
social network; receive a first list of friends of the target user,
wherein the first list of friends comprises one or more friends of
the target user in the social network; for each friend in the first
list of friends: obtain, from a profile associated with the friend
in the social network, one or more friends of the friend in the
social network; and add the one or more friends of the friend in
the social network to a second list of friends; select a friend from
the second list of friends using a heuristic; determine, from a
profile associated with the selected friend, that the selected
friend is a friend of the target user in the social network; and in
response to the determination, add the selected friend to the first
list of friends.
12. The system of claim 11, further comprising computer-readable
instructions that cause the processor to: repeat the selecting,
determining, and adding using a new friend selected from the second
list of friends.
13. The system of claim 12, further comprising computer-readable
instructions that cause the processor to: repeat until a
termination condition is met.
14. The system of claim 12, wherein the termination condition is a
number of iterations.
15. The system of claim 14, wherein the number of iterations is
below a value that triggers the social network to utilize
throttling and anti-scraping schemes.
16. The system of claim 11, wherein determining, from the profile
associated with the selected friend, that the selected friend is a
friend of the target user in the social network comprises: querying
the profile associated with the selected friend for the target user
and finding the target user.
17. The system of claim 11, wherein the heuristic comprises a
similarity in attribute values between a profile associated with
the target user and profiles associated with each friend in the
second list of friends.
18. The system of claim 11, wherein the heuristic comprises a
number of other contacts of the target user that each friend in the
second list of friends is connected to.
19. The system of claim 11, further comprising computer-readable
instructions that cause the processor to: update a currently-known
graph (CKG) describing the relationships of profiles in the social
network using the first list of friends.
20. The system of claim 11, further comprising computer-readable
instructions that cause the processor to: determine, from the
profile associated with the selected friend, that the selected
friend is not a friend of the target user in the social network;
and in response to the determination: remove the selected friend
from the second list of friends; and select another friend from the
second list of friends using the heuristic.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. patent
application Ser. No. 15/454,357, filed on Mar. 9, 2017, which is a
continuation of U.S. patent application Ser. No. 14/065,505 (now
U.S. Pat. No. 9,646,245), filed on Oct. 29, 2013, and claims
foreign priority to Israel Patent Application No. IL 222743 filed
on Oct. 29, 2012. These applications and patents are hereby
incorporated by reference in their entirety.
FIELD OF THE DISCLOSURE
[0002] The present disclosure relates generally to Web
intelligence, and particularly to methods and systems for analyzing
social network activity.
BACKGROUND OF THE DISCLOSURE
[0003] In recent years, social networks such as Facebook, Twitter,
LinkedIn, YouTube and others have become a popular scene for
interaction between network users. These social networks accumulate
large amounts of information regarding the users and the
interactions between them. As such, social networks can be used for
obtaining valuable information regarding target users.
SUMMARY OF THE DISCLOSURE
[0004] An embodiment that is described herein provides a method
including receiving a request to identify a group of users of one
or more social networks, who are defined as contacts of a target
user of the social networks. One or more profiles of respective
users of the social networks, other than the target user, are
analyzed responsively to the request, and the contacts of the
target user are reconstructed from the analyzed profiles.
[0005] In an embodiment, a list of the contacts of the target user
is inaccessible in a profile of the target user. In a disclosed
embodiment, analyzing the profiles includes initially receiving a
subset of the contacts of the target user, and expanding the subset
by iteratively crawling the profiles of the contacts in the subset.
Typically, analyzing the profiles includes selecting a potential
contact of the target user from among currently-known profiles,
and, upon identifying that the potential contact is connected to
the target user, adding the potential contact to the reconstructed
group.
[0006] In some embodiments the method includes, upon identifying
that the potential contact is connected to the target user,
retrieving the contacts of the potential contact and adding the
profiles of the contacts of the potential contact to the
currently-known profiles. The method may include iteratively
selecting the potential contact, identifying whether the potential
contact is connected to the target user and updating the
currently-known profiles, until meeting a termination
condition.
[0007] In an example embodiment, selecting the potential contact
includes choosing the potential contact from among the
currently-known profiles at random. In another embodiment,
selecting the potential contact includes choosing an oldest profile
among the currently-known profiles that is not previously
selected.
[0008] In yet another embodiment, selecting the potential contact
includes choosing a profile having a maximum similarity in
attribute values to the profile of the target user among the
currently-known profiles. In still another embodiment, selecting
the potential contact includes choosing a profile that is connected
to a contact of the target user that is connected to a largest
number of the contacts of the target user.
[0009] In an embodiment, selecting the potential contact includes
defining first and second heuristics for selection of the potential
contact, training a classifier to optimally combine the first and
second heuristics using training data, and choosing the potential
contact based on the optimally-combined heuristics. Training the
classifier may include testing a default heuristic over a given
number of initial potential contacts of the target user to produce
training results, and combining the first and second heuristics
based on the training results. Alternatively, training the
classifier may include combining the first and second heuristics
based on one or more previous searches for LOFs of other target
users.
[0010] In some embodiments, selecting the potential contact
includes defining for each of the currently-known profiles a
respective score that is indicative of a likelihood of the
currently-known profile to be connected to a contact of the target
user that is connected to other contacts of the target user, and
choosing the potential contact based on respective scores of the
currently-known profiles.
[0011] There is additionally provided, in accordance with an
embodiment that is described herein, apparatus including an
interface and a processor. The interface is configured to
communicate with one or more social networks. The processor is
configured to receive a request to identify a group of users of the
social networks who are defined as contacts of a target user of the
social networks, to analyze, responsively to the request, one or
more profiles of respective users of the social networks other than
the target user, and to reconstruct the contacts of the target user
from the analyzed profiles.
[0012] The present disclosure will be more fully understood from
the following detailed description of the embodiments thereof,
taken together with the drawings in which:
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] FIG. 1 is a block diagram that schematically illustrates a
system for social network analysis, in accordance with an
embodiment that is described herein;
[0014] FIG. 2 is a graph that schematically illustrates
relationships between profiles of a social network, in accordance
with an embodiment that is described herein; and
[0015] FIG. 3 is a flow chart that schematically illustrates a
method for identifying a List-Of-Friends (LOF) of a target social
network user, in accordance with an embodiment that is described
herein.
DETAILED DESCRIPTION OF EMBODIMENTS
Overview
[0016] When using Web intelligence ("Webint") to collect
information regarding a target social network user, one of the most
valuable pieces of information is the target user's List-Of-Friends
(LOF). The LOF lists the social network users who are defined as
contacts of the target user, and is typically stored as part of the
target user's profile. In various networks the contacts are also
referred to as friends, links, connections, followers, or similar
terms.
[0017] In some cases, however, the LOF of the target user is not
accessible in his profile, e.g., because of the privacy policy
defined for the profile. This scenario is particularly common for
users who try to hide the identities of their contacts. Certain
aspects of profiling target users based on social network analysis
are also addressed in Israel Patent Application 216059, filed Oct.
31, 2011, which is assigned to the assignee of the present patent
application and whose disclosure is incorporated herein by
reference.
[0018] Embodiments that are described herein provide improved
methods and systems for identifying the LOF of a target user. In
the disclosed embodiments, an analysis system crawls the profiles
of social network users, other than the target user, and
reconstructs the LOF of the target user from the crawled profiles.
This technique is effective since the LOFs of the target user's
contacts may not be hidden.
[0019] Several example techniques for analyzing the contact
environment of the target user are described herein. In a typical
flow, the system is provided with an initial subset of the target
user's contacts, referred to as initial leads. (The terms "leads"
and "contacts of the target user" are used interchangeably herein.)
The system acquires the LOFs of the initial leads, and defines the
users listed in these LOFs as "potential leads," i.e., potential
contacts of the target user. The system then attempts to identify
which of the potential leads are indeed contacts of the target user
("leads"). The newly-identified leads are added to the initial
leads, and the process continues iteratively in a similar
manner.
[0020] When searching for new leads in order to reconstruct the LOF
of the target user, it is important to define how to select
potential leads for evaluation from the increasingly-growing number
of new potential leads. The order in which potential leads are
evaluated has a considerable impact on the search performance,
i.e., on the ability to identify a maximum number of genuine leads
within allocated time and/or resources.
[0021] In particular, the disclosed techniques are highly effective
in reconstructing the LOF of a target user while crawling and
accessing only a small number of user profiles. As such, these
techniques are useful in minimizing the exposure to throttling and
anti-scraping schemes that may be used by the social networks. In
addition, the disclosed techniques are useful in reducing the
computational resources needed for the search.
[0022] Several example techniques and heuristics for prioritizing
the potential leads are described herein. Some disclosed techniques
apply a "best first search" approach, in which the potential lead
that best matches some quality criterion is selected next for
evaluation. Several such criteria and heuristics are defined and
demonstrated.
[0023] In summary, the methods and systems described herein
reconstruct the LOF of a target user, even though the LOF is not
accessible in the target user's profile. The example search
processes and heuristics described herein achieve high efficiency
in reconstructing the target user LOF with limited resources and
limited exposure. The disclosed techniques thus improve the quality
of information collected by Web intelligence applications.
System Description
[0024] FIG. 1 is a block diagram that schematically illustrates a
system 20 for social network analysis, in accordance with an
embodiment that is described herein. System 20 interacts with one
or more social networks and reconstructs the Lists-Of-Friends
(LOFs) of target users. The reconstructed LOFs are presented to an
operator 24.
[0025] Systems of this sort may be used, for example, by government
and law enforcement agencies for collecting information on suspect
individuals such as terrorists or criminals. Another application of
such systems is in detecting and preventing fraud in banks and
other financial institutions. For example, a financial institution
may use a system such as system 20 as part of a "Know Your
Customer" (KYC) scheme for collecting background information on
customers.
[0026] System 20 is connected to a communication network 28,
typically the Internet. In particular, system 20 crawls profiles of
users 32 of social networks. The profiles are typically stored on
servers 36 of the social networks. The social networks in question
may comprise, for example, Facebook, Twitter, LinkedIn, Google+,
YouTube, or any other suitable social network application.
[0027] Typically, the social network servers maintain user profiles
of the various social network users. For each user, the profile
holds various information items. In particular, the profile of each
user comprises a List-Of-Friends (LOF), i.e., a list of other users
(sometimes nicknamed "friends," "links," "connections" or
"followers") who are connected to the user. The profile may also
comprise information such as the user's pages, personal
information, communication interactions of the user, sent and
received contact requests, user preferences, user location, images,
affiliation of the user with affiliation groups, updates or
modifications performed in the user profile over time, interactions
of the user with various social network pages and/or any other
suitable information.
[0028] In the example configuration of FIG. 1, system 20 comprises
an interface 40 for communicating with the social network, and a
processor 44 that carries out the disclosed techniques. Interface
40 may comprise, for example, a network probe. Processor 44
comprises a crawler 48 that crawls the social network. A homing
agent 60 defines the information (e.g., user profiles) to be
crawled by crawler 48. A probe manager 52 controls crawler 48,
e.g., prioritizes the crawling tasks defined by agent 60 for
crawler 48.
[0029] A metadata extraction unit 56 extracts the desired
information from the output of crawler 48, and also carries out
tasks such as normalization of information obtained from different
social networks to a common data model. A profile reassembly unit
64 reassembles user profiles from partial information gathered by
crawler 48.
[0030] An entity database 68 stores the reassembled information,
e.g., user profiles and LOFs. The information in database 68 is
also used for defining crawling tasks by homing agent 60. A user
gateway serves as an interface between processor 44 and operator
24.
[0031] The configuration of system 20 shown in FIG. 1 is an example
system configuration, which is chosen purely for the sake of
conceptual clarity. In alternative embodiments, any other suitable
system configuration can also be used. In the description that
follows, the methods described herein are referred to generally as
being performed by processor 44, regardless of the specific
processor component that performs each task. The functions of
processor 44 may be carried out by any desired number of computing
platforms, such as computers, servers or gateways.
[0032] The elements of system 20, such as processor 44 and its
various components, may be implemented in hardware, in software, or
using a combination of hardware and software elements. Database 68
may be implemented using any suitable storage devices, such as
solid state or magnetic disks.
[0033] In some embodiments, certain functions of processor 44 can
be implemented using one or more general-purpose processors, which
are programmed in software to carry out the functions described
herein. The software may be downloaded to the processors in
electronic form, over a network, for example, or it may,
alternatively or additionally, be provided and/or stored on
non-transitory tangible media, such as magnetic, optical, or
electronic memory.
Reconstructing Hidden LOF of a Target User
[0034] FIG. 2 is a graph that schematically illustrates
relationships among user profiles 82 of a social network, in
accordance with an embodiment that is described herein. Each
profile 82 comprises a respective LOF 83 that lists the contacts of
that user. Arcs 86 in the figure represent contact relationships: A
pair of profiles is connected by an arc if at least one of them
appears in the LOF of the other.
[0035] In the present example, a profile 84 of a predefined target
user is shown at the center of the figure. Profile 84 is referred
to as a target profile. The terms "target user" and "target
profile" are sometimes used interchangeably. LOF 83 of the target
user is inaccessible to other users (and thus to system 20), for
example because the target user set the privacy policy of his
profile so as to hide the LOF. In other cases, the target user may
not have a profile at all. In many cases, however, it is important
for system 20 to reconstruct the LOF of target profile 84, in order
to gain valuable information regarding the target user and his
contacts.
[0036] In some embodiments, processor 44 reconstructs the LOF of
the target profile by crawling other profiles 82 of the social
network. In the disclosed embodiments, processor 44 carries out an
iterative search process that examines the LOFs 83 of various
profiles 82 in order to identify contacts ("leads") of the target
user.
[0037] When crawling the user profiles, processor 44 is able to
perform two types of queries, referred to as a focused query and a
full acquisition. A focused query is applied to a pair of profiles
P1 and P2, and returns "TRUE" if P1 and P2 are connected (i.e., if
P1 appears in the LOF of P2 and/or P2 appears in the LOF of P1).
Otherwise, the focused query returns "FALSE". A full acquisition is
applied to a profile P and returns the complete LOF of P.
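The two query types can be illustrated with a minimal sketch. It assumes the social network is simulated by an in-memory dictionary that maps each profile to its visible LOF; the names `LOFS`, `focused_query` and `full_acquisition` and the sample data are hypothetical and are not taken from the disclosure.

```python
# Hypothetical toy network: profile name -> visible LOF.
# The target's LOF is hidden, so it appears empty to the crawler.
LOFS = {
    "alice": {"bob", "target"},
    "bob": {"alice"},
    "carol": set(),
    "target": set(),
}


def focused_query(p1, p2, lofs=LOFS):
    """Return True if p1 appears in the LOF of p2 and/or p2 in the LOF of p1."""
    return p1 in lofs.get(p2, set()) or p2 in lofs.get(p1, set())


def full_acquisition(p, lofs=LOFS):
    """Return a copy of the complete visible LOF of profile p."""
    return set(lofs.get(p, set()))
```

In this sketch a focused query between "alice" and "target" succeeds even though the target's own LOF is hidden, because the connection is visible from the other side.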
[0038] In some embodiments, the goal of the search process is to
recover and fully acquire all the elements of LOF 83 of target
profile 84 (or other predefined number of elements) with a smallest
possible number of focused queries and full acquisitions. In other
embodiments, the goal may be to recover and fully acquire as many
elements of the target user LOF as possible, within a certain
permitted number of focused queries and full acquisitions. Further
alternatively, the goal may be defined in any other suitable way.
The description that follows refers mainly to the latter goal, but
the disclosed techniques are generally suitable for other goal
definitions, as well.
[0039] In the disclosed embodiments, the search is performed under
several assumptions. First, focused queries are used only to
discover whether a certain profile is connected to the target
profile or not. Second, if a focused query reveals that the queried
profile is connected to the target profile, the queried profile is
fully acquired. Third, only profiles that are known to be leads
(i.e., connected to the target profile) are fully acquired.
[0040] At any given time during the search, processor 44 maintains
the current state of the search. The state defines the part of the
social network connectivity graph (e.g., FIG. 2) that was
discovered so far. This part is referred to as the Currently-Known
Graph (CKG). In addition, the state indicates which profiles in the
CKG have been fully acquired and which have not.
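One possible way to hold this search state is sketched below. The class name `SearchState` and its fields are assumptions chosen for illustration; the disclosure does not prescribe a particular data structure for the CKG.

```python
from dataclasses import dataclass, field


@dataclass
class SearchState:
    """Currently-Known Graph (CKG) plus the set of fully acquired profiles."""
    edges: dict = field(default_factory=dict)   # profile -> set of known neighbors
    acquired: set = field(default_factory=set)  # profiles whose LOF was retrieved

    def add_edge(self, a, b):
        # Contact relationships are stored in both directions.
        self.edges.setdefault(a, set()).add(b)
        self.edges.setdefault(b, set()).add(a)

    def record_full_acquisition(self, profile, lof):
        # Add every contact in the retrieved LOF to the CKG and mark the profile.
        self.edges.setdefault(profile, set())
        for friend in lof:
            self.add_edge(profile, friend)
        self.acquired.add(profile)

    def is_acquired(self, profile):
        return profile in self.acquired
```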
[0041] In some embodiments, processor 44 carries out the search
using a "best first search" framework. In an example "best first
search" implementation, processor 44 maintains two priority lists
or other data structures denoted OPEN and CLOSED. (In some
embodiments, although not necessarily, the OPEN list comprises a
priority queue, and the CLOSED list comprises a searchable set such
as a hash table.) The CLOSED list is initially empty, and the OPEN
queue initially comprises a certain initial subset of one or more
known leads. First, the initial leads are fully acquired, and the
profiles that appear in the LOFs of the initial leads are added to
the OPEN queue. The profiles added to the OPEN list are referred to
as "potential leads," since it is not yet known whether they are
connected to the target profile or not.
[0042] In each iteration, a "best" profile is selected from the
potential leads in the OPEN queue, according to some selection
heuristic. The selected best profile is moved to the CLOSED list,
and a focused query is applied to it. If the best profile is found
to be a lead, it is fully acquired, and the newly-generated
profiles are added as potential leads to the OPEN queue. The CKG is
updated as necessary. The iterative search continues in this manner
until some termination condition is met, e.g., until the permitted
number of focused queries and full acquisitions is exhausted.
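The iterative search described above can be sketched as follows. This is an illustrative simulation under several assumptions, not the disclosed implementation: the network is an in-memory dictionary (`NETWORK`), the focused query is simulated by checking whether the target appears in a profile's visible LOF, every focused query and full acquisition costs one unit of a hypothetical `budget`, and the selection heuristic simply counts the known leads that list a profile.

```python
def count_known_lead_neighbors(profile, ckg):
    """Heuristic score: number of acquired leads whose LOF lists `profile`."""
    return sum(1 for lof in ckg.values() if profile in lof)


def reconstruct_lof(target, initial_leads, network, budget):
    """Best-first search for contacts of `target` within a query budget.

    `network` maps each profile to its visible LOF. The target's own LOF is
    never read. Initial leads are always acquired, and each acquisition or
    focused query adds 1 to the spent budget.
    """
    leads = set(initial_leads)
    closed = set(initial_leads) | {target}  # profiles already evaluated
    open_set = set()                        # potential leads (OPEN)
    ckg = {}                                # acquired lead -> its LOF
    spent = 0

    def acquire(lead):
        nonlocal spent
        spent += 1
        ckg[lead] = set(network.get(lead, set()))
        open_set.update(ckg[lead] - closed)

    for lead in sorted(initial_leads):
        acquire(lead)

    while open_set and spent < budget:
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(open_set),
                   key=lambda p: count_known_lead_neighbors(p, ckg))
        open_set.remove(best)
        closed.add(best)
        spent += 1  # simulated focused query
        if target in network.get(best, set()):
            leads.add(best)
            if spent < budget:
                acquire(best)
    return leads


# Hypothetical network in which a, b and c are the target's contacts.
NETWORK = {
    "a": {"target", "b", "c", "x"},
    "b": {"target", "a", "c"},
    "c": {"target", "a", "b", "y"},
    "x": {"a"},
    "y": {"c"},
}
```

Using a plain set with a `max` call keeps the scores current as new leads are found; a priority queue, as mentioned in the text, would be the natural choice when scores do not change after insertion.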
[0043] FIG. 3 is a flow chart that schematically illustrates a
method for identifying a List-Of-Friends (LOF) of a target social
network user, in accordance with an embodiment that is described
herein. The method begins with processor 44 receiving a request to
reconstruct the LOF of a target user, at a request step 90. The
assumption is that the LOF of the target user is hidden and
inaccessible in the target profile. The request typically comprises
the identity of the target user, and one or more initial leads
(i.e., one or more known elements of the target user LOF).
[0044] At an initial full acquisition step 94, processor 44
performs full acquisition on the initial leads, i.e., obtains the
LOFs of the initial leads. At this point, the potential leads in
the OPEN queue comprise the profiles that appear in the LOFs of the
initial leads. Processor 44 updates the CKG as needed to reflect
the results of the full acquisitions.
[0045] Processor 44 now selects a "best" profile from among the
potential leads, at a best profile selection step 98. Various
heuristics and criteria can be used for this purpose. Several
example heuristics are given and explained further below.
[0046] Processor 44 then checks whether the selected best profile
is connected to the target profile, i.e., whether the selected
potential lead is indeed a lead, at a lead checking step 102. The
processor performs a focused query on the selected best profile for
this purpose.
[0047] If the selected best profile is found to be a lead,
processor 44 adds it to the list of known leads, i.e., to the
reconstructed LOF of the target profile, at a lead adding step 106.
Processor 44 then performs full acquisition on the selected best
profile in order to obtain its LOF, at a full acquisition step 110.
If necessary, processor 44 updates the CKG with the results of the
full acquisition, at a CKG updating step 114.
[0048] Processor 44 checks whether the applicable termination
condition is met, at a termination checking step 118. For example,
the processor may check if the permitted number of focused queries
and full acquisitions is reached. If the termination condition is
met, processor 44 outputs the reconstructed LOF of the target user,
at an output step 122, and the method terminates. Otherwise, the
method loops back to step 98 above for selecting the next best
profile for evaluation.
[0049] If, at step 102, processor 44 finds that the selected best
profile is not a lead, steps 106-114 are skipped, and the method
moves directly to termination checking step 118.
[0050] The search strategy and flow of FIG. 3 are chosen purely by
way of example. In alternative embodiments, any other suitable
search process, heuristics and criteria can be used.
Example Heuristics for Selecting the "Best" Profile in the
Best-First Search
[0051] As noted above, the criterion or heuristic used by processor
44 in selecting the "best" potential lead for evaluation (step 98) has
a significant impact on the search performance. In various
embodiments, processor 44 may select the best potential lead using
different criteria or heuristics.
[0052] In one embodiment, processor 44 chooses the best profile at
random from among the profiles that are present in the OPEN queue.
In another embodiment, processor 44 manages the OPEN queue as a
First-In First-Out (FIFO) queue, i.e., orders the profiles in
accordance with the order in which they were added to the OPEN
queue. In this embodiment, processor 44 selects for evaluation the
oldest profile in the OPEN queue that was not yet selected.
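These two baseline selection rules can be sketched in a few lines. The function names are hypothetical, and the sketch assumes the OPEN collection is a Python list for random selection and a `collections.deque` for FIFO selection.

```python
import random
from collections import deque


def select_random(open_profiles, rng=random):
    """Remove and return a potential lead chosen uniformly at random."""
    index = rng.randrange(len(open_profiles))
    return open_profiles.pop(index)


def select_fifo(open_queue):
    """Remove and return the oldest potential lead in the queue."""
    return open_queue.popleft()
```

Passing a seeded `random.Random` instance as `rng` makes the random variant reproducible, which may help when comparing heuristics.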
[0053] Other example selection heuristics are referred to as
homophily-based heuristics. These heuristics are based on the
assumption that users are more likely to be connected to users who
share similar attributes, such as ethnic affiliation, education
level, profession or other attributes. Following this assumption,
potential leads that have similar attribute values to the target
profile are more likely to be genuine leads.
[0054] Thus, in some embodiments processor 44 retrieves the various
attribute values of the potential leads in the OPEN queue. The
processor assigns each potential lead a respective homophily score,
which quantifies the similarity in selected attribute values
between the potential lead and the profile of the target user. The
processor chooses the potential lead having the highest homophily
score as the best profile for evaluation. In an example embodiment,
when analyzing a given profile, the processor assigns a respective
similarity score for each attribute, and sums the similarity scores
to produce the homophily score of the profile.
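A minimal sketch of such a homophily score is given below. It assumes profiles are dictionaries of attribute values and that the per-attribute similarity is an exact-match indicator; the disclosure does not specify the similarity measure, so `attribute_similarity` is a placeholder, and all names and sample values are hypothetical.

```python
def attribute_similarity(a, b):
    """Assumed per-attribute similarity: 1.0 on an exact match, else 0.0."""
    return 1.0 if a is not None and a == b else 0.0


def homophily_score(candidate, target, attributes):
    """Sum the per-attribute similarity scores over the selected attributes."""
    return sum(attribute_similarity(candidate.get(k), target.get(k))
               for k in attributes)


def pick_most_similar(open_profiles, target, attributes):
    """Return the name of the potential lead with the highest homophily score."""
    return max(sorted(open_profiles),
               key=lambda name: homophily_score(open_profiles[name],
                                                target, attributes))
```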
[0055] Yet another example heuristic is referred to as a promising
heuristic. This heuristic is based on the assumption that a
profile, which is connected to a lead that was found to be
connected to many other leads, is more likely to be a lead than a
profile that is connected to a lead that is connected to only few
leads. In other words, this heuristic gives precedence to profiles
that have better promise to lead to future, undiscovered leads.
[0056] Processor 44 may define and evaluate various heuristics
based on this principle. In an example embodiment, processor 44
maintains for every lead L two counters referred to as a "promise
counter" and an "upsetting counter." The promise counter (denoted
p) counts the number of contacts of L that are known to be leads.
The upsetting counter (denoted u) counts the number of contacts of
L that are known not to be leads. Processor 44 calculates for every
lead L a "promise factor" P(L) that is given by P(L)=p/(p+u) if
p+u>0, and P(L)=0.5 otherwise. The score of each potential lead
n is an aggregation over the known leads to which this potential
lead is connected. The aggregation is given by
$$\mathrm{Score}(n) = 1 - \prod_{m\ \text{neighbor of}\ n} \bigl(1 - P(m)\bigr)$$
[0057] and the potential lead having the highest score is selected
as the best profile for evaluation.
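The promise factor and the aggregated score can be sketched as follows, assuming that the aggregation is a product, over the known leads m neighboring n, of the terms (1 - P(m)). The data layout (a dictionary mapping each known lead to its (p, u) counters) and the function names are assumptions of this sketch.

```python
from math import prod


def promise_factor(p, u):
    """P(L) = p / (p + u) if p + u > 0, and 0.5 otherwise."""
    return p / (p + u) if p + u > 0 else 0.5


def promising_score(lead_neighbors, counters):
    """Score(n) = 1 - product over known-lead neighbors m of (1 - P(m)).

    lead_neighbors: the known leads to which potential lead n is connected.
    counters: maps each known lead to its (promise, upsetting) counters.
    """
    return 1.0 - prod(1.0 - promise_factor(*counters[m]) for m in lead_neighbors)


# Hypothetical counters: L1 has 3 lead contacts and 1 non-lead contact,
# and nothing is known yet about the contacts of L2.
counters = {"L1": (3, 1), "L2": (0, 0)}
score = promising_score(["L1", "L2"], counters)
```

Under this product form, a potential lead with no known-lead neighbors scores 0, and each additional promising neighbor can only raise the score.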
[0058] Both the homophily heuristic and the promising heuristic
described above are highly effective in some scenarios and less
effective in other scenarios. In some embodiments, processor 44
combines the two heuristics using a machine learning process.
[0059] Each of the two heuristics is defined as a function that
assigns respective scores to profiles, such that the profile
(potential lead) having the highest score is selected as the best
profile. In some embodiments, processor 44 uses the scores of the
heuristics as features for a classifier, which classifies whether
(and to what extent) a profile having these scores is expected to
be a lead. Typically, processor 44 uses a classifier that outputs a
soft confidence level (a certainty that the classification is
correct) rather than a binary decision. Processor 44 uses this
confidence level as a scoring function for profiles. In other
words, processor 44 selects the profile having the best confidence
level of being a lead, as output by the classifier, as the best
profile for evaluation.
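As a hedged illustration of using a soft classifier output as the scoring function, the sketch below uses a logistic model over two features (the homophily score and the promising score). The choice of a logistic model, the weights and the bias are all assumptions; the disclosure does not name a particular classifier.

```python
import math


def lead_confidence(features, weights, bias):
    """Soft confidence in (0, 1) that a profile with these scores is a lead."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def select_best(candidates, weights, bias):
    """candidates maps a profile id to (homophily_score, promising_score)."""
    return max(
        candidates,
        key=lambda pid: lead_confidence(candidates[pid], weights, bias),
    )


# Hypothetical heuristic scores and hypothetical model parameters.
candidates = {"a": (1.0, 0.2), "b": (2.0, 0.9)}
best = select_best(candidates, weights=(0.8, 2.0), bias=-2.0)
```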
[0060] In an example embodiment, processor 44 trains the classifier
using on-line learning. In this embodiment, the on-line learning
scheme uses two parameters: a default heuristic h and a cold-start
number C. The first C profiles (potential leads) are selected for
evaluation using the default heuristic h. Since focused querying is
applied to these C profiles, it is known which of them are leads
and which are not. As such, the first C profiles can be used as a
training set for training a classifier. Processor 44 performs this
training before selecting the (C+1)-th profile.
[0061] In a subsequent iteration of the best-first search, before
choosing the best potential lead for evaluation, a new classifier
is learnt using all the previously evaluated potential leads as the
training set. The resulting classifier is then used for choosing
the next potential lead for evaluation. In some embodiments, it is
possible to reduce computational complexity by learning a new
classifier every K iterations, where K is a parameter. The
performance of the on-line learning scheme typically depends on the
default heuristic h, on C and possibly on K.
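The on-line scheme with a cold-start number C and retraining every K iterations might be sketched as below. This is a simplified sketch under several assumptions: the classifier is a small logistic model trained by stochastic gradient descent, the features of a profile are fixed rather than recomputed as the search progresses, and the callback is_lead stands in for the focused querying that reveals whether an evaluated profile is a lead. All names are hypothetical.

```python
import math


def confidence(features, weights, bias):
    z = bias + sum(w * x for w, x in zip(weights, features))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, epochs=200, lr=0.5):
    """Fit a logistic model by plain stochastic gradient descent."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = confidence(x, weights, bias) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def online_search(profiles, features_of, default_score, is_lead, C, K, budget):
    """Select up to `budget` profiles; K must be at least 1."""
    remaining = list(profiles)
    samples, labels, evaluated = [], [], []
    model = None
    for step in range(min(budget, len(remaining))):
        # After the cold start, learn a new classifier every K iterations.
        if step >= C and samples and (model is None or (step - C) % K == 0):
            model = train_logistic(samples, labels)
        if model is None:
            best = max(remaining, key=default_score)  # default heuristic h
        else:
            weights, bias = model
            best = max(
                remaining,
                key=lambda p: confidence(features_of(p), weights, bias),
            )
        remaining.remove(best)
        label = 1 if is_lead(best) else 0  # outcome of focused querying
        samples.append(tuple(features_of(best)))
        labels.append(label)
        evaluated.append((best, label))
    return evaluated
```

Setting K to 1 retrains before every selection, which corresponds to the basic scheme; larger values of K trade classifier freshness for lower computational cost.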
[0062] In an alternative embodiment, processor 44 trains the
classifier using off-line learning. In this embodiment, the
assumption is that processor 44 has access to information gathered
during M previous solution instances, i.e., M previous sessions of
finding the LOFs of target users. Every such instance has a target
profile and a set of initial leads. The assumption is that, for
every such instance, processor 44 has access to the potential leads
that were chosen for evaluation during the previous runs, including
information as to which of these potential leads were indeed
genuine leads. Processor 44 uses the information from the previous
runs as a training set for training a classifier. The resulting
classifier is then used for choosing the next potential lead for
evaluation.
[0063] In an embodiment, the off-line learning scheme can be
combined with the on-line learning scheme, for example by using the
off-line-learned heuristic as the default heuristic in the on-line
scheme, and using the newly acquired profiles with the off-line
training set to produce a larger training set.
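One possible way to assemble the off-line training set, and to enlarge it with profiles acquired during the current session, is sketched below. The record layout (a feature tuple paired with a flag indicating whether the profile proved to be a lead) and the function names are assumptions of this sketch.

```python
def build_offline_training_set(previous_sessions):
    """Pool the evaluated potential leads of M previous solution instances.

    previous_sessions: an iterable of sessions, each a list of
    (features, was_lead) records.
    """
    samples, labels = [], []
    for session in previous_sessions:
        for features, was_lead in session:
            samples.append(tuple(features))
            labels.append(1 if was_lead else 0)
    return samples, labels


def extend_training_set(training_set, new_records):
    """Return a larger training set that also holds newly acquired profiles."""
    samples, labels = list(training_set[0]), list(training_set[1])
    for features, was_lead in new_records:
        samples.append(tuple(features))
        labels.append(1 if was_lead else 0)
    return samples, labels


# Hypothetical records from two previous sessions and one new evaluation.
sessions = [
    [((0.5, 0.9), True), ((0.1, 0.2), False)],
    [((0.3, 0.3), False)],
]
offline = build_offline_training_set(sessions)
combined = extend_training_set(offline, [((0.7, 0.8), True)])
```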
[0064] When using the machine learning schemes described above, the
classifier may be biased by the fact that the training set contains
a considerably larger number of profiles that are not leads than
profiles that are leads. In some embodiments, processor 44 uses
under-sampling or over-sampling methods in order to balance the
training set and avoid such a bias.
[0065] Various balancing techniques are known in the art, and any
suitable method can be used by processor 44. One example is the
SMOTE algorithm, which is described, for example, by Chawla et al.,
in "SMOTE: Synthetic Minority Over-sampling Technique," Journal of
Artificial Intelligence Research, volume 16 (2002), pages 321-357,
which is incorporated herein by reference.
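For illustration, the sketch below balances a training set by random over-sampling, i.e., by duplicating randomly chosen minority-class samples until both classes are equally represented. This is a simpler technique than SMOTE, which synthesizes new minority samples by interpolation, and is shown here only as an assumed example of a balancing method; the function name is hypothetical.

```python
import random


def oversample_minority(samples, labels, rng=None):
    """Duplicate random minority-class samples until the classes balance."""
    rng = rng or random.Random()
    positives = [s for s, y in zip(samples, labels) if y == 1]
    negatives = [s for s, y in zip(samples, labels) if y == 0]
    if not positives or not negatives:
        return list(samples), list(labels)  # nothing to balance against
    if len(positives) < len(negatives):
        minority, minority_label = positives, 1
    else:
        minority, minority_label = negatives, 0
    deficit = abs(len(positives) - len(negatives))
    extra = [rng.choice(minority) for _ in range(deficit)]
    return list(samples) + extra, list(labels) + [minority_label] * deficit


# Hypothetical training set: one lead among five evaluated profiles.
samples = [(0.9,), (0.1,), (0.2,), (0.3,), (0.4,)]
labels = [1, 0, 0, 0, 0]
balanced_samples, balanced_labels = oversample_minority(samples, labels)
```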
[0066] In the examples above, the features of a profile (used for
classifier training) are the scores given to the profile by a set
of heuristics. In alternative embodiments, it is possible to use
the attribute similarity score described above as a feature of the
profiles.
[0067] Although the embodiments described herein mainly address
reconstruction of the LOF of a target profile, the principles of the
present disclosure can also be used for finding other sorts of
acquaintances of a target person or members of a terrorist
organization.
[0068] It will thus be appreciated that the embodiments described
above are cited by way of example, and that the present disclosure
is not limited to what has been particularly shown and described
hereinabove. Rather, the scope of the present disclosure includes
both combinations and sub-combinations of the various features
described hereinabove, as well as variations and modifications
thereof which would occur to persons skilled in the art upon
reading the foregoing description and which are not disclosed in
the prior art. Documents incorporated by reference in the present
patent application are to be considered an integral part of the
application except that to the extent any terms are defined in
these incorporated documents in a manner that conflicts with the
definitions made explicitly or implicitly in the present
specification, only the definitions in the present specification
should be considered.