U.S. patent application number 15/165486 was filed with the patent office on 2016-05-26 and published on 2016-09-29 for a method and system for matching data sets of non-standard formats.
The applicant listed for this patent is Careerbuilder, LLC. Invention is credited to Andrew B. Cranfill, Jason Elliott.
Application Number | 15/165486 |
Family ID | 47632042 |
Publication Date | 2016-09-29 |
United States Patent Application | 20160283906 |
Kind Code | A1 |
Cranfill; Andrew B.; et al. | September 29, 2016 |
METHOD AND SYSTEM FOR MATCHING DATA SETS OF NON-STANDARD FORMATS
Abstract
A computer-based system and method are described for converting
non-standardized resumes and job listings into standardized
profiles that can be easily searched, compared and referenced.
Attributes are identified within the resumes and job listings, and
are evaluated for various features. Each resume or job listing is
broken down into its component parts and analyzed based on a
logic-based routine to identify and package meaningful content.
Inventors: | Cranfill; Andrew B. (Roswell, GA); Elliott; Jason (Roswell, GA) |
Applicant: |
Name | City | State | Country |
Careerbuilder, LLC | Chicago | IL | US |
Filed: | May 26, 2016 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number | Child Application |
13/744,715 | Jan 18, 2013 | 9,355,151 | 15/165,486 |
11/869,570 | Oct 9, 2007 | 8,375,026 | 13/744,715 |
11/835,994 | Aug 8, 2007 | 8,103,679 | 11/869,570 |
11/622,572 | Jan 12, 2007 | | 11/835,994 |
60/759,242 | Jan 13, 2006 | | |
Current U.S. Class: | 1/1 |
Current CPC Class: | G06F 16/334 20190101; G06Q 10/105 20130101; G06Q 10/1053 20130101; G06Q 10/063112 20130101; G06F 16/24578 20190101; G06F 16/337 20190101 |
International Class: | G06Q 10/10 20060101 G06Q010/10; G06Q 10/06 20060101 G06Q010/06; G06F 17/30 20060101 G06F017/30 |
Claims
1. A computer-driven method for automatically assessing the
similarity between a subject non-standardized data set and at least
a first target non-standardized data set, the method comprising the
steps of: receiving a first subject data file containing the
subject non-standardized data set; using a computer processor
programmed to manipulate non-standardized data sets, electronically
generating a second subject data file containing a standardized
subject profile, wherein the standardized subject profile is
comprised of a plurality of subject attributes that are
collectively representative of the information contained within the
first subject data file; receiving to the computer processor a
first target data file containing the first target non-standardized
data set; using the computer processor programmed to manipulate
non-standardized data sets, electronically generating a second
target data file containing a standardized first target profile,
wherein the standardized first target profile is comprised of a
plurality of target attributes that are collectively representative
of the information contained within the first target data file;
rank ordering each individual attribute from the plurality of
subject attributes against the remaining subject attributes based
on how well the individual attribute is representative of the
entire subject non-standardized data set; rank ordering each
individual attribute from the plurality of target attributes
against the remaining target attributes based on how well the
individual attribute is representative of the entire first target
non-standardized data set; identifying all first target matching
attributes, wherein each first target matching attribute is
comprised of a subject attribute from the plurality of subject
attributes and an identical target attribute from the plurality of
target attributes comprising the first target profile; assigning
first target points to the subject profile based on the rank order
of all subject attributes from the plurality of subject attributes
that are components of a first target matching attribute; assigning
first target points to the first target profile based on the rank
order of all target attributes from the plurality of target
attributes comprising the first target profile that are components
of a first target matching attribute; and adding the first target
points assigned to the subject profile and the first target profile
to determine a first target total matching score representative of
the similarity between the subject non-standardized data set and
the first target non-standardized data set.
2. The method of claim 1, wherein the criterion indicative of the
relative ability of each individual attribute from among the
plurality of subject attributes to accurately characterize the
subject non-standardized data set is the location within the
subject non-standardized data set at which the information
represented by the attribute from among the plurality of subject
attributes to be rank ordered first appears.
3. The method of claim 1, wherein the criterion indicative of the
relative ability of each individual attribute from the plurality of
subject attributes to accurately characterize the subject
non-standardized data set is the number of times that the
information represented by an attribute to be rank ordered is
repeated within the subject non-standardized data set.
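Claims 2 and 3 name two criteria for rank ordering attributes: where an attribute first appears in the data set, and how often it is repeated. The following is a minimal Python sketch of both criteria, assuming attributes are plain strings located by case-insensitive substring search; the function names and sample text are hypothetical and do not come from the application.

```python
def rank_by_first_appearance(text: str, attributes: list[str]) -> list[str]:
    """Order attributes by the position at which each first appears (claim 2).

    Attributes absent from the text are dropped. Matching is a case-insensitive
    substring search, which is a simplifying assumption.
    """
    lowered = text.lower()
    positions = {attr: lowered.find(attr.lower()) for attr in attributes}
    present = [attr for attr in attributes if positions[attr] >= 0]
    return sorted(present, key=lambda attr: positions[attr])


def rank_by_repetition(text: str, attributes: list[str]) -> list[str]:
    """Order attributes by how many times each occurs in the text (claim 3).

    sorted() stays stable with reverse=True, so tied attributes keep their input order.
    """
    lowered = text.lower()
    counts = {attr: lowered.count(attr.lower()) for attr in attributes}
    present = [attr for attr in attributes if counts[attr] > 0]
    return sorted(present, key=lambda attr: counts[attr], reverse=True)


resume = ("Objective: web developer. Skills: ASP, HTML, SQL. "
          "Built HTML forms and SQL reports with HTML templates.")
attrs = ["HTML", "SQL", "ASP", "Java"]
print(rank_by_first_appearance(resume, attrs))  # ['ASP', 'HTML', 'SQL']
print(rank_by_repetition(resume, attrs))        # ['HTML', 'SQL', 'ASP']
```

The two criteria can disagree, as they do here, which is one reason the description later combines several metrics rather than relying on a single one.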
4. The method of claim 1, wherein the subject non-standardized data
set represents a job description and the first target
non-standardized data set represents a resume of a job
applicant.
5. The method of claim 1, wherein the subject non-standardized data
set represents a resume of a job applicant and the first target
non-standardized data set represents a job description.
6. The method of claim 1, wherein the first target non-standardized
data set represents a listing type from the set of listing types
consisting of: real estate listings, classified advertisement
listings, used car listings, and dating service listings.
7. The method of claim 1, further comprising the steps of:
receiving a third target data file containing a second target
non-standardized data set; using the computer processor programmed
to manipulate non-standardized data sets, electronically generating
a fourth target data file containing a standardized second target
profile, wherein the standardized second target profile is
comprised of a plurality of target attributes that are collectively
representative of the information contained within the third target
data file; rank ordering the plurality of target attributes
comprising the second target profile based on at least a criterion
indicative of the relative ability of each individual attribute
from the plurality of target attributes comprising the second
target profile to accurately characterize the second target
non-standardized data set; determining all second target matching
attributes, wherein each second target matching attribute is
comprised of a subject attribute from the plurality of subject
attributes and an identical target attribute from the plurality of
target attributes comprising the second target profile; assigning
second target points to the subject profile based on the rank order
of any subject attribute from among the plurality of subject
attributes that is a component of a second target matching
attribute; assigning second target points to the second target
profile based on the rank order of any target attribute from the
plurality of target attributes comprising the second target profile
that is a component of a second target matching attribute; and
adding the second target points assigned to the subject profile and
the second target profile to determine a second target total
matching score representative of the similarity between the subject
non-standardized data set and the second target non-standardized
data set.
8. The method of claim 7, further comprising the step of
recommending the first target non-standardized data set as a better
match for the subject non-standardized data set than the second
target non-standardized data set if the first target total matching
score is greater than the second target total matching score.
9. The method of claim 1, wherein the number of first target points
assigned to the subject profile for a given subject attribute from
among the plurality of subject attributes that is a component of a
first target matching attribute is equivalent to the total number
of subject attributes comprising the plurality of subject
attributes minus the rank order of the given subject attribute.
10. The method of claim 1, wherein the number of first target
points assigned to the subject profile for a given subject
attribute from among the plurality of subject attributes that is a
component of a first target matching attribute is equivalent to the
square of the total number of subject attributes comprising the
plurality of subject attributes minus the rank order of the given
subject attribute.
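The scoring of claims 1, 9 and 10 can be illustrated with a short Python sketch. It assumes that each profile is a list of attribute strings ordered from highest to lowest rank, that ranks start at 1, and that "identical" attributes are equal strings. Claim 10 is read here as (N - rank) squared; the claim wording could also be read as N squared minus the rank, so that choice is an assumption. All names are hypothetical.

```python
def attribute_points(ranked: list[str], squared: bool = False) -> dict[str, int]:
    """Points for each attribute in a ranked profile.

    Claim 9: points = N - rank, where N is the number of attributes.
    Claim 10 (squared=True): read here as (N - rank) ** 2.
    Ranks are assumed to start at 1, so the lowest-ranked attribute earns 0 points.
    """
    n = len(ranked)
    points = {}
    for rank, attr in enumerate(ranked, start=1):
        base = n - rank
        points[attr] = base * base if squared else base
    return points


def total_matching_score(subject: list[str], target: list[str], squared: bool = False) -> int:
    """Claim 1: add the subject-side and target-side points of every matching attribute."""
    subject_points = attribute_points(subject, squared)
    target_points = attribute_points(target, squared)
    matched = subject_points.keys() & target_points.keys()  # attributes present in both profiles
    return sum(subject_points[attr] + target_points[attr] for attr in matched)


job = ["ASP", "HTML", "SQL", "Java"]  # subject profile, highest rank first
resume = ["HTML", "CSS", "ASP"]       # target profile, highest rank first
print(total_matching_score(job, resume))                # 7
print(total_matching_score(job, resume, squared=True))  # 17
```

Because points are awarded on both sides, an attribute scores highly only when it ranks near the top of the job description and of the resume alike.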
11. A job applicant selection system for evaluating a resume of a
job applicant based on a score that is indicative of the job
applicant's fit for a job having a non-standardized job
description, said job applicant selection system comprising: a
profile generation module to: convert the resume into a target
profile comprised of a plurality of target attributes ranked in
order of importance to the resume, each of the plurality of target
attributes having a target rank order based at least partially upon
a relationship between each one of the target attributes with
respect to each of the other target attributes, and convert the
non-standardized job description into a subject profile comprised
of a plurality of subject attributes ranked in order of importance
to the job, each of the plurality of subject attributes having a
subject rank order based at least partially upon a relationship
between each one of the subject attributes with respect to each of
the other subject attributes; a comparison module to compare the
target profile to the subject profile to identify matched
attributes; and a scoring module to: assign numbers to each matched
attribute in the target profile based on the target rank order of
the matched attribute within the target profile and, separately, to
each matched attribute in the subject profile based on the subject
rank order of the matched attribute within the subject profile; and
sum the numbers assigned to all matched attributes in both the
target profile and the subject profile to arrive at a total
matching score.
12. The system of claim 11, wherein the scoring module is to:
compare the total matching score to a minimum threshold matching
score; and in response to the total matching score being less than
the minimum threshold matching score, discard the resume.
13. The system of claim 11, wherein the scoring module is to:
normalize the total matching score; compare the normalized total
matching score to a minimum threshold matching score; and in
response to the normalized total matching score being less than the
minimum threshold matching score, discard the resume.
14. The system of claim 13, wherein to normalize the total matching
score, the scoring module is to divide the total matching score by
an objective total matching score, the objective total matching
score being the total matching score that would be calculated if
the resume were to exactly match the non-standardized job
description.
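Claims 12 to 14 normalize the total matching score and discard resumes that fall below a threshold. A minimal Python sketch follows, assuming points of N - rank on both sides (claim 15 states this only for the target profile, so applying it to the subject profile is an assumption) and modeling a resume that "exactly matches" the job description as a target profile identical to the subject profile. Function names and the threshold value are hypothetical.

```python
def _points(ranked: list[str]) -> dict[str, int]:
    # N - rank for every attribute, with ranks assumed to start at 1.
    n = len(ranked)
    return {attr: n - rank for rank, attr in enumerate(ranked, start=1)}


def matching_score(subject: list[str], target: list[str]) -> int:
    s, t = _points(subject), _points(target)
    return sum(s[a] + t[a] for a in s.keys() & t.keys())


def normalized_score(job: list[str], resume: list[str]) -> float:
    """Claim 14: divide by the score of a resume that matches the job exactly."""
    objective = matching_score(job, job)  # assumed model of an exact match
    # A one-attribute job profile scores 0 under N - rank, so guard the division.
    return matching_score(job, resume) / objective if objective else 0.0


def keep_resume(job: list[str], resume: list[str], threshold: float) -> bool:
    """Claim 13: a resume scoring below the minimum threshold is discarded (False)."""
    return normalized_score(job, resume) >= threshold


job = ["ASP", "HTML", "SQL", "Java"]
print(round(normalized_score(job, ["HTML", "CSS", "ASP"]), 3))  # 0.583
print(keep_resume(job, ["HTML", "CSS", "ASP"], 0.5))            # True
print(keep_resume(job, ["Java", "Excel"], 0.5))                 # False
```

Normalizing against the exact-match score makes results from job descriptions of different lengths comparable on a 0 to 1 scale, so a single threshold can be reused across postings.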
15. The system of claim 13, wherein the number assigned to each
matched attribute in the target profile is equivalent to the total
number of target attributes comprising the plurality of target
attributes minus the rank order of the given target attribute.
16. The system of claim 13, wherein the number assigned to each
matched attribute in the target profile is equivalent to the square
of the total number of target attributes comprising the plurality
of target attributes minus the rank order of the given target
attribute.
17. The system of claim 13, wherein: the profile generation module
is to convert a second resume into a second target profile
comprised of a plurality of second target attributes ranked in
order of importance to the second resume, each of the plurality of
second target attributes having the target rank order based at
least partially upon a relationship between each one of the second
target attributes with respect to each of the other second target
attributes; the comparison module is to compare the second target
profile to the subject profile to identify matched attributes; and
the scoring module is to: assign second numbers to each matched
attribute in the second target profile based on the target rank
order of the matched attribute within the second target profile
and, separately, to each matched attribute in the subject profile
based on the subject rank order of the matched attribute within the
subject profile; and sum the second numbers assigned to all matched
attributes in both the second target profile and the subject
profile to arrive at a second total matching score.
18. The system of claim 17, wherein the scoring module is to assign
ranked orders to the resume and the second resume based on the
relative difference between the total matching score and the second
total matching score.
19. The system of claim 17, wherein the scoring module is to
normalize the total matching score and the second total matching
score based on an objective total matching score, the objective
total matching score being the total matching score that would be
calculated if the resume were to exactly match the
non-standardized job description.
20. The system of claim 19, wherein the scoring module is to:
assign ranked orders to the resume and the second resume based on
the relative difference between the normalized total matching score
and the normalized second total matching score; and recommend one
of the resume or the second resume based on which of the resume and
the second resume has a higher ranked order.
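Claims 8, 18 and 20 rank competing resumes by their (normalized) total matching scores and recommend the best one. The Python sketch below assumes the scores have already been computed; the resume identifiers and score values are hypothetical.

```python
from typing import Optional


def rank_resumes(scores: dict[str, float]) -> list[tuple[int, str, float]]:
    """Order resumes from highest to lowest score, assigning ranks starting at 1.

    Equal scores keep their input order because sorted() is stable.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, resume_id, score) for rank, (resume_id, score) in enumerate(ordered, start=1)]


def recommend(scores: dict[str, float]) -> Optional[str]:
    """Recommend the resume holding the highest ranked order, or None if there are none."""
    ranking = rank_resumes(scores)
    return ranking[0][1] if ranking else None


scores = {"resume_a": 0.58, "resume_b": 0.08, "resume_c": 0.75}
print(rank_resumes(scores))  # [(1, 'resume_c', 0.75), (2, 'resume_a', 0.58), (3, 'resume_b', 0.08)]
print(recommend(scores))     # resume_c
```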
Description
CROSS-REFERENCE
[0001] This application is a continuation of U.S. patent
application Ser. No. 13/744,715, filed on Jan. 18, 2013 (to issue
as U.S. Pat. No. 9,355,151 on May 31, 2016), which is a
continuation of U.S. patent application Ser. No. 11/869,570, filed
on Oct. 9, 2007 (issued as U.S. Pat. No. 8,375,026 on Feb. 12,
2013), which is a continuation-in-part of U.S. patent application
Ser. No. 11/835,994, filed on Aug. 8, 2007 (issued as U.S. Pat. No.
8,103,679 on Jan. 24, 2012), which is a continuation-in-part of
U.S. patent application Ser. No. 11/622,572 filed on Jan. 12, 2007
(now abandoned), which was a non-provisional patent application
claiming priority to U.S. provisional application No. 60/759,242
filed on Jan. 13, 2006. These prior applications are incorporated
herein in their entirety by reference.
BACKGROUND OF THE INVENTION
[0002] This invention relates generally to a method and system for
receiving a plurality of non-standardized data sets and generating
respective standardized profiles 80 that can be used for
efficiently comparing and matching the data sets.
[0003] One application for the current invention is in online
recruiting services, and more specifically, in converting job
seekers' resumes on the one hand and job postings on the other
hand into standardized profiles, which can be compared and matched
to one another. Conventional online recruiting systems permit
employers to create job postings for available positions and permit
job seekers to post their resumes. Conventional online recruiting
systems have also permitted job seekers to browse or conduct
keyword searches through available job postings and submit their
resumes for specific jobs. Conversely, these systems have also
permitted employers to browse or conduct keyword searches through
available candidate resumes. However, browsing for candidate
resumes or job postings is time-consuming and can be a
hit-or-miss proposition for both the job seeker and the employer.
While targeted keyword searches may reduce the number of job
postings or resumes to be reviewed, the only way to find the most
suitable match is to review and evaluate each remaining resume or
job posting individually.
SUMMARY OF THE INVENTION
[0004] A system and method are described for receiving a plurality
of non-standardized data sets and generating respective
standardized profiles that can be used for efficiently comparing
and matching the data sets. One application of this invention is to
convert job seekers' resumes and job postings into respective
standardized profiles and then rank the standardized profiles
according to their suitability for a particular job posting.
Generally, the system includes a remote computer, which is
connected to a server computer via a network system or the Internet
and which is capable of exchanging files and information with the
server computer.
[0005] A better understanding of the objects, advantages, features,
properties and relationships of the invention will be obtained from
the following detailed description and accompanying drawings which
set forth an illustrative embodiment and which are indicative of
the various ways in which the principles of the invention may be
employed.
BRIEF DESCRIPTION OF DRAWINGS
[0006] For a better understanding of the invention, reference may
be had to the following Appendices, which further describe a
preferred embodiment of the present invention and which include
drawings and exemplary screen shots therefor:
[0007] FIG. 1 is a diagram depicting a computer network on which an
embodiment of the invention may be operated.
[0008] FIG. 2 is a sample graphical user interface of one screen
employed by the present invention.
[0009] FIG. 3 illustrates an exemplary data set in the form of a
job posting.
[0010] FIG. 4 illustrates an exemplary data set in the form of a
candidate resume.
[0011] FIGS. 5A-5B illustrate an exemplary band array generated
from the data set shown in FIG. 4.
[0012] FIG. 6 illustrates the steps for parsing a data set into
bands.
[0013] FIGS. 7A-7D illustrate an exemplary word array generated
from the data set shown in FIG. 4.
[0014] FIG. 8 illustrates the steps for parsing the band array of
FIGS. 5A-5B into the word array shown in FIGS. 7A-7D.
[0015] FIG. 9 illustrates an excerpt of a substitute database, as
used in the present invention.
[0016] FIG. 10 illustrates the steps for evaluating words for entry
into the attribute array.
[0017] FIG. 11 depicts an excerpt from the common word database as
used in the present invention.
[0018] FIG. 12 illustrates an excerpt of the attribute dictionary,
as used in the present invention.
[0019] FIGS. 13A-13C illustrate an exemplary attribute array
generated from the data set shown in FIG. 4 according to the
present invention.
[0020] FIG. 14 illustrates the steps for entering a word or phrase
into the attribute array.
[0021] FIG. 15 illustrates an excerpt from an exemplary pod, as
used in the present invention.
[0022] FIG. 16 illustrates the steps for calculating support values
and ranking the attributes within the profile.
[0023] FIG. 17 illustrates an exemplary profile generated from the
data set shown in FIG. 4 according to the present invention.
[0024] FIG. 18 illustrates a recommendation engine, as used in the
present invention.
[0025] FIG. 19 illustrates the profile matching conducted by the
recommendation engine shown in FIG. 18.
[0026] FIGS. 20A-20E illustrate an exemplary tagged job posting
featuring an embodiment employing the pond, job level, and
education level.
[0027] FIG. 21 illustrates an exemplary tagged resume featuring an
embodiment employing the pond, job level, and education level.
[0028] FIGS. 22A-22B illustrate an exemplary recommendation page
featuring dynamic pods corresponding to a profile.
DETAILED DESCRIPTION
[0029] Turning now to the Figures, wherein like reference numerals
refer to like elements, there is illustrated a system and method
for receiving a plurality of non-standardized data sets and
generating respective standardized profiles 80 that can be used for
efficiently comparing and matching the data sets. The system
permits users to use the standardized profiles 80 to compare and
match various data sets.
[0030] As will be described, each data set is processed to (A)
parse the data set into bands 92; (B) identify attributes 70a, 70b,
70c, etc., such as concepts 85 or titles 87 related to the data
set; (C) identify the band 92 in which each attribute 70 is first
found; (D) identify the number of occurrences 108 in which each
attribute is associated with each data set; and (E) identify what
support 140 is present in the rest of each data set for each
attribute 70. The results provided in an array 25c can then be
weighted to create a profile 80. For example, all of the attributes
70a, 70b, 70c, etc. can be ranked depending on one or more metrics
90a, 90b, 90c, etc., which are described herein. The metrics 90a,
90b, 90c, etc. may include band 92, occurrences 108, support 140 or
various combinations of all three metrics.
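Paragraph [0030] leaves open how band 92, occurrences 108 and support 140 are combined when ranking attributes. One possible combination, assumed here purely for illustration, is a lexicographic sort: earliest band first, then most occurrences, then most support. The Python sketch below uses a hypothetical `AttributeMetrics` container and assumes bands are numbered from 1 at the top of the data set; neither detail is taken from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeMetrics:
    """Hypothetical container for the three metrics of one attribute 70."""
    name: str
    band: int         # band 92 where the attribute first appears; 1 = top (assumed numbering)
    occurrences: int  # occurrences 108 in the data set
    support: int      # support 140 found in the rest of the data set


def rank_attributes(metrics: list[AttributeMetrics]) -> list[str]:
    """Order attributes by earliest band, then most occurrences, then most support.

    This lexicographic weighting is an assumption; any combination of the
    metrics 90a, 90b, 90c could be substituted.
    """
    ordered = sorted(metrics, key=lambda m: (m.band, -m.occurrences, -m.support))
    return [m.name for m in ordered]


profile = rank_attributes([
    AttributeMetrics("Hiking", band=5, occurrences=1, support=0),
    AttributeMetrics("HTML", band=2, occurrences=4, support=2),
    AttributeMetrics("Software Developer", band=1, occurrences=3, support=5),
    AttributeMetrics("ASP", band=2, occurrences=4, support=3),
])
print(profile)  # ['Software Developer', 'ASP', 'HTML', 'Hiking']
```

A weighted sum of the three metrics would be an equally plausible design; the lexicographic key simply makes position in the data set dominate, consistent with the observation that important information tends to appear near the top.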
The System
[0031] Although not required, the system and method will be
described in the general context of a computer network 20, as is
well known in the industry, and computer executable instructions
being executed by general purpose computing devices within the
computer network 20. Referring to FIG. 1, in this regard, the
general purpose computing devices may comprise one or more server
computers 22a hosting a data set software application. If there are
multiple server computers 22a, they may interface via a network or
serial interface either directly or over the Internet or other
local or wide area network. The server computer 22a can also
include one or more databases for storing data sets. Data sets can
include resume information, job-posting information, personal
profile information, housing information, or any other data sets
for which it would be advantageous to compare one data set against
other data sets to select appropriate matches. In the context of
recruiting services, data sets may include (1) detailed information
about a prospective applicant, such as, previous job history,
experience, education, and job-search criteria, or (2) information
about an employer or possible job posting, such as, hiring
criteria, educational and skill qualifications, location, and
employee benefits. It should be appreciated that the network
components could be described as having client and server
relationships, as generally known in the art.
[0032] To allow each user having a client computer 22b to access
and utilize the data matching system, the software application will
reside on the server computer(s) 22a. Further, it is preferable
that client users access the software application via an internet
browser, which acts as an interface between the software
application and the operating system for the server computer 22a.
The operating system for the server computer 22a and the client
computer 22b may be Windows®-based or could employ any one of
the currently existing operating systems, such as LINUX®, MAC
OS®, Mozilla®, etc. In addition, it should be appreciated
by those with skill in the art that other applications besides the
browser may also be utilized to act as an interface between the
software application and the server computers 22a.
[0033] For editing, populating and maintaining the databases, the
browser includes a graphical user interface 50. As shown in FIG. 2,
the graphical user interface 50 is further comprised of various
menu bars, drop-down menus, buttons and display windows.
[0034] As will be appreciated by those of skill in the art, the
computers 22a, 22b need not be limited to personal computers, but
may include hand-held devices, multiprocessor systems,
microprocessor-based or programmable consumer electronics,
minicomputers, mainframe computers, personal digital assistants,
cellular telephones or the like depending upon their intended end
use within the system. For performing the procedures described
hereinafter, the computer executable instructions may be written as
routines, programs, objects, components, and/or data structures
that perform particular tasks. Within the computer network 20, the
computer executable instructions may reside on a single computer
22, a server computer 22a, a client computer 22b, or the tasks
performed by the computer executable instructions may be
distributed among any combination of those computers 22, 22a, 22b.
Therefore, while described in the context of a computer network, it
should also be understood that the present invention may be
embodied in a stand-alone, general purpose computing device that
need not be connected to a network.
[0035] To efficiently provide users with access to the software
application 30, the server computers 22a and the underlying
framework for the computer network 20 may be provided by the
service company itself or by outsourcing the hosting to an
application service provider ("ASP"). ASPs are companies that
provide server computers that store and run a software application
on behalf of a third party, which is accessible to that party's
users via the Internet or similar means. Therefore, companies are
able to provide a computer network without supplying the server
computer(s) 22a. In addition, users are able to access and use
software applications without storing the software application on
their computers. It should be understood, however, that ASP models
are well-known in the industry and should not be viewed as a
limitation with respect to the type of system architectures that
are capable of providing a computer network 20 that can properly
operate the software application discussed herein. Similarly, a
provider of the system may also choose to host the system on its
own equipment or employ a third-party hosting service to maintain
the system.
[0036] To perform the particular tasks in accordance with the
computer executable instructions, the computers 22a, 22b may
include, as needed, a video adapter, a processing unit, a system
memory, and a system bus that couples the system memory to the
processing unit. The video adapter allows the computers 22a, 22b to
support a display, such as a cathode ray tube ("CRT"), a liquid
crystal display ("LCD"), a flat screen monitor, a touch screen
monitor or similar means for displaying textual and graphical data
to a user. The display allows a user to view information, such as,
code, file directories, error logs, execution logs and graphical
user interface tools.
[0037] The computers 22a, 22b may further include read only memory
(ROM), a hard disk drive for reading from and writing to a hard
disk, a magnetic disk drive for reading from and writing to a
magnetic disk, and/or an optical disk drive for reading from and
writing to a removable optical disk or any other suitable data
storage device. The hard disk drive, magnetic disk drive, optical
disk drive or other data storage device may be connected to the
system bus by a hard disk drive interface, a magnetic disk drive
interface, or an optical disk drive interface, respectively, or
other suitable data interface. The drives and their associated
computer-readable media provide a means of non-volatile storage for
the computer executable instructions and any other data structures,
program modules, databases, arrays, etc. utilized during the
operation of the computers 22a, 22b.
[0038] To connect the computers 22a, 22b within the computer
network 20, the computers 22a, 22b may include a network interface
or adapter. For example, when used in a wide area network, such as the
Internet, the computers 22a, 22b typically include a modem, router
or similar device. The modem, which may be internal or external,
may be connected to the system bus via a serial port interface. It
will be appreciated that the described network connections are
exemplary and that other means of establishing a communications
link between the computers 22a, 22b may be used. For example, the
system may also include a wireless access interface that receives
and transmits information via a wireless communications medium,
such as a cellular communications network, a satellite
communications network, or another similar type of wireless
network. It should also be appreciated that the network interface
will be capable of employing TCP/IP, FTP, SFTP, Telnet, SSH, HTTP,
SHTTP, RSH, REXEC, etc. and other network connectivity
protocols.
[0039] As mentioned above, in one embodiment, the software
application 30 and databases reside on the server computer(s) 22a
and are managed by the provider of the software application 30 or
by a third-party. Those with skill in the art will understand,
however, that the software application and databases may reside on
the remote client computer 22b and be managed and maintained by a
user. The graphical user interface 50 may load web pages via HTTP
or HTTPS or other suitable application protocol.
[0040] For populating the databases, the browser may be utilized,
but this may also be accomplished via an MS-SQL Server Enterprise
Manager. While the software application 30 may be programmed in any
software language capable of producing the desired functionality,
it is envisioned that the software application will be programmed
using Microsoft ASP.net, HTML, Javascript, PHP3, or MS-SQL Stored
Procedures.
[0041] For maintaining the security associated with the software
application and databases, a unique login page may be maintained
for each user including, for example, individuals and employers.
The login page may also be used to control the access privileges
for various levels of users. In addition, each login page may also
require a user name and password. For security purposes, the user
names and passwords may be kept separately for each company that is
accessing the software application. To gain access to the software
application, the user must enter the proper user name and password.
It should be appreciated that different login procedures, which are
well known in the industry, may be employed on an as-needed
basis.
[0042] To edit, populate and maintain the databases, the
graphical user interface 50 allows the user to perform standard
text editing functions, including mouse placement of the cursor,
click-and-drag text selection and standard Windows® key
combinations for cutting, copying and pasting data. In addition,
the graphical user interface 50 allows users to access, copy, save,
export or send data or files by using standard Windows® file
transfer functions. It should be understood that these editing and
file transfer functions may also be accomplished within other
operating system environments, such as LINUX®, MAC OS®,
UNIX, Mozilla®, etc.
Data Sets
[0043] While the system can be used for any application in which it
would be desirable to compare non-standardized data sets, the
following description applies the system in the context of
employment recruiting and job searching. As shown in FIG. 3, job
posting 61 for a Web Developer is an exemplary data set, which
typically provides a title 62, job description 64, and the criteria
66 for the job posting 61, including the type and level of
education, professional credentials, and experience that a
qualified job seeker should possess. As will be described in
greater detail below, from each of these pieces of information, the
system can generate an attribute. In this example, job posting 61
calls for a job seeker with, among other things, a bachelor's degree
in computer science and experience in development in HTML and
ASP.
[0044] Similarly, a resume 71 represents another data set that
comprises information about a job seeker. FIG. 4 provides an
illustrative resume 71 for an individual seeking a position as a
software developer. Information about a job seeker may include, for
example, professional objectives 72, qualifications 73, levels of
education 74, past and present job titles and experience 76, and
personal interests 78. As described below, the system may
optionally permit a user to input her last job title 75 and offer
pre-defined categories from which the user can select. The title 75
and categories can then be associated in the data set. As with job
postings 61, the system can generate one or more attributes from
each of these pieces of information.
[0045] In one embodiment of the invention, each data set is
processed by the system to generate a corresponding profile 80
comprising a plurality of attributes 70a, 70b, 70c, etc. generated
from each of the respective data sets. An exemplary profile 80 is
shown in FIG. 17. Each data set may comprise a job posting 61 or a
resume 71. In another embodiment of the invention, the system may
generate attributes 70 that are separately sub-categorized into
concepts 85 and titles 87. As will be appreciated by those of
ordinary skill in the art, without departing from the invention,
attributes 70 may optionally remain consolidated or may be
categorized by any number of characteristics other than concepts 85
and titles 87, such as, for example only, education, interests, and
work schedule.
Profiles
[0046] Bands
[0047] The system and process for creating a profile 80 from each
data set will now be described. FIG. 4 illustrates a data set
comprising a user-provided resume 71. The system associates at
least one of a plurality of metrics 90a, 90b, 90c, etc. (identified
in FIGS. 13A and 17) with at least one attribute 70 (for example,
concept 85 or title 87) generated from resume 71. In one
embodiment, a metric 90a is a band 92 representing the relative
position of text within the data set. Frequently, the relative
location of data within a data set is indicative of the relative
importance of that data. For example, in resume 71, the most recent
experience 76 or the job seeker's professional objective 72 is
typically near the top of the resume 71. In the context of a data
set for real estate listings, the address and price of the
property are typically also at the top of the listing. Accordingly,
metric 90a for band 92, which represents the location of data
within the data set, is helpful in assigning relative importance to
each datum within the data set as the corresponding attributes 70
are generated.
[0048] As shown in FIG. 5, resume 71 is first broken into bands 92
and placed into band array 25a. In one embodiment, when a user
uploads or enters her resume 71 into the system, the user assigns
the resume 71 a title 75 and the user's most recent job title 81.
The system may also request that the user select a job category 83
from a predetermined list of categories 83. The steps of parsing
the data set into bands 92 are shown in FIG. 6. In step 210, the
system assigns the title 75, if any, to band "0" 92a. The remaining
text of resume 71 is parsed by dates. At step 220, after the title
75 is assigned to band "0", the entire remaining text of resume 71
is entered into a memory field of band array 25a identified as band
"1" 92b, as shown in FIG. 5A. The system may use a regular
expression to locate a date expression 94 in various formats, for
example, Jan. 1, 2005, 1/1/05, etc. Once a first date 94a is found,
the system dumps all of the text that appears in resume 71 after
first date 94a into a second row in the array 25a called band "2"
92c. The system continues to run the regular expression through the
text of resume 71 until it finds the next date 94b, at which time it
dumps any text appearing after next date 94b into a new row in the
array 25a referred to as band "3" 92d. The regular expression
continues to search for dates 94c, 94d, 94e, etc., and the system
dumps the text that follows each of those dates into respective
bands 92e, 92f, 92g until no further dates are found in the
remaining text. Finally, at step 230, the system dumps the
user-selected categories 83 in a final band 92g, which may
optionally be segregated by an open band 92f, as depicted in
FIG. 5B.
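The band-parsing steps of FIG. 6 can be sketched in Python as
below. This is an illustration under stated assumptions, not the
disclosed implementation: the pattern DATE_RE and the function name
parse_bands are hypothetical, "dumping" text into a new band is read
as moving it out of the preceding band, and the matched dates
themselves are discarded.

```python
import re

# Hypothetical date pattern for forms such as "Jan. 1, 2005" and "1/1/05";
# the actual regular expression used by the system is not disclosed.
DATE_RE = re.compile(r"\b(?:[A-Z][a-z]{2}\.? \d{1,2}, \d{4}|\d{1,2}/\d{1,2}/\d{2,4})\b")


def parse_bands(title, text, categories):
    """Return band array 25a as a list of strings indexed by band number.

    Band 0 holds the user-assigned title. Band 1 keeps the text that precedes
    the first date; each later band holds the text that follows one date, up
    to the next date. The user-selected categories go in a final band, after
    an empty "open" band that segregates them from the resume text.
    """
    bands = [title or ""]
    # re.split with a non-capturing pattern returns the text between dates
    bands.extend(segment.strip() for segment in DATE_RE.split(text))
    bands.append("")                     # open band
    bands.append(", ".join(categories))  # final band of categories
    return bands
```

For example, parse_bands("Software Developer", "Objective: build
web apps. Jan. 1, 2005 Senior developer at Acme. 1/1/03 Intern at
Beta.", ["IT", "Software"]) yields six bands: the title, "Objective:
build web apps.", "Senior developer at Acme.", "Intern at Beta.", an
empty open band, and "IT, Software".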
[0049] As will be appreciated by those of skill in the art, without
departing from the invention, other variables may be used to parse
bands 92, for example, biographical data like "education",
"experience," "skills," and "professional associations". In one
embodiment, the system may permit yet another band (not shown) that
could be manually populated with key words by the system provider
or user.
Word Array
[0050] Next, at step 250 of FIG. 6, and as shown in greater detail
in FIGS. 7A-7D and 8, the system analyzes the text in each band
92a, 92b, 92c, 92d, etc. to create word array 25b. The steps to
create the word array 25b are shown in FIG. 8. Starting with band
"0" 92a shown in FIG. 5A, and continuing with each subsequent band
92b, 92c, etc., all of the text in each band 92 of FIGS. 5A and 5B
is dumped into the word array 25b, shown in FIGS. 7A-7D. At step
260 in FIG. 8, the text is parsed at spaces, line feeds and carriage
return characters so that each character string 96 (e.g., a word or
phrase) occupies a separate row of array 25b, along with a second
column that identifies the band 92 in which the word was found. At
steps 265, 270 and 275, the system then runs through each row of
array 25b and uses another regular expression to identify and
remove undesirable punctuation, such as asterisks, and to separate
words joined by slashes. As shown in FIG. 8, at steps 280 and 285,
the system may optionally check each character string 96a, 96b,
96c, etc., against substitute database 102 to replace certain
character strings 96 that have well-known abbreviations. An excerpt
of substitute database 102 is shown in FIG. 9. For example, "a/p"
or "ap" may be replaced with "accounts payable." By substituting
equivalent terms, a more standardized lexicon of attributes 70 is
ultimately generated in profile 80, while the original data set,
such as resume 71, remains unchanged. In addition, at step 285, the
system may replace irregular word spacing, e.g., "r_&_d".
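A minimal sketch of the word-array steps of FIG. 8 follows. The
substitute entries, the "r&d" replacement for "r_&_d", and the third
ends_phrase column (a flag that a later multi-word buffer could use
to clear itself at commas and line ends) are assumptions.
Abbreviations are looked up before slash-joined words are separated,
so that "a/p" can still be matched.

```python
# Hypothetical excerpt of substitute database 102 (FIG. 9 is not reproduced);
# the "r&d" replacement for "r_&_d" is an assumed normalization.
SUBSTITUTES = {"a/p": "accounts payable", "ap": "accounts payable", "r_&_d": "r&d"}
BREAK_PUNCT = ",;:."  # trailing punctuation treated as a phrase break


def build_word_array(bands):
    """Return word array 25b as rows of (word, band, ends_phrase).

    ends_phrase is an assumed extra column: True when the word was followed
    by break punctuation or ended its line, so a later multi-word buffer can
    be cleared at that point.
    """
    rows = []
    for band_no, text in enumerate(bands):
        for line in text.splitlines():
            tokens = line.split()  # split on spaces and tabs within the line
            for i, raw in enumerate(tokens):
                word = raw.lower().replace("*", "")  # drop undesirable punctuation
                core = word.rstrip(BREAK_PUNCT)
                ends = core != word or i == len(tokens) - 1
                core = SUBSTITUTES.get(core, core)  # expand known abbreviations
                # separate slash-joined words; a substitution may add words too
                parts = [w for chunk in core.split("/") for w in chunk.split()]
                for j, w in enumerate(parts):
                    rows.append((w, band_no, ends and j == len(parts) - 1))
    return rows
```

Keeping the original resume untouched and producing a separate
normalized array matches the point above that the data set itself
remains unchanged.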
[0051] FIG. 10 illustrates the steps for determining whether a
character string 96 contained in the word array 25b should generate
an entry into the attribute array 25c. Initially, at step 305, each
word found in the word array is placed into a multi-word buffer, as
described below. Then, at step 310, the system checks the words in
the buffer to determine whether any pre-defined "spam" term is
found within the multi-word buffer. If such a spam word is
identified, at step 315, a flag is set to mark the entire profile
80 as including spam, so that the profile and associated data set
can later be eliminated from matching searches or optionally called
up for further investigation or review.
[0052] After stripping each character string 96 of punctuation, at
step 320, the character string 96 may be searched against common
word database 98. An excerpt from the common word database is
illustrated in FIG. 11. If the character string 96a is found in the
common word database 98, further processing can be aborted at step
345, and the system increments to the next word in array 25b,
comprising character string 96b. By skipping common, and therefore
unhelpful, words, the system increases its processing speed. As
shown in FIG. 11, "N" designates that the word is
common and therefore "not allowed." An entry labeled "Y" designates
that the word may be part of a multi-word phrase, and is therefore
retained.
[0053] At step 325, the system then compares each character string
96 in word array 25b against the words contained in at least one
attribute dictionary 104. An excerpt of the attribute dictionary
104 is shown in FIG. 12. If character string 96 is found in
attribute dictionary 104, attribute array 25c is created at step
350 and character string 96 is placed in attribute array 25c, along
with an association to the band 92 in which the character string 96
was first found. A sample attribute array 25c is shown in FIGS.
13A-13C. FIG. 14 illustrates the steps for entering single
(stand-alone) or multi-word phrases into the attribute array at
step 360. In addition, a counter is incremented to track metric
90b, which counts the number of occurrences 108 in which character
string 96a is found in the word array 25b. As will be described
later, a third metric 90c, defined as support 140, is tabulated in
another column of attribute array 25c.
[0054] After comparing character string 96 with the attribute
dictionary 104, character string 96 is also copied to the buffer array
to determine whether the character string 96 is part of a
multi-word attribute 70. If, however, character string 96 is
followed by a hard carriage return, a comma or other similar
punctuation that would signal that the adjacent words are
unrelated, the buffer array is cleared, as indicated in FIG. 8 at
steps 290 and 295. This flag for termination is shown in FIG. 8. If
character string 96 does not include such a flag, the buffer array
retains the character string 96a to be compared with the next few
words that are found in the word array 25b. The number of words to
be saved in the buffer array can be varied within the system to
optimize results.
[0055] The system then searches to see whether there are any more
character strings 96 in word array 25b, shown in FIG. 8 at step
278. If so, the steps shown in FIGS. 10 and 14 are repeated. If the
character string 96 is in the common word database 98 or ends in
appropriate punctuation, then at step 295 on FIG. 8, the multi-word
buffer array is cleared and the system processes the next character
string 96 in the word array 25b. If not, then at step 335 on FIG.
10, the multi-word buffer array is retained, and the system searches
attribute array 25c to see whether character string 96 has already
been placed in attribute array 25c. If the character string 96 is
already in the array 25c, the occurrence counter is incremented by
one. Within attribute array 25c, the band designation 92 retains
the original value of the band 92 in which the character string 96
was first found, even if later occurrences are identified in later
bands. The system then checks, at step 335 on FIG. 10, whether the
multi-word buffer array contains any multi-word attribute 70 that
is found in the attribute dictionary 104. If so, then at steps
365-375 on FIG. 14, the attribute array 25c is populated with the
new multi-word attribute 70, along with the band 92 from which the
multi-word attribute was triggered.
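The buffer-based flow of FIGS. 10 and 14 can be sketched as
follows. The function accepts rows of (word, band, ends_phrase),
where ends_phrase is an assumed flag marking a following comma or
hard carriage return. ATTRIBUTE_DICT, COMMON_WORDS and SPAM_TERMS
are hypothetical stand-ins for the databases of FIGS. 11 and 12,
with tags following the FIG. 12 scheme. At each word, every phrase
of up to four buffered words that ends at that word is looked up;
dependent terms are buffered but never entered on their own.

```python
# Hypothetical dictionary excerpts; tags follow FIG. 12's scheme:
# "c"/"s" = stand-alone concept/title, "cd"/"d" = dependent concept/title.
ATTRIBUTE_DICT = {
    "attorney": "s", "software": "cd", "developer": "d",
    "software developer": "s", "attorney software": "c",
    "attorney software developer": "s", "software application": "c",
}
DEPENDENT_TAGS = {"cd", "d"}
COMMON_WORDS = {"am", "an", "the", "who", "has"}  # "N" entries of FIG. 11
SPAM_TERMS = {"work from home"}                   # hypothetical spam list
MAX_BUFFER = 4                                    # up to four buffered words


def build_attribute_array(rows):
    """Return ({attribute: [first_band, occurrences]}, spam_flag)."""
    attrs, spam, buffer = {}, False, []
    for word, band, ends_phrase in rows:
        if word in COMMON_WORDS:
            buffer.clear()          # common word: skip it and break the phrase
            continue
        buffer.append(word)
        del buffer[:-MAX_BUFFER]    # keep only the most recent words
        # check every phrase that ends at the current word, longest first
        for start in range(len(buffer)):
            phrase = " ".join(buffer[start:])
            if phrase in SPAM_TERMS:
                spam = True
            tag = ATTRIBUTE_DICT.get(phrase)
            if tag is None or tag in DEPENDENT_TAGS:
                continue            # unknown, or dependent and needs a partner
            entry = attrs.setdefault(phrase, [band, 0])  # band of first sighting
            entry[1] += 1
        if ends_phrase:
            buffer.clear()          # comma or hard return ends the phrase
    return attrs, spam
```

Fed the words "attorney software developer who has designed,"
followed by "software application", the sketch enters "attorney",
"attorney software", "attorney software developer" and "software
developer" with band 0 and "software application" with the later
band, in the spirit of the example of paragraphs [0056] and [0057].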
[0056] An example will illustrate the population of the attribute
array 25c. Refer to the following text that is entered into band
array 25a shown in FIG. 5A: "attorney/software developer who has
designed, written and been selling and supporting legal practice
software applications." As shown in FIG. 7A, the character string
96 "attorney" is encountered in the word array 25b at line 2. The
word "attorney" is located in the attribute dictionary 104
(although the word "attorney" is not specifically shown), so it is
placed in attribute array 25c, shown in FIG. 13A, along with band
"0". In addition, the occurrence counter is incremented to "1."
The word "attorney" is then saved in the buffer array. The system
then finds the next character string 96, in this example,
"software." As described below, because "software" is such a
commonly-used word, it is considered a dependent attribute, and is
not placed in the attribute array 25c. Similarly, the next word,
"developer," another commonly-used word, is also designated a
dependent attribute, and is therefore not placed in the attribute
array 25c. However, the multi-word buffer array 110 now contains the
words "software" and "developer," which, as a combined multi-word
phrase, is found in the attribute dictionary 104 (multi-word phrase
is not shown). Accordingly, system checks the attribute array 25c
to see whether the multi-word attribute 70 "software developer" has
already been entered. Since this is the first occurrence of
"software developer," the multi-word attribute 70 is entered in the
array 25c, along with its associated band 92, band "0" 92a, and the
counter is initially incremented to "1." As seen in FIG. 13A, the
multi-word "software developer" attribute is found in the word
array 25b for a total of six occurrences.
[0057] As also depicted in FIG. 13A, the system also identified the
multi-word attributes 70 "attorney software" and "attorney software
developer." As seen with this example, a single occurrence of the
words "attorney", "software" and "developer" in sequential order
within the word array 25b yielded four separate attributes 70 in the
array 25c, namely, "attorney", "software developer," "attorney
software," and "attorney software developer." Later, as shown on
FIG. 13B at line 7, when the system encounters "software" followed
by "application," it creates a new entry in attribute array 25c for
"software application," which is incremented to a total of four
occurrences. Referring to element 6 in FIG. 7A, the word in word
array 25b is "am," which is found in the common word database 98,
so the "am" character string 96 is ignored, the buffer array is
cleared and the system selects the next character string in word
array 25b, which is element 7, "an."
[0058] In one embodiment, a further enhancement is provided by
subcategorizing the attributes 70 as either concepts or titles. For
example, the word "accountant" is identified as a title, whereas
the word "accounting" is considered a concept. This can be
accomplished by distinguishing between concepts and titles within
the attribute dictionary 104 or by creating separate dictionaries,
one title dictionary and another concept dictionary. For example,
the excerpt from the attribute dictionary 104 shown in FIG. 12
differentiates titles and concepts as follows: a "c" represents an
independent (or stand-alone) concept; "cd" represents a dependent
concept; "s" represents a stand-alone title; and "d" represents a
dependent title. Alternatively, separate dictionaries may be used,
and the system can look up each character string 96 first in the
title dictionary and if no match is found, then character string 96
may be looked up in the concept dictionary.
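For the alternative with separate dictionaries, a lookup that
consults the title dictionary before the concept dictionary might
look like this. The dictionary contents and the (category,
is_dependent) return value are hypothetical.

```python
# Hypothetical excerpts of separate title and concept dictionaries, reusing
# FIG. 12's tags: "s"/"d" for titles, "c"/"cd" for concepts.
TITLE_DICT = {"accountant": "s", "developer": "d", "software developer": "s"}
CONCEPT_DICT = {"accounting": "c", "software": "cd", "visual basic": "c"}


def classify(term):
    """Return (category, is_dependent), or None if the term is unknown.

    The title dictionary is checked first; only on a miss is the concept
    dictionary consulted.
    """
    if term in TITLE_DICT:
        return "title", TITLE_DICT[term] == "d"
    if term in CONCEPT_DICT:
        return "concept", CONCEPT_DICT[term] == "cd"
    return None
```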
[0059] The idea of identifying independent attributes, which are
entered in the attribute array 25c by themselves, and dependent
attributes, which must be combined with other terms, can be applied
to concepts and titles as shown in FIG. 12. The dependent concepts
and titles are words that are commonly used, but provide little or
no value in matching a candidate with a relevant job opening,
unless combined with another word. As described in the example
above, neither the concept "software" nor the title "developer" is
helpful by itself in identifying qualifications of a job applicant
or needs of an employer. But when the two words are combined, the
phrase "software developer" is a recognized job title that is a
helpful attribute.
[0060] Alternatively, dependent concepts and dependent titles can
be stored in separate databases, for example, a dependent concept
database and a dependent title database. If the character string
96 is found in either database, character string 96 is not placed
in the array 25c, but it is placed in the multi-word buffer and may
be placed in the array 25c along with the next character string 96b
if the next word meets the criteria in the steps described in FIGS.
10 and 14. The system can be set to buffer a variable number of
words, although buffering up to four words has been found
advantageous. This permits multi-word attributes 70 comprising four
or fewer words to be identified, for example, "securities
transactional paralegal," "information technology consultant," and
"corporate securities transactional."
[0061] The steps in FIGS. 8, 10, and 14 are repeated until there
are no more character strings in the word array 25b. At this point,
attribute array 25c will be filled with all of the attributes
70 (or substitutions) generated by the word array 25b that appear
in attribute dictionary(ies) along with the identity of the
respective band 92 in which each attribute 70 was first encountered
and the total number of occurrences that each attribute 70a, 70b,
70c, etc. appeared in word array 25b.
[0062] Next, the system checks each attribute 70 (concept or title)
in the array 25c against the attribute dictionary (104, shown in
FIG. 12) to identify synonyms as shown in column 105 to reduce
redundancy and enhance the results during the searching and
matching routine. For example, the words "a+", "a+ certification"
and "a+ certified" would all be replaced by the attribute ID of the
attribute 70 "a+ certified," as provided in the synonym column 105,
shown in FIG. 12. As with the substitute list 102 described
earlier, this routine adds consistency to the results.
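Synonym replacement can be sketched as a mapping from each variant
to the ID of its canonical attribute. The ID 1017 is invented, and
the merge rule for collapsed entries (keep the earliest band, sum
the occurrences) is an assumption the text does not specify.

```python
# Hypothetical synonym column 105: variant -> ID of the canonical attribute.
SYNONYM_IDS = {"a+": 1017, "a+ certification": 1017, "a+ certified": 1017}


def merge_synonyms(attrs):
    """Collapse synonymous entries of attribute array 25c onto one ID.

    attrs: {attribute: [first_band, occurrences]}. Merged entries keep the
    earliest band and the summed occurrences (an assumed rule).
    """
    merged = {}
    for term, (band, count) in attrs.items():
        key = SYNONYM_IDS.get(term, term)  # unlisted terms keep their text
        if key in merged:
            merged[key][0] = min(merged[key][0], band)
            merged[key][1] += count
        else:
            merged[key] = [band, count]
    return merged
```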
Tagger
[0063] In an alternative embodiment, rather than placing the words
in the word array 25b, a pond 35 of words is created from the band
array 25a. The pond 35 is created by converting the data structure
and data contents of the band array 25a into a doubly-linked list.
As will be known by those of ordinary skill in the art, a
doubly-linked list comprises a sequence of nodes, each containing a
data field and having two references, one pointing to the previous
node and the other to the next node in the list.
[0064] Through a series of operations, the doubly-linked list,
which comprises pond 35, is tagged using HTML to assign tokens to
certain items in the doubly-linked list. The tokens may identify
attributes 70, such as a dependent concept or title 36, an
independent concept or title 37, a negator such as the word "not"
(not shown), a break 38, an unrecognized word 39, a slash group 45,
or a connector 43, as will be described further. In one embodiment,
pond 35 tokens may be color coded in HTML. For example, in FIGS.
20A-20E, pond 35 identifies several pond items, including
"software" and "program," which may be displayed in orange to
designate dependent concepts or titles 36; "engineer" and "c++,"
which may be displayed in green to designate independent concepts
or titles 37; "<--band-->", ":" and ",", which may be displayed in
red to designate hard breaks 38 between words in the pond; "in
the," "in," "of," and "<--line-->," which may be displayed in blue
to designate connectors 43 that may join adjacent items; and
finally, items grouped within "{ }" brackets to indicate slash
groups 45.
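A pond of this kind can be sketched as a hand-built doubly-linked
list whose nodes carry a token tag in place of an HTML color. The
class PondNode, the TAGS lexicon and the helper walk are
hypothetical; negators, slash groups and multi-word connectors such
as "in the" are omitted to keep the example short.

```python
class PondNode:
    """One item of pond 35: a word, its token tag, and links in both directions."""

    def __init__(self, word, tag, prev=None):
        self.word = word
        self.tag = tag
        self.prev = prev
        self.next = None


# Hypothetical tag lexicon modeled on the token types of FIGS. 20A-20E.
TAGS = {
    "software": "dependent", "program": "dependent",
    "engineer": "independent", "c++": "independent",
    "<--band-->": "break", ":": "break", ",": "break",
    "in": "connector", "of": "connector", "<--line-->": "connector",
}


def build_pond(words):
    """Link words into a doubly-linked list, tagging each node."""
    head = tail = None
    for w in words:
        node = PondNode(w, TAGS.get(w, "unrecognized"), prev=tail)
        if tail is None:
            head = node
        else:
            tail.next = node
        tail = node
    return head, tail


def walk(node, forward=True):
    """Yield nodes starting at node, following next (or prev) links."""
    while node is not None:
        yield node
        node = node.next if forward else node.prev
```

Because every node links both ways, a later pass can scan left or
right from any attribute until it meets a break, which is the
flexibility the next paragraph attributes to the pond.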
[0065] The addition of the tokens may be done in sequential
operations. For example, multi-word concepts, such as "visual
basic," in which neither "visual" nor "basic" is, by itself, a
concept or title but which becomes a concept when the two words
appear adjacent to one another, may be identified and tagged.
Similarly, multi-part concepts, such as "computer science," which
include dependent and independent concepts or titles, may be
identified and tagged. The use of a doubly-linked list for the pond
eliminates the need to maintain a multi-word buffer, and also
permits greater flexibility for combining attributes 70 that are
spaced more than a limited number of positions away from one
another. For example, in the embodiment employing a buffer,
described above, the combination of multi-word attributes is
limited by the number of words or character strings 96 held in the
buffer, whereas by using a doubly-linked list, the words may be
searched for attributes 70 anywhere within the doubly-linked list
(i.e., pond 35), subject to any tagged limitations, such as hard
breaks 38 or negators. This advantage increases the flexibility for
combining separate attributes with the attributes within a slash
group. The tokens further provide flexibility for defining and
modifying behaviors (i.e., programming instructions) associated
with particular types of words. For example, by retaining a
connection between two lists of words, a connector 43, such as
"of," may identify a useful attribute 70, such as "human resources
manager," from the original "manager of human resources."
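The "of" connector in the last example could be handled as a
rewrite over the words lying between two hard breaks. The rule
below, which moves a single title word from before "of" to after
the phrase that follows it, and the titles parameter are
illustrative assumptions rather than the disclosed behavior.

```python
def apply_of_connector(words, titles):
    """Rewrite '<title> of <phrase>' as '<phrase> <title>'.

    words: the tokens between two hard breaks, in pond order.
    titles: hypothetical set of title words allowed to move around "of".
    Returns the reordered attribute, or None when the pattern does not apply.
    """
    if "of" not in words:
        return None
    i = words.index("of")
    before, after = words[:i], words[i + 1:]
    if len(before) == 1 and before[0] in titles and after:
        return " ".join(after + before)
    return None
```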
[0066] A post-processing step may optionally be provided in which
"orphaned" attributes are identified and replaced with more
meaningful attributes. For example, if the dependent title
"engineer" appears by itself in one location in the pond but the
multi-word title "electrical engineer" appears elsewhere in the
pond, the orphaned dependent title "engineer" may be replaced with
"electrical engineer."
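The orphan post-processing step might be sketched as follows,
assuming the attributes found in the pond are available as a list
of strings and that a multi-word title matches an orphan when it
ends with that word; dependent_titles is a hypothetical input.

```python
def replace_orphans(attributes, dependent_titles):
    """Replace each lone dependent title with a multi-word title ending in it.

    attributes: attribute strings found in the pond, in pond order.
    dependent_titles: hypothetical set of dependent titles, e.g. {"engineer"}.
    """
    result = []
    for attr in attributes:
        fuller = None
        if attr in dependent_titles:
            # first multi-word attribute anywhere in the pond that ends in attr
            fuller = next((a for a in attributes if a.endswith(" " + attr)), None)
        result.append(fuller or attr)
    return result
```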
[0067] In a further aspect of this embodiment, because the
attributes, such as independent concepts or titles (or both) 37,
have already been identified in the pond 35 during the tagging
process, the attribute array 25c may be readily created by running
through the tagged pond. The attribute array is illustrated in
FIGS. 13A, 13B, 13C, and 20A.
Assigning Support Metric
[0068] To further enhance the accuracy of the profile generation,
each attribute 70 that is entered into array 25c is evaluated by
how closely the attribute 70a, 70b, 70c, etc. is related to other
attributes 70a, 70b, 70c, etc. in the array 25c. This is
accomplished by the use of attribute "pods" 125. FIG. 15 shows
excerpts from a sample pod 125a. FIG. 16 illustrates the steps
described next for generating a support metric 90c.
[0069] Each pod 125a, 125b, 125c, etc. identifies the relatedness
of a "root" attribute 130 (for example, concept or title) to other
words that may appear within word array 25b (which, in turn, are
related to words appearing in the data set, for example, a resume
71 or a job posting 61). Each pod 125a, 125b, 125c, etc. is created by
conducting an analysis for each root 130 to determine what other
attributes 70a, 70b, 70c, etc. are related to the root 130. In one
embodiment, every attribute 70a, 70b, 70c, etc. is designated, in
turn, as the root 130 and searches are conducted through a large
number of sample data sets (for example, resumes 71a, 71b, 71c,
etc. and/or job postings 61a, 61b, 61c, etc.) or sample sets of
profiles 80 to identify each occurrence of another attribute 70,
which is referred to as a "leaf" 135.
[0070] The pod 125 information can be refined, for example, by
counting the number of occurrences in which both the root 130 and
each leaf 135 appear (a) within a given data set, (b) within the
same paragraph of a data set, and/or (c) within the same sentence
of a data set. Similarly, the comparisons could be made between
attributes 70 appearing in profiles 80 and within the same bands
92. The resulting occurrences 108 for the sample data sets are then
compiled into a pod 125 for each root 130, identifying how many
times each leaf 135 is associated with the root 130. Thus each pod
125 can list the number and percentage of occurrences that both the
root 130 and each leaf 135 appeared within the same document,
paragraph, and sentence of the sample data sets or same bands 92 of
profiles 80a, 80b, 80c, etc. An example of the pod 125a for the
root, "accountant" is set forth in FIG. 15.
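A pod for one root could be compiled from sample data sets along
these lines. The input shape (each data set as a list of sentences,
each sentence a collection of attributes), the omission of
paragraph-level counts, and the denominator used for the percentage
(the number of sample data sets that contain the root) are all
assumptions.

```python
from collections import Counter


def build_pod(root, documents):
    """Return {leaf: (documents, percent_of_root_documents, sentences)}.

    documents: sample data sets, each a list of sentences, where every
    sentence is a collection of attributes already extracted from it.
    """
    doc_hits, sentence_hits = Counter(), Counter()
    root_docs = 0
    for doc in documents:
        doc_attrs = set().union(*doc)
        if root not in doc_attrs:
            continue
        root_docs += 1
        doc_hits.update(doc_attrs - {root})  # co-occurs in the same data set
        for sentence in doc:
            if root in sentence:
                # co-occurs in the same sentence
                sentence_hits.update(set(sentence) - {root})
    return {leaf: (n, 100.0 * n / root_docs, sentence_hits[leaf])
            for leaf, n in doc_hits.items()}
```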
[0071] Pods 125a, 125b, 125c, etc. may be used to scale the profile
80 in several ways and to add various degrees of precision by
assigning a metric 90c for "support" 140, which signifies the
presence of attributes 70 that are more likely related to the root
130. For example, in one embodiment, the pod 125 may be truncated
into a binary value, whereby "1" identifies the existence of a
relationship and "0" identifies the absence of a relationship. This
assignment of support value is shown in steps 405-430 on FIG. 16.
To illustrate, in a given array 25c, if a leaf 135 appears in the
pod 125 for a root 130, support 140 counter would be incremented by
one, at step 430, regardless of whether the leaf 135 appeared in
all of the sample data sets or only one of the sample data sets. In
this scenario, each time any leaf 135 is found in the pod 125 for a
root 130a, the counter would be incremented by 1 for that
particular root 130a. Thus, if many leafs 135a, 135b, 135c, etc.
for a particular root 130a are found in the attribute array 25c,
the support 140 for the root 130a is high and the root 130a is
weighted more strongly in the profile 80.
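Under the binary embodiment, each pod reduces to a set of related
leaves, and the support counter for a root is the number of those
leaves present in attribute array 25c. A sketch, with the
pods mapping as an assumed input:

```python
def binary_support(attributes, pods):
    """Support metric 90c where each pod is truncated to 1/0 relationships.

    attributes: attributes in attribute array 25c.
    pods: {root: set of leaves related to that root}.
    Each related leaf present in the array adds exactly 1, however often it
    appeared in the sample data sets.
    """
    present = set(attributes)
    return {root: len((pods.get(root, set()) & present) - {root})
            for root in attributes}
```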
[0072] In an alternate embodiment, the relative percentage of
appearances of each leaf 135a, 135b, 135c, etc. to each root 130a
can be cumulatively added and then normalized with the other
metrics 90 (e.g., the band 90a and occurrence 90b scores). For
example, as seen in FIG. 15 pod 125a for the root 130a "accountant"
and the leaf 135a "certified" provides support 140a of 54.16%, and
support 140b for the leaf 135b, "gaap" of 76.00%. So, if a profile
80 includes the root "accountant" and the leafs "certified" and
"gaap", these support values can be added to give 130.16%. Accordingly,
the support 140 values for all the leafs 135a, 135b, 135c, etc. in
the attribute array 25c associated with each root 130 could be
totaled for a grand support 140 value for each concept in the
attribute array 25c.
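The cumulative embodiment can be sketched as below, reproducing the
arithmetic of this example (54.16% + 76.00% = 130.16%). The pods
mapping of each root to its leaves' percentages is an assumed
layout, and the "cpa" figure used in the check is invented.

```python
def cumulative_support(attributes, pods):
    """Sum the pod percentages of every leaf present for each root.

    attributes: attributes in attribute array 25c.
    pods: {root: {leaf: percentage}}, in the style of FIG. 15.
    """
    present = set(attributes)
    return {
        root: round(sum(pct for leaf, pct in pods.get(root, {}).items()
                        if leaf in present and leaf != root), 2)
        for root in attributes
    }
```

Leaves that are in the pod but absent from the profile (here, a
hypothetical "cpa") contribute nothing to the total.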
[0073] In another embodiment, this total support 140 value can then
be normalized to correspond with the approximate magnitude of the
other metrics 90a, 90b, 90c, etc. associated with the attribute
array 25c. Normalizing the support 140 value can be done many ways
without departing from the invention. For example, in one
embodiment, the support value 140 totals are divided by a value
such as the highest score of all the support 140a, 140b, 140c, etc.
value totals and then multiplied by a multiplier.
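One such normalization, dividing each total by the largest total
and scaling by a multiplier, is sketched here; the default
multiplier of 10 is a hypothetical choice, since the text leaves
its value open.

```python
def normalize_support(totals, multiplier=10.0):
    """Scale support totals so that the largest becomes `multiplier`.

    totals: {attribute: total support}.
    """
    top = max(totals.values(), default=0)
    if top == 0:
        return {attr: 0.0 for attr in totals}  # nothing to scale against
    return {attr: value / top * multiplier for attr, value in totals.items()}
```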
[0074] In another embodiment, each gross support 140a, 140b, 140c,
etc. value can merely be ranked. For example, the gross support 140
value can be replaced by the reverse rank (so the highest gross
support 140 value would have the highest value). To illustrate, as
shown in Table 1, if a series of root attributes 130 have gross
support 140 values of root 140a=1209, root 140b=2409, root
140c=478, root 140d=8904, root 140e=35 and root 140f=0, the support
140 values assigned in attribute array 25c could be as follows:
root 140a=3, root 140b=4, root 140c=2, root 140d=5, root 140e=1 and
root 140f=0. Various methods for using the pods 125 for assigning
relative weighting for the support 140 value may be employed
without departing from the invention.
TABLE 1
ROOT    GROSS SUPPORT    SUPPORT VALUE
140d    8904             5
140b    2409             4
140a    1209             3
140c    478              2
140e    35               1
140f    0                0
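The reverse-rank embodiment can be checked against Table 1 with the
sketch below. A gross support of 0 keeps a value of 0, as in the
table; the text does not say how ties are ranked, so here tied
totals simply receive consecutive ranks in input order (an
assumption).

```python
def reverse_rank(gross):
    """Replace gross support totals with reverse ranks, as in Table 1.

    gross: {root: gross support}. The lowest non-zero total gets rank 1 and
    the highest gets the largest rank; a total of 0 stays 0.
    """
    ranked = sorted((root for root in gross if gross[root] > 0),
                    key=lambda root: gross[root])
    values = {root: 0 for root in gross}
    for rank, root in enumerate(ranked, start=1):
        values[root] = rank
    return values
```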
Ranking the Profile
[0075] To complete the profile 80 for each data set, the metrics 90
are used to rank the attributes according to relative importance,
as identified in steps 450 and 455 of FIG. 16. FIG. 17 shows an
exemplary profile 80. In one embodiment, all the generated
attributes 70a, 70b, 70c, etc. are placed in the array 25c in order
of appearance within the bands 92. That is, all the attributes 70a,
70b, 70c, etc. found for the first time in band "0" 92a are listed
as band "0", followed by those first found in band "1", band "2",
and so on. Next, after the support values 140a, 140b, 140c, etc.
are assigned, the leafs 135a, 135b, 135c, etc. that are found
supporting each root 130a, 130b, 130c, etc. are pulled up behind
each related root 130 in order of descending support value 140.
Finally, within each group of root 130 and associated leafs 135a,
135b, 135c, etc., the leafs are listed in order of number of
occurrences 108. This ranking or weighting scheme is exemplary and
other schemes may be used without departing from the invention.
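One reading of this ranking scheme is sketched below: attributes
are taken in order of first band, and each root is immediately
followed by the leaves it supports, sorted by descending support
with occurrences (then name) as tiebreakers. Treating occurrences as
a tiebreaker, the tuple layout of attrs, and the pods mapping are
assumptions.

```python
def rank_profile(attrs, pods):
    """Order attributes for profile 80.

    attrs: {attribute: (first_band, occurrences, support)}.
    pods: {root: set of leaf attributes}.
    """
    order, placed = [], set()
    # sorted() is stable, so attributes sharing a band keep their input order
    for attr in sorted(attrs, key=lambda a: attrs[a][0]):
        if attr in placed:
            continue
        order.append(attr)
        placed.add(attr)
        leaves = [leaf for leaf in pods.get(attr, ())
                  if leaf in attrs and leaf not in placed]
        leaves.sort(key=lambda leaf: (-attrs[leaf][2], -attrs[leaf][1], leaf))
        order.extend(leaves)
        placed.update(leaves)
    return order
```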
[0076] Once the array 25c and associated metrics 90a, 90b, 90c,
etc., such as, band 92, occurrence 108 and support 140, are ranked,
the attributes 70 and associated metrics 90a, 90b, 90c, etc. can be
saved as a profile 80, which is associated with the respective data
set from which the profile 80 was generated. For example, FIG. 17
illustrates the ranked attributes 70 for the sample resume 71 shown
in FIG. 4. In this example, titles 87 are broken out from concepts
85 into separate lists. The values in parentheses after each
attribute 70 represent the band 92, occurrences 108, and support
140 generated for each attribute 70. The attributes 70 are thereby
ranked in order of relative importance in the context of the
originating data set. The respective lists of titles 87 and
concepts 85 can be selectively combined, for example, by
interleaving the two ranked lists (i.e., by placing the highest
ranked title 87 first, then the highest ranked concept 85, then the
second highest ranked title, etc.) or by giving each list a variable
weight.
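The interleaving option can be sketched with itertools.zip_longest;
when one list is longer, its remaining entries are appended in
order. The function name and the sentinel are illustrative.

```python
from itertools import chain, zip_longest

_GAP = object()  # sentinel marking a missing entry in the shorter list


def interleave(titles, concepts):
    """Merge two ranked lists as title 1, concept 1, title 2, concept 2, ..."""
    pairs = zip_longest(titles, concepts, fillvalue=_GAP)
    return [item for item in chain.from_iterable(pairs) if item is not _GAP]
```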
[0077] In addition, the data set may also be further associated
with user account information. For example, a job seeker may have
an account set up that can include contact information, history of
job postings that the job seeker has reviewed, job postings that
the job seeker has applied for, and other data associated with the
individual. Similarly, a job poster or employer may have a user
account that retains contact information, service packages, billing
information, other job postings, applications received for each job
posting, and other information associated with the employer.
[0078] In one embodiment, a user may be given an opportunity to see
the resulting profile 80, for example in the format shown in FIG.
17, and be permitted to modify the profile 80. For example, the
user could be permitted to emphasize or deemphasize certain
attributes 70, their associated metrics 90 or manually adjust their
ranking. A job seeker may notice that a particularly important
attribute 70 is ranked lower than other less important (to the
user) attributes 70. Accordingly, the user may optionally be
permitted to adjust one or more of the metrics 90 for the
attribute(s) 70 to give the attribute(s) 70 more significance when
used for matching, as described below.
[0079] It will be appreciated by those of ordinary skill in the art
that the system and method, which are described above in the context
of data sets comprising resumes 71, could just as readily be used
for other data sets, including job postings 61. For other data
sets, the metrics 90 used to score the attributes 70 may be varied.
For example, job postings 61 typically do not delineate information
by date, as is typical with resumes 71, so the data may instead be
parsed by title, experience, and skills. Accordingly, bands 92 could
use different character strings or words rather than dates to parse
the data set.
[0080] Moreover, the system and method for creating standardized
profiles 80 for non-standard data sets can be used for data sets
unrelated to recruiting and employment, including for example,
dating or match-making services, real estate listings, classified
advertising, used-car listings, etc.
Assigning a Level Metric
[0081] In another embodiment, the profile 80 may include a level,
which is commensurate with the degree of skill represented by the
data set, that is, a level being sought by a job posting or
attained by a job seeker. For example, education level 47
represents a metric 90 and job level 49 represents another metric
90.
[0082] The education level 47 is derived from a search of keywords
contained in the data set that represent education, such as "high
school", "BA", "BS", "MBA", "masters degree", "MD", "PhD", etc. If
such keywords are contained in the data set, a metric 90
representing education level 47 is assigned, which is indicative of
the education level. Education levels may include:
EL_HS    High School
EL_BA    Bachelor of Arts
EL_BS    Bachelor of Science
EL_RN    Registered Nurse
EL_MS    Master of Science
EL_MBA   Master of Business Administration
EL_MD    Medical Doctor
EL_PHD   Doctor of Philosophy
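Keyword detection of education levels could be sketched with
regular expressions as below. The patterns are illustrative
assumptions built from the examples in the text (for instance, the
"MS" pattern would also match "MS Office"), and the function
returns every matched code rather than choosing one, since the text
does not say how multiple matches are resolved.

```python
import re

# Illustrative keyword patterns per education-level code, in table order.
EDUCATION_PATTERNS = [
    ("EL_HS", r"\bhigh school\b"),
    ("EL_BA", r"\bb\.?a\b"),
    ("EL_BS", r"\bb\.?s\b"),
    ("EL_RN", r"\br\.?n\b"),
    ("EL_MS", r"\b(?:m\.?s|master'?s degree)\b"),
    ("EL_MBA", r"\bmba\b"),
    ("EL_MD", r"\bm\.?d\b"),
    ("EL_PHD", r"\bph\.?d\b"),
]


def education_levels(text):
    """Return every education-level code whose keywords appear in text."""
    lowered = text.lower()
    return [code for code, pattern in EDUCATION_PATTERNS
            if re.search(pattern, lowered)]
```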
[0083] The job level 49 may be derived using the education level 47
as an input, as well as by factoring in additional data from which
a job level 49 may be assigned. For example, in one embodiment, the
job levels 49 may be assigned as "entry", "mid", "senior", or
"executive." In addition to the education level 47 described above,
another input may include the results of a search identifying other
keywords that are indicative of job level 49, for example, "vice
president", "vp", "manager", "supervisor", and others.
[0084] Beyond keyword searching, the method and system may also
seek other indicia of job responsibility, like managing other
people. In one embodiment, upon a job seeker posting his or her
resume, the system may ask whether the job seeker has ever managed
people, and the resulting answer may be used as an input for
evaluating job level 49. Similarly, an input for the appropriate
job level 49 may include the number of years of experience in a
particular field. The number of years of experience can be
identified by using regular expressions to identify all the date
designations for a particular position on a resume, and calculating
the number of years from the date of one position to the date of
the prior position.
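The date-based computation can be sketched as follows. This is a hypothetical illustration assuming each position line carries a four-digit start year and that the most recent position runs to a caller-supplied `current_year`; real resumes use many more date formats than the single regular expression shown here.

```python
import re

# Assumption: a four-digit year between 1900 and 2099 marks a position's start.
YEAR_RE = re.compile(r"\b(19\d{2}|20\d{2})\b")


def years_of_experience(position_lines: list[str], current_year: int) -> int:
    """Sum the spans between consecutive position start dates."""
    starts = []
    for line in position_lines:
        match = YEAR_RE.search(line)  # first year on the line = start date
        if match:
            starts.append(int(match.group(1)))
    if not starts:
        return 0
    starts.sort(reverse=True)  # most recent position first
    # The most recent position runs from its start date to current_year.
    total = current_year - starts[0]
    # Each earlier position runs until the start of the position after it.
    for later, earlier in zip(starts, starts[1:]):
        total += later - earlier
    return total


resume = [
    "Senior Engineer, Acme, 2015 - present",
    "Engineer, Beta Corp, 2011 - 2015",
    "Intern, Gamma, 2009 - 2011",
]
print(years_of_experience(resume, current_year=2016))  # 7
```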
[0085] Based on one or more of the inputs described above, scores
may be tallied to assign a job level 49. In one embodiment, job
levels 49 are assigned as either "entry", "mid", "senior", or
"executive". In one embodiment, the data set begins with a default
job level of "mid" and the inputs described above are used to tally
points for the respective levels. For example, the presence of
"high school" may add a point to the "entry" tally, whereas the
presence of "MBA" may add a point to the "senior" tally. Similarly,
having less than two years of experience may add a point to the
"entry" tally, while having more than five years of experience may
add a point to the "senior" tally. Once the points for all the
various inputs described above have been tallied, the tallies are
compared to assign a job level 49. In one exemplary tally, if the
tally for "executive" has more points than both "entry" and
"senior," the data set is assigned an "executive" job level 49. If
the tally for "entry" is less than the tally for "senior", and
"senior" is tied with "executive," a job level 49 of "senior" is
assigned. If the tally for "senior" exceeds the tally for "entry" by
two or more, the data set may be assigned "senior", whereas any
smaller differential results in an assignment of "mid." In another
example, if the tally for "entry" exceeds the tally for "senior" by
two or more, the data set may be assigned "entry," whereas any
smaller differential results in an assignment of "mid."
In the last two examples, requiring at least a two-point
differential between "senior" and "entry" results in a "mid"
assignment for close results, and prevents the assigned job level
49 from being skewed away from "mid" unless there is a meaningful
indication that a higher or lower job level 49 applies. In the
described embodiment, the default job level 49 when there are no
indicia of seniority is "mid."
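One way to combine the exemplary tallying rules above into a single function is sketched below. Only the "high school" to "entry" and "MBA" to "senior" keyword points and the two- and five-year experience thresholds come from the text; the other keyword assignments, the point for having managed people, and the order in which the rules are checked are assumptions.

```python
import re

# Hypothetical keyword-to-level table. Only "high school" -> entry and
# "mba" -> senior are taken from the text; the rest are assumptions.
KEYWORD_POINTS = {
    "high school": "entry",
    "mba": "senior",
    "manager": "senior",
    "supervisor": "senior",
    "vice president": "executive",
    "vp": "executive",
}


def assign_job_level(text: str, years: int, managed_people: bool = False) -> str:
    lowered = text.lower()
    tally = {"entry": 0, "senior": 0, "executive": 0}

    # One point per keyword found as a whole word.
    for keyword, level in KEYWORD_POINTS.items():
        if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
            tally[level] += 1

    # Experience thresholds from the text: < 2 years -> entry, > 5 years -> senior.
    if years < 2:
        tally["entry"] += 1
    elif years > 5:
        tally["senior"] += 1

    # Assumption: having managed people counts toward "senior".
    if managed_people:
        tally["senior"] += 1

    entry, senior, executive = tally["entry"], tally["senior"], tally["executive"]
    if executive > entry and executive > senior:
        return "executive"
    if entry < senior and senior == executive:
        return "senior"
    if senior - entry >= 2:
        return "senior"
    if entry - senior >= 2:
        return "entry"
    return "mid"  # default when the evidence is absent or close


print(assign_job_level("High school diploma; cashier", years=1))  # entry
print(assign_job_level("MBA; operations manager", years=8))       # senior
print(assign_job_level("VP of Engineering", years=4))             # executive
print(assign_job_level("Data analyst", years=3))                  # mid
```

Checking the two-point differential rules after the "executive" rules means a strong executive signal wins outright, while weak or mixed evidence falls through to the "mid" default.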
[0086] As will be understood by a person of ordinary skill in the
art, the various levels create a spectrum of experience and skill,
and various logic can be used to generate job level 49 assignments
that will be beneficial in matching data sets.
[0087] Once assigned, the job level 49 may be used in various ways
to filter, search and display the output of matched profiles, as
will be described next.
Matching Profiles
[0088] Once profiles 80 are generated for a series of data sets,
the profiles 80 may be leveraged in many ways. Because the data
sets (be they resumes 71, job postings 61, or others) are converted
into profiles 80 having standardized sets of attributes 70
organized in a standard ranking or scaling scheme, disparate data
sets can be efficiently compared, grouped, and ranked. One use for
the profiles 80 is to match prospective job seekers having
respective resumes 71 to a particular job posting 61. Conversely,
the profiles 80 can be used to match prospective job postings 61 to
a particular job seeker having a resume 71. In addition, a job
seeker who is interested in a particular job posting 61 can
leverage that particular job posting profile 80 to search for other
job postings that are similar to the job posting of interest.
Similarly, employers can leverage the profile 80 of a particular
job seeker's resume to search for other job seekers whose resumes
are similar to the resume of interest.
[0089] Once the profiles 80 of the data sets are generated, there
are many ways known in the art to conduct searches that start from
one profile 80a of a data set and find the closest matching other
profiles 80b, 80c . . . 80n. In one embodiment, the
system converts each profile 80 into a series of numerical values,
where each available attribute 70 is assigned a unique numeric
integer value or identifier (e.g., "ID"). Such numeric IDs are
illustrated in FIG. 12. Converting the text value of each attribute
70a, 70b, 70c, etc. into a numeric value increases the efficiency
of commercially available search engines. Accordingly, each of the
attributes 70a-70n in a profile 80 can be converted into its
assigned numeric value, for example, the attribute 70 ".net" shown
in FIG. 12 may be assigned numeric value "80 4685." Because integer
values can comprise significantly smaller amounts of data than full
ASCII character words, this translation can speed up the processing
time for the search engine. This conversion from text character to
integer value can be performed while the profile 80 is being
created or after it is done.
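A minimal sketch of the text-to-integer conversion, assuming a hypothetical `AttributeIndex` dictionary that hands out sequential IDs the first time each attribute is seen. The IDs it produces are illustrative and do not reproduce the values shown in FIG. 12.

```python
class AttributeIndex:
    """Hypothetical dictionary assigning each distinct attribute a unique integer ID."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}

    def id_for(self, attribute: str) -> int:
        # Assign the next unused integer the first time an attribute is seen.
        if attribute not in self._ids:
            self._ids[attribute] = len(self._ids) + 1
        return self._ids[attribute]

    def encode(self, ranked_attributes: list[str]) -> list[int]:
        # Preserve the profile's ranking order while swapping text for IDs.
        return [self.id_for(a) for a in ranked_attributes]


index = AttributeIndex()
print(index.encode([".net", "c#", "sql server"]))    # [1, 2, 3]
print(index.encode(["sql server", ".net", "java"]))  # [3, 1, 4]
```

Because one dictionary is shared by every profile, two profiles that both contain ".net" carry the same integer, so the search engine can compare integers rather than strings.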
[0090] One example of a suitable search engine 160 for use in
generating searches to match various profiles 80 is offered by Fast
Search & Transfer ASA. One search engine solution offered by
Fast and suitable for use with an embodiment of this invention is
FAST Data Search™.
[0091] To conduct a candidate search of a plurality of resumes 71a,
71b, 71c, . . . 71n based on a profile 80a for a job posting (for
ease of reference, the "subject profile 80a"), the subject profile
80a can be readily converted into a search query for input into the
search engine 160 to conduct a search of a plurality of resume
profiles 80 (the "target profiles 80b-80n").
[0092] The search can optionally be weighted to further enhance the
search results. In one embodiment, the query based upon the subject
profile 80a can be created by weighting each attribute 70a-70n
according to its ranking within profile 80a, so that the highest
ranking attribute 70a is weighted highest in the search, the second
highest-ranking attribute 70b is weighted second highest, and so on
through all the attributes 70n.
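The rank-based query weighting can be sketched as follows. The linear scheme (the top-ranked of n attributes gets weight n, the next gets n-1, and so on) is an assumption chosen for illustration; the text only requires that weights decrease with rank. The function produces weighted terms, not any particular search engine's query syntax.

```python
def weighted_query_terms(ranked_attributes: list[str]) -> list[tuple[str, int]]:
    """Pair each attribute of the subject profile with a rank-based weight.

    Assumption: linear weights, so rank 1 of n attributes gets n and the last gets 1.
    """
    n = len(ranked_attributes)
    return [(attribute, n - i) for i, attribute in enumerate(ranked_attributes)]


print(weighted_query_terms(["java", "spring", "sql"]))
# [('java', 3), ('spring', 2), ('sql', 1)]
```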
[0093] Similarly, it is beneficial to weight the target profiles
80b-80n to enhance the search results. While the search query can
include as many attributes 70 as desired, it is more practical and
efficient to limit the number of attributes 70 that are separately
weighted among the target profiles 80b-80n; otherwise, the amount
of data for all the attributes 70 associated with all the target
profiles 80b-80n would slow the search engine. Accordingly, the
attributes 70 of the target profiles 80b-80n may be weighted in
tiers. If each target profile 80 (e.g., resume profile) contains a
ranked list of, for example, forty-three separate attributes 70,
the forty-three attributes 70 can be weighted according to the
following tiers. The first 10 attributes can each be assigned a
weight of, for example, 5000 points, while attributes 11-20 may
each be assigned a weight of, for example, 700 points, and
attributes 21-43 may be assigned a weight of, for example, 10
points.
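The tiered weighting for target profiles can be expressed as a small lookup. The tier boundaries and weights below are the example values from the text; giving zero weight to attributes ranked beyond 43 is an assumption.

```python
# (last rank in tier, weight), using the example values from the text.
TIERS = [(10, 5000), (20, 700), (43, 10)]


def tier_weight(rank: int) -> int:
    """Return the weight for a 1-based attribute rank in a target profile."""
    for last_rank, weight in TIERS:
        if rank <= last_rank:
            return weight
    return 0  # assumption: attributes ranked beyond the last tier are not weighted


def weight_target_profile(ranked_attributes: list[str]) -> dict[str, int]:
    # Assumes each attribute appears only once in the ranked list.
    return {attr: tier_weight(i + 1) for i, attr in enumerate(ranked_attributes)}


profile = [f"attr{i}" for i in range(1, 44)]  # 43 ranked attributes
weights = weight_target_profile(profile)
print(weights["attr1"], weights["attr11"], weights["attr43"])  # 5000 700 10
```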
[0094] The query generated from the subject profile 80a will then
cause the search engine 160 to return a list of target profiles
80b-80n (in the foregoing example, resume profiles) in a ranked
order by how closely the weighted target profiles 80b-80n match the
subject profile query. These are matching profiles 165, as
identified in FIG. 18.
[0095] The system can optionally provide even further refinement of
the search results by using a recommendation engine 155, as
illustrated in FIG. 18, to select recommended profiles 175 from the
matching profiles 165. The recommendation engine 155 may eliminate
target profiles 80b-80n that fail to meet a minimum threshold
matching score or modify the ranking of the profiles 80b-80n. In
other words, the subject profile 80a may be compared against each
target profile 80b-80n retrieved by the search engine, and a
matching score may be assigned to each target profile 80b-80n as
follows. Specifically, the system checks each
attribute 70 in the subject profile 80a against each target profile
80b-80n retrieved by the search engine and, using a suitable
formula that will be described below, assigns points corresponding
to how closely the attributes 70 in the subject profile 80a
correlate with the attributes 70 in the target profile 80b-80n. An
attribute 70 that is listed in both the subject profile 80a and a
target profile 80b-80n can be referred to as a "matching attribute"
150. The degree to which a subject profile 80a matches a target
profile 80b-80n will depend on the number of matching attributes
150 and the relative ranking of each matching attribute 150 within
the subject profile 80a and a target profile 80x. For example, a
target profile 80x, shown in FIG. 19, whose lowest-ranked attribute
matches the highest-ranked attribute of the subject profile 80a
will likely be less relevant than a target profile 80y, whose
highest-ranked attribute matches the highest-ranked attribute of
the subject profile 80a.
[0096] Accordingly, in one embodiment, points are assigned to each
target profile 80b-80n based on how high the matching attributes
150 for both the subject profile 80a and the target profile 80x
rank. For example, the system checks each attribute 70x in the
subject profile 80a to determine whether the same attribute 85 is
also included in the target profile 80x. For attributes that do not
match, no points are assigned, and the system moves to the next
attribute 70 in the subject profile 80a. If the system finds a
matching attribute 150, it assigns points based on how high the
matching attribute 150 is ranked in the subject profile 80a. The
system runs through all the attributes in the subject profile 80a
and compiles the total points based on the ranking of the matching
attributes 70 within the subject profile 80a. Obviously, if only
the five bottom-ranked attributes 70 in the subject profile 80a
matched the attributes in the target profile 80x, there may not be
a very good match, even if such five matching attributes 150 were
ranked high in the target profile 80x. As a result, the system then
repeats the process, but this time assigns points based on how high
the matching attributes 150 are ranked in the target profile 80x.
Then the points assigned for the subject profile 80a and the points
for the target profile 80x are added together for a total matching
score.
[0097] To convert the highest rank (which is typically represented
by the lowest number, i.e., first or 1) to the highest points, the
system assigns the total number of attributes in the subject
profile 80a, minus the rank of each matching attribute 150. For
example, assuming there are 50 attributes in the subject profile
80a, if a matching attribute 150 is the highest-ranking attribute in
the target profile 80x, the target profile 80x would be assigned
points equal to 50-1=49.
[0098] In one embodiment, to enhance the screening and create even
more differentiation between the rankings, the results are then
squared. So in the last example, $(50-1)^2 = 49^2 = 2401$ would
be assigned to the target profile 80x. The system may then search
for the next matching attribute 150 and continue assigning points
until all the matching attributes 150 have been assigned points. The
total points will identify how high the matching attributes 150
were ranked in the target profile 80x. Then the system repeats the
tally by assigning points for how high the matching attributes 150
ranked in the subject profile 80a.
[0099] This can be illustrated by an example, as shown in FIG. 19.
Assume that there are five matching attributes 150 between a
subject profile 80a and a target profile 80x, and for simplicity,
assume that both the target profile and subject profile each have
50 attributes. Further assume that the matching attributes 150 were
the top five ranked attributes in the subject profile 80a. In this
case, the score for the subject profile 80a would be

$$(50-1)^2 + (50-2)^2 + (50-3)^2 + (50-4)^2 + (50-5)^2 = 2401 + 2304 + 2209 + 2116 + 2025 = 11055.$$

If the five matching attributes 150 were ranked 46-50 (at the
bottom) in target profile 80x, the score for target profile 80x
would be

$$(50-50)^2 + (50-49)^2 + (50-48)^2 + (50-47)^2 + (50-46)^2 = 0 + 1 + 4 + 9 + 16 = 30.$$

To further enhance the matching results, the two scores can be
added together for a total score of 11085. In contrast, consider
another target profile 80y having the same matching attributes 150
as target profile 80x, but where they are ranked in the top five of
target profile 80y. Its target-side score would be 11055, so when
the two scores are added together, the total score would be 22110.
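The two-pass scoring of paragraphs [0096] through [0099] can be sketched as below. Profiles are modeled as lists of attributes ordered by rank (index 0 is rank 1). Using each profile's own attribute count in the target-side pass is an assumption, since the text's example gives both profiles 50 attributes; the placeholder attribute names are hypothetical.

```python
def match_score(subject: list[str], target: list[str]) -> int:
    """Sum of squared rank points for matching attributes, from both profiles."""
    subject_rank = {a: i + 1 for i, a in enumerate(subject)}
    target_rank = {a: i + 1 for i, a in enumerate(target)}
    matching = subject_rank.keys() & target_rank.keys()

    # Pass 1: points for how high each matching attribute ranks in the subject.
    subject_points = sum((len(subject) - subject_rank[a]) ** 2 for a in matching)
    # Pass 2: points for how high each matching attribute ranks in the target.
    target_points = sum((len(target) - target_rank[a]) ** 2 for a in matching)
    return subject_points + target_points


# Reproduce the FIG. 19 example with 50-attribute profiles.
shared = [f"m{i}" for i in range(1, 6)]              # five matching attributes
subject = shared + [f"s{i}" for i in range(6, 51)]   # shared at ranks 1-5
target_x = [f"x{i}" for i in range(1, 46)] + shared  # shared at ranks 46-50
target_y = shared + [f"y{i}" for i in range(6, 51)]  # shared at ranks 1-5

print(match_score(subject, target_x))  # 11085
print(match_score(subject, target_y))  # 22110
```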
[0100] This calculation can be completed for each target profile
80b-80n retrieved by the search engine. Finally, the point totals
are normalized by dividing the score for each target profile
80b-80n by a perfect score for the subject profile 80a, where a
perfect score would be the matching score that would be yielded by
a profile that exactly matched the subject profile 80a. Using this
scoring method, it has been found that matching scores of less than
18% yield unsatisfactory results. Thus, target profiles 80 yielding
a match score less than a preset threshold may be optionally
discarded. It should be understood that this threshold can be
changed or varied to optimal values without departing from the
invention.
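Normalization and thresholding can be sketched as follows, assuming the perfect score is that of a target identical to the subject profile (every attribute matching at the same rank, counted in both passes) and using the 18% figure from the text as the default cutoff. The profile labels in the example are hypothetical.

```python
def perfect_score(n_attributes: int) -> int:
    # An identical profile matches every attribute at the same rank, and the
    # squared rank points are counted once per pass (subject and target).
    return 2 * sum((n_attributes - r) ** 2 for r in range(1, n_attributes + 1))


def filter_matches(raw_scores: dict[str, int], n_subject_attributes: int,
                   threshold: float = 0.18) -> dict[str, float]:
    """Normalize raw scores, drop those below the threshold, sort best first."""
    perfect = perfect_score(n_subject_attributes)
    normalized = {pid: score / perfect for pid, score in raw_scores.items()}
    kept = {pid: v for pid, v in normalized.items() if v >= threshold}
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))


print(perfect_score(50))  # 80850
print(filter_matches({"80x": 11085, "80y": 22110}, 50))  # keeps only "80y"
```

Under this assumption, the FIG. 19 scores normalize to about 13.7% for target profile 80x, which falls below the cutoff and is discarded, and about 27.3% for target profile 80y, which is kept.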
[0101] While this describes one method for identifying how closely
a target profile matches a subject profile, many other methods can
be employed without departing from the invention. For example, the
ranking of each matching attribute within the subject profile and
the target profile can be compared to determine the relative degree
of similarity between the two profiles. For instance, if a matching
attribute is ranked third in the subject profile and 34th in the
target profile, the matching attribute could be assigned a score
equal to the difference, i.e., 34-3=31, and this score can be used
to screen or weight the importance of the matching attribute. So,
for example, the system could optionally discard any matching
attributes whose rankings are not within a predetermined number of
positions of each other.
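The alternative rank-difference screen can be sketched as follows. The function names and the `max_gap` parameter, which stands in for the "predetermined number" of rank positions, are hypothetical.

```python
def rank_differences(subject: list[str], target: list[str]) -> dict[str, int]:
    """Absolute rank gap for each attribute that appears in both profiles."""
    subject_rank = {a: i + 1 for i, a in enumerate(subject)}
    target_rank = {a: i + 1 for i, a in enumerate(target)}
    return {a: abs(subject_rank[a] - target_rank[a])
            for a in subject_rank.keys() & target_rank.keys()}


def screen_matching_attributes(subject: list[str], target: list[str],
                               max_gap: int) -> dict[str, int]:
    # Keep only matching attributes whose ranks lie within max_gap of each other.
    return {a: gap for a, gap in rank_differences(subject, target).items()
            if gap <= max_gap}


subject = ["java", "spring", "sql"]                  # "sql" ranked 3rd
target = [f"other{i}" for i in range(33)] + ["sql"]  # "sql" ranked 34th
print(rank_differences(subject, target))             # {'sql': 31}
print(screen_matching_attributes(subject, target, max_gap=10))  # {}
```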
[0102] The same technique can be used to input a resume profile 80
into the search engine and retrieve matching job posting profiles. Indeed,
the system can be used to create matches between the profiles
created for any data sets. For example, the system could be used to
compare individual profiles for a personal match-making service,
real estate listings, classified advertising, used car listings,
etc.
[0103] As will be appreciated by those of skill in the art, the
present system may be used to generate matches between various data
sets. For example, upon uploading a new resume, a user could be
provided with a list of suitable job postings. Similarly, an
employer uploading a job posting could be provided with a list of
suitable resumes based on the output of the system and method
described herein. In addition, a job seeker who has found one job
posting of interest could request that the system find other job
postings that are similar to the job posting of interest.
Conversely, an employer who finds a candidate of interest could
request that the system generate a search using the system and method
disclosed herein to provide a list of similar candidate
resumes.
Recommendation Page
[0104] Once matching profiles are selected, they can be further
arranged or categorized in groups to assist the job seeker or job
poster in selecting the most appropriate matches.
[0105] For example, in one embodiment, to further enhance searching
and matching capabilities, job levels 49 may be used to rank,
organize or filter the data set search results. For example, upon
searching for job postings that match a particular job seeker's
resume, the resulting job postings could be presented in groups
according to discrete job levels. Similarly, upon searching for
resumes that match a particular job posting, the resulting resumes
may be presented in groups according to job levels, such as
"entry-level," "mid-level," "senior," and "executive." Further, the
system may optionally filter the results to include only those
with a matching job level. In yet another embodiment, searches
may permit a user to specify a particular job level to match the
interests of a given job seeker or job poster.
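Grouping ranked search results by job level 49, with an optional filter to a single level, might look like the sketch below. The `(profile_id, job_level)` input shape and the function name are assumptions.

```python
from collections import defaultdict
from typing import Optional

LEVEL_ORDER = ["entry", "mid", "senior", "executive"]


def group_by_job_level(results: list[tuple[str, str]],
                       only_level: Optional[str] = None) -> dict[str, list[str]]:
    """Group already-ranked (profile_id, job_level) pairs, keeping rank order in each group."""
    groups: dict[str, list[str]] = defaultdict(list)
    for profile_id, level in results:
        if only_level is None or level == only_level:
            groups[level].append(profile_id)
    # Present groups from entry to executive, skipping empty ones.
    return {level: groups[level] for level in LEVEL_ORDER if groups[level]}


ranked = [("r1", "senior"), ("r2", "entry"), ("r3", "senior"), ("r4", "mid")]
print(group_by_job_level(ranked))
# {'entry': ['r2'], 'mid': ['r4'], 'senior': ['r1', 'r3']}
print(group_by_job_level(ranked, only_level="senior"))
# {'senior': ['r1', 'r3']}
```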
[0106] In addition, matching profiles may be organized by "fresh"
for ranking more recent postings first, "close" for ranking profiles
that are located geographically nearby, "region" for ranking
profiles by geographical region, or "relevant" for ranking profiles
by how closely they match.
[0107] In another embodiment, the target profiles can be organized
by various topics within the subject profile. In other words,
attributes contained in a profile may be identified and grouped
together in order to conduct searches that are focused on specific
topics. In one embodiment, each attribute in the subject profile is
compared to every other attribute in the profile, and the pods 125
are used to determine how related (if at all) the two attributes are.
Each attribute is then paired with its highest matching other
attribute to form "dynamic pods" containing two attributes. To
ensure that attributes have some minimum level of relatedness,
thresholds can be set. The system then takes another pass comparing
each dynamic pod to each other using a lower threshold, and the
highest matching dynamic pods are combined to form larger pods.
This comparison process continues until an optimum dynamic-pod size
is reached. For example, the system can be configured to continue
iterations until there is an optimum number of attributes within
each dynamic pod (e.g., 4-8) or until there is an optimum number of
dynamic pods (e.g., 3-5). As a further enhancement, the system may
optionally add attributes to the dynamic pods by adding attributes
that are highly related based on the pods 125, even where the
related attribute is not actually in the profile. This can create
more robust results based on the dynamic pods, as will be described
next.
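The iterative pod-building process can be approximated with a greedy agglomerative merge, sketched below. The `related` callable is a hypothetical stand-in for a relatedness lookup against the pods 125; average pairwise relatedness between pods, the threshold schedule, and the size and count limits are all assumptions. Expanding pods with related attributes that are not in the profile is not shown.

```python
from itertools import combinations
from typing import Callable

Related = Callable[[str, str], float]


def pod_similarity(p: frozenset, q: frozenset, related: Related) -> float:
    # Assumption: average pairwise relatedness between members of two pods.
    return sum(related(a, b) for a in p for b in q) / (len(p) * len(q))


def merge_pass(pods: list[frozenset], related: Related,
               threshold: float, max_size: int) -> list[frozenset]:
    """Combine the best-matching pods above the threshold; each pod merges once per pass."""
    scored = []
    for i, j in combinations(range(len(pods)), 2):
        if len(pods[i]) + len(pods[j]) > max_size:
            continue
        score = pod_similarity(pods[i], pods[j], related)
        if score >= threshold:
            scored.append((score, i, j))
    scored.sort(reverse=True)  # highest relatedness first

    used: set[int] = set()
    merged = []
    for _, i, j in scored:
        if i in used or j in used:
            continue
        used.update((i, j))
        merged.append(pods[i] | pods[j])
    # Pods that found no partner this pass carry over unchanged.
    return merged + [p for k, p in enumerate(pods) if k not in used]


def dynamic_pods(attributes: list[str], related: Related,
                 thresholds=(0.5, 0.3, 0.1), max_size: int = 8,
                 target_count: int = 5) -> list[frozenset]:
    pods = [frozenset([a]) for a in attributes]  # start from single attributes
    for threshold in thresholds:  # each pass uses a lower threshold
        if len(pods) <= target_count:
            break
        pods = merge_pass(pods, related, threshold, max_size)
    return pods


# Toy relatedness: attributes in the same hypothetical group are related.
GROUP = {"java": 1, "spring": 1, "hibernate": 1, "maven": 1,
         "photoshop": 2, "illustrator": 2}


def toy_related(a: str, b: str) -> float:
    return 0.9 if GROUP[a] == GROUP[b] else 0.0


pods = dynamic_pods(list(GROUP), toy_related, target_count=2)
print(sorted(sorted(p) for p in pods))
# [['hibernate', 'java', 'maven', 'spring'], ['illustrator', 'photoshop']]
```

The first pass pairs single attributes into two-attribute pods, and later passes combine pods at lower thresholds. Stopping once the pod count reaches `target_count` mirrors the text's "optimum number of dynamic pods" criterion, while `max_size` caps pod size in the spirit of the 4-8 attribute example.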
[0108] Once the dynamic pods are generated, a user may seek profiles
that match only the attributes captured within a dynamic pod. For
example, each dynamic pod may be displayed in a tab on a
recommendation page, and by clicking on the tab, the user can cause
the system to launch a new search for profiles matching just the
attributes listed in that dynamic pod. This search will generate
more focused profile search results.
[0109] While specific embodiments of the present invention have
been described in detail, it will be appreciated by those skilled
in the art that various modifications and alternatives to those
details could be developed in light of the overall teachings of the
disclosure. For example, the processes described with respect to
computer executable instructions can be performed in hardware or
software without departing from the spirit of the invention.
Furthermore, the order of all steps disclosed in the figures and
discussed above has been provided for exemplary purposes only.
Therefore, it should be understood by those skilled in the art that
these steps may be rearranged and altered without departing from
the spirit of the present invention. In addition, it is to be
understood that all patents discussed in this document are to be
incorporated herein by reference in their entirety. Accordingly,
the particular arrangement disclosed is meant to be illustrative
only and not limiting as to the scope of the invention which is to
be given the full breadth of the appended claims and any
equivalents thereof.