U.S. patent application number 17/495673 was published by the patent office on 2022-07-28 for method and system for correlating an image capturing device to a human user for analyzing gaze information associated with cognitive performance.
The applicant listed for this patent is Neurotrack Technologies, Inc. Invention is credited to John J. ANDERSON, Scott ANDREWS, Nicholas T. BOTT, Jian YAO.
United States Patent Application 20220236794
Kind Code: A1
Application Number: 17/495673
Family ID: 1000006272533
Publication Date: July 28, 2022
First Named Inventor: BOTT; Nicholas T.; et al.
METHOD AND SYSTEM FOR CORRELATING AN IMAGE CAPTURING DEVICE TO A
HUMAN USER FOR ANALYZING GAZE INFORMATION ASSOCIATED WITH COGNITIVE
PERFORMANCE
Abstract
The present invention provides a method for a finalized
processed image and related data to identify a spatial location of
each pupil in the region. Each pupil is identified by a
two-dimensional spatial coordinate. The method includes processing
information associated with each pupil identified by the
two-dimensional spatial coordinate to output a plurality of
two-dimensional spatial coordinates, each of which is in reference
to a time, in a two-dimensional space. The method then includes
outputting a gaze information about the human user. The gaze
information includes the two-dimensional spatial coordinates each
of which is in reference to a time in a two-dimensional space.
Inventors: BOTT; Nicholas T. (Menlo Park, CA); YAO; Jian (Redwood City, CA); ANDERSON; John J. (Redwood City, CA); ANDREWS; Scott (Redwood City, CA)
Applicant: Neurotrack Technologies, Inc. (Redwood City, CA, US)
Family ID: 1000006272533
Appl. No.: 17/495673
Filed: October 6, 2021
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
16843172 | Apr 8, 2020 | 11163359
16712986 | Dec 12, 2019 | 10984237
15809880 | Nov 10, 2017 | 10517520
62420521 | Nov 10, 2016 | (provisional)

Each listed application is a parent in the priority chain of the present application, 17/495673 (see paragraph [0001]).
Current U.S. Class: 1/1
Current CPC Class: G06F 3/14 (20130101); G06V 40/165 (20220101); G06V 20/46 (20220101); G06V 40/193 (20220101); G06V 40/162 (20220101); G06V 40/171 (20220101); G06F 3/013 (20130101); G06V 40/19 (20220101)
International Class: G06F 3/01 (20060101); G06F 3/14 (20060101); G06V 20/40 (20060101); G06V 40/19 (20060101); G06V 40/16 (20060101); G06V 40/18 (20060101)
Claims
1. A method for processing information from a user using paired
learning and testing of objects, the objects being selected from a
group consisting of a paired symbol digit comparison, a paired
arithmetic, a paired line orientation, a paired line length, a
paired feature binding, or a paired price comparison, the method
comprising: in a learning phase, displaying a pair of learning
objects on a display coupled to a computing device, and then
repeating a learning process of displaying a pair of other learning
objects numbered from 2 to N on the display of the computing
device, where N is an integer from 6 to 16; in a testing phase,
displaying a pair of testing objects on the display of the
computing device, and then repeating a testing process of
displaying a pair of other testing objects numbered from 2 to M,
where M is an integer from 6 to 16, each pair of testing objects
being categorized as either (1) equal to a pair of learning
objects, or (2) different from a pair of learning objects, where
either each of the learning objects is different from each testing
object, or one of the learning objects is the same as one of the
testing objects and the other learning object is different from the
testing object in the pair; for each pair of objects displayed in
the learning phase and the testing phase, capturing eye tracking information,
including gaze information, from a user; capturing an input
response information from the user for each pair of objects in the
testing phase; and storing the eye tracking information and the
input response information in a memory device coupled to the
computing device.
Description
CROSS-REFERENCE TO RELATED CASES
[0001] This application is a continuation of and claims priority to
U.S. Ser. No. 16/843,172 filed Apr. 8, 2020, now issued as U.S. Pat.
No. 11,163,359, which is a continuation in part of and claims
priority to U.S. Ser. No. 16/712,986 filed Dec. 12, 2019, now issued
as U.S. Pat. No. 10,984,237 on Apr. 20, 2021, which is a
continuation in part of and claims priority to U.S. Ser. No.
15/809,880 filed Nov. 10, 2017, now issued as U.S. Pat. No.
10,517,520 on Dec. 31, 2019, which claims priority to U.S.
Provisional Ser. No. 62/420,521 filed Nov. 10, 2016, each of which
is incorporated by reference herein for all purposes.
BACKGROUND
[0002] The present invention relates to methods and apparatus for
diagnosing cognitive impairment of a subject. In particular, the
present invention relates to methods and an apparatus for
acquisition of eye movement data. More particularly, the present
invention provides methods and an apparatus for acquisition of eye
movement data and gaze information.
[0003] According to embodiments of the present invention,
techniques for processing information associated with eye movement
using web-based image-capturing devices are disclosed. Merely by way
of example, the invention can be applied to analysis of information
for determining cognitive performance of subjects.
[0004] Historically, recognition memory of a subject has been
assessed through conventional paper-pencil based task paradigms.
Such tests typically occur in a controlled environment (e.g.
laboratory, doctor's office, etc.) under the guidance of a test
administrator using expensive (e.g. $10K-$80K) systems. Such tests
also require the subject to travel to the laboratory and spend over
an hour preparing for and taking such tests. Typically, a test
administrator shows a series of visual stimuli to subjects at a
certain frequency and rate. After the exposure phase the user waits
for a time delay of over twenty-five minutes before the test
administrator tests the subject's recall of the visual stimuli. In
addition to the visual stimuli and the test administrator, visual
recognition memory paradigms also require response sheets to
facilitate administrator scoring. Although effective, conventional
paradigms are expensive, cumbersome, and subjective.
[0005] From the above, it is seen that techniques for improving
acquisition of eye movement data are highly desired.
SUMMARY
[0006] According to the present invention, techniques for
processing information associated with eye movement using web-based
image-capturing devices are provided. Merely by way of example, the invention
can be applied to analysis of information for determining cognitive
performance of subjects.
[0007] In an example, the present invention provides a method for
identifying a feature of an eye of a human user. The method
includes initiating an image capturing device, such as a camera or
other imaging device. In an example, the image capturing device
comprises a plurality of sensors arranged in an array. In an
example, the method includes capturing video information from a
facial region of the human user using the image capturing device.
In an example, the video information is from a stream of video
comprising a plurality of frames.
[0008] In an example, the method includes processing the video
information to parse the video information into a plurality of
images. In an example, each of the plurality of images has a time
stamp from a first time stamp, a second time stamp, to an Nth time
stamp, where N is greater than 10, or other number.
[0009] In an example, the method includes processing each of the images
to identify a location of the facial region and processing each of
the images with the location of the facial region to identify a
plurality of landmarks associated with the facial region. In an
example, the method includes processing each of the images with the
location of the facial region and the plurality of landmarks to
isolate a region including each of the eyes. The method includes
processing each of the regions, frame by frame, to identify a pupil
region for each of the eyes. In an example, the region is
configured as a rectangular region having an x-axis and a y-axis to
border each of the eyes of the human user.
[0010] In an example, the processing comprises at least: processing
the region using a grayscale conversion to output a grayscale
image; processing the grayscale image using an equalization process
to output an equalized image; processing the equalized image using
a thresholding process to output a thresholded image; processing
the thresholded image using a dilation and erosion process to
output a dilated and eroded image; and processing the dilated and
eroded image using a contour and moment process to output a
finalized processed image. Of course, there can be other
variations, modifications, and alternatives.
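For concreteness, a minimal sketch of this five-stage processing chain using OpenCV is given below. The function name, threshold value, kernel size, and iteration counts are illustrative assumptions; the specification does not fix particular parameters.

```python
import cv2
import numpy as np

def locate_pupil(eye_region_bgr, thresh_value=40):
    # Grayscale conversion of the bordered eye region.
    gray = cv2.cvtColor(eye_region_bgr, cv2.COLOR_BGR2GRAY)

    # Equalization process to normalize contrast across lighting conditions.
    equalized = cv2.equalizeHist(gray)

    # Thresholding process: the pupil is the darkest blob, so invert-threshold.
    _, thresholded = cv2.threshold(equalized, thresh_value, 255,
                                   cv2.THRESH_BINARY_INV)

    # Dilation and erosion process to close holes and remove speckle noise.
    kernel = np.ones((3, 3), np.uint8)
    cleaned = cv2.dilate(thresholded, kernel, iterations=2)
    cleaned = cv2.erode(cleaned, kernel, iterations=2)

    # Contour and moment process: take the largest contour's centroid as
    # the two-dimensional spatial coordinate of the pupil.
    contours, _ = cv2.findContours(cleaned, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    m = cv2.moments(largest)
    if m["m00"] == 0:
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])  # (x, y)
```

Running this per frame, per eye, yields the time-stamped two-dimensional coordinates that feed the gaze output described in the next paragraph.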
[0011] In an example, the method includes processing the finalized
processed image to identify a spatial location of each pupil in the
region, each pupil being identified by a two-dimensional spatial
coordinate. The method includes processing information associated
with each pupil identified by the two-dimensional spatial
coordinate to output a plurality of two-dimensional spatial
coordinates, each of which is in reference to a time, in a
two-dimensional space. The method then includes outputting gaze
information about the human user. The gaze information includes the
two-dimensional spatial coordinates, each of which is in reference
to a time in a two-dimensional space.
[0012] According to one aspect of the invention, a method of
processing information including aligning eye movement with an
image capturing device for detection of cognitive anomalies is
described. One method includes initiating an application, under
control of a processor, to output an image of a frame on a display
device to a user, the display device being coupled to the
processor, the processor being coupled to a communication device
coupled to a network of computers, the network of computers being
coupled to a server device, initiating a camera coupled to the
application to capture a video image of a face of a user, the face
of the user being positioned by the user viewing the display
device, and displaying the video of the image of the face of the
user (e.g. including their eyes, pupils, etc.) on the display
device within a vicinity of the frame being displayed. A process
includes positioning the face of the user within the frame to align
the face to the frame, and capturing an image of the face of the
user, processing captured information regarding the image of the
face to initiate an image capturing process of eye movement of the
user, and outputting an indication on a display after initiation of
the image capturing process; and moving the indication spatially to
one of a plurality of images being displayed on the display device.
A technique includes capturing a video of at least one eye of the
human user, while the user's head/face is maintained within the
viewing display of the device (e.g. visible to the camera), and one
or both eyes of the user moves to track the position of the
indication of the display, the image of each eye comprising a
sclera portion, an iris portion, and a pupil portion, parsing the
video to determine a first reference image corresponding to a first
eye position for a first spatial position for the indication; and a
second reference image corresponding to a second eye position for a
second spatial position of the indication, and correlating each of
the other plurality of images to either the first reference image
or the second reference image. In various embodiments, the parsing
steps may be performed on a user's computing device, or by a remote
server.
[0013] According to another aspect of the invention, a method for
processing information using a web camera, the web camera being
coupled to a computing system is disclosed. One technique includes
placing a user in front of a display device coupled to the
computing system, the computing system being coupled to a worldwide
network of computers, initiating a Neurotrack application stored on
a memory device of the computing system and initiating the web
camera by transferring a selected command from the Neurotrack
application. A process may include capturing an image of a facial
region of the user positioned in front of the display device,
retrieving a plurality of test images from the memory device
coupled to the computing system, the plurality of test images
comprising a first pair of images, a second pair of images, a third
pair of images, etc. (e.g. twentieth pair of images), each of the
pair of images being related to each other, and displaying the
first pair of the test images on the display device to be viewed by
the user. A method may include capturing a plurality of first
images associated with a first eye location while the user is
viewing the first pair of test images, repeating the displaying of
the pairs of images, while replacing one of the previous pairs of
images, and capturing of images for a plurality of second pair of
images to the twentieth pair of images, each of which while the
user is viewing the display device, and capturing a fixation from
an initial point having four regions within a vicinity of the
initial point during the displaying of the pairs of images, while
replacing the previous pair of images, each of the four regions
within about one degree visual angle from the initial point, and an
associated saccade with the fixation. A process may include
processing information to filter the saccade, determining a visual
preference using the fixation on the replaced image from the
plurality of images; and using the visual preference information to
provide the user with feedback.
[0014] According to another aspect of the invention, a method for
playing a matching game on a host computer is disclosed. One
technique may include uploading from the host computer to a remote
computer system, a computer network address for a plurality of
static images, wherein the plurality of static images comprises a
first plurality of static images and a second plurality of static
images. A method may include uploading from the host computer to
the remote computer system, remote computer system executable
software code including: first remote computer system executable
software code that directs the remote computer system to display on
a display of the remote computer system to a player, only static
images from the first plurality of static images but not static
images from the second plurality of static images, wherein each of
the static images from the first plurality of static images is
displayed upon at most half of the display for a first
predetermined amount of time, second remote computer system
executable software code that directs the remote computer system to
inhibit displaying on the display of the remote computer system to
the player, at least one static image from the first plurality of
static images to the player, for a second predetermined amount of
time, third remote computer system executable software code that
directs the remote computer system to simultaneously display on the
display of the remote computer system to the player, a first static
image from the first plurality of static images and a second static
image from the second plurality of static images, wherein the first
static image and the second static image are displayed upon at most
half of the display for a third predetermined amount of time,
fourth remote computer system executable software code that directs
the remote computer system to capture using a web camera of the
remote computer system video data of the player, wherein the video
data captures eye movements of the player while the display of the
remote computer system is displaying to the player the first static
image and the second static image, fifth remote computer system
executable software code that directs the remote computer system to
create edited video data from a subset of the video data in
response to a pre-defined two dimensional area of interest from the
video data, wherein the edited video data has a lower resolution
than the video data, and sixth remote computer system executable
software code that directs the remote computer to provide to the
host computer, the edited video data. A process may include
determining with the host computer a first amount of time
representing an amount of time the player views the first static
image and a second amount of time representing an amount of time
the player views the second static image, in response to the edited
video data, determining with the host computer a viewing
relationship for the player between the second amount of time and
the first amount of time, in response to the first amount of time
and the second amount of time, and determining with the host
computer whether the viewing relationship for the player between
the second amount of time and the first amount of time exceeds a
first threshold and generating a success flag in response thereto.
A technique may include providing from the host computer to the
remote computer system, an indication that the player is
successful, in response to the success flag.
[0015] The above embodiments and implementations are not
necessarily inclusive or exclusive of each other and may be
combined in any manner that is non-conflicting and otherwise
possible, whether they be presented in association with a same, or
a different, embodiment or implementation. The description of one
embodiment or implementation is not intended to be limiting with
respect to other embodiments and/or implementations. Also, any one
or more function, step, operation, or technique described elsewhere
in this specification may, in alternative implementations, be
combined with any one or more function, step, operation, or
technique described in the summary. Thus, the above embodiments and
implementations are illustrative, rather than limiting.
BRIEF SUMMARY OF DRAWINGS
[0016] FIGS. 1A-E illustrate a flow diagram according to an
embodiment of the present invention;
[0017] FIG. 2 is a simplified diagram of a process according to an
embodiment of the present invention;
[0018] FIG. 3 is a block diagram of a typical computer system
according to various embodiments of the present invention;
[0019] FIG. 4 is a graphical user interface of an embodiment of the
present invention;
[0020] FIG. 5 is a simplified flow diagram of a process according
to an embodiment of the present invention;
[0021] FIG. 6 is a simplified flow diagram of a process according
to an alternative embodiment of the present invention;
[0022] FIGS. 7 through 20 are simplified flow diagrams of various
processes and related applications of using gaze information
according to an embodiment of the present invention;
[0023] FIG. 21 is a simplified block diagram of a process and
module for an apparatus according to an embodiment of the present
invention; and
[0024] FIGS. 22 through 27 are illustrations of screen shots of
paired objects for gaze tracking according to an embodiment of the
present invention.
DETAILED DESCRIPTION
[0025] According to the present invention, techniques for
processing information associated with eye movement using web-based
image-capturing devices are provided. Merely by way of example, the invention
can be applied to analysis of information for determining cognitive
diseases.
[0026] Without limiting any of the interpretations in the claims,
the following terms have been defined.
[0027] Choroid: Layer containing blood vessels that lines the back
of the eye and is located between the retina (the inner
light-sensitive layer) and the sclera (the outer white eye
wall).
[0028] Ciliary Body: Structure that contains muscle and is located
behind the iris; it focuses the lens.
[0029] Cornea: The clear front window of the eye which transmits
and focuses (i.e., sharpness or clarity) light into the eye.
Corrective laser surgery reshapes the cornea, changing the
focus.
[0030] Fovea: The center of the macula, which provides the sharpest
vision.
[0031] Iris: The colored part of the eye which helps regulate the
amount of light entering the eye. When there is bright light, the
iris closes the pupil to let in less light. And when there is low
light, the iris opens up the pupil to let in more light.
[0032] Lens: Focuses light rays onto the retina. The lens is
transparent, and can be replaced if necessary. Our lens
deteriorates as we age, resulting in the need for reading glasses.
Intraocular lenses are used to replace lenses clouded by
cataracts.
[0033] Macula: The area in the retina that contains special
light-sensitive cells. In the macula these light-sensitive cells
allow us to see fine details clearly in the center of our visual
field. The deterioration of the macula commonly occurs with age
(age related macular degeneration or ARMD).
[0034] Optic Nerve: A bundle of more than a million nerve fibers
carrying visual messages from the retina to the brain. (In order to
see, we must have light and our eyes must be connected to the
brain.) Your brain actually controls what you see, since it
combines images. The retina sees images upside down but the brain
turns images right side up. This reversal of the images that we see
is much like a mirror in a camera. Glaucoma is one of the most
common eye conditions related to optic nerve damage.
[0035] Pupil: The dark center opening in the middle of the iris.
The pupil changes size to adjust for the amount of light available
(smaller for bright light and larger for low light). This opening
and closing of light into the eye is much like the aperture in most
35 mm cameras which lets in more or less light depending upon the
conditions.
[0036] Retina: The nerve layer lining the back of the eye. The
retina senses light and creates electrical impulses that are sent
through the optic nerve to the brain.
[0037] Sclera: The white outer coat of the eye, surrounding the
iris.
[0038] Vitreous Humor: The clear, gelatinous substance filling the
central cavity of the eye.
[0039] Novelty preference: Embodiments of the present invention
assess recognition memory through comparison of the proportion of
time an individual spends viewing a new picture compared to a
picture they have previously seen, i.e., a novelty preference. A
novelty preference, or more time spent looking at the new picture,
is expected in users (e.g. individuals, test subjects, patients)
with normal memory function. By contrast, users with memory
difficulties (cognitive impairments) are characterized by more
equally distributed viewing times between the novel and familiar
pictures. The lack of novelty preference suggests a cognitive
dysfunction with regards to what the subject has already
viewed.
[0040] Cameras capturing images/videos (e.g. web cameras) are
increasingly part of the standard hardware of smart phones, tablets
and laptop computers. The quality and falling cost of these devices
have allowed for their increased use worldwide, and they are now a
standard feature on most smart devices, including desktop and laptop
computers, tablets, and smart phones. The inventor of the present
invention has recognized that it is possible to incorporate the use
of such web cameras for visual recognition tasks. In particular,
the inventor has recognized that using such web cameras, he can now
provide web-based administration of visual recognition tasks.
[0041] Advantages to embodiments of the present invention include
that such visual recognition tasks become very convenient for
subjects. Subjects need not travel to and from an administration
facility (e.g. laboratory, doctor's office, etc.) and the subjects
can have such tasks performed from home. Other advantages include
that the visual recognition tasks can be administered by a
technician remote from the user, or the tasks can be administered
by a programmed computer.
[0042] Still other advantages to embodiments of the present
invention include that the subject's performance on such tasks may
be evaluated remotely by an administrator or in some instances by a
computer programmed with analysis software. Other advantages
include that the subject's test data may be recorded and later
reviewed by researchers if there is any question about the test
results, whether evaluated by an administrator or by a software
algorithm implemented on a computer.
[0043] FIG. 1 illustrates a flow diagram according to an embodiment
of the present invention. Initially, a subject/user directs their
web browser on their computing device to a web page associated with
embodiments of the present invention, step 100. In various
embodiments, the user may perform this function by entering a URL,
selecting a link or icon, scanning a QR code, or the like. Next, in
some embodiments, the user may register their contact information
to receive their results, or the like.
[0044] Next, in response to the user request, a web server provides
data back to the user's device, step 110. In various embodiments,
the data may include multiple images for use during the recognition
task, as well as program code facilitating the recognition task, as
described below. In some examples, the program code may include
code that may run via the browser, e.g. Adobe Flash code, Java
code, AJAX, HTML5, or the like. In other examples, the program code may
be a stand-alone executable application that runs upon the user's
computer (Mac or PC).
[0045] Initially a series of steps are performed that provide a
calibration function. More specifically, in some embodiments the
front-facing camera on a user's computing system (e.g. computer,
smart device, or the like) is turned on and captures images of the
user, step 120. The live images are displayed back to the user on
the display of the computing system, step 130. In some embodiments,
a mask or other overlay is also displayed on the display, and the
user is instructed to either move their head, camera, computing
device, or the like, such that the user's head is within a specific
region, step 140. In some examples, the mask may be a rectangular,
ovoid, circular region, or the like generally within the center of
a field of view of the camera.
[0046] FIG. 4 illustrates an example of an embodiment of the
present invention. More specifically, FIG. 4 illustrates a typical
graphical user interface 700 that is displayed to a user that is
mentioned in FIG. 1A, steps 130-150. As can be seen in GUI 700, the
user is displayed a series of instructions 710, and an overlay
frame 720. Instructions 710 instruct the user how to position their
head 730 relative to overlay frame 720. In this example, the user
moves their head 730 such that their head fits within overlay frame
720 before the calibration process begins.
[0047] Next, in some embodiments, a determination is made as to
whether the eyes, more specifically, the pupils of the user can be
clearly seen in the video images, step 150. This process may
include a number of trial-and-error adjustment feedback cycles
between the computing device and the user. For example, adjustments may be made
to properties of the video camera, such as gain, ISO, brightness,
or the like; adjustments may include instructions to the user to
increase or decrease lighting; or the like. In various embodiments,
this process may include using image recognition techniques for the
user's pupil against the white of the user's eye, to determine
whether the pupil position can be distinguished from the white of
the eye in the video. Once the system determines that the eyes can
be sufficiently tracked, the user is instructed to maintain these
imaging conditions for the duration of the visualization task.
[0048] As illustrated in FIG. 1, the process then includes
displaying a small image (e.g. dot, icon, etc.) on the display and
moving the image around the display and the user is instructed to
follow the image with their eyes, step 160. In various embodiments,
the locations of the dot on the display are preprogrammed and
typically include discrete points or continuous paths along the
four corners of the display, as well as near the center of the
display. During this display process, a video of the user's eyes
is recorded by the camera, step 170. In some embodiments, the
video may be the full-frame video captured by the camera or the
video may be a smaller region of the full-frame video. For example,
the smaller region may be roughly the specific region mentioned in
step 140, above (e.g. oval, circle, square, rectangular, etc.); the
smaller region may be a specific region where the user's eyes are
located (e.g. small rectangle, etc.); the smaller region may be a
region capturing the face of a user (e.g. bounding rectangle,
etc.); or the like. Such embodiments may be computationally
advantageous by reducing the computations and
analysis performed by the user's computer system or by a remote
computing system. Further, such embodiments may be advantageous by
greatly reducing communications to, data storage of, and
computations by the remote server. In one example, the video is
captured using a video capture program called HDFVR, although any
other programs (e.g. Wowza GoCoder, or the like) may be used in
other implementations.
[0049] Next, in various embodiments, an analysis is performed upon
the video captured in step 170 based upon the display in step 160
to determine a gaze model, step 180. More specifically, the
position of the user's pupil with regards to the white of the eye
is analyzed with respect to the locations of the dot on the
display. For example, when the dot is displayed on the upper right
hand side of the display, the position of the user's pupils at the
same time is recorded. This recorded position may be used to
determine a gaze model for the user. In one specific example, the
dot is displayed on the four corners and the center of the display,
and the corresponding positions of the user's pupils are used as
principal components, e.g. eigenvectors, for a gaze model for the
user. In other examples, a gaze model may include a larger or
smaller (e.g. two, signifying left and right) number of principal
components. In other embodiments, other representations for gaze
models may be used.
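The text leaves the gaze-model representation open (principal components, least squares, landmark matching). The sketch below illustrates one simple reading, in which the mean pupil position recorded at each calibration dot serves as a reference vector and gaze is estimated by inverse-distance weighting over those references. The names, the screen-normalized coordinates, and the weighting scheme are all assumptions, not the patent's prescribed method.

```python
import numpy as np

# Hypothetical calibration layout: screen-normalized positions of the
# five dots (four corners plus center) described in the text.
CALIBRATION_TARGETS = {
    "upper_left":  (0.0, 0.0), "upper_right": (1.0, 0.0),
    "center":      (0.5, 0.5),
    "lower_left":  (0.0, 1.0), "lower_right": (1.0, 1.0),
}

def build_gaze_model(samples):
    """samples: {target_name: list of (px, py) pupil coordinates seen
    while the user fixated that dot}. Returns one reference vector
    (the mean pupil position) per calibration target."""
    return {name: np.mean(np.asarray(pts), axis=0)
            for name, pts in samples.items()}

def estimate_gaze(model, pupil_xy):
    """Interpolate a display position from a new pupil coordinate by
    weighting each calibration target by the inverse distance between
    the observation and that target's reference vector."""
    names = list(model)
    refs = np.array([model[n] for n in names])
    d = np.linalg.norm(refs - np.asarray(pupil_xy), axis=1)
    w = 1.0 / (d + 1e-9)
    w /= w.sum()
    screen = np.array([CALIBRATION_TARGETS[n] for n in names])
    return tuple(w @ screen)  # (x, y) gaze estimate on the display
```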
[0050] In various embodiments, the video (or smaller video region)
is combined with metadata, and sent to a remote server (e.g.
analysis server), step 190. In some examples, the metadata is
embedded in the video on a frame-by-frame basis (e.g. interleaved),
and in other examples, the metadata may be sent separately or at
the end of the video. The inventor believes there are computational
advantages to interleaving metadata with each respective video
image compared to a separate metadata file and video image file.
For example, in some embodiments eye gaze position data for a
specific video image is easily obtained from metadata adjacent to
that frame. In contrast, in cases of a single metadata file, the
computer must maintain an index in the metadata and the video
images and hope that the index synchronization is correct.
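As an illustration of the interleaving argument, the sketch below writes each frame immediately followed by its own metadata record, so the gaze data for a frame is always adjacent to that frame and no separate index must be kept in sync. The length-prefixed container is invented for the example; it is not the FLV or WebRTC packaging named below.

```python
import json
import struct
import time

def write_interleaved(stream, frames_with_gaze):
    """Interleave frames and metadata in one stream.

    frames_with_gaze: iterable of (jpeg_bytes, gaze_xy) pairs.
    Record layout (illustrative only):
        [4-byte frame length][JPEG bytes]
        [4-byte meta length][JSON bytes]
    """
    for index, (jpeg_bytes, gaze_xy) in enumerate(frames_with_gaze):
        meta = json.dumps({
            "frame": index,
            "timestamp": time.time(),
            "gaze": gaze_xy,          # (x, y) from the gaze model
        }).encode("utf-8")
        stream.write(struct.pack(">I", len(jpeg_bytes)))
        stream.write(jpeg_bytes)
        stream.write(struct.pack(">I", len(meta)))
        stream.write(meta)
```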
[0051] In some embodiments, the metadata may include some
combination, but not necessarily all of the following data, such
as: camera setting data, data associated with the user (e.g.
account name, email address), browser setting data, timing data for
the dots on the display, the gaze model, a determined gaze
position, and the like. As examples, the data may provide a
correspondence between when a dot is positioned on the upper right
corner of the display and an image of how the user's eyes appear in
the video at about the same time; the meta data may include timing
or a series of frame numbers; or the like. In one example, the
combined file or data stream may be a Flash video file, e.g. FLV, a
web real-time communications file (WebRTC), or the like. Further,
the remote server may be a cloud-based video server, such as a
Wowza Amazon web service, or others. In one embodiment, an instance
of Wowza can be used to store all of the uploaded integrated
video and metadata discussed herein. In some embodiments, to
reduce communications to, data storage of, and computations by the
remote server, the frame rate of the video transferred is at the
recording frame rate, e.g. 25 frames per second, 60 frames per
second, or the like; however, the remote server may record the video
at less than the recording frame rate. In other embodiments, the
frame rate of the transferred video to the remote server may be
less, e.g. from about 2 to 3 frames per second up to the recording
frame rate.
[0052] Next, in various embodiments, a series of steps may be
performed that determine whether the gaze model is usable, or not.
Specifically, the process includes displaying a small dot, similar
to the above, to specific locations on the display, and the user is
instructed to stare at the dot, step 200. As the user watches the
dot, the video camera captures the user's eyes, step 210. Next,
using the full-frame video, or a smaller region of the video,
images representing the pupils of the user's eyes are determined,
step 220.
[0053] In various embodiments, using principal component analysis,
the images of the pupils are matched to the gaze model (e.g.
eigenvectors) to determine the principal components (e.g. higher
order eigenvalues) for the pupils with respect to time. As merely
examples, if the user is looking to the center left of the display
at a particular time, the principal components determined may be
associated with the upper left and lower left of the display, from
the gaze model; if the user is looking to the upper center of the
display at a different time, the principal components determined
may be associated with the center, the upper right and upper left
of the display, from the gaze model; and the like. Other types of
matching algorithms besides principal component analysis may be
implemented in other embodiments of the present invention, such as
least squares, regression analysis, or the like. In still other
embodiments, this process may include determining one or more
visual landmarks of a user's face, and pattern matching techniques
to determine geometric features, e.g. position and shape of the
user's eyes, locations of pupils, direction of pupil gaze, and the
like. In various embodiments, if the images of the pupils
corresponding to the set display positions do not match the gaze
model, the process above may be repeated, step 230.
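Continuing the calibration sketch above, a validation pass over steps 200-230 might look like the following. The tolerance value and the reuse of estimate_gaze and CALIBRATION_TARGETS from the earlier sketch are assumptions.

```python
import numpy as np

def validate_gaze_model(model, trials, max_error=0.15):
    """trials: (target_name, pupil_xy) pairs captured while the user
    stares at each validation dot (steps 200-220). The model is
    accepted only if every estimated gaze lands within max_error (an
    assumed tolerance, in screen-normalized units) of the known dot
    position; otherwise calibration is repeated (step 230)."""
    for name, pupil_xy in trials:
        est = np.asarray(estimate_gaze(model, pupil_xy))
        true = np.asarray(CALIBRATION_TARGETS[name])
        if np.linalg.norm(est - true) > max_error:
            return False
    return True
```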
[0054] In various embodiments, similar to step 190 above, the video
(or smaller video region) may be combined with metadata (e.g.
timing or synchronization data, an indication of where the dot is
on the screen when the image of the user's eyes are captured, and
the like), and sent back to the remote server, step 240.
[0055] Once the gaze model is validated, a series of steps
providing a familiarization phase are performed. More specifically,
in some embodiments, one or more images are displayed to the user
on the display, step 250. In some examples, the images are ones that
were provided to the user's computing system in step 110, and in
other examples, (e.g. using AJAX), the images are downloaded
on-demand, e.g. after step 110.
[0056] In various embodiments, the images are specifically designed
for this visualization task. In one example, the images are all
binary images including objects in black over a white background,
although other examples may have different object and background
colors. In some embodiments, the images may be gray scale images
(e.g. 4-bit, 8-bit, etc.), or color images (e.g. 4-bit color, or
greater). Further, in some embodiments, the images are static,
whereas in other embodiments, the images may be animated or moving.
Additionally, in some embodiments, images designed for this
visualization task are specifically designed to have a controlled
number of geometric regions of interest (e.g. visual saliency).
[0057] The number of geometric regions of interest may be
determined based upon experimental data, manual determination, or
via software. For example, to determine experimental data, images
may be displayed to a number of test subjects, and the locations on
the image where the test subjects' eyes linger for over a
threshold amount of time may be considered a geometric region of
interest. After running such experiments, test images may become
identifiable via the number of geometric regions of interest. As an
example, an image of a triangle may be characterized by three
regions of interest (e.g. the corners), and an image of a smiley
face may be characterized by four regions of interest (e.g. the two
eyes, and the two corners of the mouth). In other embodiments,
geometric regions of interest may be determined using image
processing techniques such as Fourier analysis, morphology, or the
like. In some embodiments, the images presented to the user,
described below, may each have the same number of regions of
interest, or may have different numbers of regions of interest,
based upon specific engineering or research purposes.
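A sketch of the experimental dwell-time procedure described above is given below: gaze traces from test subjects are binned over a grid, and any cell where eyes lingered past a threshold counts as one geometric region of interest. The grid resolution, dwell threshold, and frame interval are assumed values.

```python
from collections import Counter

def regions_of_interest(gaze_points, grid=8, dwell_threshold=0.5,
                        frame_dt=1 / 25):
    """gaze_points: per-frame screen-normalized (x, y) gaze positions
    from test subjects. Accumulates dwell time per grid cell and
    returns the cells exceeding the threshold, as a proxy for the
    geometric regions of interest of the displayed image."""
    cells = Counter()
    for x, y in gaze_points:
        cells[(min(int(x * grid), grid - 1),
               min(int(y * grid), grid - 1))] += frame_dt
    return [cell for cell, dwell in cells.items()
            if dwell >= dwell_threshold]
```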
[0058] In some embodiments, an image is displayed to the left half
of the display and to the right half of the display for a preset
amount of time. The amount of time may range from about 2 seconds
to about 10 seconds, e.g. 5 seconds. In other embodiments,
different pairs of images may be displayed to the user during this
familiarization phase trial.
[0059] During the display of the images, the video camera captures
the user's eyes, step 260. Next, using the full-frame video, or a
smaller region of the video, images representing the pupils of the
user's eyes are determined. In some embodiments, using principal
component analysis, or the like, of the gaze model, the gaze
position of the user's eyes is determined with respect to time,
step 280. In various embodiments, similar to step 190 above, the
video (or smaller video region) may be combined with metadata (e.g.
including an indication of what is displayed on the screen at the
time the specific image of the user's eyes is captured, etc.), and
the data, or portions of the data may be sent back to the remote
server, step 290. In another embodiment, the video (or smaller
video region) may be combined with metadata (e.g. including an
indication of what is displayed on the screen at the time the
specific image of the user's eyes is captured, etc.), and processed
on the user's device (e.g., computer, phone).
[0060] In various embodiments, this process may then repeat for a
predetermined number of different pictures (or iterations), step
300. In some examples, the process repeats until a predetermined
number of sets of images are displayed. In some embodiments, the
predetermined number is within a range of 10 to 20 different sets,
within a range of 20 to 30 different sets, or within a range of 30 to
90 different sets, although different numbers of trials are
contemplated. In some embodiments, the familiarization phase may
take about 1 to 3 minutes, although other durations can be used,
depending upon desired configuration.
[0061] Subsequent to the familiarization phase, a series of steps
providing a test phase are performed. More specifically, in some
embodiments, one image that was displayed within the
familiarization phase is displayed to the user along with a novel image
(that was not displayed within the familiarization phase) on the
display, step 310. Similar to the above, in some examples, the
novel images are ones that were provided to the user's computing
system in step 110, whereas in other examples, (e.g. using AJAX),
the images are downloaded on-demand, e.g. after step 110. In some
embodiments, the novel images may be variations of or related to
the familiar images that were previously provided during the
familiarization phase. These variations or related images may be
visually manipulated versions of the familiar images. In some
examples, the novel images may be the familiar image that is
rotated, distorted (e.g. stretched, pin-cushioned), resized,
filtered, flipped, and the like, and in other examples, the novel
images may be the familiar image with slight changes, such as
subtraction of a geometric shape (e.g. addition of a hole),
subtraction of a portion of the familiar image (e.g. removal of a
leg of a picture of a table), addition of an extra geometric
feature (e.g. adding a triangle to an image), and the like. In
various embodiments, the manipulation may be performed on the
server and provided to the user's computing system, or the
manipulation may be performed by the user's computing system
(according to directions from the server).
[0062] In various embodiments, the novel images are also
specifically designed to be similar to the images during the
familiarization phase in appearance (e.g. black over white, etc.)
and are designed to have a controlled number of geometric regions
of interest. As an example, the novel images may have the same
number of geometric regions of interest, a higher number of
geometric regions of interest, or the like.
[0063] During the display of the novel and familiar images, the
video camera captures the user's eyes, step 320. Next, using the
full-frame video, or a smaller region of the video, images
representing the pupils of the user's eyes are optionally
determined. In various embodiments, using principal component
analysis, or the like, of the gaze model, the gaze position of the
user's eyes is determined with respect to time, step 340.
[0064] In some embodiments, based upon the gaze position of the
user's eyes during the display of the novel image and the familiar
image (typically with respect to time), a determination is made as
to whether the user gazes at the novel image for a longer duration
compared to the familiar image, step 350. In some embodiments, a
preference for the novel image compared to the familiar image may
be determined based upon gaze time (51% novel to 49% familiar); a
threshold gaze time (e.g. 60% novel to 40% familiar, or the like);
based upon gaze time in combination with a number of geometric
regions of interests (e.g. 4 novel versus 3 familiar); based upon
speed of the gaze between geometric regions of interest (e.g. 30
pixels/second novel versus 50 pixels/second familiar); or the like.
In light of the present patent disclosure, other types of gaze
factors and other proportions of novel versus familiar may be
computed. In various embodiments, the novel image or familiar image
preference is stored as metadata, step 360. In various embodiments,
similar to step 190 above, the video (or smaller video region) may
be combined with metadata (e.g. an indication of which images are
displayed on the right-side or left-side of the display, etc.), and
sent back to the remote server, step 370.
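As a minimal sketch of the gaze-time comparison, the preference could be computed from per-frame gaze labels as below. The 60/40 threshold follows the example in the text; the label names and function signature are assumptions, and the other factors mentioned above (regions of interest, gaze speed) are not modeled.

```python
def novelty_preference(gaze_labels, threshold=0.60):
    """gaze_labels: per-frame labels, 'novel' or 'familiar', produced
    by mapping each frame's gaze position onto the half of the display
    showing each image (frames off-screen or blinking are skipped).
    Returns the novel-viewing proportion and whether it exceeds the
    example 60/40 threshold."""
    counted = [g for g in gaze_labels if g in ("novel", "familiar")]
    if not counted:
        return 0.0, False
    share = sum(g == "novel" for g in counted) / len(counted)
    return share, share >= threshold
```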
[0065] In various embodiments, this process may then repeat for a
predetermined number of different sets of novel and familiar
images, step 380. In some examples, the testing phase process
repeats until 10 to 20 different sets of images (e.g. iterations)
are displayed to the user, although different numbers of trials
(e.g. 20 to 30 iterations, etc.) are also contemplated. In some
embodiments, novel images that are displayed may have an increasing
or decreasing number of geometric regions of interest as the test
phase iterates, depending upon performance of the user. For
example, if a gaze of a user is not preferencing the novel image
over the familiar image, the next novel image displayed to the user
may have a greater number of geometric regions of interest, and the
like. Other types of dynamic modifications may be made during the
test phase depending upon user performance feedback.
[0066] In some embodiments, after the test phase, the gaze position
data may be reviewed to validate the scores, step 385. In some
embodiments, the gaze position data with respect to time may be
reviewed and/or filtered to remove outliers and noisy data. For
example, if the gaze position data indicate that a user never looks
at the right side of the screen, the gaze model is probably
incorrectly calibrated, and thus the gaze model and gaze data may be
invalidated; if the gaze position data indicate that the user
constantly looks left and right on the screen, the captured video
may be too noisy for the gaze model to distinguish between the left
and the right, thus the gaze position data may be invalidated; or
the like. In various embodiments, the gaze position data may not
only be able to track right and left preference, but in some
instances nine or more different gaze positions on the display. In
such cases, the gaze position data (for example, a series of (x,y)
coordinate pairs) may be filtered in time, such that the filtered gaze
position data is smooth and continuous on the display. Such
validation of gaze position data may be automatically performed, or in
some cases, sub-optimally, by humans.
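A sketch of such automatic validation and filtering is given below, mirroring the two failure cases described above (a trace that never reaches the right half of the screen, and constant left-right oscillation) followed by moving-average smoothing of the (x, y) series. The window size and jump limit are assumed values.

```python
import numpy as np

def validate_and_smooth(track, window=5, max_jump=0.5):
    """track: time-ordered (x, y) gaze coordinates in screen-normalized
    units. Returns the smoothed trace, or None if the trace should be
    invalidated."""
    arr = np.asarray(track, dtype=float)
    if arr[:, 0].max() < 0.5:
        return None  # never looks right: gaze model likely miscalibrated
    jumps = np.abs(np.diff(arr[:, 0]))
    if (jumps > max_jump).mean() > 0.5:
        return None  # constant left-right flicker: video too noisy
    kernel = np.ones(window) / window
    return np.column_stack([
        np.convolve(arr[:, 0], kernel, mode="same"),
        np.convolve(arr[:, 1], kernel, mode="same"),
    ])
```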
[0067] In various embodiments, after data validation, the
preferencing data determined above in step 350 and/or in step 385
may be used to determine a cognitive performance score for the
user, step 390. For example, if the user shows a preference for the
novel image over the familiar image for over about 70% of the time
(e.g. 67%), the user may be given a passing or success score; if
the user has a preference of over about 50% (e.g. 45%) but less
than about 70% (e.g. 67%), they may be given a qualified passing
score; if the user has no preference, e.g. less than about 50%
(e.g. 45%), the user may be given an at-risk score or not-successful
score. In some embodiments, based upon an at-risk user's score, a
preliminary diagnosis indicator (e.g. what cognitive impairment
they might have) may be given to the user. The number of
classifications as well as the ranges of preference may vary
according to specific requirements of various embodiments of the
present invention. In some embodiments where step 390 is performed
on the user's computer, this data may also be uploaded to the
remote server, whereas if step 390 is performed on a remote server,
this data may be provided to the user's computer. In some
embodiments, the uploaded data is associated with the user in the
remote server. It is contemplated that the user may request that
the performance data be shared with a health care facility via
populating fields in the user's health care records, or on a social
network.
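The three score bands can be expressed directly; the sketch below uses the parenthetical example cut-offs (67% and 45%) from the paragraph above, keeping in mind that the text hedges these as "about 70%" and "about 50%" and that the number of classifications may vary.

```python
def cognitive_score(novel_share):
    """Map a novelty-preference proportion (0.0-1.0) to the three
    example score bands: pass, qualified pass, at risk."""
    if novel_share >= 0.67:
        return "pass"
    if novel_share >= 0.45:
        return "qualified pass"
    return "at risk"
```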
[0068] In some embodiments of the present invention, the user's
computer system may be programmed to perform none, some, or all of
the computations described above (e.g. calibration phase,
validation phase, familiarization phase, and/or test phase). In
cases where not all of the computations are performed by the user's
computer system, a remote server may process the uploaded data
based upon the video images and metadata, e.g. timing,
synchronization data, indication of which images are displayed on
the right-side and the left-side of the display at the time the
image of the user's eyes is captured, and the like. In some
embodiments, the remote server may return the computed data, e.g.
gaze model and principal component analysis results, to the user's
computer system, whereas in other embodiments, such computed data
is only maintained by the remote server.
[0069] In various embodiments the computations performed within the
test phase to determine the preferencing between the novel image
compared to the familiar image may also be partially or completely
performed on the user's computer and/or by the remote server. In
some embodiments, the determination of preferencing may be made by
determining a number of frames having principal components (or
other algorithm) of the novel image compared to the number of
frames having principal components (or other algorithm) of the
familiar image. For example, if the percentage of frames within a
test phase, based upon the user's gaze position, where the user is
looking at the novel image exceeds a threshold, the user may be
considered to successfully pass the test.
[0070] In other embodiments, the evaluation of whether the user is
looking at the novel image or the familiar image may be performed
by one or more individuals coupled to the remote server. For
example, administrators may be presented with the video images of
the user's eyes, and based upon their human judgment, the
administrator may determine whether the user is looking to the left
or to the right of the display, whether the user is blinking,
whether the image quality is poor, and the like. This determination
is then combined by the remote server with the indication of
whether the novel image is displayed on the left or the right of
the display, to determine which image the user is looking
at during the human-judged frame. In some initial tests, three or
more administrators are used so a majority vote may be taken. The
inventor is aware that manual intervention may raise the issue of
normal variability of results due to subjectivity of the
individuals judging the images as well as of the user taking the
test, e.g. fatigue, judgment, bias, emotional state, and the like.
Such human judgments may be more accurate in some respects, as
humans can read and take into accounts emotions of the user.
Accordingly, automated judgments made by algorithms run within the
remote server may be less reliable in this respect, as algorithms
that attempt to account for human emotions are not well
understood.
[0071] In some embodiments, the process above may be implemented as
a game, where the user is not told of the significance of the
images or the testing. In such embodiments, feedback may be given
to the user based upon their success in having a preference of the
novel image, step 410. Examples of user feedback may include: a
sound being played such as: a triumphant fanfare, an applause, or
the like; a running score total may increment and when a particular
score is reached a prize may be sent (via mail) to the user; a
video may be played; a cash prize may be awarded to the user; a
software program may become unlocked or available for download to
the user; ad-free streaming music may be awarded; tickets to an
event may be awarded; access to a VIP room; or the like. In light
of the present patent disclosure, one of ordinary skill in the art
will recognize many other types of feedback to provide the user in
other embodiments of the present invention.
[0072] In other embodiments of the present invention, if the user
is identified as not being successful or at risk, the user is
identified as a candidate for further testing, step 420. In various
embodiments, the user may be invited to repeat the test; the user
may be invited to participate in further tests (e.g. at a testing
facility, office, lab); the user may be given information as to
possible methods to improve test performance; the user may be
invited to participate in drug or lifestyle studies; the user may
be awarded a care package; or the like. In light of the present
patent disclosure, one of ordinary skill in the art will recognize
many other types of feedback to provide the user in other
embodiments of the present invention. Such offerings for the user
may be made via electronic communication, e.g. e-mail, text, via
telephone call, video call, physical mail, social media, or the
like, step 430. In other embodiments, prizes, gifts, bonuses, or
the like provided in step 410 may also be provided to the user in
step 430.
[0073] In one embodiment, the user is automatically enrolled into
cognitive decline studies, step 440. As part of such studies, the
user may take experimental drugs or placebos, step 450.
Additionally, or instead, as part of such studies, the user may
make lifestyle changes, such as increasing their exercise, changing
their diet, playing cognitive games (e.g. crossword puzzles,
brain-training games, bridge, or the like.), reducing stress,
adjusting their sleeping patterns, and the like. In some
embodiments, the user may also be compensated for their
participation in such studies by reimbursement of expenses, payment
for time spent, free office visits and lab work, and the like.
[0074] In various embodiments, the user may periodically run the
above-described operations to monitor their cognitive state over
time, step 460. For example, in some embodiments, the user may take
the above test every three to six months (the first being a
baseline), and the changes in the user's performance may be used in
steps 390 and 400, above. More specifically, if the user's
percentage preference for the novel image drops by a certain amount
(e.g. 5%, 10%, etc.) between the tests, step 400 may not be
satisfied, and the user may be identified for further testing.
[0075] In other embodiments of the present invention, various of
the above described steps in FIGS. 1A-1E may be performed by a
remote server, and not the user's computer (e.g. client). As
examples: step 180 may be performed after step 190 in the remote
server; steps 220 and 230 may be performed after step 240 in the
remote server; step 280 may be performed after step 290 in the
remote server; steps 340-360 may be performed after step 370 on the
remote server; steps 390-400 may be performed in the remote server;
steps 410-440 may be performed, in part, by the remote server; and
the like. The division of the processing may be made between the
user's computer and the remote server based upon engineering
requirements or preferences, or the like.
[0076] FIG. 2 is a simplified diagram of a process according to an
embodiment of the present invention. More specifically, FIG. 2
illustrates examples of an image 500 that may be displayed to the
subject within the calibration phase; examples of an image 510 that
may be displayed to the subject within the validation phase;
examples of familiar images 520 that may be displayed to the
subject within the familiarization phase; and examples of an image
530 that may be displayed to the subject within the test phase
including a familiar image 540 and a novel image 550. Additionally,
as shown in FIG. 2, an example 560 of feedback given to the user is
shown. In some embodiments, the feedback may be instantaneous, or
arrive later in the form of an electronic or physical message, or
the like.
[0077] In other embodiments of the present invention, other types
of eye tracking task paradigms and studies may be performed besides
the ones described above, such as: attention and sequencing tasks,
set-shifting tasks, visual discrimination tasks, and emotional
recognition, bias tasks, and the like. Additional studies may
include processing of additional biological information including
blood flow, heart rate, pupil diameter, pupil dilation, pupil
constriction, saccade metrics, and the like. The process through
which biological data and cognitive task performance is collected
may be similar to one of the embodiments described above.
Additionally, scoring procedures can determine the location and
change over time of various landmarks of the participant's face
(e.g., pupil diameter, pupil dilation). These procedures can also
estimate the eye gaze position of each video frame.
[0078] In alternative embodiments, referring below to FIGS. 7
through 20, we show simplified flow diagrams of various processes
and related applications of using gaze information according to an
embodiment of the present invention. In an example, the processes
can be configured with additional applications, such as those
provided below.
Visual Image Pairs
[0079] In an example, the present process includes a visual image
pairs ("Image Pairs") process. As shown referring to FIG. 7, the
image pairs process comprises a multimodal memory assessment that
has two tasks. In an example, the process includes outputting
semantic pairs to a display during a learning phase. Concurrently,
video images of a subject are captured while the subject views the
semantic pairs. Using the present techniques, the process
determines the subject's gaze position based upon a gaze model for
the subject and related video images. In an example, video images
and metadata are transferred via a network to a remote server.
The process repeats these steps until the learning phase is
complete. Further details of a test phase and a gaze process can be
found throughout the present specification and more particularly
below.
[0080] Now referring to FIG. 8, the process captures video images of
a subject while viewing the learned pairs and novel pairs in a test
phase. Using the present techniques, the process determines the
subject's gaze position based upon a gaze model for the subject and
video images. In an example, the process determines the subject's
visual recognition memory based upon the gaze location and duration
with respect to time, or on a time-based metric. In an example, the
process stores the subject's visual recognition as metadata in
storage. In an example, the video images and metadata are provided
and stored in a remote server. The process then determines whether
the test phase is completed, and then manually or automatically
reviews the subject's gaze position data for the test phase.
Further details of a test phase and a gaze process can be found
throughout the present specification and more particularly
below.
[0081] By collecting two measures of memory performance during one
test administration, the test can identify single-domain mild
cognitive impairment with a relatively low burden on the
test-taker. The first part of the test is a visual paired
comparison (VPC) task that utilizes webcam-based eye tracking data
to assess visual recognition memory. VPC tasks are a method of
memory assessment. Briefly, participants are shown a series of
identical image pairs during the familiarization phase, followed by
a series of about twenty (20) disparate image pairs (containing one
image from the familiarization phase and one novel image) during
the testing phase. VPC tasks produce a novelty preference score by
quantifying the amount of time a person spends viewing the new
images compared to familiar images during the testing phase. VPC
tasks can stratify populations by memory function, with higher
novelty preference scores indicating normal memory and lower
novelty preference scores indicating impaired memory function. The
second part of the Image Pairs test is a visual paired recognition
(PR) task that utilizes haptic feedback data in addition to eye
tracking metrics to provide a second measure of visual learning and
memory. This task utilizes a paired-associate learning paradigm,
which stratifies clinical populations by memory function. This part
of the test assesses the participant's learning and memory of the
image pairs shown during the VPC task. Participants are instructed
to discriminate between image pairs that exactly match the pairs
viewed during the VPC testing phase and those that do not across
fifty (50) trials. The image pairs used during this task contain a
mix of identical images from the previous task (targets), altered
images from the previous task (foils), and entirely new images
(shams). Outcome variables include target accuracy, foil accuracy,
sham accuracy, reaction time, and d-prime. In an example, d-prime
is a sensitivity index, written d' (pronounced `dee-prime`), used
in signal detection theory. D-prime measures the separation between
the means of the signal and noise distributions in units of the
standard deviation of the signal or noise distribution.
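For illustration only, the following is a minimal sketch, in
Python, of the two outcome computations described above: a novelty
preference score from test-phase viewing times, and d-prime from
hit and false-alarm rates. The sketch assumes the SciPy library for
the inverse normal function; all names and numbers are illustrative
and not part of the described method.

    from scipy.stats import norm

    def novelty_preference(time_on_novel_s, time_on_familiar_s):
        # Fraction of test-phase viewing time spent on the novel
        # image; higher values suggest normal memory function and
        # lower values suggest impaired memory function.
        total = time_on_novel_s + time_on_familiar_s
        return time_on_novel_s / total if total > 0 else 0.0

    def d_prime(hit_rate, false_alarm_rate):
        # d' = z(hit rate) - z(false-alarm rate); rates of exactly
        # 0 or 1 give infinite z-scores and are typically clipped.
        return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

    print(novelty_preference(3.2, 1.8))  # 0.64
    print(d_prime(0.60, 0.20))           # about 1.09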
Visual Semantic Paired Associates
[0082] In an example, the Visual Semantic Paired Associates (VSPA)
is another task that measures learning and memory. Participants are
shown pairs of words across multiple categories in a learning trial
and asked to remember as many of the pairs as they can. A
recognition trial is then administered to measure their learning of
the first set of words. Participants are then shown a second list
of words with some de-coupled words from the first list (proactive
interference) and tested on a recognition trial on the second list;
another recognition trial then re-tests the first list
(retroactive interference). Alternate versions of the test allow
for repeated testing. Outcome variables include target accuracy, d
prime, reaction time, gaze location, gaze duration, and other eye
patterns indicative of learning and memory.
[0083] In an example, the present invention provides a method for
processing information from a user. In an example, in a learning
phase, the method includes displaying a pair of learning words on a
display coupled to a computing device, and then repeating a
learning process of displaying a pair of other learning words
numbered from 2 to N on the display of the computing device, where
N is an integer from 6 to 16, although other integers can be used. In an
example, in a testing phase, the method includes displaying a pair
of testing words on the display of the computing device, and then
repeating a testing process of displaying a pair of other testing
words numbered from 2 to M, where M is an integer from 6 to 16,
although there can be other integers.
[0084] In an example, each pair of testing words is categorized as
either (1) equal to a pair of learning words, or (2) different from
a pair of learning words, where either each learning word is
different from each testing word, or one of the learning words is
the same as one of the testing words and the other learning word is
different from the testing word in the pair.
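For illustration only, a minimal sketch, in Python, of one way such
a categorization could be implemented; the function and category
names are illustrative, and the partial-mismatch branch here also
covers re-combined pairs of previously learned words.

    def categorize_pair(test_pair, learning_pairs):
        # (1) the testing pair exactly equals a learning pair
        if any(set(test_pair) == set(p) for p in learning_pairs):
            return "match"
        learned_words = {w for p in learning_pairs for w in p}
        # one word was learned (or the words were re-combined),
        # while the other word differs
        if any(w in learned_words for w in test_pair):
            return "partial_mismatch"
        # (2) each learning word differs from each testing word
        return "full_mismatch"

    learning = [("lion", "coyote"), ("apple", "pear")]
    print(categorize_pair(("lion", "coyote"), learning))  # match
    print(categorize_pair(("lion", "tuba"), learning))    # partial_mismatch
    print(categorize_pair(("sofa", "tuba"), learning))    # full_mismatch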
[0085] In an example, for each pair of words displayed in the
learning phase and the testing phase, the method performs capturing
eye tracking information, including gaze information, from a user;
capturing an input response information from the user for each pair
of words in the testing phase; and storing the eye tracking
information and the input response information in a memory device
coupled to the computing device.
[0086] In an example, the method further comprises processing the
eye tracking information and the input response information during
the learning phase to form a baseline composite; and processing the
eye tracking information and the input response information during
the testing phase to form a test phase composite. In an example,
the method further comprises processing the baseline composite and
the test phase composite to output a resulting composite. In an
example, the resulting composite is associated with a
characteristic of the user. In an example, the characteristic is
selected from an emotional state, a mental state, a state of
depression, a state of anxiety, a state of fatigue, and a state of
confidence.
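The composites are not further defined here; for illustration only,
the following Python sketch forms each composite as simple
per-metric averages and outputs the resulting composite as the
test-phase change from baseline. The metric names and the equal
weighting are assumptions.

    def composite(trials):
        # trials: list of dicts with 'accuracy' (0..1),
        # 'reaction_time_s', and 'gaze_duration_s' per pair of
        # words (illustrative metrics).
        n = len(trials)
        return {key: sum(t[key] for t in trials) / n
                for key in ("accuracy", "reaction_time_s",
                            "gaze_duration_s")}

    def resulting_composite(baseline, test_phase):
        # Per-metric difference of the test phase relative to the
        # baseline; the difference could then be associated with a
        # characteristic of the user.
        return {key: test_phase[key] - baseline[key]
                for key in baseline}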
[0087] In an example, capturing of the eye tracking information
comprises one or more of the following steps: initiating an image
capturing device, the image capturing device comprising a plurality
of sensors arranged in an array; capturing information from a
facial region of the user using the image capturing device, the
information comprising a plurality of frames; processing the
information to parse the information into the plurality of images,
each of the plurality of images having a time stamp from a first
time stamp, a second time stamp, to an Nth time stamp, where N is
greater than 10, although there can be other numbers; processing
each of the images to identify a location of the facial region;
processing each of the images with the location of the facial
region to identify a plurality of landmarks associated with the
facial region; processing each of the images with the location of
the facial regions and the plurality of landmarks to isolate a
region including each of the eyes; processing each of the regions,
frame by frame, to identify a pupil region for each of the eyes,
the processing comprising at least: processing the region using a
grayscale conversion to output a grayscale image; processing the
grayscale image using an equalization process to output an
equalized image; processing the equalized image using a
thresholding process to output a thresholded image; processing the
thresholded image using a dilation and erosion process to output a
dilated and eroded image; and processing the dilated and eroded
image using a contour and moment process to output a finalized
processed image; processing the finalized processed image to
identify a spatial location of each pupil in the region, each pupil
being identified by a two-dimensional spatial coordinate;
processing information associated with each pupil identified by the
two-dimensional spatial coordinate to output a plurality of
two-dimensional spatial coordinates, each of which is in reference
to a time, in a two-dimensional space; and outputting gaze
information about the user.
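For illustration only, the following is a compact Python sketch of
the per-frame pupil-isolation steps enumerated above, using the
OpenCV library. The eye_region input is assumed to be the
rectangular eye crop produced by the landmark step, and the
threshold value is an illustrative constant that would be tuned in
practice.

    import cv2

    def locate_pupil(eye_region):
        gray = cv2.cvtColor(eye_region,
                            cv2.COLOR_BGR2GRAY)    # grayscale conversion
        equalized = cv2.equalizeHist(gray)         # equalization process
        _, thresh = cv2.threshold(equalized, 40, 255,
                                  cv2.THRESH_BINARY_INV)  # thresholding
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
        cleaned = cv2.dilate(cv2.erode(thresh, kernel),
                             kernel)               # erosion and dilation
        contours, _ = cv2.findContours(cleaned, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return None  # e.g., a blink: no pupil candidate in frame
        largest = max(contours, key=cv2.contourArea)  # contour step
        m = cv2.moments(largest)                      # moment step
        if m["m00"] == 0:
            return None
        # pupil centroid as a two-dimensional spatial coordinate
        return (m["m10"] / m["m00"], m["m01"] / m["m00"])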
[0088] In an example, the region is configured as a rectangular
region having an x-axis and a y-axis to border each of the eyes of
the human user. In an example, the method further comprises
associating the gaze information with one or more cognitive
assessment constructs, each of the cognitive assessment constructs
being stored in memory of the computing device. In an example, the
image capturing device is provided in a computer, a mobile device,
a smart phone, or other end user computing device. In an example,
the plurality of video frames is one of a plurality of images. In
an example, each of the plurality of images that have been parsed
comprises RGB information associated with the human user. In an
example, the method further comprises transferring the video
information through a network to a server device, the server device
being coupled to the network.
[0089] In an example, the processing of the region uses the
grayscale conversion retrieved from a library provided in memory to
output the grayscale image; the processing of the grayscale image
uses the equalization process retrieved from the library provided
in the memory to output the equalized image; the processing of the
equalized image uses the thresholding process retrieved from the
library provided in the memory to output the thresholded image; the
processing of the thresholded image uses the dilation and erosion
process retrieved from the library in the memory to output the
dilated and eroded image; and the processing of the dilated and
eroded image uses the contour and moment process retrieved from the
library in the memory to output the finalized processed image.
[0090] In an example, the capturing of the input response
information comprises receiving a signal initiated by the user from
an input device in response to the display of each pair of words.
In an example, the input device is one of a mouse, a key from a
keyboard, or a combination of the mouse and the key.
[0091] Of course, there can be other variations, modifications, and
alternatives.
[0092] In an example, the present techniques for eye tracking for
gaze information are combined with voice data, key input tracking
data, or data from other input devices. In an example, the gaze information along
with the other information is stored into a database and forms a
baseline, which can be used for comparison against other time
frames for a particular user.
[0093] In an example, the present technique initiates a display for
outputting a pair of words, such as "lion" and "coyote" or other
pairs. A number of pairs is displayed during a testing phase. A
typical number of pairs is eight to sixteen, but there can also be
fewer or more, depending upon the application. An example of a pair
of words can be illustrated in the Figures further described
below.
Paired Symbol Digit Comparison
[0094] Referring now to FIG. 9, the process includes a paired
symbol digit comparison. As shown, the process outputs symbol pairs
to a display during a test trial. The process captures video images
of a subject while viewing the symbol pairs to compare in a test
phase. Using the present techniques, the process determines the
subject's gaze position based upon a gaze model for the subject and
video images. In an example, the process determines the subject's
visual recognition memory based upon the gaze location and duration
with respect to time, or on a time-based metric. In an example, the
process stores the subject's visual recognition as metadata in
storage. In an example, the video images and metadata are provided
and stored in a remote server. The process then determines whether
the test phase is completed, and then manually or automatically
reviews the subject's gaze position data for the test phase.
Further details of a test phase and a gaze process can be found
throughout the present specification and more particularly
below.
[0095] In an example, paired symbol digit is a processing speed and
executive functioning task that utilizes a paired verification or
rejection paradigm (forced choice). Participants are instructed to
determine whether two symbols are equal or unequal utilizing a
legend with nine number/symbol pairs. At the conclusion of the
task, a brief implicit learning trial is administered without the
legend present. Outcome variables include target accuracy, d prime,
reaction time, gaze location, gaze duration, and other eye patterns
indicative of learning and memory.
Paired Arithmetic Comparison
[0096] Referring now to FIG. 10, the process includes a paired
arithmetic comparison. As shown, the process outputs symbol pairs
to a display during a test trial. The process captures video images
of a subject while viewing the symbol pairs to compare in a test
phase. Using the present techniques, the process determines the
subject's gaze position based upon a gaze model for the subject and
video images. In an example, the process determines the subject's
visual recognition memory based upon the gaze location and duration
with respect to time, or on a time-based metric. In an example, the
process stores the subject's visual recognition as metadata in
storage. In an example, the video images and metadata are provided
and stored in a remote server. The process then determines whether
the test phase is completed, and then manually or automatically
reviews the subject's gaze position data for the test phase.
Further details of a test phase and a gaze process can be found
throughout the present specification and more particularly
below.
[0097] In an example, paired arithmetic is a processing speed and
mental arithmetic task that utilizes a paired verification or
rejection paradigm (forced choice). Participants are instructed to
determine whether the two arithmetic equations (e.g., addition,
subtraction, etc.) are equal or unequal. Outcome variables include
target accuracy, d prime, reaction time, gaze location, gaze
duration, and other eye patterns indicative of learning and
memory.
Paired Line Orientation
[0098] Referring now to FIG. 11, the process includes a paired line
orientation. As shown, the process outputs line pairs to a display
during a test trial. The process captures video images of a subject
while viewing the line pairs to compare in a test phase. Using the
present techniques, the process determines the subject's gaze
position based upon a gaze model for the subject and video images.
The process transfers the video images and metadata through a
network into a remote server. In an example, the process determines
the subject's visual recognition performance based upon the gaze
location and duration with respect to time, or on a time-based
metric. In an example, the process stores the subject's visual
recognition as metadata in storage. In an example, the video images
and metadata are provided and stored in a remote server. The
process then determines whether the test phase is completed, and
then manually or automatically reviews the subject's gaze position
data for the test phase. Further details of a test phase and a gaze
process can be found throughout the present specification and more
particularly below.
[0099] In an example, paired line orientation is a speeded visual
discrimination and spatial working memory task utilizing a paired
comparison paradigm. The task requires participants to choose which
of two angled lines is parallel to a model line exposed for a brief
period of time followed by a brief delay. Outcome variables include
target accuracy, reaction time, gaze location and gaze
duration.
Paired Line Length
[0100] Referring now to FIG. 12, the process includes a paired line
length. As shown, the process outputs line pairs to a display
during a test trial. The process captures video images of a subject
while viewing the line pairs to compare in a test phase. Using the
present techniques, the process determines the subject's gaze
position based upon a gaze model for the subject and video images.
The process transfers the video images and metadata through a
network into a remote server. In an example, the process determines
the subject's visual recognition performance based upon the gaze
location and duration with respect to time, or on a time-based
metric. In an example, the process stores the subject's visual
recognition as metadata in storage. In an example, the video images
and metadata are provided and stored in a remote server. The
process then determines whether the test phase is completed, and
then manually or automatically reviews the subject's gaze position
data for the test phase. Further details of a test phase and a gaze
process can be found throughout the present specification and more
particularly below.
[0101] In an example, paired line length is a speeded visual
discrimination and spatial working memory task utilizing a paired
comparison paradigm. The task requires participants to choose which
of two lines is longer with lines offset to one another at
different positions. Lines are presented for a period of time and
participants are asked to respond. Outcome variables include target
accuracy, reaction time, gaze location and gaze duration.
Paired Feature Binding
[0102] Referring now to FIG. 13, the process includes a paired
feature binding. As shown, the process outputs figure pairs to a
display during a test trial. The process captures video images of a
subject while viewing the figure pairs to compare in a test phase.
Using the present techniques, the process determines the subject's
gaze position based upon a gaze model for the subject and video
images. The process transfers the video images and metadata through
a network into a remote server. In an example, the process
determines the subject's visual recognition performance based upon
the gaze location and duration with respect to time, or on a
time-based metric. In an example, the process stores the subject's
visual recognition as metadata in storage. In an example, the video
images and metadata are provided and stored in a remote server.
The process then determines whether the test phase is completed,
and then manually or automatically reviews the subject's gaze
position data for the test phase. Further details of a test phase
and a gaze process can be found throughout the present
specification and more particularly below.
[0103] In an example, paired feature binding is a speeded visual
discrimination and spatial working memory task. The task requires
participants to choose whether two images have remained the same
with respect to features such as location, color, shape, glyph, or
number. Various figures (or drawings) are presented during a
familiarization phase, then a brief delay followed by a test
phase.
Paired Price Comparison
[0104] Referring now to FIGS. 14 and 15, the process includes a
learning phase and a test phase for a paired price comparison. In
the learning phase, the process outputs an item with seven price
pairs to display for a subject. The process captures video images
of the subject while the subject is viewing the item price pairs to
learn. The process determines the subject's gaze position based
upon a gaze model for the subject and video images. In an example,
the video images and metadata are transferred and stored to a
remote server. The process determines whether the learning phase is
complete, or repeats any of the aforementioned steps. The process
outputs the learning item price pairs and novel pairs to display
during a test phase. Further details of a test phase and a gaze
process can be found throughout the present specification and more
particularly below.
[0105] In the test phase, video images of the subject are captured
while the subject is viewing the learned pairs and novel pairs. The
process determines the subject's gaze position based upon a gaze
model for the subject and the video images. In an example, the
process determines the subject's visual recognition memory based upon
the gaze location and duration with respect to time or other time
frame. In an example, the process determines whether the test is
complete, or repeats any of the aforementioned steps. The process
manually or automatically reviews the subject's gaze position data
in association with the learned pairs and/or novel pairs. Further
details of a test phase and a gaze process can be found throughout
the present specification and more particularly below.
[0106] In an example, paired price comparison is a brief visual
paired associate paradigm. This task requires participants to learn
eight (8) food/price pairs and discriminate between target and foil
pairs during twenty-four (24) paired recognition trials, although
there can be other variations. Outcome variables include target
accuracy, d prime, reaction time, gaze location, gaze duration, and
other eye patterns indicative of learning and memory.
Sequencing
[0107] In an example, sequencing comprises two parts (I and II)
in which the participant is instructed to connect a series of dots
as quickly as possible while still maintaining accuracy. In part
one, the participant needs to correctly connect the dots in numerical
order. In the second part, the participant is required to sequence
numbers and letters in alternating order while preserving numerical
and alphabetical order. The test provides information about visual
search speed, scanning, speed of processing, mental flexibility,
and executive functioning. Outcome variables include gaze location,
gaze duration, completion time, and errors.
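For illustration only, a short Python sketch of the part-two target
order, numbers and letters alternating while preserving numerical
and alphabetical order, with a simple error count; the names are
illustrative.

    from string import ascii_uppercase

    def alternating_sequence(n):
        # Interleave 1..n with A..(nth letter), preserving both the
        # numerical and the alphabetical order.
        seq = []
        for i in range(n):
            seq.append(str(i + 1))
            seq.append(ascii_uppercase[i])
        return seq

    def count_errors(selections, n):
        # Number of positions where the participant's selection
        # order departs from the target order.
        target = alternating_sequence(n)
        return sum(1 for s, t in zip(selections, target) if s != t)

    print(alternating_sequence(3))  # ['1', 'A', '2', 'B', '3', 'C']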
[0108] In the test phase in FIG. 16, the process provides for
sequencing. In an example, the process outputs alphanumeric stimuli
to a display during a test trial. In an example, video images of
the subject are captured while the subject is viewing the stimuli
to sequence. The process determines the subject's gaze position
based upon a gaze model for the subject and the video images. In an
example, the process transfers the video images and metadata to a
remote server for storage. In an example, the process determines
the subject's visual recognition performance based upon the gaze
location and duration with respect to time or other time frame. The
process transfers the subject's visual recognition data as metadata
to a remote server or memory for storage. In an example, the
process determines whether the test is complete, or repeats any of
the aforementioned steps. The process manually or automatically
reviews the subject's gaze position data. Further details of a test
phase and a gaze process can be found throughout the present
specification and more particularly below.
Mazes
[0109] In an example, mazes comprise two types of maze
completion tasks. In the first type, a participant is required to
complete a maze that has no route choices. In the second type, a
participant is required to complete a maze that has choice points
in order to successfully complete the maze. Outcome variables
include gaze location, gaze duration, completion time, and
errors.
[0110] In the test phase in FIG. 17, the process provides for a
maze. In an example, the process outputs a maze to a display during
a test trial. In an example, video images of the subject are
captured while the subject is viewing the maze to complete. The
process determines the subject's gaze position based upon a gaze
model for the subject and the video images. In an example, the
process transfers the video images and metadata to a remote server
for storage. In an example, the process determines the subject's
visual recognition performance based upon the gaze location and
duration with respect to time or other time frame. The process
transfers the subject's visual recognition data as metadata to a
remote server or memory for storage. In an example, the process
determines whether the test is complete, or repeats any of the
aforementioned steps. The process manually or automatically reviews
the subject's gaze position data. Further details of a test phase
and a gaze process can be found throughout the present
specification and more particularly below.
Saccades
[0111] In an example, saccades is an eye movement-based task. The
participant is administered three blocks of trials in which
subjects look at a fixation point in the center of the tablet
screen and move their eyes upon presentation of a stimulus. In the
first block, participants are instructed to follow a stimulus
traveling in a pattern on the screen (smooth pursuit); in the
second block, participants are instructed to move their eyes in the
direction of the presented stimulus (pro-saccade); and in the third
block (anti-saccade), participants are instructed to move their
eyes in the opposite direction of the presented stimulus.
[0112] In the test phase in FIG. 18, the process provides for a
saccades task. In an example, the process outputs a visual stimulus to a
display during a test trial. In an example, video images of the
subject are captured while the subject is viewing the visual
stimuli on which to fixate. The process determines the subject's
gaze position based upon a gaze model for the subject and the video
images. In an example, the process transfers the video images and
metadata to a remote server for storage. In an example, the process
determines the subject's visual recognition performance based upon
the gaze location and duration with respect to time or other time
frame. The process transfers the subject's visual recognition data
as metadata to a remote server or memory for storage. In an
example, the process determines whether the test is complete, or
repeats any of the aforementioned steps. The process manually or
automatically reviews the subject's gaze position data. Further
details of a test phase and a gaze process can be found throughout
the present specification and more particularly below.
Sustained Attention
[0113] In an example, in this task participants are shown a series
of either letters or numbers and are asked to respond via screen
touch or key press when a specific number, letter, or number/letter
combination has been displayed. Multiple series of numbers and
letters are presented over a sustained period of time, requiring
the participant to remain focused and attentive. Outcome
variables include gaze location, gaze duration, task accuracy, task
errors, and reaction time.
[0114] In the test phase in FIG. 19, the process provides for a
sustained attention task. In an example, the process outputs
alphanumeric stimuli to a display during a test trial. In an
example, video images of the subject are captured while the subject
is viewing the stimuli. The process determines the
subject's gaze position based upon a gaze model for the subject and
the video images. In an example, the process transfers the video
images and metadata to a remote server for storage. In an example,
the process determines the subject's visual recognition performance
based upon the gaze location and duration with respect to time or
other time frame. The process transfers the subject's visual
recognition data as metadata to a remote server or memory for
storage. In an example, the process determines whether the test is
complete, or repeats any of the aforementioned steps. The process
manually or automatically reviews the subject's gaze position data.
Further details of a test phase and a gaze process can be found
throughout the present specification and more particularly
below.
Verbal Fluency
[0115] In an example, in this task participants are shown rules for
the kinds of words they should generate as rapidly as possible over
the course of a minute, or other length of time. The location and
duration of gaze is recorded while the participant generates words
belonging to a specific category, alternating between categories, or
beginning with a specific letter. Outcome variables include gaze
location, gaze duration, number of words, number of pauses, length
of words, length of pauses, and relationships of words to one
another.
Picture Naming
[0116] In an example, in this task participants are shown images of
objects of varying lexical frequencies and are asked to name the
objects aloud under a time constraint. The location and duration of
gaze is recorded while the participant generates words. Outcome
variables include gaze location, gaze duration, number of words,
number of pauses, length of words, and length of pauses.
Picture Description
[0117] In an example, in this task participants are shown a picture
and asked to describe everything they see over the course of 1-3
minutes, or other length of time. The location and duration of gaze
is recorded while the participant is describing the picture.
Outcome variables include gaze location, gaze duration, number of
words, number of pauses, length of words, length of pauses, and
syntax of words.
[0118] In the test phase in FIG. 20, the process provides for a
picture description. In an example, the process outputs a picture
to a display during a test trial. In an example, video images of
the subject are captured while the subject is viewing the picture
to describe. The process determines the subject's gaze position
based upon a gaze model for the subject and the video images. In an
example, the process transfers the video images and metadata to a
remote server for storage. In an example, the process determines
the subject's visual recognition performance based upon the gaze
location and duration with respect to time or other time frame. The
process transfers the subject's visual recognition data as metadata
to a remote server or memory for storage. In an example, the
process determines whether the test is complete, or repeats any of
the aforementioned steps. The process manually or automatically
reviews the subject's gaze position data. Further details of a test
phase and a gaze process can be found throughout the present
specification and more particularly below.
[0119] Further details of certain hardware elements and the system
can be found throughout the present specification and more
particularly below.
[0120] FIG. 3 illustrates a functional block diagram of various
embodiments of the present invention. A computer system 600 may
represent a desktop or laptop computer, a server, a smart device, a
tablet, a smart phone, or other computational device. In FIG. 3,
computing device 600 may include an applications processor 610,
memory 620, a touch screen display 630 and driver 640, an image
acquisition device (e.g. video camera) 650, audio input/output
devices 660, and the like. Additional communications from and to
the computing device are typically provided via a wired interface
670, a GPS/Wi-Fi/Bluetooth interface 680, RF interfaces 690 and
driver 700, and the like. Also included in some embodiments are
physical sensors 710.
[0121] In various embodiments, computing device 600 may be a
hand-held computing device (e.g. Apple iPad, Amazon Fire, Microsoft
Surface, Samsung Galaxy Note, an Android Tablet); a smart phone
(e.g. Apple iPhone, Motorola Moto series, Google Pixel, Samsung
Galaxy S); a portable computer (e.g. Microsoft Surface, Lenovo
ThinkPad, etc.), a reading device (e.g. Amazon Kindle, Barnes and
Noble Nook); a headset (e.g. Oculus Rift, HTC Vive, Sony
PlayStation VR) (in such embodiments, motion tracking of the head
may be used in place of, or in addition to eye tracking); or the
like.
[0122] Typically, computing device 600 may include one or more
processors 610. Such processors 610 may also be termed application
processors, and may include a processor core, a video/graphics
core, and other cores. Processors 610 may be a processor from Apple
(e.g. A9, A10), NVidia (e.g. Tegra), Intel (Core, Xeon), Marvell
(Armada), Qualcomm (Snapdragon), Samsung (Exynos), TI, NXP, AMD
Opteron, or the like. In various embodiments, the processor core
may be based upon an ARM Holdings processor such as the Cortex or
ARM series processors, or the like. Further, in various
embodiments, a video/graphics processing unit may be included, such
as an AMD Radeon processor, NVidia GeForce processor, integrated
graphics (e.g. Intel) or the like. Other processing capability may
include audio processors, interface controllers, and the like. It
is contemplated that other existing and/or later-developed
processors may be used in various embodiments of the present
invention.
[0123] In various embodiments, memory 620 may include different
types of memory (including memory controllers), such as flash
memory (e.g. NOR, NAND), pseudo SRAM, DDR SDRAM, or the like.
Memory 620 may be fixed within computing device 600 or removable
(e.g. SD, SDHC, MMC, MINI SD, MICRO SD, CF, SIM). The above are
examples of computer readable tangible media that may be used to
store embodiments of the present invention, such as
computer-executable software code (e.g. firmware, application
programs), application data, operating system data, images to
display to a subject, or the like. It is contemplated that other
existing and/or later-developed memory and memory technology may be
used in various embodiments of the present invention.
[0124] In various embodiments, touch screen display 630 and driver
640 may be based upon a variety of later-developed or current touch
screen technology including resistive displays, capacitive
displays, optical sensor displays, electromagnetic resonance, or
the like. Additionally, touch screen display 630 may include single
touch or multiple-touch sensing capability. Any later-developed or
conventional output display technology may be used for the output
display, such as IPS-LCD, OLED, Plasma, or the like. In various
embodiments, the resolution of such displays and the resolution of
such touch sensors may be set based upon engineering or
non-engineering factors (e.g. sales, marketing). In some
embodiments of the present invention, a display output port may be
provided based upon: HDMI, DVI, USB 3.X, DisplayPort, or the
like.
[0125] In some embodiments of the present invention, image capture
device 650 may include a sensor, driver, lens and the like. The
sensor may be based upon any later-developed or conventional sensor
technology, such as CMOS, CCD, or the like. In some embodiments,
multiple image capture devices 650 are used. For example, smart
phones typically have a rear-facing camera and a front-facing
camera (facing the user as the user views the display). In
various embodiments of the present invention, image recognition
software programs are provided to process the image data. For
example, such software may provide functionality such as: facial
recognition, head tracking, camera parameter control, eye tracking
or the like as provided by either the operating system, embodiments
of the present invention, or combinations thereof.
[0126] In various embodiments, audio input/output 660 may include
conventional microphone(s)/speakers. In some embodiments of the
present invention, three-wire or four-wire audio connector ports
are included to enable the user to use an external audio device
such as external speakers, headphones or combination
headphone/microphones. In some embodiments, this may be performed
wirelessly. In various embodiments, voice processing and/or
recognition software may be provided to applications processor 610
to enable the user to operate computing device 600 by stating voice
commands. Additionally, a speech engine may be provided in various
embodiments to enable computing device 600 to provide audio status
messages, audio response messages, or the like.
[0127] In various embodiments, wired interface 670 may be used to
provide data transfers between computing device 600 and an external
source, such as a computer, a remote server, a storage network,
another computing device 600, or the like. Such data may include
application data, operating system data, firmware, embodiments of
the present invention, or the like. Embodiments may include any
later-developed or conventional physical interface/protocol, such
as: USB 2.x or 3.x, micro USB, mini USB, Firewire, Apple Lightning
connector, Ethernet, POTS, or the like. Additionally, software that
enables communications over such networks is typically
provided.
[0128] In various embodiments, a wireless interface 680 may also be
provided to provide wireless data transfers between computing
device 600 and external sources, such as remote computers, storage
networks, headphones, microphones, cameras, or the like. As
illustrated in FIG. 3, wireless protocols may include Wi-Fi (e.g.
IEEE 802.11x, WiMax), Bluetooth, IR, near field communication
(NFC), ZigBee and the like.
[0129] GPS receiving capability may also be included in various
embodiments of the present invention, although it is not required. As
illustrated in FIG. 3, GPS functionality is included as part of
wireless interface 680 merely for sake of convenience, although in
implementation, such functionality may be performed by circuitry
that is distinct from the Wi-Fi circuitry and distinct from the
Bluetooth circuitry.
[0130] Additional wireless communications may be provided via
additional RF interfaces 690 and drivers 700 in various
embodiments. In various embodiments, RF interfaces 690 may support
any future-developed or conventional radio frequency communications
protocol, such as CDMA-based protocols (e.g. WCDMA), 4G, GSM-based
protocols, HSUPA-based protocols, or the like. In the embodiments
illustrated, driver 700 is illustrated as being distinct from
applications processor 610. However, in some embodiments, this
functionality is provided in a single IC package, for example
the Marvell PXA330 processor, and the like. It is contemplated that
some embodiments of computing device 600 need not include the RF
functionality provided by RF interface 690 and driver 700.
[0131] FIG. 3 also illustrates that various embodiments of
computing device 600 may include physical sensors 710. In various
embodiments of the present invention, physical sensors 710 are
multi-axis Micro-Electro-Mechanical Systems (MEMS). Physical
sensors 710 may include three-axis sensors (linear, gyro, or
magnetic); six-axis motion sensors (combination of linear, gyro,
and/or magnetic); ten-axis sensors (linear, gyro, magnetic,
pressure); and various
combinations thereof. In various embodiments of the present
invention, conventional physical sensors 710 from Bosch,
STMicroelectronics, Analog Devices, Kionix, Invensense, mCube, or
the like may be used.
[0132] In some embodiments, computing device 600 may include a
printer 740 for providing printed media to the user. Typical types
of printers may include an inkjet printer, a laser printer, a
photographic printer (e.g. Polaroid-type instant photos), or the
like. In various embodiments, printer 740 may be used to print out
textual data to the user, e.g. instructions; print out photographs
for the user, e.g. self-portraits; print out tickets or receipts
that include custom bar codes, e.g. QR codes, URLs, etc.; or the
like.
[0133] In various embodiments, any number of future developed or
current operating systems may be supported, such as iPhone OS (e.g.
iOS), Windows, Google Android, or the like. In various embodiments
of the present invention, the operating system may be a
multi-threaded multi-tasking operating system. Accordingly, inputs
and/or outputs from and to touch screen display 630 and driver 640,
and inputs and/or outputs to physical sensors 710, may be processed
in parallel processing threads. In other embodiments, such events or
outputs may be processed serially, or the like. Inputs and outputs
from other functional blocks may also be processed in parallel or
serially, in other embodiments of the present invention, such as
image acquisition device 650 and physical sensors 710.
[0134] In some embodiments, such as a kiosk-type computing device
600, a dispenser mechanism 720 may be provided, as well as an
inventory of items 730 to dispense. In various examples, any number
of mechanisms may be used to dispense an item 730, such as: a
gum-ball-type mechanism (e.g. a rotating template), a snack-food
vending-machine-type mechanism (e.g. rotating spiral, sliding
doors, etc.); a can or bottle soft-drink dispensing mechanism; or
the like. Such dispensing mechanisms are under the control of
processor 610. With such embodiments, a user may walk up to the
kiosk and interact with the process described in FIGS. 1A-1E. Based
upon the test results, processor 610 may activate the dispenser
mechanism 720 and dispense one or more of the items 730 to the user
in FIG. 1E, steps 410, 430, 440 or 450.
[0135] FIG. 3 is representative of one computing device 600 capable
of embodying the present invention. It will be readily apparent to
one of ordinary skill in the art that many other hardware and
software configurations are suitable for use with the present
invention. Embodiments of the present invention may include at
least some but need not include all of the functional blocks
illustrated in FIG. 3. For example, in various embodiments,
computing device 600 may lack touch screen display 630 and
touch screen driver 640, or RF interface 690 and/or driver 700, or
GPS capability, or the like. Additional functions may also be added
to various embodiments of computing device 600, such as a physical
keyboard, an additional image acquisition device, a trackball or
trackpad, a joystick, an internal power supply (e.g. battery), or
the like. Further, it should be understood that multiple functional
blocks may be embodied into a single physical package or device,
and various functional blocks may be divided and be performed among
separate physical packages or devices.
[0136] In some embodiments of the present invention, computing
device 600 may be a kiosk structure. Further, in some instances the
kiosk may dispense an item, such as a placebo drug, a drug study
medication, different types of foods (e.g. snacks, gum, candies),
different types of drinks (e.g. placebo drink, drug study drink),
and the like. In some instances, an item may be informational data
printed by printer 740 related to the performance of the user (e.g.
lifestyle advice, eating well information, etc.). In additional
instances, an item (e.g. a ticket or stub) may include a custom
URL, bar code (e.g. 2D bar code, QR code), or the like, that links
to a web site that has access to the user's test results. It is
contemplated that the linked site may be associated with a testing
organization, a drug study site associated with a pharmaceutical
company, a travel web site, an e-commerce web site, or the like. In
such cases, for privacy purposes, it is contemplated that the user
will remain anonymous to the linked site, until the user chooses to
register their information. In still other instances, an item may
be a picture of the user (e.g. a souvenir photo, a series of candid
photographs, or the like), in some instances in conjunction with
the informational data or link data described above. In yet other
embodiments, the kiosk may be mobile, and the kiosk may be wheeled
up to users, e.g. non-ambulatory users.
[0137] Having described various embodiments and implementations, it
should be apparent to those skilled in the relevant art that the
foregoing is illustrative only and not limiting, having been
presented by way of example only. For example, in some embodiments,
a user computing device may be a tablet or a smart phone, and a
front facing camera of such a device may be used as the video
capture device described herein. Additionally, the various
computations described herein may be performed by the tablet or
smart phone alone, or in conjunction with the remote server. Many
other schemes for distributing functions among the various
functional elements of the illustrated embodiment are possible. The
functions of any element may be carried out in various ways in
alternative embodiments.
[0138] Also, the functions of several elements may, in alternative
embodiments, be carried out by fewer, or a single, element.
Similarly, in some embodiments, any functional element may perform
fewer, or different, operations than those described with respect
to the illustrated embodiment. Also, functional elements shown as
distinct for purposes of illustration may be incorporated within
other functional elements in a particular implementation. Also, the
sequencing of functions or portions of functions generally may be
altered. Certain functional elements, files, data structures, and
so on may be described in the illustrated embodiments as located
in system memory of a particular computer. In other embodiments,
however, they may be located on, or distributed across, computer
systems or other platforms that are co-located and/or remote from
each other. For example, any one or more of data files or data
structures described as co-located on and "local" to a server or
other computer may be located in a computer system or systems
remote from the server. In addition, it will be understood by those
skilled in the relevant art that control and data flows between and
among functional elements and various data structures may vary in
many ways from the control and data flows described above or in
documents incorporated by reference herein. More particularly,
intermediary functional elements may direct control or data flows,
and the functions of various elements may be combined, divided, or
otherwise rearranged to allow parallel processing or for other
reasons. Also, intermediate data structures or files may be used,
and various described data structures or files may be combined or
otherwise arranged.
[0139] Further embodiments can be envisioned to one of ordinary
skill in the art after reading this disclosure. For example, some
embodiments may be embodied as a turn-key type system such as a
laptop or kiosk with executable software resident thereon. The
software is executed by the processor of the laptop and provides
some, if not all, of the functionality described above in FIGS.
1A-1E, such as: calibration of the web camera to the subject's face
and/or eyes, determination of a gaze model, outputting of the
familiar and test images at the appropriate times, determining the
gaze of the subject for familiar and test images with respect to
time during the test using the gaze model, and the like. The
subject test data may be stored locally on the laptop and/or be
uploaded to a remote server.
[0140] In various embodiments, features other than just the gaze
position of the user may be utilized. For example, facial
expressions (e.g. eyebrows, lip position, etc.) as well as hand
placements and gestures of a user may also be considered (e.g.
surprise, puzzlement, anger, bewilderment, etc.) when determining
the cognitive performance of a user. In other embodiments,
additional eye-related factors may also be detected and used, such
as: blink rate of the user, pupil dilation, pupil responsiveness
(e.g. how quickly the pupil dilates in response to a flash on the
display), saccadic movement, velocity, and the like.
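For illustration only, a small Python sketch showing how one such
factor, blink rate, could be derived from per-frame pupil
detections; treating frames with no detected pupil as blink frames
is an assumption, and the names are illustrative.

    def blink_rate(pupil_series, duration_s):
        # pupil_series: per-frame pupil detections, with None where
        # no pupil was found; each run of None frames is counted as
        # one blink.
        blinks, in_blink = 0, False
        for p in pupil_series:
            if p is None and not in_blink:
                blinks, in_blink = blinks + 1, True
            elif p is not None:
                in_blink = False
        return blinks * 60.0 / duration_s  # blinks per minute

    # two blink runs over two seconds of video
    print(blink_rate([(3, 4), None, None, (3, 5), None, (3, 4)],
                     2.0))  # 60.0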
[0141] In still other embodiments, the method performs a treatment
or further analysis using information from the analysis and/or
diagnostic methods described above. In an example, the treatment or
further analysis can include, an MRI scan, CAT scan, x-ray
analysis, PET scans, a spinal tap (cerebral spinal fluid) test
(amyloid plaque, tau protein), a beta-amyloid test, an MRT blood
test, and others. In an example, initiating any of the analysis
and/or diagnostic methods includes using the information to open a
lock or interlock. In an
example, treatment can include automated or manual administration
of a drug or therapy.
[0142] As an example discussed above, in kiosk embodiments, the
treatment includes using the user's cognitive performance
information to access the drug, which is under a lock or secured
container, and to dispense the drug. In another example, PET/MRI
scans are provided to detect amyloid plaque. In an example, the
treatment can include a spinal tap for cerebral spinal fluid to
measure amyloid plaque and tau (a protein believed to be involved
in Alzheimer's disease). Of course, there can be other variations,
modifications, and alternatives. In yet another example, treatment
can include a physician placing the patient on an Alzheimer's
drug such as Namenda.TM. or Exelon.TM., among others. In an example,
the patient can be treated using wearable devices such as a
Fitbit.TM. to track exercise, movement, and sleep activity.
[0143] In an example, the method provides results that are
preferably stored and secured in a privileged and confidential manner. In
an example, the results and/or information is secured, and subject
to disclosure only by unlocking a file associated with the
information. In an example, a physician or health care expert can
access the results after the security is removed.
[0144] In an example, the image capture can also be configured to
capture another facial element. The facial element can include a
mouth, nose, cheeks, eyebrows, ears, or other feature, or any relations
to each of these features, which can be moving or in a certain
shape and/or place, to identify other feature elements associated
with an expression or other indication of the user. Of course,
there can be other variations, modifications, and alternatives.
[0145] In an example, the image can also be configured to capture
another element of known shape and size as a reference point. In an
example, the element can be a fixed hardware element, a piece of
paper, or code, or other object, which is fixed and tangible. As an
example, a doctor, pharmaceutical, or the like may provide the user
with a business card or other tangible item, that has a unique QR
code imprinted thereon. In various embodiments, the user may
display the QR code to the camera, for example in FIG. 1A, step
120. In other embodiments, the remote server may use the QR code to
determine a specific version of the cognitive test described
herein and provide specific prizes, gifts, and information to the
user. As an example, Pharmaceutical A may have a 6 minute visual
test, based upon colored images, whereas Researchers B may have a 5
minute visual test, based upon black and white images, etc. It
should be understood that many other adjustments may be made to the
process described above, and these different processes may be
implemented by a common remote server. Of course, there can be
other variations, modifications, and alternatives.
[0146] In other examples, the present technique can be performed
multiple times. In an example, the multiple times can be performed
to create a baseline score. Once the baseline score is stored,
other tests can be performed at other times to reference against
the baseline score. In an example, the baseline score is stored
into memory on a secured server or client location. The baseline
score can be retrieved by a user, and then processed along with new
test scores to create additional scores. Of course, there can be
other variations, modifications, and alternatives.
[0147] In an alternative example, the present technique can be used
to identify other cognitive diseases or other features of the user,
such as: anxiety, stress, depression, suicidal tendencies,
childhood development, and the like. In an example, the technique
can be provided on a platform for other diseases. The other
diseases can be addressed by various modules included on the platform. In
still other embodiments, the disclosed techniques may be used as a
platform for other user metrics, e.g. user motion or gait capture
and analysis, user pose or posture analysis. Embodiments may be
located at hospitals and when users take the test and fail to show
sufficient novelty preference, a directory of specific doctors or
departments may become unlocked to them. Users who show sufficient
novelty preference may not have access to such providers.
[0148] In some embodiments, user response to different pictures may
be used for security purposes (e.g. TSA, CIA, FBI, police) or the
like. As an example, during a testing phase, the user may be
displayed a familiar image that is neutral, such as a flower or
stop sign, and be displayed a novel image that illustrates
violence, such as an AK-47 gun, a bomb, a 9-11 related image, or
the like. In some cases, a user who deliberately avoids looking at
the novel image may be considered a security risk. Other
embodiments may be used in a motor vehicle department for
determining whether older drivers have sufficient cognitive
performance to safely handle a
vehicle.
[0149] In some examples, algorithms may be implemented to determine
whether a user is attempting to fool the system. For example, gaze
analysis may be used to determine if the user is trying to cover up
a cognitive shortcoming.
[0150] In an example, the present technique can be implemented on a
stand-alone kiosk. In an example, the camera and other hardware
features, can be provided in the kiosk, which is placed
strategically in a designated area. The kiosk can be near a
pharmacy, an activity, or security zone, among others. In an
example, the technique unlocks a dispenser to provide a drug, or
unlocks a turnstile or security gate, or the like, after suitable
performance of the technique. Of course, there can be
other variations, modifications, and alternatives.
[0151] In an example, the technique can also be provided with a
flash or other illumination technique into each of the eyes.
[0152] In other embodiments, combinations or sub-combinations of
the above disclosed invention can be advantageously made. The block
diagrams of the architecture and flow charts are grouped for ease
of understanding. However, it should be understood that combinations
of blocks, additions of new blocks, re-arrangement of blocks, and
the like are contemplated in alternative embodiments of the present
invention. Further examples of embodiments of the present invention
are provided below.
[0153] FIG. 5 is a simplified flow diagram of a process according
to an embodiment of the present invention. This diagram is merely
an example, which should not unduly limit the scope of the claims
herein. Referring to the Figure, in an example, the present
invention provides a method for identifying a feature of an eye of
a human user. The method includes initiating an image capturing
device, such as a camera or other imaging device. In an example,
the image capturing device comprises a plurality of sensors
arranged in an array. In an example, the method includes capturing
video information from a facial region of the human user using the
image capturing device. In an example, the video information is
from a stream of video comprising a plurality of frames.
[0154] In an example, the image capturing device is provided in a
computer, a mobile device, a smart phone, or other end user
computing device. In an example, the plurality of video frames is
one of a plurality of images. In an example, each of the plurality
of images that have been parsed comprises RGB information
associated with the human user.
[0155] In an example, the method includes processing the video
information to parse the video information, frame by frame, into
the plurality of images. In an example, each of the plurality of
images has a time stamp from a first time stamp, a second time
stamp, to an Nth time stamp, where N is greater than 10, or other
number. Of course, there can be other variations, modifications,
and alternatives.
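For illustration only, a minimal Python sketch of parsing captured
video, frame by frame, into time-stamped images using the OpenCV
library; the file name is illustrative. Note that OpenCV returns
pixel data in BGR channel order, which can be converted to RGB if
needed.

    import cv2

    def parse_frames(path="capture.mp4"):
        cap = cv2.VideoCapture(path)
        frames = []  # (time stamp in milliseconds, image) per frame
        while True:
            ok, image = cap.read()
            if not ok:
                break
            frames.append((cap.get(cv2.CAP_PROP_POS_MSEC), image))
        cap.release()
        return frames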
[0156] In an example, the method includes processing each of the images
to identify a location of the facial region and processing each of
the images with the location of the facial region to identify a
plurality of landmarks associated with the facial region. In an
example, the facial region can be identified using a matching or
processing technique. Landmarks can include facial features such as
a mouth, cheeks, nose, ears, and the like. In
an example, the method includes processing each of the images with
the location of the facial regions and the plurality of landmarks
to isolate a region including each of the eyes. In an example,
the processing identifies the region including the eyes. The method
includes processing each of the regions, frame by frame, to
identify a pupil region for each of the eyes. In an example, the
region is configured as a rectangular region having an x-axis and a
y-axis to border each of the eyes of the human user. In an example,
the spatial location of the pupil is desirable for gaze
analysis.
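The specification does not mandate any particular detector; for
illustration only, the following Python sketch isolates a
rectangular region bordering each eye using OpenCV's bundled Haar
cascades.

    import cv2

    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    eye_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_eye.xml")

    def eye_regions(frame):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        regions = []
        for (fx, fy, fw, fh) in face_cascade.detectMultiScale(
                gray, 1.3, 5):
            # location of the facial region
            face = frame[fy:fy + fh, fx:fx + fw]
            face_gray = gray[fy:fy + fh, fx:fx + fw]
            for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(
                    face_gray):
                # rectangular region with an x-axis and a y-axis
                # bordering the eye
                regions.append(face[ey:ey + eh, ex:ex + ew])
        return regions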
[0157] In an example, the processing comprises a variety of steps.
In an example, processing includes processing the region using a
grayscale conversion to output a grayscale image; processing the
grayscale image using an equalization process to output an
equalized image. In an example, the equalized image uses a
histogram equalization, which is a method in image processing of
contrast adjustment using the image's histogram. The processing
also includes processing the equalized image using a thresholding
process to output a thresholded image; processing the thresholded
image using a dilation (e.g., adding border pixels) and erosion (e.g., stripping away border pixels) process to output a dilated and eroded image; and processing the dilated and eroded image using
a contour and moment process to output a finalized processed image.
Of course, there can be other variations, modifications, and
alternatives.
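Merely as an example of this five-step pipeline, assuming OpenCV, the steps might be composed as in the sketch below; the threshold value, kernel size, and function name locate_pupil are illustrative assumptions, not values taken from the specification.

    import cv2
    import numpy as np

    def locate_pupil(eye_region_bgr):
        """Return the (x, y) centroid of the pupil within an eye region,
        or None if no candidate contour is found."""
        gray = cv2.cvtColor(eye_region_bgr, cv2.COLOR_BGR2GRAY)  # grayscale conversion
        equalized = cv2.equalizeHist(gray)                       # histogram equalization
        # Inverse binary threshold so the dark pupil becomes foreground.
        _, thresholded = cv2.threshold(equalized, 30, 255,
                                       cv2.THRESH_BINARY_INV)
        kernel = np.ones((3, 3), np.uint8)
        dilated = cv2.dilate(thresholded, kernel)                # add border pixels
        eroded = cv2.erode(dilated, kernel)                      # strip border pixels
        # Contour and moment process: centroid of the largest contour.
        contours, _ = cv2.findContours(eroded, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return None
        m = cv2.moments(max(contours, key=cv2.contourArea))
        if m["m00"] == 0:
            return None
        return (m["m10"] / m["m00"], m["m01"] / m["m00"])        # 2-D coordinate

The centroid (m10/m00, m01/m00) is the standard first-moment estimate of a contour's center of mass, which serves here as the two-dimensional spatial coordinate of the pupil.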
[0158] In an example, the method includes processing the finalized
processed image to identify (using a spatial coordinate system) a
spatial location of each pupil in the region, each pupil being
identified by a two-dimensional spatial coordinate. The method
includes processing information associated with each pupil
identified by the two-dimensional spatial coordinate to output a
plurality of two-dimensional spatial coordinates, each of which is
in reference to a time, in a two-dimensional space. That is, each two-dimensional spatial coordinate, in reference to a time in a two-dimensional space, represents a location of the pupils, and the plurality of such time-referenced coordinates represents gaze information. The method then includes outputting gaze information about the human user. The gaze information includes the two-dimensional spatial coordinates, each of which is in reference to a time in a two-dimensional space. In
an example, each of the steps can be stored in temporary or
permanent memory, and can be accessed from time to time. In an
example, the processing device coordinates the processing in
conjunction with an image processing device or other processor. Of
course, there can be other variations, modifications, and
alternatives.
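Illustratively, and only as a sketch under the assumptions of the preceding examples, pairing each per-frame pupil coordinate with that frame's time stamp yields the gaze information described in this paragraph:

    def gaze_information(timestamped_eye_regions, locate_pupil):
        """Build [(t, (x, y)), ...]: one time-referenced two-dimensional
        coordinate per frame in which a pupil was located."""
        trace = []
        for t, eye_region in timestamped_eye_regions:
            coordinate = locate_pupil(eye_region)
            if coordinate is not None:       # skip frames with no pupil found
                trace.append((t, coordinate))
        return trace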
[0159] In an example, the method further includes associating the
gaze information with one or more cognitive assessment constructs.
In an example, each of the cognitive assessment constructs is stored in the memory of a computing device.
[0160] In an example, the method further includes transferring the
video information through a network to a server device, the server
device being coupled to the network.
[0161] In an example, the processing of the region uses a grayscale conversion retrieved from a library provided in memory to output a grayscale image; the processing of the grayscale image uses an equalization process retrieved from a library provided in the memory to output an equalized image; the processing of the equalized image uses a thresholding process retrieved from a library provided in the memory to output a thresholded image; the processing of the thresholded image uses a dilation and erosion process retrieved from a library in the memory to output a dilated and eroded image; and the processing of the dilated and eroded image uses a contour and moment process retrieved from a library in the memory to output a finalized processed image.
[0162] In an example, the method further comprises using the gaze information to associate the gaze information with a cognitive learning feature. In an example, the cognitive learning feature is one of a plurality of learning disorders.
[0163] In an example, the invention includes an alternative method
for identifying a feature of an eye of a human user. The method
includes initiating an image capturing device, the image capturing
device comprising a plurality of sensors arranged in an array. The
method includes capturing information from a facial region of the
human user using the image capturing device, the information
comprising a plurality of frames. The method includes processing
the information to parse the information into the plurality of
images, each of the plurality of images having a time stamp from a
first time stamp, a second time stamp, to an Nth time stamp, where
N is greater than 10.
[0164] In an example, the method includes processing each of the images
to identify a location of the facial region and processing each of
the images with the location of the facial region to identify a
plurality of landmarks associated with the facial region. The
method includes processing each of the images with the location of
the facial regions and the plurality of landmarks to isolate a
region including each of the eyes and processing each of the
regions, frame by frame, to identify a pupil region for each of the
eyes.
[0165] In an example, the processing comprises at least: processing
the region using a grayscale conversion to output a grayscale
image; processing the grayscale image using an equalization process
to output an equalized image; processing the equalized image using
a thresholding process to output a thresholded image; processing
the thresholded image using a dilation and erosion process to
output a dilated and eroded image; and processing the dilated and
eroded image using a contour and moment process to output a
finalized processed image.
[0166] In an example, the method includes processing the finalized
processed image to identify a spatial location of each pupil in the
region, each pupil being identified by a two-dimensional spatial
coordinate and processing information associated with each pupil
identified by the two-dimensional spatial coordinate to output a
plurality of two-dimensional spatial coordinates, each of which is
in reference to a time, in a two-dimensional space. The method
includes outputting gaze information about the human user. In an example, the gaze information includes the plurality of two-dimensional spatial coordinates, each of which is in reference to a time, in a two-dimensional space.
[0167] FIG. 6 is a simplified flow diagram of a process according
to an alternative embodiment of the present invention. In an
example, the process is provided to process information associated
with a gaze of a subject or human user. In an example, once the
video has been captured, the process examines the video. The process
forms a trial video, and parses out trial frames, as shown. In an
example, the process identifies pupil position for the gaze
process. The pupil positions are plotted, and filtered to remove
outliers. Each of the pupil positions is provided in a spatial
coordinate with a selected range. The process scores the trial and
then the exam. Of course, there can be other variations,
modifications, and alternatives.
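The outlier-removal step of FIG. 6 is not specified further; merely as one hedged possibility, pupil positions far from the mean could be dropped, as in the sketch below (the three-standard-deviation cutoff and function name filter_outliers are assumptions):

    import statistics

    def filter_outliers(trace, k=3.0):
        """Drop pupil positions more than k standard deviations from the
        mean in either axis; assumes at least two points in the trace."""
        xs = [x for _, (x, _) in trace]
        ys = [y for _, (_, y) in trace]
        mx, my = statistics.mean(xs), statistics.mean(ys)
        sx, sy = statistics.stdev(xs), statistics.stdev(ys)
        return [(t, (x, y)) for t, (x, y) in trace
                if abs(x - mx) <= k * sx and abs(y - my) <= k * sy]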
[0168] In an example referring to FIG. 21, the present technique
provides an apparatus for identifying a feature of an eye of a
human user. The apparatus has a processing device, a memory device
coupled to the processing device, and an image capturing device
coupled to the processing device. In an example, the image
capturing device comprises a plurality of sensors arranged in an
array. In an example, the image capturing device is initiated by the processing device and is configured to capture information from a facial region of the human user, the information comprising a plurality of frames.
[0169] In an example, the apparatus has an image processing module
coupled to the processing device. In an example, the image
processing module or device, also known as an image processing
engine, image processing unit (IPU), or image signal processor
(ISP), is a type of media processor or specialized digital signal
processor (DSP) used for image processing in digital cameras or other devices. In an example, image processors often employ parallel computing to increase speed and efficiency. The digital image processing engine can perform a range of tasks. To increase system integration on embedded devices, it is often implemented as a system on a chip with a multi-core processor architecture. In an example, the
image processing module is configured to: process the information to parse the information into the plurality of images, each of the plurality of images having a time stamp from a first time stamp, a second time stamp, to an Nth time stamp, where N is greater than 10 or another number; process each of the images to identify a location of the facial region; process each of the images with the location of the facial region to identify a plurality of landmarks associated with the facial region; and process each of the images with the location of the facial regions and the plurality of landmarks to isolate a region including each of the eyes.
[0170] In an example, the image processing module is configured to
process each of the regions, frame by frame, to identify a pupil
region for each of the eyes, the processing of each of the regions, frame by frame, to identify the pupil region for each of the eyes
comprising at least: processing the region using a grayscale
conversion to output a grayscale image; processing the grayscale
image using an equalization process to output an equalized image;
processing the equalized image using a thresholding process to
output a thresholded image; processing the thresholded image using
a dilation and erosion process to output a dilated and eroded
image; and processing the dilated and eroded image using a contour
and moment process to output a finalized processed image.
[0171] In an example, the module is further configured to process
the finalized processed image to identify a spatial location of
each pupil in the region, each pupil being identified by a
two-dimensional spatial coordinate; and process information
associated with each pupil identified by the two-dimensional
spatial coordinate to generate a plurality of two-dimensional
spatial coordinates, each of which is in reference to a time, in a
two-dimensional space.
[0172] In an example, the apparatus has an output handler to output gaze information about the human user, associated with the plurality of two-dimensional spatial coordinates, each of which is in reference to the time, in the two-dimensional space. In an example, the
gaze information is stored in the memory device. Of course, there
can be other variations, modifications, and alternatives.
[0173] In an example, the present technique includes a method for
processing information from a user using paired learning and
testing of objects. In an example, the objects can be selected from
a group consisting of a paired symbol digit comparison, a paired
arithmetic, a paired line orientation, a paired line length, a
paired feature binding, or a paired price comparison, or
others.
[0174] In an example, in a learning phase, the method includes
displaying a pair of learning objects on a display coupled to a
computing device, and then repeating a learning process of
displaying a pair of other learning objects numbered from 2 to N on
the display of the computing device, where N is an integer from 6
to 16, although there can be variations in the integer.
[0175] In an example, in a testing phase, the method includes
displaying a pair of testing objects on the display of the
computing device, and then repeating a testing process of
displaying a pair of other testing objects numbered from 2 to M,
where M is an integer from 6 to 16, each pair of testing objects
being categorized as (1) equal to a pair of learning objects, (2) different from a pair of learning objects, where each of the learning objects is different from each testing object, or (3) a partial match, where one of the learning objects is the same as one of the testing objects and the other learning object is different from the testing object in the pair.
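Merely as an illustrative sketch of this categorization and of drawing the N learning pairs and M testing pairs (the function names and the random draw are assumptions; the specification does not prescribe how pairs are generated):

    import random

    def testing_category(learning_pair, testing_pair):
        """Classify a testing pair against a learning pair: (1) equal,
        (2) entirely different, or (3) one object shared, one different.
        Assumes hashable objects and no duplicates within a pair."""
        shared = len(set(learning_pair) & set(testing_pair))
        return {2: "equal", 0: "different", 1: "partial"}[shared]

    def build_trials(objects, n_learning=8, n_testing=8):
        """Draw N learning pairs and M testing pairs, with N and M
        integers from 6 to 16 as described above."""
        learning = [tuple(random.sample(objects, 2)) for _ in range(n_learning)]
        testing = [tuple(random.sample(objects, 2)) for _ in range(n_testing)]
        return learning, testing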
[0176] In an example, for each pair of objects displayed in the
learning phase and the testing phase, the method includes capturing
eye tracking information, including gaze information, from a user;
capturing an input response information from the user for each pair
of objects in the testing phase; and storing the eye tracking
information and the input response information in a memory device
coupled to the computing device.
[0177] In an example, the method further comprises processing the
eye tracking information and the input response information during
the learning phase to form a base line composite; and processing
the eye tracking information and the input response information
during the testing phase to form a test phase composite. In an
example, the method includes processing the base line composite and
the test phase composite to output a resulting composite. In an
example, the resulting composite is associated with a
characteristic of the user, the characteristic being selected from an emotional state, a mental state, a state of depression, a state of anxiety, a state of fatigue, and a state of confidence.
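The specification does not define how the composites are computed; purely as a hedged sketch, per-trial scores could be averaged into a baseline composite and a test-phase composite and then compared (the simple mean and difference are assumptions for illustration):

    import statistics

    def composite(trial_scores):
        """Collapse per-trial eye-tracking and response scores into one
        composite value (here, a simple mean; an assumption)."""
        return statistics.mean(trial_scores)

    def resulting_composite(baseline_scores, test_scores):
        """Compare the test-phase composite against the baseline
        composite to output a resulting composite."""
        return composite(test_scores) - composite(baseline_scores)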
[0178] In an example, the capturing of the eye tracking information
comprises: initiating an image capturing device, the image
capturing device comprising a plurality of sensors arranged in an
array; capturing information from a facial region of the user using
the image capturing device, the information comprising a plurality
of frames; processing the information to parse the information into
the plurality of images, each of the plurality of images having a
time stamp from a first time stamp, a second time stamp, to an Nth
time stamp, where N is greater than 10, although there can be others; processing each of the images to identify a location of the facial
region; processing each of the images with the location of the
facial region to identify a plurality of landmarks associated with
the facial region; processing each of the images with the location
of the facial regions and the plurality of landmarks to isolate a
region including each of the eyes; processing each of the regions,
frame by frame, to identify a pupil region for each of the eyes,
the processing comprising at least: processing the region using a
grayscale conversion to output a grayscale image; processing the
grayscale image using an equalization process to output an
equalized image; processing the equalized image using a
thresholding process to output a thresholded image; processing the
thresholded image using a dilation and erosion process to output a
dilated and eroded image; and processing the dilated and eroded
image using a contour and moment process to output a finalized
processed image; processing the finalized processed image to
identify a spatial location of each pupil in the region, each pupil
being identified by a two-dimensional spatial coordinate;
processing information associated with each pupil identified by the
two-dimensional spatial coordinate to output a plurality of
two-dimensional spatial coordinates, each of which is in reference
to a time, in a two-dimensional space; and outputting gaze
information about the user.
[0179] In an example, the region is configured as a rectangular
region having an x-axis and a y-axis to border each of the eyes of
the human user. In an example, the method further comprises
associating the gaze information with one or more cognitive
assessment constructs, each of the cognitive assessment constructs
being stored in memory of the computing device. In an example, the
image capturing device is provided in a computer, a mobile device,
a smart phone, or other end user computing device. In an example, each of the plurality of video frames is one of a plurality of images. In
an example, each of the plurality of images that have been parsed
comprises RGB information associated with the human user. In an
example, the method further comprises transferring the video
information through a network to a server device, the server device
being coupled to the network. Further details of the paired images
are shown in the Figures described below.
[0180] FIGS. 22 through 27 are illustrations of screen shots of
paired objects for gaze tracking according to an embodiment of the
present invention. The specification and drawings are, accordingly,
to be regarded in an illustrative rather than a restrictive sense.
It will, however, be evident that various modifications and changes
may be made thereunto without departing from the broader spirit and
scope of the invention as set forth in the claims.
* * * * *