U.S. patent application number 17/164745 was filed with the patent office on 2021-02-01 and published on 2022-08-04 for systems and methods for SpO2 classification using smartphones.
This patent application is currently assigned to University of Washington. The applicant listed for this patent is University of Washington. Invention is credited to Jason Hoffman, Shwetak N. Patel, Varun Viswanath, Edward Wang.
Application Number | 20220240861 17/164745 |
Filed Date | 2022-08-04 |
United States Patent Application | 20220240861 |
Kind Code | A1 |
Hoffman; Jason; et al. | August 4, 2022 |
SYSTEMS AND METHODS FOR SPO2 CLASSIFICATION USING SMARTPHONES
Abstract
Examples of systems and methods for classifying SpO2 levels
using smartphones are described. A wideband light source (e.g., a
flash) may be used to illuminate a finger. A wideband imaging
sensor (e.g., a camera) may be used to capture images of the
illuminated finger. The smartphone may apply per-color channel gain
adjustments to the captured images. The adjusted pixel data may be
used as the basis of input to a classifier (e.g., a deep learning
model). The classifier may be trained on ground truth data, such as
from an induced hypoxia study. The classifier may output an SpO2
level of blood in the finger.
Inventors: |
Hoffman; Jason (Seattle, WA); Patel; Shwetak N. (Seattle, WA);
Wang; Edward (Seattle, WA); Viswanath; Varun (Seattle, WA) |
Applicant: |
Name | City | State | Country | Type |
University of Washington | Seattle | WA | US | |
Assignee: |
University of Washington, Seattle, WA
The Regents of the University of California, Oakland, CA |
Appl. No.: |
17/164745 |
Filed: |
February 1, 2021 |
International Class: |
A61B 5/00 (2006.01); H04M 1/21 (2006.01); H04N 5/225 (2006.01);
H04N 9/68 (2006.01); G06T 7/00 (2006.01); G06T 7/90 (2006.01);
A61B 5/1455 (2006.01); A61B 5/103 (2006.01) |
Claims
1. A method comprising: illuminating a finger with a wideband light
source including wavelengths in multiple color channels; imaging
the finger with a wideband imaging sensor to obtain pixel data;
adjusting the pixel data using per-channel color gain values
configured to maintain data in the multiple color channels within a
digitization threshold, to provide adjusted pixel data; and
classifying the adjusted pixel data using a deep learning model to
predict an SpO2 level of blood in the finger.
2. The method of claim 1, wherein said illuminating comprises using
a flash of a smartphone, and said imaging comprises using a camera
of a smartphone, and wherein said per-channel color gain values are
particular to a model of the smartphone.
3. The method of claim 1, wherein the multiple color channels
comprise a red channel, a green channel, and a blue channel, and
wherein the color gain values are different for each of the red
channel, the green channel, and the blue channel.
4. The method of claim 3, wherein the color gain values are larger
for the blue channel and the green channel than for the red
channel.
5. The method of claim 3, wherein the color gain values are
selected based on empirical study data.
6. The method of claim 1, wherein the color gain values are
selected based on feedback from signals generated using initial
color gain values.
7. The method of claim 1, wherein the deep learning model is
trained using data from an induced hypoxemia study.
8. The method of claim 1, wherein said classifying is configured to
predict an SpO2 level lower than 85 percent.
9. A non-transitory computer readable media encoded with
instructions which, when executed by a processor cause a system to
perform actions comprising: gain-adjust pixel data received from a
smartphone camera, the pixel data corresponding to an illuminated
finger, such that multiple color channels in the pixel data are not
clipped; provide the gain-adjusted pixel data to a deep learning
model; and predict an SpO2 level of blood in the finger using
the deep learning model.
10. The non-transitory computer readable media of claim 9, wherein
the deep learning model is trained using data from an induced
hypoxia study.
11. The non-transitory computer readable media of claim 9, wherein
said gain-adjust pixel data comprises applying a gain adjustment
particular to a model of the smartphone.
12. The non-transitory computer readable media of claim 9, wherein
the predict an SpO2 level is accurate below 85 percent
SpO2.
13. The non-transitory computer readable media of claim 9, wherein
the multiple color channels comprise a red channel, a green
channel, and a blue channel, and wherein the green channel and the
blue channel are adjusted more than the red channel.
14. The non-transitory computer readable media of claim 9, wherein
the pixel data corresponds to the finger illuminated using a flash
of the smartphone.
15. A smartphone comprising: a flash; a camera; a processor; memory
encoded with executable instructions which, when executed by the
processor cause the smartphone to: illuminate a finger with the
flash; capture pixel data with the camera; gain-adjust the pixel
data in accordance with a model of the smartphone; and predict an
SpO2 level of blood in the finger based on the gain-adjusted
pixel data using a deep learning model.
16. The smartphone of claim 15, wherein the deep learning model is
trained using data from an induced hypoxia study.
17. The smartphone of claim 15, wherein the flash comprises a
wideband light source.
18. The smartphone of claim 15, wherein the camera comprises a
wideband imaging sensor.
19. The smartphone of claim 15, wherein the deep learning model is
configured to predict the SpO2 level below 85 percent.
20. The smartphone of claim 15, wherein the executable instructions
further cause the smartphone to gain-adjust green and blue channels
of the pixel data more than a red channel of the pixel data.
Description
TECHNICAL FIELD
[0001] Examples described herein relate generally to measurement of
SpO2 levels. Examples of SpO2 measurement using a smartphone camera
and flash and a machine learning model are described.
BACKGROUND
[0002] Blood-oxygen saturation, reported as SpO2 percentage (e.g.,
SpO2 level), is the clinical measure that informs a physician of
the ability of the body to distribute oxygen by revealing the
proportion of hemoglobin in the blood currently carrying oxygen.
While a healthy SpO2 level is different for each individual,
everybody needs an adequate supply of oxygen in their tissues.
Respiratory illnesses such as asthma, Chronic Obstructive Pulmonary
Disease (COPD), and COVID-19 can cause significant declines in
SpO2, recurrent hypoxemia, and subsequent hypoxia, and serious
health complications, such as organ damage, brain damage, and
death, can occur if SpO2 stays low for an extended period of time.
Recently, in COVID-19 patients, in-hospital mortality rate has been
shown to increase when a patient's SpO2 level cannot be maintained
above 90%, a level that has also been used in primary care to
indicate the need to consult a physician for further care. Frequent
measurements of SpO2 can allow for identification of the severity
of asthma and COPD, predict mortality amongst COVID-19 patients,
and detect presence of other illnesses including Idiopathic
Pulmonary Fibrosis, Congestive Heart Failure, Diabetic
Ketoacidosis, and pulmonary embolism.
[0003] Pulse oximetry for monitoring blood oxygen saturation may be
performed through a variety of techniques, including direct
arterial blood analysis and purpose-built devices for the detection
of specific wavelengths of light. Perhaps the "gold standard" for
measuring oxygen saturation is the Arterial Blood Gas analysis
device, which takes a blood sample to measure the amounts of
oxygenated and deoxygenated hemoglobin. As this technique is too
invasive and expensive for most use cases, clinics primarily rely
on optical pulse oximeters, which take noninvasive readings of
SpO2.
[0004] Clinical pulse oximeters typically perform oxygenation
measurement via transmittance photoplethysmography (PPG) sensing at
the fingertip, clamping a finger clip device around the end of the
finger. The device measures light absorption through the tissue of
the finger to infer blood composition. The clip includes
a light source and photodiode sensors on opposite sides of the
finger to measure and calculate the light absorption of the
pulsatile blood in the finger. The same measurement has been
demonstrated, and made available for clinical use, on the toes, ear
lobe, and forehead. The measurement at the forehead differs from
the others in that it performs reflectance measurements, in which
the emitter and receiver are on the same side of the device,
relying on the reflectance of some portion of the light from
different layers of the tissues, such as the walls of blood
vessels. These monitors are used to assess and monitor patients in
clinical checkups, in-clinic patient monitoring, and monitoring
during surgery.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] FIG. 1 is a schematic illustration of a smartphone arranged
in accordance with examples described herein;
[0006] FIG. 2 is a schematic illustration of a smartphone in use
during reflectance photoplethysmography (PPG);
[0007] FIG. 3 depicts visualizations of color data from a device
with standard color gain settings and using custom settings in
accordance with examples described herein.
[0008] FIG. 4 is a schematic illustration of data processing
arranged in accordance with examples described herein.
[0009] FIG. 5 is a schematic illustration of classification
techniques arranged in accordance with examples described
herein.
[0010] FIG. 6 is a schematic illustration of the training and
operation of a classifier in accordance with examples described
herein.
DETAILED DESCRIPTION
[0011] Hypoxemia, a medical condition that occurs when the blood is
not carrying enough oxygen to adequately supply the tissues, is a
leading indicator for dangerous complications of respiratory
diseases like asthma, COPD, and COVID-19. While purpose-built pulse
oximeters can provide accurate SpO2 readings that allow for
diagnosis of hypoxemia, enabling this SpO2 sensing capability in
smartphone cameras could give more people access to important
information about their health, as well as improve their
physicians' ability to remotely diagnose and treat respiratory
conditions. Herein are described examples of smartphone-based SpO2
sensing systems which may use a varied fractional inspired oxygen
(FiO2) protocol, creating a clinically relevant validation dataset
for smartphone-based methods on a large range of SpO2 values (e.g.,
65%-100%). Previous systems were generally only evaluated on
smaller ranges (e.g., 85%-100%). Examples of deep learning models
are described which may be built (e.g., trained) using this data to
demonstrate accurate reporting of SpO2 level, with an overall
MAE<5.0%, and identification of positive cases of low SpO2 with a >93%
recall rate in some implemented examples.
[0012] Monitoring SpO2 with a smartphone, particularly an
unmodified smartphone, if provided in an accurate and unobtrusive
manner, may improve health outcomes for those with respiratory
illnesses by providing access to rapid risk assessment outside the
clinic. Smartphone-based SpO2 monitors may offer the ubiquity and
precision necessary to increase access to detection and treatment
of respiratory diseases. Examples described herein provide
smartphone-based SpO2 monitors which may operate on a full range of
clinically-relevant SpO2 values, such as from 65%-100%.
[0013] Currently, clinicians measure SpO2 levels using FDA-cleared,
purpose-built devices called pulse oximeters during regular clinic
visits, which allows them to assess a patient's condition and
evaluate how that condition has changed since a prior visit. While
purpose-built pulse oximeters are accurate, non-invasive, and
robust across skin colors and SpO2 levels, they possess undesirable
characteristics that inhibit use outside the clinic. Users need to
(1) purchase the device and (2) have the device with them whenever
they need monitoring. These factors reduce the accessibility of
more frequent and widespread SpO2 measurements, as patients can
forget their devices, fail to charge them, or misplace them, if
they can even afford them in the first place. These observations
reveal a significant gap in respiratory monitoring, in which
sudden, undetected, and dangerous deterioration can occur. This
possibility has become clearer in the context of the COVID-19
pandemic, during which it has been shown that hypoxemia can be
present in potentially dangerous but otherwise asymptomatic
patients.
[0014] Smartphone-based SpO2 monitors present the opportunity to
detect and monitor respiratory conditions in contexts where pulse
oximeters may be inaccessible. Smartphones are widely owned because
of their multi-purpose utility, and contain increasingly powerful
sensors, including a camera with an LED flash. Some existing
smartphone-based SpO2 sensing systems only present proof-of-concept studies
which may produce data in a limited range of 85%-100% SpO2 through
techniques like breath-holding. Lower SpO2 percentages may be more
difficult to measure using commodity hardware and also more
expensive to collect. Nonetheless, to realize ubiquitous SpO2
sensing with smartphones, it may be desirable for the
smartphone-based SpO2 monitors to meet the standards for data
breadth to which current pulse oximeters are held.
[0015] Examples described herein include a smartphone-based SpO2
monitoring system which may be designed, built, and validated with
a balanced dataset that covers a clinically relevant breadth of
SpO2 values. Data used for validation was collected by
delivering controlled medical grade oxygen-nitrogen mixtures of
varied Fractional Inspired Oxygen (FiO2) levels to subjects while
they were monitored by both an example smartphone device as
described herein and a traditional pulse oximeter. This type of
study and dataset allowed for an assessment of techniques described
herein on low SpO2 examples (e.g., below 85%).
[0016] An implemented analysis on 6 subjects revealed that a
convolutional neural network was able to achieve an MAE<5.0% when
predicting a new subject's SpO2 level after it had been trained on
the labeled data of the 5 other subjects, with an average precision
of 76.0% at a recall of 93.2% when classifying a new subject's SpO2
as below 90%.
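The metrics named above can be illustrated with a minimal sketch. It assumes paired lists of predicted and reference SpO2 readings for one held-out subject; the function names and the sample values are hypothetical and are not taken from the implemented analysis.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute difference between predicted and reference SpO2 readings."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def precision_recall_below(predicted, reference, threshold=90.0):
    """Precision and recall for flagging readings below `threshold` percent SpO2."""
    flagged = [p < threshold for p in predicted]
    truly_low = [r < threshold for r in reference]
    true_pos = sum(f and t for f, t in zip(flagged, truly_low))
    false_pos = sum(f and not t for f, t in zip(flagged, truly_low))
    false_neg = sum(t and not f for f, t in zip(flagged, truly_low))
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical readings for one held-out subject (illustrative values only).
reference = [98.0, 95.0, 88.0, 84.0, 92.0, 79.0]
predicted = [96.0, 89.0, 86.0, 87.0, 93.0, 82.0]

mae = mean_absolute_error(predicted, reference)
precision, recall = precision_recall_below(predicted, reference)
```

In a leave-one-subject-out evaluation, these values would be computed once per held-out subject and then averaged across subjects.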
[0017] Certain details are set forth herein to provide an
understanding of described embodiments of technology. However,
other examples may be practiced without various of these particular
details. In some instances, well-known circuits, control signals,
timing protocols, and/or software operations have not been shown in
detail in order to avoid unnecessarily obscuring the described
embodiments. Other embodiments may be utilized, and other changes
may be made, without departing from the spirit or scope of the
subject matter presented here.
[0018] From the foregoing it will be appreciated that, although
specific embodiments have been described herein for purposes of
illustration, various modifications may be made while remaining
within the scope of the claimed technology.
[0019] Examples described herein may refer to various components as
"coupled" or signals as being "provided to" or "received from"
certain components. It is to be understood that in some examples
the components are directly coupled one to another, while in other
examples the components are coupled with intervening components
disposed between them. Similarly, signals may be provided directly
to and/or received directly from the recited components without
intervening components, but also may be provided to and/or received
from the recited components through intervening components.
[0020] FIG. 1 is a schematic illustration of a smartphone arranged
in accordance with examples described herein. The smartphone 102
includes wideband light source 104 and wideband imaging sensor 106.
The smartphone 102 includes display 118, circuitry for color
channel gain 112, processor(s) 108, and memory 110. The memory 110
includes executable instructions for setting color gain values 114
and executable instructions for classification 116, which may
include deep learning model 120. Additional, fewer, and/or
different components may be present in other examples. For example,
the smartphone 102 may include one or more communication
interface(s), networking interface(s), additional memory and/or
electronic storage, and/or additional software. The processor(s)
108 may execute instructions stored in memory 110 and/or in other
computer readable media accessible to the smartphone 102 and/or
processor(s) 108 to perform the setting of color gain values for
the smartphone 102 and/or classification of SpO2 levels in blood of
users.
[0021] Examples of systems described herein may accordingly include
smartphones. Smartphone 102 is shown in FIG. 1. Generally, a
smartphone may include any consumer electronic device in
communication with a wideband light source and/or wideband imaging
sensor as described herein and with one or more processors and/or
communication interfaces to conduct the classification described
herein to predict an SpO2 level of blood as described herein. A
smartphone may or may not have cellular phone capability, which
capability may be active or inactive. While smartphones are
described, examples of techniques described herein may be
implemented in some examples using other electronic devices such
as, but not limited to, tablets, laptops, computers, appliances, or
vehicles. Generally, any device having a light source, imaging
sensor, and processor(s) may be used.
[0022] Smartphones described herein may come in a variety of
models. A model of a smartphone may generally refer to a particular
set of hardware components (e.g., flash, camera, processor, etc.)
and/or software components (e.g., operating system) used to
implement the smartphone. The particulars of these hardware
components may vary across smartphone models. Smartphones may also
have a make, which may in some examples be included in the model.
Examples of models include iPhone 11, iPhone 10, Galaxy S20, Galaxy
Note 20, Google Nexus 6P, etc. Other models may also be used. The
make of a smartphone may refer to the brand of smartphone (e.g.,
Samsung, Apple, Nokia, Sony).
[0023] Smartphones described herein may include one or more
wideband light sources, such as wideband light source 104 of FIG.
1. For example, the wideband light source 104 may be implemented
using a flash of the smartphone 102. The wideband light source 104
may be implemented using, for example, one or more light emitting
diodes (LEDs). Generally, a wideband light source may emit energy
over multiple color wavelengths (e.g., red, green, and blue). These
may be referred to as color channels. The wideband light source 104
may be used, e.g., under the control of processor(s) 108 in some
examples, to illuminate a finger (including a portion of a finger).
The finger, such as a fingertip, may be placed in contact with the
wideband light source 104 for illumination in some examples.
[0024] Smartphones described herein may include one or more
wideband imaging sensors, such as wideband imaging sensor 106. The
wideband imaging sensor 106 may be implemented using a camera of
the smartphone 102. The wideband imaging sensor 106 may generally
refer to a sensor which may be sensitive to incident energy over
multiple color wavelengths (e.g., red, green, and blue), which may
also be referred to as color channels. The wideband imaging sensor
106 may be used, e.g., under the control of processor(s) 108 in
some examples, to capture pixel data. In some examples, pixel data
of an illuminated finger (e.g., a fingertip) may be captured by the
wideband imaging sensor 106.
[0025] The wideband light source 104 and wideband imaging sensor
106 may be positioned in a variety of locations in or on the
smartphone 102. In some examples, the wideband light source 104
and/or wideband imaging sensor 106 may not be integral to the
smartphone 102 but may be in electronic communication with the
smartphone 102. In some examples, the wideband light source 104
and/or wideband imaging sensor 106 may be integral to the
smartphone 102. In some examples, the wideband light source 104
and/or wideband imaging sensor 106 may be positioned on a front of
the smartphone 102, a back of the smartphone 102, and/or along an
edge of the smartphone 102. In some examples, the wideband light
source 104 and wideband imaging sensor 106 are positioned proximate
one another. For example, the wideband light source 104 and
wideband imaging sensor 106 may be positioned such that a finger of
a user may contact both the wideband light source 104 and wideband
imaging sensor 106.
[0026] In some examples, in addition to wideband light source 104
and wideband imaging sensor 106, or incorporated in those
components, an infrared (IR) source and sensor may be used. The IR
proximity sensor, for example, may provide an avenue to measure the
blood's IR absorption property simultaneously. Some example
smartphones may include an infrared based focus system that uses a
proximity sensor that houses an IR LED and an IR optical sensor. By
using this pair, which is placed next to the camera and flash, when
the finger covers the camera, flash, and IR proximity sensor, the
blood's absorption at R/G/B/IR can be measured simultaneously or in
an overlapping fashion. The IR channel may be used as another color
channel described herein, and a color gain value may be set for
that channel as well.
[0027] Smartphones described herein may include circuitry for color
channel gain 112. The circuitry for color channel gain 112 may
include any of a variety of hardware components which manipulate
data from the wideband imaging sensor 106 in particular color
channels. The circuitry for color channel gain 112 may be coupled
to wideband imaging sensor 106. The circuitry for color channel
gain 112 may include one or more filters, amplifiers,
analog-to-digital converters, and/or logic circuits. The circuitry
for color channel gain 112 may selectively operate on particular
color channels of data (e.g., red, green, blue). Per-channel color
gain values may be specified for each color channel. The circuitry
for color channel gain 112 may, for example, include an amplifier
associated with each color channel, and that amplifier may have a
particular gain value. In some examples, the color gain values may
be larger for the blue channel and the green channel than for the
red channel. In some examples, the circuitry for color channel gain
112 may apply gain which may vary across particular color
channels--e.g., a gain value may be specified for a red channel, a
gain value may be separately specified for a blue channel, and a
gain value may be separately specified for a green channel. The
gain value may be different for each channel in some examples,
while in some examples one or more channels may share a same gain
value. In some examples, the circuitry for color channel gain 112
may gain-adjust green and blue channels of the pixel data more than
a red channel of the pixel data. For example the circuitry for
color channel gain 112 may provide more gain-adjustment to the
green and blue channels than to the red channel. In one example,
the gain value may be 1x for the red channel, 3x for
the green channel, and 18x for the blue channel. Other values
may also be used. The gain values in some examples may be selected
based on the model of the smartphone. For example, particular gain
levels may be used based on the hardware and/or software components
included in a particular model of smartphone. The gain values in
some examples may be selected based on empirical study data.
Examples described herein generally provide for setting the
per-channel color gain values such that multiple color channels in
the pixel data are not clipped--e.g., variation in each of the
multiple color channels is available for use in classifying the
adjusted pixel data into a predicted SpO2 level.
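As an illustration of why per-channel gain values may matter, the following minimal sketch applies the example 1x, 3x, and 18x gains in software and checks each channel for clipping. It assumes 8-bit digitization and a hypothetical raw fingertip sample; the described examples apply gain in circuitry, so this only emulates the arithmetic.

```python
MAX_CODE = 255  # assumption: 8-bit digitization per color channel


def apply_channel_gains(sample, gains):
    """Scale one (R, G, B) sample by per-channel gains, saturating at MAX_CODE."""
    return tuple(min(MAX_CODE, round(value * gain)) for value, gain in zip(sample, gains))


def clipped_channels(sample):
    """Flag each channel that sits at either end of the digitization range."""
    return tuple(value <= 0 or value >= MAX_CODE for value in sample)


# Hypothetical raw fingertip sample: red dominates, blue is barely above the floor.
raw = (200, 40, 6)
example_gains = (1, 3, 18)  # red, green, blue gains from the example above

adjusted = apply_channel_gains(raw, example_gains)
uniform = apply_channel_gains(raw, (18, 18, 18))  # one shared gain saturates red and green
```

With these assumed values, the per-channel gains leave all three channels inside the digitization range, while a single shared gain large enough to lift the blue channel would clip red and green.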
[0028] In some examples, certain parameters of the circuitry for
color channel gain 112 may be set in hardware, software (e.g., in
accordance with executable instructions for setting color gain
values 114), or combinations thereof. The gain of each amplifier
may in some examples be set using a software interface. Generally,
in examples described herein, the circuitry for color channel gain
112 may adjust pixel data received from the wideband imaging sensor
106 using per-channel color gain values that may be selected to
maintain data in the multiple color channels within a digitization
threshold. The circuitry for color channel gain 112 may output
adjusted pixel data, which may be used by classification software
described herein.
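One possible way to select gain values that keep each channel within a digitization threshold is to rescale initial gain values using feedback from an initial capture. The sketch below is an assumption for illustration, not the described implementation: the function name, the mid-scale target, and the sample frames are hypothetical.

```python
def select_gains(initial_frames, initial_gains, full_scale=255.0, target_fraction=0.5):
    """Rescale each channel's gain so its mean level lands near the target.

    `initial_frames` holds per-frame channel means captured with `initial_gains`.
    A channel that was already saturated in the initial capture underestimates
    its true level, so in practice this step might be repeated.
    """
    frame_count = len(initial_frames)
    target_level = target_fraction * full_scale
    new_gains = []
    for channel, gain in enumerate(initial_gains):
        mean_level = sum(frame[channel] for frame in initial_frames) / frame_count
        if mean_level <= 0:
            new_gains.append(gain)  # no signal to scale from; keep the initial gain
        else:
            new_gains.append(gain * target_level / mean_level)
    return new_gains


# Hypothetical per-frame (R, G, B) means captured with unity gains.
initial_frames = [(200.0, 40.0, 6.0), (204.0, 44.0, 8.0), (196.0, 36.0, 4.0)]
selected = select_gains(initial_frames, (1.0, 1.0, 1.0))
```

Targeting mid-scale leaves headroom on both sides, so pulsatile variation in each channel is less likely to reach either end of the range.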
[0029] Smartphones described herein may include one or more
processors, such as processor(s) 108 of FIG. 1. Any number or kind
of processing circuitry may be used to implement processor(s) 108
such as, but not limited to, one or more central processing units
(CPUs), graphics processing units (GPUs), logic circuitry, field
programmable gate arrays (FPGAs), application specific integrated
circuits (ASICs), controllers, or microcontrollers. While certain
activities described herein may be described as performed by the
processor(s) 108 it is to be understood that in some examples, the
activities may wholly or partially be performed by one or more
other processor(s) which may be in communication with processor(s)
108. That is, the distribution of computing resources may be quite
flexible and the smartphone 102 may be in communication with one or
more other computing devices, continuously or intermittently, which
may perform some or all of the processing operations described
herein in some examples.
[0030] Smartphones described herein may include memory, such as
memory 110 of FIG. 1. While memory 110 is depicted as integral with
smartphone 102, in some examples, the memory 110 may be external to
smartphone 102 and may be in communication with processor(s) 108
and/or other processors in communication with processor(s) 108.
While a single memory 110 is shown in FIG. 1, generally any number
of memories may be present and/or used in examples described
herein. Examples of memory which may be used include read only
memory (ROM), random access memory (RAM), solid state drives,
and/or SD cards.
[0031] Smartphones described herein may operate in accordance with
software (e.g., executable instructions stored on one or more
computer readable media, such as memory, and executed by one or
more processors). Examples of software may include executable
instructions for setting color gain values 114 of FIG. 1. The
executable instructions for setting color gain values 114 may
provide instructions and/or settings for controlling the circuitry
for color channel gain 112 as described herein. For example, the
executable instructions for setting color gain values 114 may
provide one or more amplifier settings for use in adjusting pixel
data for particular color channels. These may be referred to as
per-channel color gain values.
[0032] Examples of software may include executable instructions for
classification 116 of FIG. 1. The executable instructions for
classification 116 may provide instructions for predicting an SpO2
level of blood using a machine learning model, such as deep
learning model 120 of FIG. 1. Examples described herein may
accordingly provide one or more machine learning models, such as
deep learning model 120 of FIG. 1. Generally, a machine learning
model may refer to a mathematical model which is able to classify
input data into a particular outcome. The mathematical model may in
some examples be represented as a set of weights and/or connections
between nodes in a multi-layered neural network. In some examples,
the machine learning model may have been trained on sample data. In
some examples, the deep learning model 120 may be trained using
data from one or more induced hypoxia studies. The deep learning
model 120 may be trained prior to or after being stored in memory
110. In some examples, training of deep learning model 120 may be
ongoing during use of smartphone 102. The executable instructions
for classification 116 may include instructions for predicting SpO2
levels below 85 percent in some examples, or below 80 percent in
other examples. Generally, the
executable instructions for classification 116 may predict SpO2
levels between 85 and 100 percent in some examples, between 80 and
100 percent in some examples, between 75 and 100 percent in some
examples, between 70 and 100 percent in some examples, or between
65 and 100 percent in some examples. Other ranges may also be used.
In some examples, because the deep learning model 120 has been
trained on data from one or more induced hypoxia studies, a
resulting prediction may be more accurate, particularly at lower
levels, such as below 85 percent. In some examples, the deep
learning model 120 may be particular to the smartphone and/or model
of smartphone used. For example, a deep learning model may be
trained based on the response and performance of a particular phone
and/or model of phone. The deep learning model 120 loaded on the smartphone 102
may be selected in accordance with the model of the smartphone 102
and/or the particular hardware present on the smartphone 102.
[0033] During operation, a finger (such as a fingertip) may be
positioned to receive illumination from the wideband light source
104 (e.g., may be placed in contact with the wideband light source
104). The finger may also be positioned to be imaged by the
wideband imaging sensor 106, such as by being placed in contact
with the wideband imaging sensor 106. The smartphone 102 may, in
accordance with the processor(s) 108 executing executable
instructions, illuminate the finger with the wideband light source
104, and capture pixel data with the wideband imaging sensor 106.
The pixel data from the wideband imaging sensor 106 may be provided
to the circuitry for color channel gain 112. The circuitry for
color channel gain 112 may gain-adjust the pixel data to provide
gain-adjusted pixel data. The manner in which the circuitry for
color channel gain 112 may gain-adjust the pixel data may be set,
e.g., using executable instructions for setting color gain values
114, in accordance with a model of the smartphone. The smartphone
102 may utilize the executable instructions for classification 116
including deep learning model 120 to predict an SpO2 level of blood
in the finger based on the gain-adjusted pixel data.
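The operation described above can be sketched as a simple data flow. This is a minimal sketch under stated assumptions: the phone model names and gain table entries are placeholders, the gain adjustment is emulated in software rather than performed by circuitry, and `placeholder_classifier` merely stands in for a trained deep learning model.

```python
# Hypothetical per-model gain table (red, green, blue); names and values are placeholders.
GAINS_BY_PHONE_MODEL = {
    "example-phone-a": (1.0, 3.0, 18.0),
    "example-phone-b": (1.0, 2.0, 12.0),
}


def gain_adjust(frames, gains, full_scale=255.0):
    """Apply per-channel gains to per-frame (R, G, B) values, saturating at full scale."""
    return [
        tuple(min(full_scale, value * gain) for value, gain in zip(frame, gains))
        for frame in frames
    ]


def predict_spo2(frames, phone_model, classifier):
    """Look up gains for the phone model, gain-adjust the frames, and classify them."""
    gains = GAINS_BY_PHONE_MODEL[phone_model]
    return classifier(gain_adjust(frames, gains))


def placeholder_classifier(adjusted_frames):
    """Stand-in for a trained deep learning model; returns a fixed value."""
    return 97.0


frames = [(200.0, 40.0, 6.0)]  # hypothetical per-frame channel means
adjusted = gain_adjust(frames, GAINS_BY_PHONE_MODEL["example-phone-a"])
prediction = predict_spo2(frames, "example-phone-a", placeholder_classifier)
```

Passing the classifier as an argument mirrors the point that the loaded model may itself be selected in accordance with the model of the smartphone.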
[0034] In some examples, the smartphone 102 may have hardware
and/or software to detect excessive motion of the finger and
provide feedback (e.g., audio, visual, and/or haptic feedback) to
the user to keep still, or keep the finger still and/or discard
high motion segments. For example, motion may be detected by the
processor(s) 108 analyzing pixel data captured by the wideband
imaging sensor 106 for anomalies consistent with a moving finger.
If a moving finger is detected, the display 118 may show a
reminder to remain still and/or reposition the finger, an audio
tone or instruction may be played by the smartphone 102, and/or
the smartphone 102 may vibrate.
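The specific anomaly criterion for motion is not detailed here. As one hypothetical illustration, frames whose mean red-channel level jumps sharply relative to the previous frame could be flagged as high motion; the function name, step threshold, and sample values below are assumptions.

```python
def high_motion_frames(red_means, max_step=10.0):
    """Return indices of frames whose red-channel mean jumps by more than
    `max_step` relative to the previous frame."""
    return [
        index
        for index in range(1, len(red_means))
        if abs(red_means[index] - red_means[index - 1]) > max_step
    ]


# Hypothetical per-frame red-channel means; the dip at index 3 mimics a lifted finger.
red_means = [200.0, 201.0, 199.0, 150.0, 198.0, 200.0]
flagged = high_motion_frames(red_means)
```

Flagged indices could be used either to prompt the user to keep still or to discard the affected segments before classification.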
[0035] The predicted SpO2 level may be used in a variety of ways.
The SpO2 level may be displayed, e.g., on display 118 of the
smartphone 102. The SpO2 level may be sent to another software
program operating on smartphone 102 and/or to another computing
device from the smartphone 102. SpO2 levels may be monitored using
the smartphone 102 at generally any frequency, including continuous
and/or semi-continuous monitoring. The SpO2 level may be used to
take actions to increase a user's blood oxygen level--for example,
a decision to seek further care, provide supplemental oxygen, or
take medication may be based on the predicted SpO2 level.
[0036] In this manner, examples of smartphones described herein
which may predict SpO2 levels may be used as a complete or partial
replacement for traditional pulse oximeters by regressing a
continuous SpO2 value in some examples.
[0037] Examples of smartphones described herein may be used as an
at-home screening tool to inform the need for a follow-up with a
physician by classifying regression results as below a particular
threshold. For example, the smartphone 102 may generate a predicted
SpO2 level and generate an alert (e.g., a visual, audio, and/or
tactile alert) when the predicted SpO2 level is below a threshold
(e.g., 90 percent in some examples).
[0038] Note that a smartphone camera and flash may be used to
generate data that may be used to predict an SpO2 level in
accordance with techniques described herein. In this manner, an
unmodified smartphone (e.g., a smartphone without special-purpose
attachments or peripherals) may be used to measure SpO2 levels.
[0039] FIG. 2 is a schematic illustration of a smartphone in use
during reflectance photoplethysmography (PPG). FIG. 2 depicts
smartphone 206, having camera 202 and flash 204. A finger 208 may
be placed in contact with camera 202 and flash 204. The components
shown in FIG. 2 are exemplary only. Additional, fewer, and/or
different components may be used in other examples. The smartphone
102 of FIG. 1 may be implemented by and/or used to implement the
smartphone 206 of FIG. 2. The wideband light source 104 of FIG. 1
may be implemented by and/or used to implement the flash 204 of
FIG. 2. The wideband imaging sensor 106 of FIG. 1 may be
implemented by and/or used to implement the camera 202 of FIG.
2.
[0040] The flash 204 may illuminate the finger 208, such as by
illuminating a fingertip or other portion of the finger 208. The
camera 202 may receive incident energy that is reflected and/or
otherwise received from the finger 208 responsive to the
illumination. The incident energy received at the camera 202 may be
provided as an output of the camera 202 as pixel data. Reflectance
PPG techniques may generally be used to obtain SpO2 levels from the
pixel data and/or other measurements of the incident energy.
[0041] In general, the principle behind reflectance
photoplethysmography in a human finger is to evaluate the
attenuation, or reduction in intensity, of light through multiple
layers of tissue and fluid inside the human finger, and record the
pulse-like waveform on a photo-detector. Light is absorbed and
reflected differently by different layers of blood, tissue, and
bone. The pulse-like waveform of recorded light is characterized by
two characteristics: (1) the trough, or DC component, which
represents the intensity of light reflected by the static
components of the finger, and (2) the trough to peak variance, or
AC component, which represents the intensity of light reflected by
the time-varying pulsatile blood components. The pulsatile blood
components are composed of hemoglobin in two forms, oxyhemoglobin
and deoxyhemoglobin, which differ in that oxyhemoglobin is
hemoglobin bound to oxygen molecules. Oxygen saturation is
determined as the ratio of the concentration of oxyhemoglobin to
the concentration of total hemoglobin in the blood, as defined in
Equation 1:
SpO_2 = \frac{\rho_{O2}}{\rho_{O2} + \rho_{Hb}}    Equation 1
[0042] where \rho_{O2} is the concentration of oxyhemoglobin and
\rho_{Hb} is the concentration of deoxyhemoglobin in the blood. SpO2
is typically reported as a percentage value.
[0043] In a healthy adult, it is expected that over 92% of the
hemoglobin in arterial blood is carrying oxygen at any given time,
though this threshold can vary with pre-existing conditions. To
compute this ratio non-invasively, light attenuation can be
measured as indicated by the Beer-Lambert Law in Equation 2, which
states that light of initial intensity I_0 diminishes exponentially
when traveling a distance d through a medium of concentration [C]
with an extinction coefficient \alpha at wavelength \lambda.
I_{measured} = I_0 e^{-\alpha [C] d}    Equation 2
[0044] Because oxyhemoglobin and deoxyhemoglobin have different
extinction coefficients, \alpha, at the red (660 nm) and infrared
(940 nm) wavelengths, the ratio of the variance in the pulsatile
signals at these two wavelengths correlates to oxygen saturation.
The DC components in this ratio are used to normalize for the effect
of the tissue and other static components on the light. The result
from one wavelength may be divided by the other to reveal the
absorption ratio in Equation 3:
SpO_2 = A - B \times \frac{AC_{RED}/DC_{RED}}{AC_{IR}/DC_{IR}}    Equation 3
[0045] where AC_{RED} and DC_{RED} refer to the AC and DC
components, respectively, of signal at the red wavelength; and
AC_{IR} and DC_{IR} refer to the AC and DC components,
respectively, of signal at the infrared (IR) wavelength. Equation 3
may be used by transmittance pulse oximeters to compute SpO2 after
calibrating for different sensor types with a linear fit, but there
may be challenges of applying it to reflectance
photoplethysmography using a smartphone.
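In some examples, the ratio-of-ratios computation of Equation 3 may be sketched as follows. This is a minimal illustration only: the calibration constants (a=110, b=25), the synthetic waveforms, and the function names are assumptions and do not come from this disclosure. The DC component is taken as the trough and the AC component as the trough-to-peak range, consistent with the description above.

```python
import math

def ac_dc(signal):
    """Return (AC, DC): DC is the trough, AC is the trough-to-peak range."""
    dc = min(signal)
    ac = max(signal) - dc
    return ac, dc

def spo2_ratio_of_ratios(red, ir, a=110.0, b=25.0):
    # a and b are hypothetical, device-specific calibration constants.
    ac_red, dc_red = ac_dc(red)
    ac_ir, dc_ir = ac_dc(ir)
    ratio = (ac_red / dc_red) / (ac_ir / dc_ir)
    return a - b * ratio

# Synthetic pulse-like waveforms sampled at 30 Hz (assumed values).
t = [i / 30.0 for i in range(90)]
red = [100.0 + 1.0 * math.sin(2 * math.pi * 1.2 * x) for x in t]
ir = [100.0 + 2.0 * math.sin(2 * math.pi * 1.2 * x) for x in t]
estimate = spo2_ratio_of_ratios(red, ir)
```

In a real device the constants would come from the linear calibration fit mentioned above, so the numeric output of this sketch carries no clinical meaning.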
[0046] While finger clip pulse oximeters can apply these principles
in analyzing the relative attenuation of light on dedicated
hardware built to produce and sense narrow-band wavelengths of
light at the red and infrared spectra, smartphone-based pulse
oximetry as described herein may analyze reflected light in only
the visible band. In some cases, this may be due to the use of an
infrared filter which may typically be included over the smartphone
camera in common smartphone hardware. The difference in the
extinction coefficients between oxy- and deoxy-hemoglobin in the
blue and green bands is not as differentiable as in the infrared
band. Also, the wideband light source (e.g., LED) and imaging
sensor (e.g., camera) found in smartphones produce and sense light
in the visible spectrum. The noisier and less-desirable signal may
be the trade-off exchanged for the improved ubiquity and
accessibility of a multi-purpose device.
[0047] In order to validate a new pulse oximeter system for
clinical safety, devices should be tested for accuracy in a study
where subjects are given medical grade oxygen-nitrogen mixtures in
different levels of varied Fractional Inspired Oxygen (FiO2). The
test subjects are expected to have a variety of skin tones.
Reflectance and ear-based devices should achieve root mean squared
accuracy of <3.5% while transmittance devices should achieve
<3% RMS.
[0048] Examples described herein utilize data from varied FiO2
experiments which allow for collection of samples in the 65% to 80%
SpO2 range. This is physiologically possible because the test
subject has time for their body to adjust to breathing in less
oxygen at each SpO2 level. At least in part due to this, the
subject's body is able to tolerate breathing in an oxygen-nitrogen
mixture near 70% for an extended period of time. In contrast,
breath-holding causes the subject's SpO2 to drop suddenly once the
subject uses up all the oxygen in the breath that he or she has been
holding. When the subject's SpO2 drops to 90%, or sometimes lower
depending on their health, the subject will no longer be physically
able to hold their breath because of light-headedness and
discomfort. This leads to relatively few
samples below 90% SpO2 in data collected from breath-holding
experiments.
[0049] FIG. 3 depicts visualizations of color data from a device
with standard color gain settings and using custom settings in
accordance with examples described herein. The left-hand graph,
graph 302, illustrates a visualization of color data from a device
(e.g., a smartphone, such as smartphone 102 of FIG. 1) using
standard color gain settings. The right-hand graph, graph 304,
illustrates a visualization of color data from a device (e.g., a
smartphone, such as smartphone 102 of FIG. 1) using custom hardware
gain settings, such as those specified by circuitry for color
channel gain 112 in accordance with executable instructions for
setting color gain values 114. The graphs illustrate samples (e.g.,
time) on the x-axis and pixel value (e.g., intensity) on the
y-axis. Values for red, green, and blue channels are shown in each
graph. The values shown in the graph refer to values of the pixel
data after adjustment by the circuitry for color channel gain, such
as circuitry for color channel gain 112, described herein.
Accordingly, adjusted pixel data is shown in FIG. 3.
[0050] In the left image, graph 302, the resolution on the green
channel is so low that the heartbeat cannot be seen/detected from
the green channel data. In the right image, graph 304, however, the
pulsation is visible in all three channels. That is, the gain
values have been selected such that pulsation is detectable in each
color channel. This generally illustrates how standard settings may
lead to poorer data quality, and examples described herein that set
particular per-channel gain values may improve an ability to
predict SpO2 from the adjusted pixel data.
[0051] Generally, a camera sensor, such as wideband imaging sensor
106 of FIG. 1, may be exposed based on three factors: exposure
time, sensor sensitivity, and aperture. For an RGB camera, all
three color channels have the same exposure time and aperture. Both
oxygenated and deoxygenated hemoglobin have a much higher
absorption coefficient in the blue and green wavelengths than for
the red wavelengths by about two orders of magnitude. Thus, it may
not be possible to measure all three wavelengths simultaneously
under the same exposure. If the hardware sensor's sensitivity to a
particular color is too high or too low, pixel values for that
color may `clip` by recording the minimum or maximum value of 0 or
255. Because phones use an 8-bit precision scheme for storing pixel
data, if the gain is too low for a certain color, the pixels may
all be rounded to 0 and small changes in that color will be lost
and/or be undetectable. In applications described herein, red may
be by far the most dominant color, and with the use of white
balance presets for incandescent light, the tones between blue and
green may be amplified. So, for example, the circuitry for color
channel gain 112 and/or executable instructions for setting color
gain values 114 may utilize white balance presets for the wideband
imaging sensor 106 to adjust a gain of color channels.
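The effect of gain on 8-bit storage may be illustrated with a short sketch. The raw intensities and the excessive gain of 2000 below are assumed values chosen only to show rounding to 0 and clipping at 255; the gain of 18 mirrors the blue-channel example discussed elsewhere herein.

```python
def quantize_8bit(intensity, gain):
    """Apply a channel gain and store the result as an 8-bit pixel value."""
    value = round(intensity * gain)
    return max(0, min(255, value))

# Assumed raw sensor intensities (arbitrary units) for a pulsing blue channel.
blue_raw = [0.2, 0.3, 0.4, 0.3, 0.2]

low_gain = [quantize_8bit(v, 1) for v in blue_raw]      # every value rounds to 0
good_gain = [quantize_8bit(v, 18) for v in blue_raw]    # pulsation is preserved
high_gain = [quantize_8bit(v, 2000) for v in blue_raw]  # every value clips at 255
```

Only the middle setting leaves a varying signal in the stored pixel values, which is the motivation for selecting per-channel gains rather than a single shared exposure.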
[0052] Examples of smartphones may include software which allows
for independent control of each color channel's exposure through
independent amplifier gain settings (e.g., executable instructions
for setting color gain values 114). By having control of
independent amplifier gain settings, the exposure settings may be
balanced to amplify the blue and green channels more significantly.
Different operating systems may allow for a different granularity
in the gain control settings. For example, the Android Camera2 API
provides access to manual setting of sensitivity, exposure, and
individual color gains.
[0053] In this manner, rather than relying on a smartphone's
auto-balancing feature for the camera and/or allowing the phone to
auto-balance itself, exposure parameters may be controlled for SpO2
measurement--e.g., using per-channel color gain values and/or
exposure time and aperture selected for SpO2 measurement.
[0054] FIG. 4 is a schematic illustration of data processing
arranged in accordance with examples described herein. FIG. 4
includes smartphone 402 which may be used to illuminate and record
pixel data from finger 404. FIG. 4 illustrates how pixel data may
be generated and processed prior to classification. Illumination
and recording may provide pixel data 406. The pixel data 406 may be
adjusted using color gain values 408. The adjusted pixel values may
be provided to pre-processor 410. The pre-processor 410 may perform
a variety of pre-processing operations to generate PPG signals 412.
The components and techniques described with reference to FIG. 4
are exemplary, and additional, fewer, and/or different
pre-processing manipulations may be performed in other examples.
[0055] The operations discussed with reference to FIG. 4 may be
performed by smartphones described herein, such as by smartphone
102 of FIG. 1. For example, the wideband light source 104, wideband
imaging sensor 106, processor(s) 108, circuitry for color channel
gain 112, and/or software executing on smartphone 102 may be used
to implement the pre-processing described and depicted with
reference to FIG. 4.
[0056] By illuminating finger 404 and imaging the finger 404
responsive to illumination, pixel data 406 may be obtained by the
imaging sensor (e.g., wideband imaging sensor 106 of FIG. 1). The
pixel data may include a set of image frames (e.g., set of pixel
data). Any of a variety of frame rates may be used. In some
examples, the pixel data may be captured as a video.
[0057] The pixel data may be adjusted by color gain values 408. The
color gain values 408 may be implemented by hardware of the
smartphone 402, such as by circuitry for color channel gain 112 of
FIG. 1. The color gain values may be set per-channel. In the
example of FIG. 4, the color channel gain for the red channel is
shown as 1--this may refer to the pixel data in the red channel
being unmodified by color gain values 408. The color gain value for
the green channel is shown as 3. This may refer to the pixel values
associated with the green color channel being multiplied by a
factor of 3. The color gain value for the blue channel is shown as
18. This may refer to the pixel values associated with the blue
color channel being multiplied by a factor of 18. Other gain values
may be used in other examples. Gains for the R, G, and B channels
may be empirically determined.
[0058] In some examples, gains for the R, G, and B channels may be
automatically and/or programmatically determined. For example,
initial values may be selected, and an output may be examined based
on certain metrics--such as a difference between the minimum and
maximum value of the signal and an indication of whether the signal
has clipped (e.g., hit a highest possible or lowest possible value,
such as 0 or 255 in the example using 256 pixel values to encode
the data). Gain values may be selected in this manner based on
feedback from PPG signals modified by initial gain values. The
feedback may be used to modify and/or select gain values, such as
by maximizing the minimum-to-maximum signal value for each channel
and/or eliminating or reducing clipping. The gains may be selected
to avoid and/or reduce clipping or biasing towards one channel.
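One possible form of such feedback-based gain selection is sketched below. The candidate gain list, the simulated capture function, and the raw intensities are assumptions for illustration; an actual device would capture frames from the camera at each candidate gain rather than simulate them.

```python
def capture(raw, gain):
    """Simulate an 8-bit capture of raw channel intensities at a given gain."""
    return [max(0, min(255, round(v * gain))) for v in raw]

def is_clipped(pixels):
    """True if the signal touches the lowest or highest storable value."""
    return min(pixels) == 0 or max(pixels) == 255

def select_gain(raw, candidate_gains):
    """Pick the gain with the largest min-to-max range that does not clip."""
    best_gain, best_range = None, -1
    for gain in candidate_gains:
        pixels = capture(raw, gain)
        if is_clipped(pixels):
            continue
        signal_range = max(pixels) - min(pixels)
        if signal_range > best_range:
            best_gain, best_range = gain, signal_range
    return best_gain

raw_green = [20.0, 22.0, 24.0, 22.0, 20.0]  # assumed raw intensities
chosen = select_gain(raw_green, [1, 2, 4, 8, 16])
```

Here the gain of 16 is rejected because it saturates the channel, so the largest non-clipping candidate is chosen. The function returns None when every candidate clips.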
[0059] The adjusted pixel data may receive a variety of
pre-processing, such as averaging or smoothing. In the example of
FIG. 4, this is shown as being implemented by pre-processor 410.
The pre-processor 410 may be implemented using a smartphone, such
as smartphone 102 of FIG. 1 or smartphone 402 of FIG. 4. For
example, the smartphone 102 may include hardware and/or software
for performing the pre-processing described herein. The
pre-processing may occur per-color channel. In some examples, an
average pixel value for each color channel may be calculated for
each frame. In the example of FIG. 4, a data point may be generated
for each frame. The data point may in some examples include three
values--one for each color channel. This manipulation may generate
PPG signals 412. If n frames are taken, the data representing PPG
signals may be a 3×n matrix--with 3 values for each frame
(one for each color channel). Each frame is represented as an
average red channel value, an average green channel value, and an
average blue channel value. In some examples, only selected pixels
of the frame may be used in calculating the average. In some
examples, a weighted average or other combination may be used to
generate PPG signals.
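The per-frame averaging described above may be sketched as follows. The frame representation (nested lists of (r, g, b) tuples) and the pixel values are assumptions for illustration; the sketch uses every pixel in each frame and an unweighted mean.

```python
def frame_channel_means(frame):
    """frame: list of rows, each row a list of (r, g, b) pixel tuples."""
    pixels = [px for row in frame for px in row]
    count = len(pixels)
    return tuple(sum(px[c] for px in pixels) / count for c in range(3))

def ppg_signals(frames):
    """Return a 3 x n matrix: one row per color channel, one column per frame."""
    means = [frame_channel_means(f) for f in frames]
    return [[m[c] for m in means] for c in range(3)]

# Two tiny 2x2 frames with assumed pixel values.
frames = [
    [[(200, 40, 10), (202, 42, 12)], [(198, 38, 8), (200, 40, 10)]],
    [[(210, 50, 20), (210, 50, 20)], [(210, 50, 20), (210, 50, 20)]],
]
signals = ppg_signals(frames)
```

With n frames the result has three rows of length n, matching the 3×n matrix described above.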
[0060] In some examples, the PPG signals, and/or metrics based on
the PPG signals may be used as feedback to set, change, and/or
adjust gain values. For example, the gain values may in some
examples be calibrated to generate color gain values which may
achieve usable results over a range of lighting conditions. In some
examples, an average level of each color channel (e.g., R, G, and
B) of the PPG signals 412 may be calculated and used as feedback to
adjust the color gain values. For example, the executable
instructions for setting color gain values 114 of FIG. 1 may
include instructions for setting and/or adjusting color gain values
based on feedback, such as an average value of one or more channels
in the PPG signals. In some examples, initial color gain values may
be those selected by the smartphone in accordance with an
auto-balance procedure. The color gain values may be adjusted based
on feedback from the output of the auto-balance procedure to attain
predetermined goals for the channel values in the PPG signals.
Generally, the executable instructions or calibration process may
aim to adjust and/or set the color gain values such that the pixel
values in each channel are not clipping or saturating and occur
within the same range of the color spectrum (e.g., +/-30%). Other
tolerances may be used in other examples.
[0061] Examples described herein may refer to classification based
on adjusted pixel data. It is to be understood that classification
based on adjusted pixel data may utilize adjusted pixel data as an
input to a classification technique and/or may utilize data which
has been pre-processed in some way, such as PPG signals 412.
[0062] FIG. 5 is a schematic illustration of classification
techniques arranged in accordance with examples described herein.
Classification techniques may be used to predict SpO2 levels based
on adjusted pixel data as described herein (e.g., adjusted pixel
data and/or PPG data). Classification may be performed, for example,
using smartphone 102 of FIG. 1. The smartphone 102 of FIG. 1 may
perform classification in accordance with executable instructions
for classification 116, including deep learning model 120. FIG. 5
illustrates two examples of classification techniques--logistic
regression 502 and convolutional neural network 504.
[0063] Logistic regression 502 may generally refer to a statistical
model that uses a logistic function to model a dependent variable.
The logistic regression 502 may not use training data that has been
normalized across each color channel. Some examples may use the
standard deviation of the data (e.g., adjusted pixel data or data
based on adjusted pixel data, such as PPG signals) to calculate an
AC component of a signal for use in SpO2 classification, but that
may not be used in other examples. Logistic regression 502 may be
implemented using the scikit-learn library.
[0064] Logistic regression 502 may receive as input 3-channel RGB
data computed from multiple frames of pixel data. In the example,
of FIG. 5, 30 samples of data are mentioned, representing 1 second
each. Other frames and time periods may be used in other examples.
The logistic regression 502 may be configured to output a predicted
SpO2 level, such as between 60-100 in the example of FIG. 5,
although other ranges may be used. The logistic regression 502 may
be configured to arrive at the SpO2 level by minimizing L2 loss,
and may apply an L2 regularization term to the weights with a
power, such as \lambda = 0.001 in the example of FIG. 5, which is
shown in Equation 4. In Equation 4, F(X) and f(x_i) represent
the output of the model on a batch or single sample, Y and y_i
represent the ground truth for a batch or single sample, \theta is
the parameters or the weights of the model, and n is the size of
the batch.
Loss(F(X), Y; \theta) = \sum_{i=0}^{n} (f(x_i) - y_i)^2 + \frac{1}{\lambda} \sum_{j} \theta_j^2    Equation 4
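As a minimal sketch, the loss of Equation 4 may be evaluated as shown below. The sketch follows the equation as printed, in which the regularization sum is scaled by 1/\lambda; the predictions, ground truth values, and weights are assumed numbers for illustration only.

```python
def l2_loss_with_regularization(predictions, targets, weights, lam=0.001):
    """Sum of squared errors plus (1 / lam) times the sum of squared weights."""
    data_term = sum((p - y) ** 2 for p, y in zip(predictions, targets))
    reg_term = (1.0 / lam) * sum(w ** 2 for w in weights)
    return data_term + reg_term

# Assumed batch of two predictions with ground truth and two model weights.
loss = l2_loss_with_regularization(
    predictions=[95.0, 90.0],
    targets=[96.0, 88.0],
    weights=[0.01, -0.02],
)
```

For this batch the data term is 1 + 4 = 5 and the regularization term is 1000 × 0.0005 = 0.5, for a total of 5.5.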
[0065] In some examples, convolutional neural network 504 may be
used. The convolutional neural network 504 may be used instead of
logistic regression 502 in some examples. The convolutional neural
network 504 may receive adjusted pixel data and/or data based on
adjusted pixel data (e.g., PPG signals) as input and provide a
predicted SpO2 level as output. The convolutional neural network
504 of FIG. 5 is depicted as receiving 3 features, e.g., three
channels (e.g., an R value, G value, and B value) for each of 270
samples, representing 9 seconds of video at 30 frames per second.
Other frame rates, sample sizes, and features may be used in other
examples. The convolutional neural network 504 may apply a
convolution kernel to the input data to produce a predicted SpO2
level as output, such as in a range between 60-100. In another
example, three input channels (e.g., R, G, B) may be used for each
of 90 frames, which may be taken, for example from 3 seconds of
video data taken at 30 frames per second. Other numbers of frames
or frame rates may be used in other examples.
[0066] The convolutional neural network 504 may be trained on
ground truth training data in some examples. Compared to NN-based
image recognition tasks, the 1-D, 3-channel RGB data used as input
to convolutional neural network 504 may be considered to have low
dimensionality. Therefore, a neural network solution (e.g.,
convolutional neural network 504) may be used with fewer
parameters, so as to improve the likelihood of the model's ability
to generalize. The convolutional neural network 504 may have a
single convolutional layer with a number of output channels (e.g.,
10) followed by a dense layer. For the first convolution, the RGB
channel components of the input signals may be treated as a second
dimension and kernel sizes of 3.times.3 may be used with no
padding. Training and validation data sets may be normalized and
standardized based on a weighted channel-wise mean and standard
deviation of the training dataset, where the weights may be scaled
by the length of each subject's data collection. So, if subject 1 was
recorded for 12 minutes and subject 2 was recorded for 10 minutes,
both subjects would be equally weighted in the training set mean
and standard deviation calculation. The model may be trained using
the Adam optimizer with a learning rate of 0.01 and an L2
regularization of strength 0.1, although other training techniques
may be used. Mean Absolute Error (MAE) may be optimized as a loss
function, although other optimization criteria may be used. The
convolutional neural network 504 may be built and trained using the
PyTorch library. The channel size and size of input window may be
chosen through a hyperparameter grid search, although other
selection criteria may be used.
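The small size of such a network may be seen from a shape calculation. The sketch below assumes a single input channel (with the RGB components forming the second spatial dimension), a stride of 1, bias terms, and a dense layer with a single output; these details are assumptions for illustration rather than a definitive statement of the architecture.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Standard output-length formula for a convolution along one axis."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

def single_conv_model_shape(n_frames, out_channels=10, kernel=3):
    height = conv_output_size(3, kernel)         # RGB axis: 3 -> 1
    width = conv_output_size(n_frames, kernel)   # time axis: n -> n - 2
    features = out_channels * height * width     # flattened input to dense layer
    conv_params = out_channels * kernel * kernel + out_channels  # weights + biases
    dense_params = features + 1                  # one output weight per feature + bias
    return {
        "conv_output": (out_channels, height, width),
        "features": features,
        "total_params": conv_params + dense_params,
    }

shape_3s = single_conv_model_shape(90)    # 3 seconds at 30 frames per second
shape_9s = single_conv_model_shape(270)   # 9 seconds at 30 frames per second
```

Under these assumptions a 90-frame input yields 880 flattened features and fewer than 1,000 trainable parameters, which is consistent with the goal of using few parameters to help generalization.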
[0067] Other implementations of a convolutional neural network 504
may also be used. In some examples, the convolutional neural
network 504 may be implemented using a deep learning model having 2
convolutional layers and 1 linear layer (e.g., combinations of
computations) operating on the input data (e.g., 3 seconds of RGB
video data, representing 90 frames for 3 seconds at 30 frames per
second).
[0068] An output of the convolutional neural network 504 may be a
predicted SpO2 level of an individual, which may be evaluated using
a mean absolute error (MAE). The MAE may be computed against ground
truth data, such as standalone pulse oximeter readings.
[0069] FIG. 6 is a schematic illustration of the training and
operation of a classifier in accordance with examples described
herein. The classifier 616 may be a trained classifier, and may be
implemented using and/or may be used to implement the executable
instructions for classification 116 of FIG. 1, together with the
processor(s) 108 of FIG. 1 in some examples. Training of the
classifier 616 may be performed generally by any computing system.
The training may occur prior to use of a smartphone to classify
SpO2 levels in some examples. The training may occur to provide a
deep learning model, such as deep learning model 120 of FIG. 1,
which may be a trained model. In some examples, some or all of the
training may be provided by the smartphone itself, such as by
smartphone 102. The trained model may then be used to classify user
data to predict an SpO2 level associated with the data.
[0070] FIG. 6 includes PPG signals 602, some or all of which may be
used as training data 606. The training data 606 may, for example,
be data from subjects having known SpO2 levels (e.g., from an
induced hypoxia study also referred to as an FiO2 study). In the
example, there may be five sets of training data, each of which may
be from a different subject and/or hand of a subject. Each set of
training data may, for example, represent a video of PPG signals
where the SpO2 level varies over a range (e.g., 65-100). The
training data 606 may be subject to sampling 608 and normalization
610 before being used to train convolutional neural network 612.
For example, each set of training data 606 may be sampled in that
each set may include data representative of multiple SpO2 levels
over time (e.g., from an induced hypoxemia study). Accordingly, a
segment of the training data may be sampled which may generally
correspond to a single SpO2 level (e.g., 9 seconds of data, or some
other amount of time). The convolutional neural network 612 may be
implemented using, for example, the convolutional neural network
504 of FIG. 5.
[0071] The training data 606 may also be provided to statistic
calculator 614 to calculate statistics based on the training data
606, such as weighted mean and standard deviation. For example,
statistics may be calculated for each color channel in the training
data, such as red, green, and blue color values. In this manner,
some statistics about the training data 606 may be generated. The
training actions, such as sampling 608, normalization 610,
convolutional neural network 612, and/or statistic calculator 614
may be performed by one or more processor(s) which may access a
computer readable media and execute instructions for performing the
same. The training process may result in a trained classifier,
e.g., classifier 616. The training process may ensure, for example,
that the convolutional neural network 612 may be iteratively
updated such that predictions from the neural network model
correspond with ground truth SpO2 levels recorded for the training
data. Accordingly, weights for the convolutional neural network 612
may be calculated during the training process. Those weights are
then shown implemented as classifier 616.
[0072] In some examples, such as the "implemented examples"
described below, performance of the trained classifier 616 may be
evaluated by providing another set of PPG signals 602, not used
during the training, as an input to the trained classifier 616. For
example, a set of PPG signals 602 not used during the training
process may be provided as user data 604. The user data may be
sampled 618, and normalized and/or standardized 620 and provided as
normalized, standardized inputs to the classifier 616. In some
examples, the statistics calculated based on the training data may
be used to normalize and/or standardize the new inputs in 620. An
SpO2 level predicted by the trained classifier 616 may be compared
to any available known ground truth SpO2 data associated with the
user data 604 to evaluate the performance of the trained classifier
616, as reported in the "implemented examples" section.
[0073] During operation, new user data 604 may be obtained (e.g.,
by illuminating and imaging a user's finger). The data may not have
ground truth SpO2 data associated with it--it may be new subject
data for which an SpO2 prediction is desired. The data may be
subject to sampling 618 and normalization 620. The normalization
620 may occur with reference to the statistics calculated based on
the training data, e.g., in statistic calculator 614. The
normalized user data may be provided to classifier 616, which may
output an SpO2 prediction 622.
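Normalization of new user data with statistics computed from the training data may be sketched as follows for a single color channel. The equal-per-subject weighting is one reading of the weighted mean and standard deviation described herein, and the sample values and function names are assumptions for illustration.

```python
import math

def weighted_channel_stats(subjects):
    """subjects: one list of channel values per training subject.
    Each subject contributes equally, regardless of recording length."""
    k = len(subjects)
    mean = sum(sum(s) / len(s) for s in subjects) / k
    var = sum(sum((v - mean) ** 2 for v in s) / len(s) for s in subjects) / k
    return mean, math.sqrt(var)

def standardize(values, mean, std):
    """Normalize new data using statistics from the training set only."""
    return [(v - mean) / std for v in values]

# Assumed red-channel values: subject 1 has a longer recording than subject 2.
train_red = [[100.0, 102.0, 104.0, 102.0], [110.0, 114.0]]
mean, std = weighted_channel_stats(train_red)
user_red = standardize([107.0, 112.0], mean, std)
```

Because the statistics are fixed after training, the same mean and standard deviation are applied to every new recording, so no ground truth data is needed at prediction time.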
IMPLEMENTED EXAMPLES
[0074] Varied FiO2 Study.
[0075] A varied FiO2 study was performed using the varied
fractional inspired oxygen protocol administered by a clinical
validation laboratory, Clinimark, a group that performs validation
services for medical devices. This experiment was approved by the
Internal Review Board. Six subjects were administered controlled
fractional mixtures of medical grade oxygen-nitrogen in a
controlled hospital setting. The subjects rested comfortably in a
reclined position while the gas mixture was given to induce
hypoxemia in a stair-stepped manner. During this time, the
subjects' fingers were instrumented with transmittance pulse
oximeter clips and placed on two smartphone devices, with one
smartphone device on the index finger of each hand. Ground truth
data was recorded using multiple purpose-built pulse oximeters,
including a tight-tolerance transfer standard pulse oximeter, the
Masimo Radical-7. Subject characteristics and data statistics can
be seen in Table 1.
TABLE 1 Subject breakdown for the FiO2 study.

                       SpO2 (%)             Duration
Subject      Mean    Median  Min     Max     (sec.)    Skin Tone  Sex     Age
Subject 1    87.15   91      65      100     1090      White      Male    31
Subject 2    88.66   89      73      99      1121      Black      Male    34
Subject 3    86.71   90      66      100     1066      White      Female  23
Subject 4    90.29   90      78      100     1015      White      Male    20
Subject 5    85.80   87      66      99      926       White      Female  24
Subject 6    83.86   85      61      99      833       White      Female  23
Mean/Range   87.08   88.67   68.17   99.50   1008.50   1B/5W      3F/3M   20-34

Ground truth data statistics (in SpO2 %) for each subject. The
average difference between mean and median for each subject is
1.58, showing little skew. The average length of each run is about
16 minutes.
[0076] Noteworthy observations were also recorded, including the
observation that one subject, Subject 1 in the analysis, had
particularly callused hands, which might have interfered with
examples described herein.
[0077] Smartphone Device Configuration and Setup. Smartphone data
was collected with a Google Nexus 6P, recording video at 30 frames
per second. The device was specifically configured so that hardware
camera settings did not change throughout the entire study. This
was done by locking auto-balancing and enhancing color gain, a
unique step in this system. The color gains were set to 1×
for the red channel, 3× for the green channel, and 18×
for the blue channel. These values were chosen based on an
empirical study with 20 healthy individuals, in which gain values
were manually analyzed to avoid data loss due to compression and
obtain optimal signal quality. During the Varied FiO2 study,
because the device could overheat from recording continuous video
for too long, clay ice packs were placed around the device to keep
its temperature down. The ice packs were placed strategically to
avoid contact with the hand.
[0078] Manual Hardware Sensitivity, Exposure, and White Balance
Settings. To ensure that the blue and green signals are not lost, a
fixed color gain was empirically determined and assigned, ensuring
that a usable signal is recorded by the camera for all 3 color
channels. The empirically determined gains were 1, 3, 18 for R, G,
and B in this example. After setting the color channel gains, the use of
1.2 ms for exposure time and a sensor sensitivity of 300 ISO was
also determined empirically to perform well in evenly exposing R,
G, and B color channel PPG signals at the middle of the 0-255 value
range.
[0079] As shown in FIG. 3, the left side shows an example of
directly measuring the PPG signal with standard auto-balancing
algorithm. It can be seen that the red PPG clips at the top of the
range while the green and blue channels are close to 0. In
comparison, using our custom hardware gain settings controlled
through the Android Camera2 API, all three color channel PPGs are
well represented in the 8-bit range on the right side graph (graph
304) of FIG. 3.
[0080] Data Preprocessing. For each hand on each subject, an
ordered list of a number, n, of RGB image frames was obtained,
each with 176×144 pixels. To obtain a PPG signal, we take the
mean pixel value for each color channel and obtain a 3×n
shaped matrix of values. Because humans are asymmetrical, different
internal arm and body structures can lead to differences in blood
flow to the right and left arms. These differences were seen in the
data collected. Therefore, each hand-subject pair was treated as a
unique subject. However, predictions were visualized for the right
and left arm adjacently for ease of comparison. Finally, the data
was divided into one sample per 1-second (30-frame) step, with each
sample combining the 3 seconds (90 frames) of RGB data centered on
one ground truth SpO2 reading. This provides over 5000 training
examples (5 subjects) to the models, with about 1000 samples
(1 subject) held out for the test set for each round of
leave-one-subject-out cross validation (LOOCV).
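The preprocessing steps above can be sketched in Python as follows.
This is a minimal sketch, not the implementation used in the study:
the function names are illustrative, and the frame index of each
ground truth reading is assumed to be known from time alignment.

```python
import numpy as np

FPS = 30          # frames per second, from the text
WINDOW = 3 * FPS  # 90 frames of RGB data per sample

def frames_to_ppg(frames):
    """Average each color channel over all pixels: (n, H, W, 3) -> (3, n)."""
    frames = np.asarray(frames, dtype=float)
    return frames.mean(axis=(1, 2)).T

def make_samples(ppg, spo2_frame_idx):
    """Cut one 90-frame window centered on each ground truth reading.

    `spo2_frame_idx` holds the (assumed known) frame index of each reading.
    Readings too close to either end of the recording are skipped.
    Returns the stacked windows and the indices of the readings kept.
    """
    half = WINDOW // 2
    n = ppg.shape[1]
    windows, kept = [], []
    for i, center in enumerate(spo2_frame_idx):
        start, stop = center - half, center + half
        if start >= 0 and stop <= n:
            windows.append(ppg[:, start:stop])
            kept.append(i)
    if not windows:
        return np.empty((0, 3, WINDOW)), np.array(kept, dtype=int)
    return np.stack(windows), np.array(kept)
```

With one reading per second, consecutive windows overlap by 2
seconds, which is how a short recording can yield many samples.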
[0081] Pulse Oximetry Validation. A regression analysis was
performed to compare smartphone measurements taken in accordance
with examples described herein (e.g., with reference to FIG. 1-FIG.
6) to a purpose-built pulse oximeter with error and Bland-Altman
metrics. In the performance assessment, models were evaluated using
Leave-One-Subject-Out cross validation (LOOCV). Specifically,
training and testing were performed on six validation splits, with
a different subject (both hands) held out for validation in each
split. The ground truth distributions of the splits were visually
examined to ensure there was no heavy imbalance in the dataset. The
performance of algorithms was compared using Mean Absolute
Error.
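A minimal Python sketch of the split logic and error metric is shown
below. The function names are illustrative; the sketch assumes both
hands of a person share one subject ID so that they are held out
together, as described above.

```python
import numpy as np

def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx), one split per subject."""
    subject_ids = np.asarray(subject_ids)
    for held_out in np.unique(subject_ids):
        test = subject_ids == held_out
        yield held_out, np.flatnonzero(~test), np.flatnonzero(test)

def mean_absolute_error(y_true, y_pred):
    """Average magnitude of the prediction error, in SpO2 percentage points."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true)))
```

With six subjects this yields six splits, each training on five
subjects and testing on the remaining one.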
[0082] Hypoxemia Screening Tool. A classification analysis was
performed by thresholding the ground truth recordings at 3
different SpO2 levels (95%, 90%, and 85%) and comparing them to the
thresholded regression result. We examine the true positive and
false positive rates at different screening decision boundaries to
illustrate the potential performance for triaging depending on the
needs of a use case. To interrogate the potential to adjust this
decision boundary to bias towards recall vs. precision, we vary the
decision boundary across the range of 80%-100% and plot ROC curves
for each subject using LOOCV.
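The decision boundary sweep can be sketched in Python as follows.
This is an illustrative sketch only: the function name and the
0.5-point step between boundaries are assumptions, and a sample is
treated as flagged when the regression output falls below the
boundary.

```python
import numpy as np

def roc_points(spo2_true, spo2_pred, level=90.0,
               boundaries=np.arange(80.0, 100.5, 0.5)):
    """Return (boundaries, tpr, fpr) for screening 'SpO2 below `level`'.

    A sample is positive when the ground truth is below `level`; it is
    flagged when the regression output is below the decision boundary.
    The 0.5-point step in `boundaries` is an assumption.
    """
    spo2_true = np.asarray(spo2_true, dtype=float)
    spo2_pred = np.asarray(spo2_pred, dtype=float)
    positive = spo2_true < level
    tpr, fpr = [], []
    for b in boundaries:
        flagged = spo2_pred < b
        tp = np.sum(flagged & positive)
        fp = np.sum(flagged & ~positive)
        tpr.append(tp / max(positive.sum(), 1))      # true positive rate
        fpr.append(fp / max((~positive).sum(), 1))   # false positive rate
    return np.asarray(boundaries), np.array(tpr), np.array(fpr)
```

Plotting `tpr` against `fpr` for each held-out subject gives one ROC
curve per LOOCV split.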
[0083] SpO2 Prediction via Pulse Oximetry
[0084] The logistic regression model achieved a mean absolute error
of 5.40% with L2 regularization. During development, the
introduction of L2 regularization strongly improved the regressor's
performance. The 1-layer convolutional neural network model
produced a mean absolute error (MAE) of 4.13% when analyzed via
LOOCV against all the data that was gathered in the varied FiO2
study. While this performance is slightly better than that of the
logistic regression model, the convolutional network has similar
behavior in predictions across different SpO2 values to the
logistic regression. The model performed best on Subject 2,
achieving a mean absolute error of 2.89%, with a mean difference of
-2.34% and a standard deviation of differences of 4.65%. For this
subject, in the SpO2 ranges of 65%-80%, 81%-90%, and 91%-100%, the
model achieved mean differences of -0.47%, -3.07%, and -2.53%, and
standard deviations of differences of 2.34%, 3.57%, and 5.25%.
Excluding Subject 1, the model performed worst on Subject 5,
achieving a mean absolute error of 4.29%, with a mean difference of
-0.68% and a standard deviation of differences of 10.11%. For this
subject, in the SpO2 ranges of 65%-80%, 81%-90%, and 91%-100%, the
model achieved mean differences of 3.75%, -4.31%, and -1.06%, and
standard deviations of differences of 9.97%, 7.0%, and 6.34%.
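The per-range difference statistics reported here can be computed as
in the following minimal Python sketch. It assumes the difference is
prediction minus ground truth (consistent with positive low-range
differences being described as over-prediction in this document),
that ground truth readings are whole percentages, and that the
population standard deviation is used. The function names are
illustrative.

```python
import numpy as np

RANGES = ((65, 80), (81, 90), (91, 100))  # ground truth SpO2 ranges from the text

def difference_stats(spo2_true, spo2_pred):
    """Mean and standard deviation of (prediction - ground truth)."""
    d = np.asarray(spo2_pred, dtype=float) - np.asarray(spo2_true, dtype=float)
    return float(d.mean()), float(d.std())  # population std is an assumption

def stats_by_range(spo2_true, spo2_pred, ranges=RANGES):
    """Apply difference_stats within each inclusive ground truth range."""
    t = np.asarray(spo2_true, dtype=float)
    p = np.asarray(spo2_pred, dtype=float)
    out = {}
    for lo, hi in ranges:
        mask = (t >= lo) & (t <= hi)
        # Ranges with no samples are reported as None rather than NaN.
        out[(lo, hi)] = difference_stats(t[mask], p[mask]) if mask.any() else None
    return out
```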
[0085] The mean difference and standard deviation of difference
statistics in each ground truth range, as well as overall, are
summarized in Table 2. The average mean difference across all 6
subjects on all ground truth values is 0.77.
TABLE-US-00002 TABLE 2
                                μ_d/LOA      μ_d/LOA      μ_d/LOA
Subject   MAE    μ_d/LOA        (65%-80%)    (81%-90%)    (91%-100%)
1         6.46   4.52/15.02     15.35/7.27   5.2/3.65     -1.72/3.43
2         2.89   -2.34/4.65     -0.47/2.34   -3.07/3.57   -2.53/5.25
3         3.57   1.66/9.7       6.91/9.3     -1.43/8.41   0.3/3.31
4         3.87   -1.69/7.73     5.08/2.3     -1.21/8.68   -3.14/4.25
5         4.29   -0.68/10.11    3.75/9.97    -4.31/7.0    -1.06/6.34
6         3.68   3.16/6.85      5.36/7.5     2.4/3.63     1.37/5.81
Average   4.13   0.77/9.01      6.0/6.45     -0.4/5.82    -1.13/4.73
indicates data missing or illegible when filed
[0086] When we separate examples into ground truth ranges, 65%-80%,
81%-90%, and 91%-100%, respectively, the average mean differences
across subjects in each range are 6.0, -0.4, and -1.13. Further,
for Subjects 3, 5, and 6, from 65%-80%, there was a negative trend
in predictions and the mean difference was above the limits of
agreement for some ground truth values. Accordingly, the model
shows a pattern of consistently over-predicting on SpO2 samples
below 80% in this example. It is worth noting that without a
varied FiO2 study such as the one we carried out, we might not
have observed the model performance below 85% at all.
[0087] To explore the potential of using the example smartphone
camera oximeter system as a screening tool for hypoxemia, the
classification accuracy of an example model was calculated in
providing an accurate indication about whether an individual has an
SpO2 level below three different thresholds: 95%, 90%, and 85%. A
reading below 90% SpO2 is a common threshold below which it is
recommended to reach out for immediate medical attention, but other
thresholds could be clinically useful to screen for different
individuals based on their condition. Thus, the ability of an
example system to classify samples from the test set was evaluated
by thresholding the regression result from a CNN at different
decision boundaries and comparing the outcome to whether the ground
truth pulse oximeter also read less than that value. Precision and
recall were computed across all combinations of LOOCV to compute an
average result. This simulates the device screening a subject it
has never seen before, as the model was trained only on the other 5
subjects in our dataset.
[0088] The results of this classification algorithm can be seen in
Table 3. For the 90% classification problem, the model correctly
classifies 93% of the samples that were below 90% (recall) in the
data set, while 76% of the samples it classified as below 90% were
truly below 90% (precision). For 95%, those numbers go up to 97%
recall with 86% precision, on average across all 6 test subjects.
TABLE-US-00003 TABLE 3
Subject   <95% (P/R)   <90% (P/R)   <85% (P/R)
1         .76/.93      .65/.89      .93/.68
2         .81/1.0      .80/.99      .76/.66
3         .92/.99      .66/.95      .74/.92
4         .85/1.0      .67/.94      .24/.45
5         .88/.99      .80/.93      .67/.61
6         .92/.92      .99/.91      1.0/.89
Mean      .86/.97      .76/.93      .72/.70
[0089] Not all combinations of test and train subjects displayed
the same level of accuracy. In order to visualize classification
accuracy across our entire dataset, the classification threshold
was varied for each classification goal between 80%-100% and the
results averaged across all 6 combinations of LOOCV. For <90%
classification, the best trade-off was a precision of 0.76 and a
recall of 0.93, with an F-score of 0.84, when using a regression
decision threshold of 91. This means that, with this threshold, the
model classified a truly low SpO2 level correctly 93% of the time,
while 76% of its low-SpO2 classifications were correct.
However, it may be preferable to choose a threshold that enables
higher recall to bias the system towards classifying low SpO2 cases
correctly more often. For example, with the current model and
validation data, choosing a decision threshold of 93 on the
regression result allowed for greater than 98% recall in
identifying positive cases (ground truth <90% SpO2), while only
resulting in 29% false positives.
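The relationship between the decision threshold and these metrics can
be sketched in Python as follows. The sketch assumes the F-score is
the F1 harmonic mean, which is consistent with the reported precision
of 0.76 and recall of 0.93 giving approximately 0.84. The function
names are illustrative.

```python
import numpy as np

def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1, assumed here)."""
    total = precision + recall
    return 2.0 * precision * recall / total if total else 0.0

def screening_metrics(spo2_true, spo2_pred, level=90.0, decision=91.0):
    """Precision, recall, and F-score for screening 'SpO2 below `level`'.

    `decision` is the cutoff applied to the regression output and may
    differ from `level`, as with the decision threshold of 91.
    """
    positive = np.asarray(spo2_true, dtype=float) < level
    flagged = np.asarray(spo2_pred, dtype=float) < decision
    tp = int(np.sum(flagged & positive))
    precision = tp / max(int(flagged.sum()), 1)
    recall = tp / max(int(positive.sum()), 1)
    return precision, recall, f_score(precision, recall)
```

Raising `decision` above `level` flags more samples, which tends to
raise recall and lower precision.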
* * * * *