U.S. patent application number 16/644888 was published by the patent office on 2020-09-10 for a system and method for automated labeling and annotating unstructured medical datasets.
The applicant listed for this patent is THE GENERAL HOSPITAL CORPORATION. Invention is credited to Synho Do.
United States Patent Application 20200286614
Kind Code: A1
Do, Synho
September 10, 2020
A SYSTEM AND METHOD FOR AUTOMATED LABELING AND ANNOTATING
UNSTRUCTURED MEDICAL DATASETS
Abstract
Supervised and unsupervised learning schemes may be used to
automatically label medical images for use in deep learning
applications. Large labeled datasets may be generated from a small
initial training set using an iterative snowball sampling scheme. A
machine learning powered automatic organ classifier for imaging
datasets, such as CT datasets, with a deep convolutional neural
network (CNN) followed by an organ dose calculation is also
provided. This technique can be used for patient-specific organ
dose estimation since the locations and sizes of organs for each
patient can be calculated independently.
Inventors: Do, Synho (Lexington, MA)
Applicant: THE GENERAL HOSPITAL CORPORATION (Boston, MA, US)
Family ID: 1000004859274
Appl. No.: 16/644888
Filed: September 10, 2018
PCT Filed: September 10, 2018
PCT No.: PCT/US18/50177
371 Date: March 5, 2020
Related U.S. Patent Documents

Application Number   Filing Date
62555799             Sep 8, 2017
62555767             Sep 8, 2017
Current U.S. Class: 1/1
Current CPC Class: G06N 3/084 (20130101); G16H 30/40 (20180101); G06K 9/6259 (20130101); G06N 7/005 (20130101); G06K 9/6277 (20130101); G06K 9/00671 (20130101); G06K 9/628 (20130101); G06K 9/6202 (20130101)
International Class: G16H 30/40 (20060101); G06K 9/00 (20060101); G06K 9/62 (20060101); G06N 3/08 (20060101); G06N 7/00 (20060101)
Claims
1. A method for automatically processing unstructured medical
imaging data to generate classified images, the method comprising:
a) acquiring medical image data of a subject; b) subjecting the
medical image data of the subject to a neural network to generate
classified image data; c) comparing the classified image data to a
confidence test; d) upon determining that the classified image data
does not pass the confidence test, subjecting the classified image
data to a variational autoencoder (VAE) that implements a snowball
sampling algorithm to refine the classified image data by
representing features of the classified image data into latent
space with a Gaussian distribution; e) repeating steps c) and d)
until the classified image data passes the confidence test; and f)
generating annotated images from the classified image data.
2. The method of claim 1 wherein the Gaussian distribution is achieved using Gaussian mixture models (GMMs) to perform binary clustering within each class across the classified image data.
3. The method of claim 1 wherein the confidence test includes a
threshold of at least one of speed, classification accuracy,
reproducibility, or efficacy.
4. The method of claim 1 wherein the classified image data includes at least one of a body region label, a body organ label, an organ label, an image feature label, or a condition label.
5. The method of claim 1 further comprising training the neural
network by selecting an initial seed sample size to generate the
training dataset.
6. A method for automatic labeling and annotation for unstructured
medical datasets with snowball sampling comprising: a) acquiring
images of a region of a subject and labeling the images to generate
a training dataset with the images; b) training a convolutional
neural network with the training dataset; c) classifying unlabeled
images using the trained network; d) determining if a performance
threshold is exceeded for the classified images; and e) refining
the dataset if the threshold is not exceeded by using a variational
autoencoder to label the unlabeled images to create labeled images
and updating the dataset with the labeled images.
7. The method of claim 6 wherein the variational autoencoder
generates latent vectors approximating a unit Gaussian distribution
in order to minimize generative losses.
8. The method of claim 6 wherein the variational autoencoder
includes an encoder with a multilayer perceptron neural network
allowing it to map an input to a latent representation, and a
decoder that maps the latent representation to a reconstructed
input value.
9. The method of claim 6 further comprising selecting an initial
seed sample size to generate the training dataset.
10. The method of claim 6 wherein the performance threshold
includes a threshold of at least one of speed, classification
accuracy, reproducibility, or efficacy.
11. The method of claim 6 wherein classifying unlabeled images
includes at least one of identifying a body region, a body organ,
an organ, an image feature, or a condition.
12. The method of claim 6 wherein snowball sampling includes
refining the dataset at least twice.
13. A system for automatic labeling and annotation for unstructured
medical datasets from medical images with snowball sampling, the
system comprising: a) a computer system configured to: i) acquire
images of a region of a subject and label the images to generate a
training dataset with the images; ii) train a convolutional neural
network with the training dataset; iii) classify unlabeled images
using the trained network; iv) determine if a performance threshold
is exceeded for the classified images; and v) refine the dataset if
the threshold is not exceeded by using a variational autoencoder to
label the unlabeled images to create labeled images and updating
the dataset with the labeled images.
14. The system of claim 13 wherein the variational autoencoder
generates latent vectors approximating a unit Gaussian distribution
in order to minimize generative losses.
15. The system of claim 13 wherein the variational autoencoder
includes an encoder with a multilayer perceptron neural network
allowing it to map an input to a latent representation, and a
decoder that maps the latent representation to a reconstructed
input value.
16. The system of claim 13 further comprising selecting an initial
seed sample size to generate the training dataset.
17. The system of claim 13 wherein the performance threshold
includes a threshold of at least one of speed, classification
accuracy, reproducibility, or efficacy.
18. The system of claim 13 wherein classifying unlabeled images
includes at least one of identifying a body region, a body organ,
an organ, an image feature, or a condition.
19. The system of claim 13 wherein snowball sampling includes
refining the dataset at least twice.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional
Patent Application Ser. No. 62/555,799 filed on Sep. 8, 2017, and
entitled "A methodology for automated labeling and annotation for
unstructured big medical datasets," and U.S. Provisional Patent
Application Ser. No. 62/555,767 filed on Sep. 8, 2017, and entitled "Method and apparatus of machine learning based personalized organ dose estimation."
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH
[0002] N/A
BACKGROUND
[0003] Diagnostic medical imaging has become central to the
practice of modern medicine and diagnostic examination volume has
increased during the past decade. Additionally, as systems have
become more advanced with higher resolution, the number of images
in a given study has also increased. The increased demand for
diagnostic imaging also presents a growing risk for human error and
delayed diagnosis. While computer aided detection (CADe) and
diagnosis (CADx) systems can reduce such problems, they remain
limited due to their reliance upon hand-crafted features.
Deep-learning approaches sidestep this problem by extracting these
features on their own. Recent advances in deep learning technology have enabled data-driven learning of nonlinear image filters and classifiers, which has improved detection and segmentation in multiple medical applications including brain infarcts, automated bone age analysis, and skin lesion classification. Despite these advances,
large-scale and well-labeled training datasets for deep learning
are essential for the networks to learn representative and
hierarchical abstractions.
[0004] This labeling requirement is inherently difficult to meet in
the medical domain where medical expertise is expensive, labeling
is tedious and time-consuming, and examples of certain disease
pathologies may be rare. Several automated annotation approaches
have been attempted on brain CT, brain MR, and other biomedical
image modalities with various feature representation, clustering,
and classification algorithms. These approaches are limited because
only low-level visual features such as color, edges, and color
layouts are extracted. Even with higher-level feature extraction from MR voxels by hierarchical learning using two-layer random forests, segmentation performance is generally not better than that achieved with features extracted by deep convolutional neural networks.
Furthermore, all current methods of annotating medical images still
require mid- to large-sized labeled image datasets for obtaining
the trained model.
[0005] Axial image location classification, which determines where an image lies within a volumetric CT examination, is a fundamental step in multiple initial classification processes. Classifying the
location is a challenging problem because the details of body
regions can vary dramatically between patients, such as with brain
gyral patterns, cervical vertebral anatomy, pulmonary vessels, and
bowel distribution. Degenerative changes can also distort bony
anatomy enough to confuse the network. As a result, large training datasets are commonly required for algorithms to achieve sufficient accuracy.
[0006] Body-part recognition is also important in automatic medical
image analysis as it is a prerequisite step for anatomy
identification and organ segmentation. Accurate body-part
classification facilitates organ detection and segmentation by
reducing the search range for an organ of interest. Multiple
techniques have been developed using multi-class random regression
and decision forests to classify multiple anatomical structures
ranging from 6 to 10 organs on computed tomography (CT) scans. These
classifiers can discriminate between similar structures such as the
aortic arch and heart. However, these prior works focus on a
general anatomical body part classification.
[0007] Thus, high-quality training data is important for training neural networks and unlocking their potential to truly improve the clinical use of medical images. However, creating
high-quality training datasets is expensive and time-consuming.
SUMMARY OF THE DISCLOSURE
[0008] The present disclosure addresses the aforementioned
drawbacks by providing a system and method for using supervised and
unsupervised learning schemes to automatically label medical images
for use in subsequent deep learning applications. The system can
generate a large labeled dataset from a small initial training set
using an iterative snowball sampling scheme. A machine-learning
powered, automatic organ classifier for imaging datasets, such as
CT datasets, with a deep convolutional neural network (CNN)
followed by an organ dose calculation is also provided. This
technique can be used for patient-specific organ dose estimation because the locations and sizes of organs for each patient can be calculated independently, unlike other simulation-based methods.
[0009] In one configuration, a method is provided for automatically
processing unstructured medical imaging data to generate classified
images. The method includes acquiring medical image data of a
subject and subjecting the medical image data of the subject to a
neural network to generate classified image data. The method may
also include comparing the classified image data to a confidence
test, and upon determining that the classified image data does not
pass the confidence test, subjecting the classified image data to a
variational autoencoder (VAE) that implements a snowball sampling
algorithm to refine the classified image data by representing
features of the classified image data into latent space with a
Gaussian distribution. In some configurations, this is repeated
until the classified image data passes the confidence test.
Annotated images may then be generated from the classified image
data.
[0010] In one configuration, a method is provided for automatic
labeling and annotation for unstructured medical datasets with
snowball sampling. The method includes acquiring images of a region
of a subject and labeling the images to generate a training dataset
with the images. The method also includes training a network, such
as a convolutional neural network, with the training dataset and
classifying unlabeled images using the trained network. The method
may also include determining if a performance threshold is exceeded
for the classified images. The dataset may be refined if the
threshold is not exceeded by using a variational autoencoder to
label the unlabeled images to create labeled images and updating
the dataset with the labeled images.
[0011] In one configuration, a system is provided for automatic
labeling and annotation for unstructured medical datasets from
medical images with snowball sampling. The system includes a
computer system configured to: i) acquire images of a region of a
subject and label the images to generate a training dataset with
the images; ii) train a convolutional neural network with the
training dataset; iii) classify unlabeled images using the trained
network; iv) determine if a performance threshold is exceeded for
the classified images; and v) refine the dataset if the threshold
is not exceeded by using a variational autoencoder to label the
unlabeled images to create labeled images and updating the dataset
with the labeled images.
[0012] In one configuration, a method is provided for organ
classification for unstructured medical datasets. The method
includes acquiring images of a region of a subject and labeling the
images to generate a training dataset with the images. The method
may also include training a network, such as a convolutional neural
network, with the training dataset. A region in the images may be
classified using the trained network. The classified images may be
segmented using the convolutional neural network to generate
segmented images that distinguish between at least two different
organs in the classified regions in the images. A report may be
generated of a calculated radiation dose for at least one of the
organs in the segmented images.
[0013] The foregoing and other aspects and advantages of the
present disclosure will appear from the following description. In
the description, reference is made to the accompanying drawings
that form a part hereof, and in which there is shown by way of
illustration a preferred embodiment. This embodiment does not
necessarily represent the full scope of the invention, however, and
reference is therefore made to the claims and herein for
interpreting the scope of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] FIG. 1 is a schematic diagram of one system in accordance
with the present disclosure.
[0015] FIG. 2 is a schematic diagram showing further details of
one, non-limiting example of the system of FIG. 1.
[0016] FIG. 3 is a flowchart setting forth some examples of steps
for a process in accordance with one aspect of the disclosure.
[0017] FIG. 4 is a flowchart setting forth some non-limiting
examples of steps for a process for utilizing an autoencoder
network with four convolutional and deconvolution layers in
accordance with one aspect of the present disclosure.
[0018] FIG. 5 is a graphic illustration of forward and backward
propagation using one configuration of an autoencoder where latent
and generative losses are minimized in accordance with the present
disclosure.
[0019] FIG. 6A is an image of a coronal reconstruction of a
whole-body CT with each body region identified in accordance with
the present disclosure.
[0020] FIG. 6B is a panel of axial image slices corresponding to
the body regions identified in FIG. 6A in accordance with the
present disclosure.
[0021] FIG. 7 is a graph providing a scatterplot of feature
representations projected onto 2D latent space of a convolutional
variational autoencoder in accordance with the present
disclosure.
[0022] FIG. 8 is a series of correlated graphs of 6 example
snowball sampling method reflecting increasing accuracy for
increasing iterations in accordance with the present
disclosure.
[0023] FIG. 9A is a graph of examples of classification accuracy
versus a number of snowball sampling iterations in accordance with
the present disclosure.
[0024] FIG. 9B is a graph of classification accuracy versus
training data size for comparing one configuration of a tuned
convolutional network with and without snowball sampling in
accordance with the present disclosure.
[0025] FIG. 10A is an image of a circle whose area is the same as
that of a patient cross section from FIG. 10B and which may be used
to measure a patient effective diameter in accordance with the
present disclosure.
[0026] FIG. 10B is an example CT image of a patient cross
section.
[0027] FIG. 11 is a flowchart setting forth some non-limiting
examples of steps for one configuration of an organ dose estimation
method in accordance with the present disclosure.
DETAILED DESCRIPTION
[0028] The present disclosure provides systems and methods for supervised and unsupervised learning schemes that may be used to
automatically label medical images for use in deep learning
applications. Large labeled datasets may be generated from a small
initial training set using an iterative snowball sampling scheme. A
machine learning powered automatic organ classifier for imaging
datasets, such as CT datasets, with a deep convolutional neural
network (CNN) followed by an organ dose calculation is also
provided. This technique can be used for patient-specific organ
dose estimation since the locations and sizes of organs for each
patient can be calculated independently.
[0029] In one configuration, a desired classification accuracy may
be achieved with a minimal labeling process. Using an iterative
snowball sampling approach, a large medical image dataset may be
annotated automatically with a smaller training subset. The
automatic labeling system may include a variational autoencoder
(VAE) for the purpose of feature representation, Gaussian mixture
models (GMMs) for clustering and refining of mislabeled classes,
and a deep convolutional neural network (DCNN) for classification.
The system and method can also quickly and efficiently identify an
organ of interest at a higher accuracy when compared to current
text-based body part information in digital imaging and
communications in medicine (DICOM) headers. In one configuration,
the method selects candidates, classifies them by the DCNN, and
then fully refines them by learning features from a VAE and
clustering the features by GMM.
[0030] Referring to FIG. 1, an example of a system 100 is shown for
automatically labeling images using image data in accordance with
some aspects of the disclosed subject matter. As shown in FIG. 1, a
computing device 110 can receive multiple types of image data from
an image source 102. In some configurations, the computing device
110 can execute at least a portion of an automatic image labelling
system 104 to automatically determine whether a feature is present
in images of a subject.
[0031] Additionally or alternatively, in some configurations, the
computing device 110 can communicate information about image data
received from the image source 102 to a server 120 over a
communication network 108, which can execute at least a portion of
the automatic image labelling system 104 to automatically determine
whether a feature is present in images of a subject. In such
configurations, the server 120 can return information to the
computing device 110 (and/or any other suitable computing device)
indicative of an output of the automatic image labelling system 104
to determine whether a feature is present or absent.
[0032] In some configurations, the computing device 110 and/or
server 120 can be any suitable computing device or combination of
devices, such as a desktop computer, a laptop computer, a
smartphone, a tablet computer, a wearable computer, a server
computer, a virtual machine being executed by a physical computing
device, etc. In some configurations, the automatic image labelling
system 104 can extract features from labeled (e.g., labeled as
including a condition or disease, or normal) image data using a CNN
trained as a general image classifier, and can perform a
correlation analysis to calculate correlations between the features
corresponding to the image data and a database. In some
embodiments, the labeled data can be used to train a classification
model, such as a support vector machine (SVM), to classify features
as indicative of a disease or a condition, or as indicative of
normal. In some configurations, the automatic image labelling
system 104 can provide features for unlabeled image data to the
trained classification model.
[0033] In some configurations, the image source 102 can be any
suitable source of image data, such as an MRI, CT, ultrasound, PET,
SPECT, x-ray, or another computing device (e.g., a server storing
image data), and the like. In some configurations, the image source
102 can be local to the computing device 110. For example, the
image source 102 can be incorporated with the computing device 110
(e.g., the computing device 110 can be configured as part of a
device for capturing and/or storing images). As another example,
the image source 102 can be connected to the computing device 110
by a cable, a direct wireless link, or the like. Additionally or
alternatively, in some configurations, the image source 102 can be
located locally and/or remotely from the computing device 110, and
can communicate image data to the computing device 110 (and/or
server 120) via a communication network (e.g., the communication
network 108).
[0034] In some configurations, the communication network 108 can be
any suitable communication network or combination of communication
networks. For example, the communication network 108 can include a
Wi-Fi network (which can include one or more wireless routers, one
or more switches, etc.), a peer-to-peer network (e.g., a Bluetooth
network), a cellular network (e.g., a 3G network, a 4G network,
etc., complying with any suitable standard, such as CDMA, GSM, LTE,
LTE Advanced, WiMAX, etc.), a wired network, etc. In some
configurations, the communication network 108 can be a local area
network, a wide area network, a public network (e.g., the
Internet), a private or semi-private network (e.g., a corporate or
university intranet), other suitable type of network, or any
suitable combination of networks. Communications links shown in
FIG. 1 can each be any suitable communications link or combination
of communications links, such as wired links, fiber optic links,
Wi-Fi links, Bluetooth links, cellular links, etc.
[0035] FIG. 2 shows an example of hardware 200 that can be used to
implement the image source 102, computing device 110, and/or server
120 in accordance with some aspects of the disclosed subject
matter. As shown in FIG. 2, in some configurations, the computing
device 110 can include a processor 202, a display 204, one or more
inputs 206, one or more communication systems 208, and/or memory
210. In some configurations, the processor 202 can be any suitable
hardware processor or combination of processors, such as a central
processing unit (CPU), a graphics processing unit (GPU), etc. In
some configurations, the display 204 can include any suitable
display devices, such as a computer monitor, a touchscreen, a
television, etc. In some configurations, the inputs 206 can include
any of a variety of suitable input devices and/or sensors that can
be used to receive user input, such as a keyboard, a mouse, a
touchscreen, a microphone, and the like.
[0036] In some configurations, the communications systems 208 can
include a variety of suitable hardware, firmware, and/or software
for communicating information over the communication network 108
and/or any other suitable communication networks. For example, the
communications systems 208 can include one or more transceivers,
one or more communication chips and/or chip sets, etc. In a more
particular example, the communications systems 208 can include
hardware, firmware and/or software that can be used to establish a
Wi-Fi connection, a Bluetooth connection, a cellular connection, an
Ethernet connection, etc.
[0037] In some configurations, the memory 210 can include any
suitable storage device or devices that can be used to store
instructions, values, etc., that can be used, for example, by the
processor 202 to present content using the display 204, to
communicate with the server 120 via the communications system(s)
208, and the like. The memory 210 can include any of a variety of
suitable volatile memory, non-volatile memory, storage, or any
suitable combination thereof. For example, the memory 210 can
include RAM, ROM, EEPROM, one or more flash drives, one or more
hard disks, one or more solid state drives, one or more optical
drives, etc. In some configurations, the memory 210 can have
encoded thereon a computer program for controlling operation of the
computing device 110. In such configurations, the processor 202 can
execute at least a portion of the computer program to present
content (e.g., MRI images, user interfaces, graphics, tables, and
the like), receive content from the server 120, transmit
information to the server 120, and the like.
[0038] In some configurations, the server 120 can include a
processor 212, a display 214, one or more inputs 216, one or more
communications systems 218, and/or memory 220. In some
configurations, the processor 212 can be a suitable hardware
processor or combination of processors, such as a CPU, a GPU, and
the like. In some configurations, the display 214 can include suitable display devices, such as a computer monitor, a touchscreen, a television, and the like. In some configurations, the inputs 216 can include suitable input devices and/or sensors that can be used to receive user input, such as a keyboard, a mouse, a touchscreen, a microphone, and the like.
[0039] In some configurations, the communications systems 218 can
include suitable hardware, firmware, and/or software for
communicating information over the communication network 108 and/or
any other suitable communication networks. For example, the
communications systems 218 can include one or more transceivers,
one or more communication chips and/or chip sets, and the like. In
a more particular example, the communications systems 218 can
include hardware, firmware and/or software that can be used to
establish a Wi-Fi connection, a Bluetooth connection, a cellular
connection, an Ethernet connection, and the like.
[0040] In some configurations, the memory 220 can include any
suitable storage device or devices that can be used to store
instructions, values, and the like, that can be used, for example,
by the processor 212 to present content using the display 214, to
communicate with one or more computing devices 110, and the like.
The memory 220 can include any of a variety of suitable volatile
memory, non-volatile memory, storage, or any suitable combination
thereof. For example, the memory 220 can include RAM, ROM, EEPROM,
one or more flash drives, one or more hard disks, one or more solid
state drives, one or more optical drives, and the like. In some
configurations, the memory 220 can have encoded thereon a server
program for controlling operation of the server 120. In such
configurations, the processor 212 can execute at least a portion of
the server program to transmit information and/or content (e.g.,
MRI data, results of automatic diagnosis, a user interface, and the
like) to one or more computing devices 110, receive information
and/or content from one or more computing devices 110, receive
instructions from one or more devices (e.g., a personal computer, a
laptop computer, a tablet computer, a smartphone, and the like),
and the like.
[0041] In some configurations, the image source 102 can include a
processor 222, imaging components 224, one or more communications
systems 226, and/or memory 228. In some embodiments, processor 222
can be any suitable hardware processor or combination of
processors, such as a CPU, a GPU, and the like. In some
configurations, the imaging components 224 can be any suitable
components to generate image data corresponding to one or more
imaging modes (e.g., T1 imaging, T2 imaging, fMRI, and the like).
An example of an imaging machine that can be used to implement the
image source 102 can include a conventional MRI scanner (e.g., a
1.5 T scanner, a 3 T scanner), a high field MRI scanner (e.g., a 7
T scanner), an open bore MRI scanner, a CT system, an ultrasound
scanner and the like.
[0042] Note that, although not shown, the image source 102 can
include any suitable inputs and/or outputs. For example, the image
source 102 can include input devices and/or sensors that can be
used to receive user input, such as a keyboard, a mouse, a
touchscreen, a microphone, a trackpad, a trackball, hardware
buttons, software buttons, and the like. As another example, the
image source 102 can include any suitable display devices, such as
a computer monitor, a touchscreen, a television, etc., one or more
speakers, and the like.
[0043] In some configurations, the communications systems 226 can
include any suitable hardware, firmware, and/or software for
communicating information to the computing device 110 (and, in some
embodiments, over the communication network 108 and/or any other
suitable communication networks). For example, the communications
systems 226 can include one or more transceivers, one or more
communication chips and/or chip sets, and the like. In a more
particular example, the communications systems 226 can include
hardware, firmware and/or software that can be used to establish a
wired connection using any suitable port and/or communication
standard (e.g., VGA, DVI video, USB, RS-232, and the like), Wi-Fi
connection, a Bluetooth connection, a cellular connection, an
Ethernet connection, and the like.
[0044] In some configurations, the memory 228 can include any
suitable storage device or devices that can be used to store
instructions, values, image data, and the like, that can be used,
for example, by the processor 222 to: control the imaging
components 224, and/or receive image data from the imaging
components 224; generate images; present content (e.g., MRI images,
a user interface, and the like) using a display; communicate with
one or more computing devices 110; and the like. The memory 228 can
include any suitable volatile memory, non-volatile memory, storage,
or any suitable combination thereof. For
example, the memory 228 can include RAM, ROM, EEPROM, one or more
flash drives, one or more hard disks, one or more solid state
drives, one or more optical drives, and the like. In some
configurations, the memory 228 can have encoded thereon a program
for controlling operation of the image source 102. In such
configurations, the processor 222 can execute at least a portion of
the program to generate images, transmit information and/or content
(e.g., MRI image data) to one or more computing devices 110,
receive information and/or content from one or more computing
devices 110, receive instructions from one or more devices (e.g., a
personal computer, a laptop computer, a tablet computer, a
smartphone, and the like), and the like.
[0045] Referring to FIG. 3, a flowchart is provided setting forth
some non-limiting example steps for a method of automatically
classifying unstructured imaging data in accordance with the
present disclosure. As will be described, the present disclosure
provides an iterative snowball sampling scheme that allows for the
accurate classification of unstructured imaging data without the
need for extensive training datasets that include costly
human-annotated information.
[0046] In particular, an initial seed sample size is selected at
step 310. A training dataset is generated at step 320 with sampled
data, which is used to train the convolutional neural network at
step 330. In some configurations, the convolutional network is a
deep convolutional neural network (DCNN). The trained network
classifies unlabeled data at step 340 with the performance
evaluated at step 350. If the desired performance is achieved, then
the process may end. For example, at step 350, the system may
evaluate whether the network's ability to identify a feature in an
image exceeds a defined threshold of speed, classification
accuracy, reproducibility, efficacy, or other performance
metric.
[0047] If a desired level of performance is not achieved, then the
labeled data may be refined at step 360, by determining if a
confidence value for the data is above a certain threshold at step
370. A confidence value may be the same as the desired performance
and use the same metrics, or a confidence value may be a
classification accuracy that describes the percentage of the time
or the frequency with which an image feature or region is
identified correctly. If the confidence value does not exceed the
threshold, then the network may be used to re-classify the data by
repeating the process at step 340. If the confidence value is
exceeded, then a VAE with a model, such as a GMM, may be used at
step 380 to add the data back into the training dataset and repeat
the process from step 320.
[0048] In some configurations, the initial seed annotated data sets
used to train the DCNN may be small. In these cases, the generated
labeled data may contain errors in classification or may be
generally unstructured. To prevent this, steps 340-380 may be
implemented in an architecture that includes the VAE and GMMs and
implements a snowball sampling algorithm to refine the
classification. The VAE represents features of the candidate
annotated data set (having m data size) into latent space with a
Gaussian distribution. The GMMs may then conduct binary clustering
within each class across the annotated candidate datasets. Between
two clusters consisting of mean and variance vectors, a user may
choose the cluster (c*) which is closest to the cluster center (c)
of the selected seed sample. The data set with the size m closest
to the cluster center (c*) may be selected. The VAE extracts
generic features from each cluster, and the GMM improves clustering accuracy. This iterative data curation process can increase the quantity of the annotated dataset and improve its quality. This may be repeated for each annotated class.
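For illustration, a minimal sketch of this refinement step might use scikit-learn's GaussianMixture for the binary clustering over the VAE latent features; the function name refine_candidates and the seed-center selection logic are hypothetical choices, not the disclosed implementation:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def refine_candidates(latent_feats, seed_feats, m):
        """Binary-cluster latent features and keep the m samples nearest
        the cluster whose center is closest to the seed samples."""
        gmm = GaussianMixture(n_components=2, covariance_type="diag")
        gmm.fit(latent_feats)
        seed_center = seed_feats.mean(axis=0)
        # c*: the cluster whose mean lies closest to the seed cluster center.
        best = np.argmin(np.linalg.norm(gmm.means_ - seed_center, axis=1))
        # Keep the m candidates closest to the chosen cluster center.
        dist = np.linalg.norm(latent_feats - gmm.means_[best], axis=1)
        return np.argsort(dist)[:m]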
[0049] Specifically, deep learning classification accuracy is
historically dependent on the size of the initial training
datasets. Quantifying the size of a dataset required to achieve a
target accuracy is important when trying to decide the feasibility
of a system. In many cases, limited data size prevents the development of robust AI algorithms. Learning curve analysis is one approach to modeling classification performance and predicting the sample size needed. The learning curve can be conceptualized as an inverse power law function: classification accuracy (y) is expressed as a function of the training set size (x) and unknown parameters $b = (b_1, b_2, b_3)$:

$$y = f(x; b) = b_1 + b_2 x^{b_3} \qquad (1)$$

where $x = [x_1, x_2, \ldots, x_N]^T$, $y = [y_1, y_2, \ldots, y_N]^T$, $b = [b_1, b_2, b_3]^T$, and $N$ is the number of classes; $b_1$, $b_2$, and $b_3$ represent the bias, learning rate, and decay rate, respectively. The model fit assumes that the classification accuracy (y) grows asymptotically to $b_1$, the maximum achievable classification performance. With the observed classification accuracy at six different training set sizes (5, 10, 20, 50, 100, and 200), the unknown parameters $b = [b_1, b_2, b_3]^T$ may be estimated using weighted nonlinear regression:

$$E(b) = \sum_{p=1}^{m} w_p (t_p - y_p)^2 = \sum_{p=1}^{m} w_p \big(t_p - f(x_p; b)\big)^2 = \sum_{p=1}^{m} w_p r_p(b)^2 = R^T W R \qquad (2)$$

where $t_p$ is the desired output when the input is $x_p$; $y_p = f(x_p; b)$ is the model's output when the input is $x_p$; $r_p(b)$ is the residual between $t_p$ and $y_p$; and $R$ is the residual vector in matrix form. The weight terms $w_p$ in the diagonal matrix $W$ can be determined by the application. In some settings, the weighted nonlinear least-squares estimator may be more appropriate than a regular nonlinear regression method for fitting the learning curve when measurement errors do not all have the same variance.
[0050] Classification accuracy using relatively large training set sizes (such as 100 and 200) may have a lower variance than when using smaller sample sizes (such as 5, 10, 20, and 50). The learning curve may therefore be fitted with higher weighting values at the points of larger dataset sizes. For example, the weights may be chosen as $w_p = [1, 1, 1, 1, 100, 150]$ when fitting the learning curve, but $w_p = [1, 1, 1, 1, 1, 1]$ for an unweighted nonlinear least-squares estimator.
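For illustration, the fit of equations (1) and (2) can be reproduced with scipy.optimize.curve_fit, which accepts per-point weights through its sigma argument (a weight $w_p$ corresponds to sigma = 1/sqrt($w_p$)). The accuracy values below are placeholders, not the study's data:

    import numpy as np
    from scipy.optimize import curve_fit

    def learning_curve(x, b1, b2, b3):
        """Inverse power law of equation (1): y = b1 + b2 * x**b3."""
        return b1 + b2 * np.power(x, b3)

    x = np.array([5, 10, 20, 50, 100, 200], dtype=float)  # seed sizes per class
    y = np.array([0.62, 0.74, 0.85, 0.93, 0.96, 0.97])    # placeholder accuracies

    # Weighted nonlinear least squares (equation 2): larger seeds weighted more.
    w = np.array([1, 1, 1, 1, 100, 150], dtype=float)
    sigma = 1.0 / np.sqrt(w)

    (b1, b2, b3), _ = curve_fit(learning_curve, x, y, p0=[1.0, -1.0, -0.5],
                                sigma=sigma, maxfev=10000)
    print(f"asymptote b1={b1:.4f}, learning rate b2={b2:.4f}, decay b3={b3:.4f}")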
[0051] In one configuration, an autoencoder can be created that has two complementary networks consisting of an encoder and a decoder. The encoder has a multilayer perceptron neural network allowing it to map input x to a latent representation z, and the decoder maps the latent variable z back to a reconstructed input value $\hat{x}$:

$$z \sim f(x) = q_\phi(z \mid x), \qquad \hat{x} \sim g(z) = p_\theta(x \mid z) \qquad (3)$$

where the tunable parameters $\phi$ of the encoder and $\theta$ of the decoder artificial neural networks are optimized for the variational lower bound $L(\theta, \phi, x)$ below:

$$L(\theta, \phi, x) = E_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - D_{KL}[q_\phi(z \mid x) \,\|\, p_\theta(z)] \qquad (4)$$
[0052] The objective of this cost function is to minimize both the generative and the latent losses. Generative loss describes how accurately the decoder network reconstructs images ($\hat{x}$) from a latent vector z, and latent loss is derived from $q_\phi(z \mid x)$ so that $D_{KL}[\cdot]$ is close to zero. One difference between a typical autoencoder (also called a vanilla autoencoder) and a variational autoencoder (VAE) is that variational autoencoders generate latent vectors approximating a unit Gaussian distribution (i.e., $z \sim N(0, I)$), whereas vanilla autoencoders generate deterministic latent variables z.
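As a minimal sketch of the objective in equation (4), assuming TensorFlow and a Gaussian encoder that outputs z_mean and z_log_var, the generative (reconstruction) loss and the closed-form KL latent loss against N(0, I) can be combined as follows:

    import tensorflow as tf

    def vae_loss(x, x_hat, z_mean, z_log_var):
        """Negative ELBO of equation (4): generative loss plus KL latent loss."""
        # Generative loss: how accurately the decoder reconstructs x from z.
        generative = tf.reduce_sum(tf.square(x - x_hat), axis=[1, 2, 3])
        # Latent loss: D_KL[ q_phi(z|x) || N(0, I) ] in closed form.
        latent = -0.5 * tf.reduce_sum(
            1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1)
        return tf.reduce_mean(generative + latent)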
[0053] In some configurations, the iterative snowball sampling process for automatic labeling of training data may also be expressed as:

    i ← 0
    x_s ← select a training sample (x) with the initial labeled seed size (s)
    repeat
        if i = 0: x_m^i ← x_s
        else:     x_m^i ← classify the unlabeled training set (M) by d(x_m^i)
                          into m labeled candidate sets
        d(x_m^i) ← train the deep convolutional neural network with x_m^i
        f(x_m^i), g(z) ← train the encoder and decoder of the VAE
        z ~ f(x_m^i) ← feature representation in the latent space of the encoder
        c ← binary clustering using GMMs
        c* ← select the cluster whose center is closest to the seed data (x_s)
        x_m^i ← select the m data points closest to c*
        m ← m + s (add the new label sets to the initial seed)
        i ← i + 1
    until m = M
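The loop above can be rendered compactly in Python. In this sketch every component is passed in as a callable (for example, the hypothetical refine_candidates sketch shown earlier), since the disclosure leaves those implementations open:

    def snowball_label(seed_x, seed_y, unlabeled,
                       train_dcnn, classify, train_vae, refine, m=500):
        """One sketch of the snowball loop; train_dcnn, classify,
        train_vae, and refine are caller-supplied callables."""
        labeled_x, labeled_y = list(seed_x), list(seed_y)
        while unlabeled:
            dcnn = train_dcnn(labeled_x, labeled_y)   # supervised step
            pseudo = classify(dcnn, unlabeled)        # pseudo-labels, one per item
            encoder = train_vae(unlabeled)            # unsupervised features
            keep = refine(encoder(unlabeled), encoder(seed_x), m)
            for idx in sorted(keep, reverse=True):    # pop larger indices first
                labeled_x.append(unlabeled.pop(idx))  # move item into labeled set
                labeled_y.append(pseudo[idx])
        return labeled_x, labeled_y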
[0067] In some configurations, the system learns features of input
images using a VAE as an unsupervised learning representation,
clusters the features by Gaussian mixture models, and annotates
the images by refining candidates pre-labeled or preclassified from
the deep convolutional neural network as supervised learning. The
DCNN may be trained for body region classification using a small
seed training dataset to create larger annotated training datasets
by snowball iterative sampling, leading to higher final accuracy.
The system may be used to classify images for any part of the human
anatomy, and may be used to classify images beyond restricted
regions.
Example of Data Set Annotation
[0068] In one example of the method and system, experiments were
conducted using six different training seed sizes (5, 10, 20, 50,
100, 200/class) on whole body CT images. The fine-tuned DCNN model with snowball sampling was compared with two other common learning methods, a DCNN model trained from scratch and a fine-tuned DCNN model with transfer learning, to evaluate classification performance. The method achieved accuracy (98.79%) from a labeled seed size of only 100 that was comparable to the accuracy achieved by the fine-tuned DCNN with a seed size of 1,000 (98.71%). In the results of this example, the automatic labeling method saves 90% of the labeling effort in body part classification while preserving high accuracy.
[0069] A database of CT images was compiled from the clinical PACS
at a quaternary referral hospital. Preprocessing software was
developed to annotate and categorize these images into 6 different
body regions: brain, neck, shoulder, chest, abdomen, and pelvis.
Only images that could be clearly defined as one of the
aforementioned body regions were used. The intervening areas were
excluded from training due to their lack of clear regional
definition. Each CT examination has a different noise level because
of varying radiation dosages, image reconstruction filters, and CT
vendors. Image voxels may also have varying pitches because of the
differences in the image reconstruction fields. Image slice thickness was greater than the axial voxel pitch, so the voxels were anisotropic.
[0070] Four training datasets with varying members per class were prepared: unlabeled (M=5000/class), test (1000/class), validation
(1000/class), and initial seed data (s=5, 10, 20, 50, 100, and
200/class). The initial seed datasets represent the number of
labeled data used to train the DCNNs the first time. An aim was to
define the minimum number of cases per class required to annotate
larger data sets with comparable accuracy to results from manually
labeled conventional training datasets.
[0071] Any of a variety of DCNNs may be used. In the present example, GoogLeNet was selected as it is an efficient, highly performing DCNN. Testing was performed using the NVIDIA Deep Learning GPU Training System (DIGITS) on a DevBox to train the model using each experimental dataset. GoogLeNet uses 22 convolutional layers including 9 Inception modules and 4 different kernel filters (7×7, 5×5, 3×3, and 1×1). The convolutional filters were trained using a stochastic gradient descent (SGD) algorithm with a base learning rate of 0.001, decreased in three steps based on stable convergence of the loss function. Comparison of the effect of transfer learning on the
snowball sampling method was made by training one instance of the
DCNN from scratch with random weight initialization and another
instance with a preloaded, fine-tuned ImageNet pre-trained model.
The snowball sampling procedure was iterated 10 times with each
initial seed sample size so that the DCNNs were trained and tested
a total of 60 times. During each training step, the validation sets
(1000/class) were evaluated and the trained GoogLeNet model with
the highest accuracy in the third step of learning rate decay was
selected.
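As a rough sketch (not the DIGITS configuration itself), an equivalent SGD optimizer with a base learning rate of 0.001 and a three-step decay can be expressed in Keras; the step boundaries and momentum below are assumed for illustration:

    import tensorflow as tf

    # Three-step decay from a base learning rate of 0.001; the iteration
    # boundaries are placeholders chosen for illustration.
    schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
        boundaries=[30_000, 60_000, 90_000],
        values=[1e-3, 1e-4, 1e-5, 1e-6])
    optimizer = tf.keras.optimizers.SGD(learning_rate=schedule,
                                        momentum=0.9)  # momentum assumed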
[0072] Referring to FIG. 4, at each snowball sampling iteration,
the customized VAE was constructed to represent the features of the
selected samples. In the illustrated, non-limiting example, the VAE
contains four convolutional and deconvolutional layers functioning
as encoders and decoders, respectively. One skilled in the art will
appreciate that other examples may use more or fewer convolutional
or deconvolutional layers, and that any number of layers may be
used. Each convolutional layer in the current example had 64 kernel filters (3×3) followed by max pooling (2×2). Input images (downscaled to 64×64 from 512×512 for computational efficiency) were compressed to a 128-dimensional feature space with a Gaussian distribution, and the input image was ultimately reconstructed using deconvolutional and up-sampling layers. The convolutional VAE was implemented using the
Keras deep learning library running on a TensorFlow backend. After
training the VAE, only the encoder was used as a feature
representation, feeding the features into the inputs of Gaussian
mixture models (GMMs). Two clusters each having 128-dimensional
Gaussian distributions were generated for each snowball iteration.
The cluster (c*) which had the closest distance to the cluster
center of the Gaussian distribution of the selected seed sample was
selected.
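A minimal Keras sketch consistent with this description: four 64-filter (3×3) convolutional layers with 2×2 max pooling in the encoder, a 128-dimensional Gaussian latent space, and a mirrored decoder of deconvolutional and up-sampling layers. The layer counts and sizes follow the text; the activations and other details are assumptions. This encoder/decoder pair could be trained with the vae_loss sketch given earlier:

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    latent_dim = 128

    # Encoder: 64x64x1 input -> four conv(64, 3x3) + maxpool(2x2) blocks.
    enc_in = layers.Input(shape=(64, 64, 1))
    h = enc_in
    for _ in range(4):
        h = layers.Conv2D(64, 3, padding="same", activation="relu")(h)
        h = layers.MaxPooling2D(2)(h)              # 64 -> 32 -> 16 -> 8 -> 4
    h = layers.Flatten()(h)
    z_mean = layers.Dense(latent_dim)(h)
    z_log_var = layers.Dense(latent_dim)(h)

    def sample(args):
        """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)."""
        mu, log_var = args
        eps = tf.random.normal(tf.shape(mu))
        return mu + tf.exp(0.5 * log_var) * eps

    z = layers.Lambda(sample)([z_mean, z_log_var])
    encoder = Model(enc_in, [z_mean, z_log_var, z], name="encoder")

    # Decoder: latent vector -> four deconv + up-sampling blocks -> 64x64x1.
    dec_in = layers.Input(shape=(latent_dim,))
    g = layers.Dense(4 * 4 * 64, activation="relu")(dec_in)
    g = layers.Reshape((4, 4, 64))(g)
    for _ in range(4):
        g = layers.Conv2DTranspose(64, 3, padding="same", activation="relu")(g)
        g = layers.UpSampling2D(2)(g)              # 4 -> 8 -> 16 -> 32 -> 64
    dec_out = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(g)
    decoder = Model(dec_in, dec_out, name="decoder")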
[0073] Unlabeled 5,000/class datasets were initially annotated by
the DCNNs that were trained with labeled seed data (e.g. 5 examples
per class). The labeled candidates were refined by binary clustering using GMMs, and finally 500/class (m=500) were selected at each snowball iteration. Through ten iterations, all unlabeled data sets
were labeled. This automatic labeling procedure was conducted
according to six different seed sizes (5, 10, 20, 50, 100, and
200/class). During each iteration, annotated image data was added
to the next training data pool so that classification accuracy
increased gradually. Mislabeled classes were significantly reduced
after refining the training. The size of the initial seed
influences the overall classification performance, with diminishing
returns after 50 cases per class. Each experiment was repeated 10
times by randomly selecting seed samples from labeled training
datasets. The trained model was then tested by introducing 1,000
new images of each body class. A total of 6,000 images were used
for the performance evaluation in the present example.
[0074] For all defined body parts, classification accuracy was at
or near 100%. Although the system was not trained on images of
transition regions, it was able to infer these areas with
considerable accuracy. The network was able to extract and identify
similar features at the level of the Inception module despite wide
ranges of normal variation in the same anatomic region.
[0075] Referring to FIG. 5, one non-limiting configuration for an autoencoder is shown, where generative loss $\|x - \hat{x}\|^2$ 510 and latent loss 520 are controlled and, in some configurations, $D_{KL}[N(\mu(x), \sigma(x)) \,\|\, N(0, I)]$ is minimized. Per equations (3) and (4), generative loss describes how accurately the decoder network g(z) 560 reconstructs images ($\hat{x}$) 540 from input x 530 and encoder network f(x) 500 through a latent vector z 550, while latent loss is derived from $q_\phi(z \mid x)$ so that the $D_{KL}[\cdot]$ latent loss 520 is close to zero. Even though the composed neural network has many unknown weights to estimate, the simple cascade structure of a multilayer neural network makes it possible to improve the accuracy by iteration. The input can invoke the forward function and calculate the loss function; the prediction errors are then backpropagated to improve system performance.
[0076] Referring to FIGS. 6A and 6B, an example of body part
classification created by the above-described systems and processes
is shown. FIG. 6A depicts a whole-body CT image 600, such as may be
acquired and provided to the above-described systems. Using the
above-described techniques, body part regions 610 can be classified
and labeled. Labeling of the data may be performed with a neural
network, and refining of the dataset used to train the neural
network may be as described above. Then, as illustrated in FIG. 6B,
axial CT images corresponding to the body part regions 610 can be
selected and labeled. As examples, a brain region 620 has a
corresponding axial image 625, a neck 630 has axial image 635, a
shoulder 640 has axial image 645, a chest 650 has axial image 655,
an abdomen 660 has axial image 665, and a pelvis 670 has axial
image 675. Any number of regions 610 may be identified for a
subject, and any number of corresponding axial images may be
used.
[0077] Referring to FIG. 7, a scatter plot of 2D latent space is
shown where each cluster of data represents different body regions.
For the example data shown in FIG. 7, 128-dimensional latent
representations of 6000 cases classified by the convolutional VAE
using 200 cases per class were visualized and resulted in 6 body
region clusters with areas of overlap. This form of scatter plot
display may be used to aid an automated routine in identifying what
data corresponds to what body region by accounting for data
clustering.
[0078] Referring to FIG. 8, examples of fine-tuned DCNN classification accuracy and the number of mislabeled classes during ten snowball sampling iterations, before and after the refining process by GMMs with six different initial seed sizes, are shown. Varying training datasets were annotated using ten snowball iterations from varying initial seeds (5, 10, 20, 50, 100, and 200). During each iteration, annotated image data was added to the next training data pool so that classification accuracy increased gradually, as can be seen in FIG. 8.
[0079] Referring to FIGS. 9A and 9B, examples of classification
accuracy are plotted with respect to the size of training data per class. FIG. 9A reflects how the fine-tuned DCNN with transfer
learning performs better than the DCNN trained from scratch with
random weight initialization. Classification accuracy increased
rapidly from seed sizes 5 to 50, while accuracy did not increase
significantly from seed size 100 to 200. At this point, the
learning curve reached a steady state and did not significantly
change in accuracy regardless of the seed size. The learning curve
predicted 98% classification accuracy with the observed accuracy at
97.25%. FIG. 9A depicts an example learning curve fit to
classification accuracy. FIG. 9B depicts an example of
classification accuracy by fine-tuned DCNN model without and with
the addition of the snowball sampling iteration.
Example of Organ Classifier
[0080] There are increasing concerns about radiation exposure risk due to the rising number of computed tomography (CT) exams in medicine. To measure dose from CT procedures, various CT dosimetry metrics have been introduced. The computed tomography dose index (CTDI) and its derivatives, such as the volume CT dose index (CTDIvol), are primary metrics measured with polymethyl methacrylate (PMMA) standard phantoms of either 16 cm or 32 cm diameter. However, CTDIvol does not describe the actual dose a patient receives with respect to differing weights, body shapes, and sizes, and also does not provide organ dose. To estimate the organ dose of individual patients, Monte Carlo simulations have been conducted on phantom models using mathematical descriptions or image voxels, such as the Imaging Performance Assessment of CT scanner (ImPACT) CT patient dosimetry. However, none of these organ dose estimation methods provides a dose specific to the organ size and shape. In one
configuration, a method is provided for machine learning powered
personalized patient organ dose, which may take the form of
software to estimate each patient's unique organ size and shape.
The method can be used to enable organ detection, segmentation, and
volume estimation (lungs, liver, kidneys, urinary bladder, muscles,
and the like), which may then be used to control or optimize the
level of radiation dose to the patients.
[0081] Dedicated patient organ dose reports are an important part
of modern radiation safety. Current organ dose estimation
techniques use Monte Carlo simulations based on phantoms and
mathematical description or image voxels. Considering an individual
patient's variance in organ position, orientation, and shape, it is
often challenging to map a given CT slice to the slab number of a
phantom model for accurate organ dose calculation.
[0082] $CTDI_{vol}$ may be measured to indicate the CT scanner output. Generally, the $CTDI_{vol}$ measurement is conducted by imaging a 16 cm diameter phantom for the head and a 32 cm diameter phantom for the body in a given patient CT scan, following standard protocols. The $CTDI_{vol}$ may be denoted as:

$$CTDI_{vol} = \frac{CTDI_w}{\text{pitch}}, \qquad CTDI_w = \frac{1}{3} CTDI_{100}^{center} + \frac{2}{3} CTDI_{100}^{periphery}, \qquad CTDI_{100} = \frac{1}{nT} \int_{-50\,\text{mm}}^{+50\,\text{mm}} D(z)\, dz \qquad (5)$$

where n is the number of tomographic sections imaged in a single axial scan (equal to the number of data channels), T is the width of the tomographic section along the z-axis imaged by one data channel, and pitch is the ratio of the table feed per rotation to the nominal total beam width.
[0083] In some configurations, $CTDI_{vol}$ may be provided by commercial CT scanner manufacturers. In one example, the value from a GE LightSpeed VCT scanner is used to estimate organ dose. The corresponding scan parameters were 120 kVp tube voltage, 0.98 pitch, 0.5 second rotation time, and 40 mm collimation. At the given scan parameters, the normalized $CTDI_w$ (denoted ${}_{n}CTDI_w$) of the ImPACT CT dosimetry calculator was 9.5 (mGy/100 mAs), so that $CTDI_w$ is calculated as ${}_{n}CTDI_w \times \text{mA} \times \text{s} / 100$ and finally $CTDI_{vol}$ is determined by dividing $CTDI_w$ by the pitch in the above equation.
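For illustration, this arithmetic is small enough to show directly; the tube current-time product below is an assumed value:

    # Equation (5) with the example ImPACT values above.
    n_ctdi_w = 9.5    # normalized CTDI_w, mGy per 100 mAs
    pitch = 0.98      # table feed per rotation / nominal beam width
    mAs = 250.0       # tube current-time product (assumed for illustration)

    ctdi_w = n_ctdi_w * mAs / 100.0   # mGy
    ctdi_vol = ctdi_w / pitch         # mGy
    print(f"CTDI_w = {ctdi_w:.2f} mGy, CTDI_vol = {ctdi_vol:.2f} mGy")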
[0084] Referring to FIGS. 10A and 10B, a patient effective diameter
may be measured by automatic calculation based on reconstructed
axial CT images. FIG. 10B shows a CT image of a patient cross
section. FIG. 10A shows the effective diameter, which is defined as the diameter of the circle whose area is the same as that of the patient cross section, assuming the patient has an elliptical cross section as indicated in FIGS. 10A and 10B:

$$\text{Effective Diameter} = \sqrt{AP \times LAT} \qquad (6)$$

where the anterior-posterior (AP) dimension 1010 represents the thickness of the patient's body part and the lateral (LAT) dimension 1020 represents the side-to-side dimension of the body part being scanned. In some configurations, binary morphologic image techniques, such as image dilation and erosion, may be used to estimate the circle whose area equals that of the patient cross section.
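A sketch of one way to compute the effective diameter from an axial slice: threshold the body in Hounsfield units, clean the mask with the dilation and erosion suggested above, then measure the AP and LAT extents for equation (6). The threshold and iteration counts are assumptions:

    import numpy as np
    from scipy import ndimage

    def effective_diameter(ct_slice, pixel_mm, hu_threshold=-300):
        """Effective diameter (mm) of a patient cross section, equation (6)."""
        body = ct_slice > hu_threshold                      # crude body mask
        body = ndimage.binary_dilation(body, iterations=3)  # close small gaps
        body = ndimage.binary_erosion(body, iterations=3)   # restore the outline
        rows = np.where(body.any(axis=1))[0]
        cols = np.where(body.any(axis=0))[0]
        ap = (rows.max() - rows.min() + 1) * pixel_mm       # anterior-posterior
        lat = (cols.max() - cols.min() + 1) * pixel_mm      # lateral
        return np.sqrt(ap * lat)                            # equation (6)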
[0085] The ImPACT CT dosimetry calculator uses normalized organ conversion factors obtained from a Monte Carlo simulation of a mathematical phantom. This limits the calculation of the organ dose for an actual patient, who has a different body weight and size as well as unique organ shapes and volumes. In one configuration, to estimate patient organ dose across various patient weights, a correction factor (CF) may be used for each organ, using patient clinical data provided by two different manufacturers. The CF is calculated as:

$$CF_{organ} = \frac{D_{T,RD}^{n}}{D_{T,IM}^{n}} \qquad (7)$$

where $D_{T,RD}^{n}$ and $D_{T,IM}^{n}$ are the organ doses normalized by $CTDI_{vol}$, which may be provided by the vendor as described previously, such as from the eXposure organ dose software of Radimetrics (RD) and from ImPACT (IM), respectively.
[0086] Referring to FIG. 11, a flowchart is provided that sets
forth some example steps for one configuration of a personalized
organ dose estimation (PODE) method. An automated program may be
used to extract CT dose information from DICOM metadata and image
data at step 1110. The DICOM dose report may be retrieved from a
PACS, for example. The report may include CTDI.sub.vol, dose-length
product (DLP), tube current, tube voltage, exposure time, and
collimation. The scanner information and scan parameters extracted at step 1120, along with dose-relevant indexes of the CT examinations and the body part classified by machine learning at step 1130, may be used to calculate the organ dose for ImPACT CT dosimetry at step 1140. The DICOM data may be converted to an image at step 1150. The DICOM image may also be written to a standard 8-bit gray image format such as PNG. A patient effective diameter is calculated at step 1160, and a correction factor as discussed above may be calculated at step 1170. A coarse-tuning for organ dose estimation may be performed at step 1180. The converted images
through scan ranges may be fed to the inputs of a machine learning
network, such as a deep convolutional neural net, for use in
identifying and segmenting patient organs at step 1190. Organ dose
estimates may then be fine-tuned at step 1195 once the organ has
been identified and properly segmented by attributing the dose more
specifically to the appropriate organs.
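A sketch of the metadata extraction at step 1110 using pydicom; the attributes shown are standard DICOM keywords, but which of them a given scanner's dose report actually populates is vendor-dependent, so this is an assumption-laden illustration:

    import pydicom

    def extract_dose_info(path):
        """Pull dose-relevant fields from one CT DICOM file (step 1110)."""
        ds = pydicom.dcmread(path)
        fields = ["CTDIvol", "KVP", "XRayTubeCurrent", "ExposureTime",
                  "TotalCollimationWidth", "SpiralPitchFactor"]
        return {name: getattr(ds, name, None) for name in fields}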
[0087] In one example, based on the extracted scan parameters, the ImPACT dosimetry calculated 23 organ doses for each patient. The organ doses estimated by ImPACT were corrected by the correction factor (CF) based on a regression model representing the correlation between the ratio of normalized organ dose and patient effective diameter. After correcting the organ dose for patient weight, the PODE may finally be fine-tuned by organ volume and shape through an organ segmentation step.
[0088] In one example where each image slice was classified as one
of 16 organs, the method may automatically identify which patient
organs were included in that scan region for the ImPACT dosimetry
calculation. These 16 different organs were identified from axial
views of CT images and were labeled. A 22-layer deep CNN using an
NVIDIA Deep Learning GPU Training System (DIGITS) was trained and
validated with a 646 CT scan dataset. The resultant classified
organ was automatically mapped to the slab number of a mathematical
hermaphrodite phantom to determine the scan range of ImPACT CT dose
calculator.
[0089] A dataset of 12,748 CT images of 63 patients was compiled
from the clinical PACS (Picture Archiving and Communication
System). Preprocessing software was developed to annotate and
categorize these images into 16 different body parts in axial
views: Brain; Eye Lens; Nose; Salivary Gland; Thyroid; Upper Lung;
Thymus; Heart; Chest; Abdomen 1; Abdomen 2; Pelvis 1; Pelvis 2;
Urinary Bladder; Genitals; and Leg. Only the scans of regions that
could be clearly defined as one of the aforementioned body parts
were used. This is an optimized organ classification choice for the
organ dose estimation task. The gaps account for transition
regions, which were not used for the training algorithm due to
their lack of clear regional definition. Each scan has different
background image noise because of radiation dosage level, image
reconstruction filter selection, and CT scanner vendors.
[0090] In the present example with 16 organ recognition, a
GoogLeNet network using 22 convolutional layers including 9
inception modules and 4 sizes of basis or kernel filters
(7×7, 5×5, 3×3, and 1×1) was used. 75% of
images were used for training and 25% for validation. The GoogLeNet
was trained using the NVIDIA toolchain of DIGITS and the DevBox
with four TITAN GPUs with 7 TFlops of single precision, 336.5 GB/s
of memory bandwidth, and 12 GB of memory. GoogLeNet was trained
using a stochastic gradient descent (SGD) algorithm for 150 training epochs. Validation data sets were presented at every
epoch during the training process. The initial learning rate was
0.01 and decreased by three steps according to the convergence to
loss function.
[0091] A total of 646 patients were included in this retrospective
study, with a mean age of 66 years (range, 20-95 years). These
patients represented a wide spectrum of body habitus, with a mean
weight of 85.6 kg (range, 45-181 kg). FIG. 2 is a representative
example of classification results of patient organs after chest CT
segmentation. The identified organs were labeled from HEAD
(Thyroid) to TRUNK (Abdomen1), a region including both kidneys and
liver. Based on organ classification, the corresponding scan range
for the ImPACT CT dosimetry calculator was determined. For example,
the identified thyroid region (HEAD5) was mapped to slab number
171/208 of the adult phantom.
[0092] The predicted organ location provided by the deep learning
driven software also gives information about the volume of an organ
in respect to a given scan region. For example, the thyroid (HEAD 5
region) can be identified in slices 10 to 138 of the present
example with 99% accuracy, greatly improving patient-specific
radiation dose estimation.
[0093] In some configurations, the ratio of the organ dose normalized by $CTDI_{vol}$ may be assessed as a function of the patient effective diameter. The organs identified in a given scan region may show a linear relationship, whereas some organs such as the brain, eye lenses, and salivary glands may not be identified by a convolutional neural net classifier, so their organ doses are not correlated to the effective diameter. In the example above, the normalized dose coefficients decreased as the effective diameter increased for all identified organ regions. By assuming a linear relationship ($Y = a_1 X + a_0$) between the normalized dose coefficient and the effective diameter, a best-fit model for each organ may be obtained by the least squares estimate (LSE), or by any other appropriate estimator.
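For illustration, the per-organ linear fit reduces to a degree-1 polynomial least-squares fit; the diameter and coefficient arrays below are placeholders:

    import numpy as np

    # Placeholder data: effective diameters (cm) and normalized dose
    # coefficients for one organ region.
    diameter = np.array([22.0, 26.0, 30.0, 34.0, 38.0])
    coeff = np.array([1.45, 1.22, 1.01, 0.84, 0.70])

    # Least squares fit of Y = a1*X + a0 (np.polyfit returns [a1, a0]).
    a1, a0 = np.polyfit(diameter, coeff, deg=1)
    print(f"slope a1={a1:.4f}, intercept a0={a0:.4f}")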
[0094] It will be appreciated by one skilled in the art that the
model may be trained for more organ areas than have been disclosed
in the examples, such as covering all organs used in the CT organ
dose estimator. These organs may include the pancreas, stomach,
gall bladder, and colon, and may facilitate longitudinal
organ-specific dose calculations.
[0095] The present disclosure has described one or more preferred
embodiments, and it should be appreciated that many equivalents,
alternatives, variations, and modifications, aside from those
expressly stated, are possible and within the scope of the
invention.
* * * * *