U.S. patent application number 16/543621, for a method for verifying training data, training system, and computer program product, was filed with the patent office on 2019-08-19 and published on 2020-02-27.
This patent application is currently assigned to HTC Corporation. The applicant listed for this patent is HTC Corporation. Invention is credited to Che-Han Chang, Chih-Yang Chen, Hao-Cheng Kao, Edzer Lienson Wu, Chun-Hsien Yu, Shan-Yi Yu.
Application Number | 16/543621 |
Publication Number | 20200065706 |
Family ID | 69584611 |
Publication Date | 2020-02-27 |
United States Patent Application | 20200065706
Kind Code | A1
Kao; Hao-Cheng; et al. | February 27, 2020
METHOD FOR VERIFYING TRAINING DATA, TRAINING SYSTEM, AND COMPUTER PROGRAM PRODUCT
Abstract
The disclosure provides a method for verifying training data, a
training system, and a computer program product. The method
includes: providing a plurality of raw data to a plurality of
annotators; retrieving a plurality of labelled results, wherein the
labelled results include a plurality of labelled data, and the
labelled data are generated by the annotators via labelling the raw
data; determining a plurality of consistencies by comparing the
labelled results, and accordingly determining whether the labelled
results are valid for training an artificial intelligence machine;
and, in response to determining that the labelled results are valid,
determining that at least a specific part of the labelled results is
valid for training the artificial intelligence machine.
Inventors: |
Kao; Hao-Cheng (Taoyuan City, TW)
Chen; Chih-Yang (Taoyuan City, TW)
Yu; Chun-Hsien (Taoyuan City, TW)
Yu; Shan-Yi (Taoyuan City, TW)
Wu; Edzer Lienson (Taoyuan City, TW)
Chang; Che-Han (Taoyuan City, TW)
Applicant: |
Name | City | State | Country | Type
HTC Corporation | Taoyuan City | | TW |
Assignee: | HTC Corporation, Taoyuan City, TW
Family ID: | 69584611
Appl. No.: | 16/543621
Filed: | August 19, 2019
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
62722182 | Aug 24, 2018 |
62792908 | Jan 16, 2019 |
62798482 | Jan 30, 2019 |
Current U.S. Class: | 1/1
Current CPC Class: | G06T 11/20 20130101; G06K 9/6215 20130101; G06T 2207/20081 20130101; G06T 2210/12 20130101; G06K 9/6262 20130101; G06N 20/00 20190101; G06T 11/60 20130101; G06K 9/66 20130101; G06K 9/6227 20130101; G06T 7/70 20170101; G06K 9/6256 20130101; G06N 5/04 20130101
International Class: | G06N 20/00 20060101 G06N020/00; G06K 9/62 20060101 G06K009/62
Claims
1. A method for verifying training data, comprising: providing a
plurality of raw data to a plurality of annotators; retrieving a
plurality of labelled results, wherein the labelled results
comprise a plurality of labelled data, and the labelled data are
generated by the annotators via labelling the raw data; determining
a plurality of consistencies by comparing the labelled results, and
accordingly determining whether the labelled results are valid for
training an artificial intelligence machine; and in response to
determining that the labelled results are valid, determining at
least a specific part of the labelled results are valid for
training the artificial intelligence machine.
2. The method according to claim 1, wherein the labelled results
comprise a first labelled result generated by labelling one of the
raw data with at least one object category chosen by the
annotators, and the step of determining the consistencies by
comparing the labelled results, and accordingly determining whether
the labelled results are valid for training an artificial
intelligence machine comprises: generating a recommend result for
the first labelled result by comparing the first labelled data in
the first labelled result of the annotators; determining a first
consistency score of each annotator on the first labelled result by
comparing each labelled data with the recommend result; determining
a second consistency score of the first labelled result based on
the first consistency score of each annotator; and in response to
determining that the second consistency score of the first labelled
result is higher than a consistency score threshold, determining
that the first labelled result is valid for training the artificial
intelligence machine.
3. The method according to claim 2, wherein the step of generating
the recommend result for the first labelled result by comparing the
first labelled data in the first labelled result of the annotators
comprises: determining a specific object category of the at least
one object category as the recommend result, wherein the specific
object category has a highest number in the first labelled
data.
4. The method according to claim 2, wherein the annotators
comprise a first annotator, the first labelled result comprises a
specific labelled data labelled by the first annotator, and the
step of determining the first consistency score of each annotator
on the first labelled result by comparing each labelled data with
the recommend result comprises: in response to determining that the
specific labelled data is identical to the recommend result,
determining the first consistency score of the first annotator to
be 1; in response to determining that the specific labelled data is
different from the recommend result, determining the first
consistency score of the first annotator to be 0.
5. The method according to claim 2, wherein the step of determining
the second consistency score of the first labelled result based on
the first consistency score of each annotator comprises:
determining an average of the first consistency score of each
annotator as the second consistency score.
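As an illustration only, the scoring recited in claims 2 to 5 can be sketched in Python as follows. The function names, the tie-breaking rule (the first category encountered wins), and the threshold values used in the example are assumptions and are not part of the claims.

```python
from collections import Counter


def recommend_result(labelled_data):
    """Claim 3: pick the object category that appears most often.

    Tie-breaking by first occurrence is an assumption; the claims do not
    say how ties are resolved.
    """
    return Counter(labelled_data).most_common(1)[0][0]


def first_consistency_scores(labelled_data, recommended):
    """Claim 4: 1 if an annotator's label equals the recommend result, else 0."""
    return [1 if label == recommended else 0 for label in labelled_data]


def second_consistency_score(scores):
    """Claim 5: average of the per-annotator first consistency scores."""
    return sum(scores) / len(scores)


def is_valid_for_training(labelled_data, consistency_score_threshold):
    """Claim 2: valid when the second consistency score exceeds the threshold."""
    recommended = recommend_result(labelled_data)
    scores = first_consistency_scores(labelled_data, recommended)
    return second_consistency_score(scores) > consistency_score_threshold


# One labelled result: three annotators labelling the same raw data.
labels = ["D", "C", "C"]
print(recommend_result(labels))             # most frequent category
print(is_valid_for_training(labels, 0.9))   # hypothetical threshold
```

With a strict hypothetical threshold such as 0.9, a labelled result on which one of three annotators disagrees would be rejected, while a unanimous one would be accepted.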
6. The method according to claim 1, wherein the annotators comprise
a first annotator and each of the labelled results comprises a
specific labelled data labelled by the first annotator, and the
method further comprises: determining a first consistency score of
the first annotator on each of the labelled results by comparing
the specific labelled data and a recommend result of each labelled
result; determining a third consistency score of the first
annotator based on the first consistency score of the first
annotator on each of the labelled results; in response to
determining that the third consistency score of the first annotator
on each of the labelled results is higher than an annotator score
threshold, determining that the first annotator is reliable for
labelling the raw data or the labelled data labelled by the first
annotator in the labelled results is valid for training the
artificial intelligence machine.
7. The method according to claim 6, wherein the step of determining
the third consistency score of the first annotator based on the
first consistency score of the first annotator on each of the
labelled results comprises: determining an average of the first
consistency score of the first annotator on each of the labelled
results as the third consistency score.
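A minimal sketch of the annotator-level score of claims 6 and 7 is given below. It assumes that the recommend result of each labelled result is the most frequent label (as in claim 3), that each labelled result is stored as a mapping from an annotator name to a label, and that the threshold is a hypothetical value; none of these representation choices come from the claims.

```python
from collections import Counter


def third_consistency_score(annotator, labelled_results):
    """Average, over all labelled results, of the annotator's 0/1 agreement
    with the recommend result (assumed here to be the majority label)."""
    scores = []
    for result in labelled_results:
        recommended = Counter(result.values()).most_common(1)[0][0]
        scores.append(1 if result[annotator] == recommended else 0)
    return sum(scores) / len(scores)


def is_reliable(annotator, labelled_results, annotator_score_threshold):
    """Claim 6: reliable when the third consistency score exceeds the threshold."""
    score = third_consistency_score(annotator, labelled_results)
    return score > annotator_score_threshold


# Two labelled results, each holding the labels of annotators A, B and C.
results = [
    {"A": "C", "B": "C", "C": "C"},
    {"A": "D", "B": "C", "C": "C"},
]
print(third_consistency_score("A", results))
print(is_reliable("B", results, 0.8))  # hypothetical threshold
```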
8. The method according to claim 1, wherein the raw data comprises
a first raw data and a second raw data identical to the first raw
data, and a first intra-annotator consistency of the consistencies
is proportional to a first consistency of the first annotator
labelling the first raw data and the second raw data.
9. The method according to claim 1, wherein the labelled data are
generated by labelling at least one region of interest with at
least one bounding region in the raw data; wherein for a first raw
data of the raw data, the first raw data is labelled with at least
one bounding region to generate a first labelled result of the
labelled results.
10. The method according to claim 9, wherein the first labelled
result has at least one tag corresponding to at least one object
category, and the step of determining the consistencies by
comparing the labelled results, and accordingly determining whether
the labelled results are valid for training the artificial
intelligence machine comprises: for the first labelled result,
identifying at least one target object of each object category,
wherein each target object is labelled by at least two of the
annotators; determining a consistency score of the first labelled
result based on the at least one target object of each object
category and the at least one bounding region labelled by the
annotators; and in response to determining that the consistency score of the first
labelled result is higher than a score threshold, determining that
the first labelled result is valid for training the artificial
intelligence machine.
11. The method according to claim 10, wherein the annotators
comprise a first annotator and a second annotator, the first
annotator labels the first raw data with at least one first
bounding region corresponding to a first object category, the
second annotator labels the first raw data with at least one second
bounding region corresponding to the first object category, and the
step of identifying the at least one target object of each object
category comprises: determining a plurality of region pairs,
wherein each region pair comprises one of the at least one first
bounding region and one of the at least one second bounding region;
determining a plurality of correlation coefficients that
respectively correspond to the region pairs, wherein each
correlation coefficient characterizes a similarity of one of the
region pairs; merging the at least one first bounding region and
the at least one second bounding region into a plurality of groups
based on the correlation coefficients that are higher than a
correlation threshold, wherein each group at least comprises one of
the at least one first bounding region and one of the at least one
second bounding region; for each group, generating a reference
region for identifying one of the at least one first target object
based on the at least one first bounding region and the at least
one second bounding region in each group.
12. The method according to claim 9, wherein the step of merging
the at least one first bounding region and the at least one second
bounding region into the second groups based on the correlation
coefficients that are higher than the correlation threshold
comprises: (a) retrieving a specific correlation coefficient,
wherein the specific correlation coefficient is highest among the
correlation coefficients that are higher than the correlation
threshold; (b) retrieving a specific region pair corresponding to
the specific correlation coefficient from the region pairs, wherein
the specific region pair comprises a first specific region and a
second specific region; (c) determining whether one of the first
specific region and the second specific region belongs to an
existing group; (d) in response to determining that neither the
first specific region nor the second specific region belongs to the
existing group, creating a new group based on the first specific
region and the second specific region; (e) in response to
determining that one of the first specific region and the second
specific region belongs to the existing group, determining whether
another of the first specific region and the second specific region
corresponds to the same annotator with a member of the existing
group; (f) in response to determining that the another of the first
specific region and the second specific region does not correspond
to the same annotator with the member of the existing group, adding
the another of the first specific region and the second specific
region into the existing group; (g) excluding the specific
correlation coefficient from the correlation coefficients and
excluding the specific region pair from the region pairs; and (h)
in response to determining that the region pairs are not empty,
returning to step (a).
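The merging loop of steps (a) to (h) may be sketched as follows. This is a hedged illustration rather than the claimed implementation: a bounding region is represented as a hypothetical (annotator, index) tuple, the correlation coefficients are made-up numbers, and cases that the claim does not address (both regions already grouped, or the other region sharing an annotator with a group member) are simply skipped.

```python
def merge_regions(pairs, correlation_threshold):
    """Greedily merge bounding regions into groups.

    pairs: list of (coefficient, region_1, region_2), where each region is a
    hypothetical (annotator, index) tuple.
    """
    # Steps (a), (g), (h): visit pairs above the threshold, highest first.
    candidates = sorted(
        (p for p in pairs if p[0] > correlation_threshold),
        key=lambda p: p[0],
        reverse=True,
    )
    groups = []    # each group is a list of regions
    group_of = {}  # region -> index of its group in `groups`
    for _, r1, r2 in candidates:
        in1, in2 = r1 in group_of, r2 in group_of
        if not in1 and not in2:
            # Step (d): neither region is grouped yet, so create a new group.
            group_of[r1] = group_of[r2] = len(groups)
            groups.append([r1, r2])
        elif in1 != in2:
            # Steps (e), (f): add the other region only if no member of the
            # existing group comes from the same annotator.
            member, other = (r1, r2) if in1 else (r2, r1)
            index = group_of[member]
            if all(m[0] != other[0] for m in groups[index]):
                groups[index].append(other)
                group_of[other] = index
        # Both regions already grouped: not covered by the claim, left as-is.
    return groups


# Hypothetical coefficients for regions drawn by annotators A, B and C.
pairs = [
    (0.9, ("A", 0), ("B", 0)),
    (0.8, ("B", 0), ("C", 0)),
    (0.7, ("A", 1), ("B", 0)),
    (0.6, ("A", 1), ("C", 1)),
    (0.3, ("A", 0), ("C", 1)),
]
print(merge_regions(pairs, 0.5))
```

Sorting once and iterating is equivalent to repeatedly retrieving and excluding the highest remaining coefficient, which is why the loop has no explicit "return to step (a)".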
13. The method according to claim 9, wherein the step of
determining the consistency score of the first labelled result
based on the at least one target object of each object category and
the at least one bounding region labelled by the annotators
comprises: for each annotator, calculating at least one first
consistency score of the at least one object category based on the
at least one target object of each object category and the at least
one bounding region, and taking an average of the at least one
first consistency score to obtain a second consistency score; and
taking an average of the second consistency score of each annotator
to obtain the consistency score of the first labelled result.
14. The method according to claim 9, further comprising: for a
certain annotator, calculating a first consistency score of the at
least one object category based on the at least one target object
of each object category and the at least one bounding region;
determining whether the certain annotator is reliable for labelling
or whether to exclude the labelled data labelled by the certain
annotator from training the artificial intelligence machine based
on the first consistency score; in response to determining that the
first consistency score is lower than a certain threshold,
determining that the certain annotator is unreliable for labelling
or excluding the labelled data labelled by the certain annotator
from training the artificial intelligence machine; and in response
to determining that the first consistency score is not lower than
the certain threshold, determining that the certain annotator is
reliable for labelling or not excluding the labelled data labelled
by the certain annotator from training the artificial intelligence
machine.
15. The method according to claim 13, wherein the step of
calculating the at least one first consistency score of the at
least one object category based on the at least one target object
of each object category and the at least one bounding region
comprises: for a first annotator, retrieving a first number,
wherein the first number characterizes a number of the at least one
bounding region of the first annotator that matches at least one
identified target object of a first object category; retrieving a
second number, wherein the second number is a sum of the first
number, a third number, and a fourth number, wherein the third
number is a number of the at least one identified target object
that does not match any of the at least one bounding region of the
first annotator, and the fourth number is a number of the at least
one bounding region of the first annotator that does not match any
of the at least one identified target object; and dividing the
first number with the second number to obtain the first consistency
score of the first annotator on the first object category.
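The ratio described in claim 15 can be written as a short function. The parameter names are illustrative assumptions, and returning 1 when all three counts are zero is also an assumption, chosen to mirror the no-target-object case of claim 17.

```python
def category_consistency_score(matched, missed_targets, unmatched_regions):
    """First consistency score of one annotator on one object category.

    matched           -- first number: the annotator's bounding regions that
                         match an identified target object
    missed_targets    -- third number: identified target objects that match
                         none of the annotator's bounding regions
    unmatched_regions -- fourth number: the annotator's bounding regions that
                         match no identified target object
    """
    # Second number: the sum of the first, third and fourth numbers.
    total = matched + missed_targets + unmatched_regions
    if total == 0:
        # Assumption: nothing to find and nothing labelled counts as fully
        # consistent, in the spirit of claim 17.
        return 1.0
    return matched / total


# Example: 3 matched regions, 1 missed target, 1 spurious region.
print(category_consistency_score(3, 1, 1))
```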
16. The method according to claim 9, wherein in response to
determining that a first annotator does not label any bounding region in the first raw data,
generating at least one virtual bounding region outside of the
first raw data, wherein each virtual bounding region corresponds to
one of the object categories.
17. The method according to claim 16, wherein in response to
determining that no target object exists in the first raw data,
determining a first consistency score of the first annotator on
each object category to be 1.
18. The method according to claim 1, wherein in response to
determining that the labelled results are invalid for training the
artificial intelligence machine, the method further comprises
creating a notification related to the labelled results; wherein
the annotators comprise a first annotator, a second annotator, and
a third annotator, the consistencies comprise a first
inter-annotator consistency between the first annotator and the
second annotator, a second inter-annotator consistency between the
first annotator and the third annotator, and a third
inter-annotator consistency between the second annotator and the
third annotator, and after the step of creating the notification
related to the labelled results, further comprises: in response to
determining that the first inter-annotator consistency is higher
than a first threshold, feeding a consistently labelled data
between a first labelled data set and a second labelled data set to
the artificial intelligence machine, wherein the first labelled data set
comprises a plurality of first labelled data labelled in the
labelled results by the first annotator, and the second labelled
data set comprises a plurality of second labelled data labelled in
the labelled results by the second annotator.
19. The method according to claim 1, further comprising: training
the artificial intelligence machine with the specific part of the
labelled results to generate an artificial intelligence model.
20. A training system, comprising: a storage circuit, storing a
plurality of modules; and a processor, coupled to the storage circuit
and accessing the modules to perform the following steps: providing a
plurality of raw data to a plurality of annotators; retrieving a
plurality of labelled results, wherein the labelled results
comprise a plurality of labelled data, and the labelled data are
generated by the annotators via labelling the raw data; determining
a plurality of consistencies by comparing the labelled results, and
accordingly determining whether the labelled results are valid for
training an artificial intelligence machine; in response to
determining that the labelled results are valid, determining at
least a specific part of the labelled results are valid for
training the artificial intelligence machine.
21. A computer program product for use in conjunction with a
training system, the computer program product comprising a computer
readable storage medium and an executable computer program
mechanism embedded therein, the executable computer program
mechanism comprising instructions for: providing a plurality of raw
data to a plurality of annotators; retrieving a plurality of
labelled results, wherein the labelled results comprise a
plurality of labelled data, and the labelled data are generated by
the annotators via labelling the raw data; determining a plurality
of consistencies by comparing the labelled results, and accordingly
determining whether the labelled results are valid for training an
artificial intelligence machine; in response to determining that
the labelled results are valid, determining at least a specific
part of the labelled results are valid for training the artificial
intelligence machine.
Description
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims the priority benefit of U.S.
provisional application Ser. No. 62/722,182, filed on Aug. 24,
2018, U.S. provisional application Ser. No. 62/792,908, filed on
Jan. 16, 2019, and U.S. provisional application Ser. No.
62/798,482, filed on Jan. 30, 2019. The entirety of each of the
above-mentioned patent applications is hereby incorporated by
reference herein and made a part of this specification.
BACKGROUND OF THE INVENTION
1. Field of the Invention
[0002] The present invention generally relates to training
mechanisms for artificial intelligence and, in particular, to a method
for verifying training data, a training system, and a computer program
product.
2. Description of Related Art
[0003] In the field of artificial intelligence (AI), the quality of
the training data used to train the AI machine plays an important
role. If the training data is accurately labelled, the AI machine
may better learn from the training data, and hence the accuracy of
the generated AI model may be correspondingly improved. However, if
the training data is inaccurately labelled, the learning process of
the AI machine may be impaired, and hence the performance of the
generated AI model may be correspondingly degraded.
[0004] Therefore, it is crucial for those skilled in the art to design
a mechanism for determining whether the training data are good
enough to be used to train the AI machine.
SUMMARY OF THE INVENTION
[0005] Accordingly, the present disclosure provides a method for
verifying training data, including: providing a plurality of raw
data to a plurality of annotators; retrieving a plurality of
labelled results, wherein the labelled results include a plurality
of labelled data, and the labelled data are generated by the
annotators via labelling the raw data; determining a plurality of
consistencies by comparing the labelled results, and accordingly
determining whether the labelled results are valid for training an
artificial intelligence machine; in response to determining that
the labelled results are valid, determining at least a specific
part of the labelled results are valid for training the artificial
intelligence machine.
[0006] The present disclosure provides a training system including
a storage circuit and a processor. The storage circuit stores a
plurality of modules. The processor is coupled to the storage
circuit and accesses the modules to perform the following steps:
providing a plurality of raw data to a plurality of annotators;
retrieving a plurality of labelled results, wherein a first
labelled result of the labelled results includes a plurality of
labelled data, and the labelled data are generated by the
annotators via labelling one of the raw data; determining a
plurality of consistencies based on the labelled results, and
accordingly determining whether the labelled results are valid for
training an artificial intelligence machine; in response to
determining that the labelled results are valid, saving at least a
specific part of the labelled results as a database for training
the artificial intelligence machine.
[0007] The present disclosure provides a computer program product
for use in conjunction with a training system, the computer program
product including a computer readable storage medium and an
executable computer program mechanism embedded therein, the
executable computer program mechanism including instructions for:
providing a plurality of raw data to a plurality of annotators;
retrieving a plurality of labelled results, wherein a first
labelled result of the labelled results includes a plurality of
labelled data, and the labelled data are generated by the
annotators via labelling one of the raw data; determining a
plurality of consistencies based on the labelled results, and
accordingly determining whether the labelled results are valid for
training an artificial intelligence machine; in response to
determining that the labelled results are valid, saving at least a
specific part of the labelled results as a database for training
the artificial intelligence machine.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] The accompanying drawings are included to provide a further
understanding of the invention, and are incorporated in and
constitute a part of this specification. The drawings illustrate
embodiments of the invention and, together with the description,
serve to explain the principles of the invention.
[0009] FIG. 1A is a functional diagram of a training system for
verifying training data according to an embodiment of the
disclosure.
[0010] FIG. 1B shows a schematic view of a training system for
verifying training data according to another embodiment of the
disclosure.
[0011] FIG. 2 illustrates the method for verifying training data
according to an embodiment of the disclosure.
[0012] FIG. 3 is a schematic view of labelled results created by
annotators according to an embodiment of the disclosure.
[0013] FIG. 4A shows a schematic view of an annotator with stable
labelling performance according to an embodiment of the
disclosure.
[0014] FIG. 4B shows a schematic view of an annotator with unstable
labelling performance according to an embodiment of the
disclosure.
[0015] FIG. 5 shows a mechanism of obtaining the similarity of
ROIs.
[0016] FIG. 6 is a flow chart of the method for verifying the
annotators according to an exemplary embodiment of the
disclosure.
[0017] FIG. 7 is a schematic view of comparing the labelled results
of the annotators with correct answers according to an embodiment
of the disclosure.
[0018] FIG. 8 shows a conventional way of labelling the data for
training AI machines to generate the AI model.
[0019] FIG. 9 is a flow chart of the method for labelling raw data
based on pre-labelled data according to one embodiment of the
disclosure.
[0020] FIG. 10A and FIG. 10B are schematic views of implementing
the method of FIG. 9.
[0021] FIG. 11 illustrates a labelled result according to an
embodiment of the disclosure.
[0022] FIG. 12 shows a mechanism of determining whether the first
labelled result is valid for training an AI machine.
[0023] FIG. 13A shows the bounding regions labelled by two of the
annotators according to FIG. 11.
[0024] FIG. 13B shows all of the region pairs whose correlation
coefficients are higher than the correlation threshold according to
FIG. 11 and FIG. 13A.
[0025] FIG. 13C shows the mechanism of merging the bounding regions
into the groups.
[0026] FIG. 13D shows the mechanism of merging the bounding regions
into the groups according to FIG. 13B.
[0027] FIG. 14 shows the obtained reference regions of each group
according to FIG. 13D.
[0028] FIG. 15 shows all bounding regions according to FIG. 11 and
FIG. 14.
[0029] FIG. 16 shows a schematic view of handling a situation of no
target object according to an embodiment of the disclosure.
[0030] FIG. 17 shows a schematic view of handling a situation of no
target object according to another embodiment of the
disclosure.
[0031] FIG. 18 shows a schematic view of handling a situation of no
target object according to yet another embodiment of the
disclosure.
DESCRIPTION OF THE EMBODIMENTS
[0032] Reference will now be made in detail to the present
preferred embodiments of the invention, examples of which are
illustrated in the accompanying drawings. Wherever possible, the
same reference numbers are used in the drawings and the description
to refer to the same or like parts.
[0033] See FIG. 1A, which is a functional diagram of a training
system for verifying training data according to an embodiment of
the disclosure. In various embodiments, the training system 100 may
be implemented as, for example, a smart phone, a personal computer
(PC), a notebook PC, a netbook PC, a tablet PC, or other electronic
device, but the disclosure is not limited thereto. In various
embodiments, the training system 100 may be an artificial
intelligence (AI) platform that provides at least two labelling
tools for annotators to perform labelling work. In the following
descriptions, the mechanisms related to the labelling tools are
referred to as image classification and object detection, wherein
image classification may exemplarily correspond to FIG. 3, and
object detection may exemplarily correspond to FIG. 11, but the
disclosure is not limited thereto.
[0034] In the present embodiment, the training system 100 includes
a storage circuit 102 and a processor 104. The storage circuit 102
may be one or a combination of a stationary or mobile random access
memory (RAM), read-only memory (ROM), flash memory, hard disk, or
any other similar device, and records a plurality of programs
or modules that can be executed by the processor 104.
[0035] The processor 104 may be coupled to the storage circuit 102.
In various embodiments, the processor 104 may be, for example, a
general purpose processor, a special purpose processor, a
conventional processor, a digital signal processor (DSP), a
plurality of microprocessors, one or more microprocessors in
association with a DSP core, a controller, a microcontroller,
Application Specific Integrated Circuits (ASICs), Field
Programmable Gate Array (FPGA) circuits, any other type of
integrated circuit (IC), a state machine, an ARM-based processor,
and the like.
[0036] The processor 104 may access the programs stored in the
storage circuit 102 to perform the method for verifying training
data of the present disclosure, and the detailed discussions will
be provided hereinafter.
[0037] Roughly speaking, the training system 100 may be used to
train an AI machine based on a plurality of training data after
verifying the training data via the method proposed in the
following, wherein the training data may be implemented by a
plurality of labelled results created by a plurality of annotators
A, B, and C via labelling raw data.
[0038] In some other embodiments, the training system 100 may be
modified into the form shown in FIG. 1B. Referring to FIG. 1B, a
training system 100a may include a plurality of electronic devices
110a, 110b, 110c, and a server 120. The electronic devices
110a-110c (e.g., personal computers or the like) may be
respectively used by the annotators A, B, and C to create the labelled
results by labelling raw data and to transmit the labelled
results of the annotators A, B, and C to the server 120.
Thereafter, the server 120 may use the labelled results from the
electronic devices 110a-110c to train the AI machine after
verifying the labelled results via the method proposed in the
following.
[0039] Since the mechanisms performed by the processor 104 in FIG.
1A and the server 120 in FIG. 1B are basically the same, the
following discussion focuses on the operations
performed by the processor 104 for brevity.
[0040] Referring to FIG. 2 and FIG. 3, FIG. 2 illustrates the
method for verifying training data according to an embodiment of
the disclosure, and FIG. 3 is a schematic view of labelled results
created by annotators according to an embodiment of the disclosure.
The method shown in FIG. 2 may be implemented by the training
system 100 of FIG. 1A. The details of each step of FIG. 2 will be
described below with reference to the elements shown in FIG. 1A and
the scenario shown in FIG. 3.
[0041] In step S210, a plurality of raw data R01, R02, . . . , R09,
and R10 may be provided to annotators A, B, and C. In the present
embodiment, each of the raw data R01-R10 may be an image of a cat
or a dog for the annotators A-C to identify, but the disclosure is
not limited thereto. In other embodiments, the raw data may be
other types of images, such as human images, medical images,
etc.
[0042] For example, when the raw data R01 is presented to the
annotator A, the annotator A may recognize the raw data R01 as an
image with a cat, and hence the annotator A may use "C" (which may
be regarded as a labelled data LD011) to label the raw data R01.
Since the annotators B and C may also recognize the raw
data R01 as an image with a cat, they may both use "C"
(which may be regarded as labelled data LD012 and LD013) to label
the raw data R01. For another example, when the raw data R10 is
presented to the annotator A, the annotator A may recognize the raw
data R10 as an image with a dog, and hence the annotator A may use
"D" (which may be regarded as a labelled data LD101) to label the
raw data R10. Since the annotators B and C may recognize
the raw data R10 as an image with a cat, they may both use
"C" (which may be regarded as labelled data LD102 and LD103) to
label the raw data R10. The meanings of the other labelled data in FIG.
3 may be deduced from the above teachings and are not
repeated herein.
[0043] For ease of the following discussion, the labelled data
of the annotators on the same raw data will be collectively
referred to as a labelled result, and the labelled data of one
annotator on all of the raw data will be collectively referred to as
a labelled data set.
[0044] Under this situation, the labelled data LD011, LD012, and
LD013 of the annotators A, B, C on the raw data R01 may be
collectively referred as a labelled result LR01, and the labelled
data LD101, LD102, and LD103 of the annotators A, B, C on the raw
data R10 may be collectively referred as a labelled result LR10.
The labelled data made by the annotator A on the raw data R01-R10
may be referred as a labelled data set LDS1; the labelled data made
by the annotator B on the raw data R01-R10 may be referred as a
labelled data set LDS2; and the labelled data made by the annotator
C on the raw data R01-R10 may be referred as a labelled data set
LDS3, but the disclosure is not limited thereto.
[0045] In step S220, a plurality of labelled results may be
retrieved. In the present embodiment, the labelled results of the
annotators A-C on labelling the raw data R01-R10, such as the
labelled results LR01 and LR10, may be retrieved after the
annotators A-C finish their labelling tasks on the raw data
R01-R10.
[0046] In one embodiment, the confidence level (CL) of the
annotators A-C on labelling one of the raw data may be further
retrieved. The confidence level may depend on the consistency of
the annotators A-C on labelling one of the raw data. For example,
since all of the annotators A-C label "C" for the raw data R01, the
consistency of the annotators A-C on labelling the raw data R01 is
high. Therefore, the confidence level related to the raw data R01
may be "H" (which stands for high). For another example, since the
annotators A-C inconsistently labelled the raw data R10, the
confidence level related to the raw data R10 may be labelled as "M"
(which stands for moderate). In other embodiments, the confidence
level related to each of the raw data may be used as a reference
while the labelled results are used to train an artificial
intelligence (AI) machine, which will be discussed later.
[0047] In step S230, a plurality of consistencies may be determined
based on the labelled results, and whether the labelled results are
valid for training the AI machine may be accordingly determined. If
yes, in step S240, at least a specific part of the labelled results
may be saved as a database for training the AI machine; otherwise,
in step S250, a notification related to the labelled results may be
created.
[0048] In one embodiment, the consistencies of the annotators A-C
may include a first inter-annotator consistency (represented by
R.sub.AB) between the annotators A and B. The first inter-annotator
consistency may be obtained by comparing the labelled data set LDS1
with the labelled data set LDS2 and proportional to a first
consistency between the labelled data set LDS1 and the labelled
data set LDS2. As shown in FIG. 3, since the annotators A and B
consistently labelled 6 of the raw data out of all of the raw data
R01-R10, the first inter-annotator consistency (i.e., R.sub.AB) may
be characterized as 0.6.
[0049] In addition, the consistencies of the annotators A-C may
include a second inter-annotator consistency (represented by
R.sub.AC) between the annotators A and C and a third
inter-annotator consistency (represented by R.sub.BC). The second
inter-annotator consistency may be obtained by comparing the
labelled data set LDS1 with the labelled data set LDS3 and
proportional to a second consistency between the labelled data set
LDS1 and the labelled data set LDS3. As shown in FIG. 3, since the
annotators A and C consistently labelled 8 of the raw data out of
all of the raw data R01-R10, the second inter-annotator consistency
(i.e., R.sub.AC) may be characterized as 0.8. Similarly, the third
inter-annotator consistency may be obtained by comparing the
labelled data set LDS2 with the labelled data set LDS3 and
proportional to a third consistency between the labelled data set
LDS2 and the labelled data set LDS3. As shown in FIG. 3, since the
annotators B and C consistently labelled 8 of the raw data out of
all of the raw data R01-R10, the third inter-annotator consistency
(i.e., R.sub.BC) may be characterized as 0.8.
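The pairwise comparison described above may be sketched as follows.
This is a minimal illustration only, not the disclosed
implementation. FIG. 3 is not reproduced here, so the label table
LABELS is a hypothetical one chosen to be consistent with the
values quoted in the text (R.sub.AB=0.6, R.sub.AC=0.8,
R.sub.BC=0.8); the function names are likewise assumptions.

```python
from itertools import combinations

# Hypothetical labels for the raw data R01-R10 (one character per raw data).
# Chosen only to match the consistencies quoted in the text; the actual
# FIG. 3 table may differ.
LABELS = {
    "A": list("CCDDDDDCCD"),  # labelled data set LDS1
    "B": list("CDDCDDCCCC"),  # labelled data set LDS2
    "C": list("CDDDDDDCCC"),  # labelled data set LDS3
}


def pairwise_agreement(first, second):
    """Fraction of raw data on which two labelled data sets agree."""
    if len(first) != len(second):
        raise ValueError("labelled data sets must cover the same raw data")
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(first)


def inter_annotator_consistencies(labels):
    """Agreement for every pair of annotators, keyed by (name, name)."""
    return {
        (x, y): pairwise_agreement(labels[x], labels[y])
        for x, y in combinations(sorted(labels), 2)
    }


if __name__ == "__main__":
    for pair, value in inter_annotator_consistencies(LABELS).items():
        print(pair, value)
```

A plain agreement fraction is used here because it matches the
worked numbers in the text; chance-corrected measures such as
Cohen's Kappa would give different values for the same labels.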
[0050] In one embodiment, an intra-annotator consistency of each of
the annotators may be further calculated to characterize the
labelling consistency of each of the annotators. Specifically, the
raw data may be modified to include a first raw data and a second
raw data identical to the first raw data. In this case, the
intra-annotator consistency of a certain annotator may be
proportional to the consistency of the certain annotator labelling
the first raw data and the second raw data and may be obtained in
the same way as the inter-annotator consistency. For
example, the intra-annotator consistency may be obtained from the
similarity between the regions of interest (ROI) labelled by the
certain annotator in the first raw data and the second raw data.
The more the ROIs overlap with each other, the higher the
intra-annotator consistency is. If the intra-annotator consistency
of the certain annotator is high, it represents that the performance of
the certain annotator on labelling the raw data is stable, and vice
versa. See FIGS. 4A and 4B for further discussions.
[0051] FIG. 4A shows a schematic view of an annotator with stable
labelling performance according to an embodiment of the disclosure.
In FIG. 4A, a set of raw data 410 is provided to an annotator,
wherein the raw data 410 includes, for example, three identical raw
data 410a for the annotator to perform the labelling task. As shown
in FIG. 4A, since the annotator consistently labels the three raw
data 410a as the class "C", the processor 104 may obtain a high
intra-annotator consistency of the annotator after calculating the
intra-class correlation coefficient.
[0052] On the contrary, FIG. 4B shows a schematic view of an
annotator with unstable labelling performance according to an
embodiment of the disclosure. In FIG. 4B, the raw data 410 with
three identical raw data 410a may be provided to another annotator
to perform the labelling task. As shown in FIG. 4B, since the other
annotator inconsistently labels the three raw data 410a as different
classes, the processor 104 may obtain a low intra-annotator
consistency of the annotator after calculating the intra-class
correlation coefficient.
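For illustration, the stability check of FIGS. 4A and 4B may be
approximated as below. Note that this sketch deliberately swaps in
a simpler measure than the intra-class correlation coefficient
named above: it reports the fraction of agreeing pairs among the
labels that one annotator gave to identical raw data. The function
name and the example labels are assumptions.

```python
from itertools import combinations


def intra_annotator_agreement(repeated_labels):
    """Fraction of label pairs that agree for copies of the same raw data.

    This is a simple pairwise-agreement stand-in, not the intra-class
    correlation coefficient.
    """
    pairs = list(combinations(repeated_labels, 2))
    if not pairs:
        raise ValueError("need at least two labels of the same raw data")
    return sum(a == b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Stable annotator: the three identical raw data all receive "C".
    print(intra_annotator_agreement(["C", "C", "C"]))
    # Unstable annotator (hypothetical labels): the copies receive mixed classes.
    print(intra_annotator_agreement(["C", "D", "C"]))
```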
[0053] In other embodiments, if the annotators are asked to label
the raw data in different ways, such as labelling a region of
interest (ROI) by labelling a bounding region in each of the raw
data, the way of obtaining/calculating the consistencies may be
correspondingly modified. For example, if the annotators are asked
to label the region of a tumor in a computed tomography (CT) image,
the inter-annotator consistency between any two of the annotators
may be characterized by the similarities of the ROIs labelled by
the two annotators in each of the raw data. In various embodiments,
these similarities may be obtained by algorithms such as Dice
similarity indices/Jaccard index method, Cohen's Kappa method,
Fleiss' Kappa method, Krippendorff's alpha method, and/or the
intraclass correlation coefficient method, but the disclosure is
not limited thereto.
[0054] For further discussion, Dice similarity indices/Jaccard
index method shown in FIG. 5 will be used as an example for
determining the inter-annotator consistency. In FIG. 5, a raw data
500 is provided for the annotators A and B, and the annotators A
and B are asked to label ROIs (e.g., a tumor) in the raw data 500.
The ROIs 510a and 510b are respectively labelled by the annotators
A and B. Based on the principles of Dice similarity indices/Jaccard
index method, the processor 104 may determine the size of the union
520 (represented by P.sub.U pixels) of the ROIs 510a and 510b and
the size of the intersection 530 (represented by P.sub.I pixels) of
the ROIs 510a and 510b. Afterwards, the similarity between the ROIs
510a and 510b may be obtained by P.sub.I/P.sub.U (i.e.,
intersection over union, IoU), but the disclosure is not limited
thereto.
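The ratio P.sub.I/P.sub.U may be computed, for example, as in the
following sketch, which represents each ROI as a set of pixel
coordinates. The helper rectangle_pixels, the rectangular ROIs, and
the convention that two empty ROIs have a similarity of 1 are
assumptions made for illustration. The Dice index is shown
alongside for comparison.

```python
def rectangle_pixels(top, left, height, width):
    """Pixel coordinates covered by an axis-aligned rectangular ROI."""
    return {
        (row, col)
        for row in range(top, top + height)
        for col in range(left, left + width)
    }


def intersection_over_union(roi_a, roi_b):
    """Jaccard index: P_I / P_U for two pixel sets."""
    union = roi_a | roi_b
    if not union:
        return 1.0  # assumption: two empty ROIs are treated as identical
    return len(roi_a & roi_b) / len(union)


def dice_index(roi_a, roi_b):
    """Dice similarity: 2 * P_I / (|A| + |B|) for two pixel sets."""
    total = len(roi_a) + len(roi_b)
    if total == 0:
        return 1.0  # assumption: two empty ROIs are treated as identical
    return 2 * len(roi_a & roi_b) / total


if __name__ == "__main__":
    roi_510a = rectangle_pixels(0, 0, 4, 4)  # hypothetical ROI of annotator A
    roi_510b = rectangle_pixels(2, 2, 4, 4)  # hypothetical ROI of annotator B
    print(intersection_over_union(roi_510a, roi_510b))
    print(dice_index(roi_510a, roi_510b))
```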
[0055] In one embodiment, if all of the consistencies are higher
than a threshold, the labelled results may be determined to be
valid for training the AI machine and be fed to the AI machine.
Therefore, the AI machine may learn from the labelled results about
how to identify future raw data as, for example, a cat or a dog,
and accordingly generate an AI model for identifying the future raw
data.
[0056] In addition, when the labelled results are fed to the AI
machine, the labelled results may be assigned different
weightings based on the related confidence levels. For example, in
FIG. 3, since the confidence levels related to the raw data R01,
R03, R05, R06, R08, and R09 are "H" (high), the labelled results
related thereto may be assigned a higher weighting, which
makes the AI machine give more consideration thereto. On the other
hand, since the confidence levels related to the raw data R02, R04,
R07, and R10 are "M" (moderate), the labelled results related
thereto may be assigned a lower weighting, which makes the AI
machine give less consideration thereto, but the disclosure is not
limited thereto.
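One possible way of deriving the confidence levels and weightings
is sketched below. The rule used (unanimous labels give "H",
anything else gives "M"), the numeric weights 1.0 and 0.5, and the
label table are all assumptions; the text only requires that a
higher confidence level lead to a higher weighting.

```python
# Hypothetical labelled results for R01-R10, one (A, B, C) tuple per raw data,
# chosen to match the confidence levels quoted in the text.
LABELLED_RESULTS = list(zip("CCDDDDDCCD", "CDDCDDCCCC", "CDDDDDDCCC"))

# Assumed numeric weights; the text does not specify concrete values.
WEIGHTS = {"H": 1.0, "M": 0.5}


def confidence_level(labelled_result):
    """Return "H" when all annotators agree, otherwise "M"."""
    return "H" if len(set(labelled_result)) == 1 else "M"


def sample_weights(labelled_results, weights=WEIGHTS):
    """Per-raw-data training weights derived from the confidence levels."""
    return [weights[confidence_level(result)] for result in labelled_results]


if __name__ == "__main__":
    levels = [confidence_level(r) for r in LABELLED_RESULTS]
    print(levels)
    print(sample_weights(LABELLED_RESULTS))
```

Such a weight list could then be passed to a training routine that
accepts per-sample weights.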
[0057] In another embodiment, if only a first number of the
consistencies are higher than the threshold, it may be determined
that only the specific part of the labelled results is valid for
training the AI machine. In one embodiment, the specific part may
include a specific labelled data set of a specific annotator whose
related consistencies are higher than a threshold. For example, if
the consistencies related to the annotator A, such as R.sub.AB,
R.sub.AC, and the intra-consistency of the annotator A, are higher
than a threshold, the labelled data set LDS1 may be provided to
train the AI machine, but the disclosure is not limited thereto. In
addition, when the specific part of the labelled results is fed to
the AI machine, the labelled results therein may also be assigned
different weightings based on the related confidence levels, for
which reference may be made to the teachings above.
[0058] In yet another embodiment, if a second number of the
consistencies are lower than the threshold, it may be determined
that the labelled results are not valid for training the AI
machine, and hence step S250 may be subsequently performed to
create the notification. The notification may be regarded as a
report, which may be shown to, for example, the administrators of
the annotators A-C, such that the administrators may be aware of
the performances of the annotators, but the disclosure is not
limited thereto.
[0059] In different embodiments, the notification may include an
unqualified annotator whose related consistencies are lower than a
threshold. For example, if the consistencies related to the
annotator A, such as R.sub.AB, R.sub.AC, and the intra-consistency
of the annotator A are lower than a threshold, it represents that
the performance of the annotator A on labelling the raw data may be
unsatisfying, and hence the annotator A may be highlighted in the
notification for the administrators to know.
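The per-annotator check described above may be sketched as
follows, assuming only inter-annotator consistencies are
considered and a hypothetical threshold of 0.7. An annotator is
treated as qualified when every consistency related to that
annotator is higher than the threshold; the remaining annotators
are collected for the notification. The function name and the
threshold value are assumptions.

```python
def split_annotators(consistencies, threshold):
    """Split annotators into (qualified, unqualified) lists.

    consistencies maps a pair of annotator names to their consistency.
    An annotator is qualified only if all related consistencies exceed
    the threshold.
    """
    annotators = sorted({name for pair in consistencies for name in pair})
    qualified, unqualified = [], []
    for name in annotators:
        related = [value for pair, value in consistencies.items() if name in pair]
        if all(value > threshold for value in related):
            qualified.append(name)
        else:
            unqualified.append(name)
    return qualified, unqualified


if __name__ == "__main__":
    # Values quoted in the text for R_AB, R_AC and R_BC.
    consistencies = {("A", "B"): 0.6, ("A", "C"): 0.8, ("B", "C"): 0.8}
    qualified, unqualified = split_annotators(consistencies, threshold=0.7)
    print("feed labelled data sets of:", qualified)
    print("highlight in notification:", unqualified)
```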
[0060] Additionally, or alternatively, the notification may include
a questionable raw data which is inconsistently labelled by the
annotators. Specifically, sometimes the annotators fail to
achieve acceptable consistencies because the quality of the raw
data is too poor for the raw data to be recognized (e.g.,
blurry images). Therefore, the questionable raw data that are
(highly) inconsistently labelled by the annotators may be added to
the notification for the administrators to know. After considering
the questionable raw data, the administrators may determine whether
to discard the questionable raw data. In some other embodiments,
the processor 104 may additionally determine the quality of the raw
data based on the resolution, signal-to-noise ratio, and/or the
contrast thereof and accordingly exclude the raw data with low
quality. In one embodiment, if the intra-annotator consistency of a
certain annotator is high, the notification may highlight the
certain annotator for his/her good work. However, if the
intra-annotator consistency of a certain annotator is low, the
notification may highlight the certain annotator's labelled results
for the administrators to decide whether to keep the certain
annotator's labelled results, send the certain annotator back to be
trained again, and/or check the quality of the raw data.
[0061] In other embodiments, even if the notification is generated,
the method may further determine whether there are some of the
labelled results that can be used to train the AI machine.
[0062] In one embodiment, if only the first inter-annotator
consistency (i.e., R.sub.AB) is higher than a threshold (i.e., the
second and the third inter-annotator consistencies are low), a
consistently labelled data between the labelled data set LDS1 and
the labelled data set LDS2 may be fed to the AI machine. That is,
as long as one of the inter-annotator consistencies (e.g.,
R.sub.AB) is high enough, the consistently labelled data of the two
related annotators (e.g., the annotators A and B) may be still
considered as valuable for training the AI machine.
[0063] In another embodiment, if only the first inter-annotator
consistency and the second inter-annotator consistency related to
the annotator A are higher than a threshold, the labelled data set
LDS1 may be fed to the AI machine. Specifically, since only the
consistencies (e.g., R.sub.AB and R.sub.AC) related to the
annotator A are high enough, it represents that the labelled data
set LDS1 may have a higher consistency with each of the labelled
data sets LDS2 and LDS3. Therefore, the labelled data set LDS1 may
be still considered as valuable for training the AI machine. In
addition, since the third inter-annotator consistency (i.e.,
R.sub.BC) failed to meet the threshold, the notification may be
added with another content about the raw data that are
inconsistently labelled by the annotators B and C for the
administrators to check.
[0064] In yet another embodiment, if only the first inter-annotator
consistency and the second inter-annotator consistency related to
the annotator A are lower than a threshold, the labelled data sets
LDS2 and LDS3 may be fed to the AI machine. Specifically, since
only the consistencies (e.g., R.sub.AB and R.sub.AC) related to the
annotator A are unsatisfying, it represents that the labelled data
set LDS1 may have a lower consistency with each of the labelled
data sets LDS2 and LDS3. Therefore, the labelled data sets LDS2 and
LDS3 may be still considered as valuable for training the AI
machine. In addition, the notification may be supplemented with
further content about the labelled data set LDS1 of the annotator
A, whose work is highly inconsistent with that of the other
annotators, for the administrators to check.
[0065] In other embodiments, the consistencies mentioned in FIG. 2 may
be defined in other ways, which may lead to different
implementations of step S230. Detailed discussions would be
provided hereinafter, and FIG. 3 would be regarded as an example
for better understandings.
[0066] For each of the raw data R01-R10 of FIG. 3, each of the
annotators A-C may choose one of a plurality of object categories
therefor. As taught in the previous embodiments, the considered
object categories may include "C" and "D". Taking the raw data R01
as an example, the annotators A-C may respectively choose C, C, C
for the raw data R01, and hence the labelled data LD011, LD012 and
LD013 may be accordingly generated and collectively form the
labelled result LR01. Taking the raw data R10 as another example,
the annotators A-C may respectively choose D, C, C for the raw data
R10, and hence the labelled data LD101, LD102 and LD103 may be
accordingly generated and collectively form the labelled result
LR10.
[0067] Roughly speaking, the processor 104 may determine a
consistency score for each of the raw data R01-R10 based on the
labelled results in FIG. 3, and accordingly determine whether the
raw data R01-R10 are valid for training the AI machine. For ease of
understanding, the labelled result LR02 and the related labelled
data LD021, LD022 and LD023 would be used as an example for the
following discussions, but the disclosure is not limited
thereto.
[0068] In one embodiment, the processor 104 may generate a
recommend result for the labelled result LR02 by comparing the
labelled data LD021, LD022 and LD023 in the labelled result LR02
of the annotators A-C. For example, the processor 104 may determine
a specific object category of the object categories as the
recommend result, wherein the specific object category has a
highest number in the labelled data LD021, LD022 and LD023. In the
labelled result LR02, the number of the object category "C" is 1,
and the number of the object category "D" is 2. Accordingly, the
processor 104 may take the object category "D" with the highest
number as the recommend result. In some embodiments, the recommend
result may be a deterministic answer that is given in advance to
be compared with the labelled data, but the disclosure
is not limited thereto.
[0069] Afterwards, the processor 104 may determine a first
consistency score of each annotator A-C on the labelled result LR02
by comparing each labelled data LD021, LD022 and LD023 with the
recommend result (e.g., "D"). In one embodiment, the processor 104
may determine whether the labelled data LD021 of the annotator A is
identical to the recommend result. In FIG. 3, since the labelled
data LD021 is different from the recommend result, the processor
104 may determine the first consistency score of the annotator A is
0. In one embodiment, the processor 104 may determine whether the
labelled data LD022 of the annotator B is identical to the
recommend result. In FIG. 3, since the labelled data LD022 is
identical to the recommend result, the processor 104 may determine
the first consistency score of the annotator B is 1. Similarly, the
processor 104 may determine the first consistency score of the
annotator C is 1. In other embodiments, the first consistency score
of each of the annotators A-C may also be obtained in other ways based on
the requirements of the designers.
[0070] Next, the processor 104 may determine a second consistency
score of the labelled result LR02 based on the first consistency
score of each annotator A-C. In one embodiment, the processor 104
may determine an average of the first consistency score of each
annotator A-C as the second consistency score. In FIG. 3, since the
first consistency scores of the annotators A-C are 0, 1, and 1, the
second consistency score of the labelled result LR02 may be
calculated as approximately 0.66 (i.e., (0+1+1)/3, truncated to
two decimal places), but the disclosure is not limited thereto.
[0071] Subsequently, the processor 104 may determine whether the
second consistency score of the labelled result LR02 is higher than
a consistency score threshold. If yes, the processor 104 may
determine that the labelled result LR02 is valid for training the
artificial intelligence machine, and vice versa.
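The steps above for the labelled result LR02 may be sketched as
follows. This is an illustrative sketch only: the function names,
the tie-breaking behaviour (the first label encountered wins), and
the threshold of 0.5 are assumptions not stated in the text.

```python
from collections import Counter


def recommend_result(labelled_result):
    """Object category with the highest count among the labelled data.

    Ties are broken in favour of the category seen first (an assumption).
    """
    return Counter(labelled_result).most_common(1)[0][0]


def first_consistency_scores(labelled_result, recommended):
    """1 for each annotator whose label equals the recommend result, else 0."""
    return [1 if label == recommended else 0 for label in labelled_result]


def second_consistency_score(labelled_result):
    """Average of the first consistency scores for one labelled result."""
    recommended = recommend_result(labelled_result)
    scores = first_consistency_scores(labelled_result, recommended)
    return sum(scores) / len(scores)


def is_valid_for_training(labelled_result, consistency_score_threshold):
    """True when the second consistency score exceeds the threshold."""
    return second_consistency_score(labelled_result) > consistency_score_threshold


if __name__ == "__main__":
    lr02 = ["C", "D", "D"]  # labelled data LD021, LD022, LD023
    print(recommend_result(lr02))
    print(first_consistency_scores(lr02, recommend_result(lr02)))
    print(second_consistency_score(lr02))
    print(is_valid_for_training(lr02, consistency_score_threshold=0.5))
```

The unrounded second consistency score here is 2/3, which the text
writes as 0.66.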
[0072] In other embodiments, the disclosure further proposes a
mechanism for determining whether an annotator is reliable for
performing the labelling tasks or the labelled data labelled by the
annotator in the labelled results is valid for training the
artificial intelligence machine. For ease of understanding, the
annotator A would be used as an example, but the disclosure is not
limited thereto.
[0073] Specifically, the processor 104 may determine a recommend
result for each of the labelled results. Taking FIG. 3 as an
example, the recommend results of the labelled results corresponding
to the raw data R01-R10 are C, D, D, D, D, D, D, C, C, C. The
detail of obtaining the recommend result of each labelled result
may be referred to the previous teachings, which would not be
repeated herein.
[0074] Next, the processor 104 may determine the first consistency
score of the annotator A on each of the labelled results by
comparing the labelled data (labelled by the annotator A in each
labelled result) and the recommend result of each labelled result.
Based on the teaching in the above, the first consistency scores of
the annotator A on the labelled results are 1, 0, 1, 1, 1, 1, 1, 1,
1, 0.
[0075] Afterwards, the processor 104 may determine a third
consistency score of the annotator A based on the first consistency
score of the annotator A on each of the labelled results.
[0076] In one embodiment, the processor 104 may take an average of
the first consistency score of the annotator A on each of the
labelled results as the third consistency score, which may be 0.8
(i.e., (1+0+1+1+1+1+1+1+1+0)/10=0.8).
[0077] With the third consistency score of the annotator A, the
processor 104 may determine whether the third consistency score of
the annotator A is higher than an annotator score threshold. If
yes, the processor 104 may determine that the annotator A is
reliable for labelling the raw data, and vice versa.
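The annotator-level check may be illustrated as below. The
recommend results are those listed in the text; the labels of the
annotator A are hypothetical values chosen so that the first
consistency scores come out as 1, 0, 1, 1, 1, 1, 1, 1, 1, 0. The
function names and the threshold of 0.7 are assumptions.

```python
# Recommend results for R01-R10 as listed in the text.
RECOMMENDED = list("CDDDDDDCCC")
# Hypothetical labels of the annotator A, consistent with the quoted scores.
ANNOTATOR_A = list("CCDDDDDCCD")


def third_consistency_score(labels, recommended):
    """Average of an annotator's first consistency scores over all results."""
    if len(labels) != len(recommended):
        raise ValueError("one label is required per labelled result")
    firsts = [1 if label == rec else 0 for label, rec in zip(labels, recommended)]
    return sum(firsts) / len(firsts)


def is_reliable(labels, recommended, annotator_score_threshold):
    """True when the third consistency score exceeds the threshold."""
    return third_consistency_score(labels, recommended) > annotator_score_threshold


if __name__ == "__main__":
    print(third_consistency_score(ANNOTATOR_A, RECOMMENDED))
    print(is_reliable(ANNOTATOR_A, RECOMMENDED, annotator_score_threshold=0.7))
```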
[0078] In other embodiments, the processor 104 may calculate a
fourth consistency score of the dataset that includes the labelled
results of FIG. 3. Specifically, based on the teachings in the
above, the second consistency scores of the labelled results
corresponding to the raw data R01-R10 are 1, 0.66, 1, 0.66, 1, 1,
0.66, 1, 1, 0.66. In this case, the processor 104 may determine an
average of the second consistency score of the labelled results
corresponding to the raw data R01-R10 as the fourth consistency
score of the dataset of FIG. 3. Therefore, the fourth consistency
score of the dataset would be 0.86 (i.e.,
(1+0.66+1+0.66+1+1+0.66+1+1+0.66)/10=0.86). With the fourth
consistency score, the processor 104 may determine whether the
dataset as a whole is qualified for training by determining whether
the fourth consistency score is higher than a consistency score
threshold. If yes, the processor 104 may determine that the dataset
is qualified for training, and vice versa, but the disclosure is
not limited thereto.
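The dataset-level average may be sketched as follows. The second
consistency scores are entered as exact thirds rather than the
truncated 0.66 used in the text, so the result is about 0.867
instead of 0.86. The function names and the threshold of 0.8 are
assumptions.

```python
def fourth_consistency_score(second_scores):
    """Average of the second consistency scores over the whole dataset."""
    if not second_scores:
        raise ValueError("the dataset must contain at least one labelled result")
    return sum(second_scores) / len(second_scores)


def dataset_is_qualified(second_scores, consistency_score_threshold):
    """True when the fourth consistency score exceeds the threshold."""
    return fourth_consistency_score(second_scores) > consistency_score_threshold


if __name__ == "__main__":
    # Second consistency scores for R01-R10; 2/3 is written as 0.66 in the text.
    second_scores = [1, 2 / 3, 1, 2 / 3, 1, 1, 2 / 3, 1, 1, 2 / 3]
    print(round(fourth_consistency_score(second_scores), 3))
    print(dataset_is_qualified(second_scores, consistency_score_threshold=0.8))
```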
[0079] As can be known from the above, the method of the disclosure
may determine whether the labelled results of the annotators are
valid to train the AI machine based on the consistencies (e.g., the
inter-annotator consistency or/and the intra-annotator consistency)
of the annotators. Further, if there exist questionable raw data,
unqualified annotators, or the like, a notification may be accordingly
created and provided to the administrators. As such, the
administrators may be aware of the performances of the annotators
and the quality of the raw data and correspondingly take actions,
such as excluding the questionable raw data and the unqualified
annotators. Moreover, when the labelled results are fed to the AI
machine, the labelled results may be assigned different
weightings based on the related confidence levels, such that the AI
machine may learn more from the labelled results with higher
confidence levels.
[0080] In some other embodiments, the concept described in the
above may be used to verify whether the annotators are qualified
(or well-trained) to label raw data. See FIG. 6 and FIG. 7 for
further discussions.
[0081] FIG. 6 is a flow chart of the method for verifying the
annotators according to an exemplary embodiment of the disclosure,
and FIG. 7 is a schematic view of comparing the labelled results of
the annotators with correct answers according to an embodiment of
the disclosure. The method shown in FIG. 6 may be implemented by
the training system 100 of FIG. 1A. The details of each step of
FIG. 6 will be described below with reference to the elements shown
in FIG. 1A and the scenario shown in FIG. 7.
[0082] In step S410, a plurality of raw data R01', R02', . . . ,
R09', and R10' may be provided to annotators B and C. In the
present embodiment, each of the raw data R01'-R10' may be an image
of a cat or a dog for the annotators B and C to identify, but the
disclosure is not limited thereto. In other embodiments, the raw
data may include other types of images, such as human images,
medical images, etc.
[0083] For example, when the raw data R01' is presented to the
annotator B, the annotator B may recognize the raw data R01' as an
image with a cat, and hence the annotator B may use "C" (which may
be regarded as a labelled data LD012') to label the raw data R01'.
For the annotator C, since the annotator C may also recognize the
raw data R01' as an image with a cat, and hence the annotator C may
also use "C" (which may be regarded as labelled data LD013') to
label the raw data R01'.
[0084] For ease of the following discussions, the labelled data
of the annotators on the same raw data will be collectively
referred to as a labelled result, and the labelled data of one
annotator on all of the raw data will be collectively referred to
as a labelled data set.
[0085] Under this situation, the labelled data LD012' and LD013' of
the annotators B and C on the raw data R01' may be collectively
referred to as a labelled result LR01', and the labelled data
LD102' and LD103' of the annotators B and C on the raw data R10'
may be collectively referred to as a labelled result LR10'. The
labelled data made by the annotator B on the raw data R01'-R10'
may be referred to as a labelled data set LDS2', and the labelled
data made by the annotator C on the raw data R01'-R10' may be
referred to as a labelled data set LDS3', but the disclosure is not
limited thereto.
[0086] In step S420, a plurality of labelled results may be
retrieved. In the present embodiment, the labelled results of the
annotators B and C on labelling the raw data R01'-R10', such as the
labelled results LR01' and LR10', may be retrieved after the
annotators B and C finish their labelling tasks on the raw data
R01'-R10'.
[0087] In step S430, the labelled results of each of the annotators
may be compared with a plurality of training data to obtain a
plurality of consistency scores. In FIG. 7, the training data may
be characterized as the correct answers of labelling the raw data
R01'-R10'. That is, each of the raw data R01'-R10' will be
pre-labelled as a cat or a dog by, for example, the administrators
of the annotators B and C. For the annotator B, since 6 of
the labelled data in the labelled data set LDS2' are consistent
with the correct answers, the consistency score of the annotator B
may be 0.6 (i.e., 6/10). For the annotator C, since 8 of
the labelled data in the labelled data set LDS3' are consistent
with the correct answers, the consistency score of the annotator C
may be 0.8 (i.e., 8/10).
[0088] In other embodiments, if the annotators B and C are asked to
label the raw data in different ways, such as labelling an ROI in
each of the raw data, the way of obtaining/calculating the
consistency scores may be correspondingly modified. For example, if
the annotator B is asked to label the region of a tumor in a CT
image, the consistency score of the annotator B may be
characterized by the similarities between the ROI labelled by the
annotator B and the ROI (i.e., the training data or the correct
answer) pre-labelled by the administrators in each of the raw data.
In various embodiments, these similarities may be obtained by
algorithms such as Dice similarity indices/Jaccard index method,
Cohen's Kappa method, Fleiss' Kappa method, Krippendorff's alpha
method, and/or the intraclass correlation coefficient method, but
the disclosure is not limited thereto.
[0089] In step S440, the annotators with consistency scores higher
than a threshold may be determined to be qualified, and in step
S450 a notification may be created based on the consistency scores
of the annotators. For example, if the annotator B is determined to
be qualified, it may represent that the performance of the
annotator B is good enough to label other unknown raw data.
Therefore, the annotator B can be, for example, dispatched to do
the works discussed in the embodiments of FIG. 2 and FIG. 3, but
the disclosure is not limited thereto.
[0090] On the other hand, if the consistency score of the annotator
B fails to meet the threshold, it may represent that the
performance of the annotator B is not good enough to label other
unknown raw data. Therefore, the annotator B can be highlighted in
the notification for the administrators to know, and the
administrators may take actions such as asking the annotator B to
take specific training sessions for enhancing the
skills of labelling the raw data, such as medical trainings for
identifying a tumor in a CT image, but the disclosure is not
limited thereto.
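Steps S430 to S450 may be sketched as follows. FIG. 7 is not
reproduced here, so the correct answers and the labels of the
annotators B and C are hypothetical values chosen to yield the
consistency scores 0.6 and 0.8 quoted above. The function name, the
notification structure, and the threshold of 0.7 are assumptions.

```python
# Hypothetical correct answers (training data) for R01'-R10'.
CORRECT_ANSWERS = list("CDCDCDCDCD")
# Hypothetical labelled data sets; B matches 6 answers and C matches 8.
LABELLED_DATA_SETS = {
    "B": list("CDCDCDDCDC"),  # LDS2'
    "C": list("CDCDCDCDDC"),  # LDS3'
}


def verify_annotators(labelled_data_sets, correct_answers, threshold):
    """Score each annotator against the answers and build a notification."""
    scores = {
        name: sum(l == a for l, a in zip(labels, correct_answers)) / len(correct_answers)
        for name, labels in labelled_data_sets.items()
    }
    qualified = sorted(name for name, score in scores.items() if score > threshold)
    notification = {
        "scores": scores,
        "unqualified": sorted(n for n, s in scores.items() if s <= threshold),
    }
    return qualified, notification


if __name__ == "__main__":
    qualified, notification = verify_annotators(
        LABELLED_DATA_SETS, CORRECT_ANSWERS, threshold=0.7
    )
    print("qualified:", qualified)
    print("notification:", notification)
```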
[0091] In one embodiment, those questionable raw data that are
frequently mislabelled by the annotators B and C may also be
highlighted in the notification for the administrators to know
which of the raw data may be too difficult to be identified.
Therefore, the administrators may accordingly decide whether to
exclude the questionable raw data from the raw data used to verify
other annotators. In some other embodiments, the questionable data
may be automatically excluded by the processor 104 based on some
specific criteria.
[0092] In some other embodiments, before the annotators B and C
perform the labelling tasks on the raw data, the annotators B and C
may be asked to participate in the specific training sessions for
enhancing the skills of labelling the raw data, such as medical
trainings for identifying a tumor in a CT image. If some of the
annotators are determined to be unqualified, the administrators may
correspondingly modify the contents of the training sessions and
ask these annotators to take the modified training sessions again,
but the disclosure is not limited thereto.
[0093] As can be known from the above, the method of the disclosure
may verify whether the annotators are qualified to label other
unknown raw data based on the comparison result between their
labelled results and the training data (e.g., the correct answers).
Further, the method may create a notification based on the
consistency scores of the annotators, such that the administrators
may be aware of questionable raw data, unqualified annotators, or
the like and correspondingly take actions such as modifying
training sessions, excluding questionable raw data, and asking
unqualified annotators to take training sessions again, etc.
[0094] See FIG. 8, which shows a conventional way of labelling the
data for training AI machines to generate the AI model. As shown in
FIG. 8, if raw data 710 include data belonging to a first class
data set (e.g., an image of a cat) or a second class data set
(e.g., an image of a dog), each of the raw data 710 has to be
manually labelled by users as "C" (i.e., cat) or "D" (i.e., dog),
which is a labor intensive work for the users. After all of the raw
data 710 are labelled, the labelled data 711 which are labelled as
"C" and the labelled data 712 which are labelled as "D" may be used
to train the AI machine to generate the AI model.
[0095] However, as the number of the raw data 710 grows, the users
have to spend more and more effort to label the raw data
710, which makes the labelling tasks inefficient.
[0096] Accordingly, the disclosure proposes a method for labelling
raw data based on pre-labelled data, which may use a plurality of
the pre-labelled data to train the AI machine and use the
correspondingly generated AI model to assist the work of labelling
the remaining raw data. See FIG. 9, FIG. 10A, and FIG. 10B for
detailed discussions.
[0097] FIG. 9 is a flow chart of the method for labelling raw data
based on pre-labelled data according to one embodiment of the
disclosure, and FIG. 10A and FIG. 10B are schematic views of
implementing the method of FIG. 9. The method shown in FIG. 9 may
be implemented by the training system 100 of FIG. 1A. The details
of each step of FIG. 9 will be described below with reference to
the elements shown in FIG. 1A and the scenario shown in FIGS. 10A
and 10B.
[0098] In step S710, a plurality of raw data 810 may be provided.
In the present embodiment, the raw data 810 are assumed to be
formed by data belonging to a first class data set or a second
class data set. For the ease of following discussions, the data
belonging to the first class data set are assumed to be images of
cats (class "C"), and the data belonging to the second class data
set are assumed to be images of dogs (class "D"), but the
disclosure is not limited thereto.
[0099] In step S720, a plurality of first labelled results 820 may
be retrieved. In the present embodiment, the first labelled results
820 may be the pre-labelled data made by, for example,
professionals or the annotators on labelling ROIs in a first part
of the raw data 810. For example, the annotators may be asked to
label a small batch of the raw data 810 that may be regarded as
belonging to the first class data set. That is, the first labelled
results 820 may be the images labelled by the annotators as images
of cats.
[0100] In step S730, an AI machine may be trained with the first
labelled results 820 to generate an AI model. In the present
embodiment, since the AI machine is trained with the first labelled
results 820, which are the images belonging to the class "C", the
AI machine may learn whether to identify unknown images as class
"C". In this case, the correspondingly generated AI model may
determine whether to identify unknown images as class "C" in
response to receiving the unknown images, but the disclosure is not
limited thereto.
[0101] In step S740, the AI model may be used to label a second
part of the raw data 810 as a plurality of second labelled results
821, as shown in FIG. 10B. That is, after the AI machine has been
trained with the first labelled results 820 to generate the AI
model, the AI model may be used to assist the tasks on labelling the
second part of the raw data 810 as the second labelled results 821.
Therefore, the data that can be used to train the AI machine to
identify the images of the class "C" may be increased.
[0102] In step S750, the AI machine may be trained with the second
labelled results 821 to update the AI model. That is, since the
data that can be used to train the AI machine to identify the
images of the class "C" are used to further train the AI machine,
the updated AI model may better learn whether to identify unknown
images as class "C". In some embodiments, before the second
labelled results 821 are used to train the AI machine, the second
labelled results 821 may be double checked by the annotators, but
the disclosure is not limited thereto.
[0103] In other embodiments, the updated AI model may be further
used to label a third part of the raw data 810 as a plurality of
third labelled results, and the AI machine may be further trained
with the third labelled results to further update the AI model.
Further, the updated AI model may be used to assist the tasks on
labelling other parts of the raw data 810 as the class "C". The
steps in the above can be repeatedly performed until all of the
class "C" data in the raw data 810 have been completely labelled,
but the disclosure is not limited thereto.
[0104] As a result, the whole tasks on labelling the class "C" data
in the raw data 810 may be less labor intensive and more
efficient.
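The loop of steps S720 to S750 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: each raw datum is reduced to a single number, and the "AI machine" is replaced by a toy rule (`train_threshold_model`, a hypothetical helper) that accepts values close to the mean of the data labelled so far. The disclosure does not specify the model, the batch size, or the tolerance.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def train_threshold_model(positives: Sequence[float],
                          tolerance: float) -> Callable[[float], bool]:
    """Toy stand-in for training the AI machine (assumption, not the
    disclosed model): accept values near the mean of the positives."""
    center = mean(positives)
    return lambda x: abs(x - center) <= tolerance


def iterative_labelling(raw: Sequence[float],
                        seed_positives: Sequence[float],
                        batch_size: int,
                        tolerance: float = 1.0
                        ) -> Tuple[List[float], Callable[[float], bool]]:
    labelled = list(seed_positives)                       # first labelled results (S720)
    model = train_threshold_model(labelled, tolerance)    # generate the AI model (S730)
    remaining = [x for x in raw if x not in labelled]
    for start in range(0, len(remaining), batch_size):
        batch = remaining[start:start + batch_size]
        newly = [x for x in batch if model(x)]            # model labels the next part (S740)
        if newly:
            labelled.extend(newly)
            model = train_threshold_model(labelled, tolerance)  # update the model (S750)
    return labelled, model


# Hypothetical data: values near 1 play the role of class "C".
raw_data = [1.0, 1.2, 0.9, 5.0, 5.2, 1.1, 4.9]
class_c, model = iterative_labelling(raw_data, seed_positives=[1.0, 1.2], batch_size=2)
print(sorted(class_c))
```

In a real system the optional double check by the annotators would sit between the labelling step and the retraining step; it is omitted here to keep the sketch short.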
[0105] In other embodiments, after all of the class "C" data (i.e.,
the data belonging to the first class data set) are identified by the AI model,
a plurality of fourth labelled results may be retrieved. In the
present embodiment, the fourth labelled results may be the
pre-labelled data made by, for example, professionals or the
annotators on labelling ROIs in a fourth part of the raw data 810.
For example, the annotators may be asked to label a small batch of
the raw data 810 that may be regarded as belonging to the second
class data set, i.e., the class "D" data. That is, the fourth
labelled results may be the images labelled by the annotators as
images of dogs.
[0106] Next, the AI machine may be trained with the fourth labelled
results to update the AI model. In the present embodiment, since
the AI machine is trained with the fourth labelled results, which
are the images belonging to the class "D", the AI machine may learn
whether to identify unknown images as class "D". In this case, the
correspondingly generated AI model may determine whether to
identify unknown images as class "D" in response to receiving the
unknown images, but the disclosure is not limited thereto.
[0107] Similar to the above teachings, the AI model may be used to
label a fifth part of the raw data as a plurality of fifth labelled
results, wherein the fifth labelled results are identified by the
AI model to be categorized as the second class data set,
i.e., the class "D" data. Afterwards, the AI machine may be further
trained with the fifth labelled results to update the AI model. The
steps in the above can be repeatedly performed until all of the
class "D" data in the raw data 810 have been completely labelled,
but the disclosure is not limited thereto.
[0108] In one embodiment, after all of the raw data 810 have been
labelled and used to train the AI machine, the AI model may be used
to identify an unknown data as belonging to the first class data set
(i.e., a class "C" data) or the second class data set (i.e., a class
"D" data).
[0109] In other embodiments, if the raw data 810 include other
class data sets, the concept introduced in the above may be used to
train the AI model to assist the tasks on labelling the raw data
810. In this case, the generated AI model may be further used to
identify an unknown data as belonging to the first class data set
(i.e., a class "C" data), the second class data set (i.e., a class
"D" data), or other class data set, but the disclosure is not
limited thereto.
[0110] Consequently, the whole tasks on labelling the class "C"
data and the class "D" data in the raw data 810 may be less labor
intensive and more efficient.
[0111] In other embodiments, the mechanism of the training system
100 determining whether the labelled results are valid for training
an artificial intelligence machine based on a plurality of
consistencies of the annotators in step S230 can be implemented in
other ways, and the related details would be provided
hereinafter.
[0112] Specifically, as mentioned in the above, as the annotators
are provided with a raw data, the annotators may be asked to label
ROIs (e.g., a tumor) in the raw data. In the following discussions,
each of the raw data provided by the processor 104 to the
annotators may include one or more target objects for the annotators
to label, and each target object may belong to one of a plurality of
object categories. For example, a raw data may be an image that includes
target objects such as a cat and a dog, wherein the cat and the dog
belong to different categories. As the processor 104 provides the
annotators with the raw data, the annotators may label the raw data
by, for example, labelling bounding regions with a tag in the raw
data to label the target objects. The bounding regions with tags
labelled in the raw data may be referred to as labelled data, and
the raw data with labelled data may be referred to as a labelled
result. In various embodiments, the tag of each bounding region
may indicate the chosen category. For example, after the annotator
labels a bounding region in the image, the annotator may choose one
of the object categories as the tag (e.g., cat or dog) of the
bounding region to specify which of the object categories
corresponds to the labelled bounding region.
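The terms defined above (a bounding region with a tag is a labelled data; a raw data together with its labelled data is a labelled result) can be pictured as a small data structure. This is only a sketch: the class names, the field names, and the rectangle encoding `(x_min, y_min, x_max, y_max)` are assumptions, since the disclosure does not fix a representation.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class LabelledData:
    """One bounding region with its tag, drawn by one annotator."""
    annotator: int                           # i in b_{i,j}
    index: int                               # j in b_{i,j}
    box: Tuple[float, float, float, float]   # assumed (x_min, y_min, x_max, y_max)
    tag: str                                 # chosen object category, e.g. "cat" or "dog"


@dataclass
class LabelledResult:
    """A raw data item together with all labelled data drawn on it."""
    raw_data_id: str
    labelled_data: List[LabelledData] = field(default_factory=list)

    def by_annotator(self, annotator: int) -> List[LabelledData]:
        """Labelled data contributed by one annotator."""
        return [d for d in self.labelled_data if d.annotator == annotator]

    def categories(self) -> Set[str]:
        """Object categories that appear as tags in this result."""
        return {d.tag for d in self.labelled_data}


# Hypothetical usage with made-up coordinates.
result = LabelledResult("image-1")
result.labelled_data.append(LabelledData(1, 1, (0, 0, 10, 10), "cat"))
result.labelled_data.append(LabelledData(2, 1, (1, 1, 11, 11), "dog"))
```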
[0113] See FIG. 11, which illustrates a labelled result according
to an embodiment of the disclosure. In the scenario of FIG. 11, a
first raw data 1111 may be provided by the processor 104 to three
annotators 1, 2, and 3 to label. In the embodiment, the first raw
data 1111 is assumed to be an image that only includes target
objects (e.g., dogs) of one object category for ease of
discussions, but the disclosure is not limited thereto. In other
embodiments, the first raw data may be an image that includes
multiple target objects belonging to various object categories.
[0114] With the first raw data 1111, the annotators 1, 2, and 3
may, for example, draw bounding regions to label the target object
therein. In FIG. 11, a bounding region b.sub.i,j represents that it
is the j-th bounding region labelled by the annotator i. For
example, the bounding region b.sub.1,1 is the first bounding region
labelled by the annotator 1, the bounding region b.sub.2,1 is the
first bounding region labelled by the annotator 2, and the bounding
region b.sub.3,2 is the second bounding region labelled by the
annotator 3, etc.
[0115] Note that each annotator performs their labelling operation
individually without seeing the others' labelling operations, and
FIG. 11 shows all of the bounding regions of the annotators together
as a visual aid.
[0116] After the annotators 1, 2, and 3 finish their labelling
operations, the first raw data with the shown bounding regions
(i.e., labelled data) may be referred to as a first labelled result
1110 and retrieved by the processor 104. With the first labelled
result 1110, the processor 104 may accordingly determine whether
the first labelled result 1110 is valid for training an AI machine
based on a plurality of consistencies of the annotators.
[0117] See FIG. 12, which shows a mechanism of determining whether
the first labelled result is valid for training an AI machine. In
step S1210, for the first labelled result 1110, the processor 104
may identify at least one target object of each object category,
wherein each target object is commonly labelled by at least two of
the annotators.
[0118] Specifically, the processor 104 may determine a plurality of
region pairs, wherein each region pair includes a pair of bounding
regions labelled by different annotators. Take FIG. 13A as an
example, which shows the bounding regions labelled by two of the
annotators according to FIG. 11.
[0119] In FIG. 13A, there are four bounding regions
b.sub.1,1-b.sub.1,4 of the annotator 1 and three bounding regions
b.sub.2,1-b.sub.2,3 of the annotator 2. In this case, the processor
104 may accordingly determine 12 region pairs (i.e., 4 times 3)
between the annotators 1 and 2, and each of the region pairs
includes one of the bounding regions of the annotator 1 and one of
the bounding regions of the annotator 2. For example, one of the
region pairs between the annotators 1 and 2 may be formed by the
bounding regions b.sub.1,1 and b.sub.2,1. Based on the same
principle, the processor 104 may determine the region pairs between
the annotators 2 and 3 and determine the region pairs between the
annotators 1 and 3.
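The enumeration of region pairs can be sketched as a cross product taken over every pair of different annotators. The identifier scheme `(i, j)` for the bounding region b.sub.i,j and the function name are assumptions for illustration.

```python
from itertools import combinations, product
from typing import Dict, List, Tuple

RegionId = Tuple[int, int]  # (annotator i, region index j), i.e. b_{i,j}


def region_pairs(regions_by_annotator: Dict[int, List[RegionId]]
                 ) -> List[Tuple[RegionId, RegionId]]:
    """Pair every bounding region of one annotator with every bounding
    region of each other annotator; regions of the same annotator are
    never paired with each other."""
    pairs: List[Tuple[RegionId, RegionId]] = []
    for a, b in combinations(sorted(regions_by_annotator), 2):
        pairs.extend(product(regions_by_annotator[a], regions_by_annotator[b]))
    return pairs


# Annotators 1 and 2 of FIG. 13A: four and three bounding regions.
regions = {
    1: [(1, j) for j in range(1, 5)],
    2: [(2, j) for j in range(1, 4)],
}
pairs = region_pairs(regions)
print(len(pairs))  # 4 * 3 region pairs
```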
[0120] After all of the region pairs between any two of the
annotators 1, 2, and 3 are determined, the processor 104 may
determine a plurality of correlation coefficients that respectively
correspond to the region pairs, wherein each correlation
coefficient characterizes a similarity of one of the region pairs,
and the similarity may be obtained based on the principle of IoU,
which has been taught in the descriptions related to FIG. 5 and
would not be repeated herein.
[0121] Take the region pair formed by bounding regions b.sub.1,1
and b.sub.2,1 as an example. The correlation coefficient of this
region pair may be characterized by the similarity between the
bounding regions b.sub.1,1 and b.sub.2,1. In this embodiment, the
similarity between the bounding regions b.sub.1,1 and b.sub.2,1 may
be obtained based on the mechanism taught in the descriptions
related to FIG. 5. That is, the similarity may be obtained by
dividing the area of the intersection of the bounding regions
b.sub.1,1 and b.sub.2,1 by the area of the union of the bounding
regions b.sub.1,1 and b.sub.2,1, but the disclosure is not limited
thereto. Based on the
same principle, the correlation coefficients of other region pairs
between any two of the annotators 1, 2, and 3 may be accordingly
determined.
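A minimal sketch of the intersection-over-union similarity follows. It assumes the bounding regions are axis-aligned rectangles encoded as `(x_min, y_min, x_max, y_max)`; the disclosure does not restrict the shape of a bounding region, so this encoding and the sample coordinates are illustrative only.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Area of the intersection divided by area of the union of two rectangles."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Two degenerate (zero-area) boxes are treated as dissimilar.
    return inter / union if union > 0 else 0.0


# Hypothetical coordinates: two 2x2 squares overlapping in a 1x1 square.
similarity = iou((0, 0, 2, 2), (1, 1, 3, 3))  # 1 / (4 + 4 - 1)
```

Identical rectangles give 1.0 and disjoint rectangles give 0.0, so the value can be compared directly against a correlation threshold such as 0.5.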
[0122] As exemplarily shown in FIG. 13A, the correlation
coefficients of the region pairs between the annotators 1 and 2 may
be determined. For improving the subsequent operations for
identifying target objects, those region pairs whose correlation
coefficients are lower than a correlation threshold would be
discarded or ignored. In FIG. 13A, the correlation threshold may be
assumed to be 0.5, but the disclosure is not limited thereto.
Accordingly, there are only two region pairs left in FIG. 13A,
wherein one with the correlation coefficient of 0.91 includes the
bounding regions b.sub.1,1 and b.sub.2,1, and the other with the
correlation coefficient of 0.87 includes the bounding regions
b.sub.1,2 and b.sub.2,2.
[0123] For the region pairs between the annotators 2 and 3 and the
region pairs between the annotators 1 and 3, the processor 104 may
perform similar operations to find out those region pairs whose
correlation coefficients are higher than the correlation threshold.
For the related result, see FIG. 13B, which shows all of
the region pairs whose correlation coefficients are higher than the
correlation threshold according to FIG. 11 and FIG. 13A.
[0124] Based on the exemplary result in FIG. 13B, the processor 104
may merge the bounding regions of the annotators 1, 2, and 3 into a
plurality of groups based on the correlation coefficients that are
higher than a correlation threshold, wherein each group includes at
least two bounding regions from different annotators.
[0125] See FIG. 13C, which shows the mechanism of merging the
bounding regions into the groups. In the present embodiment, the
mechanism of the processor 104 merging the bounding regions into
the groups may include the following steps: (a) retrieving a
specific correlation coefficient, wherein the specific correlation
coefficient is highest among the correlation coefficients that are
higher than the correlation threshold (step S1310); (b) retrieving
a specific region pair corresponding to the specific correlation
coefficient from the region pairs, wherein the specific region pair
comprises a first specific region and a second specific region
(step S1320); (c) determining whether one of the first specific
region and the second specific region belongs to an existing group
(step S1330); (d) in response to determining that neither the
first specific region nor the second specific region belongs to the
existing group, creating a new group based on the first specific
region and the second specific region (step S1340); (e) in response
to determining that one of the first specific region and the second
specific region belongs to the existing group, determining whether
another of the first specific region and the second specific region
corresponds to the same annotator with a member of the existing
group (step S1350); (f) in response to determining that the another
of the first specific region and the second specific region does
not correspond to the same annotator with the member of the
existing group, adding the another of the first specific region and
the second specific region into the existing group (step S1360);
(g) excluding the specific correlation coefficient from the
correlation coefficients and excluding the specific region pair
from the region pairs (step S1380); and (h) in step S1390, in
response to determining that the region pairs are not empty,
returning to step S1310. In other embodiments, in response to the
determination being "yes" in step S1350, the specific region pair
may be neglected (step S1370).
[0126] The mechanism in the above would be further discussed with
FIG. 13D, which shows the mechanism of merging the bounding regions
into the groups according to FIG. 13B. Firstly, in Stage 1, where no
group exists, the processor 104 may retrieve the highest
correlation coefficient 0.95 as the specific correlation
coefficient (step S1310) and the related specific region pair (step
S1320), which includes the bounding regions b.sub.2,1 and b.sub.3,1
(i.e., the first specific bounding region and the second specific
bounding region). Since there is no existing group in the first
stage, the processor 104 would accordingly create a new group
(i.e., Group 1) that includes the bounding regions b.sub.2,1 and
b.sub.3,1 (steps S1330 and S1340). Next, the processor 104 may
exclude the specific correlation coefficient (i.e., 0.95) from the
correlation coefficients and exclude the specific region pair
(which includes the bounding regions b.sub.2,1 and b.sub.3,1) from
the region pairs (step S1380). Since the region pairs are not
empty, the processor 104 may perform step S1390 to return to step
S1310, which leads to Stage 2 in FIG. 13D.
[0127] In Stage 2 of FIG. 13D, since the correlation coefficient of
0.95 has been excluded, the specific correlation coefficient
retrieved by the processor 104 may be 0.91 (step S1310), i.e., the
highest correlation coefficient in the remaining correlation
coefficients. Accordingly, the processor 104 may retrieve the
related specific region pair (step S1320), which includes the
bounding regions b.sub.1,1 and b.sub.2,1 (i.e., the first specific
bounding region and the second specific bounding region). In this
case, since the bounding region b.sub.2,1 has been in an existing
group (i.e., Group 1), the processor 104 may determine whether the
bounding region b.sub.1,1 corresponds to the same annotator with a
member (i.e., the bounding regions b.sub.2,1 and b.sub.3,1) of
Group 1 (step S1350). Since the bounding region b.sub.1,1 does not
correspond to the same annotator with a member of Group 1, the
processor 104 may add the bounding region b.sub.1,1 into Group 1
(step S1360). Next, the processor 104 may exclude the specific
correlation coefficient (i.e., 0.91) from the correlation
coefficients and exclude the specific region pair (which includes
the bounding regions b.sub.1,1 and b.sub.2,1) from the region pairs
(step S1380). Since the region pairs are not empty, the processor
104 may perform step S1390 to return to step S1310, which leads to
Stage 3 in FIG. 13D.
[0128] In Stage 3 of FIG. 13D, based on the teachings in the above,
the specific correlation coefficient would be 0.87, and the
specific region pair would be the region pair including the
bounding regions b.sub.1,2 and b.sub.2,2. According to a procedure
similar to that performed in Stage 1, the processor 104 would
create a new group (i.e., Group 2) that includes the bounding
regions b.sub.1,2 and b.sub.2,2. Next, the processor 104 may
exclude the specific correlation coefficient (i.e., 0.87) from the
correlation coefficients and exclude the specific region pair
(which includes the bounding regions b.sub.1,2 and b.sub.2,2) from
the region pairs (step S1380). Since the region pairs are not
empty, the processor 104 may perform step S1390 to return to step
S1310, which leads to Stage 4 in FIG. 13D.
[0129] In Stage 4, the specific correlation coefficient would be
0.86, and the specific region pair would be the region pair
including the bounding regions b.sub.3,1 and b.sub.1,1. Since the
bounding regions b.sub.3,1 and b.sub.1,1 are already in Group 1,
the processor 104 may directly perform step S1380 and step S1390,
which leads to Stage 5 of FIG. 13D. From another perspective, since
the bounding region b.sub.3,1 already belongs to Group 1, the
processor 104 may determine whether the bounding region b.sub.1,1
corresponds to the same annotator with a member of Group 1.
[0130] Since the bounding region b.sub.1,1 corresponds to the same
annotator with the member (i.e., the bounding region b.sub.1,1
itself) of Group 1, the processor 104 may neglect the specific
region pair including the bounding regions b.sub.3,1 and
b.sub.1,1.
[0131] In Stage 5 of FIG. 13D, based on the teachings in the above,
the specific correlation coefficient would be 0.84, and the
specific region pair would be the region pair including the
bounding regions b.sub.3,4 and b.sub.1,3. According to a procedure
similar to that performed in Stage 1, the processor 104 would
create a new group (i.e., Group 3) that includes the bounding
regions b.sub.3,4 and b.sub.1,3. Next, the processor 104 may
exclude the specific correlation coefficient (i.e., 0.84) from the
correlation coefficients and exclude the specific region pair
(which includes the bounding regions b.sub.3,4 and b.sub.1,3) from
the region pairs (step S1380). Since the region pairs are not
empty, the processor 104 may perform step S1390 to return to step
S1310, which leads to Stage 6 in FIG. 13D.
[0132] In Stage 6 of FIG. 13D, the specific correlation coefficient
would be 0.82, and the specific region pair would be the region
pair including the bounding regions b.sub.2,2 and b.sub.3,2. Since
the bounding region b.sub.2,2 already belongs to Group 2, the
processor 104 may determine whether the bounding region b.sub.3,2
corresponds to the same annotator with a member of Group 2 (step
S1350). Since the bounding region b.sub.3,2 does not correspond to
the same annotator with any member of Group 2, the processor
104 may add the bounding region b.sub.3,2 into Group 2 (step
S1360). Next, the processor 104 may exclude the specific
correlation coefficient (i.e., 0.82) from the correlation
coefficients and exclude the specific region pair (which includes
the bounding regions b.sub.2,2 and b.sub.3,2) from the region pairs
(step S1380). Since the region pairs are not empty, the processor
104 may perform step S1390 to return to step S1310, which leads to
Stage 7 in FIG. 13D.
[0133] In Stage 7 of FIG. 13D, the specific correlation coefficient
would be 0.78, and the specific region pair would be the region
pair including the bounding regions b.sub.3,3 and b.sub.1,2. Since
the bounding region b.sub.1,2 already belongs to Group 2, the
processor 104 may determine whether the bounding region b.sub.3,3
corresponds to the same annotator with a member of Group 2. Since
the bounding region b.sub.3,3 corresponds to the same annotator
with the member (i.e., the bounding region b.sub.3,2) of Group 2,
the processor 104 may neglect the specific region pair including
the bounding regions b.sub.3,3 and b.sub.1,2. Next, the processor
104 may exclude the specific correlation coefficient (i.e., 0.78)
from the correlation coefficients and exclude the specific region
pair (which includes the bounding regions b.sub.3,3 and b.sub.1,2)
from the region pairs (step S1380). Since the region pairs are not
empty, the processor 104 may perform step S1390 to return to step
S1310, which leads to Stage 8 in FIG. 13D.
[0134] In Stage 8 of FIG. 13D, the specific correlation coefficient
would be 0.67, and the specific region pair would be the region
pair including the bounding regions b.sub.2,2 and b.sub.3,3. Since
the bounding region b.sub.2,2 already belongs to Group 2, the
processor 104 may determine whether the bounding region b.sub.3,3
corresponds to the same annotator with a member of Group 2. Since
the bounding region b.sub.3,3 corresponds to the same annotator
with the member (i.e., the bounding region b.sub.3,2) of Group 2,
the processor 104 may neglect the specific region pair including
the bounding regions b.sub.2,2 and b.sub.3,3. Next, the processor
104 may exclude the specific correlation coefficient (i.e., 0.67)
from the correlation coefficients and exclude the specific region
pair (which includes the bounding regions b.sub.2,2 and b.sub.3,3)
from the region pairs (step S1380). Since the region pairs are not
empty, the processor 104 may perform step S1390 to return to step
S1310, which leads to Stage 9 in FIG. 13D.
[0135] In Stage 9, the specific correlation coefficient would be
0.63, and the specific region pair would be the region pair
including the bounding regions b.sub.3,2 and b.sub.1,2. Since the
bounding regions b.sub.3,2 and b.sub.1,2 are already in Group 2,
the processor 104 may directly perform step S1380 and step S1390.
From another perspective, since the bounding region b.sub.3,2
already belongs to Group 2, the processor 104 may determine whether
the bounding region b.sub.1,2 corresponds to the same annotator
with a member of Group 2. Since the bounding region b.sub.1,2
corresponds to the same annotator with the member (i.e., the
bounding region b.sub.1,2 itself) of Group 2, the processor 104 may
neglect the specific region pair including the bounding regions
b.sub.3,2 and b.sub.1,2. Since there are no unconsidered region pairs,
the processor 104 may determine that the procedure of merging the
bounding regions into the groups has been completed.
[0136] More specifically, after the processor 104 performs the
above operations, the bounding regions of the annotators 1, 2, and
3 would be merged into Group 1, Group 2, and Group 3, wherein Group
1 includes the bounding regions b.sub.2,1, b.sub.3,1 and b.sub.1,1,
Group 2 includes the bounding regions b.sub.1,2, b.sub.2,2 and
b.sub.3,2, and Group 3 includes the bounding regions b.sub.3,4 and
b.sub.1,3.
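The nine stages above can be replayed with a short sketch of the merging procedure of FIG. 13C. The function name, the `(i, j)` identifier for b.sub.i,j, and the use of a sorted list in place of repeatedly retrieving and excluding the highest coefficient are assumptions made for illustration. A pair whose two regions already sit in different groups is not described in the text; the sketch simply skips it.

```python
from typing import Dict, List, Tuple

RegionId = Tuple[int, int]                      # (annotator i, region index j)
ScoredPair = Tuple[float, RegionId, RegionId]   # (correlation coefficient, region, region)


def merge_into_groups(scored_pairs: List[ScoredPair],
                      threshold: float = 0.5) -> List[List[RegionId]]:
    # Visiting pairs in descending order of coefficient is equivalent to
    # retrieving the highest one (S1310, S1320), excluding it (S1380) and
    # looping while pairs remain (S1390).
    queue = sorted((p for p in scored_pairs if p[0] > threshold),
                   key=lambda p: p[0], reverse=True)
    groups: List[List[RegionId]] = []
    group_of: Dict[RegionId, int] = {}
    for _, first, second in queue:
        first_in, second_in = first in group_of, second in group_of   # S1330
        if not first_in and not second_in:
            group_of[first] = group_of[second] = len(groups)          # S1340
            groups.append([first, second])
        elif first_in != second_in:
            member, other = (first, second) if first_in else (second, first)
            g = group_of[member]
            if all(m[0] != other[0] for m in groups[g]):              # S1350
                groups[g].append(other)                               # S1360
                group_of[other] = g
            # Otherwise the pair is neglected (S1370).
        # Both regions already grouped: nothing to do for this pair.
    return groups


# Coefficients and region pairs as walked through in Stages 1 to 9.
fig_13d_pairs = [
    (0.95, (2, 1), (3, 1)), (0.91, (1, 1), (2, 1)), (0.87, (1, 2), (2, 2)),
    (0.86, (3, 1), (1, 1)), (0.84, (3, 4), (1, 3)), (0.82, (2, 2), (3, 2)),
    (0.78, (3, 3), (1, 2)), (0.67, (2, 2), (3, 3)), (0.63, (3, 2), (1, 2)),
]
groups = merge_into_groups(fig_13d_pairs)
for number, members in enumerate(groups, start=1):
    print(f"Group {number}: {members}")
```

Because the highest coefficients are consumed first, a region joins the group of its best-matching counterpart, and the same-annotator check keeps each annotator from contributing two regions to one group.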
[0137] Afterwards, for each group, the processor 104 may generate a
reference region for identifying one of the target objects based on
the bounding regions in each group.
[0138] See FIG. 14, which shows the obtained reference regions of
each group according to FIG. 13D. Taking Group 1 as an example, the
processor 104 may take an average of the bounding regions
b.sub.2,1, b.sub.3,1 and b.sub.1,1 to obtain the reference region
1410 of Group 1. In one embodiment, the bounding regions b.sub.2,1,
b.sub.3,1 and b.sub.1,1 may be characterized as coordinates, and
the reference region 1410 may be obtained by calculating an average
of the coordinates, but the disclosure is not limited thereto.
Based on the same principle, the processor 104 may obtain reference
regions 1420 and 1430 that respectively correspond to Group 2 and
Group 3.
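The coordinate-averaging embodiment can be sketched as follows, assuming each bounding region is an axis-aligned rectangle encoded as `(x_min, y_min, x_max, y_max)`. The encoding, the function name, and the sample coordinates are hypothetical; the disclosure only states that the coordinates are averaged.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def reference_region(boxes: Sequence[Box]) -> Box:
    """Average the bounding regions of one group coordinate by coordinate."""
    n = len(boxes)
    return tuple(sum(box[k] for box in boxes) / n for k in range(4))


# Hypothetical coordinates for the three members of one group.
group_boxes = [(0, 0, 10, 10), (2, 2, 12, 12), (1, 1, 11, 11)]
reference = reference_region(group_boxes)
```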
[0139] After the processor 104 obtains the reference regions
1410-1430 for identifying target objects in the first labelled
result 1110, in step S1220, the processor 104 may determine a
consistency score of the first labelled result 1110 based on the at
least one target object of each object category and the at least
one bounding region labelled by the annotators 1, 2, and 3.
[0140] In one embodiment, for each annotator, the processor 104 may
calculate a first consistency score of each object category based
on the target object of each object category and the bounding
regions, and the processor 104 may take an average of the first
consistency score of each object category to obtain a second
consistency score. FIG. 15 will be used as an example for further
discussions.
[0141] See FIG. 15, which shows all bounding regions according to
FIG. 11 and FIG. 14. Taking FIG. 15 and the annotator 1 as an
example, since it is assumed that the target objects corresponding
to the reference regions 1410-1430 in FIG. 15 belong to one object
category, the processor 104 may calculate the first consistency
score of this object category based on the target objects
identified by the reference regions 1410-1430 and the bounding
regions labelled by the annotator 1.
[0142] In one embodiment, the first consistency score of the
annotator 1 may be used to determine whether the annotator 1 is
reliable for labelling. For example, in response to determining
that the first consistency score is lower than a certain threshold,
the processor 104 may determine that the annotator 1 is unreliable
for labelling, and vice versa. In some embodiments, if the annotator
1 is determined to be unreliable for labelling, the labelled data
generated by the annotator 1 would be excluded from training the AI
machine, and vice versa, but the disclosure is not limited
thereto.
[0143] In one embodiment, the first consistency score of the
annotator 1 may be obtained by dividing a first number by a
second number, wherein the second number is a sum of the first
number, a third number and a fourth number.
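Written as a formula, with n.sub.1 to n.sub.4 standing for the first to fourth numbers (symbols introduced here only as shorthand, not notation from the disclosure), the first consistency score s is:

```latex
s = \frac{n_1}{n_2} = \frac{n_1}{n_1 + n_3 + n_4}
```

Since the three counts are non-negative, s lies between 0 and 1 whenever the denominator is nonzero, and s equals 1 only when both n.sub.3 and n.sub.4 are 0.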
[0144] In the present embodiment, the first number characterizes a
number of the bounding regions of the annotator 1 that matches the
identified target object of the object category. From another
perspective, for the annotator 1, the first number may be regarded
as a number of the bounding regions that are grouped. In FIG. 15,
for the three groups used for generating the reference regions
1410-1430, since there are three bounding regions b.sub.1,1,
b.sub.1,2 and b.sub.1,3 in the groups, the processor 104 may
determine that the number of the bounding regions of the annotator
1 that matches the identified target object of the object category
to be 3. That is, the first number of the annotator 1 is 3.
[0145] In addition, the third number may be a number of the
identified target object that does not match any of the bounding
regions of the annotator 1. From another perspective, for the
annotator 1, the third number may be a number of the groups that
fail to include the bounding regions of the annotator 1 among the
groups. In FIG. 15, since all of the target objects match the
bounding regions of the annotator 1, the third number would be 0,
which means that all of the groups include the bounding regions of
the annotator 1. The fourth number may be a number of the bounding
regions of the annotator 1 that does not match any of the
identified target object. From another perspective, for the
annotator 1, the fourth number may be a number of the bounding
regions that are not grouped. In FIG. 15, since there is one
bounding region (i.e., the bounding region b.sub.1,4) of the
annotator 1 that does not match any of the identified target
object, the fourth number would be 1, which means that the number
of the bounding regions of the annotator 1 that are not grouped is
1.
[0146] Therefore, the first consistency score of the annotator 1 of
the object category would be
$$\frac{3}{3 + 0 + 1} = 0.75.$$
Moreover, since it is assumed that there is only one object
category in the first labelled result, the second consistency score
of the annotator 1 over all object categories would be equal to the
first consistency score.
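The counting behind this score can be sketched from the groups alone. The function name and the `(i, j)` identifier for b.sub.i,j are assumptions; returning 0.0 when an annotator has no regions and there are no groups is also an assumption, since the disclosure does not discuss that case.

```python
from typing import List, Sequence, Tuple

RegionId = Tuple[int, int]  # (annotator i, region index j)


def first_consistency_score(annotator: int,
                            groups: Sequence[Sequence[RegionId]],
                            annotator_regions: Sequence[RegionId]) -> float:
    grouped = {region for group in groups for region in group}
    # First number: regions of this annotator that were merged into a group.
    first = sum(1 for r in annotator_regions if r in grouped)
    # Third number: groups that contain no region of this annotator.
    third = sum(1 for g in groups if all(r[0] != annotator for r in g))
    # Fourth number: regions of this annotator that were not grouped.
    fourth = sum(1 for r in annotator_regions if r not in grouped)
    total = first + third + fourth          # second number
    return first / total if total else 0.0  # 0.0 for the empty case is an assumption


# Groups 1 to 3 and the four bounding regions of the annotator 1 in FIG. 15.
groups: List[List[RegionId]] = [
    [(2, 1), (3, 1), (1, 1)],
    [(1, 2), (2, 2), (3, 2)],
    [(3, 4), (1, 3)],
]
annotator_1_regions = [(1, 1), (1, 2), (1, 3), (1, 4)]
score_1 = first_consistency_score(1, groups, annotator_1_regions)
print(score_1)  # 3 / (3 + 0 + 1)
```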
[0147] In other embodiments, if the identified target objects in a
labelled result are determined to belong to multiple object
categories, for a certain annotator, the processor 104 may
calculate the first consistency score of the certain annotator for
each object category, and take an average of the first
consistency scores of the object categories as the second
consistency score of the considered labelled result for the certain
annotator.
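A minimal sketch of this averaging is given below. The function name and the dictionary keyed by object category are assumptions, and the 0.5 score for the "cat" category is a made-up value used only to show the multi-category case.

```python
from statistics import mean
from typing import Dict


def second_consistency_score(first_scores_by_category: Dict[str, float]) -> float:
    """Average the per-category first consistency scores of one annotator."""
    return mean(list(first_scores_by_category.values()))


# Single-category case: the second score equals the first score.
single = second_consistency_score({"dog": 0.75})

# Hypothetical two-category case; the "cat" value is illustrative only.
multi = second_consistency_score({"dog": 0.75, "cat": 0.5})
```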
[0148] Taking FIG. 15 and the annotator 2 as another example, since
it is assumed that the target objects corresponding to the
reference regions 1410-1430 in FIG. 15 belong to one object
category, the processor 104 may calculate the first consistency
score of this object category based on the target objects
identified by the reference regions 1410-1430 and the bounding
regions labelled by the annotator 2.
[0149] In one embodiment, the first consistency score of the
annotator 2 may be obtained based on a procedure similar to the
procedure of obtaining the first consistency score of the annotator
1. That is, the first consistency score of the annotator 2 may be
obtained by dividing a first number by a second number, wherein
the second number is a sum of the first number, a third number and
a fourth number.
[0150] In FIG. 15, for the three groups used for generating the
reference regions 1410-1430, since there are two bounding regions
b.sub.2,1 and b.sub.2,2 in the groups, the processor 104 may
determine that the number of the bounding regions of the annotator
2 that matches the identified target object of the object category
to be 2. That is, the first number of the annotator 2 is 2.
[0151] In addition, the third number may be the number of the
identified target objects that do not match any of the bounding
regions of the annotator 2. In FIG. 15, since there is one
identified target object (which corresponds to the reference region
1430) that does not match any of the bounding regions of the
annotator 2, the third number would be 1, which means that one of
the groups fails to include any bounding region of the annotator
2. Next, since there is one bounding region (i.e., the bounding
region b.sub.2,3) of the annotator 2 that does not match any of the
identified target objects, the fourth number would be 1, which means
that the number of the bounding regions of the annotator 2 that
fail to be grouped is 1.
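As an illustration of how the three numbers could be derived, the
sketch below represents each group as a set of region identifiers.
The function name `count_numbers`, the identifier strings, the
placeholder regions `x1`-`x3` of other annotators, and the assignment
of particular regions to particular groups are assumptions made for
the example; only the resulting counts for the annotator 2 come from
the description above.

```python
def count_numbers(groups, annotator_regions):
    """Return the (first, third, fourth) numbers for one annotator.

    groups: list of sets; each set holds the identifiers of the bounding
        regions grouped to form one reference region.
    annotator_regions: identifiers of all bounding regions labelled by
        the annotator for the object category.
    """
    own = set(annotator_regions)
    grouped = set().union(*groups)
    first = len(own & grouped)                       # regions matching a target object
    third = sum(1 for g in groups if not (g & own))  # target objects the annotator missed
    fourth = len(own - grouped)                      # regions that failed to be grouped
    return first, third, fourth


# Assumed layout: two groups contain a region of the annotator 2,
# the third group contains only regions of other annotators.
groups = [{"b2,1", "x1"}, {"b2,2", "x2"}, {"x3"}]
print(count_numbers(groups, ["b2,1", "b2,2", "b2,3"]))  # (2, 1, 1)
```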
[0152] Therefore, the first consistency score of the annotator 2 of
the object category would be
$$\frac{2}{2 + 1 + 1} = 0.5.$$
Moreover, since it is assumed that there is only one object
category in the first labelled result, the second consistency score
of the annotator 2 over all object categories would be equal to the
first consistency score.
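The ratio used for the first consistency score can be checked with a
few lines of Python. This is only a sketch; the function name
`first_consistency_score` and the error raised for an empty comparison
are assumptions that are not part of the disclosure.

```python
def first_consistency_score(first, third, fourth):
    """Return first / second, where second = first + third + fourth."""
    second = first + third + fourth
    if second == 0:
        # Assumed behaviour: nothing to compare, so no score is defined.
        raise ValueError("no bounding regions or target objects to compare")
    return first / second


# Numbers of the annotator 2 in FIG. 15: first = 2, third = 1, fourth = 1.
print(first_consistency_score(2, 1, 1))  # 0.5
```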
[0153] Taking FIG. 15 and the annotator 3 as yet another example,
since it is assumed that the target objects corresponding to the
reference regions 1410-1430 in FIG. 15 belong to one object
category, the processor 104 may calculate the first consistency
score of this object category based on the target objects
identified by the reference regions 1410-1430 and the bounding
regions labelled by the annotator 3.
[0154] In one embodiment, the first consistency score of the
annotator 3 may be obtained based on a procedure similar to the
procedure of obtaining the first consistency score of the annotator
1. That is, the first consistency score of the annotator 3 may be
obtained by dividing a first number by a second number, wherein
the second number is a sum of the first number, a third number and
a fourth number.
[0155] In FIG. 15, for the three groups used for generating the
reference regions 1410-1430, since there are three bounding regions
b.sub.3,1, b.sub.3,2 and b.sub.3,4 in the groups, the processor 104
may determine the number of the bounding regions of the
annotator 3 that match the identified target objects of the object
category to be 3. That is, the first number of the annotator 3 is
3.
[0156] In addition, the third number may be the number of the
identified target objects that do not match any of the bounding
regions of the annotator 3. In FIG. 15, since every identified
target object matches at least one of the bounding
regions of the annotator 3, the third number would be 0. Next,
since there is one bounding region (i.e., the bounding region
b.sub.3,3) of the annotator 3 that does not match any of the
identified target objects, the fourth number would be 1.
[0157] Therefore, the first consistency score of the annotator 3 of
the object category would be
$$\frac{3}{3 + 0 + 1} = 0.75.$$
In some embodiments, the first consistency score of the annotator 3
may be used to determine whether the annotator 3 should be excluded
from the labelling tasks. For example, if the first consistency
score of the annotator 3 is lower than a specific threshold, the
annotator 3 may be determined to be unreliable for performing the
labelling tasks. Moreover, since it is assumed that there is only
one object category in the first labelled result, the second
consistency score of the annotator 3 over all object categories
would be equal to the first consistency score.
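The threshold check on an annotator's first consistency score might
look like the following sketch. The function name
`unreliable_annotators` and the threshold value 0.6 are hypothetical;
the scores are those given for the annotators 1, 2, and 3 in the
FIG. 15 example.

```python
def unreliable_annotators(first_scores, threshold):
    """Return the annotators whose first consistency score is below the threshold."""
    return sorted(a for a, s in first_scores.items() if s < threshold)


scores = {1: 0.75, 2: 0.5, 3: 0.75}  # annotator id -> first consistency score
print(unreliable_annotators(scores, threshold=0.6))  # [2]
```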
[0158] After the second consistency score of each of the annotators
1, 2, and 3 is obtained, the processor 104 may take an average of
the second consistency score of each annotator to obtain the
consistency score of the first labelled result 1110. In FIG. 15,
the consistency score of the first labelled result 1110 may be
calculated as
$$\frac{0.75 + 0.5 + 0.75}{3} \approx 0.66.$$
[0159] With the consistency score of the first labelled result
1110, the processor 104 may determine whether the consistency score
of the first labelled result 1110 is higher than a score threshold.
In various embodiments, the score threshold may be determined to be
any value that is high enough for the designer to think that the
labelling operations between the annotators are consistent, but the
disclosure is not limited thereto.
[0160] In step S1230, in response to determining that the
consistency score of the first labelled result 1110 is higher than
the score threshold, the processor 104 may determine that the first
labelled result 1110 is valid for training the AI machine.
Afterwards, step S240 in FIG. 2 may be subsequently performed. On
the other hand, if the consistency score of the first labelled
result 1110 is not higher than the score threshold, the processor
104 may determine that the first labelled result 1110 is not valid
for training the AI machine, and hence step S250 in FIG. 2 may be
subsequently performed. For the details of steps S240 and S250,
reference may be made to the teachings of the above embodiments,
which are not repeated herein.
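The decision of step S1230 can be sketched as an average followed by a
strict comparison. The function names and the score threshold of 0.6
are assumptions for illustration; the three second consistency scores
are those of the FIG. 15 example.

```python
def consistency_score(second_scores):
    """Average the annotators' second consistency scores for one labelled result."""
    return sum(second_scores) / len(second_scores)


def next_step(score, score_threshold):
    """Return 'S240' if the labelled result is valid for training, else 'S250'."""
    return "S240" if score > score_threshold else "S250"


score = consistency_score([0.75, 0.5, 0.75])  # 2/3, given as 0.66 in the text
print(next_step(score, score_threshold=0.6))  # S240
```

A score exactly equal to the threshold is treated as not valid, since
the description requires the score to be higher than the threshold.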
[0161] In other embodiments, for a dataset that includes multiple
labelled results, the processor 104 may calculate the consistency
score of each labelled result based on the above teachings
and calculate an overall consistency score of the dataset by taking
an average of the consistency score of each labelled result. As a
result, the administrator may accordingly determine whether the
dataset as a whole is appropriate for examining the ability of the
annotators or for training the AI machine, but the disclosure is
not limited thereto.
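For a dataset, the same averaging is applied one level higher. The
sketch below is illustrative only; the function name
`overall_consistency_score`, the result identifiers, and the scores of
the second labelled result are hypothetical.

```python
def overall_consistency_score(dataset):
    """Return per-result consistency scores and their average.

    dataset maps a labelled-result identifier to the list of the
    annotators' second consistency scores for that labelled result.
    """
    per_result = {rid: sum(s) / len(s) for rid, s in dataset.items()}
    overall = sum(per_result.values()) / len(per_result)
    return per_result, overall


dataset = {
    "result_1": [0.75, 0.5, 0.75],  # FIG. 15 example
    "result_2": [1.0, 1.0, 1.0],    # hypothetical, fully consistent result
}
per_result, overall = overall_consistency_score(dataset)
print(per_result, overall)
```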
[0162] See FIG. 16, which shows a schematic view of handling a
situation of no target object according to an embodiment of the
disclosure. In the present embodiment, the processor 104 may
provide a function (e.g., a specific button) for an annotator to
choose if the annotator believes that no target object
exists in a raw data 1610. In this case, the processor 104 would
treat this situation as equivalent to the annotator having labelled
a virtual bounding region 1620 at a specific position outside of the
raw data 1610 to generate a labelled result 1630.
[0163] In one embodiment, if the annotator is asked to label the
target objects of multiple object categories, the processor 104
would treat the situation in FIG. 16 as equivalent to the annotator
having labelled the virtual bounding region 1620 of each object
category at the specific position outside of the raw data 1610 to
generate the labelled result 1630.
[0164] In addition, if the processor 104 determines that no
reference region is suggested based on the labelled result 1630,
the first consistency score of the annotator on each object
category may be determined to be 1, and hence the second
consistency score of the annotator of the labelled result 1630
would be 1 as well.
[0165] See FIG. 17, which shows a schematic view of handling a
situation of no target object according to another embodiment of
the disclosure. In the present embodiment, the annotators 1 and 2
are assumed to draw bounding regions 1710 and 1720 in a raw data
1705. The annotator 3 is assumed to believe that no target
object exists in the raw data 1705, and hence a virtual bounding
region 1730 is generated outside of the raw data 1705. Therefore, a
labelled result 1740 can be correspondingly generated.
[0166] Based on the above teachings, the processor 104 would
determine that no reference region is suggested in the labelled
result 1740. In this case, the processor 104 would generate a
reference region 1750 that is identical to the virtual bounding
region 1730 for facilitating the following calculations of the
consistency score.
[0167] In this situation, the first/second consistency scores of
the annotators 1 and 2 would be calculated as 0, and the
first/second consistency scores of the annotator 3 would be
calculated as 1. As a result, the consistency score of the labelled
result 1740 would be calculated as 0.33.
[0168] See FIG. 18, which shows a schematic view of handling a
situation of no target object according to yet another embodiment
of the disclosure. In the present embodiment, the annotator 1 is
assumed to label a bounding region 1810 in a raw data 1805. The
annotators 2 and 3 are assumed to believe that no target
object exists in the raw data 1805, and hence virtual bounding
regions 1820 and 1830 are generated outside of the raw data 1805,
wherein the virtual bounding regions 1820 and 1830 may overlap with
each other. Therefore, a labelled result 1840 can be
correspondingly generated.
[0169] Based on the above teachings, the processor 104 would
determine that no reference region is suggested in the labelled
result 1840. In this case, the processor 104 would generate a
reference region 1850 that is identical to the virtual bounding
regions 1820 and 1830 for facilitating the following calculations
of the consistency score of the labelled result 1840.
[0170] In this situation, the first/second consistency scores of
the annotator 1 would be calculated as 0, and the first/second
consistency scores of the annotators 2 and 3 would be calculated as
1. As a result, the consistency score of the labelled result 1840
would be calculated as 0.66.
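The scores of FIGS. 16-18 can be reproduced with the sketch below. It
assumes that no reference region was suggested by the drawn bounding
regions, that the reference region is therefore the virtual bounding
region, and that each annotator who drew a region drew exactly one.
The function name `no_target_scores` and the per-annotator counts are
an assumed reading that is consistent with the stated scores, not a
definitive implementation.

```python
def no_target_scores(chose_no_target):
    """Score annotators when the reference region is the virtual bounding region.

    chose_no_target maps an annotator id to True if the annotator
    reported that no target object exists, or False if the annotator
    drew a bounding region instead.
    """
    if not any(chose_no_target.values()):
        raise ValueError("no annotator reported that no target object exists")
    scores = {}
    for annotator, virtual in chose_no_target.items():
        if virtual:
            first, third, fourth = 1, 0, 0  # virtual region matches the reference region
        else:
            first, third, fourth = 0, 1, 1  # reference region missed, own region ungrouped
        scores[annotator] = first / (first + third + fourth)
    return scores, sum(scores.values()) / len(scores)


print(no_target_scores({1: False, 2: False, 3: True}))  # FIG. 17: average 1/3
print(no_target_scores({1: False, 2: True, 3: True}))   # FIG. 18: average 2/3
```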
[0171] The present disclosure further provides a computer program
product for executing the foregoing method for verifying training data.
The computer program product is composed of a plurality of program
instructions (for example, a setting program instruction and a
deployment program instruction) embodied therein. These program
instructions can be loaded into an electronic device and executed
by the same to execute the method for verifying training data and
the functions of the electronic device described above.
[0172] In summary, the disclosure may determine whether the
labelled results of the annotators are valid for training the AI
machine based on the consistencies of the annotators. Further, if
there exist questionable raw data, unqualified annotators, or the
like, a notification may be accordingly created and provided to the
administrators. As such, the administrators may be aware of the
performances of the annotators and the quality of the raw data and
correspondingly take actions, such as excluding the questionable
raw data and the unqualified annotators. Moreover, when the
labelled results are fed to the AI machine, the labelled results
may be assigned different weightings based on the related
confidence levels, such that the AI machine may learn more from
the labelled results with higher confidence levels.
[0173] It will be apparent to those skilled in the art that various
modifications and variations can be made to the structure of the
present invention without departing from the scope or spirit of the
invention. In view of the foregoing, it is intended that the
present invention cover modifications and variations of this
invention provided they fall within the scope of the following
claims and their equivalents.
* * * * *