U.S. patent application number 16/311395 was published by the patent office on 2019-11-07 for Fisher's exact test calculation apparatus, method, and program.
This patent application is currently assigned to NIPPON TELEGRAPH AND TELEPHONE CORPORATION. The applicant listed for this patent is NIPPON TELEGRAPH AND TELEPHONE CORPORATION, TOHOKU UNIVERSITY. Invention is credited to Koji CHIDA, Koki HAMADA, Satoshi HASEGAWA, Kazuharu MISAWA, Masao NAGASAKI.
Application Number | 20190340215 16/311395 |
Document ID | / |
Family ID | 60912169 |
Publication Date | 2019-11-07 |
United States Patent
Application |
20190340215 |
Kind Code |
A1 |
CHIDA; Koji ; et al. |
November 7, 2019 |
FISHER'S EXACT TEST CALCULATION APPARATUS, METHOD, AND PROGRAM
Abstract
A Fisher's exact test calculation apparatus includes: a
condition storage 1 that has stored therein a condition for
determining whether a result of Fisher's exact test corresponding
to input is significant or not, the input being frequencies in a
summary table; and a calculation unit 2 that obtains the result of
Fisher's exact test corresponding to the frequencies in the summary
table by inputting the frequencies in the summary table to the
condition read from the condition storage 1.
Inventors: |
CHIDA; Koji; (Musashino-shi,
JP) ; HASEGAWA; Satoshi; (Musashino-shi, JP) ;
HAMADA; Koki; (Musashino-shi, JP) ; NAGASAKI;
Masao; (Sendai-shi, JP) ; MISAWA; Kazuharu;
(Sendai-shi, JP) |
|
Applicant: |
Name | City | State | Country | Type |
NIPPON TELEGRAPH AND TELEPHONE CORPORATION | Chiyoda-ku | | JP | |
TOHOKU UNIVERSITY | Sendai-shi | | JP | |
Assignee: |
NIPPON TELEGRAPH AND TELEPHONE
CORPORATION
Chiyoda-ku
JP
TOHOKU UNIVERSITY
Sendai-shi
JP
|
Family ID: |
60912169 |
Appl. No.: |
16/311395 |
Filed: |
June 30, 2017 |
PCT Filed: |
June 30, 2017 |
PCT NO: |
PCT/JP2017/024119 |
371 Date: |
December 19, 2018 |
Current U.S.
Class: |
1/1 |
Current CPC
Class: |
G06F 16/9027 20190101;
G16B 40/00 20190201; G06F 17/18 20130101; G16B 50/00 20190201; G06F
21/6245 20130101 |
International
Class: |
G06F 17/18 20060101
G06F017/18; G06F 16/901 20060101 G06F016/901; G16B 40/00 20060101
G16B040/00 |
Foreign Application Data
Date | Code | Application Number |
Jul 6, 2016 | JP | 2016-134085 |
Claims
1. A Fisher's exact test calculation apparatus comprising: a
condition storage that has stored therein a condition for
determining whether a result of Fisher's exact test corresponding
to input is significant or not, the input being frequencies in a
summary table; and a calculation unit that obtains the result of
Fisher's exact test corresponding to the frequencies in the summary
table by inputting the frequencies in the summary table to the
condition read from the condition storage.
2. The Fisher's exact test calculation apparatus according to claim
1, wherein the calculation unit performs calculations for obtaining
the result of Fisher's exact test corresponding to the frequencies
in the summary table while keeping the frequencies in the summary
table concealed via secure computation.
3. The Fisher's exact test calculation apparatus according to claim
1, wherein the condition is a decision tree with conditional
expressions assigned to a root node and internal nodes and a result
of Fisher's exact test assigned to each leaf node, and the
calculation unit outputs the result of Fisher's exact test assigned
to a leaf node which is determined by a value obtained by inputting
the frequencies in the summary table to the conditional expressions
assigned to the root node and the internal nodes.
4. The Fisher's exact test calculation apparatus according to claim
3, wherein conditional expressions are assigned to the root node
and the internal nodes such that, for a leaf node assigned with a
result of Fisher's exact test indicative of being significant, a
predetermined value will be obtained by inputting the frequencies
in the summary table to the conditional expressions assigned to the
root node and the internal nodes on a path from that leaf node to
the root node, and the calculation unit determines whether there is
a leaf node for which the predetermined value is obtained by
inputting the frequencies in the summary table to the conditional
expressions assigned to the root node and the internal nodes on the
path from the leaf node to the root node while concealing the
frequencies in the summary table, and if there is such a leaf node,
obtains a result that the result of Fisher's exact test for the
summary table is significant.
5. A Fisher's exact test calculation method comprising: a
calculation step in which a calculation unit obtains a result of
Fisher's exact test corresponding to frequencies in a summary table
by inputting the frequencies in the summary table to a condition
read from a condition storage, the condition storage storing a
condition for determining whether a result of Fisher's exact test
corresponding to input is significant or not, the input being
frequencies in a summary table.
6. A program for causing a computer to function as the units of
Fisher's exact test calculation apparatus according to any one of
claims 1 to 4.
Description
TECHNICAL FIELD
[0001] The present invention relates to techniques for efficiently
calculating Fisher's exact test.
BACKGROUND ART
[0002] Fisher's exact test is widely known as a statistical test
method. One application of Fisher's exact test is the genome-wide
association study (GWAS) (see Non-patent Literature 1, for
instance). A brief description of Fisher's exact test is given
below.
TABLE 1
| | X | Y | Total |
| A | a | b | a + b |
| G | c | d | c + d |
| Total | a + c | b + d | a + b + c + d (= n) |
[0003] This table is an example of a 2×2 summary table that
classifies n subjects according to character (X or Y) and a
particular allele (A or G) and counts the results, where a, b, c,
and d represent frequencies (non-negative integers). In Fisher's
exact test, when the following is assumed for a non-negative
integer i,
$$p_i = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,i!\,(a+b-i)!\,(a+c-i)!\,(d-a+i)!}$$
it is determined whether there is a statistically significant
association between the character and a particular allele based on
the magnitude relationship between:
$$p = p_a + \sum_{\substack{\max(0,\,a-d) \le i \le \min(a+b,\,a+c) \\ p_i < p_a}} p_i$$
and a threshold T of a predetermined value. In a genome-wide
association study, a summary table like the one above can be
created for each single nucleotide polymorphism (SNP), and Fisher's
exact test can be performed on each of the summary tables.
A genome-wide association study involves an enormous number of
SNPs, on the order of several million to tens of millions. Thus, a
genome-wide association study can require Fisher's exact test to be
performed a very large number of times.
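As an illustration of the definition above, the following is a minimal Python sketch that evaluates p directly from the two formulas. The function name `fisher_p` and the use of exact fractions are assumptions made for this example and do not appear in the application. Note that the sketch follows the formula as written, adding only those p_i that are strictly smaller than p_a.

```python
from fractions import Fraction
from math import comb


def fisher_p(a, b, c, d):
    """p = p_a + sum of all p_i with p_i < p_a, computed with exact fractions."""
    n = a + b + c + d

    def p_i(i):
        # Algebraically equal to the factorial expression for p_i:
        # C(a+b, i) * C(c+d, a+c-i) / C(n, a+c)
        return Fraction(comb(a + b, i) * comb(c + d, a + c - i), comb(n, a + c))

    lo = max(0, a - d)          # smallest admissible i
    hi = min(a + b, a + c)      # largest admissible i
    p_a = p_i(a)
    return p_a + sum(p_i(i) for i in range(lo, hi + 1) if p_i(i) < p_a)


if __name__ == "__main__":
    print(fisher_p(1, 2, 2, 1))
```

The loop visits min(a+b, a+c) − max(0, a−d) + 1 values of i, which is the cost the precomputed condition is meant to avoid.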
[0004] Meanwhile, in view of the sensitivity or confidentiality of
genome information, some prior studies aim to perform genome-wide
association studies while concealing genome information via
encryption techniques (see Non-patent Literature 2, for
instance). Non-patent Literature 2 proposes a method of performing
a chi-square test while concealing genome information.
PRIOR ART LITERATURE
Non-Patent Literature
[0005] Non-patent Literature 1: Konrad Karczewski, "How to do a
GWAS", GENE 210: Genomics and Personalized Medicine, 2015.
[0006] Non-patent Literature 2: Yihua Zhang, Marina Blanton, and
Ghada Almashaqbeh, "Secure distributed genome analysis for GWAS and
sequence comparison computation", BMC Med. Inform. Decis. Mak.,
2015.
SUMMARY OF THE INVENTION
Problems to be Solved by the Invention
[0007] Fisher's exact test requires calculation of min(a+b,
a+c)-max(0, a-d)+1 values of p_i, and the test may have to be
conducted on each of a large number of summary tables, particularly
in the case of a genome-wide association study. It could therefore
involve an enormous processing time depending on the computer
environment and/or the frequencies in the summary tables.
[0008] An object of the present invention is to provide a Fisher's
exact test calculation apparatus, method, and program for
performing calculations for Fisher's exact test more efficiently
than the conventional art.
Means to Solve the Problems
[0009] A Fisher's exact test calculation apparatus according to an
aspect of the present invention includes: a condition storage that
has stored therein a condition for determining whether a result of
Fisher's exact test corresponding to input is significant or not,
the input being frequencies in a summary table; and a calculation
unit that obtains the result of Fisher's exact test corresponding
to the frequencies in the summary table by inputting the
frequencies in the summary table to the condition read from the
condition storage.
Effects of the Invention
[0010] By performing precomputation, calculations for Fisher's
exact test in the main calculation can be performed more
efficiently than in the conventional art.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] FIG. 1 is a block diagram for describing an example of a
Fisher's exact test calculation apparatus.
[0012] FIG. 2 is a flow diagram for describing an example of a
Fisher's exact test calculation method.
[0013] FIG. 3 is a diagram showing an example of a decision
tree.
DETAILED DESCRIPTION OF THE EMBODIMENT
[0014] An embodiment of the present invention is described below
with reference to the drawings.
[0015] As shown in FIG. 1, a Fisher's exact test calculation
apparatus includes, for example, a condition storage 1 and a
calculation unit 2. A Fisher's exact test calculation method is
implemented by the calculation unit 2 of the Fisher's exact test
calculation apparatus performing the processing at step S2
described in FIG. 2 and below.
[0016] It is assumed that a total sum n of the frequencies in a
summary table and the value of a threshold T representing a
significance level are predetermined.
<Condition Storage 1>
[0017] The condition storage 1 has stored therein a condition for
determining whether the result of Fisher's exact test corresponding
to input is significant or not, the input being the frequencies in
a summary table.
[0018] For instance, the result of Fisher's exact test is
determined in advance for each set (a', b', c', d') of all the
non-negative integers that satisfy n=a'+b'+c'+d', and (a', b', c',
d') and the corresponding result of Fisher's exact test are stored
in the condition storage 1 in association with each other. The
Fisher's exact test calculation apparatus may include a
precomputation unit 3 for performing this precomputation.
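The precomputation described above can be sketched as follows. This is only an illustration under stated assumptions: the names `fisher_p`, `precompute_significant`, and `is_significant` are hypothetical, the store is modelled as an in-memory Python set, and a small n is used so that the enumeration finishes quickly. Only the frequency sets whose result is significant (p ≤ T) are kept.

```python
from fractions import Fraction
from math import comb


def fisher_p(a, b, c, d):
    """p = p_a + sum of all p_i with p_i < p_a (exact arithmetic)."""
    n = a + b + c + d

    def p_i(i):
        return Fraction(comb(a + b, i) * comb(c + d, a + c - i), comb(n, a + c))

    p_a = p_i(a)
    lo, hi = max(0, a - d), min(a + b, a + c)
    return p_a + sum(p_i(i) for i in range(lo, hi + 1) if p_i(i) < p_a)


def precompute_significant(n, threshold):
    """Enumerate every (a', b', c', d') with a'+b'+c'+d' = n and keep those with p <= T."""
    significant = set()
    for a in range(n + 1):
        for b in range(n + 1 - a):
            for c in range(n + 1 - a - b):
                d = n - a - b - c
                if fisher_p(a, b, c, d) <= threshold:
                    significant.add((a, b, c, d))
    return significant


def is_significant(frequencies, significant):
    """Main calculation: a single membership test, with no p_i evaluated."""
    return tuple(frequencies) in significant


if __name__ == "__main__":
    store = precompute_significant(6, Fraction(1, 20))
    print(is_significant((3, 0, 0, 3), store))
```

The enumeration cost grows roughly with n cubed, but it is paid once for a given n and T, after which each summary table costs one lookup.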
[0019] The condition stored in the condition storage 1 may be a
decision tree with conditional expressions assigned to a root node
and internal nodes and a result of Fisher's exact test assigned to
each leaf node. An example of a decision tree for a case with n=50
and the predetermined threshold T=10^-8 is shown in FIG. 3.
Such a decision tree may be created using an existing machine
learning method.
[0020] For instance, such a tree can be created with the methods
described in Reference Literatures 1 to 3 based on the sets (a',
b', c', d') of all the non-negative integers satisfying
n=a'+b'+c'+d' and on a formula for determining the result of
Fisher's exact test.
[0021] Reference Literature 1: Leo Breiman, Jerome H. Friedman,
Richard A. Olshen, Charles J. Stone, "Classification and Regression
Trees (Wadsworth Statistics/Probability)", Wadsworth & Brooks/Cole
Advanced Books & Software.
[0022] Reference Literature 2: John R. Quinlan, "Induction of
decision trees", Machine learning, 1986, P. 81-106.
[0023] Reference Literature 3: Trevor Hastie, Robert Tibshirani,
Jerome Friedman, "The Elements of Statistical Learning: Data
Mining, Inference, and Prediction, Second Edition", Springer Series
in Statistics.
[0024] In FIG. 3, the conditional expressions assigned to the root
node and the internal nodes are formulas each representing the
magnitude relationship between one of V1=a, V3=c, V9=ad, V11=bc,
V12=bd, V14=cd, V21=abc, V25=ad^2, V30=bcd and the threshold
corresponding to each node. In FIG. 3, 14e+3 means 14×10^3 and
13e+3 means 13×10^3. For the decision tree of FIG. 3, it is
assumed that, when the frequencies (a, b, c, d) in a summary table
to be subjected to Fisher's exact test are input to the conditional
expression corresponding to a certain node, transition is made to
the child node on the left side of that node if the conditional
expression holds, and to the child node on the right side of that
node if the conditional expression does not hold. In the decision
tree of FIG. 3, "FALSE" at a leaf node means p>T and the result
of Fisher's exact test is not significant, and "TRUE" at a leaf
node means p≤T and the result of Fisher's exact test is
significant.
[0025] <Calculation Unit 2>
[0026] The calculation unit 2 obtains the result of Fisher's exact
test corresponding to the frequencies in the summary table
subjected to Fisher's exact test by inputting the frequencies in
the summary table to the condition read from the condition storage
1 (step S2).
[0027] For example, when (a', b', c', d') and the corresponding
result of Fisher's exact test are stored in the condition storage 1
in association with each other, the calculation unit 2 obtains the
result of Fisher's exact test corresponding to the frequencies (a,
b, c, d) in the input summary table by referencing the condition
storage 1. More specifically, the calculation unit 2 obtains the
result of Fisher's exact test corresponding to the same frequencies
(a', b', c', d') as the frequencies (a, b, c, d) in the input
summary table by referencing the condition storage 1, and outputs
it as the result of Fisher's exact test corresponding to the
frequencies (a, b, c, d) in the input summary table. Alternatively,
only the frequencies in summary tables for which the result of
Fisher's exact test is significant may be stored in the condition
storage 1, and the result of Fisher's exact test may be determined
to be significant if the same frequencies (a', b', c', d') as the
frequencies (a, b, c, d) in the input summary table are present in
the condition storage 1. Similarly, the frequencies in summary
tables for which the result of Fisher's exact test is not
significant may be stored.
[0028] In a case where the condition storage 1 stores a decision
tree with conditional expressions assigned to the root node and the
internal nodes and a result of Fisher's exact test assigned to each
leaf node, the calculation unit 2 may output the result of Fisher's
exact test assigned to the leaf node which is determined by a value
obtained by inputting the frequencies in the summary table to the
conditional expressions assigned to the root node and the internal
nodes.
[0029] For example, processing may be performed as described below
when data for the decision tree of FIG. 3 is stored in the
condition storage 1 as the condition and this decision tree of FIG.
3 is used to obtain the result of Fisher's exact test corresponding
to the frequencies (a, b, c, d) in the input summary table.
[0030] The calculation unit 2 first determines by calculation
whether the conditional expression "V25>=14e+3" at the topmost
or root node, that is, ad^2≥14000, is satisfied or not.
If it is satisfied, the calculation unit 2 proceeds to the
conditional expression of the child node on the left side of the
root node, and if it is not satisfied, the calculation unit 2
proceeds to the conditional expression of the child node on the
right side of the root node. Subsequently, the calculation unit 2
repeatedly performs a similar process until it reaches a leaf node.
The calculation unit 2 then outputs the result of Fisher's exact
test corresponding to the leaf node reached.
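The traversal just described can be sketched in Python as below. The `Node` class, the `evaluate` function, and `toy_tree` are hypothetical names for this illustration. The three conditional expressions are the ones the text quotes for the root node and its two children in FIG. 3, but the leaf labels of the toy tree are placeholders chosen for the example and are not the leaves of FIG. 3.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Node:
    cond: Callable[[int, int, int, int], bool]  # conditional expression on (a, b, c, d)
    left: Union["Node", bool]                   # taken when the expression holds
    right: Union["Node", bool]                  # taken when it does not hold


def evaluate(tree, a, b, c, d):
    """Walk from the root to a leaf; a leaf is a bool (True = significant)."""
    node = tree
    while not isinstance(node, bool):
        node = node.left if node.cond(a, b, c, d) else node.right
    return node


# Toy tree: conditions as quoted in the text, leaf labels are placeholders.
toy_tree = Node(
    cond=lambda a, b, c, d: a * d**2 >= 14000,            # V25 >= 14e+3
    left=Node(lambda a, b, c, d: b * c * d < 94, True, False),   # V30 < 94
    right=Node(lambda a, b, c, d: a * d >= 485, True, False),    # V9 >= 485
)

if __name__ == "__main__":
    print(evaluate(toy_tree, 20, 1, 1, 28))
```

Each step costs one product of frequencies and one magnitude comparison, so the number of comparisons is bounded by the depth of the tree.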
[0031] In the example of a decision tree in FIG. 3, it can be seen
that the result of Fisher's exact test is obtained with no more
than seven magnitude comparisons. In contrast, calculation of
$$p = p_a + \sum_{\substack{\max(0,\,a-d) \le i \le \min(a+b,\,a+c) \\ p_i < p_a}} p_i$$
requires determination of about n/2=25 values of p_i in the worst
case. Each p_i in turn consists of multiplications and divisions of
nine factorial values, which can be efficiently determined by
taking a logarithm and performing precomputation as shown below.
$$p_i = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,i!\,(a+b-i)!\,(a+c-i)!\,(d-a+i)!}$$
hence,
$$\log p_a = \sum_{j=1}^{a+b} \log j + \sum_{j=1}^{c+d} \log j + \sum_{j=1}^{a+c} \log j + \sum_{j=1}^{b+d} \log j - \sum_{j=1}^{n} \log j - \sum_{j=1}^{a} \log j - \sum_{j=1}^{b} \log j - \sum_{j=1}^{c} \log j - \sum_{j=1}^{d} \log j \qquad \text{(Formula A)}$$
[0032] Thus, by precomputing Σ_{j=1}^{k} log j for k=0, 1, 2,
. . . , n, log p_a (and likewise log p_i for any i) can be
calculated just by additions and subtractions of the precomputed
values. Then, p_i is determined from log p_i.
Here, Σ_{j=1}^{0} log j=0 holds.
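A minimal sketch of this log-factorial precomputation is shown below. The names `log_factorial_table` and `log_p_i` are assumptions for the illustration, and natural logarithms with double-precision floats are an arbitrary choice.

```python
import math


def log_factorial_table(n):
    """table[k] = sum_{j=1}^{k} log j for k = 0..n, with table[0] = 0."""
    table = [0.0] * (n + 1)
    for k in range(1, n + 1):
        table[k] = table[k - 1] + math.log(k)
    return table


def log_p_i(a, b, c, d, i, table):
    """log p_i from nine table lookups and eight additions/subtractions."""
    n = a + b + c + d
    return (table[a + b] + table[c + d] + table[a + c] + table[b + d]
            - table[n] - table[i] - table[a + b - i]
            - table[a + c - i] - table[d - a + i])


if __name__ == "__main__":
    lf = log_factorial_table(6)
    # i = a reproduces Formula A; exponentiating recovers p_a.
    print(math.exp(log_p_i(3, 0, 0, 3, 3, lf)))
```

Setting i = a makes the last four lookups table[a], table[b], table[c], and table[d], which is exactly Formula A.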
[0033] For determining p_i, the aforementioned computation
requires nine matching processes, eight additions and subtractions,
and determination of p_i from log p_i. Accordingly, at
least 25×9=225 matching processes are required, meaning a
number of calculations 225/7≈32.1 times as large as just seven
magnitude comparisons. Further considering that a matching process
takes a processing time equivalent to or longer than that of a
magnitude comparison, use of a decision tree is expected to lead to
a shorter processing time as well.
[0034] The calculation unit 2 may also perform calculations for
obtaining the result of Fisher's exact test corresponding to the
frequencies (a, b, c, d) in the input summary table subjected to
Fisher's exact test while keeping the frequencies (a, b, c, d)
concealed via secure computation.
[0035] Such a calculation can be carried out by combining
encryption techniques capable of magnitude comparison while
concealing the input and output. In the following, magnitude
comparison with the input and output concealed (hereinafter
abbreviated as input/output-concealed magnitude comparison) is
described. Assume that two values x and y for magnitude comparison
are the input and at least one of x and y is encrypted such that
its real numerical value is not known. In the present description,
only x is encrypted, which is denoted as E(x). The result of
magnitude comparison, which is to be output, is defined as:
$$z = \begin{cases} 1 & \text{if } x \ge y \\ 0 & \text{otherwise} \end{cases}$$
[0036] That is to say, the input/output-concealed magnitude
comparison means determining cipher text E(z) for the result of
magnitude comparison by using E(x) and y as the input and without
decrypting E(x). When z is the result to be finally obtained, E(z)
is appropriately decrypted. Examples of such
input/output-concealed magnitude comparison are the methods
described in Reference Literatures 4 and 5.
[0037] Reference Literature 4: Ivan Damgard, Matthias Fitzi, Eike
Kiltz, Jesper B. Nielsen, Tomas Toft, "Unconditionally secure
constant-rounds multi-party computation for equality, comparison,
bits and exponentiation", In Proc. 3rd Theory of Cryptography
Conference, TCC 2006, volume 3876 of Lecture Notes in Computer
Science, pages 285-304, Berlin, 2006, Springer-Verlag
[0038] Reference Literature 5: Takashi Nishide, Kazuo Ohta,
"Multiparty Computation for Interval, Equality, and Comparison
Without Bit-Decomposition Protocol", Public Key Cryptography--PKC
2007, 10th International Conference on Practice and Theory in
Public-Key Cryptography, 2007, P. 343-360
[0039] A method for determining the result of Fisher's exact test
from the decision tree of FIG. 3 using input/output-concealed
magnitude comparison is described below.
[0040] Assume that the input is cipher text (E(a), E(b), E(c),
E(d)) of the frequencies (a, b, c, d) in a summary table and data
on the decision tree of FIG. 3. When input/output-concealed
magnitude comparison is not used, it is first determined by
calculation whether the conditional expression "V25>=14e+3" at
the topmost or root node, that is, ad^2≥14000, is
satisfied or not as described above. If it is satisfied, the flow
proceeds to the conditional expression of the child node on the
left side of the root node, and if it is not satisfied, the flow
proceeds to the conditional expression of the child node on the
right side of the root node. Subsequently, a similar process is
repeatedly performed until a leaf node is reached. Then, the result
of Fisher's exact test corresponding to the leaf node reached is
output.
[0041] When this is carried out using input/output-concealed
magnitude comparison, E(ad^2) is first calculated for the
conditional expression ad^2≥14000 from the input E(a),
E(d) without decrypting them. As this method is described in
Reference Literatures 4 and 5, for instance, specific description
is omitted. Then, using input/output-concealed magnitude
comparison, E([ad^2≥14000]) is calculated from
E(ad^2) and from 14000 of the conditional expression.
[0042] Here, [ad^2≥14000] is 1 if ad^2≥14000,
and 0 otherwise. The flow next proceeds to either the lower left
conditional expression "V30<94" or the lower right conditional
expression "V9>=485"; however, which of them to proceed to is
not known because the value of [ad^2≥14000] is concealed.
Thus, for each one of all the paths that start at the root node and
reach a leaf node assigned with TRUE, calculations are performed to
determine whether the conditional expressions of the nodes on that
path are satisfied, and then whether a leaf node assigned with TRUE
has been reached is output as the final output, namely the Fisher's
exact test result.
[0043] For example, in the case of the path that reaches the
leftmost leaf node assigned with TRUE, the first four of the
conditional expressions, "V25≥14e+3", "V30<94",
"V12<38", "V14<134", "V12<36", and "V3<1.5" should be
Yes and the remaining two should be No. Thus, conditions are
established so that cipher text of "1" will be returned if the
first four are Yes and the remaining two are No, and cipher text of
"0" will be returned otherwise. For example, for the sake of
simplicity, the conditional expressions "V12<36" and "V3<1.5"
are replaced with "V12≥36" and "V3≥1.5" respectively
so that cipher text of "1" will be returned if all the conditional
expressions result in Yes. Data on the decision tree after such
replacement is stored in the condition storage 1 as the condition
in advance.
[0044] Here, for the conditional expressions "V25≥14e+3",
"V30<94", "V12<38", "V14<134", "V12≥36", and
"V3≥1.5", cipher text of their results (referred to as
E(z_1), E(z_2), E(z_3), E(z_4), E(z_5), and
E(z_6), respectively) is determined using
input/output-concealed magnitude comparison. In order to return
cipher text of "1" only if all the conditional expressions result
in Yes, E(z_1 z_2 z_3 z_4 z_5 z_6) is determined
from E(z_1), E(z_2), E(z_3), E(z_4), E(z_5),
and E(z_6). That is, the calculation unit 2 performs
multiplication while keeping the content of the cipher text
concealed. As this method is also described in Reference
Literatures 4 and 5, for instance, specific description is omitted.
Subsequently, a similar process is performed for all the paths that
reach a leaf node assigned with TRUE, and multiplication is further
performed while concealing the content of the cipher text for all
of the resulting cipher text (16 patterns of cipher text that reach
TRUE, including E(z_1 z_2 z_3 z_4 z_5 z_6)).
[0045] The result is cipher text of "1" if the Fisher's exact test
result is TRUE and cipher text of "0" otherwise, so that the
Fisher's exact test result can be obtained by decrypting the
result. That is, if such calculation is performed using cipher text
(E(a), E(b), E(c), E(d)) of the frequencies (a, b, c, d) in an
input summary table and there is a path for which the result of
decrypting E(z_1 z_2 z_3 z_4 z_5 z_6) is 1, it
may be determined that the result of Fisher's exact test for that
input summary table is significant.
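The path-wise evaluation of paragraphs [0042] to [0045] can be illustrated with the plaintext sketch below. It is not a secure computation: every comparison and multiplication here runs on plain integers, whereas in the described method they would be carried out on cipher text using the protocols of Reference Literatures 4 and 5. The names `LEFTMOST_TRUE_PATH`, `path_indicator`, and `any_true_path` are hypothetical; the six conditions are those given in the text for the leftmost TRUE leaf after the replacement of "V12<36" and "V3<1.5".

```python
from math import prod

# Conditions on the path to the leftmost TRUE leaf of FIG. 3, after the
# replacement described in the text, so that every one must be Yes (1).
LEFTMOST_TRUE_PATH = [
    lambda a, b, c, d: a * d**2 >= 14000,  # V25 >= 14e+3
    lambda a, b, c, d: b * c * d < 94,     # V30 < 94
    lambda a, b, c, d: b * d < 38,         # V12 < 38
    lambda a, b, c, d: c * d < 134,        # V14 < 134
    lambda a, b, c, d: b * d >= 36,        # V12 >= 36
    lambda a, b, c, d: c >= 1.5,           # V3 >= 1.5
]


def path_indicator(path, a, b, c, d):
    """Plaintext stand-in for z_1 * z_2 * ... : 1 only if every condition holds."""
    return prod(int(cond(a, b, c, d)) for cond in path)


def any_true_path(paths, a, b, c, d):
    """1 if the indicator of at least one TRUE path is 1, else 0."""
    return int(any(path_indicator(p, a, b, c, d) == 1 for p in paths))


if __name__ == "__main__":
    print(path_indicator(LEFTMOST_TRUE_PATH, 11, 1, 2, 36))
```

Evaluating every TRUE path regardless of the input is what lets the computation avoid revealing which branch was taken, at the cost of more comparisons than a single root-to-leaf walk.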
[0046] Further, in order to also conceal which leaf node has been
reached, a logical sum for plaintext,
z_{1j} z_{2j} z_{3j} z_{4j} z_{5j} z_{6j}, for example, is
determined by secure computation for the cipher text
E(z_{1j} z_{2j} z_{3j} z_{4j} z_{5j} z_{6j}) (j=1, 2, . . .
) obtained for all the paths that reach a leaf node assigned with
TRUE. That is, this makes use of the fact that
z_{1j} z_{2j} z_{3j} z_{4j} z_{5j} z_{6j}=1 if the result of
decryption is 1 even for only one cipher text
E(z_{1j} z_{2j} z_{3j} z_{4j} z_{5j} z_{6j}) (j=1, 2, . . .
). Secure computation for logical sum may be performed by the
method described in Reference Literature 1, for example.
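One standard arithmetic way to express such a logical sum over 0/1 values is shown below. It is given only as an illustration and is not necessarily the method of the cited reference. The symbols w_j (the product for the j-th TRUE path) and m (the number of TRUE paths) are introduced here for the illustration.

```latex
w_j = z_{1j}\, z_{2j} \cdots z_{6j} \in \{0, 1\}, \qquad
w_1 \lor w_2 \lor \cdots \lor w_m \;=\; 1 - \prod_{j=1}^{m} \bigl(1 - w_j\bigr).
```

Because any input follows exactly one root-to-leaf path in a decision tree, at most one w_j can equal 1, so in this setting the logical sum also equals the plain sum w_1 + w_2 + . . . + w_m.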
[0047] That is, the result may also be obtained in the following
manner. Results indicative of being significant or not in Fisher's
exact test are assigned to leaf nodes. Conditional expressions are
assigned to the respective nodes on a path from the root node to a
leaf node. Starting from the root node, by inputting the
frequencies in a summary table to the conditional expression of
each node and branching from the node, a path to one leaf node can
be followed. This can obtain the result assigned to the leaf node,
indicative of being significant or not in Fisher's exact test. The
calculation unit 2 can evaluate the conditional expression of each
node and obtain a path leading to one leaf node while concealing
the frequencies in the summary table, thus obtaining the result of
Fisher's exact test while concealing the frequencies in the summary
table.
[0048] In this manner, by determining the correspondence between
the frequency patterns of summary tables and the results of
Fisher's exact test (for example, a binary value of TRUE or FALSE)
in advance and creating a condition under which the result of
Fisher's exact test is TRUE (or FALSE), the main calculation
(calculation excluding preliminary calculation portions) can obtain
the result of Fisher's exact test just by inputting the frequencies
in a summary table to the condition and performing calculations,
without calculating p_i. As calculation of p_i is not
necessary, calculations for Fisher's exact test can be performed
efficiently.
[0049] In addition, by making precomputation results available for
public use, computers with lower computational ability on which
execution of Fisher's exact test has been difficult may become able
to perform Fisher's exact test by referencing the precomputation
results. Specifically, effects such as reduced usage of calculation
resources and/or a shortened processing time are expected to be
achieved.
[0050] Further, the above-described concealment enables Fisher's
exact test to be executed while concealing genome information and
various kinds of associated data, for example. This allows, for
example, multiple research institutions to obtain the result of
executing Fisher's exact test on combined data while concealing the
genome data possessed by the individual institutions and without
revealing it to one another, which potentially leads to provision
of execution environments for genome analysis of an extremely high
security level and hence further development of medicine.
[0051] [Program and Recording Medium]
[0052] The processes described in connection with the Fisher's
exact test calculation apparatus and method may be executed not
only in a chronological order in accordance with the order of their
description but in a parallel manner or separately depending on the
processing ability of the apparatus executing the processes or any
necessity.
[0053] Also, when the processes of the Fisher's exact test
calculation apparatus are to be implemented by a computer, the
processing specifics of the functions to be provided by the
Fisher's exact test calculation apparatus are described by a
program. By the program then being executed by the computer, the
processes are embodied on the computer.
[0054] The program describing the processing specifics may be
recorded on a computer-readable recording medium. The
computer-readable recording medium may be any kind of media, such
as a magnetic recording device, an optical disk, a magneto-optical
recording medium, and semiconductor memory.
[0055] Processing means may be configured through execution of a
predetermined program on a computer or at least some of the
processing specifics thereof may be embodied in hardware.
[0056] It will be appreciated that modifications may be made as
appropriate without departing from the scope of the present
invention.
INDUSTRIAL APPLICABILITY
[0057] The Fisher's exact test calculation apparatus and method of
the present invention are applicable to performing Fisher's exact
test via secure computation while keeping information on summary
tables concealed in an analysis utilizing Fisher's exact test, for
example, genome-wide association study, genome analysis, clinical
research, social survey, academic study, analysis of experimental
results, marketing research, statistical calculations, medical
information analysis, customer information analysis, and sales
analysis. In the case of genome-wide association study, the input
to the Fisher's exact test calculation apparatus and method may be
the frequencies in a 2×2 summary table that classifies n
subjects according to whether they have a particular character and
whether they have a particular allele (for example, A or G) and
counts the results, for example.
* * * * *