U.S. patent application number 13/845999 was filed with the patent office on 2013-10-17 for file creating method for searching of data, searching method of data file and managing system for searching of data file.
The applicant listed for this patent is Jong Sun JUNG. Invention is credited to Jong Sun JUNG.
Application Number: 20130275462 13/845999
Family ID: 40483165
Filed Date: 2013-10-17
United States Patent Application 20130275462
Kind Code: A1
JUNG; Jong Sun
October 17, 2013
FILE CREATING METHOD FOR SEARCHING OF DATA, SEARCHING METHOD OF
DATA FILE AND MANAGING SYSTEM FOR SEARCHING OF DATA FILE
Abstract
A method for creating and storing a file that facilitates the search of
data stored in a storage medium, and a data search method using the
same, are disclosed. The file creating method creates a rack of
virtual RAM (RVR) file, in which the data is divided according to a
selected divisional unit, and a record allocation table (RAT) file
that stores the record position of each divisional unit of the RVR
file. As a result, a database (DB) of large-volume irregular data can
be easily created, and data analysis can be performed quickly.
Inventors: JUNG; Jong Sun; (Goyang-si, KR)
Applicant: JUNG; Jong Sun, Goyang-si, KR
Family ID: 40483165
Appl. No.: 13/845999
Filed: March 18, 2013
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
13003649 | Mar 21, 2011 | 8423513
PCT/KR2009/003790 | Jul 10, 2009 |
13845999 | |
Current U.S. Class: 707/769
Current CPC Class: G11B 2220/2516 20130101; G06F 16/148 20190101; G11B 20/1252 20130101; G06F 16/13 20190101; G11B 27/329 20130101
Class at Publication: 707/769
International Class: G06F 17/30 20060101 G06F017/30
Foreign Application Data
Date | Code | Application Number
Jul 11, 2008 | KR | 10-2008-0067778
Claims
1. A file creation method for searching for a single irregular data
file, the method comprising: (A1) receiving a divisional unit of
data as an input; (A2) discriminating the single irregular data
file using the received divisional unit, and creating a rack of
virtual RAM (RVR) file; (A3) detecting a record position for each
divisional unit of the RVR file, and creating a record allocation
table (RAT) file; and (A4) storing the RVR file and the RAT
file.
2. The method according to claim 1, wherein: the data is irregular
data, and the divisional unit is any one of [page], [paragraph],
[line] and [word].
3. The method according to claim 1, wherein the record position is
the cumulative size of the data in the single data up to the
specific position where the corresponding data is recorded.
4. The method according to claim 1, wherein the record position is
the number of the hard disk cluster in which the data of the
corresponding part is recorded.
5. A file creation method for searching for a single regular data
file, the method comprising: (B1) discriminating between the rows
and columns of the single regular data, and creating a rack of
virtual RAM (RVR) file; (B2) detecting a record position for each
row or column of the RVR file, and creating a record allocation
table (RAT) file; and (B3) storing the RVR file and the RAT file.
6. The method according to claim 5, wherein the record position is
the cumulative size of the data in the single data up to the
specific position where the corresponding data is recorded.
7. The method according to claim 5, wherein the record position is
the number of the hard disk cluster in which the data of the
corresponding part is recorded.
8. The method according to claim 5, wherein the record position is
the cumulative size of the data in the single data up to the
specific position where the corresponding data is recorded.
9. The method according to claim 5, wherein the record position is
the number of the hard disk cluster in which the data of the
corresponding part is recorded.
10. A method for searching for a single data file, the method
comprising: (C1) receiving search information; (C2) detecting a
record position contained in single data corresponding to searched
information from a record allocation table (RAT) file; (C3)
detecting a physical storage position contained in a storage medium
of data corresponding to the searched information from the record
position; and (C4) searching for data of a physical position of the
data, and outputting the searched result.
11. The method according to claim 10, wherein, if the single data
is irregular data, the searched information is an order of each
divisional unit.
12. The method according to claim 10, wherein, if the single data
is regular data, the searched information is a number of a row or
column of corresponding data from among the regular data.
13. The method according to claim 10, wherein the record position
is the cumulative size of the data in the single data up to the
specific position where the corresponding data is recorded.
14. The method according to claim 13, wherein the detecting step
(C3) of the storage position includes: calculating a cluster
position from the record position using a size of data of each
divisional unit, reading a physical storage position of the cluster
position from a file allocation table (FAT), and detecting the read
physical storage position of the cluster position.
15. The method according to claim 10, wherein the record position is
the number of the hard disk cluster in which the data of the
corresponding part is recorded.
16. A method for searching for a data file, the method comprising:
(D1) fragmenting a genome base sequence into predetermined-sized
base units; (D2) allocating a unique number to each fragmented base
unit; (D3) storing a storage position of each base unit; and (D4)
creating a record allocation table (RAT) file.
17. The method according to claim 16, wherein the unique number in
the step (D2) is a quaternary (base-4) number determined by the
bases constituting the base unit.
18. A method for searching for a data file, the method comprising:
(E1) assigning a serial number to input data according to a
divisional unit; (E2) calculating a serial number of data that
includes a word contained in the input data; (E3) creating a hash
table including the word and the serial number; and (E4) creating a
record allocation table (RAT) using the hash table.
Description
CROSS REFERENCE TO PRIOR APPLICATIONS
[0001] The present application is a Divisional Application of
co-pending U.S. patent application Ser. No. 13/003,649 (filed on
Mar. 21, 2011) under 35 U.S.C. § 120, which is a National Stage
Patent Application of International Patent Application No.
PCT/KR2009/003790 (filed on Jul. 10, 2009) under 35 U.S.C.
§ 371, which claims priority to Korean Patent Application No.
10-2008-0067778 (filed on Jul. 11, 2008), which are all hereby
incorporated by reference in their entirety.
BACKGROUND
[0002] The present invention relates to a method for creating and
storing a file that enables easier searching and a method for
searching for data using the same.
[0003] FIG. 1 is a conceptual diagram illustrating data stored in a
general hard disk. The hard disk forms a cylinder composed of the
tracks of a plurality of stacked platters, and performs input/output
(I/O) operations through a read/write head mounted on an arm for
each track. In FIG. 1, it is assumed that the smallest data unit
(i.e., a record) is stored in each of the 1st, 2nd, 3rd, 4th, . . .
(i-1)th, ith, and Nth sectors. The term `cluster` means a set of
neighboring sectors. A file manager may map each cluster to its
physical position using a File Allocation Table (FAT).
[0004] In the FAT system, records are sequentially arranged in a
plurality of clusters. To retrieve the record information of the
i-th sector located in the middle of a file, the FAT system must
sequentially process the tracks from the first sector onward, and it
arrives at the i-th sector only after passing through the records
contained in the preceding sectors.
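The cost of this sequential walk can be illustrated with a short Python sketch. It models the FAT as an in-memory dictionary that maps each cluster number to the next cluster of the same file (a hypothetical simplification, not an actual on-disk FAT layout); reaching the i-th cluster of a file requires following i-1 links from the first cluster, so the work grows linearly with i. The name `find_ith_cluster` is illustrative only.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory FAT: cluster number -> next cluster of the same
# file, with None marking the last cluster of the chain.
Fat = Dict[int, Optional[int]]

def find_ith_cluster(fat: Fat, first_cluster: int, i: int) -> Tuple[int, int]:
    """Return (physical cluster, links followed) for the i-th cluster
    (1-based) of a file by walking its chain from the first cluster."""
    if i < 1:
        raise ValueError("i must be >= 1")
    cluster, steps = first_cluster, 0
    for _ in range(i - 1):
        nxt = fat.get(cluster)
        if nxt is None:
            raise IndexError("the file has fewer than i clusters")
        cluster, steps = nxt, steps + 1
    return cluster, steps

# A file stored in the non-contiguous clusters 7 -> 3 -> 12 -> 5.
fat: Fat = {7: 3, 3: 12, 12: 5, 5: None}
print(find_ith_cluster(fat, 7, 4))  # (5, 3)
```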
[0005] On the other hand, when Random Access Memory (RAM) is used to
quickly extract necessary information from files including either
variables or variable names, all variables must be held in Dynamic
Random Access Memory (DRAM) during program execution, so that the
position at which the corresponding variable name is stored can be
found immediately. As a result, necessary information can be found
quickly in RAM.
[0006] However, as capacity increases, the price of DRAM, a
semiconductor memory, rises far more steeply than that of a hard
disk, so DRAM becomes cost-inefficient for large amounts of data
requiring more than 128 gigabytes. Therefore, hard disks rather than
DRAM are widely used throughout the world to store large amounts of
data.
[0007] However, conventional disc formats have the following
disadvantages.
[0008] In other words, when a sequential access method is used, as in
a disc, to search through large amounts of stored data, the access
time grows geometrically with the size of the data, unlike random
access to a data record.
[0009] In addition, even if the conventional art pre-calculates the
random access addresses (highly integrated indexes) of all data
records, the access speed still changes geometrically with the data
size unless the calculated addresses are stored in external storage.
[0010] Specifically, with the recent development of biotechnology,
large amounts of genetic, clinical, and gene-function-related data,
such as genomics or omics data (large-capacity biological
information), have been accumulated, and researchers can extract
useful information through calculations using the resulting data.
Each irregular data set is several to tens of terabytes in size, and
is expected to reach petabytes in larger projects. In this case, the
difference in data access time between the sequential access method
and the random access method based on highly integrated index
technology may range from several days to several years, so data
access or data search becomes impracticable with the conventional
art.
SUMMARY OF THE INVENTION
[0011] Accordingly, the present invention is directed to a method
for creating a file for data search, a method for searching for a
data file, and a database management system for searching for the
data file, that substantially obviate one or more problems due to
limitations and disadvantages of the related art.
[0012] It is an object of the present invention to provide a method
for constructing a Record Allocation Table (RAT) for a variety of
records of all constituent units (i.e., page, paragraph, line,
word, string, integer, and float) of a large amount of data,
performing random access of position information (i.e., address) on
a hard disk, implementing a database management system (DBMS) for
large volumes of irregular data, and allowing a hard disk to search
for data as quickly as in a DRAM.
[0013] It is another object of the present invention to provide a
method for analyzing and calculating large volumes of data without
processing the huge amount of data in DRAM (DRAM of more than 128
gigabytes is very expensive and therefore of limited practical use),
by processing the data on a hard disk at a speed similar to the DRAM
access speed.
[0014] It is yet another object of the present invention to provide
a data processing method for quickly and effectively searching a
large file, thereby facilitating intensive research into clustering
of large amounts of data.
[0015] In accordance with the present invention, the above and
other objects can be accomplished by the provision of a file
creation method for searching for a single irregular data file, the
method including: (A1) receiving a divisional unit of data as an
input; (A2) discriminating the single irregular data file using the
received divisional unit, and creating a rack of virtual RAM (RVR)
file; (A3) detecting a record position for each divisional unit of
the RVR file, and creating a record allocation table (RAT) file;
and (A4) storing the RVR file and the RAT file.
[0016] The data may be irregular data, and the divisional unit may be
any one of [page], [paragraph], [line] and [word].
[0017] In accordance with another aspect of the present invention,
a file creation method for searching for a single regular data file
includes: (B1) discriminating between a row and a column of one
regular data, and creating a rack of virtual RAM (RVR) file; (B2)
detecting a record position for each row or column of the RVR file,
and creating a record allocation table (RAT) file; and (B3) storing
the RVR file and the RAT file.
[0018] The record position may be the cumulative size of the data in
the single data up to the specific position where the corresponding
data is recorded.
[0019] The record position may be the number of the hard disk cluster
in which the data of the corresponding part is recorded.
[0020] In accordance with another aspect of the present invention,
a method for searching for a single data file includes: (C1)
receiving search information; (C2) detecting a record position
contained in single data corresponding to searched information from
a record allocation table (RAT) file; (C3) detecting a physical
storage position contained in a storage medium of data
corresponding to the searched information from the record position;
and (C4) searching for data of a physical position of the data, and
outputting the searched result.
[0021] If the single data is irregular data, the searched
information may be an order of each divisional unit.
[0022] If the single data is regular data, the searched information
may be a number of a row or column of corresponding data from among
the regular data.
[0023] The record position may be the cumulative size of the data in
the single data up to the specific position where the corresponding
data is recorded.
[0024] The detecting step (C3) of the storage position may include
calculating a cluster position from the record position using a
size of data of each divisional unit, reading a physical storage
position of the cluster position from a file allocation table
(FAT), and detecting the read physical storage position of the
cluster position.
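A minimal Python sketch of steps (C2) and (C3) follows, under stated assumptions: the RAT is a list whose (k-1)-th entry is the record position (cumulative bytes) of record k, the file's cluster chain has already been read from the FAT into a list, and the cluster position is obtained by dividing the record position by a fixed, hypothetical cluster size `CLUSTER_SIZE`. The names `locate_record` and `CLUSTER_SIZE` are illustrative only.

```python
from typing import List, Tuple

CLUSTER_SIZE = 4096  # hypothetical cluster size in bytes

def locate_record(rat: List[int], file_clusters: List[int],
                  serial: int) -> Tuple[int, int]:
    """Return (physical cluster, byte offset within it) for record
    `serial` (1-based).

    rat[k - 1] is the record position of record k: the number of bytes
    preceding it in the file. file_clusters[j] is the physical cluster
    holding the j-th logical cluster of the file, as read from the FAT.
    """
    if not 1 <= serial <= len(rat):
        raise IndexError("serial number out of range")
    address = rat[serial - 1]                                # (C2) RAT lookup
    logical_cluster, offset = divmod(address, CLUSTER_SIZE)  # cluster position
    return file_clusters[logical_cluster], offset            # (C3) physical position

rat = [0, 3000, 6100, 9000]  # record positions in bytes
file_clusters = [7, 3, 12]   # chain 7 -> 3 -> 12 read from the FAT
print(locate_record(rat, file_clusters, 3))  # (3, 2004)
```

Once the physical cluster and offset are known, step (C4) can read the data there directly, without passing through the preceding records.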
[0025] The record position may be the number of the hard disk cluster
in which the data of the corresponding part is recorded.
[0026] In accordance with another aspect of the present invention,
a system for managing a database (DB) to search for a data file
includes: a database (DB) for storing a rack of virtual RAM (RVR)
file created by discriminating a single input irregular data file
using a predetermined divisional unit, and a record allocation
table (RAT) file created by detecting a record position for each
divisional unit of the RVR file; a rack of virtual RAM (RVR)
controller for detecting a record position of searched information
from the RAT file in association with input searched information,
detecting a physical storage position contained in a storage medium
of data corresponding to the searched information from the record
position, searching for data of the physical position, and reading
the searched data; and an analysis module for analyzing a result
read by the RVR controller.
[0027] The divisional unit may be any one of [pagen], [page],
[fastan], [fasta], [line], [image], [audio] and [video].
[0028] The searched information may be an order of each divisional
unit.
[0029] The record position may be the cumulative size of the data in
the single data up to the specific position where the corresponding
data is recorded.
[0030] The record position may be the number of the hard disk cluster
in which the data of the corresponding part is recorded.
[0031] The storage medium may be a semiconductor storage
medium.
[0032] In accordance with another aspect of the present invention,
a system for managing a database (DB) to search for a data file
includes: a database (DB) for storing a rack of virtual RAM (RVR)
file created by discriminating a single input data file using a
regular divisional unit based on a row and column, and a record
allocation table (RAT) file created by detecting a record position
for each divisional unit of the RVR file; a rack of virtual RAM
(RVR) controller for detecting a record position of searched
information from the RAT file in association with input searched
information, detecting a physical storage position contained in a
storage medium of data corresponding to the searched information
from the record position, searching for data of the physical
position, and reading the searched data; and an analysis module for
analyzing a result read by the RVR controller.
[0033] The divisional unit of the regular data may be any one of
[seq], [int], [float], [string], [csv], [r], [xml] and [smtx].
[0034] The searched information may be a row number or a column
number of corresponding data from among the regular data.
[0035] The record position may be the cumulative size of the data in
the single data up to the specific position where the corresponding
data is recorded.
[0036] The detecting of the storage position may include
calculating a cluster position from the record position using a
size of data of each divisional unit, reading a physical storage
position of the cluster position from a file allocation table
(FAT), and detecting the read physical storage position of the
cluster position.
[0037] The record position may be the number of the hard disk cluster
in which the data of the corresponding part is recorded.
[0038] In accordance with another aspect of the present invention,
a method for searching for a data file includes: (D1) fragmenting a
genome base sequence into predetermined-sized base units; (D2)
allocating a unique number to each fragmented base unit; (D3)
storing a storage position of each base unit; and (D4) creating a
record allocation table (RAT) file.
[0039] The unique number in the step (D2) may be a quaternary (base-4)
number determined by the bases constituting the base unit.
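The following Python sketch illustrates steps (D1) to (D3) under assumptions not stated in the text: the sequence is cut into non-overlapping units of a fixed number of bases, the bases A, C, G and T are assigned the digits 0 to 3, and each unit's unique number is its value read as a base-4 number. Positions are stored as 0-based start offsets; building the RAT file itself (D4) is not shown, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed digit assignment for the four bases.
BASE_DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3}

def base_unit_number(unit: str) -> int:
    """(D2) Interpret a base unit as a base-4 (quaternary) number."""
    n = 0
    for base in unit:
        n = n * 4 + BASE_DIGIT[base]
    return n

def build_base_unit_table(sequence: str, size: int) -> Dict[int, List[int]]:
    """(D1)-(D3) Cut the sequence into non-overlapping units of `size`
    bases and record the start position of every occurrence, keyed by
    the unit's unique number."""
    table: Dict[int, List[int]] = defaultdict(list)
    for start in range(0, len(sequence) - size + 1, size):
        unit = sequence[start:start + size]
        table[base_unit_number(unit)].append(start)
    return dict(table)

print(base_unit_number("ACGT"))                  # 27
print(build_base_unit_table("ACGTACGTTTTT", 4))  # {27: [0, 4], 255: [8]}
```

With this numbering, each of the 4^s possible units of s bases receives a distinct number between 0 and 4^s - 1; a trailing fragment shorter than `size` is ignored by this sketch.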
[0040] In accordance with another aspect of the present invention,
a method for searching for a data file includes: (E1)
assigning a serial number to input data according to a divisional
unit; (E2) calculating a serial number of data that includes a word
contained in the input data; (E3) creating a hash table including
the word and the serial number; and (E4) creating a record
allocation table (RAT) using the hash table.
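A rough Python sketch of steps (E1) to (E3) is shown below. It assumes that each input record (for example, one abstract) is a divisional unit, that words are obtained by lower-casing and splitting on whitespace, and that Python's built-in `dict` serves as the hash table; step (E4), deriving the RAT from this table, is not modeled. `build_word_index` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List

def build_word_index(records: List[str]) -> Dict[str, List[int]]:
    """Number the records 1, 2, 3, ... and map each word to the serial
    numbers of the records that contain it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for serial, text in enumerate(records, start=1):    # (E1) serial numbers
        for word in sorted(set(text.lower().split())):  # (E2) words per record
            index[word].append(serial)                  # (E3) hash table entry
    return dict(index)

abstracts = ["Gene expression in yeast",
             "Yeast genome sequencing",
             "Protein expression profiling"]
index = build_word_index(abstracts)
print(index["yeast"])       # [1, 2]
print(index["expression"])  # [1, 3]
```

Because records are visited in serial order, each word's list of serial numbers comes out sorted, which makes intersecting the lists of several query words straightforward.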
BRIEF DESCRIPTION OF THE DRAWINGS
[0041] The above and other objects, features and other advantages
of the present invention will be more clearly understood from the
following detailed description taken in conjunction with the
accompanying drawings, in which:
[0042] FIG. 1 is a conceptual diagram illustrating data stored in a
general hard disk.
[0043] FIG. 2 exemplarily shows the relationship among a data file,
a RAT file, and an RVR file according to the embodiments of the
present invention.
[0044] FIG. 3 exemplarily shows the relationship among an RVR file,
a RAT file, and data stored in a disc according to the embodiments
of the present invention.
[0045] FIG. 4 exemplarily shows the relationship between a data
file and a RAT file according to the embodiments of the present
invention.
[0046] FIGS. 5 and 6 show examples for creating a RAT file and an
RVR file from a data file when stored data of the present invention
is a general document.
[0047] FIGS. 7 and 8 show examples for creating a RAT file and an
RVR file from a data file when stored data of the present invention
is a matrix.
[0048] FIG. 9 exemplarily shows a program and source code for
performing record and access functions of RVR and RAT files
according to the embodiments of the present invention.
[0049] FIG. 10 is a flowchart illustrating a method for creating an
RVR file and a RAT file according to the embodiments of the present
invention.
[0050] FIG. 11 is a flowchart illustrating a method for searching
for data according to the embodiments of the present invention.
[0051] FIGS. 12 and 13 show the result of comparison between a data
access speed of the present invention and a sequential data access
speed of a general hard disk.
[0052] FIG. 14 is a block diagram illustrating an RVR DBMS
according to the present invention.
[0053] FIG. 15 exemplarily shows divisional units for classifying
irregular data by an RVR DBMS according to the present
invention.
[0054] FIG. 16 exemplarily shows divisional units for classifying
regular data by an RVR DBMS according to the present invention.
[0055] FIG. 17 exemplarily shows a method for adding, deleting,
updating, and searching for data by an RVR DBMS according to the
present invention.
[0056] FIG. 18 is a flowchart illustrating a method for creating an
RVR file and a RAT file of base sequence data by an RVR DBMS
according to the present invention.
[0057] FIG. 19 exemplarily shows a method for creating an RVR file
and a RAT file of base sequence data by an RVR DBMS according to
the present invention.
[0058] FIG. 20 is a flowchart illustrating a method for creating an
RVR file and a RAT file of large-scale abstract data by an RVR DBMS
according to the present invention.
[0059] FIG. 21 is a conceptual diagram illustrating a method for
creating an RVR file and a RAT file of large-scale abstract data by
an RVR DBMS according to the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0060] Reference will now be made in detail to the preferred
embodiments of the present invention, examples of which are
illustrated in the accompanying drawings. In the drawings, the same
or similar elements are denoted by the same reference numerals even
though they are depicted in different drawings. In the following
description, a detailed description of known functions and
configurations incorporated herein will be omitted when it may make
the subject matter of the present invention rather unclear.
Exemplary embodiments of the present invention provide a method for
recording data in a disc and a method for searching for data in a
disc.
[0061] FIG. 2 exemplarily shows the relationship among a data file,
a Record Allocation Table (RAT) file, and a Rack of Virtual RAM
(RVR) file according to the embodiments of the present invention.
FIG. 3 exemplarily shows the relationship among an RVR file, a RAT
file, and data stored in a disc according to the embodiments of the
present invention. FIG. 4 exemplarily shows the relationship
between a data file and a RAT file according to the embodiments of
the present invention.
[0062] Referring to FIGS. 2 to 4, data according to the present
invention is stored as an RVR file format, and a RAT file acting as
a dynamic table of the RVR record is generated and stored.
[0063] In other words, if a user attempts to store an arbitrary
data file, a data file is converted into an RVR file, and a RAT
file is generated, such that the RVR file and the RAT file are
stored in a hard disk.
[0064] In this case, the RVR file is generated by inserting divisional
factors into a data file. The divisional factor delimits the data
into divisional units, each of which serves as a recording unit of
data. The divisional unit may be established in various ways, for
example, [paragraph], [line], [word], [string], [integer], or
[float].
[0065] Type and function of the divisional factor (divisional unit)
will hereinafter be described with reference to a method for
creating the RVR file and the RAT file.
[0066] The RAT file stores a dynamic table that indicates the
position of each recording unit, and indicates the position of
specific data in the RVR file during the data searching
operation.
[0067] Referring to FIG. 3, it is assumed that the smallest data
unit of a hard disk serving as a data storage unit is stored in
each of the first, second, third, fourth, . . . (i-1)th, ith,
and Nth sectors. A cluster is a set of sectors, and is used as
a record unit of data.
[0068] A file manager serving as a file management program arranges
a cluster and a physical position according to a File Allocation
Table (FAT), such that it can store a file.
[0069] However, a plurality of clusters may be required to store
one file, and the clusters are not necessarily allocated in
contiguous order. In other words, the file manager searches for a
recordable cluster and stores part of the file in the cluster found.
The order of the clusters used for recording the file is recorded in
the FAT. When the file is reproduced (searched), this order of
clusters is read and the data is read accordingly, so that the file
can be reproduced or searched.
[0070] That is, as shown in FIGS. 2 to 4, individual physical
cluster positions are stored according to a series of cluster
numbers.
[0071] Meanwhile, the RAT file for the above data is created from a
data file that has been delimited by a divisional factor for each
divisional unit. FIG. 3 shows an exemplary text divided into line
units. In this embodiment, the stored data is a general document.
[0072] The created RAT file stores a serial number (line number in
FIG. 3) for indicating the order of divisional factors and a record
position (i.e., address) where data corresponding to the serial
number is recorded.
[0073] In this case, the address serving as the record position may
be represented by the size of accumulated data.
[0074] That is, the address serving as the record position can be
represented by the following equation 1.
$$\mathrm{address}[k] = (k-1) \times \mathrm{bytes\_of\_record} \qquad \text{[Equation 1]}$$
[0075] In this case, assuming that all records use the same number
of bytes, `bytes_of_record` is a constant decided according to hard
disk characteristics.
[0076] Therefore, by dividing the record position by the constant
(bytes_of_record), the cluster number (i') can be obtained, and the
physical record position of the data can then be found through the
FAT.
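Equation 1 and the division described above can be sketched in Python as follows, treating the index in Equation 1 as the 1-based serial number k of the record and assuming every record has exactly `bytes_of_record` bytes. The function names are illustrative, not taken from the text.

```python
def fixed_record_address(k: int, bytes_of_record: int) -> int:
    """Equation 1: byte address of record k (1-based) when every record
    occupies exactly bytes_of_record bytes."""
    if k < 1:
        raise ValueError("k must be >= 1")
    return (k - 1) * bytes_of_record

def address_quotient(address: int, bytes_of_record: int) -> int:
    """Divide a record position by bytes_of_record, as in the text;
    the quotient equals k - 1 for the record containing `address`."""
    return address // bytes_of_record

addr = fixed_record_address(5, 512)
print(addr)                         # 2048
print(address_quotient(addr, 512))  # 4
```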
[0077] Meanwhile, if the stored data of the present invention is
configured in the form of a matrix (table) and all records (record
units) of the matrix use the same number of bytes, the record
address of a specific position (k) on the matrix is obtained by
decomposing Equation 1 into its column and row components.
[0078] The decomposed form of Equation 1 can be represented by the
following Equation 2.
$$\mathrm{address}[k] = (x-1) \times \mathrm{bytes\_of\_record} + (y-1) \times \mathrm{bytes\_of\_record} \times N \qquad \text{[Equation 2]}$$
[0079] In Equation 2, k is the serial number of a matrix record unit,
x and y are the positions of the record along the X axis and the Y
axis, respectively, and N is the number of divisional factors on the
X axis.
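As a check on Equation 2, the Python sketch below computes the address of the cell at column x and row y (both 1-based), assuming the cells are stored row by row and each row holds N cells of equal size, and confirms that the result agrees with Equation 1 when the serial number is taken as k = (y-1)N + x. This row-major reading of x, y and k is an assumption of the sketch.

```python
def matrix_address(x: int, y: int, n_columns: int, bytes_of_record: int) -> int:
    """Equation 2: address of the cell at column x, row y (1-based) of a
    matrix with N = n_columns equal-sized cells per row."""
    return (x - 1) * bytes_of_record + (y - 1) * bytes_of_record * n_columns

N, B = 3, 8  # 3 cells per row, 8 bytes per cell
for y in range(1, 4):
    for x in range(1, N + 1):
        k = (y - 1) * N + x                               # row-major serial number
        assert matrix_address(x, y, N, B) == (k - 1) * B  # agrees with Equation 1
print(matrix_address(2, 3, N, B))  # 56
```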
[0080] If the records differ from one another in size (bytes), the
following Equation 3 is obtained.
$$\mathrm{address}[k] = \sum_{i=2}^{k} \mathrm{bytes\_of\_record}[i-1] \qquad \text{[Equation 3]}$$
[0081] In Equation 3, k is the serial number of a record unit,
bytes_of_record[i] is the size in bytes of the i-th record (e.g., a
line or paragraph record), i is the serial number of that record,
and address[1] (i.e., address[k] for k=1) is initialized to zero.
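Equation 3 is a running (prefix) sum of record sizes. A short Python sketch using `itertools.accumulate` from the standard library is given below; list index 0 holding the size of record 1 is an indexing convention of the sketch, not of the text.

```python
from itertools import accumulate
from typing import List

def variable_record_addresses(bytes_of_record: List[int]) -> List[int]:
    """Equation 3: address[1] = 0 and address[k] is the sum of the sizes
    of records 1 .. k-1. bytes_of_record[0] is the size of record 1."""
    # accumulate(..., initial=0) yields 0, b1, b1+b2, ..., b1+...+bn; the
    # final total is dropped because no record starts there.
    return list(accumulate(bytes_of_record, initial=0))[:-1]

print(variable_record_addresses([12, 30, 7, 21]))  # [0, 12, 42, 49]
```

When all sizes are equal the result reduces to Equation 1; for four 8-byte records it is [0, 8, 16, 24].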
[0082] FIG. 5 shows an example for creating an RVR file from a data
file when stored data of the present invention is a general
document. FIG. 6 shows an example for creating a RAT file from the
same general-document data.
[0083] Referring to FIG. 5, if stored data is a general document
(data and document have the same meaning), there are a variety of
divisional units, for example, [paragraph], [line], [word], etc. As
can be seen from FIG. 5, divisional units (i.e., [paragraph],
[line], and [word]) are applied to the same document such that the
RVR file is created.
[0084] In this case, a general document is an irregular document,
i.e., a document whose format is irregular. Irregular documents
include most documents that are not written in a regular format
such as a matrix (e.g., a table). The term `general document` has
the same meaning as the term `irregular document`.
[0085] In other words, in the first case, an RVR file is created
using `[paragraph]` as a divisional unit. As shown in the drawing,
each paragraph is marked by the divisional factor `>`.
[0086] In the second case, an RVR file is created using `[line]` as a
divisional unit. As shown in the drawing, each line is marked by the
divisional factor `\n`.
[0087] In the third case, an RVR file is created using `[word]` as a
divisional unit. As shown in the drawing, each word is separated by
the divisional factor ` ` (a space).
[0088] The divisional unit of the data is selected by the user, and
the divisional factor may be replaced with an arbitrary symbol.
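To illustrate how the three divisional units of FIG. 5 split the same document, here is a minimal Python sketch. It assumes that paragraphs in the input are separated by blank lines and that the divisional factor is placed between consecutive units; the exact RVR layout in FIG. 5 may differ, and `split_units` and `make_rvr` are hypothetical names.

```python
import re
from typing import List, Tuple

# Divisional factors named in FIG. 5. Placing the factor between
# consecutive units is an assumption of this sketch.
DIVISIONAL_FACTOR = {"paragraph": ">", "line": "\n", "word": " "}

def split_units(text: str, unit: str) -> List[str]:
    """Split a general (irregular) document into divisional units."""
    if unit == "paragraph":
        parts = re.split(r"\n\s*\n", text)  # a blank line ends a paragraph
    elif unit == "line":
        parts = text.splitlines()
    elif unit == "word":
        parts = text.split()
    else:
        raise ValueError(f"unsupported divisional unit: {unit}")
    return [p.strip() for p in parts if p.strip()]

def make_rvr(text: str, unit: str) -> Tuple[str, List[str]]:
    """Return the RVR text (units joined by the divisional factor)
    together with the list of units."""
    units = split_units(text, unit)
    return DIVISIONAL_FACTOR[unit].join(units), units

doc = "The hard disk is slow.\nDRAM is fast.\n\nRVR files help."
for unit in ("paragraph", "line", "word"):
    rvr, units = make_rvr(doc, unit)
    print(unit, len(units))
# paragraph 2 / line 3 / word 11
```

In this simple scheme a unit that itself contains the chosen symbol would be ambiguous, so in practice the symbol would be chosen so that it does not occur in the data.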
[0089] Meanwhile, as shown in FIG. 6, the RAT file is created from
the created RVR file. As described above, the RAT file includes not
only a serial number for indicating the order of sequential
divisional units of the RVR file but also a record position
(address) where the corresponding data is stored.
[0090] That is, as shown in FIG. 6, the cumulative amount of data
preceding each divisional unit indicates its record position.
[0091] FIG. 7 shows an example for creating an RVR file from a data
file when data of the present invention is stored in matrix format,
and FIG. 8 shows an example for creating a RAT file from that RVR
file.
[0092] Referring to FIG. 7, if data is stored in matrix format, an
additional divisional unit and an additional divisional factor are
not present. That is, a row and a column of the matrix are a
divisional unit and a divisional factor, respectively.
[0093] In this case, the matrix-format document is indicative of a
regular document. The matrix-format document has the same meaning
as that of the regular document.
[0094] In this case, the storage format is classified into [string],
[integer] and [float] according to the format of the data stored in
each cell of the matrix.
[0095] FIG. 7 shows an example of an arbitrary RVR file having a
storage format such as [string], [integer] or [float].
[0096] In this case, `string` may indicate a storage format in
which all kinds of data including character data and numeric data
(including a decimal point) can be freely stored in a cell of the
matrix.
[0097] In addition, `integer` may indicate a storage format in
which data stored in a cell of the matrix is an integer
variable.
[0098] Also, `float` may indicate a storage format in which data
stored in a cell of the matrix includes a decimal point.
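Equation 2 requires every cell to occupy the same number of bytes. One way to obtain that for [integer] and [float] matrices, an assumption not stated in the text, is to encode each cell as a fixed-width binary value with Python's standard `struct` module; [string] cells are variable-length and would instead need per-record sizes as in Equation 3. The 4-byte integer and 8-byte float widths and the function names are illustrative choices.

```python
import struct
from typing import List, Union

Number = Union[int, float]

# Assumed fixed-width encodings: little-endian 4-byte int, 8-byte float.
CELL_FORMAT = {"integer": "<i", "float": "<d"}

def encode_rows(rows: List[List[Number]], storage_format: str) -> bytes:
    """Pack a matrix row by row so that every cell has the same size."""
    fmt = CELL_FORMAT[storage_format]
    return b"".join(struct.pack(fmt, v) for row in rows for v in row)

def read_cell(blob: bytes, x: int, y: int, n_columns: int,
              storage_format: str) -> Number:
    """Read the cell at column x, row y (1-based) via its computed
    address, without scanning the preceding cells."""
    fmt = CELL_FORMAT[storage_format]
    size = struct.calcsize(fmt)
    address = (x - 1) * size + (y - 1) * size * n_columns
    return struct.unpack_from(fmt, blob, address)[0]

floats = encode_rows([[1.5, 2.0, 3.25], [4.0, 5.5, 6.0]], "float")
print(len(floats))                          # 48
print(read_cell(floats, 2, 2, 3, "float"))  # 5.5
```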
[0099] Meanwhile, as shown in FIG. 8, the RAT file is created from
the created RVR file. As described above, the RAT file includes not
only a serial number indicating a row number of the matrix-type RVR
file but also a record position where the corresponding data is
stored.
[0100] FIG. 9 exemplarily shows a program and source code for
performing record and access functions of RVR and RAT files
according to the embodiments of the present invention.
[0101] In FIG. 9, the program that executes read/write (R/W)
operations on the RVR-RAT is called Indexing RVR (IRVR). FIG. 9 shows
actual example sequences of a plurality of records of the six
different data sets (differing in divisional unit or data format)
shown in FIGS. 5 and 7.
[0102] The byte size of each record is calculated differently
depending on the record and on the computer operating system (OS).
Specifically, a file can be written as a binary file using the
`fwrite( )` function of the C/C++ programming language, so that the
size of each record of an input file is obtained in bytes.
Therefore, while large data is converted into an RVR file, the byte
sizes of all data records obtained through the `fwrite( )` function
and all record addresses obtained by Equations 1, 2 and 3 are
computed, and the result is output and stored as a RAT file.
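The following Python sketch mirrors this procedure under assumptions of its own: records are encoded as UTF-8, Python's binary `write()` (which returns the number of bytes written) stands in for `fwrite( )`, record positions are accumulated as in Equation 3, and the RAT file uses a hypothetical plain-text layout of one serial number and address per line. The example shows why byte sizes rather than character counts are needed: the 6-character record `héllo\n` occupies 7 bytes in UTF-8.

```python
import os
import tempfile
from typing import List

def write_rvr_and_rat(records: List[str], rvr_path: str, rat_path: str) -> List[int]:
    """Write the records to a binary RVR file and their record positions
    to a text RAT file ("serial<TAB>address" per line)."""
    addresses: List[int] = []
    position = 0
    with open(rvr_path, "wb") as rvr:
        for rec in records:
            addresses.append(position)
            # A binary write() returns the number of bytes written, so the
            # byte size of each record comes directly from the write call.
            position += rvr.write(rec.encode("utf-8"))
    with open(rat_path, "w", encoding="ascii") as rat:
        for serial, addr in enumerate(addresses, start=1):
            rat.write(f"{serial}\t{addr}\n")
    return addresses

def read_record(rvr_path: str, addresses: List[int], serial: int) -> str:
    """Random access: seek directly to record `serial` (1-based)."""
    start = addresses[serial - 1]
    if serial < len(addresses):
        end = addresses[serial]
    else:
        end = os.path.getsize(rvr_path)
    with open(rvr_path, "rb") as rvr:
        rvr.seek(start)
        return rvr.read(end - start).decode("utf-8")

with tempfile.TemporaryDirectory() as work_dir:
    rvr_file = os.path.join(work_dir, "doc.rvr")
    rat_file = os.path.join(work_dir, "doc.rat")
    addrs = write_rvr_and_rat(["h\u00e9llo\n", "disk\n", "RAM\n"], rvr_file, rat_file)
    print(addrs)                                  # [0, 7, 12]
    print(repr(read_record(rvr_file, addrs, 2)))  # 'disk\n'
```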
[0103] General users, other than experts in advanced system
programming, cannot access FAT-sector information directly (see
FIG. 2); instead, a controller is used as an intermediate bridge to
the FAT sectors. Likewise, general users of most high-level computer
languages (e.g., Perl, Python, Fortran, C/C++, JAVA, etc.) may use an
RVR-RAT, which contains the records and record addresses of a file
stored in a hard disk, in the same manner as the FAT-sector
controller.
[0104] A method for recording data on a disk and a method for
searching for data according to the present invention, that is, the
method for creating RVR/RAT files and the method for searching for
data using them, will now be described.
[0105] FIG. 10 is a flowchart illustrating a method for creating an
RVR file and a RAT file according to the embodiments of the present
invention. FIG. 11 is a flowchart illustrating a method for
searching for data according to the embodiments of the present
invention.
[0106] An exemplary case in which stored data is general document
data will hereinafter be described with reference to the annexed
drawings.
[0107] Referring to FIG. 10, in the method for creating the RVR file
and the RAT file, when a user attempts to store data, the system of
the present invention receives divisional unit information from the
user at step S110.
[0108] Thereafter, upon receiving a data file, the system inserts a
divisional factor corresponding to the divisional unit specified in
the divisional unit information, thereby creating an RVR file at step
S120. The divisional factor serves no particular function during the
actual search operation; it only facilitates creation of the RAT
file, so it need not be retained in the RVR file.
[0109] In addition, the RAT file is created from the RVR file at
step S130: the RVR file is divided using the divisional factor, a
serial number is assigned to each divisional unit, and the record
position of the data corresponding to each serial number is recorded.
[0110] Since the divisional factor serves no particular function
during the actual search process and only facilitates creation of the
RAT file, it may also be omitted from the RVR file. In this case, the
data is divided by the divisional unit while serial numbers are
assigned at the same time, and the record position of the
corresponding data is stored to create the RAT file.
[0111] In addition, the created RVR and RAT files are stored at
step S140.
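Steps S110 to S140 for general document data might look roughly like the following sketch. The unit names `"paragraph"`, `"line"` and `"word"`, the blank-line rule for paragraphs, and the helper names `split_by_unit` and `create_rvr_rat` are assumptions for illustration; records are stored without any divisional factor, following the variant of paragraph [0110].

```python
import re
from typing import List, Tuple


def split_by_unit(text: str, unit: str) -> List[str]:
    """Divide general document data into records by a divisional unit (step S120)."""
    if unit == "paragraph":
        parts = re.split(r"\n\s*\n", text)  # assumed: blank line(s) between paragraphs
    elif unit == "line":
        parts = text.splitlines()
    elif unit == "word":
        parts = text.split()
    else:
        raise ValueError(f"unsupported divisional unit: {unit}")
    return [p.strip() for p in parts if p.strip()]


def create_rvr_rat(text: str, unit: str) -> Tuple[bytes, List[Tuple[int, int, int]]]:
    """Build an RVR payload and RAT entries (serial, offset, size) (step S130).

    Records are stored back to back with no separator; the RAT alone
    marks where each one starts and how long it is.
    """
    rvr = bytearray()
    rat = []
    for serial, record in enumerate(split_by_unit(text, unit), start=1):
        data = record.encode("utf-8")
        rat.append((serial, len(rvr), len(data)))
        rvr += data
    return bytes(rvr), rat
```

For instance, `create_rvr_rat("ab cd", "word")` yields the payload `b"abcd"` and the RAT entries `[(1, 0, 2), (2, 2, 2)]`; storing both to disk corresponds to step S140.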
[0112] A method for searching for data using the RAT file according
to the present invention will hereinafter be described with
reference to the annexed drawings.
[0113] Referring to FIG. 11, to search for data using the RAT file of
the present invention, the system receives search information from a
user at step S210.
[0114] When the data is general data, the search information may
indicate the ordinal position of the desired divisional unit. When
the data is matrix-type data, the search information may indicate a
row number of the matrix.
[0115] That is, provided that the divisional unit is [paragraph]
and the user attempts to search for the N-th paragraph, the search
information is denoted by N. Provided that the divisional unit is
[line] and the user attempts to search for the N'-th line, the
search information is denoted by N'. Provided that the divisional
unit is [word] and the user attempts to search for the N''-th word,
the search information is denoted by N''.
[0116] Thereafter, the system of the present invention searches the
stored RAT file and reads the record position corresponding to the
search information at step S220.
[0117] Next, the system calculates a cluster number from the record
position (address), and thus calculates a physical cluster position
of data from the FAT at step S230.
[0118] To calculate the cluster number, Equations 1 to 3 can be used
as stated above.
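Equations 1 to 3 are not reproduced in this section, so the following is only a generic illustration of how a byte position inside a file can be mapped to a physical cluster on a FAT-formatted volume, not a reconstruction of those equations. It assumes the logical cluster index is the byte position divided by the cluster size, and that the file's FAT chain is followed from its first cluster that many times; the in-memory `fat` list, the end-of-chain marker, the function name `locate` and the cluster numbers are hypothetical.

```python
from typing import List, Tuple

END_OF_CHAIN = -1  # hypothetical end-of-chain marker used only in this sketch


def locate(position: int, first_cluster: int, fat: List[int],
           cluster_size: int) -> Tuple[int, int]:
    """Map a byte position inside a file to (physical cluster, offset within it)."""
    logical_index, offset_in_cluster = divmod(position, cluster_size)
    cluster = first_cluster
    for _ in range(logical_index):
        cluster = fat[cluster]  # each FAT entry names the file's next cluster
        if cluster == END_OF_CHAIN:
            raise ValueError("position lies beyond the end of the file")
    return cluster, offset_in_cluster
```

With a chain 2 → 5 → 7 (`fat[2] = 5`, `fat[5] = 7`, `fat[7] = END_OF_CHAIN`) and 4,096-byte clusters, byte position 5,000 falls in the second cluster of the chain, so `locate(5000, 2, fat, 4096)` returns `(5, 904)`.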
[0119] Thereafter, the system reads the data at the physical storage
position on the hard disk and outputs the result at step S250.
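At the application level, steps S210 to S250 reduce to two seeks: one into the RAT file to fetch the address of record N, and one into the RVR file to read the record. The sketch below assumes a hypothetical fixed-size RAT entry layout (serial number, byte offset and size as unsigned 64-bit integers) and a hypothetical function name `read_record`; the operating system's file system performs the cluster lookup of step S230 transparently when `seek()` and `read()` are called.

```python
import struct

RAT_ENTRY = struct.Struct("<QQQ")  # hypothetical: serial, offset, size


def read_record(rvr_path: str, rat_path: str, n: int) -> bytes:
    """Return the N-th record (1-based) using one RAT lookup and one RVR seek."""
    if n < 1:
        raise ValueError("serial numbers start at 1")
    with open(rat_path, "rb") as rat:
        rat.seek((n - 1) * RAT_ENTRY.size)  # fixed-size entries: jump straight to entry N
        entry = rat.read(RAT_ENTRY.size)
    if len(entry) < RAT_ENTRY.size:
        raise IndexError(f"no record with serial number {n}")
    _serial, offset, size = RAT_ENTRY.unpack(entry)
    with open(rvr_path, "rb") as rvr:
        rvr.seek(offset)  # random access: earlier records are never scanned
        return rvr.read(size)
```

Because neither seek depends on how many records precede record N, the cost of a lookup stays roughly constant regardless of record position, which is the behaviour the comparison below measures.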
[0120] Next, the sequential data access speed of a general hard
disk is compared with the data access speed of the present
invention.
[0121] FIGS. 12 and 13 show the result of comparison between a data
access speed of the present invention and a sequential data access
speed of a general hard disk.
[0122] In this case, the search data is a single large-scale data set
of 192 gigabytes with dimensions `[X:20,000]*[Y:1,000,000]`, where X
denotes 20,000 variables, each containing a decimal point, and Y
denotes one million rows of [X:20,000]. For this data, the sequential
access time to the 10th, 100th, 1,000th, 10,000th, 100,000th and
1,000,000th records along Y is compared with the random access time
using the RVR-RAT.
[0123] The test was carried out under Fedora 8 Linux on a 64-bit
quad-core Xeon CPU, and the times were measured with the IRVR program
shown in FIG. 9.
[0124] With sequential access, data located near the front of the
file is accessed relatively quickly, but the access time rises steeply
for records further back, up to the 1,000,000th record (see FIG. 12).
[0125] In contrast, the data access time of the present invention
remains almost constant, at about 0.1 sec, regardless of the record
position, which is a clear advantage.
[0126] Although the method of the present invention requires a
considerable time to create the RVR file and the RAT file, the
method can very easily search for data after the RVR file and the
RAT file are created.
[0127] A management system (hereinafter referred to as RVR DBMS)
for managing a database (DB) using the above-mentioned file search
method will hereinafter be described with reference to the annexed
drawings.
[0128] FIG. 14 is a block diagram illustrating an RVR DBMS
according to the present invention. FIG. 15 exemplarily shows
divisional units for classifying irregular data by an RVR DBMS
according to the present invention. FIG. 16 exemplarily shows
divisional units for classifying regular data by an RVR DBMS
according to the present invention. FIG. 17 exemplarily shows a
method for adding, deleting, updating, and searching for data by an
RVR DBMS according to the present invention.
[0129] The RVR DBMS according to the present invention constructs
the data file using a set (i.e., RVR file) of data records and a
set (i.e., RAT file) of hard disk highly integrated indexes of the
data record set, such that it performs the same data management and
analysis operations as those of the standard DB management system
using the RVR file and the RAT file.
[0130] For this operation, as shown in FIG. 14, the RVR DBMS
according to the present invention includes at least one DB, an RVR
controller, and an analysis module for analyzing data stored in the
DB.
[0131] In this case, the DB stores a data file applied to the RVR
DBMS, together with the RVR file and the RAT file produced from that
data file by the processing described above. The creation and storage
of the RVR file and the RAT file have already been described in the
best mode for implementing the present invention.
[0132] The RVR controller creates and stores the RVR file and the RAT
file from the data file, and performs the desired analysis operations
using the stored RVR and RAT files.
[0133] Meanwhile, in order to perform the above-mentioned analysis
operation, the analysis module includes a programming language
library (PLL) and a statistics language library (SLL).
[0134] In this case, PLL is an analysis module written in various
computer programming languages (Java, Perl, Python, C/C++, etc.), and
SLL is an analysis module written in various statistical languages
(R, SAS, SPSS, etc.). PLL or SLL serves as an analysis library that
receives a pointer into the RVR DB directly through a pipe and
performs analysis and calculation operations on the data it points
to. PLL and SLL therefore minimize the time the analysis system
spends on I/O operations.
[0135] Next, a method by which the RVR DBMS according to the present
invention recognizes and discriminates data will be described in
detail. Each data record defined in the RVR DBMS is always paired with
an entry in the highly integrated hard disk index. Although this hard
disk drive (HDD) address is a physical address in the same sense as a
pointer in C/C++, it is interpreted and used differently. A C/C++
pointer indicates an absolute address within a given memory region,
independent of any data record. The hard disk address (i.e., pointer)
used in the RVR DBMS, by contrast, is measured from the first data
record of a given file, that is, it is an address relative to that
first record, and a relative record number (RRN) is assigned to each
record's pair of start and end addresses.
[0136] Therefore, the pointer for use in the RVR DBMS indicates
only hard disk highly integrated index addresses of individual data
records in the set of defined data records.
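The relative addressing of paragraph [0135] can be illustrated with a small table that gives each record an RRN together with start and end addresses measured from the first data record of the file. This is a sketch under assumptions: end addresses are taken as inclusive, and the function names and the `base` parameter (where the first record begins on the medium) are hypothetical.

```python
from typing import List, Tuple


def rrn_table(record_sizes: List[int]) -> List[Tuple[int, int, int]]:
    """Assign each record an RRN with start/end addresses relative to the first record."""
    table = []
    start = 0
    for rrn, size in enumerate(record_sizes, start=1):
        end = start + size - 1  # inclusive end address of this record
        table.append((rrn, start, end))
        start = end + 1  # the next record begins immediately afterwards
    return table


def absolute_address(base: int, relative: int) -> int:
    """Convert a file-relative address to an absolute one, given where record 1 begins."""
    return base + relative
```

For record sizes `[5, 4, 5]`, `rrn_table` gives `[(1, 0, 4), (2, 5, 8), (3, 9, 13)]`; the same table stays valid wherever the file is placed, since only `base` changes.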
[0137] In addition, the size of a data record may range from a single
character (1 byte) to the whole human genome (about 3 GB), and a data
record may even be configured with sizes larger than the whole human
genome.
[0138] Specifically, when addresses are used for all individual bases
of a genome sequence, a DNA fragment of a predetermined size (e.g., 12
nucleotides) is shifted one base at a time along the sequence, so that
fragments covering the whole chromosome are constructed. Each base of
a constructed fragment is one of the four bases A, C, G and T, which
are assigned the digit values 0, 1, 2 and 3, respectively. Each
fragment is then converted into a quaternary (base-4) number, and
these numbers are used to configure a hash table. Conversely, a given
number can be converted back freely and quickly into the corresponding
12-base sequence by a suitable function.
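The base-to-number conversion described above can be sketched as ordinary base-4 positional encoding, with A, C, G and T as the digits 0 to 3. The function names `encode` and `decode` are hypothetical. With 12 bases there are 4^12 = 16,777,216 possible fragments, so every code fits in 24 bits.

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3
CODE = {base: digit for digit, base in enumerate(BASES)}


def encode(kmer: str) -> int:
    """Convert a base sequence into its quaternary (base-4) number."""
    value = 0
    for base in kmer:
        value = value * 4 + CODE[base]  # shift one base-4 digit left and add
    return value


def decode(value: int, k: int = 12) -> str:
    """Convert a base-4 number back into a k-base sequence."""
    bases = []
    for _ in range(k):
        value, digit = divmod(value, 4)  # peel off the least significant digit
        bases.append(BASES[digit])
    return "".join(reversed(bases))
```

For example, `encode("ACGT")` is 0·64 + 1·16 + 2·4 + 3 = 27, and `decode(27, 4)` returns `"ACGT"`.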
[0139] In this case, if all bases are converted into records of a
predetermined size by the above scheme or a similar one, an RVR-RAT is
constructed for records each having a 12-base sequence. A query base
sequence can then be located through random access across the whole
genome, which is stored as a plurality of records of 12 bases each.
[0140] The embodiment of the base sequences will hereinafter be
described in detail.
[0141] In addition, when data records are recognized and represented,
delimiters (a comma, white space, a line break, the symbol `>`, etc.)
may be used as necessary. The delimiters may be defined in different
ways according to the user's intention.
[0142] Various divisional units for use in the RVR DBMS will
hereinafter be described in detail.
[0143] The divisional unit may also be extended in various ways
other than the best mode of the present invention, and more
extended divisional units can be established as follows.
[0144] In other words, referring to FIGS. 15 and 16, the RVR DBMS
processors can process irregular data and regular data using a
variety of divisional units (for example, [1]pagen, [2]page,
[3]fastan, [4]fasta, [5]line, [6]image, [7]audio, [8]video, [9]seq,
[10]int, [11]float, [12]string, [13]csv, [14]r, [15]xml, and
[16]smtx).
[0145] In this case, the irregular-type (non-table-type) divisional
units are described in detail below.
[0146] pagen: Paragraph-type data such as an abstract is converted
into a data record, and the line break in each line is recognized.
[0147] page: Paragraph-type data such as an abstract is likewise
converted into a data record, but a line break is placed only at the
end of the record.
[0148] fastan: `fastan` is identical to `fasta`, except that it
recognizes a line break within each data record.
[0149] fasta: `fasta` is a processor for the FASTA format used for
DNA/protein data, in which the line holding each record's ID and
description (or annotation), as in the `page` format, begins with
`>`, white space is not permitted within a base sequence or amino
acid sequence, and a line break is placed only at the end of a
record.
[0150] line: `line` is a processor that uses the line break of each
line as a separator, so that each line becomes a data record.
[0151] image: `image` is used for individual multimedia data records.
It is a processor that converts various still-image file formats
(gif, jpeg, bmp, pict, pcx, etc.) into data records.
[0152] video: `video` is used for individual multimedia data records.
It is a processor that converts various video file formats (mpeg,
avi, asf, rm, wmv, etc.) into data records.
[0153] audio: `audio` is used for individual multimedia data records.
It is a processor that converts various audio file formats (wav, asf,
mp3, ogg, etc.), that is, data perceived through human hearing, into
data records.
[0154] Meanwhile, for regular-type (table-type) data, a hard disk
drive (HDD) address is stored for each table line. The address of
each data record is therefore calculated by adding the column position
value to the address of its line and then adding the sizes of the
preceding fields in that line.
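The address rule for table-type data, which also underlies the double indexing (a map of lines plus a map of record sizes) used for `string`, `csv` and `r`, can be sketched as follows. It is an illustration under assumptions: fields are separated by a single 1-byte delimiter, the "column position value" is taken to count one delimiter per preceding field, rows and columns are numbered from 0, and the function names are hypothetical.

```python
from typing import List, Tuple


def build_maps(table: bytes, sep: bytes = b",") -> Tuple[List[int], List[List[int]]]:
    """Build a line map (byte address of each line) and a size map (bytes per field)."""
    line_map: List[int] = []
    size_map: List[List[int]] = []
    offset = 0
    for line in table.splitlines(keepends=True):
        line_map.append(offset)
        fields = line.rstrip(b"\r\n").split(sep)
        size_map.append([len(f) for f in fields])
        offset += len(line)
    return line_map, size_map


def field_address(line_map: List[int], size_map: List[List[int]],
                  row: int, col: int, sep_len: int = 1) -> int:
    """Line address + column position (one delimiter per preceding field) + preceding sizes."""
    return line_map[row] + col * sep_len + sum(size_map[row][:col])
```

For the table `b"a,bb,ccc\ndd,e,f\n"`, the field in row 1, column 2 lies at 9 + 2 + (2 + 1) = 14, and reading `size_map[1][2]` bytes from there yields `b"f"`.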
[0155] The regular-type data has the following divisional units.
[0156] seq: `seq` is a processor that uses each line of a DNA/protein
multiple alignment as a data record.
[0157] int: `int` is a regular-type table data format composed of
integers.
[0158] float: `float` is a regular-type table data format composed of
double-precision floating-point values.
[0159] string: `string` is a regular-type table data format composed
of words of the same or different sizes.
[0160] csv: `csv` is a table data format of comma-separated values,
such as that of an Excel file.
[0161] r: `r` is a file format that is organized in the `csv` format
and additionally includes a header and an ID for each line.
[0162] Specifically, in the RVR DBMS, the `string`, `csv` and `r`
processors use the double indexing schema, because their data records
may have the same or different sizes. That is, the RVR DBMS stores a
map of each line and a map of each record size. However, since the
data records of `int` and `float` all have the same size, random
access to all records is possible with a single map.
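For `int` and `float` tables, where every record has the same size, a cell address follows from arithmetic alone, which is why a single map suffices. The sketch below uses in-memory bytes in place of a file (with a file, the same address would be passed to `seek()`); the 8-byte little-endian `double` layout and the function names are assumptions.

```python
import struct
from typing import List

DOUBLE = struct.Struct("<d")  # assumed layout: 8-byte little-endian double


def pack_float_table(rows: List[List[float]]) -> bytes:
    """Store a regular float table as consecutive 8-byte values, row by row."""
    return b"".join(DOUBLE.pack(v) for row in rows for v in row)


def read_cell(blob: bytes, n_cols: int, row: int, col: int) -> float:
    """Random access to one cell (0-based row/col) using arithmetic only."""
    address = (row * n_cols + col) * DOUBLE.size
    return DOUBLE.unpack_from(blob, address)[0]
```

For a 2 × 3 table, the cell in row 1, column 2 starts at byte (1·3 + 2)·8 = 40, with no per-record size map needed.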
[0163] xml: `xml` is used for a format of object-type structured
data.
[0164] smtx: `smtx` is a reduced form of an (N×N) matrix. For
example, if the null parts of each line of the (N×N) matrix, which
carry no information, are removed entirely, and each line is stored
as the number of remaining data records followed by that many records
and sub-records, the size of the matrix can be minimized. The
processor can nevertheless use such data in the same manner as the
full (N×N) matrix.
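One way to read the `smtx` reduction is sketched below: null cells are dropped, and each line keeps the number of remaining records followed by (column, value) pairs, while lookups still behave like the full N×N matrix. Representing null cells as `None` and the names `to_smtx` and `get` are assumptions, not the patent's file format.

```python
from typing import List, Optional, Tuple

Row = List[Tuple[int, float]]


def to_smtx(matrix: List[List[Optional[float]]]) -> List[Tuple[int, Row]]:
    """Drop null cells; keep, per line, the record count and (column, value) pairs."""
    reduced = []
    for line in matrix:
        kept = [(j, v) for j, v in enumerate(line) if v is not None]
        reduced.append((len(kept), kept))
    return reduced


def get(reduced: List[Tuple[int, Row]], i: int, j: int) -> Optional[float]:
    """Read cell (i, j) as if the full N x N matrix were still present."""
    _count, kept = reduced[i]
    for col, value in kept:  # linear scan; columns are stored in ascending order
        if col == j:
            return value
    return None  # a removed null cell
```

An all-null line shrinks to `(0, [])`, so storage grows with the number of informative cells rather than with N².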
[0165] Indexes of irregular data and regular data will hereinafter
be described in detail.
[0166] Each index (addressing) used in the RVR DBMS may be a dense
index. In other words, every data record always has its own data
address, and a hash table keyed by a key or ID of each data record is
also used.
[0167] In the RVR DBMS, dense indexing is applied to regular data
down to every sub-record. For irregular data, dense indexing is
applied to the data records, and sparse indexing is applied to their
sub-records.
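Dense indexing at the record level is what the RAT file provides (one address per record). For sparse indexing of sub-records, a standard technique is to store the position of only every k-th sub-record and scan forward from the nearest indexed one. The sketch applies this to words inside one text record; the step size and the function names are hypothetical.

```python
from typing import List


def sparse_word_index(record: str, step: int) -> List[int]:
    """Character offset of every `step`-th word (words 0, step, 2*step, ...)."""
    index: List[int] = []
    word_no = 0
    in_word = False
    for pos, ch in enumerate(record):
        if not ch.isspace() and not in_word:  # a new word starts here
            if word_no % step == 0:
                index.append(pos)
            word_no += 1
        in_word = not ch.isspace()
    return index


def nth_word(record: str, index: List[int], step: int, n: int) -> str:
    """Return word n (0-based): jump to the nearest indexed word, then scan forward."""
    start = index[n // step]
    skip = n % step
    # split at most skip + 1 times so that only the needed words are separated
    return record[start:].split(None, skip + 1)[skip]
```

With `step=3`, the record "the quick brown fox jumps over the lazy dog" stores only three offsets (`[0, 16, 31]`), yet any word can be reached by scanning at most two words past an indexed position.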
[0168] FIG. 17 exemplarily shows a method for adding, deleting,
updating, and searching for data using an RVR DBMS according to the
present invention.
[0169] In the RVR DBMS, since no fixed size is defined for a data
record and only the address of its position is stored, `addition` is
always performed at the end of the entire set of data records, and
`deletion` is marked in the address information.
[0170] `insertion` is equivalent to `addition`, because the access
speed is the same regardless of the position of a data record.
[0171] `update` replaces a data record through deletion, addition
and, where a specific order must be maintained as with `insertion`,
order readjustment; it is also used to readjust the addresses of data
records on the disk.
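The four operations of paragraphs [0169] to [0171] can be sketched with an append-only payload and an address list. This is one possible interpretation under stated assumptions: `update` appends the new version and re-points the existing RRN at it, and `compact` stands in for the readjustment of record addresses on the disk; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RecordStore:
    """Append-only record payload plus an address list with deletion marks."""
    payload: bytearray = field(default_factory=bytearray)
    # (offset, size) per RRN; None marks a deleted record
    addresses: List[Optional[Tuple[int, int]]] = field(default_factory=list)

    def add(self, data: bytes) -> int:
        """`addition` / `insertion`: always append at the end; return the new RRN."""
        self.addresses.append((len(self.payload), len(data)))
        self.payload += data
        return len(self.addresses)

    def delete(self, rrn: int) -> None:
        """`deletion`: only the address entry is marked; the bytes stay in place."""
        self.addresses[rrn - 1] = None

    def update(self, rrn: int, data: bytes) -> None:
        """`update`: append the new version and re-point the existing RRN at it."""
        self.addresses[rrn - 1] = (len(self.payload), len(data))
        self.payload += data

    def get(self, rrn: int) -> Optional[bytes]:
        entry = self.addresses[rrn - 1]
        if entry is None:
            return None
        offset, size = entry
        return bytes(self.payload[offset:offset + size])

    def compact(self) -> None:
        """Readjust record addresses by rewriting only the live records."""
        new_payload = bytearray()
        for i, entry in enumerate(self.addresses):
            if entry is not None:
                offset, size = entry
                self.addresses[i] = (len(new_payload), size)
                new_payload += self.payload[offset:offset + size]
        self.payload = new_payload
```

Deleting and updating never move existing bytes, so the cost of each operation is independent of where the record sits; only `compact` touches the whole payload.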
[0172] A method for generating an RVR file and a RAT file for
specific data (genome base sequences and large-scale abstract data)
using the RVR DBMS, and for searching and analyzing the data,
according to the embodiments of the present invention will now be
described in detail.
[0173] A method for processing genome base sequence data according
to the embodiments of the present invention will hereinafter be
described with reference to FIGS. 18 and 19. A method for
processing large-scale abstract data according to the embodiments
of the present invention will hereinafter be described with
reference to FIGS. 20 and 21.
[0174] FIG. 18 is a flowchart illustrating a method for creating an
RVR file and a RAT file of base sequence data by an RVR DBMS
according to the present invention. FIG. 19 exemplarily shows a
method for creating an RVR file and a RAT file of base sequence data
by an RVR DBMS according to the present invention.
[0175] Referring to FIG. 18, if input data is genome base sequence
data, the genome base sequence is fragmented into a
predetermined-sized base unit at step S310. In this case, although
the predetermined size may be established in various ways, it is
assumed that 12 bases are exemplarily utilized in the embodiment of
the present invention as shown in FIG. 19.
[0176] That is, as shown in FIG. 19, the 12 bases of the entire base
sequence starting from the first (initial) base are discriminated,
then the 12 bases starting from the second base are discriminated,
and so on, until the last 12 bases of the sequence (the window
starting at the (n-11)th base, where n is the total number of bases)
have been discriminated.
[0177] Next, a unique number is assigned to each of the separated
12-base sequences at step S320. Because each base of a discriminated
sequence is A, G, T or C, a unique number can be assigned efficiently
using a quaternary (base-4) number.
[0178] In addition, each base sequence, the unique number assigned to
it, and its position are indexed as shown in FIG. 19 (step S330).
[0179] Next, the above-mentioned `smtx`-type RVR and RAT files are
created from the indexed data, and the created RVR and RAT files are
stored (step S340).
[0180] In this case, the RVR file may correspond to the entire base
sequence data. The RAT file stores the base data divided into 12-base
units, the serial numbers assigned to those units, and the storage
position of each unit.
[0181] Meanwhile, when a user searches for data using the RVR and RAT
files by entering a desired base unit (a 12-base unit) as a query,
the RVR DBMS according to the present invention looks up the input
base unit in the RAT file, finds the data positions containing that
base unit, and returns the corresponding data to the user. If an
analysis command other than a search command is given, the analysis
task is performed on the search result, and the result is provided to
the user.
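Steps S310 to S330 and the query of paragraph [0181] can be sketched as a sliding-window k-mer index: each 12-base window is encoded as a base-4 number and mapped to the positions where it starts. Positions are 0-based here, an in-memory dictionary stands in for the `smtx`-type RVR and RAT files, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(kmer: str) -> int:
    """Quaternary (base-4) code of a base sequence."""
    value = 0
    for base in kmer:
        value = value * 4 + CODE[base]
    return value


def build_kmer_index(sequence: str, k: int = 12) -> Dict[int, List[int]]:
    """Slide a k-base window one base at a time; map each code to its start positions."""
    index: Dict[int, List[int]] = defaultdict(list)
    for start in range(len(sequence) - k + 1):  # last window starts at len - k (0-based)
        index[encode(sequence[start:start + k])].append(start)
    return dict(index)


def query(index: Dict[int, List[int]], kmer: str) -> List[int]:
    """Return the 0-based start positions of a query k-mer, or [] if it is absent."""
    return index.get(encode(kmer), [])
```

For the 16-base sequence `"ACGTACGTACGTACGT"` there are 16 - 12 + 1 = 5 windows, and `"ACGTACGTACGT"` is found at positions 0 and 4.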
[0182] A method for mining large-scale abstract data using the RVR
DBMS according to the embodiment of the present invention will now be
described.
[0183] FIG. 20 is a flowchart illustrating a method for creating an
RVR file and a RAT file of large abstract data by an RVR DBMS
according to the present invention. FIG. 21 is a conceptual diagram
illustrating a method for creating an RVR file and a RAT file of
large-scale abstract data by an RVR DBMS according to the present
invention.
[0184] In this case, as shown in FIG. 21, large-scale abstract data
consists of a large number of individual abstracts.
[0185] If the input data is large-scale abstract data, a serial
number is assigned to each abstract at step S410.
[0186] For each word contained in the abstracts, the serial numbers
(RRNs) of the abstracts containing that word are determined at step
S420.
[0187] Thereafter, each word, the serial numbers assigned to it, and
its count are organized into a hash table at step S430.
[0188] Although steps S410 to S430 are shown separately in FIG. 20,
they may be performed simultaneously. In that case, the RVR DBMS scans
the abstract data from beginning to end; when a word appears for the
first time, its serial number (RRN) is calculated and stored, and
when a word that has already appeared occurs again, its count and the
new RRN are added to the existing entry for that word.
[0189] Thereafter, `smtx`-type RVR and RAT files are created from
data configured in a table format in step S430 (Step S440).
[0190] In this case, the RVR file may correspond to the input
large-scale abstract data divided by serial number (abstract 1,
abstract 2, . . . in FIG. 21). The RAT file stores the individual
words, the serial numbers (RRNs) associated with each word, and the
corresponding word counts.
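Steps S410 to S430, performed in a single pass as paragraph [0188] allows, amount to building an inverted index. The sketch assumes whitespace tokenization with lower-casing and interprets the stored count as the number of occurrences of each word; the function name `build_word_table` is hypothetical.

```python
from typing import Dict, List, Tuple


def build_word_table(abstracts: List[str]) -> Dict[str, Tuple[List[int], int]]:
    """Map each word to (RRNs of abstracts containing it, total number of occurrences)."""
    table: Dict[str, Tuple[List[int], int]] = {}
    for rrn, text in enumerate(abstracts, start=1):  # step S410: serial number per abstract
        for word in text.lower().split():  # naive tokenization (assumption)
            rrns, count = table.get(word, ([], 0))
            if not rrns or rrns[-1] != rrn:  # list each abstract only once per word
                rrns.append(rrn)
            table[word] = (rrns, count + 1)  # steps S420-S430: RRNs and count
    return table
```

For example, `build_word_table(["gene expression data", "expression of gene clusters gene"])["gene"]` returns `([1, 2], 3)`: the word occurs in abstracts 1 and 2, three times in total.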
[0191] Meanwhile, data are searched and analyzed using the RVR file
and the RAT file according to the embodiment of the present invention
on the same principle as for the base sequence data described above,
except that the RVR DBMS receives a word as the query and performs the
operation corresponding to that query.
[0192] The detailed description of the exemplary embodiments of the
present invention has been given to enable those skilled in the art
to implement and practice the invention. Although the invention has
been described with reference to the exemplary embodiments, those
skilled in the art will appreciate that various modifications and
variations can be made in the present invention without departing
from the spirit or scope of the invention described in the appended
claims. For example, those skilled in the art may use each
construction described in the above embodiments in combination with
each other.
[0193] Although the above embodiment describes the RVR DBMS of the
present invention as used with an HDD, the present invention can be
applied to a variety of storage media used in place of an HDD, for
example a solid state drive (SSD, solid state disk) that uses flash
memory, or a Dynamic Random Access Memory (DRAM). In this case, the
concepts of the RVR file and the RAT file remain as described above,
and the SSD or DRAM should be interpreted as taking the place of the
HDD.
[0194] The present invention relates to a method for
creating/storing a file that facilitates a search operation, and a
data search method using the same.
[0195] Recently, a project to decode the genome sequences of 1,000
people has been conducted by the United States NIH
(http://www.1000genomes.org/). The data from this project alone
amount to about 3 terabytes, a volume that a standard DBMS cannot
process.
[0196] In the Republic of Korea, a single genomic dataset of about 500
gigabytes was produced in 2007 through the Korean Association
Resource (KARE)-I project of the Korea Centers for Disease Control
and Prevention (KCDC), and a further 2 terabytes of similar data were
generated in KARE-II in 2008. Moreover, a standard DBMS cannot build a
database (DB) that links such data to clinical and epidemiological
information.
[0197] Therefore, applying the present invention to storing and
searching the latest data, whose volume keeps growing, brings greater
benefits in economic efficiency and research speed.
[0198] For example, consider in theory a similarity (or homology)
matrix of 100 K × 100 K data records. To perform exhaustive clustering
with this matrix, the whole matrix would normally have to be loaded
into DRAM. If the C/C++ program stores each entry as a
double-precision floating-point variable (`double`, 8 bytes), the
matrix alone requires 10^10 × 8 bytes, that is, about 80 gigabytes of
DRAM.
[0199] Therefore, the RVR-RAT scheme, which uses an HDD, is essential
for research involving such large-volume clustering.
[0200] Although the RVR DBMS according to the present invention can
be used as a DBMS for various purposes, the current RVR DBMS version
is most efficiently used to analyze and manage large-volume bulk data
in science and technology. With some additional (minimal) formatting,
the RVR DBMS can be connected directly to the data processing and
analysis modules of a DBMS. In addition, according to the present
invention, several files can be quickly processed by the DBMS, with
each user's Web 2.0 personal computer (PC) acting as a server. Cloud
computing is a service that enables many users to use analysis and
calculation resources centralized in one place over the Internet.
The present invention can implement such cloud computing so that many
users can perform calculations rapidly, and the RVR DBMS can be
applied most effectively in cloud computing. If highly integrated
indexing of all data is performed, the data can be distributed
quickly, and parallel distributed calculation on the distributed data
can then be processed quickly on large PC clusters, which is the main
strength of cloud computing technology.
[0201] As apparent from the above description, the file creating
method for searching for single data and a method for searching for
a single data file according to the embodiments of the present
invention have the following effects.
[0202] With the Rack of Virtual RAM (RVR), the binary file used in the
present invention, the addresses of all data records on a hard disk
are recorded in a RAT file. A user can therefore randomly access RVR
file records from programming languages (Perl, Python, Fortran,
C/C++, Java, etc.) using the address information stored in the RAT
file, format the retrieved information in those languages, and output
the formatted result. Accordingly, the embodiments of the present
invention can create a database (DB) for large irregular data and can
also analyze the data.
[0203] In addition, the present invention can implement random
access using a relatively cheap hard disk without the need for
large amounts of DRAM, resulting in economic efficiency.
[0204] With the development of biotechnology, more than 2,000 whole
genome sequences, ranging from microorganisms to animals and plants,
have been decoded, and a single human genome occupies about 3
gigabytes.
[0205] Meanwhile, the RVR database management system (DBMS) according
to the present invention manages data using the data records of data
files and their addresses, whereas a conventional standard DBMS
constructs a regular data table, enters data into the table, and
applies DBMS operations to that table. A standard DBMS also constructs
a plurality of tables and systematically manages the relationships
between or within those tables. Compared with this standard scheme,
the RVR DBMS according to the present invention has the advantage
that it constructs RVR-RATs for data records of different formats held
in different files and performs DBMS operations between or within
those files.
[0206] Meanwhile, the RVR DBMS according to the present invention has
the following advantages over conventional standard DBMSs such as the
Main Memory based DBMS (MMDBMS), the Disk Resident DBMS (DRDBMS) and
the Hybrid DBMS (HDBMS). These three differ as follows: MMDBMS fills
tables held in main memory with data, DRDBMS fills tables held on a
hard disk with data, and HDBMS keeps data in memory for rapid
calculation while storing it stably on disk. DRDBMS is preferable for
large amounts of data, although its response time is slow, whereas
MMDBMS is better suited to small amounts of data. HDBMS combines the
advantages of both DRDBMS and MMDBMS.
[0207] Compared to the standard DBMS, the RVR DBMS scheme has the
following characteristics (1), (2), (3), (4) and (5).
[0208] (1) Because the RVR DBMS scheme uses only the highly integrated
hard disk index addresses of a data file in a specific format stored
on a hard disk, it is disk-resident in the same way as the DRDBMS
scheme. (2) Like MMDBMS, however, the RVR DBMS scheme offers a rapid
interaction speed. (3) In particular, the RVR DBMS scheme can be
applied even to bulk data in science and technology, and DBMS and
analysis processes can easily be applied to irregular data, such as
large numbers of genome sequences, that cannot be processed with a
standard DBMS. (4) Because the RVR DBMS scheme serves as a DBMS for
data files, it can perform statistical and analytical calculations on
the data.
[0209] Therefore, the RVR DBMS scheme can be used in systems for
analyzing interaction data in science and technology. (5) In
addition, the RVR DBMS scheme performs highly integrated indexing of
all files on a hard disk and manages the result, so it can distribute
data easily. This ease of data distribution makes the RVR DBMS scheme
well suited to cloud computing, where distributed calculation is
readily performed.
[0210] It will be apparent to those skilled in the art that various
modifications and variations can be made in the present invention
without departing from the spirit or scope of the inventions. Thus,
it is intended that the present invention covers the modifications
and variations of this invention provided they come within the
scope of the appended claims and their equivalents.