U.S. patent application number 12/986466, for a scalable cloud storage architecture, was filed with the patent office on 2011-01-07 and published on 2012-07-12.
This patent application is currently assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Invention is credited to Rong N. Chang, Byung C. Tak, Chunqiang Tang.
United States Patent Application 20120179874
Kind Code: A1
Application Number: 12/986466
Family ID: 46456129
Filed: 2011-01-07
Published: 2012-07-12
Chang; Rong N.; et al.
SCALABLE CLOUD STORAGE ARCHITECTURE
Abstract
A virtual storage module operable to run in a virtual machine
monitor may include a wait-queue operable to store incoming
block-level data requests from one or more virtual machines.
In-memory metadata may store information associated with data
stored in local persistent storage that is local to a host computer
hosting the virtual machines. The data stored in local persistent
storage replicates a subset of data in one or more virtual disks
provided to the virtual machines. The virtual disks are mapped to
remote storage accessible via a network connecting the virtual
machines and the remote storage. A cache handling logic may be
operable to handle the block-level data requests by obtaining the
information in the in-memory metadata and making I/O requests to
the local persistent storage, the remote storage, or a combination
of the local persistent storage and the remote storage to service
the block-level data requests.
Inventors: Chang; Rong N.; (Pleasantville, NY); Tak; Byung C.; (State College, PA); Tang; Chunqiang; (Ossining, NY)
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY
Family ID: 46456129
Appl. No.: 12/986466
Filed: January 7, 2011
Current U.S. Class: 711/128; 711/E12.018; 718/1
Current CPC Class: G06F 12/0813 20130101; G06F 2212/154 20130101; G06F 9/45558 20130101; G06F 12/0802 20130101; G06F 2212/152 20130101; G06F 12/0808 20130101; G06F 2009/45583 20130101; H04L 67/1097 20130101; G06F 11/1453 20130101; G06F 2009/45579 20130101
Class at Publication: 711/128; 718/1; 711/E12.018
International Class: G06F 12/08 20060101 G06F012/08; G06F 9/455 20060101 G06F009/455
Claims
1. A storage system for handling data for virtual machines,
comprising: a virtual storage module operable to run in a virtual
machine monitor, the virtual storage module including at least, a
wait-queue operable to store incoming block-level data requests
from one or more virtual machines; in-memory metadata for storing
information associated with data stored in local persistent storage
that is local to a host computer hosting the virtual machines, the
data stored in local persistent storage being replication of a
subset of data in one or more virtual disks provided to the virtual
machines, the virtual disks being mapped to remote storage
accessible via a network connecting the virtual machines and the
remote storage; and a cache handling logic operable to handle the
block-level data requests by obtaining the information in the
in-memory metadata and making I/O requests to the local persistent
storage or the remote storage or combination of the local
persistent storage and the remote storage to service the
block-level data requests.
2. The system of claim 1, wherein the in-memory metadata includes
at least a virtual disk identifier that identifies a virtual disk
stored on the remote storage, a remote address of the data in the
remote storage, a bit vector that indicates whether the data is
valid, and a dirty bit that indicates whether the data is
modified.
3. The system of claim 2, wherein the virtual storage module
manages block groups and performs I/O requests to the local
persistent storage in units of one or more predetermined sized
blocks.
4. The system of claim 3, wherein each block stored in the local
persistent storage includes a trailer that stores metadata of the
block and hash value of the block used for checking data integrity
of data content of the block, wherein after a host crash and
recovery, the virtual storage module can examine the trailer to
determine a virtual disk that owns said each block stored in the
local persistent storage, and determine whether the data content of
the block and the hash value are consistent.
5. The system of claim 4, wherein the data content of the block and
the trailer are read and written together in a single disk I/O
operation.
6. The system of claim 3, wherein the virtual storage module
organizes the local persistent storage as set-associative cache
structured into a table-like structure with rows and columns, each
of the rows having multiple block groups wherein the block groups
in a same row are laid out in logically contiguous disk blocks, and
wherein each block group in the same row can store contents coming
from a different virtual disk.
7. The system of claim 6, wherein the one or more predetermined
sized blocks can store data and metadata associated with the data,
and wherein the in-memory metadata includes each of the metadata
stored in the one or more predetermined sized blocks.
8. The system of claim 7, wherein the predetermined sized blocks
can further store hash value of the data.
9. The system of claim 1, wherein the cache handling logic replaces
data in the local persistent storage based on a score determined
from summing weighted values associated with how recently the data
was accessed, how sequential the data is with respect to an
adjacent data, how far away the data is from a base row, how
sequential the data would be if a new block is cached, how far away
from the base row the data would be if a new block is cached, and
whether the data is modified.
10. The system of claim 1, wherein the virtual storage module
automatically destages modified data in the local persistent
storage to the remote storage in response to determining that the
modified data has reached a threshold.
11. The system of claim 10, wherein the virtual storage module
further determines how many blocks of data to destage at a given
time based on total allowed data transmission size including
combined data transmission size for both remote storage accesses
and destaging.
12. The system of claim 1, wherein the in-memory metadata are
persisted on disk in a write-through manner to guarantee data
integrity in an event of a host crash.
13. A method for handling data storage for virtual machines,
comprising: intercepting one or more incoming block-level data
requests received by a virtual machine monitor from one or more
virtual machines; obtaining from in-memory metadata, information
associated with data of the block-level data request, the in-memory
metadata for storing information associated with data stored in
local persistent storage that is local to a host computer hosting
the virtual machines, the data stored in local persistent storage
being replication of a subset of data in one or more virtual disks
provided to the virtual machines, the virtual disks being mapped to
remote storage accessible via a network connecting the virtual
machines and the remote storage; and making I/O requests to the
local persistent storage or the remote storage or combination of
the local persistent storage and the remote storage to service the
block-level data requests.
14. The method of claim 13, wherein the in-memory metadata includes
at least a virtual disk identifier that identifies a virtual disk
stored on the remote storage, a remote address of the data in the
remote storage, a bit vector that indicates whether the data is
valid, and a dirty bit that indicates whether the data is
modified.
15. The method of claim 14, further including managing block groups
and performing I/O requests to the local persistent storage in
units of predetermined sized blocks.
16. The method of claim 15, further including organizing the local
persistent storage as set-associative cache structured into a
table-like structure with rows and columns, each of the rows having
multiple block groups wherein the block groups in a same row are
laid out in logically contiguous disk blocks, and wherein each
block group in the same row can store contents coming from a
different virtual disk.
17. The method of claim 16, wherein the one or more predetermined
sized blocks can store data and metadata associated with the data,
and wherein the in-memory metadata includes each of the metadata
stored in the one or more predetermined sized blocks.
18. The method of claim 17, wherein the predetermined sized blocks
can further store hash value of the data.
19. The method of claim 13, further including replacing data in the
local persistent storage based on a score determined from summing
weighted values associated with how recently the data was accessed,
how sequential the data is with respect to an adjacent data, how
far away the data is from a base row, how sequential the data would
be if a new block is cached, how far away from the base row the data
would be if a new block is cached, and whether the data is
modified.
20. The method of claim 13, further including automatically
destaging modified data in the local persistent storage to the
remote storage in response to determining that the modified data
has reached a threshold.
21. The method of claim 20, further including determining how many
blocks of data to destage at a given time based on total allowed
data transmission size including combined data transmission size
for both remote storage accesses and destaging.
22. A computer readable storage medium storing a program of
instructions executable by a machine to perform a method for
handling data storage for virtual machines, comprising:
intercepting one or more incoming block-level data requests
received by a virtual machine monitor from one or more virtual
machines; obtaining from in-memory metadata, information associated
with data of the block-level data request, the in-memory metadata
for storing information associated with data stored in local
persistent storage that is local to a host computer hosting the
virtual machines, the data stored in local persistent storage being
replication of a subset of data in one or more virtual disks
provided to the virtual machines, the virtual disks being mapped to
remote storage accessible via a network connecting the virtual
machines and the remote storage; and making I/O requests to the
local persistent storage or the remote storage or combination of
the local persistent storage and the remote storage to service the
block-level data requests.
23. The computer readable storage medium of claim 22, wherein the
in-memory metadata includes at least a virtual disk identifier that
identifies a virtual disk stored on the remote storage, a remote
address of the data in the remote storage, a bit vector that
indicates whether the data is valid, and a dirty bit that indicates
whether the data is modified.
24. The computer readable storage medium of claim 20, further
including managing block groups and performing I/O requests to the
local persistent storage in units of predetermined sized
blocks.
25. The computer readable storage medium of claim 24, further
including organizing the local persistent storage as
set-associative cache structured into a table-like structure with
rows and columns, each of the rows having multiple block groups
wherein the block groups in a same row are laid out in logically
contiguous disk blocks, wherein each block group in the same row
can store contents coming from a different virtual disk, wherein
the one or more predetermined sized blocks can store data and
metadata associated with the data, and wherein the in-memory
metadata includes each of the metadata stored in the one or more
predetermined sized blocks.
Description
FIELD
[0001] The present application generally relates to computer
systems and computer storage, and more particularly to virtual
storage and storage architecture.
BACKGROUND
[0002] Designing a storage system is a challenging task. For
instance, in Cloud Computing, a high degree of virtualization
increases the demand for storage space, which requires the use
of remote storage. However, uncontrolled access to the
remote storage from a large number of virtual machines can easily
saturate the networking infrastructure and affect all the
systems using the network.
[0003] More particularly, for example, in IaaS
(Infrastructure-as-a-Service) cloud services, the storage needs of VM
(Virtual Machine) instances are met through virtual disks (i.e.,
virtual block devices). However, it is nontrivial to provide
virtual disks to VMs in an efficient and scalable way for a couple
of reasons. First, a VM host may be required to provide virtual
disks for a large number of VMs. It is difficult to ascertain the
largest possible storage demands and physically provision them all
in the host machine. On the other hand, if the storage spaces for
virtual disks are provided through remote storage servers,
aggregate network traffic due to storage accesses from VMs can
easily deplete the network bandwidth and cause congestion.
BRIEF SUMMARY
[0004] A storage system and method for handling data for virtual
machines, for instance, for scalable cloud storage architecture,
may be provided. The system, in one aspect, may include a virtual
storage module operable to run in a virtual machine monitor. The
virtual storage module may include a wait-queue operable to store
incoming block-level data requests from one or more virtual
machines, and in-memory metadata for storing information associated
with data stored in local persistent storage that is local to a
host computer hosting the virtual machines. The data stored in
local persistent storage may be a replication of a subset of data in
one or more virtual disks provided to the virtual machines, the
virtual disks being mapped to remote storage accessible via a
network connecting the virtual machines and the remote storage. A
cache handling logic may be operable to handle the block-level data
requests by obtaining the information in the in-memory metadata and
making I/O requests to the local persistent storage, the remote
storage, or a combination of the local persistent storage and the
remote storage to service the block-level data requests.
[0005] A method for handling data storage for virtual machines, in
one aspect, may include intercepting one or more incoming
block-level data requests received by a virtual machine monitor
from one or more virtual machines. The method may also include
obtaining from in-memory metadata, information associated with data
of the block-level data request. The in-memory metadata may store
information associated with data stored in local persistent storage
that is local to a host computer hosting the virtual machines. The
data stored in local persistent storage may be a replication of a
subset of data in one or more virtual disks provided to the virtual
machines. The virtual disks may be mapped to remote storage
accessible via a network connecting the virtual machines and the
remote storage. The method may further include making I/O requests
to the local persistent storage, the remote storage, or a
combination of the local persistent storage and the remote storage
to service the block-level data requests.
[0006] A computer readable storage medium storing a program of
instructions executable by a machine to perform one or more methods
described herein also may be provided.
[0007] Further features as well as the structure and operation of
various embodiments are described in detail below with reference to
the accompanying drawings. In the drawings, like reference numbers
indicate identical or functionally similar elements.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
[0008] FIG. 1 shows the architecture of a scalable Cloud storage
system in one embodiment of the present disclosure.
[0009] FIG. 2 shows the architecture of vStore in one embodiment of
the present disclosure.
[0010] FIG. 3 illustrates structure of one cache entry in one
embodiment of the present disclosure.
[0011] FIG. 4A is a flow diagram illustrating a read request
handling in one embodiment of the present disclosure.
[0012] FIG. 4B is a flow diagram illustrating a write request
handling in one embodiment of the present disclosure.
[0013] FIG. 5 shows as an example, the Xen implementation of vStore
in one embodiment of the present disclosure.
DETAILED DESCRIPTION
[0014] The present disclosure in one embodiment presents a system
(referred to in this disclosure as vStore), which utilizes the
host's (e.g., computer server hosting virtual machines) local disk
space as a block-level cache for the remote storage (e.g., network
attached storages), for example, in order to absorb network
traffic from storage accesses. This allows the VMM (Virtual
Machine Monitor, a.k.a. hypervisor) to serve VMs' disk input/output
(I/O) requests from the host's local disks most of the time, while
providing the illusion of much larger storage space for creating
new virtual disks. Caching virtual disks at block-level poses
special challenges in achieving high performance while maintaining
virtual disk semantics. First, after a disk write operation
finishes from the VM's perspective, the data should survive even if
the host immediately encounters a power failure. That is, the
block-level cache should preserve the data integrity in the event
of host crashes. To that end, cache handling operations in one
embodiment of the present disclosure may ensure consistency between
on-disk metadata and data to avoid committing incorrect data to the
network attached storage (NAS) during recovery from a crash, while
minimizing overheads in updating on-disk metadata. Second, as disk
I/O performance is dominated by disk seek times, a virtual disk
should be kept as sequential as possible in the limited cache
space. Unlike memory-based caching schemes, the performance of an
on-disk cache is highly sensitive to data layout. The present
disclosure in one embodiment may utilize a cache placement policy
that maintains a high degree of data sequentiality in the cache as
in the original (i.e., remote) virtual disk. Third, the destaging
operation that sends dirty pages back to the remote storage server
may be self-adaptive and minimize the impact on the foreground
traffic.
[0015] In another aspect, a scalable architecture is presented that
provides reliable virtual disks (i.e., block devices as opposed to
object stores) for virtual machines (VM) in a cloud
environment.
[0016] FIG. 1 shows the architecture of a scalable Cloud storage
system in one embodiment of the present disclosure. The
architecture may include one or more VM-hosting machines (e.g.,
102, 104, 106). A VM-hosting machine is a physical machine that
hosts a large number of VMs and has limited local storage space.
vStore 108 uses local storage 110 as a block-level cache and
provides to VMs 112 the illusion of unlimited storage space. vStore
108 may be implemented in hypervisor 114 and provides persistent
cache. vStore 108 performs caching at the block device level rather
than the file system level. The hypervisor 114 executes on one or
more computer processors and provides a virtual block device to VMs
112, which implies that VMs 112 see raw block devices and they are
free to install any file systems on top of them. Thus, hypervisor 114
receives block-level requests and redirects them to the remote
storage (e.g., 116, 118).
[0017] In one embodiment, a single cache space is provided per
machine (e.g., 102). The cache tries to replicate the block layout
of remote storage (e.g., 116, 118) in the local cache space (local
disk) 110.
[0018] Storage server clusters (e.g., 116, 118) provide network
attached storage to physical machines (e.g., 102, 104, 106). They
(e.g., 116, 118) can be either dedicated high-performance storage
servers or a cluster of servers using commodity storage devices.
The interface to the hypervisors 114 can be either block-level or
file-level. If it is the block-level, iSCSI type of protocol can be
used between storage servers and clients (i.e., hypervisors). If it
is file-level, the hypervisor mounts a remote directory structure
and keeps the virtual disks as individual files. Regardless of the
protocol between hypervisors and storage servers, the interface
between VMs and hypervisor remains at block-level.
[0019] The directory server 120 holds the location information
about the storage server clusters. When a hypervisor 114 wants to
attach a virtual disk to a VM, it consults the directory server 120
to determine the address of a specific storage server (e.g., 116,
118) that currently stores the virtual disk.
[0020] The architecture also includes networking infrastructure.
Usually network bandwidth within a rack is well-provisioned, but
the cross-rack network is typically under-provisioned by a factor of
5-10 relative to the within-rack network. As a result, uncontrolled storage
accesses from VMs can easily deplete the network bandwidth and
cause congestion.
[0021] An example configuration may have rack-mounted servers for
hosting virtual machines and remote storage servers to provide
storage services to the VMs. A rack may contain more than 20
servers, with a virtual machine monitor such as the Xen-3.1.4
hypervisor installed on each of them. Servers may have processors,
for example, two Intel.RTM. Xeon.TM. CPUs at 3.40 GHz, and memory,
e.g., 2 gigabytes (GB) of memory. They can communicate through a 1
Gbps link within the rack. Local storage for each server may be about 1
terabyte, and the servers have a network file system (NFS)-mounted
shared storage space that is used to hold VM images for all virtual
machines. Remote storage servers may have physical hard disks
attached, e.g., through a Serial Advanced Technology Attachment
(SATA) interface.
[0022] There may be multiple options when designing a storage
system for a Cloud. One solution is to use only local storage. In a
Cloud, VMs may use different amounts of storage space, depending on
how much the user pays. If every host's local storage space is
over-provisioned for the largest possible demand, the cost would be
prohibitive. Another solution is to only use network attached
storage. That is, a VM's root file system, swap area, and
additional data disks are all stored on network attached storage.
This solution, however, would incur a large amount of network
traffic and disk I/O load on the storage servers.
[0023] Sequential disk access can achieve a data rate of 100 MB/s.
Even with pure random access, it can reach 10 MB/s. Since a 1 Gbps
network can sustain roughly 13 MB/s, four uplinks to the
rack-level switch are not enough to handle even a single
sequential access. Note that uplinks to the rack-level network
switches are limited in number and cannot be easily increased in
commodity systems. Even for random disk access, it can only support
about five VMs' disk I/O traffic. Even with 10 Gbps networks, it
still can hardly support thousands of VMs running in one rack
(e.g., typical numbers are 42 hosts per rack, and 32 VMs per host,
i.e., 1,344 VMs per rack).
[0024] vStore 108 takes a hybrid approach that leverages both local
storage 110 and network attached storage 116, 118. It still relies
on network attached storage 116, 118 to provide sufficient storage
space for VMs 112, but utilizes the local storage 110 of a host 102
to cache data and avoid accessing network attached storage 116, 118
as much as possible.
[0025] Consider the case of Amazon EC2, where a VM is given one 10
GB virtual disk to store its root file system and another 160 GB
virtual disk to store data. The root disk can be stored on local
storage due to its small size. The large data disk can be stored on
network attached storage and accessed through the vStore cache.
Data integrity and performance are two main challenges in the
design of vStore. After a disk write operation finishes from the
VM's perspective, the data should survive even if the host
immediately encounters a power failure. In vStore, system failures
can compromise data integrity in several ways. If the host crashes
while vStore is in the middle of updating either the metadata or
the data and there is no mechanism for detecting the inconsistency
between the metadata and the data, after the host restarts,
incorrect data may remain in the cache and be written back to the
network attached storage. Another case that may compromise data
integrity is through violating the semantics of writes. If data is
buffered in memory and not flushed to disk after reporting write
completion to the VM, a system crash will cause data loss. Taking
such semantics into consideration, vStore of the present disclosure in
one embodiment may be designed to support data integrity.
[0026] The second challenge is to achieve high performance, which
conflicts with ensuring data integrity; hence vStore may be designed to
minimize performance penalties. The performance of vStore may be
affected by several factors: (i) data placement within the cache,
(ii) vStore metadata placement on disk, and (iii) complications
introduced by the vStore logic. For (i), if sequential blocks in a
virtual disk are placed far apart in the cache, a sequential read
of these blocks incurs a high overhead due to a long disk seek
time. Therefore, in one embodiment, vStore keeps a virtual disk as
sequential as possible in the limited cache space. For (ii),
ideally, on-disk metadata should be small and should not require an
additional disk seek to access data and metadata separately. For
(iii), one potential overhead is the dependency among outstanding
requests. For example, if one request is about to evict one cache
entry, then all the requests on that entry must wait. All of these
factors may be considered in the design of vStore.
[0027] FIG. 2 shows the architecture of vStore in one embodiment of
the present disclosure. The description herein is based on
para-virtualized Xen as an example. VMs 202 generate block requests
in the form of (sector address, sector count). Requests arrive at
the front-end device driver within the VM 202 after passing through
the guest kernel. Then they are forwarded to the back-end driver in
Domain-0. The back-end driver issues actual I/O requests to the
device, and sends responses to the guest VM 202 along the reverse
path.
[0028] In one embodiment, the vStore module 204 runs in Domain-0,
and extends the function of the back-end device driver. vStore 204
intercepts requests and filters them through its cache handling
logic. In FIG. 2, vStore 204 internally may include a wait queue
206 for incoming requests, a cache handling logic 208, and
in-memory metadata 210. Incoming requests are first put into
vStore's wait queue 206. The wait queue 206 is used in one
embodiment because the cache entry that this request needs to use
might be under eviction or update triggered by previous requests.
After clearing such conflicts, the request is handled by the cache
handling logic 208. The in-memory metadata 210 are consulted to
obtain information such as block address, dirty bit, and
modification time. Depending on the current cache state, actual I/O
requests are made to either the cache on local storage 212 or the
network attached storage 214.
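A minimal C sketch of how the three components described above (wait queue, cache handling logic, and in-memory metadata) might be organized inside such a module is shown below; the structure and field names are illustrative assumptions only and do not reproduce the actual implementation.

#include <stdint.h>
#include <stddef.h>

struct cache_entry;   /* per-block-group metadata; see the metadata sketch below */

/* Illustrative per-request descriptor queued while a conflicting
 * cache entry is under eviction or update. */
struct vstore_request {
    uint16_t vdisk_id;            /* which virtual disk issued the request */
    uint64_t sector;              /* starting sector address               */
    uint32_t nsectors;            /* number of sectors                     */
    int      is_write;            /* 0 = read, 1 = write                   */
    struct vstore_request *next;
};

/* Simple FIFO wait queue for incoming block-level requests. */
struct wait_queue {
    struct vstore_request *head, *tail;
};

/* The vStore module: wait queue, in-memory metadata (one entry per
 * cached block group), and a hook for the cache handling logic. */
struct vstore {
    struct wait_queue   waitq;
    struct cache_entry *metadata;   /* in-memory copy of on-disk metadata */
    size_t              n_entries;
    void (*handle)(struct vstore *vs, struct vstore_request *req);
};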
[0029] I/O Unit: Guest VMs usually operate on 4 KB blocks, but
vStore can perform I/Os to and from the network attached storage at
a configurable larger unit. A large I/O unit reduces the size of
in-memory metadata, as it reduces the number of cache entries to
manage. Moreover, a large I/O unit works well with high-end storage
servers, which are optimized for large I/O sizes (e.g., 256 KB or
even 1 MB). Thus, reading a large unit is as efficient as reading 4
KB. This may increase the incoming network traffic, but our
evaluation shows that the subsequent savings outweigh the initial
cost. We use the term block group to refer to the I/O unit used
by the vStore as opposed to the (typically 4 KB) block used by the
guest VMs. That is, one block group contains one or more 4 KB
blocks.
[0030] Metadata: Metadata holds information about cache entries on
disk. Metadata are stored on disk for data integrity and cached in
memory for performance. Metadata updates are done in a
write-through manner. After a host crashes and recovers, vStore
visits each metadata entry on disk and recovers any dirty data that
have not been flushed to network attached storage. Table 1
summarizes examples of the metadata fields in one embodiment of the
present disclosure.
TABLE-US-00001
TABLE 1 vStore Metadata.
Fields           Size       Descriptions
Virtual Disk ID  2 Bytes    ID assigned by vStore to uniquely identify a virtual disk. An ID is unique only within individual hypervisors.
Sector Address   4 Bytes    Cache entry's remote address in units of sectors.
Dirty Bit        1 Bit      Set if cache content is modified.
Valid Bit        1 Bit      Set if cache entry is being used and the corresponding data is in the cache.
Lock Bit         1 Bit      Set if under modification by a request.
Read Count       2 Bytes    How many read accesses within a time unit.
Write Count      2 Bytes    How many write accesses within a time unit.
Bit Vector       Variable   Each bit represents 4 KB within the block group. Set if the corresponding 4 KB is valid. The size is (block group size)/4 KB bits.
Access Time      8 Bytes    Most recently accessed time.
Total Size       <23 Bytes
[0031] Virtual Disk identifier (ID) identifies a virtual disk
stored on network attached storage. When a virtual disk is detached
and reconnected later, cached contents that belong to this disk are
identified and reused. Bit Vector has one bit for each 4 KB block
in a block group so that the states of 4 KB blocks in the same
block group can be changed and tracked individually. Without Bit
Vector, the states of 4 KB blocks in the same block group must
always be changed together. As a result, when the VM writes to a 4
KB block, vStore must read the entire block group (including all 4
KB blocks in that block group) from network attached storage, merge it
with the 4 KB of new data, and write the entire block group to the cache.
With Bit Vector, vStore can write to the 4 KB data directly without
fetching the entire block group, and then only change the affected
4 KB block's state in Bit Vector. Our experiments show that Bit
Vector helps reduce network traffic when using a large cache unit
size.
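As a concrete illustration of the metadata fields listed in Table 1, a per-entry structure in C might look like the following sketch. The exact field widths, the 256 KB block group size (given only as an example in the text), and the bit-field layout are assumptions for illustration, not a normative format.

#include <stdint.h>

#define BLOCK_GROUP_SIZE  (256 * 1024)    /* assumed I/O unit (example size)  */
#define SUBBLOCK_SIZE     (4 * 1024)      /* guest-visible block size         */
#define BITS_PER_GROUP    (BLOCK_GROUP_SIZE / SUBBLOCK_SIZE)   /* 64 bits     */

/* One cache entry's metadata, mirroring the fields of Table 1. */
struct cache_entry {
    uint16_t vdisk_id;        /* Virtual Disk ID (unique per hypervisor)      */
    uint32_t sector_addr;     /* remote address of the entry, in sectors      */
    unsigned dirty   : 1;     /* set if cache content is modified             */
    unsigned valid   : 1;     /* set if the entry holds cached data           */
    unsigned locked  : 1;     /* set while a request is modifying the entry   */
    uint16_t read_count;      /* reads within the current time unit           */
    uint16_t write_count;     /* writes within the current time unit          */
    uint64_t bit_vector;      /* one bit per 4 KB block; 1 = valid (Table 1)  */
    uint64_t access_time;     /* most recent access time                      */
};

/* Example: mark the i-th 4 KB block of an entry valid after a write,
 * without fetching the rest of the block group from remote storage. */
static inline void mark_subblock_valid(struct cache_entry *e, unsigned i)
{
    e->bit_vector |= (uint64_t)1 << i;
    e->dirty = 1;
}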
[0032] Maintaining metadata on disk may compromise performance. A
naive implementation may require two disk accesses to handle one
write request issued by a VM--one for metadata update and one for
writing actual data. In the present disclosure in one embodiment,
vStore solves this problem by putting metadata and data together,
and updates them in a single write. The details are described
below.
[0033] In-memory Metadata: To avoid disk I/Os for reading the
on-disk metadata, vStore in one embodiment maintains a complete
copy of the metadata in memory and updates them in a write-through
manner. One embodiment of the present disclosure uses a large block
group size (e.g., 256 KB) to reduce the size of the in-memory
metadata.
[0034] Cache Structure: vStore in one embodiment of the present
disclosure organizes local storage as a set-associative cache with
write-back policy by default. We describe the cache as a table-like
structure, where a cache set is a column in the table, and a cache
row is a row in the table. A cache row includes multiple block
groups. A block group has contents coming from one virtual disk,
but different block groups in the same cache row may have contents
coming from different virtual disks. Block groups in the same cache
row are laid out in logically contiguous disk blocks in one
embodiment of the present disclosure.
[0035] FIG. 3 illustrates structure of one cache entry in one
embodiment of the present disclosure. A block group includes n
number of 4 kilobyte (KB) blocks and each 4 KB block has a
trailer. For instance, each 4 KB block 302 in a block group 304
has a 512-byte trailer 306 shown in FIG. 3. This trailer 306 in one
embodiment includes metadata 308 and the hash value 310 of the 4 KB
data block 302. On a write operation, vStore computes the hash of
the 4 KB block 302, and writes the 4 KB block 302 and its 512-byte
trailer 306 in a single write operation. If the host crashes during
the write operation, after recovery, the hash value helps detect
that the 4 KB block and the trailer are inconsistent. The 4 KB
block can be safely discarded, because the completion of the write
operation has not been acknowledged to the VM yet. When handling a
read request, vStore also reads the 512-byte trailer 306 together
with the 4 KB block 302. As a result, a sequential read of two
adjacent blocks issued by the VM is also sequential in the cache.
If only the 4 KB data block is read without the trailer, the
sequential request would be broken into two sub-requests, spaced
apart by 512 bytes.
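The trailer handling described above might be sketched in C as follows. The 64-bit additive checksum stands in for whatever hash function an actual implementation would use, the trailer layout is assumed, and the I/O is shown with plain pwrite/pread for brevity.

#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DATA_SIZE     4096
#define TRAILER_SIZE  512

/* Assumed 512-byte trailer layout: metadata plus a hash of the 4 KB data. */
struct trailer {
    uint64_t hash;          /* hash of the 4 KB data for integrity checking  */
    uint32_t sector_addr;   /* remote address of the 4 KB block              */
    uint16_t vdisk_id;      /* owner virtual disk                            */
    uint8_t  pad[TRAILER_SIZE - 14];   /* pad the trailer out to 512 bytes   */
};

/* Stand-in hash: a simple checksum (a real system would use a stronger hash). */
static uint64_t hash4k(const uint8_t *data)
{
    uint64_t h = 0;
    for (int i = 0; i < DATA_SIZE; i++)
        h = h * 131 + data[i];
    return h;
}

/* Write the 4 KB block and its trailer in a single disk write, so data and
 * metadata are updated together. */
static int write_block_with_trailer(int fd, off_t off, const uint8_t *data,
                                    uint16_t vdisk_id, uint32_t sector_addr)
{
    uint8_t buf[DATA_SIZE + TRAILER_SIZE];
    struct trailer t = { .hash = hash4k(data), .sector_addr = sector_addr,
                         .vdisk_id = vdisk_id };
    memcpy(buf, data, DATA_SIZE);
    memcpy(buf + DATA_SIZE, &t, TRAILER_SIZE);
    return pwrite(fd, buf, sizeof buf, off) == (ssize_t)sizeof buf ? 0 : -1;
}

/* After a crash, re-read the block and trailer together; a block whose
 * stored hash does not match its data can be safely discarded. */
static int block_is_consistent(int fd, off_t off)
{
    uint8_t buf[DATA_SIZE + TRAILER_SIZE];
    struct trailer t;
    if (pread(fd, buf, sizeof buf, off) != (ssize_t)sizeof buf)
        return -1;
    memcpy(&t, buf + DATA_SIZE, TRAILER_SIZE);
    return hash4k(buf) == t.hash;
}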
Cache Replacement
[0036] In one aspect, simple policies like least recently used
(LRU) and least frequently used (LFU) may not be suitable for
vStore, because they are designed primarily for memory-based cache
without consideration of block sequentiality on disk. If two
consecutive blocks in a virtual disk are placed at two random
locations in vStore's cache, sequential I/O requests issued by the
VM become random accesses on the physical disk. In one embodiment,
vStore's cache replacement algorithm strives to preserve the
sequentiality of a virtual disk's blocks.
[0037] Below, we describe an embodiment of vStore's cache
replacement algorithm in detail. We introduce the concept of base
cache row of a virtual disk. The base cache row is the default
cache row on which the first row of blocks of a virtual disk is
placed. Subsequent blocks of the virtual disk are mapped to the
subsequent cache rows. For example, if there are two virtual disks
Disk.sub.1 and Disk.sub.2 currently attached to the vStore and the
cache associativity is 5 (i.e., there are 5 cache rows), then Disk.sub.1
might be assigned 1 as a base cache row and Disk.sub.2 might be
assigned 3 to keep them reasonably away from each other. If we
assume one cache row is made of ten 128 KB cache groups,
Disk.sub.2's block at address 1280K will be mapped to row 4 which
is the next row from Disk.sub.2's base cache row.
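Under the layout of this example (a base cache row per virtual disk, ten 128 KB block groups per row, and 5 cache rows), the cache row for a given block address could be computed as in the following sketch. The constants come from the example above, and the wrap-around past the last row is an assumption.

#include <stdint.h>

#define GROUPS_PER_ROW   10                       /* block groups in one cache row */
#define GROUP_SIZE       (128 * 1024)             /* 128 KB block group            */
#define NUM_ROWS         5                        /* cache associativity           */
#define ROW_SPAN         ((uint64_t)GROUPS_PER_ROW * GROUP_SIZE)   /* 1280 KB      */

/* Map a virtual-disk byte address to a cache row, starting from the
 * disk's base cache row and wrapping around the rows (assumed). */
static unsigned cache_row_for(unsigned base_row, uint64_t disk_addr)
{
    uint64_t logical_row = disk_addr / ROW_SPAN;   /* which row of the disk */
    return (unsigned)((base_row + logical_row) % NUM_ROWS);
}

/* Example from the text: Disk2 has base row 3; its block at address
 * 1280K falls in logical row 1, so it maps to cache row (3 + 1) % 5 = 4. */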
[0038] Upon arrival of a new data block, vStore in one embodiment
determines the cache location in two steps. First, it looks at the
cache entry's state whose location is calculated using the base
cache row and the block's address. If it is invalid or not dirty,
then it is immediately assigned to the cache entry. If dirty, a
victim entry is selected based on the scores. Six criteria may be
used to calculate the score in one embodiment. [0039]
Recentness--E.g., the more recently accessed, the higher the score.
[0040] Prior Sequentiality--This measures how sequential the cache
entry is with respect to the adjacent cache entries. If the cache
entry is already sequential, then we prefer to keep it in one
embodiment. [0041] Prior Distance--This measures how far away the
cache entry is from the default base cache row. If the entry is
located in cache row 2 and the default base cache row of the
virtual disk is 1, then the value is 2-1=1. [0042] Posterior
Sequentiality--This measures how sequential it will be if we cache the
new block. If it becomes sequential, then we prefer this cache
entry as a victim. [0043] Posterior Distance--This measures how far
away from the default base cache row it would be if we cache the new
block. If this distance is far, it is less preferable. [0044]
Dirtiness--If the cache entry is modified, we would like to avoid
evicting this entry as much as possible.
[0045] Let x.sub.i be each of the six criteria described above,
e.g., for i=0 to 5. A score may be computed using equation (1) as
follows.
S=a.sub.0x.sub.0+a.sub.1x.sub.1+ . . . +a.sub.5x.sub.5 (1)
[0046] Here the coefficient a.sub.i represents the weight of each
criterion. If all a.sub.i is 0 except for a.sub.5, the eviction
policy becomes equivalent to LRU. Weight coefficients are
adjustable according to the preference. In one embodiment, this
value (score) is computed for all the cache entries within the cache
set and the entry with the lowest score is chosen for eviction.
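A minimal sketch of the victim-selection score from equation (1) appears below. The assignment of the six criteria to indices 0..5, their normalization, and the sign conventions are assumptions; the text only states that the coefficients a.sub.i are adjustable and that the lowest-scoring entry in the set is evicted, so criteria that make an entry a better eviction candidate would use negative coefficients here.

#include <stddef.h>

/* The six criteria x0..x5 for one candidate cache entry, already
 * normalized by the caller (e.g., to the range [0, 1]). */
struct eviction_criteria {
    double recentness;               /* x0: more recently accessed => larger */
    double prior_sequentiality;      /* x1: already sequential => larger     */
    double prior_distance;           /* x2: rows away from base row now      */
    double posterior_sequentiality;  /* x3: sequential if new block cached   */
    double posterior_distance;       /* x4: rows away from base if cached    */
    double dirtiness;                /* x5: 1 if modified, 0 otherwise       */
};

/* S = a0*x0 + a1*x1 + ... + a5*x5 (equation (1)). */
static double eviction_score(const struct eviction_criteria *x, const double a[6])
{
    return a[0] * x->recentness
         + a[1] * x->prior_sequentiality
         + a[2] * x->prior_distance
         + a[3] * x->posterior_sequentiality
         + a[4] * x->posterior_distance
         + a[5] * x->dirtiness;
}

/* Pick the entry with the lowest score within one cache set. */
static size_t pick_victim(const struct eviction_criteria *set, size_t n,
                          const double a[6])
{
    size_t victim = 0;
    for (size_t i = 1; i < n; i++)
        if (eviction_score(&set[i], a) < eviction_score(&set[victim], a))
            victim = i;
    return victim;
}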
[0047] Cache Handling Operations
[0048] In one embodiment of the present disclosure, there may be
three cases in cache handling--cache hit, miss without flush, and
miss with flush. In one embodiment, vStore design considers both
performance and data integrity in its cache handling operations.
Since vStore uses disk as a cache space, cache handling involves more
disk accesses than when a cache is not used. Excessive disk accesses
may degrade the overall performance and reduce the merit of using
vStore. In one embodiment of the present disclosure, disk accesses
are minimized to make the performance loss tolerable. vStore may
address data integrity, in one embodiment, as follows. A 512-byte
trailer is added to each 4 KB block to record its hash. In order
to minimize disk I/O in one embodiment of the present disclosure,
we read and write the trailer together with the data. This only
increases the data size, but does not increase the number of I/Os.
However, for cache miss handling, additional disk I/O for data
integrity may be introduced. In general, such consistency issues
complicate overall cache handling and there may be a trade-off between
maintaining consistency and the performance penalty due to additional disk I/O.
[0049] FIG. 4A is a flow diagram illustrating a read request
handling in one embodiment of the present disclosure. FIG. 4B is a
flow diagram illustrating a write request handling in one
embodiment of the present disclosure.
[0050] READ Handling
[0051] FIG. 4A illustrates a flow diagram for read cache handling
in one embodiment of the present disclosure. At 402, a read request
is received. The read request may originate from an application in
a VM, for example to read data X. At 404, it is determined whether
the block group which stores the data of the read request is
already cached. For example, the sector address of the read data is
compared with the in-memory metadata to determine whether the
block group is cached already. If it is determined that the block
group is cached, the flow logic proceeds to 406, otherwise the flow
logic proceeds to 420.
[0052] Using a virtual disk involves multiple steps: open the
virtual disk, perform reads/writes, and finally close the virtual
disk. When the virtual disk is opened, vStore assigns a "Virtual
Disk ID" to the virtual disk and maps it to a remote disk on
storage server (virtual disk ID was described previously). This
mapping relationship is kept in a mapping table, and stored both in
memory and on disk in one embodiment. When the VM issues a read
request, vStore knows the Virtual Disk ID implicitly (because the
request comes from a previously opened handle) and the sector
address is specified explicitly. Combining the virtual disk ID and
the sector address as one search key to look up the in-memory
metadata can determine whether the data is cached and if so which
block group currently caches the data. The following shows an
example data structure of the combined search key.
TABLE-US-00002
Virtual Disk ID   2 Bytes
Sector Address    4 Bytes
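The combined search key and a lookup over the in-memory metadata might be sketched as follows. A real implementation would presumably use the set-associative indexing described earlier rather than a linear scan, and the caller is assumed to have already rounded the sector address down to its block group boundary; the sketch only illustrates the key itself.

#include <stdint.h>
#include <stddef.h>

/* Combined search key: virtual disk ID plus sector address. */
struct lookup_key {
    uint16_t vdisk_id;
    uint32_t sector_addr;   /* assumed aligned to the block group boundary */
};

struct meta_entry {
    uint16_t vdisk_id;
    uint32_t sector_addr;   /* remote address of the cached block group */
    uint8_t  valid;
};

/* Return the index of the block group caching the requested data,
 * or -1 if the data is not cached. */
static long lookup_cached_group(const struct meta_entry *meta, size_t n,
                                struct lookup_key key)
{
    for (size_t i = 0; i < n; i++)
        if (meta[i].valid &&
            meta[i].vdisk_id == key.vdisk_id &&
            meta[i].sector_addr == key.sector_addr)
            return (long)i;
    return -1;
}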
[0053] At 406, it is determined whether the 4 KB block
corresponding to the requested read data, e.g., data X is cached.
If so, at 408, local disk is read to retrieve the data. At 410, the
data is returned to the requestor. If at 406, it is determined that
parts of the requested read data are cached while other parts are
not cached (e.g., 1 KB in the cache and 3 KB on remote storage
server), the cached block group from the local disk is read at 412.
At 414, data corresponding to the requested read data is read from the
remote disk and returned at 416. At 418, the locally read data and
the remotely read data are merged. The merged data is written to
cache for later reuse on a cache hit.
[0054] At 404, if it is determined that the block group
corresponding to the requested read data is not cached, the cache
replacement algorithm chooses a location in the cache to hold the
requested read data. At 420, it is determined whether the old data
currently cached at that location is dirty, i.e., the old data of
that cache entry needs to be stored or updated in the remote
storage since that old data will be evicted from the cache. At 420,
if the cache entry is not dirty, the requested read data is read
from the remote storage device at 422. The data is returned at 424
and written to cache at 426.
[0055] At 420, if it is determined that the old data in the cache
entry is dirty, at 428, Bit Vector is examined to determine whether
the old data in the cache entry is partially valid, i.e., part of
the data are stored in the cache while the other part are stored on
the remote storage server. Partial validity may be determined, for
example, by reading the bit vector values for each of the 4 KB
blocks in the block group. For instance, if a bit in the bit vector
is 0, that part of the data is in local cache. If it is 1 that part
of the data is on remote storage. If it is determined that the
existing data in the cache entry is partially valid, the
corresponding data from the remote storage device is read at 430.
At 432, if the entire data of the cache entry is valid, the data is
read from the local storage. At 434, the cache entry data is
written to remote storage. If the cache entry data has partially
valid data, the remotely read data (at 430) is merged with the
locally read data (at 432) before the data is written to the remote
storage at 434. At 436, the requested read data is read from the
remote storage. The read data is returned at 438 to the requestor
(e.g., the application that requested it). At 440, the requested
read data retrieved from the remote storage is written to cache.
Here, the merge at 442 implies a wait for operations on both
incoming links (434, 438) to complete, before performing the
operation on the outgoing link (440). This is used, for example, to
guarantee data integrity or to wait for data from both the local disk and
remote storage.
[0056] A difference of read handling in FIG. 4A from write handling
shown in FIG. 4B is that vStore can return the data as soon as it
is available and continue the rest of the cache operations in
background. This is reflected in the miss handling operations
(e.g., 420 to 440). For example, remote read (e.g., 422, 436) may
be initiated first. As soon as vStore finishes reading the
requested block, it returns with the data (e.g., 424, 438). On-disk
metadata update and cache data write may be performed afterwards
(e.g., 426, 440).
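The read-path decisions of FIG. 4A could be summarized in C along the following lines. The helper functions are placeholders standing in for the actual disk and network I/O, and the sketch only approximates the ordering of the decisions described above (hit, partial hit, miss without flush, miss with flush); in practice some of these steps may overlap or run in the background.

#include <stdbool.h>

/* Placeholder helpers standing in for real cache/disk/network I/O. */
static bool group_is_cached(void)        { return false; }
static bool all_4k_blocks_cached(void)   { return false; }
static bool victim_is_dirty(void)        { return false; }
static bool victim_partially_valid(void) { return false; }
static void read_local(void)             { /* 408/412/432: read local cache */ }
static void read_remote(void)            { /* 414/422/430/436: remote read  */ }
static void write_remote(void)           { /* 434: flush victim to remote   */ }
static void write_cache(void)            { /* 418/426/440: fill the cache   */ }
static void return_data(void)            { /* 410/416/424/438: reply to VM  */ }

/* Read handling following FIG. 4A (reference numerals in comments). */
static void handle_read(void)
{
    if (group_is_cached()) {                  /* 404 */
        if (all_4k_blocks_cached()) {         /* 406: full hit              */
            read_local();                     /* 408 */
            return_data();                    /* 410 */
        } else {                              /* partial hit                */
            read_local();                     /* 412: cached portion        */
            read_remote();                    /* 414: missing portion       */
            return_data();                    /* 416 */
            write_cache();                    /* 418: merge and cache       */
        }
    } else if (!victim_is_dirty()) {          /* 420: miss without flush    */
        read_remote();                        /* 422 */
        return_data();                        /* 424 */
        write_cache();                        /* 426 */
    } else {                                  /* miss with flush            */
        if (victim_partially_valid())         /* 428 */
            read_remote();                    /* 430: complete the victim   */
        read_local();                         /* 432 */
        write_remote();                       /* 434: destage the victim    */
        read_remote();                        /* 436: fetch requested data  */
        return_data();                        /* 438 */
        write_cache();                        /* 440 */
    }
}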
[0057] WRITE Handling
[0058] FIG. 4B is a flow diagram illustrating a write request
handling in one embodiment of the present disclosure. At 450, a write
request (or command) is received to write data (e.g., data X). At
452, it is determined whether the block group to which the
requested write data belongs, is cached, e.g., using virtual disk
ID and sector number as the search key to look up the in-memory
metadata. At 454, if the data is cached, the data is written to the
local storage, i.e., cached. At 456, the process returns, for
instance, acknowledging successful write to the requestor.
[0059] At 458, if the block group is not cached, it is determined
as to whether the block group is dirty, i.e., whether the data
content of the block group is modified. Whether the content of the
block group is modified may be determined from reading the metadata
associated with the block group and the values for the dirty bits
of the 4 KB blocks contained therein. At 460, if the content of the
block group is determined to be not modified (i.e., not dirty), the
requested write data is written to cache. At 462, the process
returns, for instance, acknowledging successful write to the
requestor.
[0060] If the content of the block group is modified, that data
should be written out to the remote storage before the write data
can overwrite the existing content of the block group. At 464, if
the content of the block group is dirty (modified), it is
determined whether the current content of the block group is
partially valid. At 466, if the content is only partially valid,
the remotely stored data corresponding to that content is read.
This data may be merged with the current content of the block group
in the local storage in order to make the local block group content
wholly valid. At 468, the block group's content is read. At
470, the content of the block group is written to the remote
storage. At 472, the requested write data is written to cache at
the location of the block group. At 474, the process returns, for
instance, acknowledging successful write to the requestor.
[0061] For write requests, vStore in one embodiment directly writes
the data to the cache without accessing the network attached
storage. This simplifies operations of cache hit and cache miss
without flush. But, write handling for cache miss with flush may
make several I/O requests. In FIG. 4B, the write handling returns
at the end of entire operation sequences. In the worst case, write
handling incurs at most four disk I/Os, which may occur in the case
of cache miss with flush.
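Similarly, the write path of FIG. 4B might be sketched as follows. As in the read sketch, the helpers are placeholders and only the decision ordering (hit, miss without flush, miss with flush) is intended to be illustrative.

#include <stdbool.h>

static bool wgroup_is_cached(void)         { return false; }
static bool wgroup_is_dirty(void)          { return false; }
static bool wgroup_partially_valid(void)   { return false; }
static void wread_remote(void)             { /* 466: fill invalid parts of victim */ }
static void wread_local(void)              { /* 468: read victim's content        */ }
static void wflush_remote(void)            { /* 470: write victim to remote       */ }
static void wwrite_cache(void)             { /* 454/460/472: write data to cache  */ }
static void wack(void)                     { /* 456/462/474: acknowledge the VM   */ }

/* Write handling following FIG. 4B (reference numerals in comments). */
static void handle_write(void)
{
    if (wgroup_is_cached()) {               /* 452: cache hit            */
        wwrite_cache();                     /* 454 */
        wack();                             /* 456 */
    } else if (!wgroup_is_dirty()) {        /* 458: miss without flush   */
        wwrite_cache();                     /* 460 */
        wack();                             /* 462 */
    } else {                                /* miss with flush           */
        if (wgroup_partially_valid())       /* 464 */
            wread_remote();                 /* 466 */
        wread_local();                      /* 468 */
        wflush_remote();                    /* 470 */
        wwrite_cache();                     /* 472 */
        wack();                             /* 474 */
    }
}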
[0062] Destaging
[0063] Destaging refers to the process of flushing dirty (modified)
data in the cache to the network attached storage. The destaging
functionality in one embodiment of the present disclosure may be
used to keep the proportion of dirty blocks under a specified
level. A large number of dirty blocks is potentially harmful to the
performance because evicting a dirty cache entry delays the cache
handling operations significantly due to flushing operations. In
addition, detachment of a virtual disk can be faster when there are
fewer dirty blocks. If a VM wants to terminate or migrate,
it has to detach the virtual disk. As part of the detachment
process, all the dirty blocks belonging to the detaching storage
have to be flushed. Without destaging, the amount of data that has
to be transferred can be as large as several gigabytes.
Transferring that amount of data takes time and also generates
bursty traffic.
[0064] Mechanism Design
[0065] In one embodiment of the present disclosure, destaging may
be triggered when the number of dirty blocks in the cache exceeds
the user-specified level, which we call the pollution level. For
example, if the pollution level is set to be 65%, it means that
the user wants to keep the ratio of dirty blocks to total blocks below
65%.
[0066] Upon destaging, vStore in one embodiment may determine how
many blocks to destage at a given time t. The basic idea in one
embodiment is to maintain a window size w.sub.t which indicates the
total allowed data transmission size in units of bytes per
millisecond (Bpms). This window size is the combined data
transmission size for both normal remote storage accesses and the
destaging. It is specified as a rate (Bpms) since the destaging action
can be fired irregularly. If w.sub.t increases, then it is more
likely that normal network attached storage access would leave more
bandwidth available for destaging.
[0067] The control technique for w.sub.t in vStore may adopt the
technique used for flow control in FAST TCP and for queue length
adjustment. w.sub.t may be adjusted using the network attached
storage latency. Let R be the desired network attached storage
latency. Let R.sub.t be the exponentially weighted moving average
of observed network attached storage latency, expressed as
R.sub.t=(1-.alpha.)R+.alpha.R.sub.t-1, where .alpha. is a smoothing
factor. We calculate w.sub.t using
w.sub.t=(1-.gamma.)w.sub.t-1+.gamma.(R/R.sub.t)w.sub.t-1 (2)
where .gamma. is another smoothing factor for w.sub.t. If observed
remote latency is smaller than R, then w.sub.t will increase and
vice versa. In vStore, we also may consider the local latency
denoted as v.sub.t.
[0068] If we let L.sub.t be the latency of the local disk and L be the
desired local disk latency, we calculate v.sub.t as
v.sub.t=(1-.gamma.)v.sub.t-1+.gamma.(L/L.sub.t)v.sub.t-1.
We take the minimum of w.sub.t and v.sub.t as the window size. Next
we calculate how many block groups to destage using determined
window size. Let d.sub.t denote the number of destage I/O to
perform at time t, then
d.sub.t=(min(v.sub.t,w.sub.t).times..tau..sub.t-C.sub.t)/B (3)
where .tau..sub.t is the time length between t and t-1 in milliseconds, B is
the block group size, and C.sub.t is the pending I/O requests at time t in
bytes. C.sub.t represents the remote access from normal file system
operations. Destaging may happen only if d.sub.t>0.
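The window-size adaptation of equations (2) and (3) can be expressed directly in code, as in the following sketch. It assumes the update rules for w.sub.t and v.sub.t given above, and treats the smoothing factor, target latencies, and block group size as configuration parameters; none of the variable names come from an actual implementation.

/* Destaging window adaptation following equations (2) and (3). */
struct destage_state {
    double w;        /* remote-latency-driven window, bytes per ms        */
    double v;        /* local-latency-driven window, bytes per ms         */
    double gamma;    /* smoothing factor for the windows                  */
    double R;        /* desired remote (NAS) latency, ms                  */
    double L;        /* desired local disk latency, ms                    */
    double B;        /* block group size, bytes                           */
};

/* Update both windows from the observed latencies R_t and L_t, then
 * return d_t, the number of block groups to destage now.  tau is the
 * time since the previous destage decision (ms) and C_t is the size of
 * pending normal remote I/O (bytes). */
static long destage_count(struct destage_state *s, double R_t, double L_t,
                          double tau, double C_t)
{
    /* equation (2):  w_t = (1 - gamma) w_{t-1} + gamma (R / R_t) w_{t-1} */
    s->w = (1.0 - s->gamma) * s->w + s->gamma * (s->R / R_t) * s->w;
    /* analogous update for the local-disk window v_t                     */
    s->v = (1.0 - s->gamma) * s->v + s->gamma * (s->L / L_t) * s->v;

    double win = s->w < s->v ? s->w : s->v;      /* min(v_t, w_t)          */
    double d   = (win * tau - C_t) / s->B;       /* equation (3)           */
    return d > 0.0 ? (long)d : 0;                /* destage only if d_t>0  */
}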
[0069] vStore may be implemented using Xen's blktap interface. Xen
is a virtual machine monitor. A virtual machine monitor, also
referred to as a hypervisor, allows guest operating systems to execute
on the same computer hardware concurrently. Other virtual machine
monitors may be used for implementing the vStore. FIG. 5 shows as
an example, the Xen implementation of vStore in one embodiment of
the present disclosure. The blktap mechanism redirects a VM's disk I/O
requests to a tapdisk process 508 running in the userspace of
Domain-0. In a para-virtualized VM, a user application 502 reads or
writes to the blkfront device 504. Normally blkfront connects to
blkback and all the block traffic is delivered to it. If
blktap 506 is enabled, blktap replaces blkback and all the block
traffic is redirected to the tapdisk process 508. Overall, the
blktap mechanism provides a convenient method to intercept block
traffic and implement new functionalities in user space.
[0070] Xen ships with several types of tapdisks so that the tapdisk
process can open the block device using the specified disk type.
Disk types are simply a set of callback functions such as open,
close, read, write, do callback and submit. Among several disk
types, the synchronous I/O type uses normal read and write system calls
to handle each incoming block I/O. The AIO-based disk type uses the Linux
AIO library to issue multiple block requests in a batch. vStore may
also implement this predefined set of callback functions and register
with tapdisk as another disk type. vStore 510 may be
based on the asynchronous I/O mechanism. For example, vStore
submits requests to the Linux AIO library 512 and periodically
polls for completed I/Os. Thus, the internal structure of vStore 510
may be an event-driven architecture. A vStore also may be
implemented using synchronous I/O in another embodiment.
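To illustrate the callback-table pattern described above, a disk type could be registered along the following lines. Every struct, field, and function name in this sketch is an illustrative assumption; it does not reproduce the actual blktap/tapdisk interface, whose names and signatures differ.

#include <stdint.h>

/* Illustrative callback table in the spirit of a tapdisk disk type
 * (not the real blktap interface). */
struct example_disk_ops {
    const char *disk_type;
    int  (*open) (void *state, const char *path);
    void (*close)(void *state);
    int  (*queue_read) (void *state, uint64_t sector, unsigned nsecs, void *buf);
    int  (*queue_write)(void *state, uint64_t sector, unsigned nsecs, const void *buf);
    void (*submit)(void *state);    /* submit batched asynchronous I/O requests */
    void (*poll)  (void *state);    /* poll for completed asynchronous I/Os     */
};

/* A vStore-like disk type would point these at its cache handling logic;
 * the functions below are empty placeholders. */
static int  vstore_open (void *s, const char *p)                 { (void)s; (void)p; return 0; }
static void vstore_close(void *s)                                { (void)s; }
static int  vstore_read (void *s, uint64_t a, unsigned n, void *b)
                                                                 { (void)s; (void)a; (void)n; (void)b; return 0; }
static int  vstore_write(void *s, uint64_t a, unsigned n, const void *b)
                                                                 { (void)s; (void)a; (void)n; (void)b; return 0; }
static void vstore_submit(void *s)                               { (void)s; }
static void vstore_poll  (void *s)                               { (void)s; }

static const struct example_disk_ops vstore_disk_type = {
    .disk_type   = "vstore",
    .open        = vstore_open,
    .close       = vstore_close,
    .queue_read  = vstore_read,
    .queue_write = vstore_write,
    .submit      = vstore_submit,
    .poll        = vstore_poll,
};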
[0071] In another aspect, the architecture of the present
disclosure may also include cloud storage infrastructure which has
features such as cache block transfer between VM hosts to support
fast migration, replication of cache blocks to nearby storage
(possibly at a higher level of the hierarchy or the same rack) within other
hosts to support fast restart of VMs on a failed host, and an
intelligent workload balancing mechanism between using the local
storage and the remote storage for performance and/or cost
optimization, e.g., a mechanism to dynamically determine whether to use
remote storage or the local cache.
[0072] As will be appreciated by one skilled in the art, aspects of
the present invention may be embodied as a system, method or
computer program product. Accordingly, aspects of the present
invention may take the form of an entirely hardware embodiment, an
entirely software embodiment (including firmware, resident
software, micro-code, etc.) or an embodiment combining software and
hardware aspects that may all generally be referred to herein as a
"circuit," "module" or "system." Furthermore, aspects of the
present invention may take the form of a computer program product
embodied in one or more computer readable medium(s) having computer
readable program code embodied thereon.
[0073] Any combination of one or more computer readable medium(s)
may be utilized. The computer readable medium may be a computer
readable signal medium or a computer readable storage medium. A
computer readable storage medium may be, for example, but not
limited to, an electronic, magnetic, optical, electromagnetic,
infrared, or semiconductor system, apparatus, or device, or any
suitable combination of the foregoing. More specific examples (a
non-exhaustive list) of the computer readable storage medium would
include the following: an electrical connection having one or more
wires, a portable computer diskette, a hard disk, a random access
memory (RAM), a read-only memory (ROM), an erasable programmable
read-only memory (EPROM or Flash memory), an optical fiber, a
portable compact disc read-only memory (CD-ROM), an optical storage
device, a magnetic storage device, or any suitable combination of
the foregoing. In the context of this document, a computer readable
storage medium may be any tangible medium that can contain, or
store a program for use by or in connection with an instruction
execution system, apparatus, or device.
[0074] A computer readable signal medium may include a propagated
data signal with computer readable program code embodied therein,
for example, in baseband or as part of a carrier wave. Such a
propagated signal may take any of a variety of forms, including,
but not limited to, electro-magnetic, optical, or any suitable
combination thereof. A computer readable signal medium may be any
computer readable medium that is not a computer readable storage
medium and that can communicate, propagate, or transport a program
for use by or in connection with an instruction execution system,
apparatus, or device.
[0075] Program code embodied on a computer readable medium may be
transmitted using any appropriate medium, including but not limited
to wireless, wireline, optical fiber cable, RF, etc., or any
suitable combination of the foregoing.
[0076] Computer program code for carrying out operations for
aspects of the present invention may be written in any combination
of one or more programming languages, including an object oriented
programming language such as Java, Smalltalk, C++ or the like and
conventional procedural programming languages, such as the "C"
programming language or similar programming languages, a scripting
language such as Perl, VBS or similar languages, and/or functional
languages such as Lisp and ML and logic-oriented languages such as
Prolog. The program code may execute entirely on the user's
computer, partly on the user's computer, as a stand-alone software
package, partly on the user's computer and partly on a remote
computer or entirely on the remote computer or server. In the
latter scenario, the remote computer may be connected to the user's
computer through any type of network, including a local area
network (LAN) or a wide area network (WAN), or the connection may
be made to an external computer (for example, through the Internet
using an Internet Service Provider).
[0077] Aspects of the present invention are described with
reference to flowchart illustrations and/or block diagrams of
methods, apparatus (systems) and computer program products
according to embodiments of the invention. It will be understood
that each block of the flowchart illustrations and/or block
diagrams, and combinations of blocks in the flowchart illustrations
and/or block diagrams, can be implemented by computer program
instructions. These computer program instructions may be provided
to a processor of a general purpose computer, special purpose
computer, or other programmable data processing apparatus to
produce a machine, such that the instructions, which execute via
the processor of the computer or other programmable data processing
apparatus, create means for implementing the functions/acts
specified in the flowchart and/or block diagram block or
blocks.
[0078] These computer program instructions may also be stored in a
computer readable medium that can direct a computer, other
programmable data processing apparatus, or other devices to
function in a particular manner, such that the instructions stored
in the computer readable medium produce an article of manufacture
including instructions which implement the function/act specified
in the flowchart and/or block diagram block or blocks.
[0079] The computer program instructions may also be loaded onto a
computer, other programmable data processing apparatus, or other
devices to cause a series of operational steps to be performed on
the computer, other programmable apparatus or other devices to
produce a computer implemented process such that the instructions
which execute on the computer or other programmable apparatus
provide processes for implementing the functions/acts specified in
the flowchart and/or block diagram block or blocks.
[0080] The flowchart and block diagrams in the figures illustrate
the architecture, functionality, and operation of possible
implementations of systems, methods and computer program products
according to various embodiments of the present invention. In this
regard, each block in the flowchart or block diagrams may represent
a module, segment, or portion of code, which comprises one or more
executable instructions for implementing the specified logical
function(s). It should also be noted that, in some alternative
implementations, the functions noted in the block may occur out of
the order noted in the figures. For example, two blocks shown in
succession may, in fact, be executed substantially concurrently, or
the blocks may sometimes be executed in the reverse order,
depending upon the functionality involved. It will also be noted
that each block of the block diagrams and/or flowchart
illustration, and combinations of blocks in the block diagrams
and/or flowchart illustration, can be implemented by special
purpose hardware-based systems that perform the specified functions
or acts, or combinations of special purpose hardware and computer
instructions.
[0081] The systems and methodologies of the present disclosure may
be carried out or executed in a computer system that includes a
processing unit, which houses one or more processors and/or cores,
memory, and other system components (not shown expressly in the
drawing) that implement a computer processing system, or a computer
that may execute a computer program product. The computer program
product may comprise media, for example, a hard disk, a compact
storage medium such as a compact disc, or other storage devices,
which may be read by the processing unit by any technique known or
later known to the skilled artisan for providing the computer
program product to the processing system for execution.
[0082] The computer program product may comprise all of the
features enabling the implementation of the methodology described
herein and, when loaded in a computer system, is able to carry out
the methods. Computer program, software program, program, or
software, in the present context, means any expression, in any
language, code, or notation, of a set of instructions intended to
cause a system having an information processing capability to
perform a particular function either directly or after either or
both of the following: (a) conversion to another language, code, or
notation; (b) reproduction in a different material form.
[0083] The computer processing system that carries out the system
and method of the present disclosure may also include a display
device, such as a monitor or display screen, for presenting output
displays and providing a display through which the user may input
data and interact with the processing system, for instance, in
cooperation with input devices such as a keyboard and a mouse or
other pointing device. The computer processing system may also be
connected or coupled to one or more peripheral devices, such as a
printer, scanner, or speaker, and any other devices, directly or
via remote connections. The computer processing system may be
connected or coupled to one or more other processing systems, such
as a server, another remote computer processing system, or network
storage devices, via any one or more of a local Ethernet, a WAN
connection, the Internet, etc., or via any other networking
methodology that connects different computing systems and allows
them to communicate with one another. The various functionalities
and modules of the systems and methods of the present disclosure
may be implemented or carried out in a distributed manner on
different processing systems or on any single platform, for
instance, accessing data stored locally or distributed across the
network.
[0084] The terminology used herein is for the purpose of describing
particular embodiments only and is not intended to be limiting of
the invention. As used herein, the singular forms "a", "an" and
"the" are intended to include the plural forms as well, unless the
context clearly indicates otherwise. It will be further understood
that the terms "comprises" and/or "comprising," when used in this
specification, specify the presence of stated features, integers,
steps, operations, elements, and/or components, but do not preclude
the presence or addition of one or more other features, integers,
steps, operations, elements, components, and/or groups thereof.
[0085] The corresponding structures, materials, acts, and
equivalents of all means or step plus function elements, if any, in
the claims below are intended to include any structure, material,
or act for performing the function in combination with other
claimed elements as specifically claimed. The description of the
present invention has been presented for purposes of illustration
and description, but is not intended to be exhaustive or limited to
the invention in the form disclosed. Many modifications and
variations will be apparent to those of ordinary skill in the art
without departing from the scope and spirit of the invention. The
embodiment was chosen and described in order to best explain the
principles of the invention and the practical application, and to
enable others of ordinary skill in the art to understand the
invention for various embodiments with various modifications as are
suited to the particular use contemplated.
[0086] Various aspects of the present disclosure may be embodied as
a program, software, or computer instructions embodied in a
computer or machine usable or readable medium, which causes the
computer or machine to perform the steps of the method when
executed on the computer, processor, and/or machine. A program
storage device readable by a machine, tangibly embodying a program
of instructions executable by the machine to perform various
functionalities and methods described in the present disclosure is
also provided.
[0087] The system and method of the present disclosure may be
implemented and run on a general-purpose computer or
special-purpose computer system. The computer system may be any
type of known or later-developed system and may typically include a
processor, a memory device, a storage device, input/output devices,
internal buses, and/or a communications interface for communicating
with other computer systems in conjunction with communication
hardware and software, etc.
[0088] The terms "computer system" and "computer network" as may be
used in the present application may include a variety of
combinations of fixed and/or portable computer hardware, software,
peripherals, and storage devices. The computer system may include a
plurality of individual components that are networked or otherwise
linked to perform collaboratively, or may include one or more
stand-alone components. The hardware and software components of the
computer system of the present application may include and may be
included within fixed and portable devices such as a desktop, a
laptop, and/or a server. A module may be a component of a device,
software, program, or system that implements some "functionality",
which can be embodied as software, hardware, firmware, electronic
circuitry, or the like.
[0089] The embodiments described above are illustrative examples
and it should not be construed that the present invention is
limited to these particular embodiments. Thus, various changes and
modifications may be effected by one skilled in the art without
departing from the spirit or scope of the invention as defined in
the appended claims.
* * * * *