U.S. patent application number 11/156636 was published by the patent office on 2005-10-13 as publication number 20050228950, "External encapsulation of a volume into a LUN to allow booting and installation on a complex volume." The application is assigned to VERITAS Operating Corporation. The invention is credited to Ronald S. Karr.
United States Patent Application 20050228950
Kind Code: A1
Inventor: Karr, Ronald S.
Publication Date: October 13, 2005
Application Number: 11/156636
Family ID: 34592023
External encapsulation of a volume into a LUN to allow booting and
installation on a complex volume
Abstract
A system for external encapsulation of a volume into a logical
unit (LUN) to allow booting and installation on a complex volume
may include a host, one or more physical storage devices, and an
off-host virtualizer. The off-host virtualizer (i.e., a device
external to the host, capable of providing block virtualization
functionality) may be configured to aggregate storage within the
one or more physical storage devices into a logical volume and to
generate metadata to emulate the logical volume as a bootable
target device. The off-host virtualizer may make the metadata
accessible to the host, allowing the host to boot off a file system
resident in the logical volume.
Inventors: Karr, Ronald S. (Palo Alto, CA)
Correspondence Address: MEYERTONS, HOOD, KIVLIN, KOWERT & GOETZEL, P.C., P.O. BOX 398, AUSTIN, TX 78767-0398, US
Assignee: VERITAS Operating Corporation
Family ID: 34592023
Appl. No.: 11/156636
Filed: June 20, 2005
Related U.S. Patent Documents

Application Number | Filing Date  | Patent Number
11/156636          | Jun 20, 2005 |
10/722,614         | Nov 26, 2003 |
Current U.S. Class: 711/114; 711/162; 711/170
Current CPC Class: G06F 3/067 20130101; G06F 3/0664 20130101; G06F 3/0607 20130101
Class at Publication: 711/114; 711/170; 711/162
International Class: G06F 012/00

Foreign Application Data

Date         | Code | Application Number
Nov 22, 2004 | WO   | PCT/US04/39306
Claims
What is claimed is:
1. A system comprising: a host; one or more physical storage devices; and an off-host virtualizer; wherein the off-host virtualizer is configured to: aggregate storage within the one or more physical storage devices into a logical volume; generate metadata to emulate the logical volume as a bootable target device; and make the metadata accessible to the host; and wherein the host is configured to use the metadata to boot off a file system residing in the logical volume.
2. The system as recited in claim 1, wherein the logical volume is
a snapshot volume.
3. The system as recited in claim 1, wherein the logical volume is
a replicated volume.
4. The system as recited in claim 1, wherein the logical volume is
a striped volume.
5. The system as recited in claim 1, wherein the one or more
physical storage devices include a first and a second physical
storage device, and wherein the logical volume spans the first and
the second physical storage devices.
6. The system as recited in claim 1, wherein the logical volume is
a RAID volume.
7. The system as recited in claim 1, wherein the logical volume
maps to a boot partition of a designated operating system.
8. The system as recited in claim 7, wherein the designated
operating system is configured to access a plurality of additional
boot-related partitions during a boot operation, and wherein the
off-host virtualizer is further configured to: generate additional
metadata to emulate the logical volume as the plurality of
additional boot-related partitions; and make the additional
metadata accessible to the host.
9. The system as recited in claim 1, wherein, subsequent to an
initial phase of a boot process, the host is configured to access
the logical volume directly without performing I/O through the
off-host virtualizer.
10. The system as recited in claim 9, wherein the host is
configured to use a first network type for the initial phase of the
boot process, and wherein the host is configured to access the
logical volume directly using a second network type.
11. The system as recited in claim 1, wherein a physical storage
device of the one or more physical storage devices includes a fibre channel logical unit (LUN).
12. The system as recited in claim 1, wherein a physical storage
device of the one or more physical storage devices includes an
iSCSI LUN.
13. The system as recited in claim 1, further comprising a storage
server, wherein the storage server is configured to provide access
to a physical storage device of the one or more physical storage
devices.
14. The system as recited in claim 1, wherein a physical storage
device of the one or more physical storage devices is accessed
using a target-mode host bus adapter of the off-host
virtualizer.
15. The system as recited in claim 1, wherein the off-host
virtualizer is further configured to: present the logical volume to
the host as an installable partition; and wherein the host is
further configured to: boot installation software for the operating
system from removable media; and install at least a portion of the operating system on the installable partition.
16. A method comprising: aggregating storage within one or more
physical storage devices into a logical volume; generating metadata
to emulate the logical volume as a bootable target device; making
the metadata accessible to a host; and the host using the metadata
to boot off a file system resident in the logical volume.
17. The method as recited in claim 16, wherein the logical volume
is a snapshot volume.
18. The method as recited in claim 16, wherein the logical volume
is a replicated volume.
19. The method as recited in claim 16, wherein a storage device of
the one or more physical storage devices includes a fibre channel
logical unit (LUN).
20. The method as recited in claim 16, wherein a storage device of
the one or more physical storage devices includes an iSCSI
(Internet SCSI) LUN.
21. The method as recited in claim 16, further comprising: the host
accessing the logical volume subsequent to the boot operation
without performing I/O through the off-host virtualizer.
22. A computer accessible medium comprising program instructions,
wherein the instructions are executable to: aggregate storage
within one or more physical storage devices into a logical volume;
generate metadata to emulate the logical volume as a bootable
target device; make the metadata accessible to a host; and use the
metadata to boot the host off a file system resident in the logical
volume.
23. The computer accessible medium as recited in claim 22, wherein
the logical volume is a snapshot volume.
24. The computer accessible medium as recited in claim 22, wherein
the logical volume is a replicated volume.
25. The computer accessible medium as recited in claim 22, wherein
a storage device of the one or more physical storage devices
includes a fibre channel logical unit (LUN).
26. The computer accessible medium as recited in claim 22, wherein
a storage device of the one or more physical storage devices
includes an iSCSI (Internet SCSI) LUN.
27. The computer accessible medium as recited in claim 22, wherein
the instructions are further executable to: access the logical
volume from the host subsequent to the boot operation without
performing I/O through the off-host virtualizer.
Description
[0001] This application is a continuation-in-part of U.S. patent
application Ser. No. 10/722,614, entitled "SYSTEM AND METHOD FOR
EMULATING OPERATING SYSTEM METADATA TO PROVIDE CROSS-PLATFORM
ACCESS TO STORAGE VOLUMES", filed Nov. 26, 2003.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] This invention relates to computer systems and, more
particularly, to off-host virtualization of bootable devices within
storage environments.
[0004] 2. Description of the Related Art
[0005] Many business organizations and governmental entities rely
upon applications that access large amounts of data, often
exceeding a terabyte of data, for mission-critical applications.
Often such data is stored on many different storage devices, which
may be heterogeneous in nature, including many different types of
devices from many different manufacturers.
[0006] Configuring individual applications that consume data, or
application server systems that host such applications, to
recognize and directly interact with each different storage device
that may possibly be encountered in a heterogeneous storage
environment would be increasingly difficult as the environment
scaled in size and complexity. Therefore, in some storage
environments, specialized storage management software and hardware
may be used to provide a more uniform storage model to storage
consumers. Such software and hardware may also be configured to
present physical storage devices as virtual storage devices (e.g.,
virtual SCSI disks) to computer hosts, and to add storage features
not present in individual storage devices to the storage model. For
example, features to increase fault tolerance, such as data
mirroring, snapshot/fixed image creation, or data parity, as well
as features to increase data access performance, such as disk
striping, may be implemented in the storage model via hardware or
software. The added storage features may be referred to as storage
virtualization features, and the software and/or hardware providing
the virtual storage devices and the added storage features may be
termed "virtualizers" or "virtualization controllers".
Virtualization may be performed within computer hosts, such as
within a volume manager layer of a storage software stack at the
host, and/or in devices external to the host, such as
virtualization switches or virtualization appliances. Such external
devices providing virtualization may be termed "off-host"
virtualizers, and may be utilized in order to offload processing
required for virtualization from the host. Off-host virtualizers
may be connected to the external physical storage devices for which
they provide virtualization functions via a variety of
interconnects, such as fibre channel links, Internet Protocol (IP)
networks, and the like.
[0007] In many corporate data centers, as the application workload
increases, additional hosts may need to be provisioned to provide
the required processing capabilities. The internal configuration
(e.g., file system layout and file system sizes) of each of these
additional hosts may be fairly similar, with just a few features
unique to each host. Booting and installing each newly provisioned
host manually may be a cumbersome and error-prone process,
especially in environments where a large number of additional hosts
may be required fairly quickly. A virtualization mechanism that
allows hosts to boot and/or install operating system software off a
virtual bootable target device may be desirable to support
consistent booting and installation for multiple hosts in such
environments. In addition, in some storage environments it may be
desirable to be able to boot and/or install off a snapshot volume
or a replicated volume, for example in order to be able to
re-initialize a host to a state as of a previous point in time
(e.g., the time at which the snapshot or replica was created).
SUMMARY
[0008] Various embodiments of a system and method for external
encapsulation of a volume into a logical unit (LUN) to allow
booting and installation on a complex volume are disclosed.
According to a first embodiment, a system may include a host, one
or more physical storage devices, and an off-host virtualizer. The
off-host virtualizer (i.e., a device external to the host, capable
of providing block virtualization functionality) may be configured
to aggregate storage within the one or more physical storage
devices into a logical volume and to generate metadata to emulate
the logical volume as a bootable target device. The off-host
virtualizer may make the metadata accessible to the host, allowing
the host to boot off the logical volume, e.g., off a file system
resident in the logical volume.
[0009] The metadata generated by the off-host virtualizer may include such
information as the layouts or offsets of various boot-related
partitions that the host may need to access during the boot
process, for example to load a file system reader, an operating
system kernel, or additional boot software such as one or more
scripts. The metadata may be operating system-specific, i.e., the
location, format and contents of the metadata may differ from one
operating system to another. In one embodiment, a number of
different logical volumes, each associated with a particular
boot-related partition or file system, may be emulated as part of
the bootable target device. In another embodiment, the off-host
virtualizer may be configured to present an emulated logical volume
as an installable partition (i.e., a partition in which at least a
portion of an operating system may be installed). In such an
embodiment, the host may also be configured to boot installation
software (e.g., off external media), install at least a portion of
the operating system on the installable partition, and then boot
from a LUN containing the encapsulated volume.
[0010] The logical volume aggregated by the off-host virtualizer
may support a number of different virtualization features in
different embodiments. In one embodiment, the logical volume may be
a snapshot volume (i.e., a point-in-time copy of another logical
volume) or a replicated volume. The logical volume may span
multiple physical storage devices, and may be striped, mirrored, or
a virtual RAID volume. In some embodiments, the logical volume may
include a multi-layer hierarchy of logical devices, for example
implementing mirroring at a first layer and striping at a second
layer below the first. In one embodiment, the host may be
configured to access the logical volumes directly (i.e., without
using the metadata) subsequent to an initial phase of the boot
process. For example, during a later phase of the boot process, a
volume manager or other virtualization driver may be activated at
the host. The volume manager or virtualization driver may be
configured to obtain configuration information for the logical
volumes (such as volume layouts), e.g., from the off-host
virtualizer or some other volume configuration server, to allow
direct access.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] FIG. 1 is a block diagram illustrating one embodiment of a
computer system.
[0012] FIG. 2 is a block diagram illustrating one embodiment of a
system where an off-host virtualizer is configured to present one
or more logical volumes as a bootable target device for use by host
during a boot operation.
[0013] FIG. 3a is a block diagram illustrating the mapping of
blocks within a logical volume to a virtual LUN according to one
embodiment.
[0014] FIG. 3b is a block diagram illustrating an example of a
virtual LUN including a plurality of partitions, where each
partition is mapped to a volume, according to one embodiment.
[0015] FIG. 4 is a flow diagram illustrating aspects of the
operation of a system configured to support off-host virtualization
and emulation of a bootable target device, according to one
embodiment.
[0016] FIG. 5 is a block diagram illustrating a logical volume
comprising a multi-layer hierarchy of virtual block devices
according to one embodiment.
[0017] FIG. 6 is a block diagram illustrating an embodiment where
physical storage devices include fibre channel LUNs accessible
through a fibre channel fabric, and an off-host virtualizer
includes a virtualizing switch.
[0018] FIG. 7 is a block diagram illustrating one embodiment where
the Internet SCSI (iSCSI) protocol is used to access the physical
storage devices.
[0019] FIG. 8 is a block diagram illustrating an embodiment where
physical storage devices may be accessible via storage servers
configured to communicate with an off-host virtualizer and a host
using an advanced storage protocol.
[0020] FIG. 9 is a block diagram illustrating an embodiment where
some physical storage devices may be accessible via a target-mode
host bus adapter.
[0021] FIG. 10 is a block diagram illustrating a computer
accessible medium according to one embodiment.
[0022] While the invention is susceptible to various modifications
and alternative forms, specific embodiments are shown by way of
example in the drawings and are herein described in detail. It
should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but, on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
DETAILED DESCRIPTION
[0023] FIG. 1 illustrates a computer system 100 according to one
embodiment. In the illustrated embodiment, system 100 includes a
host 101 and a bootable target device 120. The host 101 includes a
processor 110 and a memory 112 containing boot code 114. Boot code
114 may be configured to read operating-system specific boot
metadata 122 at a known location or offset within bootable target
device 120, and to use boot metadata 122 to access one or more
partitions 130 (e.g., a partition from among partitions 130A, 130B,
. . . , 130N) of bootable target device 120 in order to bring up or
boot host 101. Partitions 130 may be referred to herein as boot
partitions, and may contain additional boot code that may be loaded
into memory 112 during the boot process.
[0024] The process of booting a host 101 may include several
distinct phases. In a first phase, for example, the host 101 may be
powered on or reset, and may then perform a series of "power on
self test (POST)" operations to test the status of various
constituent hardware elements, such as processor 110, memory 112,
peripheral devices such as a mouse and/or a keyboard, and storage
devices including bootable target device 120. In general, memory
112 may comprise a number of different memory modules, such as a
programmable read only memory (PROM) module containing boot code
114 for early stages of boot, as well as a larger random access
memory for use during later stages of boot and during post-boot or
normal operation of host 101. One or more memory caches associated with processor 110 may also be tested during POST operations. In
traditional systems, bootable target device 120 may typically be a
locally attached physical storage device such as a disk, or in some
cases a removable physical storage device such as a CD-ROM. In
systems employing the Small Computer System Interface (SCSI)
protocol to access storage devices, for example, the bootable
target device may be associated with a SCSI "logical unit"
identified by a logical unit number or LUN. (The term LUN may be
used herein to refer to both the identifier for a SCSI target device and the SCSI target device itself.) During POST, one or more SCSI buses attached to the host may be probed, and SCSI LUNs accessible via the SCSI buses may be identified.
[0025] In some operating systems, a user such as a system
administrator may be allowed to select a bootable target device
from among several choices as a preliminary step during boot,
and/or to set a particular target as the device from which the next
boot should be performed. If the POST operations complete
successfully, boot code 114 may proceed to access the designated
bootable target device 120. That is, boot code 114 may read the
operating system-specific boot metadata 122 from a known location
in bootable target device 120. The specific location and format of
boot-related metadata may vary from system to system; for example,
in many operating systems, boot metadata 122 is stored in the first
few blocks of bootable target device 120.
[0026] Operating system specific boot metadata 122 may include the
location or offsets of one or more partitions (e.g., in the form of
a partition table), such as partitions 130A-130N (which may be
generically referred to herein as partitions 130), to which access
may be required during subsequent phases of the boot process. In
some environments the boot metadata 122 may also include one or
more software modules, such as a file system reader, that may be
required to access one or more partitions 130. The file system
reader may then be read into memory at the host 101 (such as memory
112), and used to load one or more additional or secondary boot
programs (i.e., additional boot code) from a partition 130. The
additional or secondary boot programs may then be loaded and
executed, resulting for example in an initialization of an
operating system kernel, followed by an execution of one or more
scripts in a prescribed sequence, ultimately leading to the host
reaching a desired "run level" or mode of operation. Various
background processes (such as network daemon processes in operating
systems derived from UNIX, volume managers, etc.) and designated
application processes (e.g., a web server or a database management
server configured to restart automatically upon reboot) may also be
started up during later boot phases. When the desired mode of
operation is reached, host 101 may allow a user to log in and begin
desired user-initiated operations, or may begin providing a set of
preconfigured services (such as web server or database server
functionality). The exact nature and sequence of operations
performed during boot may vary from one operating system to
another.
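To make the role of the partition-table portion of boot metadata 122 concrete, the following is a minimal Python sketch of reading a toy partition table from the first block of a bootable target device. The on-disk layout (block size, entry format, entry count) and the read_blocks method are illustrative assumptions, not the format used by any particular operating system.

    import struct

    BLOCK_SIZE = 512          # assumed block size
    PARTITION_ENTRIES = 4     # assumed number of slots in the toy partition table
    ENTRY_FORMAT = "<II"      # each entry: (start_block, block_count), little-endian

    def read_partition_table(device):
        """Read a toy partition table from block 0 of a block device.

        `device` is assumed to expose read_blocks(start, count) -> bytes.
        Returns (start_block, block_count) tuples for non-empty entries.
        """
        raw = device.read_blocks(0, 1)
        entry_size = struct.calcsize(ENTRY_FORMAT)
        partitions = []
        for i in range(PARTITION_ENTRIES):
            start, count = struct.unpack_from(ENTRY_FORMAT, raw, i * entry_size)
            if count > 0:
                partitions.append((start, count))
        return partitions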
[0027] If host 101 is a newly provisioned host without an installed
operating system, or if host 101 is being reinstalled or upgraded
with a new version of its operating system, the boot process may be
followed by installation of desired portions of the operating
system. For example, the boot process may end with a prompt being
displayed to the user or administrator, allowing the user to
specify a device from which operating system modules may be
installed, and to select from among optional operating system
modules. In some environments, the installation of operating system
components on a newly provisioned host may be automated--e.g., one
or more scripts run during (or at the end of) the boot process may
initiate installation of desired operating system components from a
specified device.
[0028] As noted above, computer hosts 101 have traditionally been configured to boot off a local disk (i.e., a disk attached to the host) or local removable media. For example, hosts
configured to use a UNIX.TM.-based operating system may be
configured to boot off a "root" file system on a local disk, while
hosts configured with a version of the Windows.TM. operating system
from Microsoft Corporation may be configured to boot off a "system
partition" on a local disk. However, in some storage environments
it may be possible to configure a host 101 to boot off a virtual
bootable target device, that is, a device that has been aggregated
from one or more backing physical storage devices by a virtualizer
or virtualization coordinator, where the backing physical storage
may be accessible via a network instead of being locally accessible
at the host 101. The file systems and/or partitions expected by the
operating system at the host may be emulated as being resident in
the virtual bootable target device. FIG. 2 is a block diagram
illustrating one embodiment of a system 200, where an off-host
virtualizer 210 is configured to present one or more logical
volumes 240 as a bootable target device 250 for use by host 101
during a boot operation. Off-host virtualizer 210 may be coupled to
host 101 and to one or more physical storage devices 220 (i.e.,
physical storage devices 220A-220N) over a network 260. As
described below in further detail, network 260 may be implemented
using a variety of physical interconnects and protocols, and in
some embodiments may include a plurality of independently
configured networks, such as fibre channel fabrics and/or IP-based
networks.
[0029] In general, virtualization refers to a process of creating
or aggregating logical or virtual devices out of one or more
underlying physical or logical devices, and making the virtual
devices accessible to device consumers for storage operations. The
entity or entities that perform the desired virtualization may be
termed virtualizers. Virtualizers may be incorporated within hosts
(e.g., in one or more software layers within host 101) or at
external devices such as one or more virtualization switches,
virtualization appliances, etc., which may be termed off-host
virtualizers. In FIG. 2, for example, off-host virtualizer 210 may
be configured to aggregate storage from physical storage devices
220 (i.e., physical storage devices 220A-220N) into logical
volumes 240 (i.e., logical volumes 240A-240M). In the illustrated
embodiment, each physical storage device 220 and logical storage
device 240 may be configured as a block device, i.e., a device that
provides a collection of linearly addressed data blocks that can be
read or written. In such an embodiment, off-host virtualizer 210
may be said to perform block virtualization. A variety of advanced
storage functions may be supported by a block virtualizer such as
off-host virtualizer 210 in different embodiments, such as the
ability to create snapshots or point-in-time copies, replicas, and
the like. In one embodiment of block virtualization, one or more
layers of software may rearrange blocks from one or more physical
block devices, such as disks. The resulting rearranged collection
of blocks may then be presented to a storage consumer, such as an
application or a file system at host 101, as one or more aggregated
devices with the appearance of one or more basic disk drives. That
is, the more complex structure resulting from rearranging blocks
and adding functionality may be presented as if it were one or more
simple arrays of blocks, or logical block devices. In some
embodiments, multiple layers of virtualization may be implemented.
That is, one or more block devices may be mapped into a particular
virtualized block device, which may be in turn mapped into still
another virtualized block device, allowing complex storage
functions to be implemented with simple block devices. Further
details on block virtualization, and advanced storage features
supported by block virtualization, are provided below.
[0030] In addition to aggregating storage into logical volumes,
off-host virtualizer 210 may also be configured to emulate storage
within one or more logical volumes 240 as a bootable target device
250. That is, off-host virtualizer 210 may be configured to
generate operating system-specific boot metadata 122 to make a
range of storage within the one or more logical volumes 240 appear
as a bootable partition (e.g., a partition 130) and/or file system
to host 101. The generation and presentation of operating system
specific metadata, such as boot metadata 122, for the purpose of
making a logical volume appear as an addressable storage device
(e.g., a LUN) to a host may be termed "volume tunneling". The
virtual addressable storage device presented to the host using such
a technique may be termed a "virtual LUN". Volume tunneling may be
employed for other purposes in addition to the emulation of
bootable target devices, e.g., to support dynamic mappings of
logical volumes to virtual LUNs, to provide an isolating layer
between front-end virtual LUNs and back-end or physical LUNs,
etc.
[0031] FIG. 3a is a block diagram illustrating the mapping of
blocks within a logical volume to a virtual LUN according to one
embodiment. In the illustrated embodiment, a source logical volume
305 comprising N blocks of data (numbered from 0 through (N-1)) may
be encapsulated or tunneled through a virtual LUN 310 comprising
(N+H) blocks. Off-host virtualizer 210 may be configured to
logically insert operating system specific boot metadata in a
header 315 comprising the first H blocks of the virtual LUN 310,
and the remaining N blocks of virtual LUN 310 may map to the N
blocks of source logical volume 305. A host 101 may be configured
to boot off virtual LUN 310, for example by setting the boot target
device for the host to the identifier of the virtual LUN 310.
Metadata contained in header 315 may be set up to match the format
and content expected by boot code 114 at a LUN header of a bootable
device for a desired operating system, and the contents of logical
volume 305 may include, for example, the contents expected by boot
code 114 in one or more partitions 130. In some embodiments, the
metadata and/or the contents of the logical volume may be
customized for the particular host being booted: for example, some
of the file system contents or scripts accessed by the host 101
during various boot phases may be modified to support requirements
specific to the particular host 101. Examples of such customization
may include configuration parameters for hardware devices at the
host (e.g., if a particular host employs multiple Ethernet network
cards, some of the networking-related scripts may be modified),
customized file systems, or customized file system sizes. In
general, the generated metadata required for volume tunneling may
be located at a variety of different offsets within the logical
volume address space, such as within a header 315, a trailer, at
some other designated offset within the virtual LUN 310, or at a
combination of locations within the virtual LUN 310. The number of
data blocks dedicated to operating system specific metadata (e.g.,
the length of header 315), as well as the format and content of the
metadata, may vary with the operating system in use at host
101.
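A minimal Python sketch of the FIG. 3a mapping follows: the first H blocks of the virtual LUN return emulated header metadata, and the remaining N blocks are redirected to the source logical volume. The class and method names are illustrative assumptions rather than any product interface.

    class TunneledVirtualLUN:
        """Toy model of a virtual LUN of (N + H) blocks encapsulating an N-block volume."""

        def __init__(self, header_blocks, source_volume):
            # header_blocks: list of H byte strings holding the emulated metadata
            # source_volume: assumed to expose read_block(block_no) -> bytes
            self.header_blocks = header_blocks
            self.source_volume = source_volume

        def read_block(self, lun_block_no):
            h = len(self.header_blocks)
            if lun_block_no < h:
                # Blocks 0..H-1: emulated, operating-system-specific metadata (header 315)
                return self.header_blocks[lun_block_no]
            # Blocks H..H+N-1: pass through to blocks 0..N-1 of source volume 305
            return self.source_volume.read_block(lun_block_no - h)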
[0032] The metadata inserted within virtual LUN 310 may be stored
in persistent storage, e.g., within some blocks of a physical
storage device 220 or at off-host virtualizer 210, in some
embodiments, and logically concatenated with the mapped blocks 320.
In other embodiments, the metadata may be generated on the fly,
whenever a host 101 accesses the virtual LUN 310. In some
embodiments, the metadata may be generated by an external agent
other than off-host virtualizer 210. The external agent may be
capable of emulating metadata in a variety of formats for different
operating systems, including operating systems that may not have
been known when the off-host virtualizer 210 was deployed. In one
embodiment, off-host virtualizer 210 may be configured to support
more than one operating system; i.e., off-host virtualizer 210 may
logically insert metadata blocks corresponding to any one of a
number of different operating systems when presenting virtual LUN
310 to a host 101, thereby allowing hosts intended to use different
operating systems to share virtual LUN 310. In some embodiments, a
plurality of virtual LUNs emulating bootable target devices, each
corresponding to a different operating system, may be set up in
advance, and off-host virtualizer 210 may be configured to select a
particular virtual LUN for presentation to a host for booting. In
large data centers, a set of relatively inexpensive servers (which
may be termed "boot servers") may be designated to serve as a pool
of off-host virtualizers dedicated to provide emulated bootable
target devices for use as needed throughout the data center.
Whenever a newly provisioned host in the data center needs to be
booted and/or installed, a bootable target device presented by one
of the boot servers may be used, thus supporting consistent
configurations at the hosts of the data center as the data center
grows.
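Since the location, format and contents of the emulated metadata differ from one operating system to another, an off-host virtualizer supporting several operating systems might keep one metadata generator per operating system. The sketch below illustrates that idea only; the operating-system names, generator function and header contents are placeholders.

    def _generic_header(volume_size_blocks):
        # Placeholder: a real generator would emit the exact on-disk label or
        # partition table expected by the target operating system's boot code.
        return [b"\x00" * 512]

    METADATA_GENERATORS = {
        "example-os-a": _generic_header,
        "example-os-b": _generic_header,
    }

    def metadata_for(os_name, volume_size_blocks):
        try:
            return METADATA_GENERATORS[os_name](volume_size_blocks)
        except KeyError:
            raise ValueError("no emulated boot metadata generator registered for "
                             + repr(os_name))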
[0033] For some operating systems, off-host virtualizer 210 may
emulate a number of different boot-related volumes using a
plurality of partitions within the virtual LUN 310. FIG. 3b is a
block diagram illustrating an exemplary virtual LUN 310 according
to one embodiment, where the virtual LUN includes three emulated
partitions 341A-341C. An off-host virtualizer 210 (not shown in
FIG. 3b) may be configured to present virtual LUN 310 to a host bus
adapter 330 and/or disk driver 325 at host 101. Each partition 341
may be mapped to a respective volume 345 that may be accessed
during boot and/or operating system installation. In the depicted
example, partitions corresponding to three volumes 345A-345C used
respectively for a "/" (root) file system, a "/usr" file system and
a "/swap" file system, each of which may be accessed by a host 101
employing a UNIX-based operating system, are shown. In such
embodiments, where multiple volumes and/or file systems are
emulated within the same virtual LUN, additional operating system
specific metadata identifying the address ranges within the virtual
LUN where the corresponding partitions are located may be provided
by off-host virtualizer 210 to host 101. In the example depicted in
FIG. 3b, the address ranges for partitions 341A-341C are provided
in a virtual table of contents (VTOC) structure 340. The additional
metadata may be included with boot metadata 122 in some
embodiments. In other embodiments, the additional metadata may be
provided at some other location within the address space of the
virtual LUN, or provided to the host 101 using another mechanism,
such as extended SCSI mode pages or messages sent over a network
from off-host virtualizer 210 to host 101. In some embodiments, the
additional metadata may also be customized to suit the specific
requirements of a particular host 101; e.g., not all hosts may
require the same modules of an operating system to be installed
and/or upgraded.
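The multi-partition layout of FIG. 3b can be illustrated with a short Python sketch that lays emulated partitions out back to back and records their address ranges in a virtual table of contents. The field names, sizes, and volume identifiers below are made-up examples, not the on-disk VTOC format of any operating system.

    from dataclasses import dataclass

    @dataclass
    class EmulatedPartition:
        name: str            # e.g. "/", "/usr", "/swap"
        start_block: int     # offset of the partition within the virtual LUN
        size_blocks: int     # length of the partition
        backing_volume: str  # logical volume mapped to the partition

    def build_vtoc(partition_specs, first_data_block=16):
        """Lay out partitions contiguously after a reserved metadata area.

        partition_specs: ordered mapping of {name: (size_blocks, backing_volume)}.
        Returns the list of EmulatedPartition entries forming the toy VTOC.
        """
        vtoc, next_block = [], first_data_block
        for name, (size, volume) in partition_specs.items():
            vtoc.append(EmulatedPartition(name, next_block, size, volume))
            next_block += size
        return vtoc

    # Example in the spirit of FIG. 3b (all sizes are invented):
    vtoc_340 = build_vtoc({"/": (2_000_000, "volume-345A"),
                           "/usr": (4_000_000, "volume-345B"),
                           "/swap": (1_000_000, "volume-345C")})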
[0034] As noted above and illustrated in FIG. 3b, in some
embodiments, off-host virtualizer 210 may be configured to present
an emulated logical volume 240 as an installable partition or
volume to host 101--i.e., a partition or volume to which at least a
portion of an operating system may be installed. The host 101 may
be configured to boot installation software (e.g., off removable
media such as a CD provided by the operating system vendor), and
then install desired portions of the operating system onto the
installable partition or volume. After the desired installation is
completed, in some embodiments the host 101 may be configured to
boot from the LUN containing the encapsulated volume.
[0035] FIG. 4 is a flow diagram illustrating aspects of the
operation of a system (such as system 200) supporting off-host
virtualization and emulation of a bootable target device, according
to one embodiment. Off-host virtualizer 210 may be configured to
aggregate storage within physical storage devices 220 into one or
more logical volumes 240 (block 405 of FIG. 4). The logical volumes
240 may be configured to implement a number of different
virtualization functions, such as snapshots or replication.
Off-host virtualizer 210 may then emulate the logical volumes as a
bootable target device 250 (block 415), for example by logically inserting operating system-specific boot metadata (e.g., in header 315) into a virtual LUN 310 as described above. In some embodiments, as noted
above, a subset of the blocks of the logical volumes and/or the
metadata may be modified to provide data specific to the host being
booted (e.g., a customized boot process may be supported). The
emulated bootable target device may be made accessible to a host
101 (block 425), e.g., by setting the host's target bootable device
address to the address of the virtual LUN 310. The host 101 may
then boot off the emulated bootable target device (block 435), for
example, off a file system or partition resident in the logical
volume (such as a "root" file system in the case of hosts employing
UNIX-based operating systems, or a "system partition" in the case
of Windows operating systems). That is, the virtualizer may emulate
the particular file system or partition expected for booting by the
host as being resident in the logical volume in such
embodiments.
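The flow of FIG. 4 may be summarized in pseudocode-style Python; every method called here (aggregate_storage, emulate_bootable_target, export_to_host, set_boot_device, reboot) is a named placeholder for the corresponding step in the figure, not an actual interface.

    def provision_and_boot(virtualizer, physical_devices, host, os_name):
        # Block 405: aggregate physical storage into one or more logical volumes
        volume = virtualizer.aggregate_storage(physical_devices)
        # Block 415: generate OS-specific metadata and emulate a bootable target
        target = virtualizer.emulate_bootable_target(volume, os_name)
        # Block 425: make the emulated bootable target accessible to the host
        virtualizer.export_to_host(target, host)
        # Block 435: the host boots off a file system resident in the logical volume
        host.set_boot_device(target.address)
        host.reboot()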
[0036] As noted earlier, the boot process at host 101 may include
several phases. During each successive phase, additional modules of
the host's operating system and/or additional software modules may
be activated, and various system processes and services may be
started. During one such phase, in some embodiments a
virtualization driver or volume manager capable of recognizing and
interacting with logical volumes may be activated at host 101. In
such embodiments, after the virtualization driver or volume manager
is activated, it may be possible for the host to switch to direct
interaction with the logical volumes 240 (block 455 of FIG. 4),
e.g., over network 260, instead of performing I/O to the logical
volumes through the off-host virtualizer 210. Direct interaction
with the logical volumes 240 may support higher levels of
performance than indirect interaction via off-host virtualizer 210,
especially in embodiments where off-host virtualizer 210 has
limited processing capabilities. In order to facilitate a
transition to direct access, off-host virtualizer 210 or some other
volume configuration server may be configured to provide
configuration information (such as volume layouts) related to the
logical volumes 240 to the virtualization driver or volume manager.
Once the transition to direct access occurs, the emulated bootable
target device 250 and the off-host virtualizer 210 may no longer be
used by host 101 until the next time host 101 is rebooted. During
the next reboot, host 101 may switch back to accessing logical
volumes 240 via the emulated bootable target device 250. In later
boot phases, when the virtualization driver or volume manager is
activated, direct access to the logical volumes may be resumed.
Such an ability to transition to direct access to logical volumes
240 may allow off-host virtualizers 210 to be implemented using
relatively low-end processors, since off-host virtualizers may be
utilized heavily only during boot-related operations in system 200,
and boot-related operations may be rare relative to production
application processing operations.
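The late-boot transition to direct access might look like the following sketch: a volume manager on the host obtains volume layouts from the off-host virtualizer (or another volume configuration server) and then routes subsequent I/O directly to the backing storage. All object and method names here are assumptions made for illustration.

    def switch_to_direct_access(volume_manager, config_server, volume_names):
        for name in volume_names:
            # e.g. striping/mirroring layout, column widths, device paths
            layout = config_server.get_volume_layout(name)
            volume_manager.import_volume(name, layout)
        # From this point on, reads and writes go straight to the backing devices;
        # the emulated bootable target device is not used again until the next reboot.
        volume_manager.set_io_path("direct")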
[0037] As noted previously, a number of different virtualization
functions may be implemented at a logical volume 240 by off-host
virtualizer 210 in different embodiments. In one embodiment, a
logical volume 240 may be aggregated from storage from multiple
physical storage devices 220, e.g., by striping successive blocks
of data across multiple physical storage devices, by spanning
multiple physical storage devices (i.e., concatenating physical
storage from multiple physical storage devices into the logical
volume), or by mirroring data blocks at two or more physical
storage devices. In another embodiment, a logical volume 240 that
is used by off-host virtualizer 210 to emulate a bootable target
device 250 may be a replicated volume. For example, the logical
volume 240 may be a replica or copy of a source logical volume that
may be maintained at a remote data center. Such a technique of
replicating bootable volumes may be useful for a variety of
purposes, such as to support off-site backup or to support
consistency of booting and/or installation in distributed
enterprises where hosts at a number of different geographical
locations may be required to be set up with similar configurations.
In some embodiments, a logical volume 240 may be a snapshot volume,
such as an instant snapshot or a space-efficient snapshot, i.e., a
point-in-time copy of some source logical volume. Using snapshot
volumes to boot and/or install systems may support the ability to
revert a host back to any desired previous configuration from among
a set of configurations for which snapshots have been created.
Support for automatic roll back (e.g., to a desired point in time)
on boot may also be implemented in some embodiments. In one
embodiment, a logical volume 240 used to emulate a bootable target
device may be configured as a virtual RAID ("Redundant Array of
Independent Disks") device or RAID volume, where parity based
redundancy computations are implemented to provide high
availability. Physical storage from a plurality of storage servers
may be aggregated to form the RAID volume, and the redundancy
computations may be implemented via a software protocol. A bootable
target device emulated from a RAID volume may be recoverable in the
event of a failure at one of its backing storage servers, thus
enhancing the availability of boot functionality supported by the
off-host virtualizer 210. A number of different RAID levels (e.g.,
RAID-3, RAID-4, or RAID-5) may be implemented in the RAID
volume.
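The parity-based redundancy mentioned above can be illustrated with a minimal XOR sketch: the parity block of a stripe is the XOR of its data blocks, so any single lost block can be rebuilt from the survivors. This shows the concept only and is not the block layout of any particular RAID level.

    def parity_block(data_blocks):
        """Return the XOR parity of equal-length data blocks."""
        parity = bytearray(len(data_blocks[0]))
        for block in data_blocks:
            for i, byte in enumerate(block):
                parity[i] ^= byte
        return bytes(parity)

    def rebuild_missing_block(surviving_blocks, parity):
        """Recover a single lost block: XOR of the parity with all surviving blocks."""
        return parity_block(surviving_blocks + [parity])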
[0038] In some embodiments, a logical volume 240 may include
multiple layers of virtual storage devices. FIG. 5 is a block
diagram illustrating a logical volume 240 comprising a multi-layer
hierarchy of virtual block devices according to one embodiment. In
the illustrated embodiment, logical volume 240 includes logical
block devices 504 and 506. In turn, logical block device 504
includes logical block devices 508 and 510, while logical block
device 506 includes logical block device 512. Logical block devices
508, 510, and 512 map to physical block devices 220A-C of FIG. 2,
respectively.
[0039] After host 101 has booted, logical volume 240 may be
configured to be mounted within a file system or presented to an
application or other volume consumer. Each block device within
logical volume 240 that maps to or includes another block device
may include an interface whereby the mapping or including block
device may interact with the mapped or included device. For
example, this interface may be a software interface whereby data
and commands for block read and write operations are propagated from
lower levels of the virtualization hierarchy to higher levels and
vice versa.
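The layering interface described above can be sketched as follows: each device in the hierarchy exposes the same read/write interface and propagates operations to the device(s) beneath it. The class names are informal stand-ins for the elements of FIG. 5, and the in-memory leaf device is purely a toy.

    class LeafDevice:
        """Stands in for a physical block device such as 220A-C (in-memory toy)."""
        def __init__(self, num_blocks, block_size=512):
            self.blocks = [bytes(block_size)] * num_blocks
        def read(self, block_no):
            return self.blocks[block_no]
        def write(self, block_no, data):
            self.blocks[block_no] = data

    class LayeredDevice:
        """A virtual block device mapped onto one subordinate device; reads and
        writes are propagated to the device below."""
        def __init__(self, below):
            self.below = below
        def read(self, block_no):
            return self.below.read(block_no)
        def write(self, block_no, data):
            self.below.write(block_no, data)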
[0040] Additionally, a given block device may be configured to map
the logical block spaces of subordinate block devices into its
logical block space in various ways in order to realize a
particular virtualization function. For example, in one embodiment,
logical volume 240 may be configured as a mirrored volume, in which
a given data block written to logical volume 240 is duplicated, and
each of the multiple copies of the duplicated given data block are
stored in respective block devices. In one such embodiment, logical
volume 240 may be configured to receive an operation to write a
data block from a consumer, such as an application running on host
101. Logical volume 240 may duplicate the write operation and issue
the write operation to both logical block devices 504 and 506, such
that the block is written to both devices. In this context, logical
block devices 504 and 506 may be referred to as mirror devices. In
various embodiments, logical volume 240 may read a given data block
stored in duplicate in logical block devices 504 and 506 by issuing
a read operation to one mirror device or the other, for example by
alternating devices or defaulting to a particular device.
Alternatively, logical volume 240 may issue a read operation to
multiple mirror devices and accept results from the fastest
responder.
[0041] In some embodiments, it may be the case that underlying
physical block devices 220A-C have dissimilar performance
characteristics; specifically, devices 220A-B may be slower than
device 220C. In order to balance the performance of the mirror
devices, in one embodiment, logical block device 504 may be
implemented as a striped device in which data is distributed
between logical block devices 508 and 510. For example, even- and
odd-numbered blocks of logical block device 504 may be mapped to
logical block devices 508 and 510 respectively, each of which may
be configured to map in turn to all or some portion of physical
block devices 220A-B respectively. In such an embodiment, block
read/write throughput may be increased over a non-striped
configuration, as logical block device 504 may be able to read or
write two blocks concurrently instead of one. Numerous striping
arrangements involving various distributions of blocks to logical
block devices are possible and contemplated; such arrangements may
be chosen to optimize for various data usage patterns such as
predominantly sequential or random usage patterns. In another
aspect illustrating multiple layers of block virtualization, in one
embodiment physical block device 220C may employ a different block
size than logical block device 506. In such an embodiment, logical
block device 512 may be configured to translate between the two
physical block sizes and to map the logical block space defined by
logical block device 506 to the physical block space defined by
physical block device 220C.
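Continuing the same toy read/write interface, the mirrored-over-striped arrangement described for FIG. 5 might be sketched as below. The even/odd block distribution follows the striping example in the text; everything else is an illustrative assumption.

    class StripedDevice:
        """Maps even-numbered blocks to one device and odd-numbered blocks to another."""
        def __init__(self, even_dev, odd_dev):
            self.even_dev, self.odd_dev = even_dev, odd_dev
        def _route(self, block_no):
            dev = self.even_dev if block_no % 2 == 0 else self.odd_dev
            return dev, block_no // 2
        def read(self, block_no):
            dev, local = self._route(block_no)
            return dev.read(local)
        def write(self, block_no, data):
            dev, local = self._route(block_no)
            dev.write(local, data)

    class MirroredDevice:
        """Duplicates every write to both mirrors; alternates reads between them."""
        def __init__(self, mirror_a, mirror_b):
            self.mirrors = [mirror_a, mirror_b]
            self._next = 0
        def write(self, block_no, data):
            for mirror in self.mirrors:
                mirror.write(block_no, data)
        def read(self, block_no):
            mirror = self.mirrors[self._next]
            self._next = (self._next + 1) % len(self.mirrors)
            return mirror.read(block_no)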
[0042] The technique of volume tunneling to emulate a bootable
target device may be implemented using a variety of different
storage and network configurations in different embodiments. FIG. 6
is a block diagram illustrating an embodiment where the physical
storage devices include fibre channel LUNs 610 accessible through a
fibre channel fabric 620, and off-host virtualizer 210 includes a
virtualizing switch. A "fibre channel LUN", as used herein, may be
defined as a unit of storage addressable using a fibre channel
address. For example, a fibre channel address for storage
accessible via a fibre channel fabric may consist of a fabric
identifier, a port identifier, and a logical unit identifier. The
virtual LUN presented by the off-host virtualizer to host 101 as a
bootable target device 250 in such an embodiment may be a virtual
fibre channel LUN. Fibre channel fabric 620 may include additional
switches in some embodiments, and host 101 may be coupled to more
than one switch. Some of the additional switches may also be
configured to provide virtualization functions. That is, in some
embodiments off-host virtualizer 210 may include a plurality of
cooperating virtualizing switches. In one embodiment, multiple
independently-configurable fibre channel fabrics may be employed:
e.g., a first set of fibre channel LUNs 610 may be accessible
through a first fabric, and a second set of fibre channel LUNs 610
may be accessible through a second fabric.
[0043] FIG. 7 is a block diagram illustrating one embodiment where
the Internet SCSI (iSCSI) protocol is used to access the physical
storage devices. iSCSI is a protocol used by storage initiators
(such as hosts 101 and/or off-host virtualizers 210) to send SCSI
storage commands to storage targets (such as disks or tape devices)
over an IP (Internet Protocol) network. The physical storage
devices accessible in an iSCSI-based storage network may be
addressable as iSCSI LUNs, just as SCSI devices locally attached to
a host may be addressable as SCSI LUNs, and physical storage
devices attached via fibre channel fabrics may be addressable as
fibre channel LUNs. In one embodiment, for example, an iSCSI
address may include an IP address or iSCSI qualified name (iqn), a
target device identifier, and a logical unit number. As shown in
FIG. 7, one or more iSCSI LUNs 710 may be attached directly to the
off-host virtualizer 210. For example, in one embodiment, the
off-host virtualizer 210 may itself be a computer system,
comprising its own processor, memory and physical storage devices
(e.g., iSCSI LUN 710A). The remaining iSCSI LUNs 710B-710N may be
accessible through other hosts or through iSCSI servers. In some
embodiments, all the physical storage devices may be attached
directly to the off-host virtualizer 210 and may be accessible via
iSCSI. In general, a host 101 may require an iSCSI-enabled network
adapter to participate in the iSCSI protocol. In some embodiments
where the physical storage devices include iSCSI LUNs, a network
boot protocol similar to BOOTP (a protocol that is typically used
to allow diskless hosts to boot using boot code provided by a boot
server) may be used to support a first phase boot of a host 101
that does not have an iSCSI-enabled adapter. Additional boot code
loaded during the first phase may allow the host to mount a file
system over iSCSI, and/or to perform further boot phases, despite
the absence of an iSCSI-enabled network card. That is, software
provided to the host 101 during an early boot phase (e.g., by
off-host virtualizer 210) may be used later in the boot process to
emulate iSCSI transactions without utilizing an iSCSI-enabled
network adapter at the host.
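The two address forms described in the preceding paragraphs (a fibre channel address built from a fabric identifier, port identifier and logical unit identifier, and an iSCSI address built from an IP address or iSCSI qualified name, a target identifier and a logical unit number) can be modeled as simple records. The field names and example values below are illustrative, not any standard's wire format.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class FibreChannelAddress:
        fabric_id: str
        port_id: str
        lun: int

    @dataclass(frozen=True)
    class ISCSIAddress:
        target_name: str   # IP address or iSCSI qualified name (iqn)
        target_id: int
        lun: int

    # Hypothetical boot targets addressed over each network type:
    fc_boot_target = FibreChannelAddress(fabric_id="fab-1", port_id="0x0a1b2c", lun=3)
    iscsi_boot_target = ISCSIAddress(target_name="iqn.2004-11.example.com:boot",
                                     target_id=0, lun=3)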
[0044] In some embodiments, host 101 may be configured to boot from
an emulated volume using a first network type such as iSCSI, and to
then switch to directly accessing the volume using a second network
type such as fibre channel. iSCSI-based booting may be less
expensive and/or easier to configure than fibre-channel based
booting in some embodiments. An off-host virtualizer 210 that uses
iSCSI (such as an iSCSI boot appliance) and at the same time
accesses fibre-channel based storage devices may allow such a
transition between the network type that is used for booting and
the network type that is used for subsequent I/O (e.g., for I/Os
requested by production applications).
[0045] In one embodiment, illustrated in FIG. 8, physical storage
devices 220 may be accessible via storage servers (e.g., 850A and
850B) configured to communicate with off-host virtualizer 210 and
host 101 using an advanced storage protocol. The advanced storage
protocol may support features, such as access security and tagged directives for distributed I/O operations, that may not be adequately supported by traditional storage protocols (such as
SCSI or iSCSI) alone. In such an embodiment, a storage server 850
may translate data access requests from the advanced storage
protocol to a lower level protocol or interface (such as SCSI) that
may be presented by the physical storage devices 220 managed at the
storage server. While the advanced storage protocol may provide
enhanced functionality, it may still allow block-level access to
physical storage devices 220. Storage servers 850 may be any device
capable of supporting the advanced storage protocol, such as a
computer host with one or more processors and one or more
memories.
[0046] FIG. 9 is a block diagram illustrating an embodiment where
some physical storage devices 220 may be accessible via a
target-mode host bus adapter 902. A host bus adapter (HBA) is a
hardware device that acts as an interface between a host 101 and an
I/O interconnect, such as a SCSI bus or fibre channel link.
Typically, an HBA is configured as an "initiator", i.e., a device
that initiates storage operations on the I/O interconnect, and
receives responses from other devices (termed "targets") such as
disks, disk array devices, or tape devices, coupled to the I/O
interconnect. However, some host bus adapters may be configurable
(e.g., by modifying the firmware on the HBA) to operate as targets
rather than initiators, i.e., to receive commands such as iSCSI
commands sent by initiators requesting storage operations. Such
host bus adapters may be termed "target-mode" host bus adapters,
and may be incorporated within off-host virtualizers 210 as shown
in FIG. 9 in some embodiments. The I/O operations corresponding to
the received commands may be performed at the physical storage
devices, and the response returned to the requesting initiator. In
some embodiments, all the physical storage devices 220 used to back
logical volumes 240 may be accessible via target-mode host bus
adapters.
[0047] As noted above, an off-host virtualizer 210 may comprise a
number of different types of hardware and software entities in
different embodiments. In some embodiments, an off-host virtualizer
210 may itself be a host with its own processor, memory, peripheral
devices and I/O devices, running an operating system and a software
stack capable of providing the block virtualization features
described above. In other embodiments, the off-host virtualizer 210
may include one or more virtualization switches and/or
virtualization appliances. A virtualization switch may be an
intelligent fibre channel switch, configured with sufficient processing capacity to perform desired virtualization operations in addition to supporting fibre channel connectivity. A virtualization
appliance may be an intelligent device programmed to perform
virtualization functions, such as providing mirroring, striping,
snapshot capabilities, etc. Appliances may differ from general
purpose computers in that their software is normally customized for
the function they perform, pre-loaded by the vendor, and not
alterable by the user. In some embodiments, multiple devices or
systems may cooperate to provide off-host virtualization; e.g.,
multiple cooperating virtualization switches may form a single
off-host virtualizer. In one embodiment, the aggregation of storage
within physical storage devices 220 into logical volumes 240 may be
performed by one off-host virtualizing device or host, while
another off-host virtualizing device may be configured to emulate
the logical volumes as bootable target devices and present the
bootable target devices to host 101.
[0048] FIG. 10 is a block diagram illustrating a computer
accessible medium 1000 including virtualization software 1010
configured to provide the functionality of off-host virtualizer 210
and host 101 described above. Virtualization software 1010 may be
provided to a computer system using a variety of
computer-accessible media including electronic media (e.g., flash memory), volatile or non-volatile memory such as RAM (e.g., SDRAM, RDRAM, SRAM, etc.), magnetic media, optical storage media such as CD-ROM, etc., as well as
transmission media or signals such as electrical, electromagnetic
or digital signals, conveyed via a communication medium such as a
network and/or a wireless link.
[0049] Although the embodiments above have been described in
considerable detail, numerous variations and modifications will
become apparent to those skilled in the art once the above
disclosure is fully appreciated. It is intended that the following
claims be interpreted to embrace all such variations and
modifications.
* * * * *