U.S. patent application number 11/128699 was filed with the patent office on 2005-05-12 and published on 2007-01-04 for function-level just-in-time translation engine with multiple pass optimization.
This patent application is currently assigned to Microsoft Corporation. Invention is credited to Victor Tan.
Application Number | 11/128699 |
Publication Number | 20070006178 |
Family ID | 37431763 |
Filed Date | 2005-05-12 |
Publication Date | 2007-01-04 |
United States Patent Application | 20070006178 |
Kind Code | A1 |
Tan; Victor | January 4, 2007 |
Function-level just-in-time translation engine with multiple pass
optimization
Abstract
A JIT binary translator translates code at a function level of
the source code rather than at an opcode level. The JIT binary
translator of the invention grabs an entire x86 function out of the
source stream, rather than an instruction, translates the whole
function into an equivalent function of the target processor, and
executes that function all at once before returning to the source
stream, thereby reducing context switching. Also, since the JIT
binary translator sees the entire source code function context at
once, the software emulator may optimize the code translation. For
example, the JIT binary translator might decide to translate a
sequence of x86 instructions into an efficient PPC equivalent
sequence. Many such optimizations result in a tighter emulated
binary.
Inventors: | Tan; Victor; (Kirkland, WA) |
Correspondence Address: | WOODCOCK WASHBURN LLP (MICROSOFT CORPORATION), ONE LIBERTY PLACE - 46TH FLOOR, PHILADELPHIA, PA 19103, US |
Assignee: | Microsoft Corporation, Redmond, WA |
Family ID: | 37431763 |
Appl. No.: | 11/128699 |
Filed: | May 12, 2005 |
Current U.S. Class: | 717/136 |
Current CPC Class: | G06F 8/52 20130101; G06F 9/45554 20130101; G06F 9/45516 20130101 |
Class at Publication: | 717/136 |
International Class: | G06F 9/45 20060101 G06F009/45 |
Claims
1. A method of translating computer executable code of a first CPU
type to computer executable code of a second CPU type, comprising:
parsing a stream of said computer executable code of said first CPU
type to identify a sequence of CPU code instructions in said stream
of said computer executable code of said first CPU type that
corresponds to a function in said computer executable code of said
first CPU type; and generating a sequence of said executable code
of said second CPU type from said sequence of CPU code instructions
in said stream corresponding to said function.
2. A method as in claim 1, wherein said first CPU type is x86 and
said second CPU type is PowerPC.
3. A method as in claim 1, wherein said parsing step comprises the
step of instructing a compiler to create a list of instructions of
said first CPU type starting at the beginning of a function within
said stream of said computer executable code of said first CPU type
and ending said list of instructions of said first CPU type at a
point in the stream of said computer executable code of said first
CPU type when an end of function instruction is reached and there
are no outstanding condition branches in said list of instructions
of said first CPU type.
4. A method as in claim 3, comprising the further steps of
analyzing said list of instructions to find optimizations and
implementing said optimizations prior to said generating step.
5. A method as in claim 4, comprising the further steps of
analyzing said generated sequence of executable code of said second
CPU type to find optimizations and implementing said
optimizations.
6. A method as in claim 3, comprising the further steps of
compiling and storing said sequence of said executable code of said
second CPU type, and correlating a memory address at which said
compiled sequence is stored with a memory address of said beginning
of said function of said first CPU type.
7. A binary translation system that translates computer executable
code of a first CPU type to computer executable code of a second
CPU type, comprising: a parser that parses a stream of said
computer executable code of said first CPU type to identify a
sequence of CPU code instructions in said stream of said computer
executable code of said first CPU type that corresponds to a
function in said computer executable code of said first CPU type;
and a code generator that generates a sequence of said executable
code of said second CPU type from said sequence of CPU code
instructions in said stream corresponding to said function.
8. A binary translation system as in claim 7, wherein said first
CPU type is x86 and said second CPU type is PowerPC.
9. A binary translation system as in claim 7, wherein said parser
creates a list of instructions of said first CPU type starting at
the beginning of a function within said stream of said computer
executable code of said first CPU type and ends said list of
instructions of said first CPU type at a point in the stream of
said computer executable code of said first CPU type when an end of
function instruction is reached and there are no outstanding
condition branches in said list of instructions of said first CPU
type.
10. A binary translation system as in claim 9, further comprising
an optimizer that analyzes said list of instructions to find
optimizations and implements said optimizations prior to providing
said list of instructions to said code generator.
11. A binary translation system as in claim 10, further comprising
a second optimizer that analyzes said generated sequence of
executable code of said second CPU type to find optimizations and
implements said optimizations.
12. A binary translation system as in claim 9, further comprising a
compiler that compiles and stores said sequence of said executable
code of said second CPU type.
13. A binary translation system as in claim 12, further comprising
a table for storing a memory address at which said compiled
sequence is stored and a memory address of said beginning of said
function of said first CPU type, said table correlating said memory
addresses with each other.
14. A computer readable medium that when inserted into a host
computer system creates a binary translation system that translates
computer executable code of a first CPU type to computer executable
code of a second CPU type, comprising: parser software that parses
a stream of said computer executable code of said first CPU type to
identify a sequence of CPU code instructions in said stream of said
computer executable code of said first CPU type that corresponds to
a function in said computer executable code of said first CPU type;
and code generator software that generates a sequence of said
executable code of said second CPU type from said sequence of CPU
code instructions in said stream corresponding to said
function.
15. A computer readable medium as in claim 14, wherein said first
CPU type is x86 and said second CPU type is PowerPC.
16. A computer readable medium as in claim 14, wherein said parser
software creates a list of instructions of said first CPU type
starting at the beginning of a function within said stream of said
computer executable code of said first CPU type and ends said list
of instructions of said first CPU type at a point in the stream of
said computer executable code of said first CPU type when an end of
function instruction is reached and there are no outstanding
condition branches in said list of instructions of said first CPU
type.
17. A computer readable medium as in claim 16, further comprising
optimizer software that analyzes said list of instructions to find
optimizations and implements said optimizations prior to providing
said list of instructions to said code generator software.
18. A computer readable medium as in claim 17, further comprising
second optimizer software that analyzes said generated sequence of
executable code of said second CPU type to find optimizations and
implements said optimizations.
19. A computer readable medium as in claim 16, further comprising a
compiler that compiles and stores said sequence of said executable
code of said second CPU type.
20. A computer readable medium as in claim 19, further comprising a
table that stores a memory address at which said compiled sequence
is stored and a memory address of said beginning of said function
of said first CPU type, said table correlating said memory
addresses with each other.
Description
FIELD OF THE INVENTION
[0001] The invention is directed to systems and methods for
virtualizing a legacy hardware environment in a host hardware
environment by converting code used by the legacy computer system
into code for execution by the host computer system and, more
particularly, the invention is directed to a just-in-time
translation engine that performs code translations at a function
level rather than at an instruction level and that optimizes the
resulting code by translating sequences of the legacy code
instructions into a corresponding sequence of host code
instructions.
BACKGROUND OF THE INVENTION
[0002] When updating hardware architectures of computer systems
such as game consoles to implement faster, more feature rich
hardware, developers are faced with the issue of backwards
compatibility to the legacy computer system for application
programs or games developed for the legacy computer system
platform. In particular, it is commercially desirable that the
updated hardware architecture support application programs or games
developed for the legacy hardware architecture. However, if the
updated hardware architecture differs substantially, or radically,
from that of the legacy hardware architecture, architectural
differences between the two systems may make it very difficult, or
even impossible, for legacy application programs or games to
operate on the new hardware architecture without substantial
hardware modification and/or software patches. Since customers
generally expect such backwards compatibility, a solution to these
problems is critical to the success of the updated hardware
architecture.
[0003] Recent advances in PC architecture and software emulation
have provided hardware architectures for computers, even game
consoles, that are powerful enough to enable the emulation of
legacy application programs or games in software rather than
hardware. Such software emulators translate the title instructions
for the application program or game on the fly into device
instructions understandable by the new hardware architecture. This
software emulation approach is particularly useful for backwards
compatibility for computer game consoles since the developer of the
game console maintains control over both the hardware and software
platforms and is quite familiar with the legacy games.
[0004] Most such software emulators translate code one CPU
instruction at a time. For example, a software emulator might pull
a single x86 instruction out of the source stream, translate it on
the fly to one or more pre-defined equivalents out of the
instruction set of the target processor (e.g., PowerPC (PPC)),
execute those PPC instructions on the target processor, and then
return to the source stream for the next instruction. This approach
is conceptually simple, but it has drawbacks. For example, this
approach involves many slow context switches back and forth between
the software emulator and the virtual machine (VM) implementing the
legacy application or game system written using the x86 instruction
set. This approach also robs the software emulator of any context
when translating instructions and forces the software emulator to
rely on simple instruction-mapping tables. This is a significant
performance disadvantage, for if the software emulator were able to
consider the instructions in context, then the software emulator
would be able to translate code blocks rather than instruction by
instruction, thereby significantly improving the translation
performance.
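The drawback described in paragraph [0004] can be illustrated with a toy model. The following Python sketch is not taken from the application; the opcode names, the `MAPPING` table, and the switch counter are hypothetical stand-ins, chosen only to show that per-instruction dispatch hands control to the host once per guest instruction, whereas translating a whole sequence first requires a single hand-off.

```python
# Toy model only: opcode names, the mapping table, and the counters are
# hypothetical illustrations, not real x86 or PowerPC encodings.

# Simple instruction-mapping table: one guest opcode -> list of host opcodes.
MAPPING = {
    "g_load": ["h_load"],
    "g_add": ["h_add"],
    "g_store": ["h_store_hi", "h_store_lo"],
}


def run_per_instruction(guest_code):
    """Translate and execute one guest instruction at a time."""
    switches = 0
    executed = []
    for op in guest_code:
        host_ops = MAPPING[op]      # translate a single instruction
        switches += 1               # hand off to the host and come back
        executed.extend(host_ops)   # "execute" the host instructions
    return executed, switches


def run_per_function(guest_code):
    """Translate the whole sequence first, then execute it in one go."""
    host_ops = [h for op in guest_code for h in MAPPING[op]]
    switches = 1                    # a single hand-off for the whole sequence
    return host_ops, switches


guest_function = ["g_load", "g_add", "g_store"]
per_insn_ops, per_insn_switches = run_per_instruction(guest_function)
per_func_ops, per_func_switches = run_per_function(guest_function)
```

Both paths produce the same host instructions in this model; only the number of hand-offs differs, and that number grows with the length of the guest code in the per-instruction case.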
[0005] Accordingly, a technique is desired that improves the
performance of the instruction translation by providing a mechanism
for the instructions that are to be translated to be considered in
context. The present invention addresses this need in the art.
SUMMARY OF THE INVENTION
[0006] The invention addresses the above-mentioned need in the art
by translating code at a function level of the source code rather
than an opcode level. The software emulator of the invention grabs
an entire x86 function out of the source stream, translates the
whole function into an equivalent function of the target processor,
and executes that function all at once before returning to the
source stream. Not only does this technique reduce context
switching, but by seeing the entire x86 function context at once,
the software emulator may optimize the code translation. For
example, the software emulator might decide to translate a sequence
of x86 instructions into an efficient PPC equivalent sequence. Many
such optimizations result in a tighter emulated binary, which is
desirable for any software emulator, and especially for game
emulators that must run code quickly.
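One kind of optimization that becomes possible once a whole function is visible is a peephole rewrite over the instruction list. The Python sketch below is an assumption made for illustration: the tuple representation of instructions and the single push/pop rule are hypothetical, and this section of the application does not enumerate the specific optimizations it performs.

```python
# Illustrative sketch: instructions are (mnemonic, operand, ...) tuples with
# register operands only. The rewrite rule is an assumed example, not one
# taken from the patent application.


def peephole(instructions):
    """Replace 'push X; pop Y' pairs with 'mov Y, X' (or nothing if X == Y)."""
    out = []
    i = 0
    while i < len(instructions):
        cur = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if cur[0] == "push" and nxt is not None and nxt[0] == "pop":
            src, dst = cur[1], nxt[1]
            if src != dst:
                out.append(("mov", dst, src))
            i += 2                  # both instructions consumed
        else:
            out.append(cur)
            i += 1
    return out


function_body = [
    ("push", "eax"),
    ("pop", "ebx"),
    ("add", "ebx", "ecx"),
    ("push", "edx"),
    ("pop", "edx"),
    ("ret",),
]
optimized = peephole(function_body)
```

A real pass would also have to confirm that no branch targets the second instruction of a rewritten pair and that nothing reads the stack slot the push would have written. Those checks are only practical when the translator can see the complete function.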
[0007] Those skilled in the art will appreciate that, while an
exemplary embodiment of the invention is implemented in the Xbox
computer game system available from Microsoft Corporation, any
computer game console or other type of computer system in which
code translation is used could benefit from the function-level code
translation technique of the invention. Additional characteristics
of the invention will be apparent to those skilled in the art based
on the following detailed description.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] The systems and methods for providing function-level
just-in-time code translation with multi-pass optimization in
accordance with the invention are further described with reference
to the accompanying drawings, in which:
[0009] FIG. 1A is a block diagram representing the logical layering
of the hardware and software architecture for an emulated operating
environment in a computer system;
[0010] FIG. 1B is a block diagram representing a virtualized
computing system wherein the emulation is performed by the host
operating system (either directly or via a hypervisor);
[0011] FIG. 1C is a block diagram representing an alternative
virtualized computing system wherein the emulation is performed by
a virtual machine monitor running side-by-side with a host
operating system;
[0012] FIG. 2 illustrates the relationship between the virtual
memory of the legacy game system implemented in a virtual machine
and the virtual memory of the host game system;
[0013] FIG. 3 illustrates a system for converting x86 code from the
legacy game system implemented in the virtual machine to PPC code
of the host game system using the techniques of the invention;
[0014] FIG. 4 illustrates a flow chart of the operation of the JIT
binary translator of the invention;
[0015] FIG. 5A is a block diagram representing an exemplary network
environment having a variety of computing devices in which the
invention may be implemented; and
[0016] FIG. 5B is a block diagram representing an exemplary
non-limiting host computing device in which the invention may be
implemented.
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
Overview
[0017] The invention provides a system and method for translating
code at a function level of the source code rather than an opcode
level. The software emulator of the invention grabs an entire x86
function out of the source stream, rather than an instruction,
translates the whole function into an equivalent function of the
target processor, and executes that function all at once before
returning to the source stream, thereby reducing context switching.
Also, since the software emulator sees the entire source code
function context at once, the software emulator may optimize the
code translation. For example, the software emulator might decide
to translate a sequence of x86 instructions into an efficient PPC
equivalent sequence. Many such optimizations result in a tighter
emulated binary.
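Claims 3, 9 and 16 recite ending the instruction list when an end of function instruction is reached and no conditional branches remain outstanding. The Python sketch below shows one plausible reading of that rule and is not the disclosed algorithm: the decoded-instruction tuple format, the generic `jcc` and `ret` mnemonics, and the function name `collect_function` are all hypothetical.

```python
# Hypothetical decoded-instruction format: (address, mnemonic, branch_target).
# branch_target is None for anything that is not a conditional branch.


def collect_function(stream, start):
    """Return the instructions of the function beginning at address `start`.

    Scanning stops at an end-of-function instruction ("ret") only when no
    conditional branch seen so far targets an address past that point.
    """
    listing = []
    pending = set()                    # forward branch targets not yet reached
    started = False
    for addr, mnemonic, target in stream:
        if addr == start:
            started = True
        if not started:
            continue
        pending.discard(addr)          # this address has now been reached
        listing.append((addr, mnemonic, target))
        if mnemonic == "jcc" and target is not None and target > addr:
            pending.add(target)
        if mnemonic == "ret" and not pending:
            break
    return listing


stream = [
    (0x00, "cmp", None),
    (0x02, "jcc", 0x07),   # conditional branch over the first return
    (0x04, "mov", None),
    (0x06, "ret", None),   # not the end: target 0x07 is still outstanding
    (0x07, "add", None),
    (0x09, "ret", None),   # end of function
    (0x0A, "nop", None),   # belongs to whatever follows
]
func = collect_function(stream, 0x00)
```

In this example the first `ret` does not close the function because an earlier conditional branch still points past it. The listing ends at the second `ret`, once every forward target has been reached.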
[0018] Other more detailed aspects of the invention are described
below, but first, the following description provides a general
overview of and some common vocabulary for virtual machines,
emulators, and associated terminology as the terms have come to be
known in connection with operating systems and host processor
("CPU") virtualization techniques. In doing so, a set of vocabulary
is set forth that one of ordinary skill in the art may find useful
for the description that follows of the apparatus, systems and
methods for translating code at a function level of the source code
in accordance with the techniques of the invention.
Overview of Virtual Machines
[0019] Computers include general purpose central processing units
(CPUs) or "processors" that are designed to execute a specific set
of system instructions. A group of processors that have similar
architecture or design specifications may be considered to be
members of the same processor family. Examples of current processor
families include the Motorola 680X0 processor family, manufactured
by Motorola, Inc. of Phoenix, Ariz.; the Intel 80X86
processor family, manufactured by Intel Corporation of Sunnyvale,
Calif.; and the PowerPC processor family, which is manufactured by
International Business Machines (IBM) or Motorola, Inc. and used in
computers manufactured by Apple Computer, Inc. of Cupertino, Calif.
Although a group of processors may be in the same family because of
their similar architecture and design considerations, processors
may vary widely within a family according to their clock speed and
other performance parameters.
[0020] Each family of microprocessors executes instructions that
are unique to the processor family. The collective set of
instructions that a processor or family of processors can execute
is known as the processor's instruction set. As an example, the
instruction set used by the Intel 80X86 processor family is
incompatible with the instruction set used by the PowerPC processor
family. The Intel 80X86 instruction set is based on the
Complex Instruction Set Computer (CISC) format, while the Motorola
PowerPC instruction set is based on the Reduced Instruction Set
Computer (RISC) format. CISC processors use a large number of
instructions, some of which can perform rather complicated
functions, but which generally require many clock cycles to
execute. RISC processors, on the other hand, use a smaller number
of available instructions to perform a simpler set of functions
that are executed at a much higher rate.
[0021] The uniqueness of the processor family among computer
systems also typically results in incompatibility among the other
elements of hardware architecture of the computer systems. A
computer system manufactured with a processor from the Intel
80X86 processor family will have a hardware architecture that
is different from the hardware architecture of a computer system
manufactured with a processor from the PowerPC processor family.
Because of the uniqueness of the processor instruction set and a
computer system's hardware architecture, application software
programs are typically written to run on a particular computer
system running a particular operating system.
[0022] Generally speaking, computer manufacturers try to maximize
their market share by having more rather than fewer applications
run on the microprocessor family associated with the computer
manufacturers' product line. To expand the number of operating
systems and application programs that can run on a computer system,
a field of technology has developed in which a given computer
having one type of CPU, called a host, will include a virtualizer
program that allows the host computer to emulate the instructions
of an unrelated type of CPU, called a guest. Thus, the host
computer will execute an application that will cause one or more
host instructions to be called in response to a given guest
instruction, and in this way the host computer can both run
software designed for its own hardware architecture and software
written for computers having an unrelated hardware
architecture.
[0023] As a more specific example, a computer system manufactured
by Apple Computer, for example, may run operating systems and
programs written for PC-based computer systems. It may also be
possible to use virtualizer programs to execute concurrently on a
single CPU multiple incompatible operating systems. In this latter
arrangement, although each operating system is incompatible with
the other, virtualizer programs can host each of the several
operating systems, thereby allowing the otherwise incompatible
operating systems to run concurrently on the same host computer
system.
[0024] When a guest computer system is emulated on a host computer
system, the guest computer system is said to be a "virtual machine"
as the guest computer system only exists in the host computer
system as a pure software representation of the operation of one
specific hardware architecture. Thus, an operating system running
inside virtual machine software such as Microsoft's Virtual PC may
be referred to as a "guest" and/or a "virtual machine," while the
operating system running the virtual machine software may be
referred to as the "host." Similarly, the operating system in a
legacy game system running inside virtual machine or emulation
software inside a new game system may be referred to as the
"guest," while the operating system of the new game system running
the virtual machine or emulation software may be referred to as the
"host." The terms virtualizer, emulator, direct-executor, virtual
machine, and processor emulation are sometimes used interchangeably
to denote the ability to mimic or emulate the hardware architecture
of an entire computer system using one or several approaches known
and appreciated by those of skill in the art. Moreover, all uses of
the term "emulation" in any form are intended to convey this broad
meaning and are not intended to distinguish between instruction
execution concepts of emulation versus direct-execution of
operating system instructions in the virtual machine. Thus, for
example, Virtual PC software available from Microsoft Corporation
"emulates" (by instruction execution emulation and/or direct
execution) an entire computer that includes an Intel 80X86
Pentium processor and various motherboard components and cards, and
the operation of these components is "emulated" in the virtual
machine that is being run on the host machine. A virtualizer
program executing on the operating system software and hardware
architecture of the host computer, such as a computer system having
a PowerPC processor, mimics the operation of the entire guest
computer system.
[0025] The general case of virtualization allows one processor
architecture to run OSes and programs from other processor
architectures (e.g., PowerPC Mac programs on x86 Windows, and vice
versa), but an important special case is when the underlying
processor architectures are the same (run various versions of x86
Linux or different versions of x86 Windows on x86). In this latter
case, there is the potential to execute the Guest OS and its
applications more efficiently since the underlying instruction set
is the same. In such a case, the guest instructions are allowed to
execute directly on the processor without losing control or leaving
the system open to attack (i.e., the Guest OS is sandboxed). This
is where the separation of privileged versus non-privileged and the
techniques for controlling access to memory come into play. For
virtualization where there is an architectural mismatch (PowerPC
<->x86), two approaches conventionally have been used:
instruction-by-instruction emulation (relatively slow) or
translation from the guest instruction set to the native
instruction set (more efficient, but uses the translation step). If
instruction emulation is used, then it is relatively easy to make
the environment robust; however, if translation is used, then it
maps back to the special case where the processor architectures are
the same.
[0026] In accordance with the invention, the guest operating system
is virtualized and thus an exemplary scenario in accordance with
the invention would be emulation of a Windows 95®,
Windows 98®, Windows 3.1, or Windows NT 4.0 operating system on
a Virtual Server or an Xbox operating system on an Xbox game
console available from Microsoft Corporation. In various
embodiments, the invention thus describes systems and methods for
controlling guest access to some or all of the underlying physical
resources (memory, devices, etc.) of the host computer.
[0027] The virtualizer program acts as the interchange between the
hardware architecture of the host machine and the instructions
transmitted by the software (e.g., operating systems, applications,
etc.) running within the emulated environment. This virtualizer
program may be a host operating system (HOS), which is an operating
system running directly on the physical computer hardware (and
which may comprise a hypervisor). Alternately, the virtualizer
program might be a virtual machine monitor (VMM), which is
a software layer that runs directly above the hardware, perhaps
running side-by-side and working in conjunction with the host
operating system, and which can virtualize all the resources of the
host machine (as well as certain virtual resources) by exposing
interfaces that are the same as the hardware the VMM is
virtualizing. This virtualization enables the virtualizer (as well
as the host computer system itself) to go unnoticed by operating
system layers running above it.
[0028] Processor emulation thus enables a guest operating system to
execute on a virtual machine created by a virtualizer running on a
host computer system comprising both physical hardware and a host
operating system.
[0029] From a conceptual perspective, computer systems generally
comprise one or more layers of software running on a foundational
layer of hardware. This layering is done for reasons of
abstraction. By defining the interface for a given layer of
software, that layer can be implemented differently by other layers
above it. In a well-designed computer system, each layer only knows
about (and only relies upon) the immediate layer beneath it. This
allows a layer or a "stack" (multiple adjoining layers) to be
replaced without negatively impacting the layers above said layer
or stack. For example, software applications (upper layers)
typically rely on lower levels of the operating system (lower
layers) to write files to some form of permanent storage, and these
applications do not need to understand the difference between
writing data to a floppy disk, a hard drive, or a network folder.
If this lower layer is replaced with new operating system
components for writing files, the operation of the upper layer
software applications remains unaffected.
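The file-writing example in paragraph [0029] can be made concrete with a short Python sketch. Every name in it (`Storage`, `DiskStorage`, `NetworkStorage`, `save_report`) is hypothetical and chosen for illustration; the point is only that the upper-layer function depends on an interface and is unchanged when the layer beneath it is replaced.

```python
from typing import Protocol


class Storage(Protocol):
    """Interface the upper layer depends on; names are hypothetical."""

    def write(self, name: str, data: bytes) -> None: ...


class DiskStorage:
    """One lower layer: keeps files in a local dictionary."""

    def __init__(self) -> None:
        self.files = {}

    def write(self, name: str, data: bytes) -> None:
        self.files[name] = data


class NetworkStorage:
    """A replacement lower layer: records what would be sent remotely."""

    def __init__(self) -> None:
        self.sent = []

    def write(self, name: str, data: bytes) -> None:
        self.sent.append((name, data))


def save_report(storage: Storage, text: str) -> None:
    """Upper-layer code: identical regardless of the layer beneath it."""
    storage.write("report.txt", text.encode("utf-8"))


disk, net = DiskStorage(), NetworkStorage()
save_report(disk, "ok")
save_report(net, "ok")
```

A virtual machine applies the same idea one level lower: it presents the hardware interface to the guest while implementing that interface in software.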
[0030] The flexibility of layered software allows a virtual machine
(VM) to present a virtual hardware layer that is in fact another
software layer. In this way, a VM can create the illusion for the
software layers above it that the software layers are running on
their own private computer system, and thus VMs can allow multiple
"guest systems" to run concurrently on a single "host system." This
level of abstraction is represented by the illustration of FIG.
1A.
[0031] FIG. 1A is a diagram representing the logical layering of
the hardware and software architecture for an emulated operating
environment in a computer system. In the figure, an emulation
program 54 runs directly or indirectly on the physical hardware
architecture 52. Emulation program 54 may be (a) a virtual machine
monitor that runs alongside a host operating system, (b) a
specialized host operating system having native emulation
capabilities, or (c) a host operating system with a hypervisor
component wherein the hypervisor component performs the emulation.
Emulation program 54 emulates a guest hardware architecture 56
(shown as broken lines to illustrate the fact that this component
is the "virtual machine," that is, hardware that does not actually
exist but is instead emulated by said emulation program 54). A
guest operating system 58 executes on the guest hardware
architecture 56, and software application 60 runs on the guest
operating system 58. In the emulated operating environment of FIG.
1A--and because of the operation of emulation program 54--software
application 60 may run in computer system 50 even if software
application 60 is designed to run on an operating system that is
generally incompatible with the host operating system and hardware
architecture 52.
[0032] FIG. 1B illustrates a virtualized computing system
comprising a host operating system software layer 64 running
directly above physical computer hardware 62 where the host
operating system (host OS) 64 provides access to the resources of
the physical computer hardware 62 by exposing interfaces that are
the same as the hardware the host OS is emulating (or
"virtualizing")--which, in turn, enables the host OS 64 to go
unnoticed by operating system layers running above it. Again, to
perform the emulation the host OS 64 may be a specially designed
operating system with native emulation capabilities or,
alternately, it may be a standard operating system with an
incorporated hypervisor component for performing the emulation (not
shown).
[0033] As shown in FIG. 1B, above the host OS 64 are two virtual
machine (VM) implementations, VM A 66, which may be, for example, a
virtualized Intel 386 processor, and VM B 68, which may be, for
example, a virtualized version of one of the Motorola 680X0
family of processors. Above each VM 66 and 68 are guest operating
systems (guest OSes) A 70 and B 72 respectively. Running above
guest OS A 70 are two applications, application A1 74 and
application A2 76, and running above guest OS B 72 is application
B1 78.
[0034] In regard to FIG. 1B, it is important to note that VM A 66
and VM B 68 (which are shown in broken lines) are virtualized
computer hardware representations that exist only as software
constructions and which are made possible due to the execution of
specialized emulation software(s) that not only presents VM A 66
and VM B 68 to Guest OS A 70 and Guest OS B 72 respectively, but
which also performs all of the software steps necessary for Guest
OS A 70 and Guest OS B 72 to indirectly interact with the real
physical computer hardware 62.
[0035] FIG. 1C illustrates an alternative virtualized computing
system wherein the emulation is performed by a virtual machine
monitor (VMM) 64' running alongside the host operating system 64''.
For certain embodiments the VMM 64' may be an application running
above the host operating system 64'' and interacting with the
physical computer hardware 62 only through the host operating
system 64''. In other embodiments, and as shown in FIG. 1C, the VMM
64' may instead comprise a partially independent software system
that on some levels interacts indirectly with the computer hardware
62 via the host operating system 64'' but on other levels the VMM
64' interacts directly with the computer hardware 62 (similar to
the way the host operating system interacts directly with the
computer hardware). And in yet other embodiments, the VMM 64' may
comprise a fully independent software system that on all levels
interacts directly with the computer hardware 62 (similar to the
way the host operating system 64'' interacts directly with the
computer hardware 62) without utilizing the host operating system
64'' (although still interacting with said host operating system
64'' insofar as coordinating use of the computer hardware 62 and
avoiding conflicts and the like).
[0036] All of these variations for implementing the virtual machine
are anticipated to form alternative embodiments of the invention as
described herein, and nothing herein should be interpreted as
limiting the invention to any particular emulation embodiment. In
addition, any reference to interaction between applications 74, 76,
and 78 via VM A 66 and/or VM B 68 respectively (presumably in a
hardware emulation scenario) should be interpreted to be in fact an
interaction between the applications 74, 76, and 78 and the
virtualizer that has created the virtualization. Likewise, any
reference to interaction between VM A 66 and/or VM B
68 and the host operating system 64 and/or the computer hardware
62 (presumably to execute computer instructions directly or
indirectly on the computer hardware 62) should be interpreted to be
in fact an interaction between the virtualizer that has created the
virtualization and the host operating system 64 and/or the computer
hardware 62 as appropriate.
Function-Level Just-in-Time Translation Engine with Multiple Pass
Optimization
[0037] The present invention relates to features of a system that
uses a software emulator to virtualize a legacy game system
platform, such as Xbox, on a host game system platform that is an
upgrade of the legacy game system platform. The software emulator
enables the host game system platform to run legacy games in a
seamless fashion. As noted above, the present invention provides a
software emulator with a just-in-time translation engine that
translates the code at a function level and optimizes the
translation so as to improve code translation efficiency. The
techniques of the invention will be described below with respect to
FIGS. 2-4.
[0038] In accordance with the invention, when the media loader of
the host game system console receives media containing a legacy
computer game and is asked by the operating system of the host game
system to boot the legacy computer game, the media loader instead
invokes the software emulator of the invention to provide backwards
compatibility for the operation of the legacy computer game. The
software emulator loads and runs the legacy computer game as a
standard game with the same rights and restrictions as any native
computer game of the host game system. At boot time, the software
emulator requests that two physical memory chunks be reserved: a 64
MB segment to host the virtualized legacy computer game, and a 64
MB segment to provide a conduit between the virtual machine that
implements the legacy computer game and the host computer game
system.
[0039] FIG. 2 illustrates the relationship between the virtual
memory of the legacy game system implemented in a virtual machine
and the virtual memory of the host game system. In this example,
the legacy game system is assumed to be Xbox, available from
Microsoft Corporation. As illustrated, the legacy Xbox game system
is implemented in a virtual machine environment and assumes a
virtual address space 80 of 4 GB is available. As illustrated, the
legacy 4 GB virtual address space is assumed by the legacy Xbox
game system to have a section of memory 82 dedicated to the virtual
title of the inserted legacy game, a memory 84 dedicated to the
virtual legacy Xbox kernel, a 64 MB shared memory 86 that maps
directly to a 64 MB shared memory in a physical RAM 88 of the host
game system, and a virtual MMIO address space 90 in the upper
region of the 4 GB virtual address space. Those skilled in the art
will appreciate that the MMIO address space 90 in the legacy Xbox
game system contains pointers to the actual hardware devices that
are called by the drivers of the Xbox game system console's
operating system. The virtual address space accessed by the legacy
Xbox game as implemented in the virtual machine environment is
configured the same as the virtual address space in the native
legacy Xbox game system environment, thus tricking the legacy Xbox
game into thinking that it is operating in the native legacy Xbox
game system environment.
[0040] On the other hand, the virtual address space 92 of the
native host Xbox game system is characterized by an emulator binary
memory 94, the native host Xbox kernel 96, and a 64 MB physical
memory segment 98 that hosts the legacy Xbox virtual machine. A 64
MB shared memory 100 is also provided that maps directly to the 64
MB shared memory in the physical RAM 88 of the native host Xbox
game system. As will be explained in more detail below with respect
to FIG. 3, a recreated copy of the x86 Xbox kernel 84 as well as
the x86 title binaries originally passed to the game loader are
loaded in the 64 MB space 98 reserved to the virtual Xbox game
system. In the 64 MB shared memory space 100, on the other hand,
the native host Xbox game system loads its dispatcher program,
loads certain hand-optimized "glue" functions, and creates
structures for virtual machine (VM) state and the translated code
cache (FIG. 3). These functions are shared with the legacy Xbox
game running on the virtual machine via shared memory 88, which is
actually a physically shared section of RAM accessible to both the
virtual machine implementing the legacy Xbox and the emulator
engine of the native host Xbox operating system.
[0041] FIG. 3 illustrates a software emulation system for
converting x86 code from the legacy game system implemented in the
virtual machine to PPC code of the host game system using the
techniques of the invention. As illustrated, the software emulation
system of the invention includes four major components:
[0042] a just-in-time (JIT) binary translator 102 that provides
just-in-time binary translation of x86 code of the legacy Xbox game
system to PPC code or other processor code of the native host Xbox
game system;
[0043] a legacy Xbox virtual machine (VM) 104 that recreates most
of the legacy Xbox environment in reproduced x86 Xbox kernel 106
and untranslated title code store 108 and the legacy title
environment in stored title resources and state store 110;
[0044] a shared memory 88 that permits communication between the
operating system of the native host Xbox game system and the VM 104
and hosts the dispatcher 112 and the translated code cache 114
while tracking VM state 116; and
[0045] an Xbox exception handler 118 that emulates the hardware
devices of the native host Xbox system using device emulation 120
on the native Xbox kernel 122 for use by the Xbox VM 104 while
running a legacy Xbox game.
[0046] After initialization of a legacy Xbox game in the legacy
Xbox virtual machine 104, the operating system of the native host
Xbox game system passes control to the dispatcher 112, which
resides in the shared memory space 88. Fundamentally, the
dispatcher 112 directs code execution for the virtualized legacy
Xbox game. It maintains a mapping in a hash table between every x86
function referenced in the x86 space and an equivalent, translated
PPC (or other host processor) function in the translated code cache
114. The job of the dispatcher 112 is to chain translated PPC (or
other host processor) functions together in the sequence expected
by the virtualized x86 legacy Xbox title. The first task of
dispatcher 112 is to simulate booting the legacy x86 Xbox kernel
106 and legacy x86 title in title memory 110. If the host OS of the
native host Xbox game system performs no significant
pre-translation of emulated binaries, at first the dispatcher 112
has no cached PPC (or other host processor) equivalents for the
requested x86 functions. To fill these gaps, the dispatcher 112
calls to the JIT binary translator 102 for just-in-time function
translation.
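By way of illustration only, the lookup-then-translate behavior of the
dispatcher 112 described above may be sketched as follows. The sketch is
written in Python rather than host processor code, and the names
Dispatcher, demo_translate, trace, and the guest addresses 0x1000 and
0x2000 are hypothetical and do not appear in the specification. A Python
callable stands in for the translated function that an IAR would
reference.

```python
from typing import Callable, Dict, List, Optional

# Stand-in for a translated host function: running it returns the guest
# EIP of the next function to execute, or None when the program ends.
TranslatedFn = Callable[[], Optional[int]]


class Dispatcher:
    """Toy dispatcher mapping guest EIP values to translated functions."""

    def __init__(self, translate: Callable[[int], TranslatedFn]) -> None:
        self.translate = translate                 # stand-in for the JIT translator
        self.cache: Dict[int, TranslatedFn] = {}   # hash table: EIP -> translation
        self.translations = 0                      # number of JIT invocations

    def lookup(self, eip: int) -> TranslatedFn:
        fn = self.cache.get(eip)
        if fn is None:                  # no cached equivalent yet: translate now
            fn = self.translate(eip)
            self.cache[eip] = fn
            self.translations += 1
        return fn

    def run(self, eip: Optional[int]) -> None:
        # Chain translated functions in the order the guest program expects.
        while eip is not None:
            eip = self.lookup(eip)()


trace: List[int] = []   # order in which guest functions were executed
remaining = [3]         # hypothetical guest loop counter


def demo_translate(eip: int) -> TranslatedFn:
    # Hypothetical guest program: 0x1000 calls 0x2000, which loops back
    # to 0x1000 until the counter reaches zero.
    def fn() -> Optional[int]:
        trace.append(eip)
        if eip == 0x1000:
            return 0x2000
        remaining[0] -= 1
        return 0x1000 if remaining[0] > 0 else None
    return fn


dispatcher = Dispatcher(demo_translate)
dispatcher.run(0x1000)
```

In this sketch the two guest functions execute six times in total but
are translated only twice, which mirrors the observation that most
functions of a loop-structured game are served from the translated code
cache after the first pass.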
[0047] Those skilled in the art will appreciate that translating
x86 code to PPC code, for example, is problematic in some respects.
For one thing, the x86 ISA contains several complex instructions with
no simple PPC ISA equivalents. For another, the PPC processor of
the native host Xbox game system may be configured to interpret
data as Big-Endian, whereas legacy Xbox titles expect Little-Endian
interpretation. In addition, naive translation of legacy Xbox x86
code can result in a huge magnification of instructions and cache
misses on the native host Xbox system hardware. The JIT binary
translator of the invention takes steps to mitigate this
"translation bloat" as will be described below.
[0048] As illustrated in FIG. 3, the JIT binary translator of the
invention is implemented in five stages (102a, 102b, 102c, 102d,
102e), each of which will be described in turn.
[0049] Step 1: x86 Fetch and Parse. In step 102a, the JIT binary
translator 102 is invoked by the dispatcher 112 and handed an
extended instruction pointer (EIP) 112b referencing x86 code in the
4 GB address space 80 of the virtual machine 104. In this first
stage of binary translation, an address translation is performed to
locate the corresponding memory address in the software emulator's
own 4 GB virtual address space 92. The software emulator then
parses the x86 function op-codes from the 4 GB address space 80
into a structure corresponding to the x86 code function. If the
function should prove to be larger than the pre-allocated structure
space in the virtual address space 92, then the JIT binary
translator 102 will halt execution.
[0050] Step 2: x86 Code Optimization. Once the JIT binary
translator 102 has loaded its target x86 function, it performs some
initial optimizations in step 102b. Sequences of x86 code known to
create PPC inefficiencies are flagged for future reference. For
example, the optimizer makes a note of non-volatile store/load
operations that do not require endian byte reversal.
[0051] Step 3: PPC Descriptor Generation. The optimizer hands its
product to the JIT middle tier at step 102c, which performs a naive
translation of the optimized x86 instructions into corresponding
groups of PPC instructions. Typically, a single x86 instruction
corresponds to multiple PPC instructions. Very complicated x86
instructions such as fsin are replaced by hand-coded PPC "glue"
functions stored in the shared memory 88.
[0052] Step 4: PPC Binary Executable Optimization. In step 102d,
the PPC binary executable (BE) optimizer takes the sequence of PPC
instructions generated at step 102c and attempts to reduce the
instruction count, cycle count, and likely cache miss rate as much
as possible. Any "translation bloat" remaining in the PPC code
after this stage can only be compensated by the speed of the CPU of
the host computer system.
[0053] Step 5: PPC Compilation and Store. Lastly, in step 102e the
JIT binary translator 102 maps the PPC descriptors into 32-bit PPC
machine instructions. The entire translated function is stored in
the translated code cache 114 in the shared memory 88, and the
starting address of the function is stored as an instruction
address register (IAR) 112a next to the original EIP 112b in a hash
table of the dispatcher 112. This allows the software emulator to
remember the mapping of input code blocks to translated code blocks
so that recompiling the same code block can be avoided by checking
the hash table of the dispatcher 112 before calling the JIT binary
translator 102. Control is then ceded by the software emulator and
the thread returns to the virtual machine 104.
[0054] When the virtual machine 104 resumes, the dispatcher 112
once again tries to map its desired EIP to an IAR. This time, the
lookup is successful, and the dispatcher 112 jumps code execution
to the named IAR. The desired PPC function corresponding to the one
or more x86 instructions in the legacy Xbox command sequence
executes, operating on resources within the 4 GB memory space of
the legacy Xbox virtual machine 104. When the legacy Xbox virtual
machine completes processing of the desired PPC function, control
jumps back to the dispatcher 112 by way of an interrupt with a
request for the next x86 function and the entire JIT binary
translation cycle begins again. Since computer games are generally
coded as enormous loops, after the initial few seconds of
execution, most x86 functions have been translated and are present
in the translated code cache 114 as optimized PPC code (or other
processor code if the native host Xbox game system uses a different
processor).
[0055] Those skilled in the art will appreciate that the JIT binary
translator 102 is a just-in-time compiler that will not translate
x86 functions into PPC code until the very moment those functions
are needed. The techniques of the invention are designed to prevent
perceived delays when the JIT binary translator 102 encounters a
large function for the first time. A couple of options may be
considered to address this problem:
[0056] Pre-compile larger functions in the binary. The software
emulator could spend some time before booting the application
program or game to identify problematic functions and compile them
before game play begins. This would eliminate the perceived jitter,
but would also mean longer boot delays.
[0057] Perform a two-stage compilation of some functions. The JIT
binary translator 102 could skip performance optimizations for some
functions in order to get them running more quickly. Another thread
running on a secondary CPU could optimize the code in good time and
then replace the op-codes in the code cache.
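The second option above may be sketched, purely as an illustration, as
follows. The sketch is written in Python, and the names TwoStageJit,
quick_translate, and optimizing_translate are hypothetical. Strings
stand in for translated op-codes, and a Python thread stands in for the
thread running on a secondary CPU.

```python
import threading
from typing import Dict, List


def quick_translate(eip: int) -> str:
    # Stand-in for a fast translation that skips performance optimization.
    return f"unoptimized@{eip:#x}"


def optimizing_translate(eip: int) -> str:
    # Stand-in for the slower, fully optimized translation.
    return f"optimized@{eip:#x}"


class TwoStageJit:
    """Serve quick code immediately; swap in optimized code later."""

    def __init__(self) -> None:
        self.cache: Dict[int, str] = {}          # EIP -> code currently installed
        self.lock = threading.Lock()             # guards the code cache
        self.workers: List[threading.Thread] = []

    def get(self, eip: int) -> str:
        with self.lock:
            code = self.cache.get(eip)
            if code is not None:
                return code
            code = quick_translate(eip)          # first stage: get running quickly
            self.cache[eip] = code
        worker = threading.Thread(target=self._optimize, args=(eip,))
        self.workers.append(worker)
        worker.start()                           # second stage runs in the background
        return code

    def _optimize(self, eip: int) -> None:
        better = optimizing_translate(eip)
        with self.lock:
            self.cache[eip] = better             # replace the entry in the code cache

    def wait_for_optimizer(self) -> None:
        for worker in self.workers:
            worker.join()


jit = TwoStageJit()
first_code = jit.get(0x4000)     # returned before any optimization has run
jit.wait_for_optimizer()
second_code = jit.get(0x4000)    # the optimized replacement is now installed
```

The lock in this sketch is one possible way to keep the replacement
from racing with a lookup. A real code cache would also need to ensure
that no thread is executing the code being replaced, which the sketch
does not attempt to model.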
[0058] Device requests and system calls by the legacy Xbox game
create exceptions when the virtualized legacy Xbox game wants to
speak to the legacy Xbox hardware but is unaware that it is
operating on the platform of the native host Xbox game system. As
with many operating systems, in the legacy Xbox operating system,
games communicate with most devices by writing to well-known Memory
Mapped I/O (MMIO) locations. As illustrated in FIG. 2, these MMIO
locations were, in the case of the Xbox operating system, in the
upper region 90 of the 4 GB virtual memory space. As described in
U.S. Patent Application No. (Microsoft Docket No. 312634.01), also
assigned to the present assignee and incorporated herein by
reference, an access control list (ACL) may be used to restrict
and/or reduce page permissions (e.g., to read only or to no read or
write) such that the virtual machine 104 implementing the legacy
Xbox game lacks read and write privileges to these MMIO addresses
in memory 90. As a result, when the legacy Xbox game running in the
virtual machine 104 attempts to access its expected device memory
90, the host Xbox operating system detects invalid Xbox MMIO device
addresses at 126 and halts the thread. A memory access violation
message is sent to the hypervisor 128 which, in turn, passes VM
state information to the Xbox exception handler 118 to resolve the
memory access violation.
[0059] The memory access violation and any intentional system calls
forwarded to the Xbox exception handler 118 by the hypervisor 128
are processed to determine the intended target device using the
MMIO address provided in the MMIO write from the legacy Xbox game.
Since memory access violations often indicate a virtual device
request, the Xbox exception handler 118 may simply check the
virtual machine state provided by the hypervisor 128 (from VM state
register 116) and determine the intended target device. Control is
then given to an appropriate Xbox device emulator 120 in the Xbox
exception handler 118, which translates and relays the request of
the virtual machine 104 to the appropriate functions of the Xbox
kernel 122 or to native host Xbox libraries. Since it cannot be
assumed that the native host Xbox system shares any hardware with
the legacy Xbox system, simple instruction forwarding is not an
option. Of course, if hardware is shared, then instruction
forwarding may be used.
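The routing of a faulting MMIO access to a device emulator 120 may be
sketched, for illustration only, as follows. The sketch is written in
Python. The address windows, the device names, and the function
handle_access_violation are all hypothetical, since the specification
does not give concrete MMIO addresses or emulator interfaces.

```python
from typing import Callable, List, Tuple

# A device emulator receives the offset within its MMIO window and the
# value the guest tried to write, and returns a description of the
# translated request (a real emulator would call into the host kernel).
DeviceEmulator = Callable[[int, int], str]


def emulate_gpu(offset: int, value: int) -> str:
    return f"gpu: write {value:#x} at +{offset:#x}"


def emulate_audio(offset: int, value: int) -> str:
    return f"audio: write {value:#x} at +{offset:#x}"


# Hypothetical MMIO windows as (base address, size, emulator). These
# addresses are invented for the example and are not taken from any
# real console memory map.
MMIO_WINDOWS: List[Tuple[int, int, DeviceEmulator]] = [
    (0xF0000000, 0x00100000, emulate_gpu),
    (0xF0100000, 0x00100000, emulate_audio),
]


def handle_access_violation(address: int, value: int) -> str:
    """Find the intended target device from the faulting MMIO address."""
    for base, size, emulator in MMIO_WINDOWS:
        if base <= address < base + size:
            return emulator(address - base, value)
    # Not a virtual device request: treat as a genuine access violation.
    raise LookupError(f"no emulated device at {address:#x}")


result = handle_access_violation(0xF0100020, 0x7F)
```

Here the faulting address alone selects the emulator, which corresponds
to the case described above in which the exception handler determines
the intended target device from the MMIO address of the write.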
[0060] As illustrated in FIG. 3, some native hardware requests to
Xbox physical devices 124, such as hard drive I/O, produce
asynchronous callbacks in the form of device interrupts 130. When
the native host Xbox kernel 122 receives such an interrupt, it
halts the JIT binary translator 102 and supplies the interrupt data
to an appropriate Xbox device emulator 120 in the Xbox exception
handler 118 that, in turn, translates the reply and stores it in
the shared memory space 88. Control is then returned to the virtual
machine 104 by simulating a legacy Xbox interrupt so that the
virtual machine 104 may handle the new data.
[0061] FIG. 4 illustrates the operation of the JIT binary
translator 102 of the invention. As illustrated, the JIT binary
translator 102 starts compiling input source code at step 132 by
starting at a provided address. The JIT binary translator 102 thus
starts to build a stream of machine executable code for execution.
However, in accordance with the invention, the parser 102a of the
JIT binary translator 102 identifies functions within the machine
code at step 134 by recognizing code patterns and acting
accordingly. For example, a source function may be defined as
having a prolog, a body, and an epilog that together perform a task
and return with processed variables. The prolog introduces the
function and defines variables and the epilog ends the function to
return control flow as appropriate and to return the variable
values. Typically, the epilog ends with a RET or IRET instruction. On the
other hand, the body includes code statements and conditions for
executing other statements, including conditional branches, which
may or may not be nested.
[0062] Several examples of how the parser 102a parses simple
functions from the code list follow.
[0063] A. Adding of integers
    int add(int i, int j)    : prolog
    {                        : mov eax, i
        return (i+j);        : add eax, j
    }                        : epilog
[0064] B. Multiplying of integers
    int multiply(int i, int j)    : prolog
    {                             : mov eax, i
        return (i*j);             : imul eax, j
    }                             : epilog
[0065] C. Calculate j+(i*j) for integers i,j
    int multiplyadd(int i, int j)        : prolog
    {                                    : push j
                                         : push i
        return add(multiply(i,j), j);    : call multiply
                                         : push eax
                                         : push j
                                         : call add
    }                                    : epilog
[0066] D. Example with conditional jumps
[0067] The following example illustrates outstanding conditional
branches requiring resolution before the function is considered
complete:
    int arithmetic (int i, int j, int operation)
    {                                      : prolog
        if (operation == ADD)              : cmp operation,ADD
        {                                  : jnz NotAdd
            return (i+j);                  : mov eax,i
                                           : add eax,j
                                           : ret
        }                                  : NotAdd:
        else if (operation == SUBTRACT)    : cmp operation,SUBTRACT
        {                                  : jnz NotSubtract
            return (i-j);                  : mov eax,i
                                           : sub eax,j
                                           : ret
        }                                  : NotSubtract:
        else if (operation == MULTIPLY)    : cmp operation,MULTIPLY
        {                                  : jnz NotMultiply
            return (i*j);                  : mov eax,i
                                           : imul eax,j
                                           : ret
        }                                  : NotMultiply:
        else if (operation == DIVIDE)      : cmp operation,DIVIDE
        {                                  : jnz NotDivide
            return (i/j);                  : mov eax,i
                                           : idiv eax,j
                                           : ret
        }                                  : NotDivide:
    }                                      : epilog
[0068] As illustrated in the above examples, the parser 102a treats
the prolog, body, and epilog as one functional block. The block is
identified by analyzing the code to identify the prolog and epilog
and to identify branch operations. As illustrated at step 134, a
function is known to be complete if there are no outstanding
conditional branches when the epilog is reached. In other words, if
RET or IRET is encountered by the parser 102a and no conditional
branches are outstanding, then the JIT binary translator 102 knows
that the end of the machine code function has been reached.
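The completeness test described above may be sketched, for illustration
only, as follows. The sketch is written in Python and operates on a
simplified, hypothetical instruction record of the form (label,
mnemonic, branch target) rather than on real x86 op-codes. The function
name find_function_end and the set of conditional mnemonics are
assumptions made for the example.

```python
from typing import List, Optional, Set, Tuple

# (label on this instruction, mnemonic, branch target label)
Instr = Tuple[Optional[str], str, Optional[str]]

CONDITIONAL = {"jz", "jnz", "je", "jne", "jl", "jg"}   # illustrative subset
RETURNS = {"ret", "iret"}


def find_function_end(code: List[Instr], start: int = 0) -> int:
    """Return the index of the RET/IRET that completes the function."""
    outstanding: Set[str] = set()   # branch targets not yet reached
    seen: Set[str] = set()          # labels already passed (backward targets)
    for index in range(start, len(code)):
        label, mnemonic, target = code[index]
        if label is not None:
            seen.add(label)
            outstanding.discard(label)        # a pending branch is now resolved
        if mnemonic in CONDITIONAL and target is not None:
            if target not in seen:            # only forward branches stay pending
                outstanding.add(target)
        elif mnemonic in RETURNS and not outstanding:
            return index                      # epilog with nothing outstanding
    raise ValueError("no function end found")


# Abridged form of example D, followed by the start of another function.
code: List[Instr] = [
    (None, "cmp", None),
    (None, "jnz", "NotAdd"),
    (None, "mov", None),
    (None, "add", None),
    (None, "ret", None),             # index 4: NotAdd is still outstanding
    ("NotAdd", "cmp", None),
    (None, "jnz", "NotSubtract"),
    (None, "mov", None),
    (None, "sub", None),
    (None, "ret", None),             # index 9: NotSubtract is still outstanding
    ("NotSubtract", "ret", None),    # index 10: end of the function
    (None, "mov", None),             # index 11: next function begins
    (None, "ret", None),
]

first_end = find_function_end(code)
second_end = find_function_end(code, first_end + 1)
```

The early RET instructions at indices 4 and 9 are not treated as the
end of the function because a conditional branch target beyond them has
not yet been reached.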
[0069] The resulting functional block of code provided by the
parser 102a may be optimized at step 136 by optimizer 102b of the
JIT binary translator 102 to improve processing efficiency. For
example, the PowerPC processor is natively big endian, and data
loaded in big endian format requires one (or possibly a maximum of
two) PowerPC instructions, whereas the x86 is natively little endian,
and data loaded in little endian format may require one or more
(possibly up to seven) PowerPC instructions. Thus, one obvious
optimization that may be performed by optimizer 102b is to store the
data in big endian format whenever possible and to avoid converting
the data to little endian format. This optimization results in fewer
instructions that must be processed at run time.
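The byte reversal at issue may be illustrated with the following
sketch, which is written in Python and is not PowerPC code. The helper
names load_guest_u32 and byte_reverse_u32 and the sample value
0x12345678 are hypothetical.

```python
import struct


def load_guest_u32(memory: bytes, offset: int) -> int:
    """Read a 32-bit value the way the little endian guest expects."""
    return struct.unpack_from("<I", memory, offset)[0]


def byte_reverse_u32(value: int) -> int:
    """Explicit 32-bit byte reversal, i.e. the extra work a big endian
    host performs when data is kept in little endian format."""
    return int.from_bytes(value.to_bytes(4, "big"), "little")


# An x86 guest stores 0x12345678 in memory as the bytes 78 56 34 12.
guest_memory = bytes([0x78, 0x56, 0x34, 0x12])

# A plain big endian load of the same bytes yields the wrong value ...
plain_big_endian_load = struct.unpack_from(">I", guest_memory, 0)[0]

# ... which must then be byte reversed to recover what the guest meant.
corrected = byte_reverse_u32(plain_big_endian_load)
```

Data that the optimizer can prove is only ever touched by translated
code could be kept in big endian format, in which case the reversal
step in this sketch would not be needed for that data.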
[0070] As another simple example, suppose a block of source code is
written to calculate the value of i, where i=j*k. The code could be
written as:
    k=0
    jump to routine to calculate value of j
    return value of j
    i=j*k
In this simple example, since k=0, the product will be zero no
matter what the calculated value is for j. Accordingly, this code
may be optimized to i=0. Those skilled in the art will appreciate
that in conventional systems, where each instruction is separately
translated, the jump to the routine would still have to be resolved
since the context of the instruction would not have been known.
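The i=j*k example may be sketched, for illustration only, as two small
passes over a hypothetical three-address representation. The sketch is
written in Python. The statement forms and the function names
fold_zero_products and drop_unused_results are assumptions, and the
second pass is only valid if the routine that calculates j has no side
effects, which the sketch simply assumes.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical statement forms:
#   ("const", dst, value)      dst = value
#   ("call",  dst, routine)    dst = routine()
#   ("mul",   dst, a, b)       dst = a * b
Stmt = Tuple


def fold_zero_products(block: List[Stmt]) -> List[Stmt]:
    """Replace a product with a known-zero operand by the constant 0."""
    consts: Dict[str, int] = {}
    out: List[Stmt] = []
    for stmt in block:
        op, dst = stmt[0], stmt[1]
        if op == "const":
            consts[dst] = stmt[2]
            out.append(stmt)
        elif op == "mul" and (consts.get(stmt[2]) == 0 or consts.get(stmt[3]) == 0):
            consts[dst] = 0
            out.append(("const", dst, 0))
        else:
            consts.pop(dst, None)      # dst is no longer a known constant
            out.append(stmt)
    return out


def drop_unused_results(block: List[Stmt], live_out: Set[str]) -> List[Stmt]:
    """Remove statements whose result is never read afterwards.

    Assumes called routines have no side effects.
    """
    live = set(live_out)
    kept: List[Stmt] = []
    for stmt in reversed(block):
        op, dst = stmt[0], stmt[1]
        if dst in live:
            live.discard(dst)
            if op == "mul":
                live.update(stmt[2:])  # operands of a kept product are needed
            kept.append(stmt)
    kept.reverse()
    return kept


block: List[Stmt] = [
    ("const", "k", 0),
    ("call", "j", "compute_j"),
    ("mul", "i", "j", "k"),
]
folded = fold_zero_products(block)
optimized = drop_unused_results(folded, {"i"})
```

Because the whole block is visible at once, the sketch can both replace
the product by zero and discard the call that computed j. A translator
that sees one instruction at a time has no basis for either step.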
[0071] Once the function has been identified and the code
optimized, at step 138, the processor instructions making up the
function in the input machine code are converted into machine code
of the target processor (e.g., PowerPC from x86). Then, at step
140, the generated machine code is optimized by, for example,
reducing the instruction count, cycle count, and likely cache miss
rate as much as possible. The resulting optimized machine code for
the target processor is stored in the translated code cache 114 for
execution at step 142. Finally, at step 144, an entry is placed in
the dispatcher hash table identifying the optimized code block so
as to avoid recompiling the same functional block the next time it
is encountered in the input code stream.
[0072] Thus, the invention provides a mechanism whereby a JIT binary
translator may more efficiently translate instructions written for
a first processor to instructions for a second processor based on
the context of the received instructions. In particular, the binary
translations are performed for functional blocks of code and
optimized so as to speed up the binary translation operation. Such
a JIT binary translator in accordance with the invention is
particularly advantageous when used with programs or games running
in a virtual machine environment where quick translations are
critical to smooth operation. Those skilled in the art will
appreciate that such techniques may be extended to all sorts of
applications, not just game systems. Moreover, the techniques of
the invention may be used to provide binary translations in other
computer systems implementing software emulation techniques.
Exemplary Networked and Distributed Environments
[0073] Although an exemplary embodiment of the invention may be
implemented in connection with the Xbox game system architecture,
one of ordinary skill in the art can appreciate that the invention
can be implemented in connection with any suitable host computer or
other client or server device, which can be deployed as part of a
computer network, or in a distributed computing environment. In
this regard, the invention pertains to any computer system or
environment having any number of memory or storage units, and any
number of applications and processes occurring across any number of
storage units or volumes, which may be used in connection with
virtualizing a guest OS in accordance with the invention. The
invention may apply to an environment with server computers and
client computers deployed in a network environment or distributed
computing environment, having remote or local storage. The
invention may also be applied to standalone computing devices,
having programming language functionality, interpretation and
execution capabilities for generating, receiving and transmitting
information in connection with remote or local services.
[0074] Distributed computing provides sharing of computer resources
and services by exchange between computing devices and systems.
These resources and services include the exchange of information,
cache storage and disk storage for files. Distributed computing
takes advantage of network connectivity, allowing clients to
leverage their collective power to benefit the entire enterprise.
In this regard, a variety of devices may have applications, objects
or resources that may implicate the processes of the invention.
[0075] FIG. 5A provides a schematic diagram of an exemplary
networked or distributed computing environment. The distributed
computing environment comprises computing objects 145a, 145b, etc.
and computing objects or devices 146a, 146b, 146c, etc. These
objects may comprise programs, methods, data stores, programmable
logic, etc. The objects may comprise portions of the same or
different devices such as PDAs, audio/video devices, MP3 players,
personal computers, etc. Each object can communicate with another
object by way of the communications network 147. This network may
itself comprise other computing objects and computing devices that
provide services to the system of FIG. 5A, and may itself represent
multiple interconnected networks. In accordance with an aspect of
the invention, each object 145a, 145b, etc. or 146a, 146b, 146c,
etc. may contain an application that might make use of an API, or
other object, software, firmware and/or hardware, to request use of
the virtualization processes of the invention.
[0076] It can also be appreciated that an object, such as 146c, may
be hosted on another computing device 145a, 145b, etc. or 146a,
146b, etc. Thus, although the physical environment depicted may
show the connected devices as computers, such illustration is
merely exemplary and the physical environment may alternatively be
depicted or described comprising various digital devices such as
PDAs, televisions, MP3 players, etc., software objects such as
interfaces, COM objects and the like.
[0077] There are a variety of systems, components, and network
configurations that support distributed computing environments. For
example, computing systems may be connected together by wired or
wireless systems, by local networks or widely distributed networks.
Currently, many of the networks are coupled to the Internet, which
provides an infrastructure for widely distributed computing and
encompasses many different networks. Any of the infrastructures may
be used for exemplary communications made incident to the
virtualization processes of the invention.
[0078] In home networking environments, there are at least four
disparate network transport media that may each support a unique
protocol, such as Power line, data (both wireless and wired), voice
(e.g., telephone) and entertainment media. Most home control
devices such as light switches and appliances may use power lines
for connectivity. Data Services may enter the home as broadband
(e.g., either DSL or Cable modem) and are accessible within the
home using either wireless (e.g., HomeRF or 802.11B) or wired
(e.g., Home PNA, Cat 5, Ethernet, even power line) connectivity.
Voice traffic may enter the home either as wired (e.g., Cat 3) or
wireless (e.g., cell phones) and may be distributed within the home
using Cat 3 wiring. Entertainment media, or other graphical data,
may enter the home either through satellite or cable and is
typically distributed in the home using coaxial cable. IEEE 1394
and DVI are also digital interconnects for clusters of media
devices. All of these network environments and others that may
emerge as protocol standards may be interconnected to form a
network, such as an intranet, that may be connected to the outside
world by way of the Internet. In short, a variety of disparate
sources exist for the storage and transmission of data, and
consequently, moving forward, computing devices will require ways
of sharing data, such as data accessed or utilized incident to
program objects, which make use of the virtualized services in
accordance with the invention.
[0079] The Internet commonly refers to the collection of networks
and gateways that utilize the TCP/IP suite of protocols, which are
well-known in the art of computer networking. TCP/IP is an acronym
for "Transmission Control Protocol/Internet Protocol." The Internet
can be described as a system of geographically distributed remote
computer networks interconnected by computers executing networking
protocols that allow users to interact and share information over
the network(s). Because of such wide-spread information sharing,
remote networks such as the Internet have thus far generally
evolved into an open system for which developers can design
software applications for performing specialized operations or
services, essentially without restriction.
[0080] Thus, the network infrastructure enables a host of network
topologies such as client/server, peer-to-peer, or hybrid
architectures. The "client" is a member of a class or group that
uses the services of another class or group to which it is not
related. Thus, in computing, a client is a process, i.e., roughly a
set of instructions or tasks, that requests a service provided by
another program. The client process utilizes the requested service
without having to "know" any working details about the other
program or the service itself. In a client/server architecture,
particularly a networked system, a client is usually a computer
that accesses shared network resources provided by another
computer, e.g., a server. In the example of FIG. 5A, computers
146a, 146b, etc. can be thought of as clients and computers 145a,
145b, etc. can be thought of as the server where server 145a, 145b,
etc. maintains the data that is then replicated in the client
computers 146a, 146b, etc., although any computer can be considered
a client, a server, or both, depending on the circumstances. Any of
these computing devices may be processing data or requesting
services or tasks that may implicate an implementation of the
virtualization processes of the invention.
[0081] A server is typically a remote computer system accessible
over a remote or local network, such as the Internet. The client
process may be active in a first computer system, and the server
process may be active in a second computer system, communicating
with one another over a communications medium, thus providing
distributed functionality and allowing multiple clients to take
advantage of the information-gathering capabilities of the server.
Any software objects utilized pursuant to making use of the
virtualized architecture(s) of the invention may be distributed
across multiple computing devices or objects.
[0082] Client(s) and server(s) communicate with one another
utilizing the functionality provided by protocol layer(s). For
example, HyperText Transfer Protocol (HTTP) is a common protocol
that is used in conjunction with the World Wide Web (WWW), or "the
Web." Typically, a computer network address such as an Internet
Protocol (IP) address or other reference such as a Uniform
Resource Locator (URL) can be used to identify the server or client
computers to each other. The network address can be referred to as
a URL address. Communication can be provided over a communications
medium, e.g., client(s) and server(s) may be coupled to one another
via TCP/IP connection(s) for high-capacity communication.
[0083] FIG. 5A illustrates an exemplary networked or distributed
environment, with a server in communication with client computers
via a network/bus, in which the invention may be employed. In more
detail, a number of servers 145a, 145b, etc., are interconnected
via a communications network/bus 147, which may be a LAN, WAN,
intranet, the Internet, etc., with a number of client or remote
computing devices 146a, 146b, 146c, 146d, 146e, etc., such as a
portable computer, handheld computer, thin client, networked
appliance, or other device, such as a VCR, TV, oven, light, heater
and the like. It is thus contemplated that the invention may apply
to any computing device in connection with which it is desirable to
implement guest interfaces and operating systems in accordance with
the invention.
[0084] In a network environment in which the communications
network/bus 147 is the Internet, for example, the servers 145a,
145b, etc. can be Web servers with which the clients 146a, 146b,
146c, 146d, 146e, etc. communicate via any of a number of known
protocols such as HTTP. Servers 145a, 145b, etc. may also serve as
clients 146a, 146b, 146c, 146d, 146e, etc., as may be
characteristic of a distributed computing environment.
[0085] Communications may be wired or wireless, where appropriate.
Client devices 146a, 146b, 146c, 146d, 146e, etc. may or may not
communicate via communications network/bus 147, and may have
independent communications associated therewith. For example, in
the case of a TV or VCR, there may or may not be a networked aspect
to the control thereof. Each client computer 146a, 146b, 146c,
146d, 146e, etc. and server computer 145a, 145b, etc. may be
equipped with various application program modules or objects 148
and with connections or access to various types of storage elements
or objects, across which files or data streams may be stored or to
which portion(s) of files or data streams may be downloaded,
transmitted or migrated. Any one or more of computers 145a, 145b,
146a, 146b, etc. may be responsible for the maintenance and
updating of a database 149 or other storage element, such as a
memory 149, for storing data processed according to the
invention. Thus, the invention can be utilized in a computer
network environment having client computers 146a, 146b, etc. that
can access and interact with a computer network/bus 147 and server
computers 145a, 145b, etc. that may interact with client computers
146a, 146b, etc. and other like devices, and databases 149.
Exemplary Computing Device
[0086] FIG. 5B and the following discussion are intended to provide
a brief general description of a suitable host computing
environment in connection with which the invention may be
implemented. It should be understood, however, that handheld,
portable and other computing devices, portable and fixed gaming
devices, and computing objects of all kinds are contemplated for
use in connection with the invention. While a general purpose
computer is described below, this is but one example, and the
invention may be implemented with a thin client having network/bus
interoperability and interaction. Thus, the invention may be
implemented in an environment of networked hosted services in which
very little or minimal client resources are implicated, e.g., a
networked environment in which the client device serves merely as
an interface to the network/bus, such as an object placed in an
appliance. In essence, anywhere that data may be stored or from
which data may be retrieved or transmitted to another computer is a
desirable, or suitable, environment for operation of the
virtualization techniques in accordance with the invention.
[0087] Although not required, the invention can be implemented in
whole or in part via an operating system, for use by a developer of
services for a device or object, and/or included within application
software that operates in connection with the virtualized OS of the
invention. Software may be described in the general context of
computer-executable instructions, such as program modules, being
executed by one or more computers, such as client workstations,
servers or other devices. Generally, program modules include
routines, programs, objects, components, data structures and the
like that perform particular tasks or implement particular abstract
data types. Typically, the functionality of the program modules may
be combined or distributed as desired in various embodiments.
Moreover, those skilled in the art will appreciate that the
invention may be practiced with other computer system
configurations and protocols. Other well known computing systems,
environments, and/or configurations that may be suitable for use
with the invention include, but are not limited to, personal
computers (PCs), automated teller machines, server computers,
hand-held or laptop devices, multi-processor systems,
microprocessor-based systems, programmable consumer electronics,
network PCs, appliances, lights, environmental control elements,
minicomputers, mainframe computers and the like. As noted above,
the invention may also be practiced in distributed computing
environments where tasks are performed by remote processing devices
that are linked through a communications network/bus or other data
transmission medium. In a distributed computing environment,
program modules may be located in both local and remote computer
storage media including memory storage devices, and client nodes
may in turn behave as server nodes.
[0088] FIG. 5B illustrates an example of a suitable host computing
system environment 150 in which the invention may be implemented,
although as made clear above, the host computing system environment
150 is only one example of a suitable computing environment and is
not intended to suggest any limitation as to the scope of use or
functionality of the invention. Neither should the computing
environment 150 be interpreted as having any dependency or
requirement relating to any one or combination of components
illustrated in the exemplary operating environment 150.
[0089] With reference to FIG. 5B, an exemplary system for
implementing the invention includes a general purpose computing
device in the form of a computer 160. Components of computer 160
may include, but are not limited to, a processing unit 162, a
system memory 164, and a system bus 166 that couples various system
components including the system memory to the processing unit 162.
The system bus 166 may be any of several types of bus structures
including a memory bus or memory controller, a peripheral bus, and
a local bus using any of a variety of bus architectures. By way of
example, and not limitation, such architectures include Industry
Standard Architecture (ISA) bus, Micro Channel Architecture (MCA)
bus, Enhanced ISA (EISA) bus, Video Electronics Standards
Association (VESA) local bus, Peripheral Component Interconnect
(PCI) bus (also known as Mezzanine bus), and PCI Express
(PCIe).
[0090] Computer 160 typically includes a variety of computer
readable media. Computer readable media can be any available media
that can be accessed by computer 160 and includes both volatile and
nonvolatile media, removable and non-removable media. By way of
example, and not limitation, computer readable media may comprise
computer storage media and communication media. Computer storage
media includes both volatile and nonvolatile, removable and
non-removable media implemented in any method or technology for
storage of information such as computer readable instructions, data
structures, program modules or other data. Computer storage media
includes, but is not limited to, RAM, ROM, EEPROM, flash memory or
other memory technology, CD-ROM, digital versatile disks (DVD) or
other optical disk storage, magnetic cassettes, magnetic tape,
magnetic disk storage or other magnetic storage devices, or any
other medium which can be used to store the desired information and
which can be accessed by computer 160. Communication media
typically embodies computer readable instructions, data structures,
program modules or other data in a modulated data signal such as a
carrier wave or other transport mechanism and includes any
information delivery media. The term "modulated data signal" means
a signal that has one or more of its characteristics set or changed
in such a manner as to encode information in the signal. By way of
example, and not limitation, communication media includes wired
media such as a wired network or direct-wired connection, and
wireless media such as acoustic, RF, infrared and other wireless
media. Combinations of any of the above should also be included
within the scope of computer readable media.
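By way of illustration only, the definition of a "modulated data signal" given above can be sketched as simple on-off keying, in which the amplitude of a carrier is set per bit so as to encode information in the signal. The constants SAMPLES_PER_BIT and CYCLES_PER_BIT, the energy threshold, and the function names are assumptions made for this sketch; the specification does not prescribe any particular modulation scheme.

```python
import math

SAMPLES_PER_BIT = 16  # hypothetical: samples used to carry one bit
CYCLES_PER_BIT = 2    # hypothetical: carrier cycles per bit period


def modulate(bits):
    """Return carrier samples whose amplitude is set to 1.0 or 0.0 per bit."""
    samples = []
    for bit in bits:
        amplitude = 1.0 if bit else 0.0
        for n in range(SAMPLES_PER_BIT):
            phase = 2.0 * math.pi * CYCLES_PER_BIT * n / SAMPLES_PER_BIT
            samples.append(amplitude * math.sin(phase))
    return samples


def demodulate(samples):
    """Recover bits by comparing the mean energy of each bit period to a threshold."""
    bits = []
    for start in range(0, len(samples), SAMPLES_PER_BIT):
        window = samples[start:start + SAMPLES_PER_BIT]
        energy = sum(s * s for s in window) / len(window)
        # A full-amplitude sine has mean energy 0.5; silence has 0.0.
        bits.append(1 if energy > 0.25 else 0)
    return bits


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0]
    print(demodulate(modulate(message)) == message)
```

Here the characteristic that is "set or changed" is the carrier amplitude; frequency or phase could equally be varied under the same definition.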
[0091] The system memory 164 includes computer storage media in the
form of volatile and/or nonvolatile memory such as read only memory
(ROM) 168 and random access memory (RAM) 170. A basic input/output
system 172 (BIOS), containing the basic routines that help to
transfer information between elements within computer 160, such as
during start-up, is typically stored in ROM 168. RAM 170 typically
contains data and/or program modules that are immediately
accessible to and/or presently being operated on by processing unit
162. By way of example, and not limitation, FIG. 5B illustrates
operating system 174, application programs 176, other program
modules 178, and program data 180.
[0092] The computer 160 may also include other
removable/non-removable, volatile/nonvolatile computer storage
media. By way of example only, FIG. 5B illustrates a hard disk
drive 182 that reads from or writes to non-removable, nonvolatile
magnetic media, a magnetic disk drive 184 that reads from or writes
to a removable, nonvolatile magnetic disk 186, and an optical disk
drive 188 that reads from or writes to a removable, nonvolatile
optical disk 190, such as a CD-ROM or other optical media. Other
removable/non-removable, volatile/nonvolatile computer storage
media that can be used in the exemplary operating environment
include, but are not limited to, magnetic tape cassettes, flash
memory cards, digital versatile disks, digital video tape, solid
state RAM, solid state ROM and the like. The hard disk drive 182 is
typically connected to the system bus 166 through a non-removable
memory interface such as interface 192, and magnetic disk drive 184
and optical disk drive 188 are typically connected to the system
bus 166 by a removable memory interface, such as interface 194.
[0093] The drives and their associated computer storage media
discussed above and illustrated in FIG. 5B provide storage of
computer readable instructions, data structures, program modules
and other data for the computer 160. In FIG. 5B, for example, hard
disk drive 182 is illustrated as storing operating system 196,
application programs 198, other program modules 200 and program
data 202. Note that these components can either be the same as or
different from operating system 174, application programs 176,
other program modules 178 and program data 180. Operating system
196, application programs 198, other program modules 200 and
program data 202 are given different numbers here to illustrate
that, at a minimum, they are different copies. A user may enter
commands and information into the computer 160 through input
devices such as a keyboard 204 and pointing device 206, commonly
referred to as a mouse, trackball or touch pad. Other input devices
(not shown) may include a microphone, joystick, game pad, satellite
dish, scanner, or the like. These and other input devices are often
connected to the processing unit 162 through a user input interface
208 that is coupled to the system bus 166, but may be connected by
other interface and bus structures, such as a parallel port, game
port or a universal serial bus (USB). These are the kinds of
structures that are virtualized by the architectures of the
invention. A graphics interface 210, such as one of the interfaces
implemented by the Northbridge, may also be connected to the system
bus 166. Northbridge is a chipset that communicates with the CPU,
or host processing unit 162, and assumes responsibility for
communications such as PCI, PCIe and accelerated graphics port
(AGP) communications. One or more graphics processing units (GPUs)
212 may communicate with graphics interface 210. In this regard,
GPUs 212 generally include on-chip memory storage, such as register
storage, and GPUs 212 communicate with a video memory 214. GPUs 212,
however, are but one example of a coprocessor and thus a variety of
coprocessing devices may be included in computer 160, and may
include a variety of procedural shaders, such as pixel and vertex
shaders. A monitor 216 or other type of display device is also
connected to the system bus 166 via an interface, such as a video
interface 218, which may in turn communicate with video memory 214.
In addition to monitor 216, computers may also include other
peripheral output devices such as speakers 220 and printer 222,
which may be connected through an output peripheral interface
224.
[0094] The computer 160 may operate in a networked or distributed
environment using logical connections to one or more remote
computers, such as a remote computer 226. The remote computer 226
may be a personal computer, a server, a router, a network PC, a
peer device or other common network node, and typically includes
many or all of the elements described above relative to the
computer 160, although only a memory storage device 228 has been
illustrated in FIG. 5B. The logical connections depicted in FIG. 5B
include a local area network (LAN) 230 and a wide area network
(WAN) 232, but may also include other networks/buses. Such
networking environments are commonplace in homes, offices,
enterprise-wide computer networks, intranets and the Internet.
[0095] When used in a LAN networking environment, the computer 160
is connected to the LAN 230 through a network interface or adapter
234. When used in a WAN networking environment, the computer 160
typically includes a modem 236 or other means for establishing
communications over the WAN 232, such as the Internet. The modem
236, which may be internal or external, may be connected to the
system bus 166 via the user input interface 208, or other
appropriate mechanism. In a networked environment, program modules
depicted relative to the computer 160, or portions thereof, may be
stored in the remote memory storage device. By way of example, and
not limitation, FIG. 5B illustrates remote application programs 238
as residing on memory device 228. It will be appreciated that the
network connections shown are exemplary and other means of
establishing a communications link between the computers may be
used.
[0096] There are multiple ways of implementing the invention, e.g.,
an appropriate API, tool kit, driver code, operating system,
control, standalone or downloadable software object, etc. which
enables applications and services to use the virtualized
architecture(s), systems and methods of the invention. The
invention contemplates the use of the invention from the standpoint
of an API (or other software object), as well as from a software or
hardware object that receives any of the aforementioned techniques
in accordance with the invention. Thus, various implementations of
the invention described herein may have aspects that are wholly in
hardware, partly in hardware and partly in software, as well as in
software.
[0097] As mentioned above, while exemplary embodiments of the
invention have been described in connection with various computing
devices and network architectures, the underlying concepts may be
applied to any computing device or system in which it is desirable
to emulate guest software. For instance, the various algorithm(s)
and hardware implementations of the invention may be applied to the
operating system of a computing device, provided as a separate
object on the device, as part of another object, as a reusable
control, as a downloadable object from a server, as a "middle man"
between a device or object and the network, as a distributed
object, as hardware, in memory, a combination of any of the
foregoing, etc. One of ordinary skill in the art will appreciate
that there are numerous ways of providing object code and
nomenclature that achieves the same, similar or equivalent
functionality achieved by the various embodiments of the
invention.
[0098] As mentioned, the various techniques described herein may be
implemented in connection with hardware or software or, where
appropriate, with a combination of both. Thus, the methods and
apparatus of the invention, or certain aspects or portions thereof,
may take the form of program code (i.e., instructions) embodied in
tangible media, such as floppy diskettes, CD-ROMs, hard drives, or
any other machine-readable storage medium, wherein, when the
program code is loaded into and executed by a machine, such as a
computer, the machine becomes an apparatus for practicing the
invention. In the case of program code execution on programmable
computers, the computing device generally includes a processor, a
storage medium readable by the processor (including volatile and
non-volatile memory and/or storage elements), at least one input
device, and at least one output device. One or more programs that
may implement or utilize the virtualization techniques of the
invention, e.g., through the use of a data processing API, reusable
controls, or the like, are preferably implemented in a high level
procedural or object oriented programming language to communicate
with a computer system. However, the program(s) can be implemented
in assembly or machine language, if desired. In any case, the
language may be a compiled or interpreted language, and combined
with hardware implementations.
[0099] The methods and apparatus of the invention may also be
practiced via communications embodied in the form of program code
that is transmitted over some transmission medium, such as over
electrical wiring or cabling, through fiber optics, or via any
other form of transmission, wherein, when the program code is
received and loaded into and executed by a machine, such as an
EPROM, a gate array, a programmable logic device (PLD), a client
computer, etc., the machine becomes an apparatus for practicing the
invention. When implemented on a general-purpose processor, the
program code combines with the processor to provide a unique
apparatus that operates to invoke the functionality of the
invention. Additionally, any storage techniques used in connection
with the invention may be a combination of hardware and
software.
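By way of illustration only, and not limitation, the sequence described above, in which program code is transmitted over a transmission medium, received, loaded into and executed by a machine, can be sketched as follows. A connected local socket pair stands in for the transmission medium, and an interpreted language is assumed; the names send_program, receive_and_run and demo are hypothetical. Executing received code in this way is suitable only for a demonstration with trusted input.

```python
import socket


def send_program(sock, source):
    """Transmit program code as UTF-8 bytes, then signal end of transmission."""
    sock.sendall(source.encode("utf-8"))
    sock.shutdown(socket.SHUT_WR)


def receive_and_run(sock, entry_point, *args):
    """Receive program code, load it, and call one function defined by it."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # an empty read means the sender has finished
            break
        chunks.append(chunk)
    source = b"".join(chunks).decode("utf-8")
    namespace = {}
    # Loading step: compile the received text and execute its definitions.
    exec(compile(source, "<received>", "exec"), namespace)
    # Execution step: invoke the requested function from the loaded code.
    return namespace[entry_point](*args)


def demo():
    """Send a one-function program across a local socket pair and run it."""
    sender, receiver = socket.socketpair()
    try:
        send_program(sender, "def add(a, b):\n    return a + b\n")
        return receive_and_run(receiver, "add", 2, 3)
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(demo())
```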
[0100] While the invention has been described in connection with
the preferred embodiments of the various figures, it is to be
understood that other similar embodiments may be used or
modifications and additions may be made to the described embodiment
for performing the same function of the invention without deviating
therefrom. For example, while exemplary environments of the
invention are described in the context of a networked environment,
such as a peer-to-peer networked environment, one skilled in the
art will recognize that the invention is not limited thereto, and
that the methods, as described in the present application, may apply
to any computing device or environment, such as a gaming console,
handheld computer, portable computer, etc., whether wired or
wireless, and may be applied to any number of such computing
devices connected via a communications network, and interacting
across the network. Furthermore, it should be emphasized that a
variety of computer platforms, including handheld device operating
systems and other application-specific operating systems, are
contemplated, especially as the number of wireless networked
devices continues to proliferate.
[0101] While exemplary embodiments refer to utilizing the invention
in the context of a guest OS virtualized on a host OS, the
invention is not so limited, but rather may be implemented to
virtualize a second specialized processing unit cooperating with a
main processor for other reasons as well. Moreover, the invention
contemplates the scenario wherein multiple instances of the same
version or release of an OS are operating in separate virtual
machines according to the invention. It can be appreciated that the
virtualization of the invention is independent of the operations
for which the guest OS is used. It is also intended that the
invention applies to all computer architectures, not just the
Windows or Xbox architecture. Still further, the invention may be
implemented in or across a plurality of processing chips or
devices, and storage may similarly be effected across a plurality
of devices. Therefore, the invention should not be limited to any
single embodiment, but rather should be construed in breadth and
scope in accordance with the appended claims.
* * * * *