U.S. patent application number 14/563608 was filed with the patent office on December 8, 2014, and published on 2017-01-05 as publication number 20170003988, for techniques for handling memory accesses by processor-independent executable code in a multi-processor environment.
This patent application is currently assigned to Ravello Systems Ltd. The applicant listed for this patent is Ravello Systems Ltd. Invention is credited to Leonid Shatz.
Application Number | 14/563608 |
Publication Number | 20170003988 |
Family ID | 48744776 |
Publication Date | 2017-01-05 |
United States Patent Application | 20170003988 |
Kind Code | A9 |
Shatz; Leonid | January 5, 2017 |
TECHNIQUES FOR HANDLING MEMORY ACCESSES BY PROCESSOR-INDEPENDENT
EXECUTABLE CODE IN A MULTI-PROCESSOR ENVIRONMENT
Abstract
A method and apparatus for virtual address mapping are provided.
The method includes determining an offset value respective of at
least a first portion of code stored on a code memory unit,
generating a first virtual code respective of the first portion of
code and a second virtual code respective of a second portion of
code stored on the code memory unit; mapping the first virtual code
to a first virtual code address and the second virtual code to a
second virtual code address; generating a first virtual data
respective of the first portion of data and a second virtual data
respective of the second portion of data; and mapping the first
virtual data to a first virtual data address and the second virtual
data to a second virtual data address.
Inventors: | Shatz; Leonid; (Ra'anana, IL) |
Applicant: | Ravello Systems Ltd.; Ra'anana, IL |
Assignee: | Ravello Systems Ltd.; Ra'anana, IL |
Prior Publication: | US 20150095612 A1, April 2, 2015 |
Family ID: | 48744776 |
Appl. No.: | 14/563608 |
Filed: | December 8, 2014 |
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number |
13483633 | May 30, 2012 | 8918608 |
14563608 | | |
61584590 | Jan 9, 2012 | |
Current U.S. Class: | 1/1 |
Current CPC Class: | G06F 12/109 20130101; G06F 2009/45583 20130101; G06F 2212/657 20130101; G06F 12/0882 20130101; G06F 9/45558 20130101; G06F 9/4555 20130101 |
International Class: | G06F 9/455 20060101 G06F009/455; G06F 12/10 20060101 G06F012/10 |
Claims
1. An apparatus for virtual address mapping, comprising: a first
memory unit including a plurality of code portions mapped to a
plurality of respective code virtual address starting points,
wherein each code virtual address starting point of the plurality
of respective code virtual address starting points is set apart
from at least one other code virtual address starting point of the
plurality of respective code virtual address starting points by an
offset of a plurality of offsets; a second memory unit including a
plurality of data portions, each data portion respective of a code
portion of the plurality of code portions, mapped to a plurality of
respective data virtual address starting points, wherein each data
virtual address starting point of the plurality of respective data
virtual address starting points is set apart from at least one
other data virtual address starting point of the plurality of
respective data virtual address starting points by the offset of
the plurality of offsets used to set apart a code virtual address
of the respective code portion; and a memory management unit
configured to map each code portion of the plurality of code
portions to a first memory unit address of the first memory unit,
wherein the memory management unit is further configured to map
each data portion of the plurality of data portions to a second
memory unit address of the second memory unit.
2. The apparatus of claim 1, further comprising: a memory including
the first memory unit and the second memory unit, wherein the first
memory unit address and the second memory unit address are
addresses of the memory.
3. The apparatus of claim 1, wherein each data portion of the
plurality of data portions is accessible by a respective processing
unit via a respective code portion of the plurality of code
portions.
4. The apparatus of claim 3, wherein access to each data portion of
the plurality of data portions is a program counter relative access
respective of the offset of the plurality of offsets used to set
apart a code virtual address of the respective code portion and a
number of the respective processing units by which the plurality of
data portions is accessible.
5. The apparatus of claim 4, wherein an offset value of each offset
of the plurality of offsets is greater than or equal to a length of
the code portion that is set apart by the offset.
6. The apparatus of claim 4, wherein all offset values of offsets
of the plurality of offsets are equal to a length of a longest code
portion of the plurality of code portions.
7. A method for virtual address mapping, comprising: determining an
offset value respective of at least a first portion of code stored
on a code memory unit, wherein the first portion of code is
associated with a first portion of data stored on a data memory
unit and a second portion of code is associated with a second
portion of data stored on the data memory unit; generating a first
virtual code respective of the first portion of code and a second
virtual code respective of the second portion of code stored on the
code memory unit; mapping, by a memory management unit, the first
virtual code to a first virtual code address and the second virtual
code to a second virtual code address; generating a first virtual
data respective of the first portion of data and a second virtual
data respective of the second portion of data; and mapping, by a
memory management unit, the first virtual data to a first virtual
data address and the second virtual data to a second virtual data
address.
8. The method of claim 7, wherein the first virtual code
address has a first virtual code address starting point and the
second virtual code address has a second virtual code address
starting point, wherein the first virtual code address starting
point and the second virtual code address starting point are at
least set apart by the determined offset value.
9. The method of claim 7, wherein the first virtual data address has
a first virtual data address starting point and the second virtual
data address has a second virtual data address starting point,
wherein the first virtual data address starting point and the
second virtual data address starting point are at least set apart
by the determined offset value.
10. The method of claim 7, wherein the offset value is equal to a
length of the first portion of code stored on the code memory
unit.
11. The method of claim 7, wherein the offset value is greater than
a length of the first portion of code stored on the code memory
unit.
12. The method of claim 7, wherein the offset value is equal to an
integer multiplication of a memory page size used for memory
management.
13. The method of claim 7, wherein the offset value is different
for each executed instance of the first portion of code.
14. A non-transitory computer-readable medium having stored thereon
instructions to execute the method according to claim 7.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. patent
application Ser. No. 13/483,633 filed on May 30, 2012, now allowed,
the contents of which are incorporated herein by reference.
TECHNICAL FIELD
[0002] The invention generally relates to virtual machines (VMs),
and more specifically to execution of a guest in a para-virtualized
environment, execution of operating systems on architectures with
virtual memory, and instruction sets with program-counter (PC)
relative addressing.
BACKGROUND
[0003] There are many cases where it is necessary to execute the
exact same executable code on a plurality of processors and to
provide fast memory access to data on a per-processor basis.
According to prior art techniques, the executable code is copied
for each processing unit instance into physical memory in a
different location and executed therefrom, with further allocation
for each processing unit storing data memory. In cache-based
systems, the repeated copying or instancing of the same
instructions into different areas of the memory often results in
waste and thrashing of the cache content, thereby lowering
performance or requiring larger memories.
[0004] In the Intel.RTM. IA-32e and Advanced Micro Devices.RTM.
(AMD) 64-bit processors' architectures, the legacy GS register
(which is used for segmentation addressing in 32-bit mode) is
retained in vestigial form for use as an extra base pointer to
operating system structures in 64-bit addressing space. Fast
access to per-processor structures is possible using the new kernel
GS register and the "swapgs" instruction, which somewhat overcomes
the code-copying problem discussed above. However, if the GS and/or
kernel GS registers are
in use by a guest operating system of a virtual machine (VM), these
registers cannot be used by the hypervisor's code to access the
per-processor structures of the hypervisor itself.
[0005] A guest operating system (or simply "guest") is an operating
system that is installed on a virtual machine in addition to the
host (main) operating system running on the hardware system. A
guest is controlled by a hypervisor. The hypervisor presents to the
guest a virtual operating platform and manages the execution of the
guest. Multiple instances of operating systems may share the
virtualized hardware resources. In full virtualization
architecture, the hypervisor sufficiently simulates the hardware on
which the guest executes, such that no modification is required to
the guest. Another virtualized environment is para-virtualization
in which a software interface is used to allow the handling and
modifying of the guest.
[0006] Regardless of the virtualization environment, or otherwise,
current solutions for supporting execution of the same portions of
code by multiple processors either require copying of the code or
do not allow sharing of the GS registers.
[0007] It would therefore be advantageous to provide a solution
that overcomes the deficiencies of the prior art.
SUMMARY
[0008] Certain embodiments disclosed herein include an apparatus
for virtual address mapping. The apparatus comprises a first memory
unit including a plurality of code portions mapped to a plurality
of respective code virtual address starting points, wherein each
code virtual address starting point of the plurality of respective
code virtual address starting points is set apart from at least one
other code virtual address starting point of the plurality of
respective code virtual address starting points by an offset of a
plurality of offsets; a second memory unit including a plurality of
data portions, each data portion respective of a code portion of
the plurality of code portions, mapped to a plurality of respective
data virtual address starting points, wherein each data virtual
address starting point of the plurality of respective data virtual
address starting points is set apart from at least one other data
virtual address starting point of the plurality of respective data
virtual address starting points by the offset of the plurality of
offsets used to set apart a code virtual address of the respective
code portion; and a memory management unit configured to map each
code portion of the plurality of code portions to a first memory
unit address of the first memory unit, wherein the memory
management unit is further configured to map each data portion of
the plurality of data portions to a second memory unit address of
the second memory unit.
[0009] Certain embodiments disclosed herein also include a method
for virtual address mapping. The method comprises determining an
offset value respective of at least a first portion of code stored
on a code memory unit, wherein the first portion of code is
associated with a first portion of data stored on a data memory
unit and the second portion of code is associated with a second
portion of data stored on the data memory unit; generating a first
virtual code respective of the first portion of code and a second
virtual code respective of a second portion of code stored on the
code memory unit; mapping, by a memory management unit, the first
virtual code to a first virtual code address and the second virtual
code to a second virtual code address; generating a first virtual
data respective of the first portion of data and a second virtual
data respective of the second portion of data; and mapping, by a
memory management unit, the first virtual data to a first virtual
data address and the second virtual data to a second virtual data
address.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] The subject matter that is regarded as the invention is
particularly pointed out and distinctly claimed in the claims at
the conclusion of the specification. The foregoing and other
objects, features, and advantages of the invention will be apparent
from the following detailed description taken in conjunction with
the accompanying drawings.
[0011] FIG. 1 is a schematic diagram of a system having a plurality
of processing units that execute the same code from an instruction
memory;
[0012] FIG. 2 is a schematic diagram of the data memory according
to an embodiment of the invention; and
[0013] FIG. 3 is a diagram showing a single physical copy of
executable code used with respect to two processors while separate
data memory portions are used for the same code according to one
embodiment.
[0014] FIG. 4 is a flowchart illustrating a method for executing a
single physical copy of a portion of independently executable code
by at least two PUs during the access of different data blocks by
each PU according to one embodiment.
DETAILED DESCRIPTION
[0015] The embodiments disclosed herein are only examples of the
many possible advantageous uses and implementations
of the innovative teachings presented herein. In general,
statements made in the specification of the present application do
not necessarily limit any of the various claimed embodiments.
Moreover, some statements may apply to some inventive features but
not to others. In general, unless otherwise indicated, singular
elements may be in plural and vice versa with no loss of
generality. In the drawings, like numerals refer to like parts
throughout the several views.
[0016] In a system where a plurality of processing units may
execute a shared code independently, it is necessary to address
data-related issues. According to various embodiments disclosed
herein, per-processing-unit data can be efficiently addressed in a
program counter (PC) relative (PCR) mode where data is accessed
using a common offset value for each processor. As a result, while
each of the processing units accesses the exact same instruction
code stored in physical memory, each processor accesses a different
area in memory for manipulation of data.
[0017] FIG. 1 depicts a system 100 comprising a plurality of
processing units (PUs) 120-1 through 120-N (hereinafter referred to
collectively as PUs 120, or individually as a PU 120) connected by
a communication bus 110. The communication bus 110 may include, but
is not limited to, a serial communication line, a data bus, a
network, and any combination thereof. The network may be, but is
not limited to, a local area network (LAN), wide area network
(WAN), metro area network (MAN), the Internet, the worldwide web
(WWW), a wired or wireless network, and any combination thereof.
Each of the PUs 120-1 through 120-N may be, but is not limited to, a
CPU, a controller, a microcontroller, a multi-core processor, a
core of a multi-core processor, and the like as well as
instantiations of same in a virtual environment.
[0018] A memory that may be partitioned, virtually or physically,
into an instruction memory 150 and a data memory 140 is connected
to the communication bus 110 (the memory is shown in FIG. 1 as the
partitioned data memory 140 and instruction memory 150). The
instruction memory 150 contains at least a group of a plurality of
instructions that begin at a known address and that are to be
accessed by at least two of the plurality of PUs 120. Typically,
the system 100 comprises logical and physical addresses to access
the instruction memory 150 and the data memory 140.
[0019] In a non-limiting embodiment, the system 100 can operate in
a para-virtualized or fully virtualized mode in which execution of
a plurality of guests, a hypervisor, and a host over the PUs 120 is
allowed. As mentioned above, when a VM of a guest uses the GS
and/or kernel GS registers, these registers cannot be used by the
hypervisor's code to access the per-PU structures of the hypervisor
itself. To execute the exact same executable code by, for example,
two guests, two hosts, or one guest and one host executing on a
plurality of PUs, the GS registers cannot be utilized to provide
fast access to the per-PU structures.
[0020] According to certain embodiments disclosed herein, all
instances of the code are mapped by, for example, a memory mapper
of the host, to the same physical address of the instruction memory
150 of the system. As a result, it is assured that there is only a
single copy of the instructions to be executed. In addition, but
not by way of limitation, the mapping may further prevent cache
overloading for certain types of cache implementations (e.g., a
physically indexed, physically tagged (PIPT) cache) when
used in conjunction with either the instruction memory 150 or the
data memory 140. It should be noted that, while the data memory 140
and the instruction memory 150 are shown as separate memories, it
is possible to have them in the same physical memory but in
different address spaces.
[0021] All access of data in the data memory 140 by the code in the
instruction memory 150 is performed as a PCR access with an offset
value. A basic memory map for the data portion is shown in FIG. 2.
The offset value is large enough to move outside of the memory page
boundaries of the code and is different for each instance executing
on a PU 120. In an embodiment, a calculation of a data address may
be performed as follows:
data_address(m) = [PC(m)] + dataoffset
where virtual memory mappings may be created such that:
PC(m) = PC(1) + pcoffset*(m-1)
and the virtual memory address of the per-PU data block, i.e., the
address of each data block made available to each PU, is calculated
for each PU m as:
data_block(m) = data_block(1) + pcoffset*(m-1)
[0022] where m is an integer having values 1, 2, . . . , N; N is the
maximum number of processing units (PUs) in the system; and PC(m)
is the PC of a respective PU(m). The value of pcoffset must be
larger than the difference between the first instruction's address
that accesses data and the last instruction address that accesses
data for the same code portion. The value of dataoffset determines
the location of a specific data item within a data block pointed to
by [PC(m)]. Typically, this is rounded up to a memory management
unit (MMU) page size integer multiplier. For example, if the code
spans an address space of H'FFFF, then the offset value can be
H'10000, which ensures that the data for each PU 120 will be at a
separate location in the memory while the same code is used.
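The address arithmetic of paragraph [0022] can be sketched as follows. This is a minimal illustration, not part of the patent: the base addresses PC(1) and data_block(1), and the dataoffset value, are assumed example values chosen so that the pcoffset matches the H'10000 example above.

```python
# Hypothetical sketch of the PCR address arithmetic described above.
# pcoffset must exceed the code span (assumed here to be H'FFFF, so
# pcoffset = H'10000 as in the example); the remaining constants are
# invented for illustration.

PCOFFSET = 0x10000    # spacing between per-PU code mappings
DATAOFFSET = 0x40000  # PC-relative distance to the per-PU data block (assumed)

def pc(m, pc1=0x100000):
    """Virtual PC of PU m: PC(m) = PC(1) + pcoffset*(m-1)."""
    return pc1 + PCOFFSET * (m - 1)

def data_block(m, block1=0x140000):
    """Per-PU data block base: data_block(m) = data_block(1) + pcoffset*(m-1)."""
    return block1 + PCOFFSET * (m - 1)

def data_address(m, pc1=0x100000):
    """A PCR data access: data_address(m) = [PC(m)] + dataoffset."""
    return pc(m, pc1) + DATAOFFSET

# Every PU uses the same dataoffset, yet each access lands in that
# PU's own data block because the blocks are set apart by pcoffset:
for m in (1, 2, 3):
    assert data_address(m) == data_block(m)
```

Note how the single shared dataoffset suffices: because the code mappings and the data blocks are shifted by the same pcoffset per PU, the per-PU separation falls out of the PC value alone.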
[0023] The data offset is set for each code instruction as a
difference between the program counter and the data object in a
data memory block associated with a first processing unit such as,
e.g., PU 120-1. This is achieved by the virtual memory mappings as
discussed hereinabove. It should be understood that, according to
an embodiment, all PUs 120 have the same data offset. Furthermore,
the associated per-PU 120 data blocks are set apart from each other
by the same offset as instruction code blocks for each PU 120. With
the memory mappings defined as described hereinabove, each one of
the PUs 120 can access its per-PU 120 data block using a single
physical copy of the instruction code. This holds true for every
code instruction instance having access to per-PU data according to
the principles of this invention. It should be understood that data
offsets may vary from one instruction instance to another. However,
once determined, the data offsets shall remain equal for all PUs
120 relative to the PCR addressing mode.
[0024] An exemplary and non-limiting schematic diagram 300 of a
single physical copy of executable code used with respect to two
processors while separate data memory portions are used for the
same code according to an embodiment is shown in FIG. 3. Two PUs
310 and 320 are shown, each having a respective program counter 312
and 322, wherein the program counters used for the PCR address
access are explained hereinabove. Each of the PUs 310 and 320
accesses, at least in part, the same code portion 350 in a physical
memory 340. Using a memory management scheme, the physical code
(P-Code) 350 is mapped for each of the PUs 310 and 320, to two
different virtual codes (V-codes) 314 and 324, respectively, in a
virtual memory 330, and at a predefined offset 335, as explained in
more detail hereinabove. Specifically, the code is stored in the
code portion 350 of the physical memory 340, which is equivalent to
the instruction memory 150 of FIG. 1, and the data is stored in the
data portions 360 and 370 of the physical memory 340, which is
equivalent to the data memory 140 of FIG. 1 when the memories 140
and 150 are in the same memory. In this way, each of the PUs 310
and 320, by means of its respective program counter 312 or 322,
accesses the same P-Code 350 through the mapping of the respective
V-code 314 or 324. This ensures that a
single copy or instance of the common portion of code is used in
the physical memory 340.
[0025] Using the mapping scheme discussed hereinabove, the PUs 310
and 320 access physical data portions 360 and 370, respectively, of
the physical memory 340. Such access is performed using data
PCR addressing, which is performed through the respective virtual
data (V-data) portions 316 and 326, placed at a distance equal to
the predefined offset 335. Hence, by using the
solution discussed hereinabove, the same code may be used a
plurality of times without having multiple copies thereof, while
the data portions remain separate and accessible by the respective
PU. While the description herein is with respect to two PUs and
their respective instruction and data blocks, such an embodiment is
merely an exemplary embodiment and should not be viewed as limiting
the disclosed embodiments.
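The FIG. 3 arrangement can be modeled as a toy page table in which two virtual code ranges resolve to one physical code page while the corresponding virtual data ranges resolve to distinct physical pages. All page numbers and the page size below are invented for illustration; only the offset discipline comes from the description above.

```python
# A minimal, hypothetical model of the FIG. 3 mapping: one physical
# code page (playing the role of P-Code 350) mapped at two virtual
# addresses (V-codes 314/324) set apart by a fixed offset 335, with
# private physical data pages (360/370) per PU.

PAGE = 0x1000
OFFSET_PAGES = 0x10                        # the predefined offset, in pages

P_CODE, P_DATA = 0x50, [0x60, 0x61]        # assumed physical page numbers
V_CODE_BASE, V_DATA_BASE = 0x100, 0x104    # assumed virtual page numbers

page_table = {}                            # virtual page -> physical page
for pu in range(2):                        # the two PUs (310 and 320)
    page_table[V_CODE_BASE + pu * OFFSET_PAGES] = P_CODE       # shared code
    page_table[V_DATA_BASE + pu * OFFSET_PAGES] = P_DATA[pu]   # private data

def translate(vaddr):
    """Translate a virtual address through the toy page table."""
    return page_table[vaddr // PAGE] * PAGE + vaddr % PAGE

# Both PUs' code mappings resolve to the single physical copy:
assert translate(V_CODE_BASE * PAGE) == translate((V_CODE_BASE + OFFSET_PAGES) * PAGE)
# while their data mappings stay distinct:
assert translate(V_DATA_BASE * PAGE) != translate((V_DATA_BASE + OFFSET_PAGES) * PAGE)
```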
[0026] FIG. 4 shows an exemplary and non-limiting flowchart 400
illustrating a method for executing a single physical copy of a
portion of code executed independently by at least two PUs while
accessing at least two different data blocks, wherein there is one
data block for each PU (e.g., the PUs 120). The method is performed
by at least one of a host operating system or a hypervisor. The
method is typically performed when preparing a portion of code to
be executed in the described environment and thereafter as access
to the physical memory is performed according to the method.
[0027] At S410, an offset value that is larger than or equal to the
length of the portion of the executable code is determined. At
S420, the different virtual addresses of a portion of common code,
to be executed by each PU, are mapped to a single physical address.
The virtual addresses allocated for each PU are set apart from each
other by the offset value determined at S410.
[0028] At S430, the address spaces in the virtual memory of data
blocks respective of each PU that needs to execute the portion of
common code are mapped to physical addresses. The virtual addresses
are set apart by the offset value. The data blocks may be used by
the PUs executing the portion of common code to store and retrieve
data that is different for each PU and therefore cannot be
shared.
[0029] At S440, during execution of the portion of the common code
independently by each PU executing the common code, each such PU
accesses the same copy of the portion of common code in the
physical memory through the mappings of the respective virtual
address. Access to data blocks by each PU executing the portion of
common code is performed using PCR addressing respective of the
virtual addresses of the data and the computed offset, as explained
hereinabove in greater detail.
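The steps S410 through S440 above can be sketched as one setup function plus an access helper. This is a hedged sketch under assumed names: the mapping store (`vmap`), the helper names, and all concrete addresses are invented; only the offset discipline (offset no smaller than the code length, rounded to whole MMU pages, applied identically to code and data mappings) comes from the flowchart description.

```python
# Hypothetical sketch of flowchart 400 (S410-S440).

def setup_mappings(code_len, num_pus, code_phys, data_phys_blocks,
                   v_code_base, dataoffset, page=0x1000):
    # S410: choose an offset at least as large as the code portion,
    # rounded up to a whole number of MMU pages.
    offset = -(-code_len // page) * page
    vmap = {}
    for m in range(num_pus):
        # S420: each PU's virtual code address -> the single physical copy.
        vmap[v_code_base + m * offset] = code_phys
        # S430: each PU's virtual data address -> that PU's physical data
        # block, placed dataoffset past the PU's code mapping.
        vmap[v_code_base + m * offset + dataoffset] = data_phys_blocks[m]
    return vmap, offset

def pcr_access(vmap, pc_value, dataoffset):
    # S440: a PCR data access resolves PC + dataoffset through the map,
    # so the same dataoffset reaches a different block on each PU.
    return vmap[pc_value + dataoffset]

vmap, off = setup_mappings(code_len=0xFFFF, num_pus=2, code_phys="P-Code",
                           data_phys_blocks=["data-0", "data-1"],
                           v_code_base=0x100000, dataoffset=0x40000)
assert off == 0x10000                                    # the H'10000 example
assert pcr_access(vmap, 0x100000, 0x40000) == "data-0"   # PU 1
assert pcr_access(vmap, 0x100000 + off, 0x40000) == "data-1"  # PU 2
```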
[0030] The embodiments disclosed herein may be used in virtual
machines (VMs), and more specifically for execution of a guest in a
para-virtualized environment, and can be also useful for operating
systems running on architectures with virtual memories and
instruction sets with PCR addressing. It should be further noted
that the disclosed embodiments may be used exclusively for
addressing all data and instruction portions; however, this is not
required and the disclosed embodiments can be used in conjunction
with other methods of data and instruction access such that a
portion of the data and instructions are accessed in a PCR mode as
explained hereinabove, and other portions are accessed
differently.
[0031] A person of ordinary skill in the art would recognize that
both physical and virtual instantiations may benefit from the
embodiments disclosed herein. Hence, processing units may be both
physical devices and virtual devices executing on other virtual or
physical devices, to any required depth of hierarchy. Similarly,
the memories may be virtually mapped to physical memories directly,
and virtual memories may be mapped to other virtual memories that
are then mapped to physical memories, again to any required depth
of hierarchy. All such embodiments should be considered an integral
part of the invention.
[0032] The embodiments disclosed herein may be implemented as
hardware, firmware, software, or any combination thereof. Moreover,
the software is preferably implemented as a program, for example
as part of a system program such as, without limitation, an
operating system or hypervisor, tangibly embodied on a program
storage unit or a tangible computer readable medium consisting of
parts, or of certain devices and/or a combination of devices. The
program may be uploaded to, and executed by, a machine comprising
any suitable architecture. Preferably, the machine is implemented
on a computer platform having hardware such as one or more central
processing units ("CPUs") and/or controllers, and/or
microprocessors, and other processing units, a memory, and
input/output interfaces. The memory may be a volatile memory,
non-volatile memory or any combination thereof. The computer
platform may also include an operating system and microinstruction
code. The various processes and functions described herein may be
either part of the microinstruction code or part of the application
program, or any combination thereof, which may be executed by a
CPU, whether or not such computer or processor is explicitly shown.
In addition, various other peripheral units may be connected to the
computer platform such as an additional data storage unit and a
printing unit. All or some of the servers may be combined into one
or more integrated servers. Furthermore, a non-transitory computer
readable medium is any computer readable medium except for a
transitory propagating signal. The display segments and
mini-display segments may be shown on a display area that can be a
browser or any other appropriate application, either generic or
tailored for the purposes described in detail hereinabove.
[0033] All examples and conditional language recited herein are
intended for pedagogical purposes to aid the reader in
understanding the principles of the disclosed embodiments and the
concepts contributed by the inventor to furthering the art, and are
to be construed as being without limitation to such specifically
recited examples and conditions. Moreover, all statements herein
reciting principles, aspects, and embodiments of the disclosed
embodiments, as well as specific examples thereof, are intended to
encompass both structural and functional equivalents thereof.
Additionally, it is intended that such equivalents include both
currently known equivalents as well as equivalents developed in the
future, i.e., any elements developed that perform the same
function, regardless of structure.
* * * * *