U.S. patent application number 12/106437 was filed with the patent office on 2008-04-21 and published on 2009-04-30 for an integrated circuit with DMA module for loading portions of code to a code memory for execution by a host processor that controls a video decoder.
Invention is credited to Aniruddha Sane, Manoj Kumar Vajhallya.
Application Number | 20090113177 12/106437 |
Family ID | 32302528 |
Filed Date | 2008-04-21 |
United States Patent
Application |
20090113177 |
Kind Code |
A1 |
Sane; Aniruddha ; et al. |
April 30, 2009 |
INTEGRATED CIRCUIT WITH DMA MODULE FOR LOADING PORTIONS OF CODE TO
A CODE MEMORY FOR EXECUTION BY A HOST PROCESSOR THAT CONTROLS A
VIDEO DECODER
Abstract
A system, method, and apparatus for dynamically booting
processor code memory with a wait instruction is presented herein.
A wait instruction precedes the transfer of a new code portion to
the code memory. The wait instruction causes the processor to
temporarily cease using the code memory. When the processor ceases
using the code memory, the processor signals a direct memory access
(DMA) module to transfer a new code portion to the code memory. The
DMA module transfers the new code portion to the code memory and
transmits a signal to the processor when the transfer is completed.
The signal causes the processor to resume. When the processor
resumes, the processor begins executing the instructions at the
next code address.
Inventors: |
Sane; Aniruddha; (US)
; Vajhallya; Manoj Kumar; (US) |
Correspondence
Address: |
Christopher C. Winslade, Esq.;McANDREWS, HELD & MALLOY, LTD.
34th Floor, 500 West Madison Street
Chicago
IL
60661
US
|
Family ID: |
32302528 |
Appl. No.: |
12/106437 |
Filed: |
April 21, 2008 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10411632 | Apr 11, 2003 | 7380114 |
12106437 | | |
60426583 | Nov 15, 2002 | |
Current U.S.
Class: |
712/205 ;
712/220; 712/E9.016 |
Current CPC
Class: |
G06F 9/3802 20130101;
G06F 9/30079 20130101; G06F 9/3867 20130101; G06F 9/4401
20130101 |
Class at
Publication: |
712/205 ;
712/220; 712/E09.016 |
International
Class: |
G06F 9/30 20060101
G06F009/30 |
Claims
1. A method for executing a program, said method comprising:
executing a portion of the program; and executing an instruction
responsive to executing the portion of the program, wherein
execution of the instruction causes cessation of program execution
until another portion of the program is available for
execution.
2. The method of claim 1, further comprising: executing the another
portion of the program.
3. The method of claim 2, wherein executing the another portion of
the program further comprises: receiving an indication that the
another portion of the program is available for execution; and
executing at least one instruction from the another portion of the
program responsive to receiving the indication.
4. A method for executing instructions, said method comprising:
fetching a first instruction; decoding the first instruction; and
waiting until a portion of a program is available for processing
before fetching a second instruction, wherein the first instruction
is a wait instruction.
5. The method of claim 4, further comprising: transmitting a first
signal, wherein the first instruction is a wait instruction.
6. The method of claim 5, further comprising: transferring the
portion of the program responsive to transmitting the first
signal.
7. The method of claim 4, further comprising: receiving a signal
indicating the portion of the program is available for
execution.
8. A circuit for executing a program, said circuit comprising: a
code memory for storing a portion of the program and a particular
instruction; a processor for executing the portion of the program
and the particular instruction; and wherein execution of the
instruction by the processor causes cessation of program execution
until another portion of the program is stored in the code
memory.
9. The circuit of claim 8, wherein the processor executes the
another portion of the program responsive to storage of the another
portion of the program in code memory.
10. The circuit of claim 8, further comprising: a direct memory
access module for loading the code memory with the another portion
of the program.
11. The circuit of claim 10, wherein the direct memory access
module transmits an indication that the another portion of the
program is stored in the code memory and wherein the processor
executes at least one instruction from the another portion of the
program responsive to receiving the indication.
12. The circuit of claim 9, wherein the processor further
comprises: a fetch stage for fetching the particular instruction
from the code memory; a decode stage for decoding the particular
instruction; an execution stage for executing the particular
instruction; and wherein the fetch stage waits until the another
particular portion of the program is stored in the code memory
before fetching a second instruction, responsive to the execution
stage executing the particular instruction.
13. A processor for executing instructions, said processor
comprising: a fetch stage connected to a code memory; a decode
stage connected to the fetch stage; an execution stage connected to
the decode stage; and a link connecting the execution stage to the
fetch stage, wherein the execution stage transmits a signal over
the link causing the fetch stage to cease fetching instructions
from the code memory.
Description
RELATED APPLICATIONS
[0001] This application is a continuation of U.S. patent
application Ser. No. 10/411,632, "Integrated Circuit With DMA
Module For Loading Portions of Code To A Code Memory For Execution
By A Host Processor That Controls A Video Decoder", 14144US02,
filed Apr. 11, 2003, and claims the priority to U.S. Provisional
Application for Patent Ser. No. 60/426,583, "Dynamic Booting of
Processor Code Memory using Special Wait Instruction", 14144US01,
filed Nov. 15, 2002, by Sane, et al.
FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
[0002] [Not Applicable]
MICROFICHE/COPYRIGHT REFERENCE
[0003] [Not Applicable]
BACKGROUND OF THE INVENTION
[0004] As applications of embedded processors become more complex,
the size of code for such applications is increasing, thereby
increasing the size of processor code memory. However, increasing
the size of the processor code memory is expensive and is also an
inefficient use of chip real estate.
[0005] Some processors solve this problem by using a cache in place
of the code memory. The cache stores only a portion of the code for
an application at any given time. When the code address points to a
code that is not in the cache at any particular point of time, a
cache miss occurs. When a cache miss occurs, the new code is
fetched into the code memory from system memory (such as DRAM). The
new code replaces some of the existing code, in most cases the Least
Recently Used (LRU) code.
[0006] Caching portions of the application code is expensive
because special hardware is required for detecting cache misses,
for translating cache misses into correct system memory accesses,
and for deciding which code to replace.
[0007] Another possible solution would be to keep the processor
under reset during the time new code is loaded into the code
memory. However, resetting the processor erases all the information
stored in the general purpose registers within the processor.
Accordingly, a swap routine is used to copy the registers to the
DRAM prior to a reset. The foregoing is disadvantageous because the
swap routine resides in and consumes a significant amount of the
code memory. In addition to the code space, time is also spent for
swapping.
BRIEF SUMMARY OF THE INVENTION
[0008] The present invention is directed to dynamically booting
processor code memory using a special wait instruction. A wait
instruction precedes the transfer of a new code portion to the code
memory. The wait instruction causes the processor to temporarily
cease using the code memory. When the processor ceases using the
code memory, the processor signals a direct memory access (DMA)
module to transfer a new code portion to the code memory. The DMA
module transfers the new code portion to the code memory and
transmits a signal to the processor when the transfer is completed.
The signal causes the processor to resume. When the processor
resumes, the processor begins executing the instructions at the
next code address.
[0009] The present invention is also directed to a scheme for
executing a program wherein the processor executes a portion of the
program. When a portion of code that is not currently in the code
memory is required, the processor instructs the DMA to fetch the
necessary code from the system memory and then executes a wait
instruction. Execution of the wait instruction causes the processor
to cease execution of the program until the next portion is
retrieved and provided to the processor.
[0010] These and other advantages and novel features of the present
invention, as well as details of illustrated embodiments thereof,
will be more fully understood from the following description and
drawings.
BRIEF DESCRIPTION OF SEVERAL VIEWS OF THE DRAWINGS
[0011] FIG. 1 is a flow diagram for executing a program in
accordance with an embodiment of the present invention;
[0012] FIG. 2 is a block diagram of an exemplary circuit in
accordance with an embodiment of the present invention;
[0013] FIG. 3 is a block diagram of an exemplary processor in
accordance with an embodiment of the present invention;
[0014] FIG. 4 is a timing diagram describing the operation of the
processor in accordance with an embodiment of the present
invention;
[0015] FIG. 5 is a flow diagram describing the operation of the
processor in accordance with an embodiment of the present
invention; and
[0016] FIG. 6 is a block diagram of an MPEG decoder configured in
accordance with an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0017] Referring now to FIG. 1, there is illustrated a flow diagram
for executing a program in accordance with an embodiment of the
present invention. The program is a sequence of instructions that
can be divided into two or more portions. Initially, the first
portion of the program is available for execution.
[0018] Execution of the program is commenced at 105 by reading
instructions from the first portion of the program until the next
portion of the program (not present in the code memory) is
required. When the next portion of the program to be executed is
not in the code memory, the processor instructs the DMA to fetch that
portion and a WAIT instruction is executed at 115 which halts
reading of instructions in the program until the another portion of
the program is available for execution at 120. When the another
portion of the program is available for execution at 120, the
processor begins executing the another portion of the program by
repeating 105-120.
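By way of illustration only, the flow of FIG. 1 can be sketched in Python as follows. The sketch is not taken from the disclosed embodiment: the function name run_program, the representation of a portion as a list of instruction names, and the log entries are all hypothetical, and the DMA transfer is modeled as a simple copy.

```python
def run_program(portions):
    """Walk through the FIG. 1 flow for a program given as a list of portions.

    Each portion is a list of instruction names (hypothetical representation).
    Returns a log of the events in the order they occur.
    """
    log = []
    code_memory = list(portions[0])          # the first portion is initially available
    next_index = 1
    while True:
        for instruction in code_memory:      # 105: execute the resident portion
            log.append(("execute", instruction))
        if next_index >= len(portions):      # no further portion is required
            break
        log.append(("program_dma", next_index))   # tell the DMA which portion to fetch
        log.append(("wait", next_index))          # 115: WAIT halts instruction reading
        code_memory = list(portions[next_index])  # DMA transfer, modeled as a copy
        log.append(("resume", next_index))        # 120: the portion is now available
        next_index += 1
    return log
```

Calling run_program([["a", "b"], ["c"]]) under these assumptions logs the execution of a and b, the DMA request and the wait for portion 1, the resume, and then the execution of c.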
[0019] Referring now to FIG. 2, there is illustrated a block
diagram of an exemplary circuit for executing a program 203 in
accordance with an embodiment of the present invention. The circuit
comprises a processor 205 for executing instructions, a code memory
210 for storing instructions, a direct memory access (DMA) module 215
for loading the code memory 210 with instructions, and a system
memory 220 for storing the program.
[0020] The processor 205 executes individual instructions stored in
the code memory 210. The program 203 comprises a stream of
instructions. As programs become increasingly complex, the number
of instructions increases. In many cases, the size of the program
203 exceeds the size of the code memory 210. Therefore, the program
203 is divided into two or more portions 203(1) . . . 203(n),
wherein each portion 203(1) . . . 203(n) can be stored in the code
memory 210. Accordingly, one portion of the program 203(1) . . .
203(n) can be stored in the code memory 210 for execution by the
processor 205. When an instruction of the program 203 to be
executed by the processor 205 is in another portion 203(1) . . .
203(n) from the portion stored in the code memory 210, the direct
memory access module 215 transfers the another portion from the
system memory 220.
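The division of the program 203 into portions that each fit in the code memory 210 can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not say how the portions are chosen, so equal-sized, contiguous portions are assumed here, and the names split_into_portions and portion_index are hypothetical. A real partition would also need to leave room in each portion for the instructions that program the DMA module and for the WAIT instruction.

```python
def split_into_portions(program, code_memory_size):
    """Split a list of instructions into contiguous portions.

    Assumption: every portion holds at most code_memory_size instructions,
    so any single portion fits in the code memory.
    """
    if code_memory_size <= 0:
        raise ValueError("code_memory_size must be positive")
    return [
        program[start:start + code_memory_size]
        for start in range(0, len(program), code_memory_size)
    ]


def portion_index(address, code_memory_size):
    """Return the index of the portion that contains a program address."""
    return address // code_memory_size
```

With a ten-instruction program and a four-instruction code memory, this yields three portions of four, four, and two instructions, and the instruction at program address 5 falls in portion 1.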
[0021] The direct memory access module 215 can load the code memory
210 with the another portion 203(1) . . . 203(n), during a time
when the processor 205 is not reading from the code memory 210.
When the instruction of the program 203 to be executed by the
processor 205 is in another portion 203(1) . . . 203(n), the
processor 205 can execute a WAIT instruction which causes the
processor 205 to cease accessing instructions in the code memory 210 until
the direct memory access module 215 loads the code memory 210 with
the another portion 203(1) . . . 203(n). Before executing the WAIT
instruction, the processor executes a set of instructions that tell
the DMA module which code needs to be fetched from the DRAM. When
the direct memory access module 215 loads the code memory 210 with
the another portion 203(1) . . . 203(n), the processor 205 accesses
instructions in the another portion 203(1) . . . 203(n) of the
program.
[0022] When the processor 205 executes the wait instruction, the
processor 205 signals the direct memory access module 215 by
transmitting a "waiting" signal over a link WAIT connecting the
processor 205 to the direct memory access module 215. Responsive
thereto, the direct memory access module 215 begins loading the code
memory 210 with the another
portion 203(1) . . . 203(n) of the program 203.
[0023] After loading the code memory 210 with the another portion
203(1) . . . 203(n), the direct memory access module 215 transmits
a code_download_done signal over a link, code_download_done,
connecting the direct memory access module 215 to the processor.
Upon receiving the code_download_done signal over the link,
code_download_done, the processor 205 resumes executing the
instructions in the code memory 210, now storing instructions from
the another portion 203(1) . . . 203(n).
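The two-signal handshake described above can be modeled in software as follows. This is only an analogy under stated assumptions: the links WAIT and code_download_done are hardware signals in the text, and here each is modeled with a threading.Event. The class DmaModule and the functions program, run_once, processor_wait, and demo are hypothetical names introduced for illustration.

```python
import threading


class DmaModule:
    """Software stand-in for the direct memory access module 215."""

    def __init__(self, system_memory, code_memory):
        self.system_memory = system_memory            # list of portions
        self.code_memory = code_memory                # list overwritten in place
        self.waiting = threading.Event()              # models the WAIT link
        self.code_download_done = threading.Event()   # models code_download_done
        self.requested = None

    def program(self, index):
        """Record which portion to fetch (done before WAIT is executed)."""
        self.requested = index

    def run_once(self):
        """Wait for the waiting signal, load the portion, then signal completion."""
        self.waiting.wait()
        self.code_memory[:] = self.system_memory[self.requested]
        self.code_download_done.set()


def processor_wait(dma):
    """Model execution of WAIT: assert waiting, block until the download is done."""
    dma.code_download_done.clear()
    dma.waiting.set()
    dma.code_download_done.wait()
    dma.waiting.clear()               # deassert WAIT once the download is done


def demo():
    system_memory = [["MOV", "ADD"], ["SUB", "MUL"]]
    code_memory = list(system_memory[0])
    dma = DmaModule(system_memory, code_memory)
    dma.program(1)
    worker = threading.Thread(target=dma.run_once)
    worker.start()
    processor_wait(dma)
    worker.join()
    return code_memory, dma
```

In demo, the code memory initially holds the first portion; after processor_wait returns it holds the second portion, and the waiting signal has been deasserted.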
[0024] Referring now to FIG. 3, there is illustrated a block
diagram of an exemplary processor 205 in accordance with an
embodiment of the present invention. The processor 205 comprises a
pipeline for executing instructions stored in the code memory 210.
The processor 205 executes a sequence of individual instructions
stored in the code memory 210. Execution of the instructions
typically involves multiple phases. For example, in a Reduced
Instruction Set Computing (RISC) architecture, execution of
instructions involves a fetch, decode, execution, memory access,
and register write phase, each consuming a separate clock
cycle.
[0025] Although each instruction can take as many as five clock
cycles to execute, many RISC processors execute close to one
instruction every clock cycle by using a pipeline architecture. The
pipeline typically comprises a fetch stage 310 for the fetch phase,
a decode stage 315 for the decode phase, an execution stage 320 for
the execution phase, a memory access stage 325 for the memory access
phase, and a register write stage 330 for the register write phase.
Each of the foregoing can perform its associated function for an
instruction in one clock cycle.
[0026] By separating the stages, each stage can perform the
associated function for a different instruction, thus allowing the
fetch stage 310 to fetch instruction, n+4, while the decode stage
315 decodes instruction, n+3, the execution stage 320
executes/calculates an address for instruction n+2, the memory
access stage 325 accesses data memory for instruction n+1, and the
register write stage 330 writes to a register for instruction n. At
the next clock cycle, the fetch stage 310 can fetch instruction
n+5, while the decode stage 315 decodes instruction n+4, the
execution stage 320 operates on instruction n+3, the memory access
stage operates on instruction n+2, and the register write stage 330
operates on instruction n+1.
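The stage occupancy described above follows a simple rule when the pipeline never stalls: the stage at depth d holds the instruction fetched d cycles earlier. The following sketch assumes that instruction i is fetched in clock cycle i and that no stalls occur; the names STAGES, stage_occupancy, and total_cycles are hypothetical.

```python
# Stage names in pipeline order, corresponding to stages 310-330.
STAGES = ("fetch", "decode", "execute", "memory_access", "register_write")


def stage_occupancy(cycle):
    """Return which instruction index each stage holds in a given clock cycle.

    Assumes instruction i is fetched in cycle i and that the pipeline never
    stalls. Stages that have not yet received an instruction are omitted.
    """
    return {
        stage: cycle - depth
        for depth, stage in enumerate(STAGES)
        if cycle - depth >= 0
    }


def total_cycles(instruction_count):
    """Cycles needed to complete a number of instructions without stalls."""
    if instruction_count == 0:
        return 0
    return instruction_count + len(STAGES) - 1
```

Taking n = 0, stage_occupancy(4) places instruction 4 in the fetch stage and instruction 0 in the register write stage, and stage_occupancy(5) shifts every instruction one stage forward, matching the two clock cycles described above. Under the same assumptions total_cycles(100) is 104, which is why throughput approaches one instruction per clock cycle.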
[0027] As noted above, one portion of a program 203(1) . . . 203(n)
can be stored in the code memory 210 for execution by the processor
205. When an instruction of the program 203 to be executed by the
processor 205 is in another portion 203(1) . . . 203(n) from the
portion stored in the code memory 210, the processor 205 can
program the DMA to get the required portion of the code from DRAM
and execute a WAIT instruction.
[0028] The WAIT instruction is fetched by the fetch stage 310, and
decoded by the decode stage 315. After the WAIT instruction is
decoded by the decode stage 315, the WAIT instruction is executed
by the execution stage 320. The execution stage 320 executes the
WAIT instruction by sending a signal to the fetch stage 310 via
connection 335 commanding the fetch stage 310 to halt fetching
instructions from the code memory 210 for the duration of the
signal.
[0029] After the execution stage 320 transmits the signal halting
the fetch stage 310, the execution stage 320 signals the direct
memory access module 215 by transmitting a waiting signal over a
link WAIT connecting the processor 205 to the direct memory access
module 215. Responsive thereto, the direct memory access module
begins loading the code memory 210 with the another portion 203(1)
. . . 203(n) of the program 203.
[0030] After loading the code memory 210 with the another portion
203(1) . . . 203(n), the direct memory access module 215 transmits
a code_download_done signal over a link, code_download_done, to the
execution stage 320. Upon receiving the code_download_done signal
over the link, code_download_done, the execution stage 320
deasserts the signal over connection 335. When the execution stage
320 deasserts the signal over connection 335, the fetch stage 310
resumes fetching instructions from the code memory 210.
[0031] Referring now to FIG. 4, there is illustrated a timing
diagram describing the operation of the processor 205 for an
exemplary stream of instructions. The exemplary stream of
instructions is as follows:
Address | Instruction |
0x0 | WAIT |
0x1 | MOV |
0x2 | ADD |
[0032] During clock cycle 0, the fetch stage 310 fetches the
instruction at address 0x0. At clock cycle 1, the fetch stage 310
passes the instruction at address 0x0 to the decode stage 315 and
fetches the instruction at address 0x1. During the clock cycle 1,
the decode stage 315 decodes the instruction received from the
fetch stage. In the present example, the instruction is WAIT.
[0033] During clock cycle 2, the fetch stage 310 fetches the
instruction at address 0x2, and passes the instruction at address
0x1 to the decode stage 315. The decode stage 315 passes the WAIT
instruction to the execution stage 320 and decodes the instruction
received from the fetch stage 310. In the present example, the
instruction is MOV. The execution stage 320 executes the WAIT
instruction by providing the halt signal to the fetch stage 310 via
connection 335 and the signal over the connection, WAIT, connecting
the processor 205 to the direct memory access module 215.
[0034] Responsive thereto, the direct memory access module begins
loading the code memory 210 with the another portion 203(1) . . .
203(n) of the program 203 during cycles 3-6. Additionally, at clock
cycle 3, the instructions already in the pipeline can continue to
progress. For example, the fetch stage 310 can provide the
instruction at address 0x2, ADD, to the decode stage 315 for
decoding. The decode stage 315 can latch the instruction stored
therein during clock cycle 2, MOV, for the execution stage 320 to
be executed after the WAIT instruction is executed.
[0035] At clock cycle 7, the code memory 210 is loaded with the
another portion 203(1) . . . 203(n) and the direct memory access
module 215 transmits a code_download_done signal over a link,
code_download_done, to the execution stage 320. Upon receiving the
code_download_done signal over the link, code_download_done, the
execution stage 320 deasserts the signals over connections WAIT,
and 335. At the next cycle, cycle 8, the fetch stage 310 resumes
fetching instructions from the code memory 210 at address 0x3. The
execution stage 320 executes the instructions that were in the
pipeline at the time the WAIT instruction was decoded, e.g., the
MOV and ADD instructions, during cycles 8 and 9. After the
execution stage 320 executes the instructions that were in the
pipeline at the time the WAIT instruction was decoded, the
execution stage 320 begins executing instructions from the another
portion 203(1) . . . 203(n) of the program 203.
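The timing of FIG. 4 as described in the preceding paragraphs can be reproduced with a small cycle-by-cycle model. This is a simplified sketch, not the disclosed circuit: only the fetch, decode, and execution stages are modeled, the DMA transfer is assumed to occupy four cycles (cycles 3-6) with code_download_done arriving in the following cycle, and the new portion is assumed to be placed at addresses 0x3 and 0x4. The names simulate, NEW_A, and NEW_B are hypothetical.

```python
from collections import deque


def simulate(code_memory, new_portion, dma_cycles=4, total_cycles=12):
    """Return (fetched_at, executed_at): the cycle of each fetch and execution."""
    code_memory = dict(code_memory)   # address -> instruction name
    pc = 0
    to_decode = None                  # instruction handed from fetch to decode
    ready = deque()                   # decoded instructions awaiting execution
    halt = False                      # models the signal on connection 335
    done_cycle = None                 # cycle in which code_download_done arrives
    fetched_at, executed_at = {}, {}
    for cycle in range(total_cycles):
        stalled = halt                # the halt takes effect the cycle after WAIT executes
        # Execution stage: idle while halted.
        if not stalled and ready:
            instruction = ready.popleft()
            executed_at[instruction] = cycle
            if instruction == "WAIT":
                halt = True
                done_cycle = cycle + 1 + dma_cycles
        # Decode stage: instructions already in the pipeline keep progressing.
        if to_decode is not None:
            ready.append(to_decode)
            to_decode = None
        # Fetch stage: halted while the signal is asserted.
        if not stalled and pc in code_memory:
            to_decode = code_memory[pc]
            fetched_at[pc] = cycle
            pc += 1
        # DMA completion: the code memory is loaded and the halt is deasserted.
        if halt and cycle == done_cycle:
            code_memory.update(new_portion)
            halt = False
    return fetched_at, executed_at
```

Run with the instruction stream {0: "WAIT", 1: "MOV", 2: "ADD"} and a new portion {3: "NEW_A", 4: "NEW_B"}, the model executes WAIT in cycle 2, resumes fetching at address 0x3 in cycle 8, and executes MOV and ADD in cycles 8 and 9, consistent with the description of FIG. 4.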
[0036] Referring now to FIG. 5, there is illustrated a flow
diagram for executing an instruction by the processor 205 in
accordance with an embodiment of the present invention. The
processor 205 fetches (505) and decodes (510) an instruction. If at
515, the instruction is not a WAIT instruction, the instruction is
executed and 505 is repeated.
[0037] If at 515, the instruction is a WAIT instruction, the
processor 205 halts fetching instructions (520). At 525, the
processor 205 signals the direct memory access module 215. The
processor 205 then waits until the direct memory access module 215
returns a signal to the processor 205 (525). While the processor
205 is waiting, the direct memory access module 215 can transfer
another portion of the program 203 to the code memory 210. When the
direct memory access module 215 returns the signal to the processor
205, the processor 205 resumes fetching instructions from the code
memory 210, repeating 505.
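At the level of individual instructions, the loop of FIG. 5 can be sketched as follows. The sketch rests on several assumptions: the code memory is modeled as a mapping from address to instruction text, decoding is reduced to reading the first word of the instruction, and the signaling of the direct memory access module together with the wait for its return signal is collapsed into one blocking call to a caller-supplied function. The names run_instructions and dma_load are hypothetical.

```python
def run_instructions(code_memory, dma_load):
    """Fetch, decode, and execute instructions until the code memory runs out.

    code_memory: dict mapping address -> instruction text.
    dma_load:    callable that loads further instructions into code_memory and
                 returns when the transfer is complete (stands in for the
                 signal to the DMA module and its return signal).
    Returns the list of instructions executed, in order.
    """
    pc = 0
    executed = []
    while pc in code_memory:
        instruction = code_memory[pc]        # 505: fetch
        opcode = instruction.split()[0]      # 510: decode
        pc += 1
        if opcode == "WAIT":                 # 515: is it a WAIT instruction?
            # 520: fetching is halted; 525: signal the DMA and wait for it.
            dma_load(code_memory)
            continue                         # resume at the next code address
        executed.append(instruction)         # any other instruction is executed
    return executed
```

For example, if the code memory holds "MOV r1" followed by "WAIT", and dma_load places "ADD r1" and "SUB r2" at the next two addresses, the loop executes MOV, pauses at WAIT while the load takes place, and then executes ADD and SUB.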
[0038] Referring now to FIG. 6, there is illustrated a block
diagram of a decoder configured in accordance with certain aspects
of the present invention. A processor, that may include a CPU 690,
reads the MPEG transport stream 230 into a transport stream buffer
632 within an SDRAM 630. The data is output from the transport
stream presentation buffer 632 and is then passed to a data
transport processor 635. The data transport processor then
demultiplexes the MPEG transport stream into its PES constituents
and passes the audio transport stream to an audio decoder 660 and
the video transport stream to a video transport processor 640 and
then to an MPEG video decoder 645 that decodes the video. The audio
data is sent to the output blocks and the video is sent to a
display engine 650. The display engine 650 is responsible for and
operable to scale the video picture, render the graphics, and
construct the complete display among other functions. Once the
display is ready to be presented, it is passed to a video encoder
655 where it is converted to analog video using an internal digital
to analog converter (DAC). The digital audio is converted to analog
in the audio digital to analog converter (DAC) 665.
[0039] In one embodiment of the invention, various ones of the
aforementioned modules, such as the processor 690, the video
transport processor 640, audio decoder 660, or MPEG video decoder
645 can comprise a processor configured such as processor 205.
[0040] One embodiment of the present invention may be implemented
as a board level product, as a single chip, application specific
integrated circuit (ASIC), or with varying levels integrated on a
single chip with other portions of the system as separate
components. The degree of integration of the monitoring system will
primarily be determined by speed and cost considerations. Because
of the sophisticated nature of modern processors, it is possible to
utilize a commercially available processor, which may be
implemented external to an ASIC implementation of the present
system. Alternatively, if the processor is available as an ASIC
core or logic block, then the commercially available processor can
be implemented as part of an ASIC device with various functions
implemented as firmware.
[0041] While the invention has been described with reference to
certain embodiments, it will be understood by those skilled in the
art that various changes may be made and equivalents may be
substituted without departing from the scope of the invention. In
addition, many modifications may be made to adapt a particular
situation or material to the teachings of the invention without
departing from its scope. Therefore, it is intended that the
invention not be limited to the particular embodiment(s) disclosed,
but that the invention will include all embodiments falling within
the scope of the appended claims.
* * * * *