U.S. patent application number 10/273829 was filed with the patent office on 2002-10-17 and published on 2003-06-05 for a system and method for caching DRAM using an egress buffer. This patent application is currently assigned to Sun Microsystems, Inc. The invention is credited to Leslie D. Kohn and Michael K. Wong.
Publication Number | 20030105907 |
Application Number | 10/273829 |
Document ID | / |
Family ID | 23354532 |
Publication Date | 2003-06-05 |
United States Patent
Application |
20030105907 |
Kind Code |
A1 |
Kohn, Leslie D.; et al. |
June 5, 2003 |
System and method for caching DRAM using an egress buffer
Abstract
A system and method include a server that includes a processor
and a memory system that are coupled to a bus system. A
network interface is coupled to the processor, and an egress buffer
is coupled to the processor and the network interface by an egress
bus.
Inventors: |
Kohn, Leslie D.; (Fremont, CA); Wong, Michael K.; (San Mateo, CA) |
Correspondence
Address: |
MARTINE & PENILLA, LLP
710 LAKEWAY DRIVE
SUITE 170
SUNNYVALE
CA
94085
US
|
Assignee: |
Sun Microsystems, Inc.
Santa Clara
CA
|
Family ID: |
23354532 |
Appl. No.: |
10/273829 |
Filed: |
October 17, 2002 |
Related U.S. Patent Documents
Application Number | 60/345,315 |
Filing Date | Oct 22, 2001 |
Current U.S.
Class: |
710/305 ;
710/54 |
Current CPC
Class: |
G06F 9/30007 20130101;
G09G 2352/00 20130101; G06F 1/3203 20130101; H04L 45/745 20130101;
G09G 2360/121 20130101; G06F 1/3275 20130101; H04L 49/9089
20130101; G06F 9/3879 20130101; G06F 12/0811 20130101; G06F 13/1689
20130101; G06F 3/1423 20130101; H04L 69/22 20130101; H04L 47/2441
20130101; G06F 1/3225 20130101; G09G 2370/022 20130101; H04L 9/40
20220501; Y02D 10/00 20180101; G06F 9/3851 20130101; H04L 47/10
20130101; G09G 2370/20 20130101; G06F 9/3891 20130101; G06F 9/30043
20130101; G06F 12/084 20130101; G06F 21/72 20130101; G11C 11/4074
20130101; H04L 49/9057 20130101; H04L 49/90 20130101; G06F 11/108
20130101; G06F 12/0813 20130101 |
Class at
Publication: |
710/305 ;
710/54 |
International
Class: |
G06F 013/14 |
Claims
What is claimed is:
1. A server comprising: a processor coupled to a bus system; a
memory system coupled to the bus system; a network interface
coupled to the processor; and an egress buffer coupled to the
processor and the network interface by an egress bus.
2. The server of claim 1, wherein the processor includes a
plurality of processors.
3. The server of claim 2, wherein the plurality of processors are
included on a first die.
4. The server of claim 2, wherein the plurality of processors are
included on a plurality of dies.
5. The server of claim 1, wherein the egress buffer includes high
speed random access memory.
6. The server of claim 1, wherein the egress buffer includes random
access memory that has an operating speed of about 400 MHz.
7. The server of claim 1, wherein the egress buffer and the egress
bus have a data throughput rate that is greater than or equal to
about twice the amount of a data stream to be served.
8. The server of claim 1, wherein the egress buffer includes a
double data rate buffer.
9. The server of claim 1, wherein the egress bus has a bandwidth
that is greater than or equal to about twice the amount of a data
stream to be served.
10. The server of claim 1, wherein the egress bus includes a 32-bit
data bus.
11. A method of serving data comprising: receiving a request for
data in a processor in a server; retrieving the requested data;
processing the retrieved data in the processor; storing the
processed data in an egress buffer that is coupled to the processor
and a network interface; and serving the stored data from the
egress buffer through the network interface.
12. The method of claim 11, wherein the egress buffer is coupled to
the processor and the network interface by an egress bus.
13. The method of claim 11, wherein the requested data includes a
data stream.
14. The method of claim 13, wherein the egress bus has a bandwidth
of about twice a bandwidth of the data stream.
15. The method of claim 13, wherein the egress bus includes a
32-bit data bus.
16. The method of claim 11, wherein the processed data is stored in
the egress buffer substantially simultaneously with the stored data
being served from the egress buffer.
17. The method of claim 11, wherein processing the retrieved data
in the processor includes at least one of a group consisting of
formatting the data, encrypting the data, and decrypting the
data.
18. A method of serving a data stream comprising: receiving a
request for a data stream in a processor in a server; retrieving
the requested data stream; processing the retrieved data stream in
the processor; storing the processed data stream in an egress
buffer that is coupled to the processor and a network interface by
an egress bus having a bandwidth that is greater than or equal to
about twice the data stream; and serving the stored data stream
from the egress buffer through the network interface.
19. The method of claim 18, wherein the data stream includes at
least one of a group consisting of audio and video.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority from U.S. Provisional
Patent Application No. 60/345,315 filed on Oct. 22, 2001 and
entitled "High Performance Web Server," which is incorporated
herein by reference in its entirety for all purposes.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The present invention relates generally to microprocessors,
and more particularly, to methods and systems for microprocessors
to serve data from memory systems.
[0004] 2. Description of the Related Art
[0005] A typical server computer, such as a web server, has one
main memory. A server serves data from the memory to a client
computer that requested the data. FIG. 1 shows a typical web server
102 and client computer 110 that are linked by a network 104, such
as the Internet or other network. FIG. 2 is a high-level block
diagram of a typical web server 102. As shown, the web server 102
includes a processor 202, a memory system 203 that includes a ROM
204, a main memory DRAM 206 and a mass storage device 210, each
connected by a peripheral bus system 208. The peripheral bus system
208 may include one or more buses connected to each other through
various bridges, controllers and/or adapters, such as are well
known in the art. For example, the peripheral bus system 208 may
include a "system bus" that is connected through an adapter to one
or more expansion buses, such as a Peripheral Component
Interconnect (PCI) bus. Also coupled to the peripheral bus system
208 are a network interface 212, a number (N) of input/output (I/O)
devices 216-1 through 216-N and a peripheral cryptographic
processor 220. The I/O devices 216-1 through 216-N may include, for
example, a keyboard, a pointing device, a display device and/or
other conventional I/O devices. Mass storage device 210 may include
any suitable device for storing large volumes of data, such as a
magnetic disk or tape, magneto-optical (MO) storage device, or any
of various types of Digital Versatile Disk (DVD) or Compact Disk
(CD) based storage.
[0006] Network interface 212 provides data communication between
the computer system and other computer systems on the network 104.
Hence, network interface 212 may be any device suitable for or
enabling the web server 102 to communicate data with a remote
processing system (e.g., client computer 110) over a data
communication link, such as a conventional telephone modem, an
Integrated Services Digital Network (ISDN) adapter, a Digital
Subscriber Line (DSL) adapter, a cable modem, a satellite
transceiver, an Ethernet adapter, or the like. The web server
102 typically processes large quantities of data, for example,
streaming data such as streaming video or streaming audio, or other
data such as a website and other web content. FIG. 3 is a
flowchart of the method operations 300 of the web server 102
serving a large volume of data such as a 10 MB data stream. In
operation 305, the web server 102 receives a request for the 10 MB
data stream from the client 110. If the 10 MB data stream requires
processing, such as encryption, then the 10 MB data stream must
first be retrieved from the DRAM 206 into the processor 202. In
operation 310, the data stream is retrieved from the DRAM 206
and/or other portions of the memory system 203.
[0007] In operation 315, the data stream is processed in the
processor 202. In operation 320, the processed data stream is
stored in the DRAM 206. In operation 325, the data stream is served
through the network interface 212 to the network 104 to the client
110.
[0008] The processed data stream must be stored in the memory
system 203 because the processor 202 and the network interface 212
typically have different data processing rates. By way of example,
the processor 202 can process data at a rate of about 2 GHz or even
greater. The peripheral bus system 208 typically operates at about
166 MHz; therefore, the network interface 212 typically does not
operate as fast as 2 GHz and cannot serve the data as fast as the
processor can process the data. As a result, the processed data must
be temporarily stored in the memory system 203 so that the network
interface 212 can serve the processed data at the optimal rate for
the network interface 212. Alternatively, the network interface 212
may be able to output data faster than the processor can process
the data; in that case, the processed data can build up in the
memory system 203 and the network interface 212 can serve the data
from the memory system 203 at a high rate.
[0009] As described in FIG. 3 above, the 10 MB data stream must
transfer across the peripheral bus system 208 between the DRAM 206
and the processor 202 three times. Therefore, serving a 10 MB data
stream results in 30 MB of data flowing between the DRAM 206 and
the processor 202. These multiple passes between the DRAM 206 and
the processor 202 consume a large portion of the total I/O
bandwidth of the processor 202, which can limit the ability of the
processor 202 to perform other operations besides serving the 10 MB
data stream.
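The bandwidth arithmetic in this paragraph can be checked with a short sketch. This is illustrative Python only; the 10 MB stream size and the three bus passes come from the text above, and all variable names are hypothetical:

```python
# Each served stream crosses the DRAM-to-processor interface three times:
# retrieve raw data, store the processed data, and read it back for serving.
stream_mb = 10
passes_without_egress_buffer = 3
traffic_without = stream_mb * passes_without_egress_buffer  # 30 MB of bus traffic

# With an egress buffer, the processed data bypasses the DRAM,
# leaving only the initial retrieval pass.
passes_with_egress_buffer = 1
traffic_with = stream_mb * passes_with_egress_buffer  # 10 MB of bus traffic

reduction = 1 - traffic_with / traffic_without
print(traffic_without, traffic_with, f"{reduction:.0%}")  # 30 10 67%
```

The roughly two-thirds reduction matches the figure the summary of the invention gives for the egress buffer.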
[0010] What is needed is a system and method to reduce the
bandwidth usage of the processor to memory system interface.
SUMMARY OF THE INVENTION
[0011] Broadly speaking, the present invention fills these needs by
providing a system and method for caching DRAM to reduce the
bandwidth usage of the processor to memory system interface. It should be
appreciated that the present invention can be implemented in
numerous ways, including as a process, an apparatus, a system,
computer readable media, or a device. Several inventive embodiments
of the present invention are described below.
[0012] One embodiment includes a server that includes a processor
and a memory system that are coupled to a bus system. A
network interface is coupled to the processor, and an egress buffer
is coupled to the processor and the network interface by an egress
bus.
[0013] The processor can also include multiple processors. The
multiple processors can be included on a first die or chip.
Alternatively, the multiple processors can be included on multiple
separate dies or chips.
[0014] The egress buffer can include a high-speed random access
memory. In one embodiment, the egress buffer includes random access
memory that has an operating speed of about 400 MHz.
[0015] The egress buffer and the egress bus can have a data
throughput rate that is greater than or equal to about twice the
amount of a data stream to be served.
[0016] The egress buffer can also include a double data rate
buffer.
[0018] The egress bus can have a bandwidth that is greater than or
equal to about twice the amount of a data stream to be served. The
egress bus can also include a 32-bit data bus.
[0019] One embodiment includes a system and method of serving data
that includes receiving a request for data in a processor in a
server. The requested data is retrieved. The retrieved data is
processed in the processor. The processed data is stored in an
egress buffer that is coupled to the processor and a network
interface. The stored data is served from the egress buffer through
the network interface.
[0020] The egress buffer is coupled to the processor and the
network interface by an egress bus.
[0021] The requested data can include a data stream.
[0022] The egress bus can have a bandwidth of about twice the
bandwidth of the data stream.
[0023] The egress bus can include a 32-bit data bus.
[0024] The processed data can be stored in the egress buffer
substantially simultaneously with the stored data being served from
the egress buffer.
[0025] Processing the retrieved data in the processor can include
formatting the data, encrypting the data, or decrypting the data,
among other processes.
[0026] Another embodiment includes a system and method of serving a
data stream that includes receiving a request for a data stream in
a processor in a server. The requested data stream is retrieved.
The retrieved data stream is processed in the processor. The
processed data stream is stored in an egress buffer that is coupled
to the processor and a network interface by an egress bus. The
egress bus has a bandwidth that is greater than or equal to about
twice the bandwidth of the data stream. The stored data stream is
served from the egress buffer through the network interface. The
data stream can include audio, video, or any other streaming media.
[0027] Other aspects and advantages of the invention will become
apparent from the following detailed description, taken in
conjunction with the accompanying drawings, illustrating by way of
example the principles of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0028] The present invention will be readily understood by the
following detailed description in conjunction with the accompanying
drawings, in which like reference numerals designate like structural
elements.
[0029] FIG. 1 shows a typical web server and client computer that
are linked by a network, such as the Internet or other network.
[0030] FIG. 2 is a high-level block diagram of a typical web
server.
[0031] FIG. 3 is a flowchart of the method operations of the web
server serving a large volume of data such as a 10 MB data
stream.
[0032] FIG. 4 shows a block diagram of a server in accordance with
one embodiment of the present invention.
[0033] FIG. 5 is a flow chart of the method operations of serving
data using an egress buffer in accordance with one embodiment of
the present invention.
[0034] FIG. 6 shows a block diagram of a processor according to one
embodiment of the present invention.
DETAILED DESCRIPTION OF THE EXEMPLARY EMBODIMENTS
[0035] Several exemplary embodiments for caching DRAM to reduce the
bandwidth usage of the processor to memory system interface will
now be described. It will be apparent to those skilled in the art
that the present invention may be practiced without some or all of
the specific details set forth herein.
[0036] One embodiment of the present invention includes an egress
buffer that can be used to temporarily store processed data from
the processor that will be served by the network interface. The
egress buffer thereby reduces the demand on the bandwidth usage of
the processor to memory system interface by about two-thirds.
[0037] FIG. 4 shows a block diagram of a server 400 in accordance
with one embodiment of the present invention. The server 400 can be
a web server or other type of server. The server 400 includes a bus
system 408 that couples a processor 402 and a memory system 404.
The processor 402 includes at least one processor core 402A. The
server 400 also includes an egress buffer 420 that is coupled to
the processor 402 and a network interface 412.
[0038] The egress buffer 420 is coupled to the processor 402 and
the network interface 412 via a dedicated egress bus 422. The
egress bus 422 can be as wide as necessary, for example, the egress
bus 422 can be 32-bits (i.e., lines) wide but the egress bus 422
could be narrower or wider such as 16-bits or 64-bits. The egress
buffer 420 can be large enough to buffer the desired data
throughput of the network interface 412 as will be described in
more detail below. Referring to the above example of a 10 gigabit
data throughput, the egress buffer 420 would need to be 32
megabytes or possibly larger.
[0039] In one embodiment, the egress buffer 420 includes a very
high-speed RAM, such as a fast cycle time RAM (FCRAM), that
operates as fast as about 400 MHz or more. The FCRAM allows the egress
buffer 420 to serve the data across the egress bus 422 to the
network interface 412 at the speed of the network interface
412.
[0040] In one embodiment, the server 400 can include multiple
processors on multiple processor chips or dies. The egress bus 422
can also couple a single egress buffer 420 to all of the multiple
processors. An egress bus controller can be included to manage the
data flow between the multiple processors and the egress buffer
420.
[0041] FIG. 5 is a flow chart of the method operations 500 of
serving data using an egress buffer in accordance with one
embodiment of the present invention. In operation 505, a request
for data is received in the server 400. The request can be from an
application within the server 400 or due to a request received from
an external data requester, such as a client computer 110 in FIG. 1
that is linked to the server 400 by a network.
[0042] The processor 402 retrieves the requested data, in operation
510. The data can be retrieved from numerous sources such as from
the memory system 404 or other sources via the system data bus 408.
In operation 515, the processor 402 processes the retrieved data
such as packetizing the data or performing some other formatting,
encryption, decryption, or other processing to the retrieved
data.
[0043] The processed data is stored in the egress buffer 420 via
the egress bus 422, in operation 520. In operation 525, the network
interface 412, 412' retrieves the processed data from the egress
buffer 420 via the egress bus 422 and serves the data to the data
requester.
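As an illustration only, the method operations 500 can be modeled in software as a small sketch. The patent describes a hardware egress buffer, so this Python queue merely stands in for it, and all class and variable names are hypothetical:

```python
from collections import deque

class EgressBufferServer:
    """Software model of FIG. 5: retrieve, process, stage, serve."""

    def __init__(self, memory):
        self.memory = memory          # stands in for memory system 404
        self.egress_buffer = deque()  # stands in for egress buffer 420

    def handle_request(self, key):
        data = self.memory[key]               # operation 510: retrieve the data
        processed = data.upper()              # operation 515: process (e.g., format/encrypt)
        self.egress_buffer.append(processed)  # operation 520: store in the egress buffer
        return self.serve()                   # operation 525: serve from the buffer

    def serve(self):
        # The network interface drains the buffer at its own rate.
        return self.egress_buffer.popleft()

server = EgressBufferServer({"clip": "audio-bytes"})
print(server.handle_request("clip"))  # prints AUDIO-BYTES
```

In the hardware described by the patent, operations 520 and 525 can overlap (data is written into the buffer while earlier data is being served), which this sequential sketch does not capture.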
[0044] FIG. 6 shows a block diagram of a processor 402' according
to one embodiment of the present invention. The processor 402'
includes a processor core 402A' and an integrated network interface
412'. Because the integrated network interface 412' is included on
the processor die 402' with the processor core 402A', the network
interface 412' can output data faster than the network interface
412 described in FIG. 4 above.
[0045] In one embodiment, a dedicated bus 422A couples the
processor core 402A' to the egress bus 422, through a process data
switch 430. The process data switch 430 is also coupled to the
network interface 412' via a bus 422B. Alternatively, the network
interface 412' can be coupled to the egress buffer 420 by a
separate, dedicated bus. The process data switch 430 directs the
data from the processor core 402A' to the egress buffer 420 or the
memory system 404 and controls the data flow across the egress bus
422 so that the data flows either to the network interface 412' or
from the processor core 402A'.
[0046] In alternative embodiments, the egress bus 422 can also link
other components on the processor die to the egress buffer 420.
[0047] In one embodiment, the egress buffer 420 is an about 400
MHz, double data rate (DDR) buffer. When combined with a 32-bit
wide egress bus 422, a 400 MHz DDR buffer yields an effective 800
MHz x 32-bit egress bus 422, producing 3.2 GB per second of
throughput with a relatively small actual buffer of only two or
four bits per 32-bit line of the egress bus 422. The 3.2 GB per
second throughput of the egress bus 422 and egress buffer 420
equates to more than 24 gigabits per second. A 24 gigabit per
second egress buffer 420 can support two 10-gigabit-per-second data
streams: a first 10 gigabit data stream is input to the egress
buffer 420 while a second 10 gigabit data stream is output from the
egress buffer 420 to the network interface 412, 412'. The speed of
the egress buffer 420 memory must be sufficient to support the data
demand rate of the network interface 412, 412'.
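The throughput figures in the paragraph above follow from simple arithmetic, sketched here in illustrative Python. The doubling reflects DDR's transfer on both clock edges; the variable names are hypothetical:

```python
clock_mhz = 400      # egress buffer 420 clock rate
ddr_factor = 2       # double data rate: one transfer on each clock edge
bus_width_bits = 32  # egress bus 422 width

# 400 MHz DDR behaves like an 800 MHz single-data-rate bus.
effective_mhz = clock_mhz * ddr_factor

# 800 million transfers/s x 4 bytes per transfer = 3.2 GB/s.
bytes_per_sec = effective_mhz * 1_000_000 * bus_width_bits // 8
gb_per_sec = bytes_per_sec / 1e9
gbit_per_sec = gb_per_sec * 8  # more than 24 Gb/s

# That headroom covers two 10 Gb/s streams: one written into the
# egress buffer while another is read out to the network interface.
print(gb_per_sec, gbit_per_sec)  # 3.2 25.6
```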
[0049] Because the egress buffer 420 is coupled to the processor
core 402A' by the dedicated egress bus 422, the egress bus 422 can
deliver the data much more quickly than a shared data bus such as
the I/O interface 432 between the memory system 404 and the
processor core 402A'. Further, because the egress buffer 420 uses a
much higher speed type of RAM (e.g., FCRAM), the egress buffer 420
can serve the data faster than standard DRAM.
[0050] The egress buffer 420 can also substantially smooth out the
mismatch between the data processing rate of the processor core
402A and the rate at which the network interface 412 can serve the
data. The difference in rates (i.e., the transient variation) can
vary as the processor performs other operations or as the network
becomes busy and reduces the rate at which the network interface
412 can serve the data. The amount of transient variation that can
be absorbed increases as the size of the egress buffer 420
increases.
[0051] The egress buffer 420 FCRAM can operate in any range from
about 100 MHz or even slower to about 400 MHz or greater. The
higher the speed of the egress buffer 420, the greater the
efficiency of the processor serving the data to the network
interface. However, even a lower speed egress buffer 420 FCRAM
would increase efficiency by reducing the demand across the system
bus 408, and specifically across the interface between the memory
system 404 and the processor 402.
[0052] The egress buffer 420 could be within a single die or chip
with the processor 402. However, the egress buffer 420 typically
would not be part of the processor die because the physical size of
the memory is relatively large compared to the size of the
microprocessor devices in the processor 402, and therefore
including the egress buffer is not an efficient use of space on the
processor die.
[0053] The network interface 412, 412' can have any bandwidth, such
as about 4 gigabits per second or about 10 gigabits per second.
The network interface 412, 412' has direct access to the egress
buffer 420 via the dedicated egress bus 422.
[0054] As used herein the term "about" means +/-10%. By way of
example, the phrase "about 250" indicates a range of between 225
and 275.
[0055] With the above embodiments in mind, it should be understood
that the invention might employ various computer-implemented
operations involving data stored in computer systems. These
operations are those requiring physical manipulation of physical
quantities. Usually, though not necessarily, these quantities take
the form of electrical or magnetic signals capable of being stored,
transferred, combined, compared, and otherwise manipulated.
Further, the manipulations performed are often referred to in
terms, such as producing, identifying, determining, or
comparing.
[0056] Any of the operations described herein that form part of the
invention are useful machine operations. The invention also relates
to a device or an apparatus for performing these operations. The
apparatus may be specially constructed for the required purposes,
or it may be a general-purpose computer selectively activated or
configured by a computer program stored in the computer. In
particular, various general-purpose machines may be used with
computer programs written in accordance with the teachings herein,
or it may be more convenient to construct a more specialized
apparatus to perform the required operations.
[0057] The invention can also be embodied as computer readable code
on a computer readable medium. The computer readable medium is any
data storage device that can store data, which can thereafter be
read by a computer system. Examples of the computer readable medium
include hard drives, network attached storage (NAS), read-only
memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic
tapes, and other optical and non-optical data storage devices. The
computer readable medium can also be distributed over a network
coupled computer systems so that the computer readable code is
stored and executed in a distributed fashion.
[0058] It will be further appreciated that the instructions
represented by the operations in FIG. 5 are not required to be
performed in the order illustrated, and that all the processing
represented by the operations may not be necessary to practice the
invention. Further, the processes described in FIG. 5 can also be
implemented in software stored in any one of or combinations of the
RAM, the ROM, or the hard disk drive.
[0059] Although the foregoing invention has been described in some
detail for purposes of clarity of understanding, it will be
apparent that certain changes and modifications may be practiced
within the scope of the appended claims. Accordingly, the present
embodiments are to be considered as illustrative and not
restrictive, and the invention is not to be limited to the details
given herein, but may be modified within the scope and equivalents
of the appended claims.
* * * * *