U.S. patent application number 13/540406 was filed with the patent office on 2012-07-02 and published on 2012-11-01 for antialiasing system and method.
This patent application is currently assigned to ATI Technologies, Inc. Invention is credited to Raja KODURI, Andrew S. POMIANOWSKI, Arcot J. PREETHAM.
Application Number | 20120274655 13/540406 |
Document ID | / |
Family ID | 36928445 |
Filed Date | 2012-07-02 |
United States Patent
Application |
20120274655 |
Kind Code |
A1 |
PREETHAM; Arcot J. ; et al. |
November 1, 2012 |
ANTIALIASING SYSTEM AND METHOD
Abstract
A system and method for improved antialiasing in video
processing is described herein. Embodiments include multiple video
processors (VPUs) in a system. Each VPU performs some combination
of pixel sampling and pixel center sampling (also referred to as
multisampling and supersampling). Each VPU performs sampling on the
same pixels or pixel centers, but each VPU creates samples
positioned differently from the other VPUs' corresponding samples.
The VPUs each output frame data that has been multisampled and/or
supersampled into a compositor that composites the frame data to
produce an antialiased rendered frame. The antialiased rendered
frame has an effectively doubled antialiasing factor.
Inventors: |
PREETHAM; Arcot J.;
(Sunnyvale, CA) ; POMIANOWSKI; Andrew S.; (Palo
Alto, CA) ; KODURI; Raja; (Santa Clara, CA) |
Assignee: |
ATI Technologies, Inc.
Markham
CA
|
Family ID: |
36928445 |
Appl. No.: |
13/540406 |
Filed: |
July 2, 2012 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
11140156 | May 27, 2005 | 8212838 |
13540406 | | |
Current U.S.
Class: |
345/611 |
Current CPC
Class: |
G09G 5/363 20130101;
G09G 2320/02 20130101; G06T 5/002 20130101; G06T 5/009 20130101;
G09G 2320/0261 20130101; G06T 2200/12 20130101; G06T 2207/10016
20130101 |
Class at
Publication: |
345/611 |
International
Class: |
G09G 5/00 20060101
G09G005/00 |
Claims
1. A video processing apparatus, comprising: a plurality of video
processing units (VPUs), wherein each VPU processes data
corresponding to one or more video frames, including sampling
pixels of the one or more frames to generate a plurality of samples
such that each VPU generates different samples; and an interlink
module that receives the plurality of samples from each VPU and
combines the samples in an output video frame.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. patent
application Ser. No. 11/140,156, filed May 27, 2005, now U.S. Pat.
No. 8,212,838, which is scheduled to issue on Jul. 3, 2012, which
is incorporated by reference as if fully set forth.
[0002] U.S. patent application Ser. No. 11/140,156 is related to
the following United States patent applications:
[0003] Multiple Video Processing Unit (VPU) Memory Mapping, U.S.
application Ser. No. 11/139,917, invented by Philip J. Rogers,
Jeffrey Gongxian Cheng, Dmitry Semiannokov, and Raja Koduri, filed
on May 27, 2005;
[0004] Applying Non-Homogeneous Properties to Multiple Video
Processing Units (VPUs), U.S. application Ser. No. 11/140,163,
invented by Timothy M. Kelley, Jonathan L. Campbell, and David A.
Gotwalt, filed on May 27, 2005;
[0005] Frame Synchronization in Multiple Video Processing Unit
(VPU) Systems, U.S. application Ser. No. 11/140,114, invented by
Raja Koduri, Timothy M. Kelley, and Dominik Behr, filed on May 27,
2005;
[0006] Synchronizing Multiple Cards in Multiple Video Processing
Unit (VPU) Systems, U.S. application Ser. No. 11/139,744, invented
by Syed Athar Hussain, James Hunkins, and Jacques Vallieres, filed
on May 27, 2005;
[0007] Compositing in Multiple Video Processing Unit (VPU) Systems,
U.S. application Ser. No. 11/140,165, invented by James Hunkins and
Raja Koduri, filed on May 27, 2005;
[0008] Dynamic Load Balancing in Multiple Video Processing Unit
(VPU) Systems, U.S. application Ser. No. 11/139,893, invented by
Jonathan L. Campbell and Maurice Ribble, filed on May 27, 2005;
and
[0009] Computing Device with Flexibly Configurable Expansion Slots,
and Method of Operation, U.S. application Ser. No. 11/140,040,
invented by Yaoqiang (George) Xie and Roumen Saltchev, filed May
27, 2005.
[0010] Each of the foregoing applications is incorporated herein by
reference in its entirety.
TECHNICAL FIELD
[0011] The invention is in the field of graphics and video
processing.
BACKGROUND
[0012] Graphics and video processing hardware and software continue
to become more capable, as well as more accessible, each year.
Graphics and video processing circuitry is typically present on an
add-on card in a computer system, but is also found on the
motherboard itself. The graphics processor is responsible for
creating the picture displayed by the monitor. In early text-based
personal computers (PCs) this was a relatively simple task.
However, the complexity of modern graphics-capable operating
systems has dramatically increased the amount of information to be
displayed. In fact, it is now impractical for the graphics
processing to be handled by the main processor, or central
processing unit (CPU) of a system. As a result, the display
activity has typically been handed off to increasingly intelligent
graphics cards which include specialized coprocessors referred to
as graphics processing units (GPUs) or video processing units
(VPUs).
[0013] In theory, very high quality complex video can be produced
by computer systems with known methods. However, as in most
computer systems, quality, speed and complexity are limited by
cost. For example, cost increases when memory requirements and
computational complexity increase. Some systems are created with
much higher than normal cost limits, such as display systems for
military flight simulators. These systems are often entire
one-of-a-kind computer systems produced in very low numbers.
However, producing high quality, complex video at acceptable speeds
can quickly become prohibitively expensive for even "high-end"
consumer-level systems. It is therefore an ongoing challenge to
create VPUs and VPU systems that are affordable for mass
production, but have ever-improved overall quality and
capability.
[0014] Another challenge is to create VPUs and VPU systems that can
deliver affordable, higher quality video, do not require excessive
memory, operate at expected speeds, and are seamlessly compatible
with existing computer systems.
[0015] There are various aspects of video processing that typically
require some trade-off between quality and performance to be made.
One example is correcting for aliasing, usually referred to as
anti-aliasing or "AA". Aliasing is a well known effect created by
the appearance in a displayed frame of artifacts of the rendering
process. Rendering is performed by the VPU, and involves drawing
the pixels to be displayed. Aliasing includes edge aliasing and
surface aliasing. Edge aliasing creates stair steps in an edge that
should look smooth. Surface aliasing includes flashing or "popping"
of very thin polygons, sometimes referred to as moire patterns.
Existing AA techniques for alleviating these effects include
multisampling and supersampling. Multisampling addresses edge
aliasing by creating multiple samples of pixels which are used to
generate intermediate points between pixels. The samples are
averaged to determine the displayed pixel color value. The
displayed edge in the multisampled image has a softened stair step
effect. Multisampling has no effect on surface aliasing.
[0016] Supersampling will address both edge aliasing and surface
aliasing. However, supersampling is computationally more expensive
than multisampling and is rarely performed in consumer systems.
Pixel centers, as opposed to pixels, carry texture information. In
supersampling, each pixel is rendered multiple times with different
pixel centers to yield multiple color values which are then
averaged to give a final pixel color. This gives the entire image a
softened effect.
[0017] One reason it is inefficient to do either multisampling or
supersampling in conventional systems is that the pixel data must
be run through the video processing pipeline in the VPU more than
once to create offset samples with respect to pixels or pixel
centers. This increases the number of computations, and increases
processing time.
INCORPORATION BY REFERENCE
[0018] All publications and patent applications mentioned in this
specification are herein incorporated by reference to the same
extent as if each individual publication or patent application was
specifically and individually indicated to be incorporated by
reference.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] FIG. 1 is a block diagram of a video processing system
according to an embodiment.
[0020] FIG. 2 is a diagram of video processing with antialiasing
according to an embodiment.
[0021] FIGS. 3-8 are pixel diagrams that illustrate several modes
of antialiasing according to various embodiments.
[0022] FIGS. 9A and 9B show results for 6× MSAA and 12×
MSAA, respectively.
[0023] FIG. 10A shows a screen produced without SSAA.
[0024] FIG. 10B shows the same screen as FIG. 10A produced with
2× SSAA.
[0025] FIG. 11A shows a screen produced without SSAA.
[0026] FIG. 11B shows the same screen as FIG. 11A produced with
2× SSAA.
[0027] FIG. 12 is a block diagram of a video processing system
including antialiasing according to an embodiment.
[0028] FIG. 13 is a block diagram of various components of a video
processing system including antialiasing according to an
embodiment.
[0029] FIG. 14 is a more detailed block diagram of a video
processing system, which is a configuration similar to that of FIG.
13 according to an embodiment.
[0030] FIG. 15 is a diagram of a one-card video processing system
according to an embodiment.
[0031] FIG. 16 is a diagram of a one-card video processing system
according to an embodiment.
[0032] FIG. 17 is a diagram of a two-card video processing system
according to an embodiment.
[0033] FIG. 18 is a diagram of a two-card video processing system
according to an embodiment.
[0034] FIG. 19 is a block diagram of an interlink module (IM)
according to an embodiment.
DETAILED DESCRIPTION
[0035] A system and method for antialiasing (AA) that alleviates
both edge aliasing effects and surface aliasing effects is
described herein. Embodiments include applying a combination of
multisampling and supersampling techniques in a system with at
least one graphics processing unit (GPU) or video processing unit
(VPU). As used herein, GPU and VPU are interchangeable terms. In
one embodiment, the system is programmable such that sample
positions are programmably offset within a pixel from initial
positions by one or more VPUs. The initial positions are
determined, for example, by a common video driver of the system. In
one embodiment, each of the multiple VPUs processes the same video
frame in parallel and offsets samples within the same pixels to
different programmable positions in each VPU. Video frames
processed by each of the multiple VPUs are merged (or combined or
composited) to create a frame to be displayed. In the frame to be
displayed, the AA sampling factor is effectively multiplied by the
number of VPUs. For example, if each VPU performs 2×
sampling, the frame to be displayed includes 4× sampling. In
various embodiments, the driver is programmable to direct the VPUs
to perform multisampling by a selectable multiplying factor,
supersampling by a selectable multiplying factor, or a combination
of multisampling by a selectable multiplying factor and
supersampling by a selectable multiplying factor.
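As a hedged illustration of the arithmetic in the preceding paragraph, the following Python sketch computes the AA factor of the composited frame. The function name `effective_aa_factor` and its parameters are hypothetical and are not part of the described system; the sketch assumes that every VPU applies the same per-VPU factor and that no two VPUs share a sample position.

```python
def effective_aa_factor(per_vpu_factor: int, num_vpus: int) -> int:
    """Return the AA sampling factor of the composited frame.

    Assumes each VPU samples the same pixels by `per_vpu_factor` at
    positions that differ from those used by every other VPU.
    """
    if per_vpu_factor < 1 or num_vpus < 1:
        raise ValueError("factor and VPU count must be positive")
    return per_vpu_factor * num_vpus


if __name__ == "__main__":
    # Two VPUs at 2x sampling each correspond to the 4x example in the text.
    print(effective_aa_factor(2, 2))
```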
[0036] FIG. 1 is a block diagram of a video processing system 100
according to an embodiment. The system 100 includes an application
102. The application 102 is an end user application that requires
video processing capability, such as a video game application. The
application 102 communicates with application programming interface
(API) 104. Several APIs are available for use in the video
processing context. APIs were developed as intermediaries between
the application software, such as the application 102, and video
hardware on which the application runs. With new chipsets and even
entirely new hardware technologies appearing at an increasing rate,
it is difficult for applications developers to take into account,
and take advantage of, the latest hardware features. It is also
becoming impossible to write applications specifically for each
foreseeable set of hardware. APIs prevent applications from having
to be too hardware specific. The application can output graphics
data and commands to the API in a standardized format, rather than
directly to the hardware. Examples of available APIs include
DirectX (from Microsoft) and OpenGL (from Silicon Graphics).
[0037] The API 104 can be any one of the available APIs for running
video applications. The API 104 communicates with a driver 106. The
driver 106 is typically written by the manufacturer of the video
hardware, and translates the standard code received from the API
into a native format understood by the hardware. The driver allows
input, from, for example, an application, process or user, to
direct settings. Such settings, in embodiments described herein,
include settings for selecting the multisampling factors, the
supersampling factors, or combinations thereof. For example, a user
can select settings via a user interface (UI), including a UI
supplied to the user with video processing hardware and software as
described herein.
[0038] In one embodiment, the video hardware includes two video
processing units, VPU A 108 and VPU B 110. In other embodiments
there can be fewer than two or more than two VPUs. In various
embodiments, VPU A 108 and VPU B 110 are identical. In various
other embodiments, VPU A 108 and VPU B 110 are not identical. The
various embodiments, which include different configurations of a
video processing system, will be described in greater detail
below.
[0039] The driver 106 issues commands to VPU A 108 and VPU B 110.
The commands issued to VPU A 108 and VPU B 110 at the same time are
for processing the same frame to be displayed. VPU A 108 and VPU B
110 each execute a series of commands for processing the frame,
including offsetting sample positions with respect to pixels and/or
pixel centers in a programmable manner from the sample positions as
received from the API. The driver 106 programmably instructs VPU A
108 and VPU B 110 to multisample and/or supersample pixels and/or
pixel centers by an antialiasing (AA) factor. In one embodiment,
VPU A and VPU B offset samples with respect to the same pixels
and/or pixel centers, but offset them to different sample
positions.
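The per-VPU offsetting described above can be sketched as follows. This is an illustration only: the offset tables `VPU_A_OFFSETS` and `VPU_B_OFFSETS`, their values, and the helper names are assumptions chosen for the example, not positions taken from the described embodiments.

```python
# Hypothetical sample offsets inside a unit pixel, one table per VPU.
# The values are illustrative; a driver would program the real ones.
VPU_A_OFFSETS = [(0.25, 0.25), (0.75, 0.75)]
VPU_B_OFFSETS = [(0.75, 0.25), (0.25, 0.75)]


def sample_positions(pixel_x, pixel_y, offsets):
    """Return absolute sample positions for one pixel on one VPU."""
    return [(pixel_x + dx, pixel_y + dy) for dx, dy in offsets]


def merged_positions(pixel_x, pixel_y, per_vpu_offsets):
    """Collect the sample positions that all VPUs contribute to one pixel."""
    merged = []
    for offsets in per_vpu_offsets:
        merged.extend(sample_positions(pixel_x, pixel_y, offsets))
    return merged


if __name__ == "__main__":
    # Both VPUs sample the same pixel (3, 5) but at different positions.
    for position in merged_positions(3, 5, [VPU_A_OFFSETS, VPU_B_OFFSETS]):
        print(position)
```

Because the two tables share no offset, the same pixel ends up covered by four distinct sample positions even though each VPU takes only two.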
[0040] When either of VPU A 108 and VPU B 110 finishes executing
the commands for the frame, the frame data is sent to a compositor
114. The compositor 114 is optionally included in an interlink
module 112, as described more fully below. The frame data from each
of VPU A 108 and VPU B 110 is merged, or combined, or composited in
the compositor 114 to generate a frame to be rendered to a display
116. In the frame to be displayed, the AA sampling factor is
effectively multiplied by the number of VPUs. For example, if each
VPU performs 2× sampling, the frame to be displayed includes
4× sampling. In various embodiments, the driver 106 is
programmable to direct VPU A 108 and VPU B 110 to perform
multisampling by a selectable multiplying factor, supersampling by
a selectable multiplying factor, or a combination of multisampling
by a selectable multiplying factor and supersampling by a
selectable multiplying factor. As used herein, the terms combine,
merge, composite, mix, or interlink all refer to the same
capabilities of the IM 112 and compositor 114 as described
herein.
[0041] FIG. 2 is a diagram of video processing 200 with AA
according to an embodiment. As previously described with reference
to FIG. 1, VPU A 208 and VPU B 210 each process video data
according to instructions from a programmable driver (not shown).
An illustration of a sampling pattern 213 output from VPU A 208 is
shown. The sampling pattern 213 is a 12×12 grid that
demonstrates 2× sampling. For each pixel, 2 pixel samples are
placed in the 12×12 grid. The 12×12 dimension is for
example purposes only, and any other workable dimension is
contemplated. In the example shown, the darkened square is a pixel
center and the "×" marks are pixel samples. The pixel samples are
offset from an initial default location specified by the API (not
shown). The offset locations are programmable in the driver and are
specified in commands from the driver to the VPU A 208.
[0042] Throughout the description, for convenience, the sample
pattern output by a VPU will also be referred to as being the
output of the VPU. For example, sample pattern 213 is also referred
to as output 213 of VPU A 208. Persons of ordinary skill in the art
will understand and appreciate that the sample pattern output by a
VPU (or as referred to herein as the output of the VPU) is, in most
embodiments, not output to the display. Rather, the sample pattern
output by the VPU (or portion thereof) is used to generate a frame,
or portion thereof, that is ultimately output to a display, such as
a LCD, flat panel, CRT or the like. That is, the output sample
pattern is in the present and most embodiments used as an input to
a further portion of the VPU to generate the frame (or portion
thereof) output to a display.
[0043] The samples are averaged by the VPU A 208 in linear space in
a known manner. However, the pixel data is typically in gamma
space, and so must be converted to linear space in a degamma
operation prior to averaging. The VPU A 208 performs the degamma
operation, performs the averaging operation, and then performs a
gamma operation so that the output of the VPU is in gamma space.
This is conventionally done because of quality improvement in the
displayed image. So to restate, in conventional systems, the output
of the VPU is automatically in gamma space. However, in various
embodiments herein, it is desirable to have the output in linear
space for the combining or compositing operation as described
below. Accordingly, the VPU A 208 performs an additional degamma
operation to convert the output 213 to linear space. In one
embodiment, the texture unit in the video pipeline of the VPU A 208
is used to perform the degamma operation. In other embodiments,
this degamma operation can be performed external to the VPU, for
example in the compositor 214.
[0044] As an example of gamma correction, U.S. Pat. No. 5,398,076,
entitled "Gamma Correcting Processing of Video Signals" (assigned
to ATI Technologies, Inc.) describes a method of processing video
signals including gamma correction of pixel data. In addition, a
gamma correction circuit is described in U.S. Pat. No. 6,020,921,
entitled "Simple Gamma Correction Circuit for Multimedia" (assigned
to ATI Technologies, Inc.). In one embodiment, gamma correction is
performed according to the function:
If (X<=0.00304)
Y=12.92*X
Else
Y=1.055*pow(X, 1.0/2.4)-0.055
[0045] In one embodiment, a degamma operation is performed
according to the function:
If (X<=0.03928)
Y=X/12.92
Else
Y=pow((X+0.055)/1.055, 2.4)
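The two functions above can be written as a minimal Python sketch. The thresholds and constants are the ones given in the text; the function names `gamma` and `degamma`, and the assumption that values are normalized to the range 0.0 to 1.0, are choices made for this example.

```python
def gamma(x: float) -> float:
    """Convert a linear-space value in [0, 1] to gamma space."""
    if x <= 0.00304:
        return 12.92 * x
    return 1.055 * pow(x, 1.0 / 2.4) - 0.055


def degamma(y: float) -> float:
    """Convert a gamma-space value in [0, 1] back to linear space."""
    if y <= 0.03928:
        return y / 12.92
    return pow((y + 0.055) / 1.055, 2.4)


if __name__ == "__main__":
    # Round trip: degamma should undo gamma up to floating-point error.
    for value in (0.0, 0.001, 0.2, 0.5, 1.0):
        print(value, degamma(gamma(value)))
```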
[0046] In one embodiment, the algorithm performed by the compositor
214 can also be stated as follows: flatten each of the three colors
of each pixel on both input streams (from VPU A 208 and VPU B 210);
add each individual color between VPU A 208 and VPU B 210; divide
by 2 and pass to the next step (for example,
(slave_green+master_green)/2 → pre-output green); and convert
the pre-output pixel back into gamma corrected color values. In one
embodiment, a gamma correction lookup table is used.
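The compositing steps stated above can be sketched for a single pixel as follows. Several points are assumptions made for this example: "flatten" is read as conversion to linear space, color channels are 8-bit codes, and a lookup table is used for the degamma step (the text mentions only a gamma correction lookup table). The names `DEGAMMA_LUT` and `composite_pixel` are hypothetical.

```python
def degamma(y: float) -> float:
    """Gamma space to linear space, per the degamma function in the text."""
    if y <= 0.03928:
        return y / 12.92
    return pow((y + 0.055) / 1.055, 2.4)


def gamma(x: float) -> float:
    """Linear space to gamma space, per the gamma function in the text."""
    if x <= 0.00304:
        return 12.92 * x
    return 1.055 * pow(x, 1.0 / 2.4) - 0.055


# Lookup table: 8-bit gamma-space code -> linear value in [0, 1].
DEGAMMA_LUT = [degamma(code / 255.0) for code in range(256)]


def composite_pixel(master_rgb, slave_rgb):
    """Average two 8-bit RGB pixels in linear space, return 8-bit RGB."""
    out = []
    for master, slave in zip(master_rgb, slave_rgb):
        # Flatten both inputs, add them, and divide by 2.
        pre_output = (DEGAMMA_LUT[master] + DEGAMMA_LUT[slave]) / 2.0
        # Convert the pre-output value back to a gamma corrected code.
        out.append(round(gamma(pre_output) * 255.0))
    return tuple(out)


if __name__ == "__main__":
    print(composite_pixel((0, 0, 0), (255, 255, 255)))
```

Averaging a black pixel with a white pixel this way yields a code brighter than the midpoint 128, which is the practical difference between compositing in linear space and averaging gamma-space codes directly.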
[0047] Similarly, an output 215 from VPU B 210 is shown. The output
215 is a 12×12 grid that demonstrates 2× sampling. For
each pixel, 2 pixel samples are placed in the 12×12 grid. The
12×12 dimension is for example purposes only, and any other
workable dimension is contemplated. In the example shown, the
darkened square is a pixel center and the "×" marks are pixel
samples. The pixel samples are offset from an initial default
location specified by the API (not shown). The offset locations are
programmable in the driver and are specified in commands from the
driver to the VPU B 210.
[0048] The samples are averaged by the VPU B 210 in linear space in
a known manner. However, the pixel data is typically in gamma
space, and so must be converted to linear space in a degamma
operation prior to averaging. The VPU B 210 performs the degamma
operation, performs the averaging operation, and then performs a
gamma operation so that the output of the VPU is in gamma space.
This is conventionally done because of quality improvement in the
displayed image. So to restate, in conventional systems, the output
of the VPU is automatically in gamma space. However, in various
embodiments herein, it is desirable to have the output in linear
space for the combining or compositing operation as described
below. Accordingly, the VPU B 210 performs an additional degamma
operation to convert the output 215 to linear space. In one
embodiment, the texture unit in the video pipeline of the VPU B 210
is used to perform the degamma operation.
[0049] The linear outputs 213 and 215 are combined in a compositor
214. The compositor 214 is optionally included in an interlink
module 212, as described more fully below. The frame data from each
of VPU A 208 and VPU B 210 is merged, or combined, or composited in
the compositor 214 to generate a frame to be rendered to a display
(not shown). The compositing operation is in linear space. The
compositor 214 completes the compositing operation and performs a
gamma operation on the result to produce gamma corrected frame data
to be displayed. Output 217 includes gamma corrected pixel data and
shows how the outputs 213 and 215 have been combined. Each of
outputs 213 and 215 is 2× multisampled, and the output 217
is 4× multisampled. Accordingly, a much improved
multisampling result is achieved with one pass through the video
pipeline as illustrated in video processing embodiment 200. As
described below with reference to FIGS. 3-8, other antialiasing
modes are programmably selectable to include various combinations
of multisampling and supersampling (sampling pixel centers).
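A short numerical sketch may help show why compositing two 2× outputs gives a 4× result. It assumes the sample values are already in linear space and that each VPU contributes the same number of samples per pixel; the sample values and the function name `resolve` are illustrative only.

```python
def resolve(samples):
    """Average a list of linear-space sample values for one pixel."""
    return sum(samples) / len(samples)


# Illustrative linear-space samples for one pixel on an edge.
vpu_a_samples = [0.0, 1.0]   # 2x sampling on VPU A
vpu_b_samples = [1.0, 1.0]   # 2x sampling on VPU B, different positions

# Each VPU resolves its own samples; the compositor averages the results.
composited = (resolve(vpu_a_samples) + resolve(vpu_b_samples)) / 2.0

# A single 4x resolve over all four samples for comparison.
direct = resolve(vpu_a_samples + vpu_b_samples)

if __name__ == "__main__":
    print(composited, direct)
```

The two results agree because an average of equal-sized averages equals the average of the pooled samples. The equality would not hold if the VPUs contributed different sample counts, or if the averaging were done on gamma-space values.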
[0050] Referring to FIGS. 3-8, several modes of antialiasing
according to the embodiment described are illustrated. In each of
FIGS. 3-8, the pixels for VPU A are represented as stars, the pixel
centers for VPU A are represented as a blacked-in grid block, the
pixels for VPU B are represented as striped grid blocks, and the
pixel centers for VPU B are represented as concentric circles. FIG.
3 is a diagram that shows the mode previously described with
reference to FIG. 2. This mode is referred to as 4× MSAA with
1× SSAA, or 4× multisampling AA with 1×
supersampling AA (where "4" will be referred to as the MS factor,
and "1" will be referred to as the SS factor). Each of VPU A and
VPU B samples the pixels as shown in 313 and 315, respectively.
After 313 and 315 are combined or composited, the output to be
displayed is 317, as shown.
[0051] FIG. 4 is a diagram that shows an 8× MSAA with
1× SSAA mode, or 8× multisampling AA with 1×
supersampling AA. Each of VPU A and VPU B samples the pixels as
shown in 413 and 415, respectively. After 413 and 415 are combined
or composited, the output to be displayed is 417, as shown.
[0052] FIG. 5 is a diagram that shows a 12× MSAA with
1× SSAA mode, or 12× multisampling AA with 1×
supersampling AA. Each of VPU A and VPU B samples the pixels as
shown in 513 and 515, respectively. After 513 and 515 are combined
or composited, the output to be displayed is 517, as shown.
[0053] FIG. 6 is a diagram that shows a 4× MSAA with 2×
SSAA mode, or 4× multisampling AA with 2× supersampling
AA. Each of VPU A and VPU B samples the pixels as shown in 613 and
615, respectively. In this mode, the samples are offset by each of
VPU A and VPU B differently. After 613 and 615 are combined or
composited, the output to be displayed is 617, as shown.
[0054] FIG. 7 is a diagram that shows an 8× MSAA with
2× SSAA mode, or 8× multisampling AA with 2×
supersampling AA. Each of VPU A and VPU B samples the pixels as
shown in 713 and 715, respectively. In this mode, the pixel samples
are offset by each of VPU A and VPU B differently. After 713 and
715 are combined or composited, the output to be displayed is 717,
as shown.
[0055] FIG. 8 is a diagram that shows a 12× MSAA with
2× SSAA mode, or 12× multisampling AA with 2×
supersampling AA. Each of VPU A and VPU B samples the pixels as
shown in 813 and 815, respectively. In this mode, the pixel samples
are offset by each of VPU A and VPU B differently. After 813 and
815 are combined or composited, the output to be displayed is 817,
as shown.
[0056] FIGS. 3-8 are given as examples of AA modes that can be
configured. Any other combinations are also contemplated. For
example, different combinations of MS factors and SS factors, or SS
alone without MS, are all possible. AA factors and MS factors not
explicitly shown are all contemplated.
[0057] FIGS. 9-11 are diagrams of AA results. FIGS. 9A and 9B
show results for 6× MSAA and 12× MSAA, respectively. As
can be seen, the edge aliasing effect is reduced when the MS factor
is increased.
[0058] FIGS. 10A and 10B are diagrams that illustrate the
improvement in quality due to reduction of surface aliasing
resulting from SS. FIG. 10A shows a screen produced without SSAA.
FIG. 10B shows the same screen produced with 2× SSAA. The
moire effect is significantly reduced in FIG. 10B.
[0059] FIGS. 11A and 11B are close-up views of the
screens of FIGS. 10A and 10B, respectively. FIG. 11A shows a screen
produced without SSAA. FIG. 11B shows the same screen produced with
2× SSAA.
[0060] The antialiasing methods and apparatus described are also
applicable to other types of sampling not specifically described,
including subsampling and oversampling. The methods and apparatus
described are also applicable to temporal antialiasing. For
example, in one embodiment, each of multiple VPUs can process a
different frame in time. The frames are then composited as
described herein.
[0061] Various other embodiments also include each of multiple VPUs
rendering a same frame in a different manner. For example, one VPU
performs multisampling by one factor and another VPU performs
sampling by another factor. Similarly, one VPU can perform
multisampling on a frame and another VPU can perform supersampling
on a frame. The frames generated by each VPU are composited as
described herein. In yet other embodiments, one VPU can perform
sampling by one sampling factor (where sampling may be any type of
sampling) while another VPU performs sampling by another factor.
The frames generated by each VPU are composited as described
herein. The sampling factor for each VPU is configurable. In one
embodiment, the sampling behavior of each VPU is configurable by
the user through a UI. In one embodiment, the efficiency of the
sampling configuration used may form the basis for configuration by
the user through a UI, or for automatic configuration.
Alternatively, the relative performance of the VPUs may form the basis
for configuration by the user through a UI, or for automatic
configuration.
[0062] Various systems that can embody the antialiasing methods
described herein will now be described.
[0063] FIG. 12 is a block diagram of a system 1200 according to an
embodiment. The system 1200 includes components or elements that
may reside on various components of a video-capable computer
system. In one embodiment, an application 1202, a driver 1206, and
a shared memory 1205 reside on a host computer system, while
remaining components reside on video-specific components, including
one or more video cards, but the invention is not so limited. Any
of the components shown could reside anywhere, or alternatively,
various components could access other components remotely via a
network. The application 1202 is an end user application that
requires video processing capability, such as a video game
application. The application 1202 communicates with application
programming interface (API) 1204. The API 1204 can be any one of
the available graphics, or video, or 3D APIs including DirectX
(from Microsoft) and OpenGL (from Silicon Graphics).
[0064] The API 1204 communicates with a driver 1206. The driver
1206 is written specifically for the system 1200, and translates
the standard code received from the API 1204 into a native format
understood by the VPU components, which will be explained more
fully below.
[0065] In one embodiment, the system 1200 further includes two
VPUs, VPU A 1208 and VPU B 1210. The invention is not limited to
two VPUs. Aspects of the invention as described herein would be
workable with one VPU with modifications available to one of
ordinary skill in the art. However, the system would be less
efficient with one VPU than with more than one VPU. Various
embodiments also include more than two VPUs. Systems with more than
two are workable with modifications available to one of ordinary
skill in the art, and would provide better efficiency in at least
some respects than a system with two VPUs. In various embodiments
VPU A 1208 and VPU B 1210 can be video cards that each includes a
video processor and other associated hardware. As will be explained
further below, the invention is not so limited. For example, more
than one VPU can be resident on one card or board. However, as
referred to herein a VPU is intended to include at least a video
processor.
[0066] VPU A 1208 and VPU B 1210 receive commands and data from the
driver 1206 through respective ring buffers A 1222, and B 1224. The
commands instruct VPU A 1208 and VPU B 1210 to perform a variety of
operations on the data in order to ultimately produce a rendered
frame for a display 1230.
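The ring-buffer hand-off between the driver and each VPU can be pictured with the following sketch. It is a generic fixed-capacity ring buffer written for illustration; the class `RingBuffer`, its methods, and the command strings are hypothetical and do not describe the actual layout of ring buffers A 1222 and B 1224.

```python
class RingBuffer:
    """Fixed-capacity FIFO: the driver pushes commands, a VPU pops them."""

    def __init__(self, capacity: int):
        self._slots = [None] * capacity
        self._read = 0    # next slot the consumer (VPU) reads
        self._write = 0   # next slot the producer (driver) writes
        self._count = 0   # number of commands currently queued

    def push(self, command) -> bool:
        """Queue a command; return False if the buffer is full."""
        if self._count == len(self._slots):
            return False
        self._slots[self._write] = command
        self._write = (self._write + 1) % len(self._slots)
        self._count += 1
        return True

    def pop(self):
        """Return the oldest queued command, or None if empty."""
        if self._count == 0:
            return None
        command = self._slots[self._read]
        self._read = (self._read + 1) % len(self._slots)
        self._count -= 1
        return command


if __name__ == "__main__":
    ring_a, ring_b = RingBuffer(4), RingBuffer(4)
    # The driver queues commands for the same frame on both rings.
    for ring in (ring_a, ring_b):
        ring.push("set_sample_offsets")
        ring.push("draw_frame")
    print(ring_a.pop(), ring_b.pop())
```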
[0067] The driver 1206 has access to a shared memory 1205. In one
embodiment, the shared memory 1205, or system memory 1205, is
memory on a computer system that is accessible to other components
on the computer system bus, but the invention is not so
limited.
[0068] In one embodiment, the shared memory 1205, VPU A 1208 and
VPU B 1210 all have access to a shared communication bus 1234, and
therefore to other components on the bus 1234. In one embodiment,
the shared communication bus 1234 is a peripheral component
interconnect express (PCIE) bus, but the invention is not so
limited.
[0069] The PCIE bus is specifically described in the following
documents, which are incorporated by reference herein in their
entirety:
[0070] PCI Express™, Base Specification, Revision 1.1, Mar. 28,
2005;
[0071] PCI Express™, Card Electromechanical Specification,
Revision 1.1, Mar. 28, 2005;
[0072] PCI Express™, Base Specification, Revision 1.0a, Apr. 15,
2003; and
[0073] PCI Express™, Card Electromechanical Specification,
Revision 1.0a, Apr. 15, 2003.
[0074] The copyright for all of the foregoing documents is owned by
PCI-SIG.
[0075] In one embodiment, VPU A 1208 and VPU B 1210 communicate
directly with each other using a peer-to-peer protocol over the bus
1234, but the invention is not so limited. In other embodiments,
there may be a direct dedicated communication mechanism between VPU
A 1208 and VPU B 1210.
[0076] VPU A 1208 and VPU B 1210 each have a local video memory
1226 and 1228, respectively, available. In various embodiments, one
of the VPUs functions as a master VPU and the other VPU functions
as a slave VPU, but the invention is not so limited. In other
embodiments, the multiple VPUs could be peers under central control
of another component. In one embodiment, VPU A 1208 acts as a
master VPU and VPU B 1210 acts as a slave VPU.
[0077] In one such embodiment, various coordinating and combining
functions are performed by an interlink module (IM) 1212 that is
resident on the same card as VPU A 1208. This is shown as IM 1212
enclosed with a solid line. In such an embodiment, VPU A 1208 and
VPU B 1210 communicate with each other via the bus 1234 for
transferring inter-VPU communications (e.g., command and control)
and data. For example, when VPU B 1210 transfers an output frame to
IM 1212 on VPU A 1208 for compositing (as shown in FIGS. 1 and 2),
the frame is transferred via the bus 1234.
[0078] In various other embodiments, the IM 1212 is not resident on
a VPU card, but is an independent component with which both VPU A
1208 and VPU B 1210 communicate. One such embodiment includes the
IM 1212 in a "dongle" that is easily connected to VPU A 1208 and
VPU B 1210. This is indicated in the figure by the IM 1212 enclosed
by the dashed line. In such an embodiment, VPU A 1208 and VPU B
1210 perform at least some communication through an IM connection
1232. For example, VPU A 1208 and VPU B 1210 can communicate
command and control information using the bus 1234 and data, such
as frame data, via the IM connection 1232.
[0079] There are many configurations of the system 1200
contemplated as different embodiments of the invention. FIGS. 13-17
as described below illustrate just some of these embodiments.
[0080] FIG. 13 is a block diagram of various components of a system
1300 according to an embodiment. The system 1300 includes a master
VPU card 1352 and a slave VPU card 1354. The master VPU card 1352
includes a master VPU A 1308, and the slave VPU card 1354 includes a
slave VPU B 1310. In one embodiment, VPUs 1308 and 1310 each
communicate via a PCIE bus 1334. In one embodiment, the PCIE bus
1334 is an X16 bus that is split into two X8 PCIE buses 1335. Each
of the VPUs A 1308 and B 1310 is connected to a bus 1335. In one
embodiment, VPU A 1308 and VPU B 1310 communicate only through the
bus 1335. In alternative embodiments, VPU A 1308 and VPU B 1310
communicate partially through bus 1335 and partially through
dedicated intercard connection 1337. In yet other embodiments, VPU
A 1308 and VPU B 1310 communicate exclusively through the
connection 1337.
[0081] The master VPU card 1352 includes an IM 1312. In an
embodiment in which VPU A 1308 and VPU B 1310 communicate via the
bus 1335, each VPU processes a frame, including sampling as
explained with reference to FIGS. 1 and 2. As an example in FIG.
13, 4.times. MSAA is shown being performed by the system 1300.
Master VPU A 1308 generates an output 1309 and slave VPU B 1310
generates an output 1311. The outputs 1309 and 1311 are input to
the IM 1312 for combining as previously described. In one
embodiment, the slave VPU B 1310 transfers its output 1311 to the IM
1312 via the buses 1335 and 1334 as shown by the dotted path 1363.
In another embodiment, the slave VPU B 1310 transfers its output 1311 to
the IM 1312 via the dedicated intercard connection 1337 as shown by
the dotted path 1361. The IM 1312 combines the outputs 1309 and
1311 as previously described to produce a frame for display that
includes 4.times. MSAA. This frame is output to a display 1330 by
the IM 1312 via a connector 1341.
[0082] The master VPU card 1352 includes connectors 1340 and 1341.
The slave VPU card 1354 includes connectors 1342 and 1343.
Connectors 1340, 1341, 1342 and 1343 are connectors appropriate for
the purpose of transmitting the required signals as known in the
art. For example, the connector 1341 is a digital visual interface (DVI)
connector in one embodiment. There could be more or fewer than the
number of connectors shown in FIG. 13.
[0083] In one embodiment, the various embodiments described herein
are configurable by a user to employ any number of available VPUs
for video processing. For example, the system 1300 includes two
VPUs, but the user could choose to use only one VPU in a
pass-through mode. In such a configuration, one of the VPUs would
be active and one would not. In such a configuration, the
antialiasing as described herein would not be available. However,
the enabled VPU could perform conventional antialiasing. The dotted
path 1365 from VPU card B 1354 to the display 1330 indicates that
slave VPU B 1310 can be used alone for video processing in a
pass-through mode. Similarly, the master VPU A 1308 can be used
alone for video processing in a pass-through mode.
[0084] FIG. 14 is a more detailed block diagram of a system 1400,
which is a configuration similar to that of FIG. 13 according to an
embodiment. The system 1400 includes two VPU cards, a master VPU
card 1452 and a slave VPU card 1454. The master VPU card 1452
includes a master VPU A 1408, and the slave VPU card 1454 includes
a slave VPU B 1410.
[0085] The master VPU card 1452 also includes a receiver 1448 and a
transmitter 1450 for receiving and transmitting, in one embodiment,
TMDS signals. A dual connector 1445 is a DMS connector in an
embodiment. The master card further includes a DVI connector 1446
for outputting digital video signals, including frame data, to a
display. The master VPU card 1452 further includes a video digital
to analog converter (DAC). An interlink module (IM) 1412 is
connected between the VPU A 1408 and the receivers and transmitters
as shown. The VPU A 1408 includes an integrated transceiver
(labeled "integrated") and a digital video out (DVO) connector.
[0086] The slave VPU card 1454 includes two DVI connectors 1447 and
1448. The slave VPU card 1454 includes a DVO connector and an
integrated transceiver. As an alternative embodiment to
communication over a PCIE bus (not shown), the master VPU card 1452
and the slave VPU card 1454 communicate via a dedicated intercard
connection 1437.
[0087] FIGS. 15-17 are diagrams of further embodiments of system
configurations. FIG. 15 is a diagram of a one-card system 1500
according to an embodiment. The system 1500 includes a "supercard"
or "monstercard" 1558 that includes more than one VPU. In one
embodiment, the supercard 1558 includes two VPUs, a master VPU A
1508 and a slave VPU B 1510. The supercard 1558 further includes an
IM 1512 that includes a compositor for combining or compositing
data from both VPUs as previously described. It is also possible,
in other embodiments, to have a dedicated on-card inter-VPU
connection for inter-VPU communication (not shown). In one
embodiment, the master VPU A 1508 and the slave VPU B 1510 are each
connected to an X8 PCIE bus 1535 which comes from an X16 PCIE bus
1534.
[0088] The system 1500 includes all of the multiple VPU (also
referred to as multiVPU) functionality previously described,
including the antialiasing capabilities described. For example, the
master VPU A 1508 processes and outputs a sampled frame 1509 to the
IM 1512. The slave VPU B 1510 processes and outputs a sampled frame
1511, which is transferred to the IM 1512 for combining or
compositing. The transfer is performed via the PCIE bus 1534 or via
a dedicated inter-VPU connection (not shown), as previously
described with reference to FIG. 13. In either case, the
composited frame is output from the IM 1512 to a display 1530.
[0089] It is also possible to disable the multiVPU capabilities and
use one of the VPUs in a pass-through mode to perform video
processing alone. This is shown for example by the dashed path 1565
which illustrates the slave VPU B 1510 connected to a display 1530
to output frame data for display. The master VPU A 1508 can also
operate alone in pass-through mode by outputting frame data on path
1566.
[0090] FIG. 16 is a diagram of a one-card system 1600 according to
an embodiment. The system 1600 includes a "supercard" or
"monstercard" 1656 that includes more than one VPU. In one
embodiment, the supercard 1656 includes two VPUs, a master VPU A
1608 and a slave VPU B 1610. The supercard 1656 further includes an
IM 1612 that includes a compositor for combining or compositing
data from both VPUs as previously described. It is also possible,
in other embodiments, to have a dedicated on-card inter-VPU
connection for inter-VPU communication (not shown). In one
embodiment, the master VPU A 1608 and the slave VPU B 1610 are each
connected to an X16 PCIE bus 1634 through an on-card bridge
1681.
[0091] The system 1600 includes all of the multiVPU functionality
previously described, including the antialiasing capabilities
described. For example, the master VPU A 1608 processes and outputs
a sampled frame 1609 to the IM 1612. The slave VPU B 1610 processes
and outputs a sampled frame 1611, which is transferred to the IM
1612 for combining or compositing. The transfer is performed via
the PCIE bus 1634 or via a dedicated inter-VPU connection (not
shown), as previously described with reference to FIG. 13. In
either case, the composited frame is output from the IM 1612 to a
display (not shown).
[0092] It is also possible to disable the multiVPU capabilities and
use one of the VPUs in a pass-through mode to perform video
processing alone. This is shown for example by the dashed path 1665
which illustrates the slave VPU B 1610 connected to an output for
transferring a frame for display. The master VPU A 1608 can also
operate alone in pass-through mode by outputting frame data on path
1666.
[0093] FIG. 17 is a diagram of a two-card system 1700 according to
an embodiment. The system 1700 includes two peer VPU cards 1760 and
1762. VPU card 1760 includes a VPU A 1708, and VPU card 1762
includes a VPU B 1710. In one embodiment, VPU A 1708 and VPU B 1710
are identical. In other embodiments VPU A 1708 and VPU B 1710 are
not identical. VPU A 1708 and VPU B 1710 are each connected to an X8
PCIE bus 1735 that is split from an X16 PCIE bus 1734. VPU A 1708
and VPU B 1710 are further each connected to output data through a
card connector to an interlink module (IM) 1712. In one embodiment,
the IM 1712 is an integrated circuit in a "dongle" that is easily
connectable to VPU card 1760 and VPU card 1762. In one embodiment,
the IM 1712 is an integrated circuit specifically designed to
include all of the compositing functionality previously described.
The IM 1712 merges or composites the frame data output by VPU A
1708 and VPU B 1710 and outputs a displayable composited frame to a
display 1730.
[0094] FIG. 18 is a diagram of a two-card system 1800 according to
an embodiment. The system 1800 is similar to the system 1700, but is
configured to operate in a by-pass mode. The system 1800 includes
two peer VPU cards 1860 and 1862. VPU card 1860 includes a VPU A
1808, and VPU card 1862 includes a VPU B 1810. In one embodiment,
VPU A 1808 and VPU B 1810 are identical. In other embodiments VPU A
1808 and VPU B 1810 are not identical. VPU A 1808 and VPU B 1810
are each connected to an X8 PCIE bus 1835 that is split from an X16
PCIE bus 1834. VPU A 1808 and VPU B 1810 are further each connected
through a card connector to output data to an interlink module (IM)
1812. In one embodiment, the IM 1812 is an integrated circuit in a
"dongle" that is easily connectable to VPU card 1860 and VPU card
1862. In one embodiment, the IM 1812 is an integrated circuit
specifically designed to include all of the compositing
functionality previously described. The IM 1812 is further
configurable to operate in a pass-through mode in which one of the
VPUs operates alone and the other VPU is not enabled. In such a
configuration, the antialiasing as described herein would not be
available. However, the enabled VPU could perform conventional
antialiasing. In FIG. 18, VPU A 1808 is enabled and VPU B 1810 is
disabled, but either VPU can operate in by-pass mode to output to a
display 1830.
[0095] The configurations as shown herein, for example in FIGS.
13-18, are intended as non-limiting examples of possible
embodiments. Other configurations are within the scope of the
invention as defined by the claims. For example, other embodiments
include a first VPU installed on or incorporated in a computing
device, such as a personal computer (PC), a notebook computer, a
personal digital assistant (PDA), a TV, a game console, a handheld
device, etc. The first VPU can be an integrated VPU (also known as
an integrated graphics processor, or IGP), or a non-integrated VPU.
A second VPU is installed in or incorporated in a docking station
or external enclosed unit. The second VPU can be an integrated VPU
or a non-integrated VPU.
[0096] In one embodiment, the docking station is dedicated to
supporting the second VPU. The second VPU and the first VPU
communicate as described herein to cooperatively perform video
processing and produce an output as described. However, in such an
embodiment, the second VPU and the first VPU communicate via a
cable or cables, or another mechanism that is easy to attach and
detach. Such an embodiment is especially useful for allowing
computing devices which may be physically small and have limited
video processing capability to significantly enhance that
capability through cooperating with another VPU.
[0097] It will be appreciated by those of ordinary skill in the art
that further alternative embodiments could include multiple VPUs on
a single die (e.g., two VPUs on a single die) or multiple cores on
a single silicon chip.
[0098] FIG. 19 is a block diagram of an interlink module (IM) 1912
according to an embodiment. All rendering commands are fetched by
each VPU in the system. In any one of the multiVPU configurations
described herein, after the VPUs execute the fetched commands, the
IM 1912 merges the streams of pixels and control lines from the
multiple VPUs and outputs a single digital video output (DVO)
stream.
[0099] The IM 1912 includes a master input port that receives a DVO
stream from a master VPU. The master VPU input can be from a TMDS
receiver in a "dongle" configuration such as those shown in FIGS.
17 and 18. The master VPU input can alternatively come from a
master VPU on a master VPU card in a multi-card configuration, as
shown for example in FIGS. 13 and 14. A synchronization register
1902 receives the DVO data from the master VPU.
[0100] The IM 1912 further includes a slave input port that
receives a DVO stream from a slave VPU. The slave VPU input can be
from a TMDS receiver in a "dongle" configuration such as those
shown in FIGS. 17 and 18 or a card configuration as in FIGS. 13 and
14. The slave VPU input can alternatively come from a slave VPU on
a "super" VPU card configuration, as shown for example in FIGS. 15
and 16. The IM 1912 includes FIFOs 1904 on the slave port to help
synchronize the input streams between the master VPU and the slave
VPU.
[0101] The input data from both the master VPU and the slave VPU
are transferred to an extended modes mixer 1914 and to a
multiplexer (MUX) 1916. In one embodiment, the extended modes mixer
provides the compositing functionality to perform antialiasing
according to the embodiments described herein. The antialiasing
functionality as described herein is also referred to as "superAA".
The IM 1912 is configurable to operate in multiple compositing
modes, including the superAA antialiasing mode as described herein.
In one embodiment, the superAA mode is one of multiple "extended"
modes. Compositing modes include alternate frame rendering (AFR)
modes in which frames are rendered alternately by different VPUs.
Compositing modes further include "blacking" modes in which each
VPU is given a different part of a frame to process. The parts of
the frame not processed are designated as containing "black"
pixels. When the parts of the frame processed by both VPUs are
combined, either by the extended modes mixer 1914, or by selecting
only non-black pixels, the entire frame is displayed.
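
The blacking and AFR selections described above can be sketched as follows. This is a minimal Python illustration, not the IM's implementation: frames are modeled as nested lists of RGB tuples, the black value is assumed to be (0, 0, 0) (the IM may hold a configurable value in its black register), and the even/odd frame assignment in `select_afr` is an arbitrary assumption.

```python
BLACK = (0, 0, 0)  # assumed "black" marker value; hypothetical


def composite_blacking(frame_a, frame_b):
    """Per pixel, keep the non-black input; the result is black only if both are."""
    return [
        [pa if pa != BLACK else pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def select_afr(frame_index, frame_a, frame_b):
    """Alternate frame rendering: even frames from VPU A, odd frames from VPU B."""
    return frame_a if frame_index % 2 == 0 else frame_b


# VPU A rendered the top row, VPU B the bottom row; the rest is blacked out.
R, G = (255, 0, 0), (0, 255, 0)
top = [[R, R], [BLACK, BLACK]]
bottom = [[BLACK, BLACK], [G, G]]
full = composite_blacking(top, bottom)
```

One limitation of this sketch is that a pixel legitimately rendered as black by one VPU is indistinguishable from an unprocessed pixel; the application does not say how that case is handled.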
[0102] Control logic including a black register 1906 and a MUX path
logic and black comparator 1908 determines which compositing mode
the IM 1912 operates in. The output of the MUX path logic and black
comparator 1908 is a select input to the MUX 1916 and extended
modes mixer 1914 and dictates which of these components outputs
data. Data is output to a TMDS transmitter 1918 or a DAC 1920.
[0103] In one embodiment, the inter-component communication among
the VPUs and the IM 1912 includes I2C buses and protocols.
[0104] The modes are set through a combination of I2C register bits
1924 and TMDS control bits 1922 as shown in Table 1.
TABLE 1 Operational Modes and Control Bits

Category (Main) | Category (Sub) | I2C Bits | TMDS Cntr Bits | Notes
----------------|----------------|----------|----------------|------
Passthru | Slave | INTERLINK_ENABLE = 0; CONTROL_BITS_2: Bit 3 = x | n/a | Uses 1st I2C access to determine path
Passthru | Master | INTERLINK_ENABLE = 0; CONTROL_BITS_2: Bit 3 = x | n/a | Uses 1st I2C access to determine path
Interlink | AFR_MANUAL | INTERLINK_ENABLE = 1; CONTROL_BITS_2: Bit 3 = 0 | AFR_MAN_ON* = 0; AFR_AUTO* = 1 | xAFR_MAS state changes controls the next data path
Interlink | AFR_AUTO | INTERLINK_ENABLE = 1; CONTROL_BITS_2: Bit 3 = 0 | AFR_MAN_ON* = 0; AFR_AUTO* = 0 |
Interlink | BLACKING | INTERLINK_ENABLE = 1; CONTROL_BITS_2: Bit 3 = 0 | AFR_MAN_ON* = 1; AFR_AUTO* = x | Uses black pixels to determine data path
Interlink | Super AA | INTERLINK_ENABLE = x; CONTROL_BITS_2: Bit 3 = 1 | n/a | CONTROL_BITS_2: Bit 4-7 determines extended mode
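
One way to read Table 1 is as a decoder from control state to operating mode. The Python sketch below is an interpretation under stated assumptions, not the IM's logic: bit 0 is taken to be the least significant bit of CONTROL_BITS_2, the parameters `afr_man_on_n` and `afr_auto_n` are hypothetical names for the starred TMDS control bits, and the combination INTERLINK_ENABLE = 0 with Bit 3 = 1 (which matches both the Passthru and Super AA rows) is resolved in favor of Passthru.

```python
def decode_mode(interlink_enable, control_bits_2, afr_man_on_n=0, afr_auto_n=0):
    """Map control state to a (main, sub) category pair from Table 1.

    afr_man_on_n and afr_auto_n stand for the TMDS control bits AFR_MAN_ON*
    and AFR_AUTO*. Bit 0 is assumed to be the least significant bit.
    """
    if interlink_enable == 0:
        # Passthru rows: Bit 3 is a don't-care. Master vs. slave is decided by
        # the first I2C access, which this sketch does not model.
        return ("Passthru", None)
    if (control_bits_2 >> 3) & 1:
        # Bit 3 = 1 selects the extended modes; Bits 4-7 pick which one.
        return ("Interlink", "Super AA")
    if afr_man_on_n == 1:
        # BLACKING row: AFR_AUTO* is a don't-care.
        return ("Interlink", "BLACKING")
    if afr_auto_n == 1:
        return ("Interlink", "AFR_MANUAL")
    return ("Interlink", "AFR_AUTO")
```

For example, `decode_mode(1, 0b1000)` falls into the Super AA row, while `decode_mode(1, 0, afr_man_on_n=1)` falls into the BLACKING row.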
[0105] There are two separate data paths through the IM 1912. The
two input pixel streams from the respective VPUs are either
processed through the MUX 1916 (in pass-thru mode, or "standard"
interlink modes), or through the mixer 1914 in extended modes,
including super AA mode. As used herein, "interlink" or "interlink
mode" implies any multiVPU mode that is not a pass-through mode. In
the MUX 1916, just one pixel from either VPU A or VPU B is selected
to pass through, and no processing of pixels is involved. In the
extended modes mixer 1914, processing is done on a pixel by pixel
basis. However, the pixels are processed, averaged together, and
reprocessed. In one embodiment, the processing steps involve using
one or more lookup tables to generate intermediate or final
results.
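
The process, average, reprocess sequence of the extended modes mixer can be illustrated with a small Python sketch. The application says only that lookup tables may be used; the choice of a gamma decode before averaging and a gamma encode afterward, the exponent 2.2, and the function names `build_luts` and `mix_pixel` are all assumptions made for this example.

```python
def build_luts(gamma=2.2, levels=256):
    """Build forward (encoded to linear) and inverse (linear to encoded) tables."""
    top = levels - 1
    to_linear = [round(((v / top) ** gamma) * top) for v in range(levels)]
    to_encoded = [round(((v / top) ** (1.0 / gamma)) * top) for v in range(levels)]
    return to_linear, to_encoded


def mix_pixel(pixel_a, pixel_b, to_linear, to_encoded):
    """Combine one pixel from each VPU, channel by channel."""
    out = []
    for ca, cb in zip(pixel_a, pixel_b):
        # Process: look up each input channel in the forward table.
        la, lb = to_linear[ca], to_linear[cb]
        # Average: rounded mean of the two processed values.
        avg = (la + lb + 1) // 2
        # Reprocess: map the average back through the inverse table.
        out.append(to_encoded[avg])
    return tuple(out)


to_linear, to_encoded = build_luts()
mixed = mix_pixel((0, 0, 0), (255, 255, 255), to_linear, to_encoded)
```

With identity tables the function reduces to a plain per-channel average, which makes the effect of the two table lookups easy to compare.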
[0106] The selection between the MUX 1916 path and the mixer 1914
path is determined by I2C register bits and control bits. For
example, the mixer 1914 path is selected if:
ENABLE_INTERLINK=1 (I2C register)
and CONTROL_BITS_2: Bit 3 and Bit 4=1 (ExtendedModes and
SuperAA)
[0107] (else MUX).
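
The selection condition above can be written as a short predicate. The following Python sketch assumes that bit 0 is the least significant bit of CONTROL_BITS_2; the function name `select_path` and the returned strings are illustrative only.

```python
def select_path(enable_interlink, control_bits_2):
    """Return "mixer" when the stated condition holds, otherwise "mux"."""
    bit3 = (control_bits_2 >> 3) & 1  # ExtendedModes
    bit4 = (control_bits_2 >> 4) & 1  # SuperAA
    if enable_interlink == 1 and bit3 == 1 and bit4 == 1:
        return "mixer"
    return "mux"
```

For instance, `select_path(1, 0b11000)` selects the mixer path, while clearing either bit or the I2C enable register falls back to the MUX path.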
[0108] Aspects of the invention described above may be implemented
as functionality programmed into any of a variety of circuitry,
including but not limited to programmable logic devices (PLDs),
such as field programmable gate arrays (FPGAs), programmable array
logic (PAL) devices, electrically programmable logic and memory
devices and standard cell-based devices, as well as application
specific integrated circuits (ASICs) and fully custom integrated
circuits. Some other possibilities for implementing aspects of the
invention include: microcontrollers with memory (such as
electronically erasable programmable read only memory (EEPROM)),
embedded microprocessors, firmware, software, etc. Furthermore,
aspects of the invention may be embodied in microprocessors having
software-based circuit emulation, discrete logic (sequential and
combinatorial), custom devices, fuzzy (neural) logic, quantum
devices, and hybrids of any of the above device types. Of course
the underlying device technologies may be provided in a variety of
component types, e.g., metal-oxide semiconductor field-effect
transistor (MOSFET) technologies like complementary metal-oxide
semiconductor (CMOS), bipolar technologies like emitter-coupled
logic (ECL), polymer technologies (e.g., silicon-conjugated polymer
and metal-conjugated polymer-metal structures), mixed analog and
digital, etc.
[0109] Unless the context clearly requires otherwise, throughout
the description and the claims, the words "comprise," "comprising,"
and the like are to be construed in an inclusive sense as opposed
to an exclusive or exhaustive sense; that is to say, in a sense of
"including, but not limited to." Words using the singular or plural
number also include the plural or singular number respectively.
Additionally, the words "herein," "hereunder," "above," "below,"
and words of similar import, when used in this application, refer
to this application as a whole and not to any particular portions
of this application. When the word "or" is used in reference to a
list of two or more items, that word covers all of the following
interpretations of the word: any of the items in the list, all of
the items in the list and any combination of the items in the
list.
[0110] The above description of illustrated embodiments of the
invention is not intended to be exhaustive or to limit the
invention to the precise form disclosed. While specific embodiments
of, and examples for, the invention are described herein for
illustrative purposes, various equivalent modifications are
possible within the scope of the invention, as those skilled in the
relevant art will recognize. The teachings of the invention
provided herein can be applied to other systems, not only to the
system including graphics processing or video processing as
described above.
[0111] For example, an antialiased image produced as described
herein may be output to a variety of display devices, including
computer displays that display moving pictures and printers that
print static images.
[0112] The various operations described may be performed in a very
wide variety of architectures and distributed differently than
described. As an example, in a distributed system a server may
perform some or all of the rendering process. In addition, though
many configurations are described herein, none are intended to be
limiting or exclusive. For example, the invention can also be
embodied in a system that includes an integrated graphics processor
(IGP) or video processor and a discrete graphics or video
processor, where frame data processed by each of the integrated and
discrete processors is merged or composited as described. Further,
the invention can also be embodied in a system that includes the
combination of one or more IGP devices with one or more discrete
graphics or video processors.
[0113] In other embodiments not shown, the number of VPUs can be
more than two.
[0114] In other embodiments, some or all of the hardware and
software capability described herein may exist in a printer,
camera, television, handheld device, mobile telephone, or some
other device. The antialiasing techniques described herein may be
applied as part of a process of constructing animation from a video
sequence.
[0115] The elements and acts of the various embodiments described
above can be combined to provide further embodiments. These and
other changes can be made to the invention in light of the above
detailed description.
[0116] In general, in the following claims, the terms used should
not be construed to limit the antialiasing method and system to the
specific embodiments disclosed in the specification and the claims,
but should be construed to include any processing systems that
operate under the claims to provide antialiasing. Accordingly, the
antialiasing method and system is not limited by the disclosure,
but instead the scope of the antialiasing method and system is to
be determined entirely by the claims.
[0117] While certain aspects of the method and apparatus for
antialiasing are presented below in certain claim forms, the
inventors contemplate the various aspects of the method and
apparatus for antialiasing in any number of claim forms. For
example, while only one aspect of the method and apparatus for
antialiasing may be recited as embodied in computer-readable
medium, other aspects may likewise be embodied in computer-readable
medium. Accordingly, the inventors reserve the right to add
additional claims after filing the application to pursue such
additional claim forms for other aspects of the method and
apparatus for antialiasing.
* * * * *