U.S. patent application number 15/100203, for content adaptive dominant motion compensated prediction for next generation video coding, was published by the patent office on 2017-01-12.
This patent application is currently assigned to Intel Corporation. The applicant listed for this patent is Intel Corporation. The invention is credited to Neelesh Gokhale and Atul Puri.
United States Patent Application 20170013279
Kind Code: A1
Inventors: Puri, Atul; et al.
Publication Date: January 12, 2017
CONTENT ADAPTIVE DOMINANT MOTION COMPENSATED PREDICTION FOR NEXT
GENERATION VIDEO CODING
Abstract
Techniques related to dominant motion compensated prediction for
next generation video coding are described.
Inventors: Puri, Atul (Redmond, WA); Gokhale, Neelesh (Seattle, WA)
Applicant: Intel Corporation, Santa Clara, CA, US
Assignee: Intel Corporation, Santa Clara, CA
Family ID: 50731602
Appl. No.: 15/100203
Filed: March 12, 2014
PCT Filed: March 12, 2014
PCT No.: PCT/US2014/024694
371 Date: May 27, 2016
Current U.S. Class: 1/1
Current CPC Class: H04N 19/527; H04N 19/176; H04N 19/44; H04N 19/12; H04N 19/122; H04N 19/82; H04N 19/105; H04N 19/136; H04N 19/119; H04N 19/139; H04N 19/61; H04N 19/513; H04N 19/172; H04N 19/85; H04N 19/46; H04N 19/91; H04N 19/573; H04N 19/147 (all 20141101)
International Class: H04N 19/82; H04N 19/527; H04N 19/139; H04N 19/176; H04N 19/105; H04N 19/513 (all 20060101)
Foreign Application Data
Date | Code | Application Number
Dec 27, 2013 | US | PCT/US2013/078114
Claims
1-47. (canceled)
48. A computer-implemented method for video coding, comprising:
obtaining frames of pixel data and having a current frame and a
decoded reference frame to use as a motion compensation reference
frame for the current frame; forming a warped global compensated
reference frame by displacing at least one portion of the decoded
reference frame by using global motion trajectories; determining a
motion vector indicating the motion of the at least one portion and
motion from a position based on the warped global compensated
reference frame to a position at the current frame; and forming a
prediction portion based, at least in part, on the motion vectors
and corresponding to a portion on the current frame.
49. The method of claim 48 wherein the at least one portion is a
block of pixels used as a unit to divide the current frame and the
reference frame into a plurality of the blocks.
50. The method of claim 48 wherein the at least one portion is at
least one tile of pixels, each tile being at least 64×64
pixels, and used as a unit to divide the current frame and the
reference frame into a plurality of the tiles; the method
comprising grouping tiles together based on common association with
an object in the frame to form the at least one portion; and
forming a single motion vector for each group of tiles; and
grouping the tiles based on a merge map transmittable from an
encoder to a decoder.
51. The method of claim 48 wherein the at least one portion is a
region of pixels shaped and sized depending on an object associated
with the region; and wherein a boundary of the region is at least
one of: a shape that resembles the shape of the object associated
with the region, and a rectangle placed around the object
associated with the region.
52. The method of claim 48 wherein the region is associated with at
least one of: a background of the frame, a foreground of the frame,
and a moving object in the frame; and wherein each region has a
single motion vector.
53. The method of claim 48 wherein forming a warped global
compensated reference frame comprises using the global motion
trajectories at the outer corners of the frame; and using an affine
or perspective global motion compensation method.
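For illustration only (this sketch is not part of the claims), one plausible way to derive the six affine parameters from three global motion trajectories is to read each trajectory as the displacement of a frame corner and solve the resulting linear system. The corner assignment, the names, and the floating-point arithmetic below are assumptions; the patent itself works in fixed point.

```c
#include <stdio.h>

/* Affine model: x' = a*x + b*y + c,  y' = d*x + e*y + f.
 * Hypothetical derivation of the six parameters from motion
 * trajectories (dx, dy) given at frame corners (0,0), (W,0), (0,H). */
typedef struct { double a, b, c, d, e, f; } Affine;

static Affine affine_from_trajectories(int W, int H,
                                       double dx0, double dy0,  /* corner (0,0) */
                                       double dx1, double dy1,  /* corner (W,0) */
                                       double dx2, double dy2)  /* corner (0,H) */
{
    Affine p;
    p.c = dx0;                      /* (0,0) maps to (dx0, dy0)   */
    p.f = dy0;
    p.a = (W + dx1 - dx0) / W;      /* from the (W,0) trajectory  */
    p.d = (dy1 - dy0) / W;
    p.b = (dx2 - dx0) / H;          /* from the (0,H) trajectory  */
    p.e = (H + dy2 - dy0) / H;
    return p;
}

int main(void)
{
    /* Example: a pan of (3, 1) pixels with a slight horizontal stretch. */
    Affine p = affine_from_trajectories(1920, 1080, 3, 1, 5, 1, 3, 3);
    printf("a=%.4f b=%.4f c=%.4f d=%.4f e=%.4f f=%.4f\n",
           p.a, p.b, p.c, p.d, p.e, p.f);
    return 0;
}
```

An affine model (six parameters) can capture translation, rotation, zoom, and shear; the perspective alternative mentioned in the claim would need a fourth corner trajectory, since a perspective map has eight parameters.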
54. The method of claim 48 comprising: performing dominant motion
compensation comprising locally applied global motion compensation
so that at least one other set of global motion trajectories are
used at corners of at least one region on the frame that is less
than the entire frame to form a displaced region; using the pixel
values of the displaced region to form a prediction region that
corresponds to a region on the current frame; and providing the
option on the at least one region on a region-by-region basis to
select a prediction formed by: (1) a motion vector to form a
prediction for the at least one region and using global motion
compensation applied to the entire frame, or (2) applying local
global motion compensation with a set of global motion trajectories
at the region and using displaced pixel values of the region to
form a prediction.
55. The method of claim 48 comprising applying local global motion
compensation with a set of global motion trajectories applied at a
region of the reference frame that has an area less than the entire
reference frame, and using motion vectors to form a prediction for
the at least one region.
56. The method of claim 48 comprising providing the option to
select a mode for a frame among: (1) use the dominant motion
compensated reference frame prediction, (2) use blended prediction
of multiple dominant motion compensated reference frames, (3) use
dominant motion compensated reference with differential
translational motion vector for prediction, and (4) use dominant
motion compensated reference with differential translational motion
vector for prediction, blended with another reference frame.
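The four modes above map naturally onto a small enum; the following C sketch is purely illustrative (the names and numbering are assumptions, not taken from the patent).

```c
#include <stdio.h>

/* Hypothetical encoding of the four per-frame prediction modes of the
 * preceding claim; names and values are illustrative assumptions. */
typedef enum {
    DMC_REF = 1,          /* (1) dominant motion compensated reference      */
    DMC_BLEND,            /* (2) blend of multiple DMC reference frames     */
    DMC_REF_DIFF_MV,      /* (3) DMC reference + differential translational
                                 motion vector                              */
    DMC_REF_DIFF_MV_BLEND /* (4) mode (3) blended with another reference    */
} DmcFrameMode;

static const char *dmc_mode_name(DmcFrameMode m)
{
    switch (m) {
    case DMC_REF:               return "dmc-reference";
    case DMC_BLEND:             return "dmc-blended";
    case DMC_REF_DIFF_MV:       return "dmc-diff-mv";
    case DMC_REF_DIFF_MV_BLEND: return "dmc-diff-mv-blended";
    }
    return "unknown";
}

int main(void)
{
    for (int m = DMC_REF; m <= DMC_REF_DIFF_MV_BLEND; m++)
        printf("mode %d -> %s\n", m, dmc_mode_name((DmcFrameMode)m));
    return 0;
}
```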
57. The method of claim 48 comprising performing motion compensated morphed reference prediction using bilinear interpolation and a motion compensation (MC) filter to form a morphed reference frame MRef, with tPred_h as the intermediate horizontal interpolation and Pred_ji as the final motion compensated morphed reference prediction:

MRef[i'][j'] = ((8-p_x)(8-p_y)·Ref[y_0][x_0] + p_x(8-p_y)·Ref[y_0][x_0+1] + p_y(8-p_x)·Ref[y_0+1][x_0] + p_y·p_x·Ref[y_0+1][x_0+1] + 31) >> 6

tPred_h[m][n] = (1/T) Σ_{k=0}^{N_t-1} h[p_j][k] · MRef[i'+m][j'+n+k-N_t/2+1], where m = [-N_t/2+1, H_b+N_t/2-1], n = [0, W_b-1]

Pred_ji[m][n] = (1/T) Σ_{k=0}^{N_t-1} h[p_i][k] · tPred_h[m+k-N_t/2+1][n], where m = [0, H_b-1], n = [0, W_b-1]

and where: (iMVx, iMVy) is the transmitted motion vector in sub-pel units (f_s) for a block at (j, i) of size (W_b×H_b); A, B, C, D, E, and F are affine parameters calculated from the three motion trajectories transmitted; using separable motion compensation (MC) filters with filter coefficients h[f_s][N_t] of norm T, where f_s is the sub-pel factor (e.g., 2 = half pel, 4 = quarter pel, 8 = eighth pel) and N_t is the number of MC filter taps; i' = i + (iMVy/f_s), j' = j + (iMVx/f_s), p_i = iMVy & (f_s-1), p_j = iMVx & (f_s-1), where (j', i') is the integer motion adjusted current pixel location in the morphed reference image, and p_j, p_i are the 1/8th pel phases in the morphed reference image; x = (A*j' + B*i' + (C<<r)) >> r, y = (D*j' + E*i' + (F<<s)) >> s, where (x, y) is the reference pixel coordinate in 1/8th pel accuracy for location (j', i'); p_y = y & 0x7, p_x = x & 0x7, y_0 = y>>3, x_0 = x>>3, where (x_0, y_0) is the integer pel location in the reference image (Ref), and p_x, p_y are the 1/8th pel phases; MRef[i'][j'] = ((8-p_x)*(8-p_y)*Ref[y_0][x_0] + p_x*(8-p_y)*Ref[y_0][x_0+1] + p_y*(8-p_x)*Ref[y_0+1][x_0] + p_y*p_x*Ref[y_0+1][x_0+1] + 31) >> 6; tPred_h[m][n] = Σ_k (h[p_j][k]*MRef[i'+m][j'+n+k])/T, where m = [-N_t/2-1, H_b+N_t/2], n = [0, W_b-1], k = [-N_t/2-1, N_t/2]; Pred_ji[m][n] = Σ_k (h[p_j][k]*tPred_h[m+k][n])/T, where m = [0, H_b-1], n = [0, W_b-1], k = [-N_t/2-1, +N_t/2].
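As a rough illustration of the bilinear step in the claim above, the sketch below evaluates MRef at one 1/8-pel position: the phases (p_x, p_y) weight the four neighboring integer-pel samples, with the claim's +31 rounding offset and >>6 normalization (the weights sum to 64). The row-major plane layout and the function name are assumptions for illustration only.

```c
#include <stdint.h>
#include <stdio.h>

/* Fixed-point bilinear interpolation at 1/8-pel phases (px, py),
 * following the MRef equation of the claim; ref points to a
 * row-major 8-bit plane with the given stride (layout assumed). */
static uint8_t bilinear_8th_pel(const uint8_t *ref, int stride,
                                int x0, int y0, int px, int py)
{
    const uint8_t *p = ref + y0 * stride + x0;
    int v = (8 - px) * (8 - py) * p[0]          /* Ref[y0][x0]     */
          + px * (8 - py) * p[1]                /* Ref[y0][x0+1]   */
          + py * (8 - px) * p[stride]           /* Ref[y0+1][x0]   */
          + py * px * p[stride + 1]             /* Ref[y0+1][x0+1] */
          + 31;                                 /* rounding, per the claim */
    return (uint8_t)(v >> 6);                   /* weights sum to 64 */
}

int main(void)
{
    uint8_t ref[4] = { 100, 140, 120, 160 };    /* tiny 2x2 plane */
    /* Mid-point phase (4, 4) averages the four samples: prints 130. */
    printf("%d\n", bilinear_8th_pel(ref, 2, 0, 0, 4, 4));
    return 0;
}
```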
58. The method of claim 48 comprising performing morphed reference prediction using block motion compensation (MC) filtering to form a morphed reference frame MRef, with tPred_h as the intermediate horizontal interpolation:

tPred_h[m][n] = (1/T) Σ_{k=0}^{N_t-1} h[p_x][k] · Ref[y_0+m][x_0+n+k-N_t/2+1], where m = [-N_t/2+1, H_s+N_t/2-1], n = [0, W_s-1]

MRef[i+m][j+n] = (1/T) Σ_{k=0}^{N_t-1} h[p_y][k] · tPred_h[m+k-N_t/2+1][n], where m = [0, H_s-1], n = [0, W_s-1]

and where A, B, C, D, E, and F are affine parameters calculated from the three motion trajectories transmitted; using separable MC filters with filter coefficients h[f_s][N_t] of norm T, where f_s is the sub-pel factor (e.g., 2 = half pel, 4 = quarter pel, 8 = eighth pel) and N_t is the number of MC filter taps; x = (A*j + B*i + (C<<r)) >> r, y = (D*j + E*i + (F<<s)) >> s, where (j, i) is every (W_s×H_s) sub-block location in the current image, and x and y are reference pixel coordinates in 1/8th pel accuracy; p_y = y & 0x7, p_x = x & 0x7, y_0 = y>>3, x_0 = x>>3, where (x_0, y_0) is the integer pel location in the reference frame (Ref), and p_x, p_y are the 1/8th pel phases; tPred_h[m][n] = Σ_k (h[p_x][k]*Ref[y_0+m][x_0+n+k])/T, m = [-N_t/2-1, H_s+N_t/2], n = [0, W_s-1], k = [-N_t/2-1, +N_t/2]; and MRef[i+m][j+n] = Σ_k (h[p_y][k]*tPred_h[m+k][n])/T, m = [0, H_s-1], n = [0, W_s-1], k = [-N_t/2-1, +N_t/2].
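The two filtering passes of the claim above can be pictured with the following hedged sketch: a horizontal pass over an extended row range fills the intermediate buffer tPred_h, and a vertical pass then produces the morphed-reference samples. The 4-tap half-pel filter, the norm T = 64, and the 16×16 test plane are illustrative assumptions, not the patent's coefficients.

```c
#include <stdio.h>

#define NT 4    /* number of MC filter taps N_t */
#define T  64   /* filter norm                  */
#define WS 4    /* sub-block width  W_s         */
#define HS 4    /* sub-block height H_s         */

/* Assumed half-pel (f_s = 2) filter set: phase 0 = integer position,
 * phase 1 = half-way; values are illustrative. */
static const int h[2][NT] = {
    {  0, 64,  0,  0 },
    { -4, 36, 36, -4 },
};

static void mc_separable(const unsigned char ref[16][16],
                         int x0, int y0, int px, int py,
                         int out[HS][WS])
{
    /* Horizontal pass over rows m = -NT/2+1 .. HS+NT/2-1 (shifted so
     * the buffer index starts at 0), as in the first equation. */
    int tmp[HS + NT - 1][WS];
    for (int m = 0; m < HS + NT - 1; m++)
        for (int n = 0; n < WS; n++) {
            int s = 0;
            for (int k = 0; k < NT; k++)
                s += h[px][k] * ref[y0 + m - NT/2 + 1][x0 + n + k - NT/2 + 1];
            tmp[m][n] = s / T;
        }
    /* Vertical pass, as in the second equation. */
    for (int m = 0; m < HS; m++)
        for (int n = 0; n < WS; n++) {
            int s = 0;
            for (int k = 0; k < NT; k++)
                s += h[py][k] * tmp[m + k][n];
            out[m][n] = s / T;
        }
}

int main(void)
{
    unsigned char ref[16][16];
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            ref[y][x] = (unsigned char)(8 * y + x);   /* smooth gradient */
    int out[HS][WS];
    mc_separable(ref, 4, 4, 1, 1, out);   /* half-pel in both axes */
    printf("out[0][0] = %d\n", out[0][0]);
    return 0;
}
```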
59. The method of claim 48 comprising performing motion compensated morphed reference prediction using single loop motion compensation (MC) filtering to form a morphed reference (MRef) and predictions, with tPred_h as the intermediate horizontal interpolation and Pred_ji as the final motion compensated morphed reference prediction for a block of size W_b×H_b at (j, i):

tPred_h[m][n] = (1/T) Σ_{k=0}^{N_t-1} h[p_x][k] · Ref[y_0+m][x_0+n+k-N_t/2+1], for m = [-N_t/2+1, H_s+N_t/2-1], n = [0, W_s-1]

Pred_ji[u·H_s+m][v·W_s+n] = (1/T) Σ_{k=0}^{N_t-1} h[p_y][k] · tPred_h[m+k-N_t/2+1][n], for m = [0, H_s-1], n = [0, W_s-1], u = [0, H_b/H_s-1], v = [0, W_b/W_s-1]

and where: (iMVx, iMVy) is the transmitted motion vector in sub-pel units (f_s) for a block at (j, i) of size (W_b×H_b); A, B, C, D, E, and F are affine parameters calculated from the three motion trajectories transmitted; using separable MC filters with filter coefficients h[f_s][N_t] of norm T, where f_s is the sub-pel factor (e.g., 2 = half pel, 4 = quarter pel, 8 = eighth pel) and N_t is the number of MC filter taps; i' = (i + u*H_s)*f_s + iMVx, j' = (j + v*W_s)*f_s + iMVy, where (j, i) is the current block pixel location and (u, v) is the index of every (W_s×H_s) sub-block within the given current block of (W_b×H_b); below, (i', j') is the motion adjusted current pixel location in f_s sub-pel accuracy; x = (A*j' + B*i' + ((C*f_s)<<r)) >> (r+3), y = (D*j' + E*i' + ((F*f_s)<<s)) >> (s+3), where x and y are reference pixel coordinates in f_s sub-pel accuracy; p_y = y & (f_s-1), p_x = x & (f_s-1), y_0 = y/f_s, x_0 = x/f_s, where (x_0, y_0) is the integer pel location in the reference image (Ref), and p_x, p_y are the 1/8th pel phases; tPred_h[m][n] = Σ_k (h[p_x][k]*Ref[y_0+m][x_0+n+k])/T, m = [-N_t/2-1, H_s+N_t/2], n = [0, W_s-1], k = [-N_t/2-1, +N_t/2]; Pred_ji[u*H_s+m][v*W_s+n] = Σ_k (h[p_y][k]*tPred_h[m+k][n])/T, m = [0, H_s-1], n = [0, W_s-1], k = [-N_t/2-1, +N_t/2], v = [0, W_b/W_s-1], u = [0, H_b/H_s-1].
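The distinguishing feature of the single-loop variant above is that the affine warp and the block's translational motion vector are composed into one reference coordinate, so no intermediate morphed frame is stored. The floating-point sketch below illustrates only that coordinate composition; the names are assumptions, it assumes non-negative coordinates, and it applies iMVx to the horizontal (j) axis following the convention of the earlier bilinear claim, whereas the printed text of this claim pairs iMVx with i'.

```c
#include <stdio.h>

typedef struct { double a, b, c, d, e, f; } Affine; /* x'=a*x+b*y+c, y'=d*x+e*y+f */

/* Compose warp + translational MV for the pixel origin of sub-block
 * (u, v) of a (Wb x Hb) block at (j, i); fs is the sub-pel factor
 * (8 = 1/8 pel). Outputs the integer-pel position (x0, y0) and the
 * sub-pel phases (px, py) for the MC filter. */
static void single_loop_coord(Affine w, int fs, int i, int j,
                              int u, int v, int Ws, int Hs,
                              int iMVx, int iMVy,
                              int *x0, int *y0, int *px, int *py)
{
    /* Motion-adjusted current pixel location, in fs sub-pel units. */
    double jp = (double)(j + v * Ws) * fs + iMVx;
    double ip = (double)(i + u * Hs) * fs + iMVy;
    /* Affine warp applied in the same sub-pel units (c, f scaled by fs). */
    long x = (long)(w.a * jp + w.b * ip + w.c * fs);
    long y = (long)(w.d * jp + w.e * ip + w.f * fs);
    *px = (int)(x & (fs - 1));  *py = (int)(y & (fs - 1));
    *x0 = (int)(x / fs);        *y0 = (int)(y / fs);
}

int main(void)
{
    Affine pan = { 1, 0, 3, 0, 1, 1 };   /* pure pan of (3, 1) pixels */
    int x0, y0, px, py;
    /* Block at (j, i) = (32, 16), sub-block (u, v) = (0, 0), 4x4
     * sub-blocks, MV = (5, -2) in 1/8-pel units. */
    single_loop_coord(pan, 8, 16, 32, 0, 0, 4, 4, 5, -2,
                      &x0, &y0, &px, &py);
    printf("x0=%d y0=%d px=%d py=%d\n", x0, y0, px, py);  /* 35 16 5 6 */
    return 0;
}
```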
60. The method of claim 48 wherein the at least one portion is at
least one of: (1) a block of pixels used as a unit to divide the
current frame and the reference frame into a plurality of the
blocks; (2) at least one tile of pixels, each tile being at least
64×64 pixels, and used as a unit to divide the current frame
and the reference frame into a plurality of the tiles; the method
comprising at least one of: grouping tiles together based on common
association with an object in the frame to form the at least one
portion; and forming a single motion vector for each group of
tiles, grouping the tiles based on a merge map transmittable from
an encoder to a decoder; (3) a region of pixels shaped and sized
depending on an object associated with the region, wherein a
boundary of the region is at least one of: a shape that resembles
the shape of the object associated with the region, and a rectangle
placed around the object associated with the region; wherein the
region is associated with at least one of: a background of the
frame, a foreground of the frame, and a moving object in the frame;
the method comprising defining the region based on a boundary map
transmittable from an encoder to a decoder; wherein forming a
warped global compensated reference frame comprises using the
global motion trajectories at the outer corners of the frame;
wherein forming a warped global compensated reference frame
comprises using an affine or perspective global motion compensation
method; wherein the at least one portion comprises a frame divided
into a background and a foreground, and wherein determining motion
vectors comprises providing the background and foreground each with
one motion vector; the method comprising performing dominant motion
compensation comprising locally applied global motion compensation
so that at least one other set of global motion trajectories are
used at corners of at least one region on the frame that is less
than the entire frame to form a displaced region; and using the
pixel values of the displaced region to form a prediction region
that corresponds to a region on the current frame; the method
comprising at least one of: performing local global motion
compensation on multiple regions of the frame by using a different
set of global motion trajectories on each region; wherein each
region is a tile, and dividing the frame into the tiles, and
wherein each tile has a set of global motion trajectories;
providing the option to perform local global motion compensation on
a fraction of a tile in addition to entire tiles; wherein each
region is shaped and sized depending on an object associated with
the region; wherein the object is one of: a foreground, a
background, and an object moving in the frame; the method
comprising providing the option on the at least one region on a
region-by-region basis to select a prediction formed by: (1) a
motion vector to form a prediction for the at least one region and
using global motion compensation applied to the entire frame, or
(2) applying local global motion compensation with a set of global
motion trajectories at the region and using displaced pixel values
of the region to form a prediction; the method comprising applying
local global motion compensation with a set of global motion
trajectories applied at a region of the reference frame that has an
area less than the entire reference frame, and using motion vectors
to form a prediction for the at least one region; the method
comprising providing the option to select a mode for a frame among:
(1) use the dominant motion compensated reference frame prediction,
(2) use blended prediction of multiple dominant motion compensated
reference frames, (3) use dominant motion compensated reference
with differential translational motion vector for prediction, and
(4) use dominant motion compensated reference with differential
translational motion vector for prediction, blended with another
reference frame; the method comprising at least one of (a) to (c):
(a) performing motion compensated morphed reference prediction using bilinear interpolation and a motion compensation (MC) filter to form a morphed reference frame MRef, with tPred_h as the intermediate horizontal interpolation and Pred_ji as the final motion compensated morphed reference prediction:

MRef[i'][j'] = ((8-p_x)(8-p_y)·Ref[y_0][x_0] + p_x(8-p_y)·Ref[y_0][x_0+1] + p_y(8-p_x)·Ref[y_0+1][x_0] + p_y·p_x·Ref[y_0+1][x_0+1] + 31) >> 6

tPred_h[m][n] = (1/T) Σ_{k=0}^{N_t-1} h[p_j][k] · MRef[i'+m][j'+n+k-N_t/2+1], where m = [-N_t/2+1, H_b+N_t/2-1], n = [0, W_b-1]

Pred_ji[m][n] = (1/T) Σ_{k=0}^{N_t-1} h[p_i][k] · tPred_h[m+k-N_t/2+1][n], where m = [0, H_b-1], n = [0, W_b-1]

and where: (iMVx, iMVy) is the transmitted motion vector in sub-pel units (f_s) for a block at (j, i) of size (W_b×H_b); A, B, C, D, E, and F are affine parameters calculated from the three motion trajectories transmitted; using separable motion compensation (MC) filters with filter coefficients h[f_s][N_t] of norm T, where f_s is the sub-pel factor (e.g., 2 = half pel, 4 = quarter pel, 8 = eighth pel) and N_t is the number of MC filter taps; i' = i + (iMVy/f_s), j' = j + (iMVx/f_s), p_i = iMVy & (f_s-1), p_j = iMVx & (f_s-1), where (j', i') is the integer motion adjusted current pixel location in the morphed reference image, and p_j, p_i are the 1/8th pel phases in the morphed reference image; x = (A*j' + B*i' + (C<<r)) >> r, y = (D*j' + E*i' + (F<<s)) >> s, where (x, y) is the reference pixel coordinate in 1/8th pel accuracy for location (j', i'); p_y = y & 0x7, p_x = x & 0x7, y_0 = y>>3, x_0 = x>>3, where (x_0, y_0) is the integer pel location in the reference image (Ref), and p_x, p_y are the 1/8th pel phases; MRef[i'][j'] = ((8-p_x)*(8-p_y)*Ref[y_0][x_0] + p_x*(8-p_y)*Ref[y_0][x_0+1] + p_y*(8-p_x)*Ref[y_0+1][x_0] + p_y*p_x*Ref[y_0+1][x_0+1] + 31) >> 6; tPred_h[m][n] = Σ_k (h[p_j][k]*MRef[i'+m][j'+n+k])/T, where m = [-N_t/2-1, H_b+N_t/2], n = [0, W_b-1], k = [-N_t/2-1, N_t/2]; Pred_ji[m][n] = Σ_k (h[p_j][k]*tPred_h[m+k][n])/T, where m = [0, H_b-1], n = [0, W_b-1], k = [-N_t/2-1, +N_t/2];

(b) performing morphed reference prediction using block motion compensation (MC) filtering to form a morphed reference frame MRef, with tPred_h as the intermediate horizontal interpolation:

tPred_h[m][n] = (1/T) Σ_{k=0}^{N_t-1} h[p_x][k] · Ref[y_0+m][x_0+n+k-N_t/2+1], where m = [-N_t/2+1, H_s+N_t/2-1], n = [0, W_s-1]

MRef[i+m][j+n] = (1/T) Σ_{k=0}^{N_t-1} h[p_y][k] · tPred_h[m+k-N_t/2+1][n], where m = [0, H_s-1], n = [0, W_s-1]

and where A, B, C, D, E, and F are affine parameters calculated from the three motion trajectories transmitted; using separable MC filters with filter coefficients h[f_s][N_t] of norm T, where f_s is the sub-pel factor (e.g., 2 = half pel, 4 = quarter pel, 8 = eighth pel) and N_t is the number of MC filter taps; x = (A*j + B*i + (C<<r)) >> r, y = (D*j + E*i + (F<<s)) >> s, where (j, i) is every (W_s×H_s) sub-block location in the current image, and x and y are reference pixel coordinates in 1/8th pel accuracy; p_y = y & 0x7, p_x = x & 0x7, y_0 = y>>3, x_0 = x>>3, where (x_0, y_0) is the integer pel location in the reference frame (Ref), and p_x, p_y are the 1/8th pel phases; tPred_h[m][n] = Σ_k (h[p_x][k]*Ref[y_0+m][x_0+n+k])/T, m = [-N_t/2-1, H_s+N_t/2], n = [0, W_s-1], k = [-N_t/2-1, +N_t/2]; and MRef[i+m][j+n] = Σ_k (h[p_y][k]*tPred_h[m+k][n])/T, m = [0, H_s-1], n = [0, W_s-1], k = [-N_t/2-1, +N_t/2]; and

(c) performing motion compensated morphed reference prediction using single loop motion compensation (MC) filtering to form a morphed reference (MRef) and predictions, with tPred_h as the intermediate horizontal interpolation and Pred_ji as the final motion compensated morphed reference prediction for a block of size W_b×H_b at (j, i):

tPred_h[m][n] = (1/T) Σ_{k=0}^{N_t-1} h[p_x][k] · Ref[y_0+m][x_0+n+k-N_t/2+1], for m = [-N_t/2+1, H_s+N_t/2-1], n = [0, W_s-1]

Pred_ji[u·H_s+m][v·W_s+n] = (1/T) Σ_{k=0}^{N_t-1} h[p_y][k] · tPred_h[m+k-N_t/2+1][n], for m = [0, H_s-1], n = [0, W_s-1], u = [0, H_b/H_s-1], v = [0, W_b/W_s-1]

and where: (iMVx, iMVy) is the transmitted motion vector in sub-pel units (f_s) for a block at (j, i) of size (W_b×H_b); A, B, C, D, E, and F are affine parameters calculated from the three motion trajectories transmitted; using separable MC filters with filter coefficients h[f_s][N_t] of norm T, where f_s is the sub-pel factor (e.g., 2 = half pel, 4 = quarter pel, 8 = eighth pel) and N_t is the number of MC filter taps; i' = (i + u*H_s)*f_s + iMVx, j' = (j + v*W_s)*f_s + iMVy, where (j, i) is the current block pixel location and (u, v) is the index of every (W_s×H_s) sub-block within the given current block of (W_b×H_b); below, (i', j') is the motion adjusted current pixel location in f_s sub-pel accuracy; x = (A*j' + B*i' + ((C*f_s)<<r)) >> (r+3), y = (D*j' + E*i' + ((F*f_s)<<s)) >> (s+3), where x and y are reference pixel coordinates in f_s sub-pel accuracy; p_y = y & (f_s-1), p_x = x & (f_s-1), y_0 = y/f_s, x_0 = x/f_s, where (x_0, y_0) is the integer pel location in the reference image (Ref), and p_x, p_y are the 1/8th pel phases; tPred_h[m][n] = Σ_k (h[p_x][k]*Ref[y_0+m][x_0+n+k])/T, m = [-N_t/2-1, H_s+N_t/2], n = [0, W_s-1], k = [-N_t/2-1, +N_t/2]; Pred_ji[u*H_s+m][v*W_s+n] = Σ_k (h[p_y][k]*tPred_h[m+k][n])/T, m = [0, H_s-1], n = [0, W_s-1], k = [-N_t/2-1, +N_t/2], v = [0, W_b/W_s-1], u = [0, H_b/H_s-1].
61. A computer-implemented method for video coding, comprising:
obtaining frames of pixel data and having a current frame and a
decoded reference frame to use as a motion compensation reference
frame for the current frame; dividing the reference frame into a
plurality of portions that are less than the area of the entire
frame; performing dominant motion compensation comprising applying
local global motion compensation on at least one of the portions by
displacing the at least one portion of the decoded reference frame
by using global motion trajectories at a boundary of the portion;
and forming a prediction portion that corresponds to a portion on
the current frame, and by using the pixel values of the displaced
portion.
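Purely for illustration (not part of the claims), the local global motion compensation recited above can be pictured as applying a region-specific affine warp: each region of the reference frame gets its own parameter set (derived, for example, from trajectories at the region's own corners, as in the earlier sketch) and only that region is remapped. The sampling direction (reading the reference at the warped location of each region pixel), the nearest-integer sampling, and the frame layout are simplifying assumptions.

```c
#include <stdio.h>

#define W 64
#define H 64

/* Warp one rectangular region of ref with its own affine parameters;
 * pixels whose warped source falls outside the frame are left as-is. */
static void warp_region(const unsigned char ref[H][W],
                        unsigned char out[H][W],
                        int rx, int ry, int rw, int rh,  /* region rect */
                        double a, double b, double c,    /* x' mapping  */
                        double d, double e, double f)    /* y' mapping  */
{
    for (int y = ry; y < ry + rh; y++)
        for (int x = rx; x < rx + rw; x++) {
            int sx = (int)(a * x + b * y + c + 0.5);  /* source sample */
            int sy = (int)(d * x + e * y + f + 0.5);
            if (sx >= 0 && sx < W && sy >= 0 && sy < H)
                out[y][x] = ref[sy][sx];
        }
}

int main(void)
{
    static unsigned char ref[H][W], out[H][W];
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            ref[y][x] = (unsigned char)(x ^ y);
    /* Warp only the 16x16 region at (8, 8) with a pan of (2, 1). */
    warp_region(ref, out, 8, 8, 16, 16, 1, 0, 2, 0, 1, 1);
    printf("%d\n", out[8][8]);   /* sampled from ref[9][10] */
    return 0;
}
```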
62. The method of claim 61 comprising performing local global
motion compensation on a plurality of the portions by using a
different set of global motion trajectories on each portion of the
plurality of portions.
63. The method of claim 62 wherein each portion is a tile, the
method comprising dividing the frame into the tiles, and wherein
each tile has a set of global motion trajectories.
64. The method of claim 63 comprising grouping a plurality of the
tiles into a region, and applying the same global motion
trajectories on the tiles within the same region, and different
sets of global motion trajectories depending on the region.
65. The method of claim 48 comprising grouping a plurality of the
portions into a region, and applying the same global motion
trajectories on the portions within the same region, and different
sets of global motion trajectories depending on the region; and
wherein at least one of: each portion is shaped and sized depending
on an object associated with the portion, the object is one of: a
foreground, a background, and an object moving in the frame, and
the portion is a rectangle placed about the object.
66. The method of claim 61 comprising forming a portion of the
background of the reference frame, and a portion of the foreground
of the reference frame each with a different set of local global
motion trajectories for each portion.
67. The method of claim 61 comprising performing local global
motion compensation on a plurality of the portions by using a
different set of global motion trajectories on each portion of the
plurality of portions; wherein each portion is a tile, the method
comprising dividing the frame into the tiles, and wherein each tile
has a set of global motion trajectories; the method comprising
providing the option to perform local global motion compensation on
a fraction of a tile in addition to entire tiles; wherein local
global motion compensation trajectories are provided to half-tiles
or quarter-tiles; the method comprising at least one of: grouping a
plurality of the tiles into a region, and applying the same global
motion trajectories on the tiles within the same region, and
different sets of global motion trajectories depending on the
region, and grouping a plurality of the portions into a region, and
applying the same global motion trajectories on the portions within
the same region, and different sets of global motion trajectories
depending on the region; wherein each portion is shaped and sized
depending on an object associated with the portion; wherein the
object is one of: a foreground, a background, and an object moving
in the frame; wherein the portion is a rectangle placed about the
object; the method comprising forming a portion of the background
of the reference frame, and a portion of the foreground of the
reference frame each with a different set of local global motion
trajectories for each portion.
68. A coder comprising: an image buffer; and a graphics processing
unit configured to: obtain frames of pixel data and having a
current frame and a decoded reference frame to use as a motion
compensation reference frame for the current frame; divide the
reference frame into a plurality of portions that are less than the
area of the entire frame; perform dominant motion compensation
comprising applying local global motion compensation on at least
one of the portions by displacing the at least one portion of the
decoded reference frame by using global motion trajectories at a
boundary of the portion; and form a prediction portion that
corresponds to a portion on the current frame and by using the
pixel values of the displaced portion.
69. The coder of claim 68 the graphics processing unit configured
to: perform local global motion compensation on a plurality of the
portions by using a different set of global motion trajectories on
each portion of the plurality of portions; wherein each portion is
a tile, the graphics processing unit configured to divide the frame
into the tiles, and wherein each tile has a set of global motion
trajectories; the graphics processing unit configured to provide
the option to perform local global motion compensation on a
fraction of a tile in addition to entire tiles; wherein local
global motion compensation trajectories are provided to half-tiles
or quarter-tiles; the graphics processing unit configured to at
least one of: group a plurality of the tiles into a region, and
apply the same global motion trajectories on the tiles within the
same region, and different sets of global motion trajectories
depending on the region; and group a plurality of the portions into
a region, and apply the same global motion trajectories on the
portions within the same region, and different sets of global
motion trajectories depending on the region; wherein each portion
is shaped and sized depending on an object associated with the
portion; wherein the object is one of: a foreground, a background,
and an object moving in the frame; wherein the portion is a
rectangle placed about the object; the graphics processing unit
configured to form a portion of the background of the reference
frame, and a portion of the foreground of the reference frame each
with a different set of local global motion trajectories for each
portion.
70. A coder comprising: an image buffer; and a graphics processing
unit configured to: obtain frames of pixel data and having a
current frame and a decoded reference frame to use as a motion
compensation reference frame for the current frame; form a warped
global compensated reference frame by displacing at least one
portion of the decoded reference frame by using global motion
trajectories; determine a motion vector indicating the motion of
the at least one portion and motion from a position based on the
warped global compensated reference frame to a position at the
current frame; and form a prediction portion based, at least in
part, on the motion vectors and corresponding to a portion on the
current frame.
71. The coder of claim 70 wherein the at least one portion is at
least one of: (1) a block of pixels used as a unit to divide the
current frame and the reference frame into a plurality of the
blocks; (2) at least one tile of pixels, each tile being at least
64×64 pixels, and used as a unit to divide the current frame
and the reference frame into a plurality of the tiles; the graphics
processing unit being configured to at least one of: group tiles
together based on common association with an object in the frame to
form the at least one portion; and form a single motion vector for
each group of tiles, group the tiles based on a merge map
transmittable from an encoder to a decoder; (3) a region of pixels
shaped and sized depending on an object associated with the region,
wherein a boundary of the region is at least one of: a shape that
resembles the shape of the object associated with the region, and a
rectangle placed around the object associated with the region;
wherein the region is associated with at least one of: a background
of the frame, a foreground of the frame, and a moving object in the
frame; the graphics processing unit being configured to define the
region based on a boundary map transmittable from an encoder to a
decoder; wherein form a warped global compensated reference frame
comprises using the global motion trajectories at the outer corners
of the frame; wherein form a warped global compensated reference
frame comprises using an affine or perspective global motion
compensation method; wherein the at least one portion comprises a
frame divided into a background and a foreground, and wherein
determining motion vectors comprises providing the background and
foreground each with one motion vector; the graphics processing
unit configured to perform dominant motion compensation comprising
locally applied global motion compensation so that at least one
other set of global motion trajectories are used at corners of at
least one region on the frame that is less than the entire frame to
form a displaced region; and use the pixel values of the displaced
region to form a prediction region that corresponds to a region on
the current frame; the graphics processing unit configured to at
least one of: perform local global motion compensation on multiple
regions of the frame by using a different set of global motion
trajectories on each region; wherein each region is a tile, and
dividing the frame into the tiles, and wherein each tile has a set
of global motion trajectories; provide the option to perform local
global motion compensation on a fraction of a tile in addition to
entire tiles; wherein each region is shaped and sized depending on
an object associated with the region; wherein the object is one of:
a foreground, a background, and an object moving in the frame; the
graphics processing unit being configured to provide the option on
the at least one region on a region-by-region basis to select a
prediction formed by: (1) a motion vector to form a prediction for
the at least one region and using global motion compensation
applied to the entire frame, or (2) apply local global motion
compensation with a set of global motion trajectories at the region
and using displaced pixel values of the region to form a
prediction; the graphics processing unit configured to apply local
global motion compensation with a set of global motion trajectories
applied at a region of the reference frame that has an area less
than the entire reference frame, and use motion vectors to form a
prediction for the at least one region; the graphics processing
unit configured to provide the option to select a mode for a frame
among: (1) use the dominant motion compensated reference frame
prediction, (2) use blended prediction of multiple dominant motion
compensated reference frames, (3) use dominant motion compensated
reference with differential translational motion vector for
prediction, and (4) use dominant motion compensated reference with
differential translational motion vector for prediction, blended
with another reference frame; the graphics processing unit
configured to at least one of (a) to (c): (a) perform motion compensated morphed reference prediction using bilinear interpolation and a motion compensation (MC) filter to form a morphed reference frame MRef, with tPred_h as the intermediate horizontal interpolation and Pred_ji as the final motion compensated morphed reference prediction:

MRef[i'][j'] = ((8-p_x)(8-p_y)·Ref[y_0][x_0] + p_x(8-p_y)·Ref[y_0][x_0+1] + p_y(8-p_x)·Ref[y_0+1][x_0] + p_y·p_x·Ref[y_0+1][x_0+1] + 31) >> 6

tPred_h[m][n] = (1/T) Σ_{k=0}^{N_t-1} h[p_j][k] · MRef[i'+m][j'+n+k-N_t/2+1], where m = [-N_t/2+1, H_b+N_t/2-1], n = [0, W_b-1]

Pred_ji[m][n] = (1/T) Σ_{k=0}^{N_t-1} h[p_i][k] · tPred_h[m+k-N_t/2+1][n], where m = [0, H_b-1], n = [0, W_b-1]

and where: (iMVx, iMVy) is the transmitted motion vector in sub-pel units (f_s) for a block at (j, i) of size (W_b×H_b); A, B, C, D, E, and F are affine parameters calculated from the three motion trajectories transmitted; using separable motion compensation (MC) filters with filter coefficients h[f_s][N_t] of norm T, where f_s is the sub-pel factor (e.g., 2 = half pel, 4 = quarter pel, 8 = eighth pel) and N_t is the number of MC filter taps; i' = i + (iMVy/f_s), j' = j + (iMVx/f_s), p_i = iMVy & (f_s-1), p_j = iMVx & (f_s-1), where (j', i') is the integer motion adjusted current pixel location in the morphed reference image, and p_j, p_i are the 1/8th pel phases in the morphed reference image; x = (A*j' + B*i' + (C<<r)) >> r, y = (D*j' + E*i' + (F<<s)) >> s, where (x, y) is the reference pixel coordinate in 1/8th pel accuracy for location (j', i'); p_y = y & 0x7, p_x = x & 0x7, y_0 = y>>3, x_0 = x>>3, where (x_0, y_0) is the integer pel location in the reference image (Ref), and p_x, p_y are the 1/8th pel phases; MRef[i'][j'] = ((8-p_x)*(8-p_y)*Ref[y_0][x_0] + p_x*(8-p_y)*Ref[y_0][x_0+1] + p_y*(8-p_x)*Ref[y_0+1][x_0] + p_y*p_x*Ref[y_0+1][x_0+1] + 31) >> 6; tPred_h[m][n] = Σ_k (h[p_j][k]*MRef[i'+m][j'+n+k])/T, where m = [-N_t/2-1, H_b+N_t/2], n = [0, W_b-1], k = [-N_t/2-1, N_t/2]; Pred_ji[m][n] = Σ_k (h[p_j][k]*tPred_h[m+k][n])/T, where m = [0, H_b-1], n = [0, W_b-1], k = [-N_t/2-1, +N_t/2];

(b) perform morphed reference prediction using block motion compensation (MC) filtering to form a morphed reference frame MRef, with tPred_h as the intermediate horizontal interpolation:

tPred_h[m][n] = (1/T) Σ_{k=0}^{N_t-1} h[p_x][k] · Ref[y_0+m][x_0+n+k-N_t/2+1], where m = [-N_t/2+1, H_s+N_t/2-1], n = [0, W_s-1]

MRef[i+m][j+n] = (1/T) Σ_{k=0}^{N_t-1} h[p_y][k] · tPred_h[m+k-N_t/2+1][n], where m = [0, H_s-1], n = [0, W_s-1]

and where A, B, C, D, E, and F are affine parameters calculated from the three motion trajectories transmitted; using separable MC filters with filter coefficients h[f_s][N_t] of norm T, where f_s is the sub-pel factor (e.g., 2 = half pel, 4 = quarter pel, 8 = eighth pel) and N_t is the number of MC filter taps; x = (A*j + B*i + (C<<r)) >> r, y = (D*j + E*i + (F<<s)) >> s, where (j, i) is every (W_s×H_s) sub-block location in the current image, and x and y are reference pixel coordinates in 1/8th pel accuracy; p_y = y & 0x7, p_x = x & 0x7, y_0 = y>>3, x_0 = x>>3, where (x_0, y_0) is the integer pel location in the reference frame (Ref), and p_x, p_y are the 1/8th pel phases; tPred_h[m][n] = Σ_k (h[p_x][k]*Ref[y_0+m][x_0+n+k])/T, m = [-N_t/2-1, H_s+N_t/2], n = [0, W_s-1], k = [-N_t/2-1, +N_t/2]; and MRef[i+m][j+n] = Σ_k (h[p_y][k]*tPred_h[m+k][n])/T, m = [0, H_s-1], n = [0, W_s-1], k = [-N_t/2-1, +N_t/2]; and

(c) perform motion compensated morphed reference prediction using single loop motion compensation (MC) filtering to form a morphed reference (MRef) and predictions, with tPred_h as the intermediate horizontal interpolation and Pred_ji as the final motion compensated morphed reference prediction for a block of size W_b×H_b at (j, i):

tPred_h[m][n] = (1/T) Σ_{k=0}^{N_t-1} h[p_x][k] · Ref[y_0+m][x_0+n+k-N_t/2+1], for m = [-N_t/2+1, H_s+N_t/2-1], n = [0, W_s-1]

Pred_ji[u·H_s+m][v·W_s+n] = (1/T) Σ_{k=0}^{N_t-1} h[p_y][k] · tPred_h[m+k-N_t/2+1][n], for m = [0, H_s-1], n = [0, W_s-1], u = [0, H_b/H_s-1], v = [0, W_b/W_s-1]

and where: (iMVx, iMVy) is the transmitted motion vector in sub-pel units (f_s) for a block at (j, i) of size (W_b×H_b); A, B, C, D, E, and F are affine parameters calculated from the three motion trajectories transmitted; using separable MC filters with filter coefficients h[f_s][N_t] of norm T, where f_s is the sub-pel factor (e.g., 2 = half pel, 4 = quarter pel, 8 = eighth pel) and N_t is the number of MC filter taps; i' = (i + u*H_s)*f_s + iMVx, j' = (j + v*W_s)*f_s + iMVy, where (j, i) is the current block pixel location and (u, v) is the index of every (W_s×H_s) sub-block within the given current block of (W_b×H_b); below, (i', j') is the motion adjusted current pixel location in f_s sub-pel accuracy; x = (A*j' + B*i' + ((C*f_s)<<r)) >> (r+3), y = (D*j' + E*i' + ((F*f_s)<<s)) >> (s+3), where x and y are reference pixel coordinates in f_s sub-pel accuracy; p_y = y & (f_s-1), p_x = x & (f_s-1), y_0 = y/f_s, x_0 = x/f_s, where (x_0, y_0) is the integer pel location in the reference image (Ref), and p_x, p_y are the 1/8th pel phases; tPred_h[m][n] = Σ_k (h[p_x][k]*Ref[y_0+m][x_0+n+k])/T, m = [-N_t/2-1, H_s+N_t/2], n = [0, W_s-1], k = [-N_t/2-1, +N_t/2]; Pred_ji[u*H_s+m][v*W_s+n] = Σ_k (h[p_y][k]*tPred_h[m+k][n])/T, m = [0, H_s-1], n = [0, W_s-1], k = [-N_t/2-1, +N_t/2], v = [0, W_b/W_s-1], u = [0, H_b/H_s-1].
72. At least one computer readable memory comprising instructions,
that when executed by a computing device, cause the computing
device to: obtain frames of pixel data and having a current frame
and a decoded reference frame to use as a motion compensation
reference frame for the current frame; divide the reference frame
into a plurality of portions that are less than the area of the
entire frame; perform dominant motion compensation comprising
applying local global motion compensation on at least one of the
portions by displacing the at least one portion of the decoded
reference frame by using global motion trajectories at a boundary
of the portion; and form a prediction portion that corresponds to a
portion on the current frame and by using the pixel values of the
displaced portion.
73. The computer readable memory of claim 72, wherein the
instructions cause the computing device to: perform local global
motion compensation on a plurality of the portions by using a
different set of global motion trajectories on each portion of the
plurality of portions; wherein each portion is a tile, the
instructions cause the computing device to divide the frame into
the tiles, and wherein each tile has a set of global motion
trajectories; the instructions cause the computing device to
provide the option to perform local global motion compensation on a
fraction of a tile in addition to entire tiles; wherein local
global motion compensation trajectories are provided to half-tiles
or quarter-tiles; the instructions cause the computing device to at
least one of: group a plurality of the tiles into a region, and
apply the same global motion trajectories on the tiles within the
same region, and different sets of global motion trajectories
depending on the region; and group a plurality of the portions into
a region, and apply the same global motion trajectories on the
portions within the same region, and different sets of global
motion trajectories depending on the region; wherein each portion
is shaped and sized depending on an object associated with the
portion; wherein the object is one of: a foreground, a background,
and an object moving in the frame; wherein the portion is a
rectangle placed about the object; the instructions cause the
computing device to form a portion of the background of the
reference frame, and a portion of the foreground of the reference
frame each with a different set of local global motion trajectories
for each portion.
74. At least one computer readable memory comprising instructions,
that when executed by a computing device, cause the computing
device to: obtain frames of pixel data and having a current frame
and a decoded reference frame to use as a motion compensation
reference frame for the current frame; form a warped global
compensated reference frame by displacing at least one portion of
the decoded reference frame by using global motion trajectories;
determine a motion vector indicating the motion of the at least one
portion and motion from a position based on the warped global
compensated reference frame to a position at the current frame; and
form a prediction portion based, at least in part, on the motion
vectors and corresponding to a portion on the current frame.
75. The computer readable memory of claim 74, wherein the at least
one portion is at least one of: (1) a block of pixels used as a
unit to divide the current frame and the reference frame into a
plurality of the blocks; (2) at least one tile of pixels, each tile
being at least 64×64 pixels, and used as a unit to divide the
current frame and the reference frame into a plurality of the
tiles; the instructions causing the computing device to at least
one of: group tiles together based on common association with an
object in the frame to form the at least one portion; and forming a
single motion vector for each group of tiles, group the tiles based
on a merge map transmittable from an encoder to a decoder; (3) a
region of pixels shaped and sized depending on an object associated
with the region, wherein a boundary of the region is at least one
of: a shape that resembles the shape of the object associated with
the region, and a rectangle placed around the object associated
with the region; wherein the region is associated with at least one
of: a background of the frame, a foreground of the frame, and a
moving object in the frame; the instructions causing the computing
device to define the region based on a boundary map transmittable
from an encoder to a decoder; wherein form a warped global
compensated reference frame comprises using the global motion
trajectories at the outer corners of the frame; wherein form a
warped global compensated reference frame comprises using an affine
or perspective global motion compensation method; wherein the at
least one portion comprises a frame divided into a background and a
foreground, and wherein determining motion vectors comprises
providing the background and foreground each with one motion
vector; the instructions causing the computing device to perform
dominant motion compensation comprising locally applied global
motion compensation so that at least one other set of global motion
trajectories are used at corners of at least one region on the
frame that is less than the entire frame to form a displaced
region; and use the pixel values of the displaced region to form a
prediction region that corresponds to a region on the current
frame; the instructions causing the computing device to at least
one of: perform local global motion compensation on multiple
regions of the frame by using a different set of global motion
trajectories on each region; wherein each region is a tile, and
dividing the frame into the tiles, and wherein each tile has a set
of global motion trajectories; provide the option to perform local
global motion compensation on a fraction of a tile in addition to
entire tiles; wherein each region is shaped and sized depending on
an object associated with the region; wherein the object is one of:
a foreground, a background, and an object moving in the frame; the
instructions causing the computing device to provide the option on
the at least one region on a region-by-region basis to select a
prediction formed by: (1) a motion vector to form a prediction for
the at least one region and use global motion compensation applied
to the entire frame, or (2) apply local global motion compensation
with a set of global motion trajectories at the region and using
displaced pixel values of the region to form a prediction; the
instructions causing the computing device to apply local global
motion compensation with a set of global motion trajectories
applied at a region of the reference frame that has an area less
than the entire reference frame, and use motion vectors to form a
prediction for the at least one region; and the instructions
causing the computing device to provide the option to select a mode
for a frame among: (1) use the dominant motion compensated
reference frame prediction, (2) use blended prediction of multiple
dominant motion compensated reference frames, (3) use dominant
motion compensated reference with differential translational motion
vector for prediction, and (4) use dominant motion compensated
reference with differential translational motion vector for
prediction, blended with another reference frame.
Description
RELATED APPLICATIONS
[0001] The present application claims the benefit of International
Application No. PCT/US2013/078114, filed 27 Dec.
2013, the disclosure of which is expressly incorporated herein in
its entirety for all purposes.
BACKGROUND
[0002] A video encoder compresses video information so that more
information can be sent over a given bandwidth. The compressed
signal may then be transmitted to a receiver having a decoder that
decodes or decompresses the signal prior to display.
[0003] High Efficiency Video Coding (HEVC) is the latest video
compression standard, which is being developed by the Joint
Collaborative Team on Video Coding (JCT-VC) formed by ISO/IEC
Moving Picture Experts Group (MPEG) and ITU-T Video Coding Experts
Group (VCEG). HEVC is being developed in response to the previous
H.264/AVC (Advanced Video Coding) standard not providing enough
compression for evolving higher resolution video applications.
Similar to previous video coding standards, HEVC includes basic
functional modules such as intra/inter prediction, transform,
quantization, in-loop filtering, and entropy coding.
[0004] The ongoing HEVC standard may attempt to improve on
limitations of the H.264/AVC standard such as limited choices for
allowed prediction partitions and coding partitions, limited
allowed multiple references and prediction generation, limited
transform block sizes and actual transforms, limited mechanisms for
reducing coding artifacts, and inefficient entropy encoding
techniques. However, the ongoing HEVC standard may use iterative
approaches to solving such problems.
[0005] For instance, with ever increasing resolution of video to be
compressed and expectation of high video quality, the corresponding
bitrate/bandwidth required for coding using existing video coding
standards such as H.264 or even evolving standards such as
H.265/HEVC, is relatively high. The aforementioned standards use
expanded forms of traditional approaches to implicitly address the
insufficient compression/quality problem, but often the results are
limited.
[0006] The present description, developed within the context of a
Next Generation Video (NGV) codec project, addresses the general
problem of designing an advanced video codec that maximizes the
achievable compression efficiency while remaining sufficiently
practical for implementation on devices. For instance, with ever
increasing resolution of video and expectation of high video
quality due to availability of good displays, the corresponding
bitrate/bandwidth required using existing video coding standards
such as earlier MPEG standards and even the more recent H.264/AVC
standard, is relatively high. H.264/AVC was not perceived to be
providing high enough compression for evolving higher resolution
video applications.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] The material described herein is illustrated by way of
example and not by way of limitation in the accompanying figures.
For simplicity and clarity of illustration, elements illustrated in
the figures are not necessarily drawn to scale. For example, the
dimensions of some elements may be exaggerated relative to other
elements for clarity. Further, where considered appropriate,
reference labels have been repeated among the figures to indicate
corresponding or analogous elements. In the figures:
[0008] FIG. 1 is an illustrative diagram of an example next
generation video encoder;
[0009] FIG. 2 is an illustrative diagram of an example next
generation video decoder;
[0010] FIG. 3(a) is an illustrative diagram of an example next
generation video encoder and subsystems;
[0011] FIG. 3(b) is an illustrative diagram of an example next
generation video decoder and subsystems;
[0012] FIG. 4 is an illustrative diagram of modified prediction
reference pictures;
[0013] FIG. 5 is a diagram of a frame sequence for explaining a
method of providing super-resolution synthesized reference
frames;
[0014] FIG. 6 is an illustrative diagram of an example encoder
subsystem;
[0015] FIG. 7 is an illustrative diagram of an example decoder
subsystem;
[0016] FIG. 8 is an illustrative diagram of an example encoder
prediction subsystem;
[0017] FIG. 9 is an illustrative diagram of an example decoder
prediction subsystem;
[0018] FIG. 10 is an illustrative diagram showing frames used to
illustrate dominant motion compensation using motion vectors;
[0019] FIG. 11 is a flow chart of a method of performing dominant
motion compensation;
[0020] FIG. 12 is a flow chart of another method of dominant motion
compensation;
[0021] FIG. 13 is a flow chart of a detailed method of dominant
motion compensation using motion vectors;
[0022] FIGS. 14-16 are illustrative diagrams showing frames to
explain global motion compensation;
[0023] FIG. 17 is an illustrative diagram of frames showing
dominant motion compensation using motion vectors;
[0024] FIG. 18 is a diagram of a dominant motion compensation
subsystem at an encoder;
[0025] FIG. 19 is a diagram of a dominant motion compensation
subsystem at a decoder;
[0026] FIGS. 20-23 are diagrams of frames showing alternative
dominant motion compensation techniques using motion vectors;
[0027] FIG. 24 is a diagram of a dominant motion compensation
subsystem at an encoder; FIG. 25 is a diagram of a dominant motion
compensation subsystem at a decoder; FIG. 26 is a flow chart of a
detailed method of dominant motion compensation using local global
motion compensation;
[0028] FIGS. 27-31 are diagrams of alternative dominant motion
compensation techniques using local global motion compensation;
[0029] FIG. 32 is a diagram of a dominant motion compensation
subsystem at an encoder;
[0030] FIG. 33 is a diagram of a dominant motion compensation
subsystem at a decoder;
[0031] FIG. 34 is a diagram of an example video coding system and
video coding process in operation;
[0032] FIG. 35 is an illustrative diagram of an example video
coding system;
[0033] FIG. 36 is an illustrative diagram of an example system;
and
[0034] FIG. 37 illustrates an example device.
DETAILED DESCRIPTION
[0035] One or more implementations are now described with reference
to the enclosed figures. While specific configurations and
arrangements are discussed, it should be understood that this is
done for illustrative purposes only. Persons skilled in the
relevant art will recognize that other configurations and
arrangements may be employed without departing from the spirit and
scope of the description. It will be apparent to those skilled in
the relevant art that techniques and/or arrangements described
herein may also be employed in a variety of systems and
applications other than those described herein.
[0036] While the following description sets forth various
implementations that may be manifested in architectures such as
system-on-a-chip (SoC) architectures for example, implementations of
the techniques and/or arrangements described herein are not
restricted to particular architectures and/or computing systems and
may be implemented by any architecture and/or computing system for
similar purposes. For instance, various architectures employing,
for example, multiple integrated circuit (IC) chips and/or
packages, and/or various computing devices and/or consumer
electronic (CE) devices such as set top boxes, smart phones, etc.,
may implement the techniques and/or arrangements described herein.
Further, while the following description may set forth numerous
specific details such as logic implementations, types and
interrelationships of system components, logic
partitioning/integration choices, etc., claimed subject matter may
be practiced without such specific details. In other instances,
some material such as, for example, control structures and full
software instruction sequences, may not be shown in detail in order
not to obscure the material disclosed herein.
[0037] The material disclosed herein may be implemented in
hardware, firmware, software, or any combination thereof. The
material disclosed herein may also be implemented as instructions
stored on a machine-readable medium, which may be read and executed
by one or more processors. A machine-readable medium may include
any medium and/or mechanism for storing or transmitting information
in a form readable by a machine (e.g., a computing device). For
example, a machine-readable medium may include read only memory
(ROM); random access memory (RAM); magnetic disk storage media;
optical storage media; flash memory devices; electrical, optical,
acoustical or other forms of propagated signals (e.g., carrier
waves, infrared signals, digital signals, etc.); and others.
[0038] References in the specification to "one implementation", "an
implementation", "an example implementation", etc., indicate that
the implementation described may include a particular feature,
structure, or characteristic, but every embodiment may not
necessarily include the particular feature, structure, or
characteristic. Moreover, such phrases are not necessarily
referring to the same implementation. Further, when a particular
feature, structure, or characteristic is described in connection
with an embodiment, it is submitted that it is within the knowledge
of one skilled in the art to effect such feature, structure, or
characteristic in connection with other implementations whether or
not explicitly described herein.
[0039] Systems, apparatus, articles, and methods are described
below related to dominant motion compensated prediction for next
generation video coding.
[0040] As discussed above, while the H.264/AVC coding standard
represents an improvement over past MPEG standards, it is still very
limiting in its choices of prediction for the following reasons: the
choices for allowed prediction partitions are very limited, the
accuracy of prediction for prediction partitions is limited, and the
allowed multiple reference predictions are very limited because they
are discrete, based on past decoded frames, rather than based on an
accumulation of resolution over many frames. These limitations of
state-of-the-art standards such as H.264 are recognized by the
ongoing work in HEVC, which uses an iterative approach to fixing
them.
[0041] Further, the problem of improved prediction is currently
being solved in an ad hoc manner by using multiple decoded
references in the past and/or future for motion compensated
prediction in inter-frame coding of video. This is done with the
hope that among the past or future frames there might be areas more
similar to the area of the current frame being predicted than in the
immediately past frame (for P-pictures/slices), or in the past and
future frames (for B-pictures/slices).
[0042] As will be described in greater detail below, some forms of
prediction, such as the dominant motion compensation prediction
procedures of this disclosure, may not be supportable by existing
standards. The present disclosure was developed within the context
of the Next Generation Video (NGV) codec project to address the
problem of designing a new video coding scheme that maximizes
compression efficiency while remaining practical for implementation
on devices. Specifically, a new type of prediction is disclosed
herein, called locally adaptive dominant motion compensated
prediction (or simply dominant motion compensation prediction), that
provides improved prediction, which in turn reduces the prediction
error, thereby improving the overall video coding efficiency.
[0043] More specifically, the techniques described herein may
differ from standards-based approaches in that they naturally
incorporate significant content-based adaptivity in the video coding
process to achieve higher compression. By comparison, standards
based video coding approaches typically tend to squeeze higher gains
out of adaptations and fine tuning of legacy approaches. For
instance, all standards based approaches heavily rely on adapting
and further tweaking of motion compensated interframe coding as the
primary means to reduce prediction differences to achieve gains. On
the other hand, some video coding implementations disclosed herein,
in addition to exploiting interframe differences due to motion, also
exploit other types of interframe differences (gain, blur,
registration, dominant/global motion) that naturally exist in
typical video scenes, as well as prediction benefits of frames
synthesized from past decoded frames only or a combination of past
and future decoded frames. In NGV coding, the morphed frames used
for prediction include dominant motion compensated (DMC)
prediction. In some video coding implementations disclosed herein,
the synthesized frames used for prediction include Super Resolution
(SR) frames and PI (Projected Interpolation) frames. Besides
exploiting sources of interframe differences other than motion, some
video coding implementations disclosed herein differ from standards
in other ways as well.
[0044] With regard to dominant motion compensation (DMC), improving
motion compensation for prediction is one of the keys to achieving
higher coding efficiency in recent video coding standard and
solutions. For example, with block-based motion compensation, a
block (such as a 16×16 block of pixels) of a current frame
being analyzed is matched during motion estimation to a similar
block in a previously decoded reference frame. The shift from one
frame to the other in the x and y directions, with respect to a
block grid, is referred to as a motion vector, with x and y
components referred to as mv_x and mv_y. The motion
estimation process thus involves estimating the motion of blocks to
determine mv (mv_x, mv_y) for each block. The computed motion
estimates are then efficiently encoded (by first differencing them
with a prediction, and entropy coding the difference) and sent via
bitstream to the decoder, where they are decoded (by entropy
decoding and adding the prediction back in) and used for motion
compensation. In highly efficient compression schemes, motion
estimation/compensation is performed with high accuracy (such as
1/4 pixel or 1/8 pixel accuracy rather than integer pixel accuracy)
by use of a fixed or adaptive interpolation filter for generation
of a prediction block. Further, generally the block sizes
themselves may be square or non-square (e.g., 16×8 or
8×16) and of multiple sizes (e.g., 4×4, 8×8,
16×16, and others).
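By way of a non-limiting illustration only, the following sketch (in
Python, with hypothetical names) shows integer-pel, full-search block
matching of the kind described above. The sum of absolute differences
(SAD) matching criterion and the search range are assumptions of this
sketch; sub-pel (1/4 or 1/8 pixel) refinement with interpolation
filters is omitted for brevity.

    # Illustrative sketch only: integer-pel full-search block matching.
    import numpy as np

    def estimate_motion(cur, ref, block=16, search=8):
        """Return per-block motion vectors (mv_x, mv_y) minimizing SAD."""
        h, w = cur.shape
        mvs = np.zeros((h // block, w // block, 2), dtype=np.int32)
        for by in range(0, h - block + 1, block):
            for bx in range(0, w - block + 1, block):
                cur_blk = cur[by:by + block, bx:bx + block].astype(np.int32)
                best_sad, best_mv = None, (0, 0)
                for dy in range(-search, search + 1):
                    for dx in range(-search, search + 1):
                        y, x = by + dy, bx + dx
                        if y < 0 or x < 0 or y + block > h or x + block > w:
                            continue  # candidate falls outside the reference
                        ref_blk = ref[y:y + block, x:x + block].astype(np.int32)
                        sad = np.abs(cur_blk - ref_blk).sum()
                        if best_sad is None or sad < best_sad:
                            best_sad, best_mv = sad, (dx, dy)
                mvs[by // block, bx // block] = best_mv
        return mvs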
[0045] While H.264 includes several good ways of generating
predictions derived from block-based motion vectors, this approach
has two limitations: (1) block-based motion vectors,
regardless of block sizes or references, are all modeled based on
an assumption of translatory motion, which may disregard
alternative types of motion between frames, resulting in large
prediction error; and (2) while block-based motion vectors
provide good compensation of local motion, the inherent substantial
bit-cost associated with block-based motion can limit the
potential gain that may otherwise be possible. Improvements have
involved the use of variable size blocks, which helps in reducing
overhead, but the overhead reduction is still quite limited.
[0046] For video content undergoing global motion such as camera
pan (translation), zoom, or rotation, or in video content that has
special effects (such as shearing), the block-based translatory
motion representation and coding of motion vectors can be
particularly inefficient. Since it was realized that global motion
in a video can present a challenge to block-based prediction, due to
the large prediction error resulting from a translatory motion
model and the significant amount of motion vector overhead, an
alternative approach was investigated that directly
estimates/compensates global motion due to its potential to adapt to
nontranslatory/complex motion, and due to the more compact
representation of motion parameters, which are needed only once per
picture (note that herein the term frame is used interchangeably
with the term picture). Among the choice of motion models for global
motion, the two models that offer significant improvements are the
affine model and the perspective model. The affine model uses six
parameters and is able to address a large range of complex motions
(translation, zoom, shearing, and rotation). In the typical
process, the model results in a warped frame used to form the
predictions by reading the blocks on the warped frame. The
perspective model is more complex than the affine model, and in
addition to the motions listed for affine, this model can also
handle a change in perspective. Due to the higher complexity of the
perspective model, it is not discussed here in detail, but in
general it is applicable in the same manner as the affine model.
The details for global motion compensation, at least as used by the
system and process herein, are discussed below.
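As a non-limiting illustration of the six-parameter affine model
discussed above, the following sketch warps a reference frame so that
prediction blocks may be read from the warped frame. The parameter
layout (a0 . . . a5) and the bilinear sampling are assumptions made
for illustration, not a normative interpolation process.

    # Illustrative sketch only: warping a reference frame with a
    # six-parameter affine model (a0..a5).
    import numpy as np

    def affine_warp(ref, params):
        """Map each output pixel (x, y) through x' = a0*x + a1*y + a2,
        y' = a3*x + a4*y + a5 and sample the reference bilinearly."""
        a0, a1, a2, a3, a4, a5 = params
        h, w = ref.shape
        ys, xs = np.mgrid[0:h, 0:w]
        xp = a0 * xs + a1 * ys + a2
        yp = a3 * xs + a4 * ys + a5
        x0 = np.clip(np.floor(xp).astype(int), 0, w - 2)
        y0 = np.clip(np.floor(yp).astype(int), 0, h - 2)
        fx = np.clip(xp - x0, 0.0, 1.0)   # fractional offsets
        fy = np.clip(yp - y0, 0.0, 1.0)
        out = (ref[y0, x0] * (1 - fx) * (1 - fy) +
               ref[y0, x0 + 1] * fx * (1 - fy) +
               ref[y0 + 1, x0] * (1 - fx) * fy +
               ref[y0 + 1, x0 + 1] * fx * fy)
        return out.astype(ref.dtype)

For instance, params = (1, 0, 3.5, 0, 1, -2.0) models a pure
translation, while scaling a0 and a4 models zoom.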
[0047] While use of affine model-based global motion
estimation/compensation (GME/C) was a notable improvement over use
of block based translatory motion for scenes with global motion,
video scenes in general can be classified into one of
three cases: (1) scenes with purely global motion, (2) scenes with
purely local motion, and (3) scenes containing both local and
global motion. Thus, in general, both the global and the local
motion techniques needed to be combined to achieve good
efficiency in video coding. MPEG-4 Part 2 supports a very basic
combination of global and local motion techniques. Specifically, it
supports 16×16 luma block (and optional 8×8
sub-block) based local motion estimation/compensation, picture-based
affine model global motion trajectory (gmt) parameters-based motion
compensation, and a 16×16 block-by-block flag for local or
global motion (lgm) that allows a choice of when to use which
method.
[0048] While the MPEG-4 Part 2 standard represents an improvement
(due to inclusion of global motion and other aspects) over past
MPEG or ITU-T standards, it still offers only a limited improvement
in motion compensated prediction for the following reasons. Even
though a combination of local and global motion compensation is
allowed, the local motion compensation occurs at a very small block
size (16×16 at most). Thus, there is considerable
overhead in signaling, on a 16×16 basis, when to use local
versus global motion compensation. This overhead cuts into the
possible gains due to GMC. Also, since P-pictures only use one
reference frame and B-pictures only use two, GMC is limited to
being applied only on immediately past decoded frames. Further,
global motion parameters are computed only once for the entire
picture (including blocks where local motion is found to be more
suitable), often causing the global motion parameters to be
inaccurate, especially in the case of a frame that contains a
mixture of both local and global motion. Moreover, other than using
or not using global motion compensated prediction, no adjustment or
correction of the GMC generated prediction is possible. Lastly, the
process for generating the interpolation (e.g., 1/4 or 1/8 pel
precision) is simplistic and results in blurry prediction.
[0049] These difficulties are addressed by the new and innovative
approaches used by an NGV video coding system, including the
improved prediction by dominant motion compensation described
herein. By one example, since global motion in video can present a
challenge to block-based prediction (due to the larger prediction
error resulting from a translatory motion model and a significant
amount of motion vector overhead), an alternative approach was
developed that directly estimates and compensates global motion due
to its potential to better adapt to nontranslatory or complex
motion, and a more compact representation of motion parameters is
now available as needed, such as once per frame. Among the choice of
motion models for global motion, the two models that offer
significant benefits are still the affine model and the perspective
model. The affine model uses six parameters and is able to address a
large range of complex motions, while the perspective model is more
complex and flexible, but can use up to eight parameters. The affine
model may be sufficient for many cases and can allow global
compensation for motion of types such as translation, zoom, shear,
and rotation.
[0050] While use of the affine model based global motion estimation
and compensation (GME/C) was a notable improvement for scenes with
global motion over use of block based translatory motion, in
reality both block-based local motion and global motion are combined
here for best coding efficiency results. Further, the affine model
can also be applied for motion compensation of non-overlapping
tiles, or of regions/objects in a scene. This results in multiple
global motion parameter sets, and the process is referred to as
performing dominant motion compensation (DMC).
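By way of a non-limiting illustration of DMC as just described, the
following sketch derives one six-parameter affine set per
non-overlapping tile. Fitting each tile's parameters by least squares
from block motion vectors is an assumed estimation method for this
sketch only; the tile size and all names are hypothetical.

    # Illustrative sketch only: per-tile affine parameter sets (DMC).
    import numpy as np

    def fit_affine(points, vectors):
        """Least-squares affine fit mapping (x, y) to (x + mvx, y + mvy)."""
        A = np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])
        tx = points[:, 0] + vectors[:, 0]
        ty = points[:, 1] + vectors[:, 1]
        px, *_ = np.linalg.lstsq(A, tx, rcond=None)  # a0, a1, a2
        py, *_ = np.linalg.lstsq(A, ty, rcond=None)  # a3, a4, a5
        return np.concatenate([px, py])

    def dominant_motion_params(block_centers, block_mvs, tile=64):
        """Group block motion vectors by tile and fit one affine
        parameter set (six parameters) per tile."""
        groups = {}
        for (cx, cy), mv in zip(block_centers, block_mvs):
            key = (int(cy) // tile, int(cx) // tile)
            groups.setdefault(key, ([], []))
            groups[key][0].append((cx, cy))
            groups[key][1].append(mv)
        return {k: fit_affine(np.array(p), np.array(v))
                for k, (p, v) in groups.items() if len(p) >= 3}

Each resulting parameter set could then drive a warp such as the
affine_warp sketch above, one warp per tile or region.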
[0051] As used herein, the term "coder" may refer to an encoder
and/or a decoder. Similarly, as used herein, the term "coding" may
refer to performing video encoding via an encoder and/or performing
video decoding via a decoder. For example, a video encoder and
video decoder may both be examples of coders capable of coding
video data. In addition, as used herein, the term "codec" may refer
to any process, program or set of operations, such as, for example,
any combination of software, firmware, and/or hardware that may
implement an encoder and/or a decoder. Further, as used herein, the
phrase "video data" may refer to any type of data associated with
video coding such as, for example, video frames, image data,
encoded bit stream data, or the like.
[0052] Referring to FIG. 1, an example next generation video
encoder 100 is shown, arranged in accordance with at least some
implementations of the present disclosure. As shown, encoder 100
may receive input video 101. Input video 101 may include any
suitable input video for encoding such as, for example, input
frames of a video sequence. As shown, input video 101 may be
received via a content pre-analyzer module 102. Content
pre-analyzer module 102 may be configured to perform analysis of
the content of video frames of input video 101 to determine various
types of parameters for improving video coding efficiency and speed
performance. For example, content pre-analyzer module 102 may
determine horizontal and vertical gradient information (for
example, Rs, Cs), variance, spatial complexity per frame, temporal
complexity per frame (tpcpx), scene change detection, motion range
estimation, gain detection, prediction distance estimation (pdist),
number of objects estimation, region boundary detection, spatial
complexity map computation, focus estimation, film grain
estimation, or the like. The parameters generated by content
pre-analyzer module 102 may be used by encoder 100 (e.g., via
encode controller 103) and/or quantized and communicated to a
decoder. As shown, video frames and/or other data may be
transmitted from content pre-analyzer module 102 to adaptive
picture organizer module 104 (also referred to as the hierarchical
picture group structure organizer). The adaptive organizer module
104 determines the picture group structure and the picture types of
each picture in the group, and reorders pictures into encoding
order as needed. The adaptive organizer module 104 outputs control
signals indicating the picture group structure and picture types
(the abbreviations for the output/input controls shown on system
100 are recited below). The NGV coding described herein uses
I-pictures (intra-coding), P-pictures (formed from inter-prediction
from past/previous reference frames), and F-pictures (functional as
described below). In some cases, B-pictures might also be used. In
some examples, adaptive picture organizer module 104 may include a
frame portion generator configured to generate frame portions. In
some examples, content pre-analyzer module 102 and adaptive picture
organizer module 104 may together be considered a pre-analyzer
subsystem of encoder 100.
[0053] As shown, video frames and/or other data may be transmitted
from adaptive picture organizer module 104 to prediction partitions
generator module 105. In some examples, prediction partitions
generator module 105 first may divide a frame or picture into tiles
or super-fragments or the like (herein the terms frame, picture,
and image may be used interchangeably except as otherwise noted and
except that a frame is used to generally refer to a frame that is
not necessarily assigned a specific picture type (I, P, F, or
B-pictures for example)). In some examples, an additional module
(for example, between modules 104 and 105) may be provided for
dividing a frame into tiles or super-fragments or the like. By one
example for NGV coding, a frame may be divided into tiles of
32×32 or 64×64 pixels, where 64×64 is used for all
standard definition and higher resolution video for coding of all
picture types (I-, P-, or F-). For low resolution sequences,
64×64 is still used for coding of I- and F-pictures, while
32×32 is used for P-pictures.
[0054] By one example, prediction partitions generator module
(which also may be referred to as Pred KdTree/BiTree Partitions
Generator) 105 may then divide each tile or super-fragment into
potential prediction partitionings or partitions. In some examples,
the potential prediction partitionings may be determined using a
partitioning technique such as, for example, a k-d tree
partitioning technique, a bi-tree partitioning technique, or the
like, which may be determined based on the picture type (for
example, I-, P-, or F-picture) of individual video frames, a
characteristic of the frame portion being partitioned, or the like.
By one example, if an I-picture is being coded, every tile, or
almost all tiles, are further divided into KdTree based partitions
that can divide a space, one dimension at a time, until a set
minimum size is reached. The options for dividing the space may
include no further division, division into two equal halves,
division into two parts that are 1/4 and 3/4 of the space, or
division into two parts that are 3/4 and 1/4 of the space. So, with
I-pictures using 64×64 as the largest size (and allowing a
minimum size of 4×4), a very large number of partitions of a
tile can be generated if no other constraints are imposed. For
example, one constraint is that the first pair of cuts is
pre-decided for a 64×64 tile to halve the space in both the
horizontal and vertical dimensions so that four 32×32
sub-tiles are formed, with each 32×32 sub-tile then
sub-partitioned by KdTree partitioning. Other restrictions are also
possible to reduce the number of possible partition
combinations.
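By way of a non-limiting illustration of the KdTree cut options
described above, the following sketch enumerates candidate
partitionings of a rectangle. The enumeration grows combinatorially
and is shown for small demonstration sizes only; the recursion and
names are assumptions for illustration.

    # Illustrative sketch only: KdTree partition enumeration with cut
    # options no-cut, 1/2:1/2, 1/4:3/4, and 3/4:1/4, one dimension at a
    # time. Equivalent partitionings reached in different cut orders are
    # not de-duplicated.
    def kd_partitions(x, y, w, h, min_size=4):
        """Yield partitionings of rect (x, y, w, h) as rectangle lists."""
        yield [(x, y, w, h)]                  # option: no further division
        for horizontal in (True, False):      # cut one dimension at a time
            size = w if horizontal else h
            for num, den in ((1, 2), (1, 4), (3, 4)):
                cut = size * num // den
                if cut < min_size or size - cut < min_size:
                    continue                  # respect the minimum size
                if horizontal:
                    a, b = (x, y, cut, h), (x + cut, y, w - cut, h)
                else:
                    a, b = (x, y, w, cut), (x, y + cut, w, h - cut)
                for pa in kd_partitions(*a, min_size):
                    for pb in kd_partitions(*b, min_size):
                        yield pa + pb

For example, list(kd_partitions(0, 0, 16, 16, min_size=8)) enumerates
the small set of partitionings of a 16×16 space with an 8-pixel
minimum.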
[0055] These partitions of an I-picture tile are referred to as
prediction partitions, as each tile partition may be used for
spatial prediction (directional angular prediction or other types
of prediction) and coding of prediction differences. Likewise,
P-picture tiles can also be partitioned in this manner for
prediction, except that for lower resolutions, P-picture partitions
start with a 32×32 tile, and KdTree based partitions are not
used, but rather a simpler Bi-Tree partitioning is used. Bi-Tree
partitioning divides a space into two equal parts, one dimension at
a time, alternating between the two dimensions. Further, P-picture
partitions are mainly predicted using motion (with one or more
references) rather than spatial prediction, although some
sub-partitions can use intra spatial prediction to deal with, for
instance, uncovered background. For standard definition to higher
resolution picture sizes, P-pictures start with 64×64 tiles
before being divided. Finally, F-pictures also use Bi-Tree
partitioning and start with 64×64 tiles for generating
prediction partitions that mainly use motion (with one or more
references), although some sub-partitions can also use spatial
prediction (for intra coding).
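For comparison with the KdTree sketch above, a non-limiting Bi-Tree
sketch follows; it permits either dimension at each step rather than
enforcing strict alternation, which is an assumption made here for
brevity.

    # Illustrative sketch only: Bi-Tree partitioning by equal halving,
    # one dimension at a time.
    def bitree_partitions(x, y, w, h, min_size=8):
        """Yield Bi-Tree partitionings of rect (x, y, w, h)."""
        yield [(x, y, w, h)]
        if w >= 2 * min_size:                 # halve horizontally
            for pa in bitree_partitions(x, y, w // 2, h, min_size):
                for pb in bitree_partitions(x + w // 2, y, w - w // 2,
                                            h, min_size):
                    yield pa + pb
        if h >= 2 * min_size:                 # halve vertically
            for pa in bitree_partitions(x, y, w, h // 2, min_size):
                for pb in bitree_partitions(x, y + h // 2, w,
                                            h - h // 2, min_size):
                    yield pa + pb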
[0056] Alternatively, the methods described herein may be
performed on HEVC, where a largest coding unit (LCU), also called a
coding tree unit (CTU), may be divided into coding tree blocks
(CTBs), which are themselves divided into coding units (CUs). Such
an LCU may be 64×64 pixels. Thus, a tile as used herein
covers HEVC and generally refers to a large block such as an LCU, or
at least a block larger than a macroblock (MB) of 16×16
pixels, unless the context suggests otherwise.
[0057] In NGV coding, there is much more to the generation of inter
prediction data than simply using motion vectors to generate a
prediction, as is discussed elsewhere. In P- and F-picture coding,
each sub-partition's prediction is identified by including a
prediction mode. The prediction modes include skip, auto, intra,
inter, multi, and split. Skip mode is used to skip prediction
coding when, for example, there is no change, or relatively little
change, from a reference frame to a current frame being
reconstructed, so that the pixel data need not be encoded and is
merely copied from one frame to the other when decoded. Auto mode is
used when only partial data is needed, so that, for example, motion
vectors may not be needed but transform coefficients are still used
to code the data. Intra mode means that the frame or partition is
spatially coded. Split means a frame or partition needs to be split
into smaller parts or partitions before being coded. Inter mode
means that multiple reference frames are determined for a current
frame, motion estimates are obtained by using each reference
separately, and then the best result is used for the motion
prediction data. Multi mode also uses multiple reference frames,
but in this case, the motion estimation data from the multiple
reference frames is combined, such as averaged or weighted
averaged, to obtain a single result to be used for the
prediction.
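As a non-limiting illustration of the inter versus multi behavior
just described, the following sketch keeps the best single-reference
prediction for "inter" and combines predictions for "multi". The SAD
criterion and equal default weights are assumptions of this sketch.

    # Illustrative sketch only: "inter" picks the best single-reference
    # prediction; "multi" averages (or weighted-averages) several.
    import numpy as np

    def inter_prediction(cur_blk, ref_blks):
        """Pick the single reference prediction with lowest SAD."""
        sads = [np.abs(cur_blk.astype(np.int32) - r.astype(np.int32)).sum()
                for r in ref_blks]
        return ref_blks[int(np.argmin(sads))]

    def multi_prediction(ref_blks, weights=None):
        """Combine motion compensated predictions from several references."""
        stack = np.stack([r.astype(np.float64) for r in ref_blks])
        w = (np.full(len(ref_blks), 1.0 / len(ref_blks))
             if weights is None else np.asarray(weights, dtype=np.float64))
        return np.tensordot(w, stack, axes=1).round().astype(ref_blks[0].dtype)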
[0058] One of the outputs of prediction partitions generator module
105 may be hundreds of potential partitionings (more or fewer
depending on the limits placed on the partitioning) of a tile.
These partitionings are indexed as 1 . . . m and are provided to
the encode controller 103 to select the best possible prediction
partitioning for use. As mentioned, the determined potential
prediction partitionings may be partitions for prediction (for
example, inter- or intra-prediction) and may be described as
prediction partitions or prediction blocks or the like.
[0059] In some examples, a selected prediction partitioning (for
example, prediction partitions) may be determined from the
potential prediction partitionings. For example, the selected
prediction partitioning may be based on determining, for each
potential prediction partitioning, predictions using
characteristics and motion based multi-reference predictions or
intra-predictions, and determining prediction parameters. For each
potential prediction partitioning, a potential prediction error may
be determined by differencing original pixels with prediction
pixels, and the selected prediction partitioning may be the
potential prediction partitioning with the minimum prediction
error. In other examples, the selected prediction partitioning may
be determined based on a rate distortion optimization including a
weighted scoring based on the number of bits required for coding the
partitioning and a prediction error associated with the prediction
partitioning.
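By way of a non-limiting illustration of such a rate distortion
selection, the following sketch scores each candidate partitioning as
J = D + lambda * R (distortion plus weighted bit cost). The lambda
value, the sum-of-squared-errors distortion, and the candidate
representation are assumptions made for illustration.

    # Illustrative sketch only: rate distortion selection among
    # candidate prediction partitionings.
    import numpy as np

    def rd_select(original, candidates, lam=0.1):
        """candidates: list of (predicted_pixels, bit_cost) tuples, one
        per potential partitioning; returns the index of the winner."""
        best_j, best_i = None, -1
        for i, (pred, bits) in enumerate(candidates):
            sse = np.sum((original.astype(np.int64) -
                          pred.astype(np.int64)) ** 2)
            j = sse + lam * bits              # J = D + lambda * R
            if best_j is None or j < best_j:
                best_j, best_i = j, i
        return best_i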
[0060] As shown, the original pixels of the selected prediction
partitioning (for example, prediction partitions of a current
frame) may be differenced with predicted partitions (for example, a
prediction of the prediction partition of the current frame based
on a reference frame or frames and other predictive data such as
inter- or intra-prediction data) at differencer 106. The
determination of the predicted partitions will be described further
below and may include a decode loop 135 as shown in FIG. 1. As to
the differences, the original partitioned blocks also are
differenced with the prediction blocks to determine whether or not
any residual signal exists that warrants encoding. Thus, not all
sub-partitions of a tile actually need to be coded (using transform
coding for example) as prediction may have been sufficient for
certain sub-partitions.
[0061] Otherwise, any residuals or residual data (for example,
partition prediction error data) from the differencing that
indicate that the partition cannot be compensated by prediction
alone (such as motion compensation alone) may be transmitted to
coding partitions generator module (or by one example, coding
bitree partitions generator) 107 to be further sub-partitioned into
smaller partitions for transform coding (coding partitions), and
particularly for P-pictures and F-pictures by one example. In P- or
F-pictures or frames, in some cases where very simple content
and/or large quantizer step sizes exist, the coding partitions may
equal the size of the entire tile, or the coding partitions and
prediction partitions may have the same size. Thus,
some P- and F-picture tiles may contain no coding partitioning, one
coding partitioning, or multiple coding partitionings. These coding
partitions are indexed as 1 . . . n, and are provided to encode
controller 103 to select the best possible combination of
prediction and coding partitioning from the given choices.
[0062] Also, in some of these examples, such as for
intra-prediction of prediction partitions in any picture type (I-,
F- or P-pictures), or otherwise where prediction partitions are not
further divided into coding partitions (where coding partitions are
skipped), coding partitions generator module 107 may be bypassed
via switches 107a and 107b. In such examples, only a single level
of partitioning may be performed. Where only a single level of
partitioning exists, it may be described as
prediction partitioning (as discussed) or coding partitioning or
both. In various examples, such partitioning may be performed via
prediction partitions generator module 105 (as discussed) or, as is
discussed further herein, such partitioning may be performed via a
k-d tree intra-prediction/coding partitioner module or a bi-tree
intra-prediction/coding partitioner module implemented via coding
partitions generator module 107.
[0063] In some examples, the partition prediction error data, if
any, may not be significant enough to warrant encoding. In other
examples, where it may be desirable to encode the partition
prediction error data and the partition prediction error data is
associated with inter-prediction or the like, coding partitions
generator module 107 may determine coding partitions of the
prediction partitions. In some examples, coding partitions
generator module 107 may not be needed as the partition may be
encoded without coding partitioning (e.g., as shown via the bypass
path available via switches 107a and 107b). With or without coding
partitioning, the partition prediction error data (which may
subsequently be described as coding partitions in either event) may
be transmitted to adaptive transform module 108 in the event the
residuals or residual data require encoding. In some examples,
prediction partitions generator module 105 and coding partitions
generator module 107 may together be considered a partitioner
subsystem of encoder 100. In various examples, coding partitions
generator module 107 may operate on partition prediction error
data, original pixel data, residual data, or wavelet data. Coding
partitions generator module 107 may generate potential coding
partitionings (for example, coding partitions) of, for example,
partition prediction error data using bi-tree and/or k-d tree
partitioning techniques or the like.
[0064] After the partitioning (after prediction partitions are
formed for I-pictures, and coding partitions are formed for P- and
F-pictures, and in some examples, the potential coding partitions),
the partitions may be transformed using adaptive or fixed
transforms with various block sizes via adaptive transform module
108 (also, in one form, referred to as the Adaptive Multi-size Rect
Hybrid Parametric Haar Transform (HPHT)/Discrete Cosine Transform
(DCT) unit). By one approach, the adaptive transform module 108 may
perform forward HPHT or forward DCT on rectangular blocks. By one
example, partition/block size as well as selected transforms (for
example, adaptive or fixed, and HPHT or DCT) may be determined
based on a rate distortion optimization (RDO) or other basis. In
some examples, both the selected coding partitioning and/or the
selected transform(s) may be determined based on a predetermined
selection method based on coding partitions size or the like. For
example, adaptive transform module 108 may include a first portion
or component for performing a parametric transform to allow locally
optimal transform coding of small to medium size blocks, and a
second portion or component for performing globally stable, low
overhead transform coding using a fixed transform, such as DCT or a
picture based transform from a variety of transforms, including
parametric transforms, or any other configuration. In some
examples, for locally optimal transform coding, HPHT may be
performed. In some examples, transforms may be performed on 2D
blocks of rectangular sizes between about 4×4 pixels and
64×64 pixels, with actual sizes depending on a number of
factors such as whether the transformed data is luma or chroma,
inter or intra, or whether the transform used is PHT or DCT
or the like.
[0065] For the HPHT transform, small to medium block sizes are
supported, while for the DCT transform a large number of block sizes
are supported. For the HPHT transform, some overhead is needed to
identify the direction, either horizontal or vertical, in which the
DCT is applied while the PHT is applied in the orthogonal direction,
as well as the mode (at least for intra-coding, where a mode can be
based on decoded pixels or prediction difference pixels). The
actual PHT transform basis used for transforming a particular block
may be content adaptive as it depends on decoded neighboring
pixels. Since both encoder and decoder require calculation of the
same basis matrix, the complexity of the calculation is kept low by
allowing selection from a limited number of good transforms known
to both the encoder and decoder.
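As a non-limiting illustration of the fixed-transform path described
above, the following sketch applies a separable 2D DCT to rectangular
blocks. The adaptive HPHT path, whose basis depends on decoded
neighboring pixels, is deliberately not reproduced here.

    # Illustrative sketch only: separable 2D DCT on (possibly
    # non-square) blocks such as 16x8 or 8x16.
    import numpy as np

    def dct_matrix(n):
        """Orthonormal DCT-II basis matrix of size n x n."""
        k = np.arange(n).reshape(-1, 1)
        i = np.arange(n).reshape(1, -1)
        m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
        m[0, :] = np.sqrt(1.0 / n)            # DC row normalization
        return m

    def forward_dct2d(block):
        """Transform along the rows and columns of a rectangular block."""
        h, w = block.shape
        return dct_matrix(h) @ block.astype(np.float64) @ dct_matrix(w).T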
[0066] As shown, the resultant transform coefficients may be
transmitted to adaptive quantize module 109, while a quantizer
adapter control 133 at the encode controller 103 performs analysis
of content to derive locally adaptive quantization parameters
that are then represented by a multi-level map that can be
efficiently coded and included in the bitstream. The computed
quantizer set (qs, and a matrix applied to a coefficient block) may
be used by the adaptive quantizer module 109 to perform scaling of
the resultant transform coefficients. Further, any data associated
with a parametric transform, as needed, may be transmitted to
either adaptive quantize module 109 (if quantization is desired) or
adaptive entropy encoder module 110. Also as shown in FIG. 1, the
quantized coefficients may be scanned and transmitted to adaptive
entropy encoder module 110. Adaptive entropy encoder module 110 may
entropy encode the quantized coefficients and include them in
output bitstream 111. In some examples, adaptive transform module
108 and adaptive quantize module 109 may together be considered a
transform encoder subsystem of encoder 100.
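By way of a non-limiting illustration of scaling coefficients with a
quantizer set (qs) as described above, the following sketch combines
a step size derived from a quantizer parameter with a per-coefficient
matrix. The Qp-to-step mapping shown is an assumed, HEVC-like
placeholder, not the codec's actual mapping.

    # Illustrative sketch only: matrix-based coefficient quantization.
    import numpy as np

    def quantize(coeffs, qp, qmatrix):
        """Scale transform coefficients down to integer levels."""
        step = 2.0 ** (qp / 6.0)              # assumed Qp-to-step mapping
        return np.round(coeffs / (step * qmatrix)).astype(np.int32)

    def dequantize(levels, qp, qmatrix):
        """Inverse scaling performed in the decode loop (lossy)."""
        step = 2.0 ** (qp / 6.0)
        return levels.astype(np.float64) * step * qmatrix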
[0067] As also shown in FIG. 1, encoder 100 includes the local
decode loop 135 to form predicted partitions (or frames) for
comparison to the prediction partitions as mentioned above.
Preliminarily, depending on the RDO operation, not all of the
hundreds or more tile partitionings described above need to be fully
coded, such as when lookup of bitcounts is sufficient. Once the
best partitioning of a tile is determined, however, full coding may
be provided.
[0068] The local decode loop 135 may begin at adaptive inverse
quantize module 112. Adaptive inverse quantize module 112 may be
configured to perform the opposite operation(s) of adaptive
quantize module 109 such that an inverse scan may be performed and
quantized coefficients may be de-scaled to determine transform
coefficients. Such an adaptive quantize operation may be lossy, for
example. As shown, the transform coefficients may be transmitted to
an adaptive inverse transform module 113. Adaptive inverse
transform module 113 may perform the inverse transform as that
performed by adaptive transform module 108, for example, to
generate residuals or residual values or partition prediction error
data (or original data or wavelet data, as discussed) associated
with coding partitions. In some examples, adaptive inverse quantize
module 112 and adaptive inverse transform module 113 may together
be considered a transform decoder subsystem of encoder 100.
[0069] As shown, the partition prediction error data (or the like)
for P and F-pictures may be transmitted to optional coding
partitions assembler 114. Coding partitions assembler 114 may
assemble coding partitions into decoded prediction partitions as
needed (as shown, in some examples, coding partitions assembler 114
may be skipped such as for I-picture tile partitioning, and via
switches 114a and 114b such that decoded prediction partitions may
have been generated at adaptive inverse transform module 113) to
generate prediction partitions of prediction error data or decoded
residual prediction partitions or the like. As shown, the decoded
residual prediction partitions (inter or intra) may be added to
predicted partitions (for example, prediction pixel data) at adder
115 to generate reconstructed prediction partitions. The
reconstructed prediction partitions may be transmitted to
prediction partitions assembler 116. Prediction partitions
assembler 116 may assemble the reconstructed prediction partitions
to generate reconstructed tiles or super-fragments. In some
examples, coding partitions assembler module 114 and prediction
partitions assembler module 116 may together be considered an
un-partitioner sub-system of encoder 100.
[0070] The next set of steps involves filtering, and the
intermingling of filtering and prediction generation. Overall, four
types of filtering are shown. Specifically, in FIG. 1, the
reconstructed partitions are deblocked and dithered by a blockiness
analyzer & deblock filtering module (also Recon Blockiness
Analyzer & DD Filt Gen) 117. The resulting deblock and dither
information (ddi) parameters are used for the filtering operation
and are also coded and sent to the decoder via the bitstream 111.
The deblocked reconstructed output is then handed over to the
quality analyzer & quality restoration filtering module (or
quality improvement filter, also referred to as Recon Quality
Analyzer & QR Filt Gen) 118, which computes QR filtering
parameters and uses them for filtering. These parameters are also
coded and sent via the bitstream 111 to the decoder. The QR filtered
output is the final reconstructed or decoded frame that is also used
as a prediction for coding future frames.
[0071] More specifically, when the reconstructed tiles or
super-fragments are transmitted to blockiness analyzer and
deblock filtering module 117, the blockiness analyzer and deblock
filtering module 117 may deblock and dither the reconstructed tiles
or super-fragments (or prediction partitions of tiles or
super-fragments). The generated deblock and dither filter
parameters may be used for the current filter operation and/or
coded in bitstream 111 for use by a decoder, for example. The
output of blockiness analyzer and deblock filtering module 117 may
be transmitted to the quality analyzer and quality restoration
filtering module 118. Quality analyzer and quality restoration
filtering module 118 may determine quality restoration (QR)
filtering parameters and use the determined parameters
for filtering. The QR filtering parameters may also be coded in
bitstream 111 for use by a decoder. In some examples, blockiness
analyzer and deblock filtering module 117 and quality analyzer and
quality restoration filtering module 118 may together be considered
a filtering subsystem of encoder 100. In some examples, the output
of quality analyzer and quality restoration filtering module 118
may be a final reconstructed frame that may be used for prediction
for coding other frames (for example, the final reconstructed frame
may be a reference frame or the like). Thus, as shown, the output
of quality analyzer and quality restoration filtering module 118
may be transmitted to a multi-reference frame storage and frame
selector (or multi reference control) 119 which also may be
referred to as, or may include, the decoded picture storage or
buffer. A dependency logic module 128 (also referred to, in one
example, as dependency logic for mod multi ref pred in hierarchical
picture group struct) may provide indices for listing the reference
frames and the relationship among the frames such as frame
dependencies, or more specifically partition dependencies, for
proper ordering and use for the frames by the multi reference
control 119 and when certain frames are to be selected for
prediction of another frame. This may include providing the
dependency logic for picture group structures such as
multi-reference prediction, chain prediction, hierarchical
structures, and/or other prediction techniques as described
below.
[0072] Next, encoder 100 may perform inter- and/or intra-prediction
operations. As shown in FIG. 1, inter-prediction may be performed
by one or more modules including morphing generation and local
buffer module 120 (and in one example is referred to as Morph Gen
& Loc Buf, or referred to herein as the in-loop morphing
generation module), synthesizing generation and local buffer module
121 (and in one example is referred to as Synth Gen & Pic
Buffer or referred to herein as in-loop synthesizing generation
module), motion estimator 122, characteristics and motion filtering
and predictor module 123 (also in some examples may be referred to
as Char and Motion AP Filter Analyzer & 1/4 & 1/8 Pel
Compensated Predictor), morphing analyzer and generation module (or
out-of-loop morphing analyzer module) 130, and synthesizing
analyzer and generation module (or out-of-loop synthesizing
analyzer module) 132, where the morphing and synthesis generators
120 and 121 are considered in-loop (in the decoder loop of the
encoder), and where the morphing and synthesis analyzers 130 and
132 are considered out-of-loop (out of the decoder loop at the
encoder). Note that while one is called an analyzer and the other a
generator, both in-loop and out-of-loop modules may perform the
same or similar tasks (forming modified frames and modification
parameters for morphing and/or synthesis). Using these components,
morphing generation module 120, or morphing analyzer 130, may
permit various forms of morphing of a decoded frame to then be used
as a reference frame for motion prediction on other frames. The
module 120 may analyze a current picture to determine morphing
parameters for (1) changes in gain, and specifically to perform
gain compensation for changes in brightness from one frame to
another frame, (2) changes in dominant (or global) motion and as
discussed in detail below, (3) changes in registration, and/or (4)
changes in blur with respect to a reference frame or frames with
which it is to be coded, and prior to motion compensated
prediction.
[0073] The out-of-loop morphing analyzer 130 and the synthesizing
analyzer 132 receive picture group structure data from the adaptive
picture organizer 104 and communicate with the encoder controller
103 to form the morphing and synthesis parameters (mop, syp) and
modified reference frames based on the non-quantized, non-decoded,
original frame data. The formation of the modified reference frames
and modification parameters from the out-of-loop morphing and
synthesis analyzers 130 and 132 may be much faster than that
provided through the decoder loop 135, and this is especially
advantageous for real time encoding. However, the use of the
modified frames and parameters to perform compensation at another
location, such as by a decoder, should be performed by the in-loop
morphing and synthesis generators 120 and 121 on the decoding loop
side of the encoder so that the correct compensation can be
repeated when reconstructing frames at the decoder. Thus, the
resulting modification parameters from the out-of-loop analyzers
130 and 132 are used by the in-loop morphing and synthesizing
generators 120 and 121 to form the modified reference frames and for
motion estimation by the motion estimator 122 to compute motion
vectors. Thus, the computed morphing and synthesis parameters (mop
and syp) may be quantized/de-quantized and used (for example, by
morphing generation module 120) to generate morphed reference
frames that may be used by motion estimator module 122 for
computing motion vectors for efficient motion (and characteristics)
compensated prediction of a current frame. The synthesizing
generation module 121 uses several types of synthesized frames,
including super resolution (SR) pictures and projected interpolation
(PI) pictures, among others, in which motion compensated prediction
can result in even higher gains by determining motion vectors for
efficient motion compensated prediction in these frames. Details of
some examples for performing morphing or synthesis are provided
below.
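As a non-limiting illustration of one of the four morphing types
listed above, the following sketch performs gain compensation, where
a frame-level brightness change is modeled as ref' = a * ref + b.
Estimating (a, b) from frame means and standard deviations is an
assumed method for this sketch only; the dominant motion (DMC) morph
is detailed elsewhere in this disclosure.

    # Illustrative sketch only: gain-compensated morphed reference.
    import numpy as np

    def gain_parameters(cur, ref):
        """Estimate multiplicative gain a and additive offset b."""
        a = cur.std() / max(ref.std(), 1e-6)
        b = cur.mean() - a * ref.mean()
        return a, b

    def gain_compensated_reference(ref, a, b):
        """Apply the gain model and clamp to the 8-bit pixel range."""
        out = a * ref.astype(np.float64) + b
        return np.clip(out, 0, 255).astype(ref.dtype)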
[0074] Motion estimator module 122 may generate motion vector data
based at least in part on morphed reference frame(s) and/or super
resolution (SR) pictures and projected interpolation (PI) pictures
along with the current frame. In some examples, motion estimator
module 122 may be considered an inter-prediction module. For
example, the motion vector data may be used for inter-prediction.
If inter-prediction is applied, characteristics and motion
filtering predictor module 123 may apply motion compensation as
part of the local decode loop as discussed. Also, characteristics
and motion filtering predictor module 123 may include adaptive
precision (AP) filtering where filtering and prediction are
intertwined. The filtering parameters (api) are coded and may be
sent to the decoder via the bitstream 111.
[0075] Intra-prediction may be performed by intra-directional
prediction analyzer and prediction generation module 124.
Intra-directional prediction analyzer and prediction generation
module 124 may be configured to perform spatial directional
prediction and may use decoded neighboring partitions. In some
examples, both the determination of direction and generation of
prediction may be performed by intra-directional prediction
analyzer and prediction generation module 124. In some examples,
intra-directional prediction analyzer and prediction generation
module 124 may be considered an intra-prediction module.
[0076] As shown in FIG. 1, prediction modes and reference types
analyzer module 125 may allow for selection of prediction modes, as
introduced above, from among "skip", "auto", "inter", "split",
"multi", and "intra", for each prediction partition of a tile (or
super-fragment), all of which may apply to P- and F-pictures (as
well as B-pictures when they are present). It should be noted that
while the system considers a configuration where I-, P-, and
F-pictures are available, it is possible to still provide B-pictures
where no morphing or synthesis is available for the B-pictures. In
addition to prediction modes, module 125 also allows for selection
of reference types that can be different depending on "inter" or
"multi" mode, as well as for P- and F-pictures. The prediction
signal at the output of prediction modes and reference types
analyzer module 125 may be filtered by prediction analyzer and
prediction fusion filtering module 126. Prediction analyzer and
prediction fusion filtering module 126 may determine parameters
(for example, filtering coefficients, frequency, overhead) to use
for filtering and may perform the filtering. In some examples,
filtering the prediction signal may fuse different types of signals
representing different modes (e.g., intra, inter, multi, split,
skip, and auto). In some examples, intra-prediction signals may be
different than all other types of inter-prediction signal(s) such
that proper filtering may greatly enhance coding efficiency. In
some examples, the filtering parameters may be encoded in bitstream
111 for use by a decoder. The filtered prediction signal may
provide the second input (e.g., prediction partition(s)) to
differencer 106, as discussed above, that may determine the
prediction difference signal (e.g., partition prediction error) for
coding discussed earlier. Further, the same filtered prediction
signal may provide the second input to adder 115, also as discussed
above. As discussed, output bitstream 111 may provide an
efficiently encoded bitstream for use by a decoder for the
presentment of video.
[0077] In operation, some components of encoder 100 may operate as
an encoder prediction subsystem. For example, such an encoder
prediction subsystem of encoder 100 may include multi-reference
frame storage and frame selector 119, in-loop morphing analyzer and
generation module 120, in-loop synthesizing analyzer and generation
module 121, motion estimator module 122, and/or characteristics and
motion compensated precision adaptive filtering predictor module
123 as well as out-of-loop morphing analyzer 130 and synthesizing
analyzer 132.
[0078] As will be discussed in greater detail below, in some
implementations, such an encoder prediction subsystem of encoder
100 may incorporate a number of components and combine the
predictions generated by these components in an efficient video
coding algorithm. For example, the proposed implementation of the
NGV coder may include one or more of the following features: 1. Gain
Compensation (e.g., explicit compensation for changes in
gain/brightness in a scene); 2. Blur Compensation (e.g., explicit
compensation for changes in blur/sharpness in a scene); 3.
Dominant/Global Motion Compensation (e.g., explicit compensation
for dominant motion in a scene); 4. Registration Compensation
(e.g., explicit compensation for registration mismatches in a
scene); 5. Super Resolution (e.g., explicit model for changes in
resolution precision in a scene); 6. Projection (e.g., explicit
model for changes in motion trajectory in a scene); the like,
and/or combinations thereof.
[0079] For example, in such an encoder prediction subsystem of
encoder 100, the output of quality analyzer and quality restoration
filtering may be transmitted to multi-reference frame storage and
frame selector 119. In some examples, the output of quality
analyzer and quality restoration filtering may be a final
reconstructed frame that may be used for prediction for coding
other frames (e.g., the final reconstructed frame may be a
reference frame or the like). In encoder 100, prediction operations
may include inter- and/or intra-prediction. As shown,
inter-prediction may be performed by one or more modules including
morphing generation module 120, synthesizing generation module 121,
and/or characteristics and motion compensated precision adaptive
filtering predictor module 123.
[0080] As will be described in greater detail below, morphing
generation module 120 may analyze a current frame to determine
parameters for changes in gain, changes in dominant motion, changes
in registration, and changes in blur with respect to a reference
frame or frames with which it is to be coded. The determined
morphing parameters may be quantized/de-quantized and used (e.g.,
by morphing generation module 120) to generate morphed reference
frames. Such generated morphed reference frames may be stored in a
buffer and may be used by motion estimator module 122 for computing
motion vectors for efficient motion (and characteristics)
compensated prediction of a current frame.
[0081] Similarly, synthesizing analyzer and generation module 121
may generate super resolution (SR) pictures and projected
interpolation (PI) pictures or the like for determining motion
vectors for efficient motion compensated prediction in these
frames. Such generated synthesized reference frames may be stored
in a buffer and may be used by motion estimator module 122 for
computing motion vectors for efficient motion (and characteristics)
compensated prediction of a current frame.
[0082] Accordingly, in such an encoder prediction subsystem of
encoder 100, motion estimator module 122 may generate motion vector
data based on morphed reference frame(s) and/or super resolution
(SR) pictures and projected interpolation (PI) pictures along with
the current frame. In some examples, motion estimator module 122
may be considered an inter-prediction module. For example, the
motion vector data may be used for inter-prediction. If
inter-prediction is applied, characteristics and motion filtering
predictor module 123 may apply motion compensation as part of the
local decode loop as discussed.
[0083] In operation, the proposed implementation of the NGV coder
(e.g., encoder 100 and/or decoder 200) may use one or more of the
above components besides the usual local motion compensation with
respect to decoded past and/or future pictures/slices. As such, the
implementation does not mandate a specific solution, for instance,
for dominant motion compensation or for any other characteristics
compensated reference frame generation.
[0084] FIG. 1 illustrates example control signals associated with
operation of video encoder 100, where the following abbreviations
may represent the associated information:
[0085] scnchg Scene change information
[0086] spcpx Spatial complexity information
[0087] tpcpx Temporal complexity information
[0088] pdist Temporal prediction distance information
[0089] pap Pre Analysis parameters (placeholder for all other pre analysis parameters except scnchg, spcpx, tpcpx, pdist)
[0090] ptyp Picture types information
[0091] pgst Picture group structure information
[0092] pptn cand. Prediction partitioning candidates
[0093] cptn cand. Coding partitioning candidates
[0094] prp Preprocessing
[0095] xmtyp Transform type information
[0096] xmdir Transform direction information
[0097] xmmod Transform mode
[0098] ethp One eighth (1/8th) pel motion prediction
[0099] pptn Prediction partitioning
[0100] cptn Coding partitioning
[0101] mot & cod cost Motion and coding cost
[0102] qs Quantizer information set (includes quantizer parameter (Qp) and quantizer matrix (QM) choice)
[0103] mv Motion vectors
[0104] mop Morphing parameters
[0105] syp Synthesizing parameters
[0106] ddi Deblock and dither information
[0107] qri Quality restoration filtering index/information
[0108] api Adaptive precision filtering index/information
[0109] fii Fusion filtering index/information
[0110] mod Mode information
[0111] reftyp Reference type information
[0112] idir Intra prediction direction
[0113] The various signals and data items that may need to be sent
to the decoder, i.e., pgst, ptyp, prp, pptn, cptn, mod, reftyp,
ethp, xmtyp, xmdir, xmmod, idir, mv, qs, mop, syp, ddi, qri, api,
fii, quant coefficients, and others, may then be entropy encoded by
adaptive entropy encoder 110, which may include different entropy
coders collectively referred to as an entropy encoder subsystem.
The adaptive entropy encoder 110 may be used to encode various
types of control data/signals, parameters, modes and ref types,
motion vectors, and transform coefficients. It is based on a
generic class of low complexity entropy coders called adaptive
variable length coders (vlc). The data to be entropy coded may be
divided into several categories when convenient (seven in our
case), and, starting from generic vlc coders, specialized coders are
developed for each category. While these control signals are
illustrated as being associated with specific example functional
modules of encoder 100 in FIG. 1, other implementations may include
a different distribution of control signals among the functional
modules of encoder 100. The present disclosure is not limited in
this regard and, in various examples, implementation of the control
signals herein may include the undertaking of only a subset of the
specific example control signals shown, additional control signals,
and/or a different arrangement than illustrated.
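By way of a non-limiting illustration of the kind of generic variable
length code on which a vlc-based entropy coder can build, the
following sketch implements order-0 exponential Golomb coding of
non-negative integers. The seven data categories and their
specialized coders described above are not reproduced; this is an
illustrative primitive only.

    # Illustrative sketch only: order-0 Exp-Golomb variable length code.
    def exp_golomb_encode(value):
        """Return the order-0 Exp-Golomb bitstring for value >= 0."""
        code = bin(value + 1)[2:]             # binary of value + 1
        return "0" * (len(code) - 1) + code   # zero-run prefix, then code

    def exp_golomb_decode(bits, pos=0):
        """Decode one symbol at bit offset pos; return (value, new_pos)."""
        zeros = 0
        while bits[pos + zeros] == "0":
            zeros += 1
        raw = bits[pos + zeros:pos + 2 * zeros + 1]
        return int(raw, 2) - 1, pos + 2 * zeros + 1

For example, exp_golomb_encode(3) yields "00100", and decoding that
string returns the value 3.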
[0114] FIG. 2 is an illustrative diagram of an example next
generation video decoder 200, arranged in accordance with at least
some implementations of the present disclosure and that utilizes
the content adaptive P- and F-pictures and resulting picture groups
herein. The general operation of this NGV decoder 200 may be
similar to the local decoding loop in the NGV encoder 100 discussed
earlier, with the caveat that the motion compensation decoding loop
in a decoder does not require any components that perform analysis
to determine parameters, as the parameters are actually sent via the
bitstream 111 or 201 to decoder 200. The bitstream 201 to be
decoded is input to adaptive entropy decoder (Content and Context
Adaptive Entropy Decoder) 202, which decodes headers, control
signals, and encoded data. For instance, it decodes ptyp, pgst, prp,
pptn, cptn, ethp, mop, syp, mod, reftyp, idir, qs, xmtyp, xmdir,
xmmod, ddi, qri, api, fii, mv, listed above, and the quantized
transform coefficients that constitute the overhead, control
signals, and data that are distributed for use throughout the
decoder. The quantized transform coefficients are then inverse
quantized and inverse transformed by adaptive inverse quantize
module 203 and adaptive inverse transform (also Adaptive Multi-size
Rect HPHT/DCT) 204 to produce rectangular partitions of decoded
pixel differences that are assembled as per the coding partitioning
used. Predictions are added to the differences, resulting in
generation of recon (reconstructed) coded partitions that undergo
further reassembly as per the motion partitioning to generate
reconstructed tiles and frames that undergo deblocking and
dithering in deblocking filter (Recon DD Filt Gen) 208 using
decoded ddi parameters, followed by quality restoration filtering
(or Recon QR Filt Gen) 209 using decoded qri parameters, a process
that creates the final recon frames.
[0115] The final recon frames are saved in multi-reference frame
storage and frame selector (also may be called decoded picture
buffer) 210, and are used (or morphed) to create morphed
pictures/local buffers (at morphed picture generator and buffer
211) depending on the applied, decoded mop parameters. Likewise
synthesized picture and local buffers (at synthesized picture
generation and buffer 212) are created by applying decoded syp
parameters to multi-reference frame storage and frame selector 210
(or in other words, the reconstructed frames in the storage or
buffer 210). A dependency logic 220 may hold the index for, and
perform the indexing for, the stored frames in the multi-reference
frame storage 210. The indexing may be used for prediction
techniques such as multi-reference frames, chain prediction, and/or
hierarchical (or pyramid) frame structures, and/or others as
described below. The morphed local buffers and synthesized frames
are used for motion compensated prediction that uses adaptive
precision (AP) filtering based on api parameters, and keeps either
1/4 or 1/8 pel prediction depending on the decoded ethp signal. In
fact, a characteristics and motion compensated filtering
predictor 213, depending on the mod, generates "inter", "multi",
"skip", or "auto" partitions, while an intra-directional prediction
generation module 214 generates "intra" partitions, and prediction
modes selector 215, based on an encoder selected option, allows
the partition of the correct mode to pass through. Next, the
prediction fusion filter generation module (or Pred FI Filter Gen)
216 is selectively used as needed to filter and output the
prediction, which forms the second input to the adder.
[0116] The recon frames at the output of the quality filter
generation module 209 (or Recon QR Filt Gen) are reordered (as
F-pictures are out of order) by adaptive picture reorganizer (or
Hierarchical Picture Group Structure Reorganizer) 217 in response
to the ptyp and pgst control parameters, and the output of this
reorganizer further undergoes optional processing in content post
restorer 218 that is controlled by prp parameters sent by the
encoder. This processing among other things may include deblocking
and film grain addition.
[0117] More specifically, and as shown, decoder 200 may receive an
input bitstream 201. In some examples, input bitstream 201 may be
encoded via encoder 100 and/or via the encoding techniques
discussed herein. As shown, input bitstream 201 may be received by
an adaptive entropy decoder module 202. Adaptive entropy decoder
module 202 may decode the various types of encoded data (e.g.,
overhead, motion vectors, transform coefficients, etc.). In some
examples, adaptive entropy decoder 202 may use a variable length
decoding technique. In some examples, adaptive entropy decoder 202
may perform the inverse operation(s) of adaptive entropy encoder
module 110 discussed above.
[0118] The decoded data may be transmitted to adaptive inverse
quantize module 203. Adaptive inverse quantize module 203 may be
configured to inverse scan and de-scale quantized coefficients to
determine transform coefficients. Such an adaptive quantize
operation may be lossy, for example. In some examples, adaptive
inverse quantize module 203 may be configured to perform the
opposite operation of adaptive quantize module 109 (e.g.,
substantially the same operations as adaptive inverse quantize
module 112). As shown, the transform coefficients (and, in some
examples, transform data for use in a parametric transform) may be
transmitted to an adaptive inverse transform module 204. Adaptive
inverse transform module 204 may perform an inverse transform on
the transform coefficients to generate residuals or residual values
or partition prediction error data (or original data or wavelet
data) associated with coding partitions. In some examples, adaptive
inverse transform module 204 may be configured to perform the
opposite operation of adaptive transform module 108 (e.g.,
substantially the same operations as adaptive inverse transform
module 113). In some examples, adaptive inverse transform module
204 may perform an inverse transform based on other previously
decoded data, such as, for example, decoded neighboring partitions.
In some examples, adaptive inverse quantize module 203 and adaptive
inverse transform module 204 may together be considered a transform
decoder subsystem of decoder 200.
[0119] As shown, the residuals or residual values or partition
prediction error data may be transmitted to coding partitions
assembler 205. Coding partitions assembler 205 may assemble coding
partitions into decoded prediction partitions as needed (as shown,
in some examples, coding partitions assembler 205 may be skipped
via switches 205a and 205b such that decoded prediction partitions
may have been generated at adaptive inverse transform module 204).
The decoded prediction partitions of prediction error data (e.g.,
prediction partition residuals) may be added to predicted
partitions (e.g., prediction pixel data) at adder 206 to generate
reconstructed prediction partitions. The reconstructed prediction
partitions may be transmitted to prediction partitions assembler
207. Prediction partitions assembler 207 may assemble the
reconstructed prediction partitions to generate reconstructed tiles
or super-fragments. In some examples, coding partitions assembler
module 205 and prediction partitions assembler module 207 may
together be considered an un-partitioner subsystem of decoder
200.
[0120] The reconstructed tiles or super-fragments may be
transmitted to deblock filtering module 208. Deblock filtering
module 208 may deblock and dither the reconstructed tiles or
super-fragments (or prediction partitions of tiles or
super-fragments). The generated deblock and dither filter
parameters may be determined from input bitstream 201, for example.
The output of deblock filtering module 208 may be transmitted to a
quality restoration filtering module 209. Quality restoration
filtering module 209 may apply quality filtering based on QR
parameters, which may be determined from input bitstream 201, for
example. As shown in FIG. 2, the output of quality restoration
filtering module 209 may be transmitted to multi-reference frame
storage and frame selector (which may be referred to as a
multi-reference control, and may be, or may include, a decoded
picture buffer) 210. In some examples, the output of quality
restoration filtering module 209 may be a final reconstructed frame
that may be used for prediction for coding other frames (e.g., the
final reconstructed frame may be a reference frame or the like). In
some examples, deblock filtering module 208 and quality restoration
filtering module 209 may together be considered a filtering
subsystem of decoder 200.
[0121] As discussed, compensation due to prediction operations may
include inter- and/or intra-prediction compensation. As shown,
inter-prediction compensation may be performed by one or more
modules including morphing generation module 211, synthesizing
generation module 212, and characteristics and motion compensated
filtering predictor module 213. Morphing generation module 211 may
use de-quantized morphing parameters (e.g., determined from input
bitstream 201) to generate morphed reference frames. Synthesizing
generation module 212 may generate super resolution (SR) pictures
and projected interpolation (PI) pictures or the like based on
parameters determined from input bitstream 201. If inter-prediction
is applied, characteristics and motion compensated filtering
predictor module 213 may apply motion compensation based on the
received frames and motion vector data or the like in input
bitstream 201.
[0122] Intra-prediction compensation may be performed by
intra-directional prediction generation module 214.
Intra-directional prediction generation module 214 may be
configured to perform spatial directional prediction and may use
decoded neighboring partitions according to intra-prediction data
in input bitstream 201.
[0123] As shown in FIG. 2, prediction modes selector module 215 may
determine a prediction mode selection from among "skip", "auto",
"inter", "multi", and "intra", for each prediction partition of a
tile, all of which may apply to P- and F-pictures, based on mode
selection data in input bitstream 201. In addition to prediction
modes, it also allows for selection of reference types that can be
different depending on "inter" or "multi" mode, as well as for P-
and F-pictures. The prediction signal at the output of prediction
modes selector module 215 may be filtered by prediction fusion
filtering module 216. Prediction fusion filtering module 216 may
perform filtering based on parameters (e.g., filtering
coefficients, frequency, overhead) determined via input bitstream
201. In some examples, filtering the prediction signal may fuse
different types of signals representing different modes (e.g.,
intra, inter, multi, skip, and auto). In some examples,
intra-prediction signals may be different than all other types of
inter-prediction signal(s) such that proper filtering may greatly
enhance coding efficiency. The filtered prediction signal may
provide the second input (e.g., prediction partition(s)) to
differencer 206, as discussed above.
[0124] As discussed, the output of quality restoration filtering
module 209 may be a final reconstructed frame. Final reconstructed
frames may be transmitted to an adaptive picture re-organizer 217,
which may re-order or re-organize frames as needed based on
ordering parameters in input bitstream 201. Re-ordered frames may
be transmitted to content post-restorer module 218. Content
post-restorer module 218 may be an optional module configured to
perform further improvement of perceptual quality of the decoded
video. The improvement processing may be performed in response to
quality improvement parameters in input bitstream 201 or it may be
performed as standalone operation. In some examples, content
post-restorer module 218 may apply parameters to improve quality
such as, for example, an estimation of film grain noise or residual
blockiness reduction (e.g., even after the deblocking operations
discussed with respect to deblock filtering module 208). As shown,
decoder 200 may provide display video 219, which may be configured
for display via a display device (not shown).
[0125] In operation, some components of decoder 200 may operate as
a decoder prediction subsystem. For example, such a decoder
prediction subsystem of decoder 200 may include multi-reference
frame storage and frame selector 210, dependency logic 220 to index
the frames at the multi-reference frame storage and frame selector
210, morphing analyzer and generation module 211, synthesizing
analyzer and generation module 212, and/or characteristics and
motion compensated precision adaptive filtering predictor module
213.
[0126] As will be discussed in greater detail below, in some
implementations, such a decoder prediction subsystem of decoder 200
may incorporate a number of components and the combined predictions
generated by these components in an efficient video coding
algorithm. For example, proposed implementation of the NGV coder
may include one or more of the following features: 1. Gain
Compensation (e.g., explicit compensation for changes in
gain/brightness in a scene); 2. Blur Compensation: e.g., explicit
compensation for changes in blur/sharpness in a scene; 3.
Dominant/Global Motion Compensation (e.g., explicit compensation
for dominant motion in a scene); 4. Registration Compensation
(e.g., explicit compensation for registration mismatches in a
scene); 5. Super Resolution (e.g., explicit model for changes in
resolution precision in a scene); 6. Projection (e.g., explicit
model for changes in motion trajectory in a scene); the like,
and/or combinations thereof.
[0127] For example, in such a decoder prediction subsystem of
decoder 200, the output of quality restoration filtering module may
be transmitted to multi-reference frame storage and frame selector
210. In some examples, the output of quality restoration filtering
module may be a final reconstructed frame that may be used for
prediction for coding other frames (e.g., the final reconstructed
frame may be a reference frame or the like). As discussed,
compensation due to prediction operations may include inter- and/or
intra-prediction compensation. As shown, inter-prediction
compensation may be performed by one or more modules including
morphing analyzer and generation module 211, synthesizing analyzer
and generation module 212, and/or characteristics and motion
compensated precision adaptive filtering predictor module 213.
[0128] As will be described in greater detail below, morphing
analyzer and generation module 211 may use de-quantized morphing
parameters (e.g., determined from input bitstream) to generate
morphed reference frames. Such generated morphed reference frames
may be stored in a buffer and may be used by characteristics and
motion compensated precision adaptive filtering predictor module
213.
[0129] Similarly, synthesizing analyzer and generation module 212
may be configured to generate one or more types of synthesized
prediction reference pictures such as super resolution (SR)
pictures and projected interpolation (PI) pictures or the like
based on parameters determined from input bitstream 201. Such
generated synthesized reference frames may be stored in a buffer
and may be used by motion compensated filtering predictor module
213.
[0130] Accordingly, in such a decoder prediction subsystem of
decoder 200, in cases where inter-prediction is applied,
characteristics and motion compensated filtering predictor module
213 may apply motion compensation based on morphed reference
frame(s) and/or super resolution (SR) pictures and projected
interpolation (PI) pictures along with the current frame.
[0131] In operation, the proposed implementation of the NGV coder
(e.g., encoder 100 and/or decoder 200) may use one or more of the
above components besides the usual local motion compensation with
respect to decoded past and/or future pictures/slices. As such,
the implementation does not mandate a specific solution, for
instance, for dominant motion compensation or for any other
characteristics compensated reference frame generation.
[0132] FIG. 2 illustrates example control signals associated with
operation of video decoder 200, where the indicated abbreviations
may represent similar information as discussed with respect to FIG.
1 above. While these control signals are illustrated as being
associated with specific example functional modules of decoder 200,
other implementations may include a different distribution of
control signals among the functional modules of decoder 200. The
present disclosure is not limited in this regard and, in various
examples, implementation of the control signals herein may include
the undertaking of only a subset of the specific example control
signals shown, additional control signals, and/or in a different
arrangement than illustrated.
[0133] While FIGS. 1 and 2 illustrate particular encoding and
decoding modules, various other coding modules or components not
depicted may also be utilized in accordance with the present
disclosure. Further, the present disclosure is not limited to the
particular components illustrated in FIGS. 1 and 2 and/or to the
manner in which the various components are arranged. Various
components of the systems described herein may be implemented in
software, firmware, and/or hardware and/or any combination thereof.
For example, various components of encoder 100 and/or decoder 200
may be provided, at least in part, by hardware of a computing
System-on-a-Chip (SoC) such as may be found in a computing system
such as, for example, a mobile phone.
[0134] Further, it may be recognized that encoder 100 may be
associated with and/or provided by a content provider system
including, for example, a video content server system, and that
output bitstream 111 may be transmitted or conveyed to decoders
such as, for example, decoder 200 by various communications
components and/or systems such as transceivers, antennae, network
systems, and the like not depicted in FIGS. 1 and 2. It may also be
recognized that decoder 200 may be associated with a client system
such as a computing device (e.g., a desktop computer, laptop
computer, tablet computer, convertible laptop, mobile phone, or the
like) that is remote to encoder 100 and that receives input
bitstream 201 via various communications components and/or systems
such as transceivers, antennae, network systems, and the like not
depicted in FIGS. 1 and 2. Therefore, in various implementations,
encoder 100 and decoder 200 may be implemented either together or
independently of one another.
[0135] FIG. 3 is an illustrative diagram of example subsystems
associated with next generation video encoder 100, arranged in
accordance with at least some implementations of the present
disclosure. As shown, encoder 100 may include a structure subsystem
310, a partitioning subsystem 320, a prediction subsystem 330, a
transform subsystem 340, a filtering subsystem 350, and/or an
entropy coding subsystem 360.
[0136] FIG. 3(a) is an illustrative diagram of an example next
generation video encoder 300a, arranged in accordance with at least
some implementations of the present disclosure. FIG. 3(a) presents
a similar encoder to that shown in FIG. 1, and similar elements
will not be repeated for the sake of brevity. As shown in FIG.
3(a), encoder 300a may include pre-analyzer subsystem 310a,
partitioner subsystem 320a, prediction encoding subsystem 330a,
transform encoder subsystem 340a, filtering encoding subsystem
350a, entropy encoder system 360a, transform decoder subsystem
370a, and/or unpartitioner subsystem 380a. Pre-analyzer subsystem
310a may include content pre-analyzer module 102 and/or adaptive
picture organizer module 104. Partitioner subsystem 320a may
include prediction partitions generator module 105, and/or coding
partitions generator 107. Prediction encoding subsystem 330a may
include motion estimator module 122, characteristics and motion
compensated filtering predictor module 123, and/or
intra-directional prediction analyzer and prediction generation
module 124. Transform encoder subsystem 340a may include adaptive
transform module 108 and/or adaptive quantize module 109. Filtering
encoding subsystem 350a may include blockiness analyzer and deblock
filtering module 117, quality analyzer and quality restoration
filtering module 118, motion estimator module 122, characteristics
and motion compensated filtering predictor module 123, and/or
prediction analyzer and prediction fusion filtering module 126.
Entropy coding subsystem 360a may include adaptive entropy encoder
module 110. Transform decoder subsystem 370a may include adaptive
inverse quantize module 112 and/or adaptive inverse transform
module 113. Unpartitioner subsystem 380a may include coding
partitions assembler 114 and/or prediction partitions assembler
116.
[0137] Partitioner subsystem 320a of encoder 300a may include two
partitioning subsystems: prediction partitions generator module 105
that may perform analysis and partitioning for prediction, and
coding partitions generator module 107 that may perform analysis
and partitioning for coding. Adaptive picture organizer 104, which
may segment pictures into regions or slices, may also optionally be
considered part of this partitioner.
[0138] Prediction encoder subsystem 330a of encoder 300a may
include motion estimator 122 and characteristics and motion
compensated filtering predictor 123 that may perform analysis and
prediction of "inter" signal, and intra-directional prediction
analyzer and prediction generation module 124 that may perform
analysis and prediction of "intra" signal. Motion estimator 122 and
characteristics and motion compensated filtering predictor 123 may
allow for increasing predictability by first compensating for other
sources of differences (such as gain, global motion, registration),
followed by actual motion compensation. They may also allow for use
of data modeling to create synthesized frames (super resolution,
and projection) that may allow better predictions, followed by use
of actual motion compensation in such frames.
[0139] Transform encoder subsystem 340a of encoder 300a may perform
analysis to select the type and size of transform and may include
two major types of components. The first type of component may
allow for using parametric transform to allow locally optimal
transform coding of small to medium size blocks; such coding
however may require some overhead. The second type of component may
allow globally stable, low overhead coding using a generic/fixed
transform such as the DCT, or a picture based transform from a
choice of small number of transforms including parametric
transforms. For locally adaptive transform coding, PHT (Parametric
Haar Transform) may be used. Transforms may be performed on 2D
blocks of rectangular sizes between 4.times.4 and 64.times.64, with
actual sizes that may depend on a number of factors such as if the
transformed data is luma or chroma, inter or intra, and if the
transform used is PHT or DCT. The resulting transform coefficients
may be quantized, scanned and entropy coded.
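As a rough illustration of the fixed-transform path described
above, consider the following sketch (hypothetical Python, not part
of this disclosure; it assumes SciPy's dctn/idctn routines and a
simple uniform quantizer in place of the adaptive quantizer):

    import numpy as np
    from scipy.fft import dctn, idctn

    def transform_block(residual, qstep=8.0):
        """Forward 2D DCT plus uniform quantization of one residual
        block; a simplified stand-in for the generic/fixed transform
        path (the parametric PHT path is not modeled here)."""
        coeffs = dctn(residual, norm='ortho')          # 2D DCT-II
        return np.round(coeffs / qstep).astype(int)    # scalar quantizer

    def inverse_transform_block(levels, qstep=8.0):
        """De-scale quantized levels and apply the inverse 2D DCT."""
        return idctn(levels * qstep, norm='ortho')

    block = np.random.randn(8, 8) * 16.0               # example 8x8 residual
    recon = inverse_transform_block(transform_block(block))
    print("max reconstruction error:", np.abs(block - recon).max())

In this toy form, the block size and quantizer step are free
parameters, echoing the point above that actual sizes depend on
luma/chroma, inter/intra, and whether PHT or DCT is used.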
[0140] Entropy encoder subsystem 360a of encoder 300a may include a
number of efficient but low complexity components each with the
goal of efficiently coding a specific type of data (various types
of overhead, motion vectors, or transform coefficients). Components
of this subsystem may belong to a generic class of low complexity
variable length coding techniques, however, for efficient coding,
each component may be custom optimized for highest efficiency. For
instance, a custom solution may be designed for coding of
"Coded/Not Coded" data, another for "Modes and Ref Types" data, yet
another for "Motion Vector" data, and yet another one for
"Prediction and Coding Partitions" data. Finally, because a very
large portion of data to be entropy coded is "transform
coefficient" data, multiple approaches for efficient handling of
specific block sizes, as well as an algorithm that may adapt
between multiple tables may be used.
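By way of illustration only, a generic low complexity variable
length code can be sketched as below (hypothetical Python; order-0
Exp-Golomb is a well-known VLC and stands in for, but is not, the
custom-optimized codes of this subsystem):

    def exp_golomb_encode(value):
        """Encode a non-negative integer as an order-0 Exp-Golomb
        codeword: small, frequent symbols get short bit strings."""
        bits = bin(value + 1)[2:]         # binary without the '0b' prefix
        return '0' * (len(bits) - 1) + bits

    def exp_golomb_decode(bitstring):
        """Decode one codeword; returns (value, remaining bits)."""
        zeros = 0
        while bitstring[zeros] == '0':
            zeros += 1
        value = int(bitstring[zeros:2 * zeros + 1], 2) - 1
        return value, bitstring[2 * zeros + 1:]

    for v in range(5):
        cw = exp_golomb_encode(v)
        print(v, '->', cw, '->', exp_golomb_decode(cw)[0])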
[0141] Filtering encoder subsystem 350a of encoder 300a may perform
analysis of parameters as well as multiple filtering of the
reconstructed pictures based on these parameters, and may include
several subsystems. For example, a first subsystem, blockiness
analyzer and deblock filtering module 117 may deblock and dither to
reduce or mask any potential block coding artifacts. A second
example subsystem, quality analyzer and quality restoration
filtering module 118, may perform general quality restoration to
reduce the artifacts due to quantization operation in any video
coding. A third example subsystem, which may include motion
estimator 122 and characteristics and motion compensated filtering
predictor module 123, may improve results from motion compensation
by using a filter that adapts to the motion characteristics (motion
speed/degree of blurriness) of the content. A fourth example
subsystem, prediction fusion analyzer and filter generation module
126, may allow adaptive filtering of the prediction signal (which
may reduce spurious artifacts in prediction, often from intra
prediction) thereby reducing the prediction error which needs to be
coded.
[0142] Encode controller module 103 of encoder 300a may be
responsible for overall video quality under the constraints of
given resources and desired encoding speed. For instance, in full
RDO (Rate Distortion Optimization) based coding without using any
shortcuts, the encoding speed for software encoding may be simply a
consequence of computing resources (speed of processor, number of
processors, hyperthreading, DDR3 memory etc.) availability. In such
case, encode controller module 103 may be input every single
combination of prediction partitions and coding partitions, and by
actual encoding, the bitrate may be calculated along with the
reconstruction error for each case; based on Lagrangian
optimization equations, the best set of prediction and coding
partitions may then be sent for each tile of each frame being coded. The
full RDO based mode may result in best compression efficiency and
may also be the slowest encoding mode. By using content analysis
parameters from content pre-analyzer module 102 and using them to
make RDO simplification (not test all possible cases) or only pass
a certain percentage of the blocks through full RDO, quality versus
speed tradeoffs may be made allowing speedier encoding. Up to now
we have described a variable bitrate (VBR) based encoder operation.
Encode controller module 103 may also include a rate controller
that can be invoked in case of constant bitrate (CBR) controlled
coding.
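The Lagrangian selection described above can be sketched as follows
(hypothetical Python; the lambda value, distortion numbers, and
candidate list are illustrative assumptions rather than the
encoder's actual settings):

    def rd_cost(distortion, rate_bits, lam):
        """Lagrangian rate-distortion cost J = D + lambda * R."""
        return distortion + lam * rate_bits

    def pick_best_partitioning(candidates, lam=0.85):
        """candidates: (name, distortion, rate_bits) tuples obtained
        by actually encoding each option; returns the name of the
        minimum-cost option."""
        return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))[0]

    # Toy example: three candidate partitionings of one tile.
    options = [
        ("64x64 whole tile", 1500.0, 120),   # low rate, high distortion
        ("4x 32x32",          900.0, 300),
        ("16x 16x16",         700.0, 700),   # high rate, low distortion
    ]
    print(pick_best_partitioning(options))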
[0143] Lastly, pre-analyzer subsystem 310a of encoder 300a may
perform analysis of content to compute various types of parameters
useful for improving video coding efficiency and speed performance.
For instance, it may compute horizontal and vertical gradient
information (Rs, Cs), variance, spatial complexity per picture,
temporal complexity per picture, scene change detection, motion
range estimation, gain detection, prediction distance estimation,
number of objects estimation, region boundary detection, spatial
complexity map computation, focus estimation, film grain estimation
etc. The parameters generated by pre-analyzer subsystem 310a may
either be consumed by the encoder or be quantized and communicated
to decoder 200.
[0144] While subsystems 310a through 380a are illustrated as being
associated with specific example functional modules of encoder 300a
in FIG. 3(a), other implementations of encoder 300a herein may
include a different distribution of the functional modules of
encoder 300a among subsystems 310a through 380a. The present
disclosure is not limited in this regard and, in various examples,
implementation of the example subsystems 310a through 380a herein
may include the undertaking of only a subset of the specific
example functional modules of encoder 300a shown, additional
functional modules, and/or in a different arrangement than
illustrated.
[0145] FIG. 3(b) is an illustrative diagram of an example next
generation video decoder 300b, arranged in accordance with at least
some implementations of the present disclosure. FIG. 3(b) presents
a similar decoder to that shown in FIG. 2, and similar elements
will not be repeated for the sake of brevity. As shown in FIG.
3(b), decoder 300b may include prediction decoder subsystem 330b,
filtering decoder subsystem 350b, entropy decoder subsystem 360b,
transform decoder subsystem 370b, unpartitioner 2 subsystem 380b,
unpartitioner 1 subsystem 351b, and/or post-restorer subsystem
390b. Prediction decoder subsystem
330b may include characteristics and motion compensated filtering
predictor module 213 and/or intra-directional prediction generation
module 214. Filtering decoder subsystem 350b may include deblock
filtering module 208, quality restoration filtering module 209,
characteristics and motion compensated filtering predictor module
213, and/or prediction fusion filtering module 216. Entropy decoder
subsystem 360b may include adaptive entropy decoder module 202.
Transform decoder subsystem 370b may include adaptive inverse
quantize module 203 and/or adaptive inverse transform module 204.
Unpartitioner subsystem 380b may include coding partitions
assembler 205 and prediction partitions assembler 207.
Post-restorer subsystem 390b may include content post restorer
module 218 and/or adaptive picture re-organizer 217.
[0146] Entropy decoding subsystem 360b of decoder 300b may perform
the inverse operation of the entropy encoder subsystem 360a of
encoder 300a, i.e., it may decode various data (types of overhead,
motion vectors, transform coefficients) encoded by entropy encoder
subsystem 360a using a class of techniques loosely referred to as
variable length decoding. Specifically, various types of data to be
decoded may include "Coded/Not Coded" data, "Modes and Ref Types"
data, "Motion Vector" data, "Prediction and Coding Partitions"
data, and "Transform Coefficient" data.
[0147] Transform decoder subsystem 370b of decoder 300b may perform
the inverse operation to that of transform encoder subsystem 340a of
encoder 300a. Transform decoder subsystem 370b may include two
types of components. The first type of example component may
support use of the parametric inverse PHT transform of small to
medium block sizes, while the other type of example component may
support inverse DCT transform for all block sizes. The PHT
transform used for a block may depend on analysis of decoded data
of the neighboring blocks. Output bitstream 111 and/or input
bitstream 201 may carry information about partition/block sizes for
the PHT transform, as well as which direction of the 2D block is
to be inverse transformed with the PHT (the other direction uses
the DCT). For blocks coded purely by DCT, the partition/block sizes
information may be also retrieved from output bitstream 111 and/or
input bitstream 201 and used to apply inverse DCT of appropriate
size.
[0148] Unpartitioner subsystem 380b of decoder 300b may perform the
inverse operation to that of partitioner subsystem 320a of encoder
300a and may include two unpartitioning subsystems: coding
partitions assembler module 205, which may perform unpartitioning
of coded data, and prediction partitions assembler module 207,
which may perform unpartitioning for prediction. Further, if
optional adaptive
picture organizer module 104 is used at encoder 300a for region
segmentation or slices, adaptive picture re-organizer module 217
may be needed at the decoder.
[0149] Prediction decoder subsystem 330b of decoder 300b may
include characteristics and motion compensated filtering predictor
module 213 that may perform prediction of "inter" signal and
intra-directional prediction generation module 214 that may perform
prediction of "intra" signal. Characteristics and motion
compensated filtering predictor module 213 may allow for increasing
predictability by first compensating for other sources of
differences (such as gain, dominant motion, registration) or
creation of synthesized frames (super resolution, and projection),
followed by actual motion compensation.
[0150] Filtering decoder subsystem 350b of decoder 300b may perform
multiple filtering of the reconstructed pictures based on
parameters sent by encoder 300a and may include several subsystems.
The first example subsystem, deblock filtering module 208, may
deblock and dither to reduce or mask any potential block coding
artifacts. The second example subsystem, quality restoration
filtering module 209, may perform general quality restoration to
reduce the artifacts due to quantization operation in any video
coding. The third example subsystem, characteristics and motion
compensated filtering predictor module 213, may improve results
from motion compensation by using a filter that may adapt to the
motion characteristics (motion speed/degree of blurriness) of the
content. The fourth example subsystem, prediction fusion filtering
module 216, may allow adaptive filtering of the prediction signal
(which may reduce spurious artifacts in prediction, often from
intra prediction) thereby reducing the prediction error which may
need to be coded.
[0151] Post-restorer subsystem 390b of decoder 300b is an optional
block that may perform further improvement of perceptual quality of
decoded video. This processing can be done either in response to
quality improvement parameters sent by encoder 100, or it can be a
standalone decision made at the post-restorer subsystem 390b.
Specific parameters computed at encoder 100 that can be used to
improve quality at post-restorer subsystem 390b may include
estimation of film grain noise and of residual blockiness at
encoder 100 (even after deblocking). As regards the film grain
noise, if
parameters can be computed and sent via output bitstream 111 and/or
input bitstream 201 to decoder 200, then these parameters may be
used to synthesize the film grain noise. Likewise, for any residual
blocking artifacts at encoder 100, if they can be measured and
parameters sent via output bitstream 111 and/or bitstream 201,
post-restorer subsystem 390b may decode these parameters and may
use them to optionally perform additional deblocking prior to
display. In addition, encoder 100 also may have access to scene
change, spatial complexity, temporal complexity, motion range, and
prediction distance information that may help in quality
restoration in post-restorer subsystem 390b.
[0152] While subsystems 330b through 390b are illustrated as being
associated with specific example functional modules of decoder 300b
in FIG. 3(b), other implementations of decoder 300b herein may
include a different distribution of the functional modules of
decoder 300b among subsystems 330b through 390b. The present
disclosure is not limited in this regard and, in various examples,
implementation of the example subsystems 330b through 390b herein
may include the undertaking of only a subset of the specific
example functional modules of decoder 300b shown, additional
functional modules, and/or in a different arrangement than
illustrated.
[0153] FIG. 4 is an illustrative diagram of modified prediction
reference pictures 400, arranged in accordance with at least some
implementations of the present disclosure. As shown, the output of
quality analyzer and quality restoration filtering may be a final
reconstructed frame that may be used for prediction for coding
other frames (e.g., the final reconstructed frame may be a
reference frame or the like).
[0154] The proposed implementation of the NGV coder (e.g., encoder
100 and/or decoder 200) may implement P-picture coding using a
combination of Morphed Prediction References 428 through 438 (MR0
through 3) and/or Synthesized Prediction References 412 and 440
through 446 (S0 through S3, MR4 through 7). NGV coding involves use
of three picture types referred to as I-pictures, P-pictures, and
F/B-pictures. In the illustrated example, the current picture to be
coded (a P-picture) is shown at time t=4. During coding, the
proposed implementation of the NGV coder (e.g., encoder 100 and/or
decoder 200) may use one or more of four previously decoded
references R0 412, R1 414, R2 416, and R3 418. Unlike other
solutions that may simply use these references directly for
prediction, the proposed implementation of the NGV coder (e.g.,
encoder 100 and/or decoder 200) may generate modified (morphed or
synthesized) references from such previously decoded references and
then use motion compensated coding based at least in part on such
generated modified (morphed or synthesized) references.
[0155] As will be described in greater detail below, in some
examples, the proposed implementation of the NGV coder (e.g.,
encoder 100 and/or decoder 200) may incorporate a number of
components and the combined predictions generated by these
components in an efficient video coding algorithm. For example,
proposed implementation of the NGV coder may include one or more of
the following features: 1. Gain Compensation (e.g., explicit
compensation for changes in gain/brightness in a scene); 2. Blur
Compensation: e.g., explicit compensation for changes in
blur/sharpness in a scene; 3. Dominant/Global Motion Compensation
(e.g., explicit compensation for dominant motion in a scene); 4.
Registration Compensation (e.g., explicit compensation for
registration mismatches in a scene); 5. Super Resolution (e.g.,
explicit model for changes in resolution precision in a scene); 6.
Projection (e.g., explicit model for changes in motion trajectory
in a scene); the like, and/or combinations thereof.
[0156] In the illustrated example, if inter-prediction is applied,
a characteristics and motion filtering predictor module may apply
motion compensation to a current picture 410 (e.g., labeled in the
figure as P-pic (curr)) as part of the local decode loop. In some
instances, such motion compensation may be based at least in part
on future frames (not shown) and/or previous frame R0 412 (e.g.,
labeled in the figure as R0), previous frame R1 414 (e.g., labeled
in the figure as R1), previous frame R2 416 (e.g., labeled in the
figure as R2), and/or previous frame R3 418 (e.g., labeled in the
figure as R3).
[0157] For example, in some implementations, prediction operations
may include inter- and/or intra-prediction. Inter-prediction may be
performed by one or more modules including a morphing analyzer and
generation module and/or a synthesizing analyzer and generation
module. Such a morphing analyzer and generation module may analyze
a current picture to determine parameters for changes in blur 420
(e.g., labeled in the figure as Blur par), changes in gain 422
(e.g., labeled in the figure as Gain par and explained in detail
below), changes in registration 424 (e.g., labeled in the figure as
Reg par), and changes in dominant motion 426 (e.g., labeled in the
figure as Dom par), or the like with respect to a reference frame
or frames with which it is to be coded.
[0158] The determined morphing parameters 420, 422, 424, and/or 426
may be used to generate morphed reference frames. Such generated
morphed reference frames may be stored and may be used for
computing motion vectors for efficient motion (and characteristics)
compensated prediction of a current frame. In the illustrated
example, determined morphing parameters 420, 422, 424, and/or 426
may be used to generate morphed reference frames, such as blur
compensated morphed reference frame 428 (e.g., labeled in the
figure as MR3b), gain compensated morphed reference frame 430
(e.g., labeled in the figure as MR2g), gain compensated morphed
reference frame 432 (e.g., labeled in the figure as MR1g),
registration compensated morphed reference frame 434 (e.g., labeled
in the figure as MR1r), dominant motion compensated morphed
reference frame 436 (e.g., labeled in the figure as MR0d), and/or
registration compensated morphed reference frame 438 (e.g., labeled
in the figure as MR0r), the like or combinations thereof, for
example.
[0159] Similarly, a synthesizing analyzer and generation module may
generate super resolution (SR) pictures 440 (e.g., labeled in the
figure as S0 (which is equal to previous frame R0 412), S1, S2, S3)
and projected interpolation (PI) pictures 442 (e.g., labeled in the
figure as PE) or the like for determining motion vectors for
efficient motion compensated prediction in these frames. Such
generated synthesized reference frames may be stored and may be
used for computing motion vectors for efficient motion (and
characteristics) compensated prediction of a current frame.
[0160] Additionally or alternatively, the determined morphing
parameters 420, 422, 424, and/or 426 may be used to morph the
generated synthesis reference frames super resolution (SR) pictures
440 and/or projected interpolation (PI) pictures 442. For example,
a synthesizing analyzer and generation module may generate morphed
registration compensated super resolution (SR) pictures 444 (e.g.,
labeled in the figure as MR4r, MR5r, and MR6r) and/or morphed
registration compensated projected interpolation (PI) pictures 446
(e.g., labeled in the figure as MR7r) or the like from the
determined registration morphing parameter 424. Such generated
morphed and synthesized reference frames may be stored and may be
used for computing motion vectors for efficient motion (and
characteristics) compensated prediction of a current frame.
[0161] In some implementations, changes in a set of characteristics
(such as gain, blur, dominant motion, registration, resolution
precision, motion trajectory, the like, or combinations thereof,
for example) may be explicitly computed. Such a set of
characteristics may be computed in addition to local motion. In
some cases previous and next pictures/slices may be utilized as
appropriate; however, in other cases such a set of characteristics
may do a better job of prediction from previous pictures/slices.
Further, since there can be error in any estimation procedure
(e.g., from multiple past or multiple past and future
pictures/slices), a modified reference frame associated with the set
of characteristics (such as gain, blur, dominant motion,
registration, resolution precision, motion trajectory, the like, or
combinations thereof, for example) may be selected that yields the
best estimate. Thus, the proposed approach that utilizes modified
reference frames associated with the set of characteristics (such
as gain, blur, dominant motion, registration, resolution precision,
motion trajectory, the like, or combinations thereof, for example)
may explicitly compensate for differences in these characteristics.
The proposed implementation may address the problem of how to
improve the prediction signal, which in turn allows achieving high
compression efficiency in video coding.
[0162] As discussed, in some examples, inter-prediction may be
performed. In some examples, up to 4 decoded past and/or future
pictures and several morphing/synthesis predictions may be used to
generate a large number of reference types (e.g., reference
pictures). For instance in `inter` mode, up to nine reference types
may be supported in P-pictures, and up to ten reference types may
be supported for F/B-pictures. Further, `multi` mode may provide a
type of inter prediction mode in which instead of 1 reference
picture, two reference pictures may be used and P- and F/B-pictures
respectively may allow 3, and up to 8 reference types. For example,
prediction may be based on a previously decoded frame generated
using at least one of a morphing technique or a synthesizing
technique. In such examples, the bitstream may include a frame
reference, morphing parameters, or synthesizing parameters
associated with the prediction partition.
[0163] Some of the morphing and synthesis techniques other than the
dominant motion compensation (described in more detail below) are
as follows.
[0164] Gain Compensation
[0165] One type of morphed prediction used by NGV coding is gain
compensated prediction, and includes detecting and estimating the
gain and/or offset luminance values, parameterizing them, using
them for compensation of gain/offset at the encoder, transmitting
them to the decoder, and using them at the decoder for gain
compensation by replicating the gain compensation process at the
encoder.
[0166] By one detailed example, often in video scenes, frame to
frame differences are caused not only due to movement of objects
but also due to changes in gain/brightness. Sometimes such changes
in brightness can be global due to editing effects such as a
fade-in, a fade-out, or due to a crossfade. However, in many more
cases, such changes in brightness are local for instance due to
flickering lights, camera flashes, explosions, colored strobe
lights in a dramatic or musical performance, etc.
[0167] The compensation of interframe changes in brightness,
whether global or local, can potentially improve compression
efficiency in video coding. However, the brightness change
parameters (gain and offset) are applied both at a video encoder
and a decoder, so they should be communicated efficiently, with low
bit-cost, from encoder to decoder via the bitstream, and the
processing complexity at the decoder should be minimized. In the
past, only techniques for global brightness change have been
disclosed, but local compensation of brightness changes has not
been successfully addressed.
[0168] The following equation relates the brightness of a pixel
$s_t(i,j)$ at location (i,j) in frame `t` to the brightness of a
pixel at the same location (i,j) in a previous frame `t-1`, with
`a` and `b` being the gain and offset factors. Motion is assumed to
be small and only the brightness changes are modeled.
$s_t(i,j) = a \cdot s_{t-1}(i,j) + b$  (1)
Taking the expected values of $s_t(i,j)$ and $s_t^2(i,j)$, and
following a method of equating the first and second moments of the
current frame and the previous frame, the values of gain `a` and
offset `b` can then be calculated as:
$a = \left( \frac{E(s_t^2(i,j)) - (E(s_t(i,j)))^2}{E(s_{t-1}^2(i,j)) - (E(s_{t-1}(i,j)))^2} \right)^{1/2}$  (2)
$b = E(s_t(i,j)) - a \cdot E(s_{t-1}(i,j))$  (3)
Once `a` and `b` are calculated as per equations (2) and (3), they
are quantized (for efficient transmission), encoded and sent to the
decoder. At the decoder, decoded dequantized values of `a` and `b`
are put back into equation (1), and using decoded values of pixels
in the previous frame, a gain compensated modified version of a
previous reference frame is calculated that is lower in error than
the original previous frame, and is then used for generating (gain
compensated) motion compensated prediction. To the (inverse
transformed, and dequantized) decoded prediction error blocks, the
corresponding predictions from modified previous reference frames
are added to generate the final decoded frame (or blocks of the
frame).
[0169] For local gain compensation, instead of a single set of
(a, b) parameters, multiple sets of parameters are computed and
transmitted to the decoder along with a map of which portion of the
frame corresponds to which parameters, and used for gain
compensation as described.
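The gain computation of equations (1) through (3) can be sketched
as follows (hypothetical Python with NumPy; quantization and
transmission of `a` and `b` are omitted, and the frames are
illustrative):

    import numpy as np

    def estimate_gain_offset(cur, prev):
        """Estimate gain `a` and offset `b` per equations (2) and (3)
        by equating first and second moments of the two frames."""
        var_cur = (cur ** 2).mean() - cur.mean() ** 2
        var_prev = (prev ** 2).mean() - prev.mean() ** 2
        a = np.sqrt(var_cur / var_prev)
        b = cur.mean() - a * prev.mean()
        return a, b

    def gain_compensate(prev, a, b):
        """Apply equation (1) to build the gain compensated modified
        reference frame."""
        return np.clip(a * prev + b, 0.0, 255.0)

    prev = np.random.randint(0, 200, (64, 64)).astype(np.float64)
    cur = np.clip(1.1 * prev + 5.0, 0.0, 255.0)   # simulated brightness change
    a, b = estimate_gain_offset(cur, prev)
    print("a=%.3f b=%.2f" % (a, b))               # close to 1.1 and 5.0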
[0170] Blur/Registration Compensation
[0171] By one detailed example, methods for compensation of
Registration and Blur are described below although the terms can be
used interchangeably.
[0172] Registration Compensation:
[0173] A stationary video camera imaging a scene might still result
in shaky or unstable video that differs frame to frame due to
environmental factors (such as wind), vibrations from nearby
objects, a shaky hand, or a jittery capture process, rather than
global movement of the scene or motion of large objects in the
scene. This results in frame to frame registration differences, the
compensation of which (in addition to other forms of compensation
such as gain, global/dominant motion, and local motion
compensation) may result in improvement of compression efficiency
of video coding.
[0174] For computing registration parameters between a current
frame and a previous reference frame, Wiener filtering can be
employed. Let x(n) be the input signal (from the reference frame),
y(n) the filter output, d(n) the desired signal (from the source
frame), and h(n) the filter coefficients.
Filter output: $y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k)$  (4)
Error signal: $e(n) = d(n) - y(n)$  (5)
In matrix notation, h is the vector of filter coefficients. The
cross-correlation row vector (between the source frame and the
reference frame) is:
$R_{dx} = E[d(n)\, x(n)^T]$  (6)
The autocorrelation matrix (based on block data) is:
[0175] $R_{xx} = E[x(n)\, x(n)^T]$  (7)
[0176] The Wiener-Hopf equation to solve for h is then as follows.
The Wiener-Hopf equation determines the optimum filter coefficients
in the mean square error sense, and the resulting filter is called
the `wiener` filter.
$h = R_{xx}^{-1} R_{dx}$  (8)
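In block form, equations (6) through (8) amount to solving a small
linear system from sampled data. The sketch below (hypothetical
Python with NumPy; the tap count and the test signals are
illustrative assumptions) estimates h from a reference signal x and
a source signal d:

    import numpy as np

    def wiener_filter(x, d, taps=5):
        """Solve h = Rxx^-1 Rdx, per equation (8), from samples of the
        reference (input) signal x and the source (desired) signal d."""
        # Rows are [x(n), x(n-1), ..., x(n-taps+1)], matching eq. (4).
        rows = np.array([x[n - taps + 1:n + 1][::-1]
                         for n in range(taps - 1, len(x))])
        dvec = d[taps - 1:]
        Rxx = rows.T @ rows / len(rows)     # autocorrelation matrix, eq. (7)
        Rdx = rows.T @ dvec / len(rows)     # cross-correlation vector, eq. (6)
        return np.linalg.solve(Rxx, Rdx)

    rng = np.random.default_rng(0)
    x = rng.standard_normal(1000)
    true_h = np.array([0.5, 0.3, 0.1, 0.05, 0.02])
    d = np.convolve(x, true_h)[:len(x)]     # "source" = filtered reference
    print(np.round(wiener_filter(x, d), 3)) # should be close to true_h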
Blur Compensation:
[0177] A fast camera pan of a scene may, due to charge integration,
result in a blurry image. Further, whether a camera is still or in
motion, if a scene involves fast moving objects, for instance
football players in a football game, the objects can appear blurry
because the temporal resolution of the imaging is not sufficient. In
both of the aforementioned cases, compensation of blur prior to or
in conjunction with other forms of compensation, may improve
compression efficiency of video coding.
[0178] For motion blur estimation, a Lucy-Richardson method can be
used. It is an iterative algorithm for successively computing a
reduced-blur frame (X) at iteration i from the source frame Y,
using B, the blur operator (which blurs a frame using estimated
blur vectors), and B*, an adjoint operator. The operator B* can
roughly be treated as the same as B, since B* can be replaced by B
with roughly the same resulting visual quality.
$X_{i+1} = X_i \cdot B^*\!\left(\frac{Y}{B(X_i)}\right), \quad X_0 = Y$  (9)
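A minimal sketch of this iteration follows (hypothetical Python
with SciPy; a Gaussian kernel stands in for the blur operator B
estimated from blur vectors, and B is reused in place of its
adjoint B*, as the text above permits):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def lucy_richardson(Y, sigma=1.5, iterations=25, eps=1e-8):
        """Iteratively reduce blur per equation (9):
        X_{i+1} = X_i * B*(Y / B(X_i)), with X_0 = Y."""
        B = lambda img: gaussian_filter(img, sigma)   # blur operator B
        X = Y.astype(np.float64).copy()               # X_0 = Y
        for _ in range(iterations):
            X = X * B(Y / (B(X) + eps))               # B stands in for B*
        return X

    sharp = np.zeros((32, 32)); sharp[12:20, 12:20] = 1.0
    blurry = gaussian_filter(sharp, 1.5)
    restored = lucy_richardson(blurry)
    print("mean error before: %.4f after: %.4f" % (
        np.abs(blurry - sharp).mean(), np.abs(restored - sharp).mean()))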
[0179] Super Resolution Synthesis
[0180] Referring to FIG. 5, besides morphed prediction (gain,
blur/registration, global/dominant motion) pictures, synthesized
prediction (super resolution (SR), and projected interpolation
(PI)) pictures are also supported. In general, super resolution
(SR) is a technique used to create a high resolution reconstruction
image of a single video frame using many past frames of the video
to help fill in the missing information. The goal of a good super
resolution technique is to be able to produce a reconstructed image
better than up-sampling alone when tested with known higher
resolution video. The super resolution generation technique herein
may use coded video codec data to create an in-loop super
resolution frame. The in-loop super resolution frame is used again
within the coding loop as the name implies. The use of SR in a
coding loop provides significant gain in the low resolution video
coding and thus in the reconstructed super resolution video. This
process uses an algorithm that combines and uses codec information
(like intra modes, motion, coefficients, etc.) along with current
decoded frames and past frames (or future frames if available) to
create a high resolution reconstruction of the current frame being
decoded. Thus the proposed technique is fast and produces good
visual quality.
[0181] For sequences where the movement is slow and content is
fairly detailed (many edges, texture, and so forth), the ability to
generate super resolution frames for use in prediction can provide
greater motion compensation accuracy, and thereby permit a higher
degree of compression. As shown in FIG. 5, a process 500 is
diagrammed where the principle of generation of SR prediction is
applied to P-pictures, which is a type of synthesized prediction
used by NGV coding. In this case, both the encoder and decoder
generate the synthesized frame from previously available decoded
frames and data. A SR frame 518 double the size of frame `n` 504 in
both the horizontal and vertical dimensions is generated by
blending upsampled decoded P frame 516 at `n`, and motion
compensated picture 514 constructed by using a previous SR frame
508 at `n-1`. The previous SR frame 508 is de-interleaved and
combined with the motion estimation values at de-interleaved blocks
510 by using the current P-picture 504. The blocks 510 are used for
motion compensation to form motion compensated, de-interleaved
blocks 512, which are then re-interleaved to form the motion
compensated picture 514. Multi-reference prediction is also
shown for the P-picture at frame n+1 by arrow D.
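A highly simplified sketch of the blending step of process 500
follows (hypothetical Python with NumPy; a real SR generation uses
codec modes, motion, and coefficients, whereas this sketch only
blends a 2x-upsampled decoded frame with a motion compensated
previous SR frame, here approximated with zero motion):

    import numpy as np

    def upsample2x(frame):
        """Nearest-neighbor 2x upsampling of a decoded frame."""
        return frame.repeat(2, axis=0).repeat(2, axis=1)

    def synthesize_sr(decoded_n, mc_prev_sr, w=0.5):
        """Blend the upsampled decoded frame at `n` with the motion
        compensated previous SR frame, yielding a frame double the
        size in both dimensions."""
        return w * upsample2x(decoded_n) + (1.0 - w) * mc_prev_sr

    decoded = np.random.randint(0, 255, (16, 16)).astype(np.float64)
    mc_prev_sr = upsample2x(decoded)       # zero-motion stand-in for MC
    sr = synthesize_sr(decoded, mc_prev_sr)
    print(sr.shape)                        # (32, 32)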
[0182] Projected Interpolation Synthesis
[0183] A picture sequence such as frame sequence 400, may also be
used to illustrate the principle of generation and use of projected
interpolation frames (PI-pictures) shown as frame PE 442 on FIG. 4.
For simplicity, assume that F-pictures behave like B-pictures and
can reference two anchors, one in the past, and another in the
future (this is only one example case). Then, for every F-picture,
a co-located interpolated frame can be generated by a specific type
of interpolation referred to as projected interpolation using the
future and the past reference anchor frames. Projected
interpolation takes into account object motion that has
non-constant (or non-linear) velocity over a sequence of frames, or
relatively large motions. PI uses weighting factors that depend on
the distance from the co-located or current frame to be replaced to
each of the two reference frames being used for the interpolation.
Thus, a best fit motion vector is determined that is proportional
to these two distances, with the closer reference usually given
more weight. To accomplish this, two scale factors (an x factor and
a y factor) are determined by least square estimation, in one
example. Further motion compensation may then be allowed to adjust
small mismatches.
[0184] For instance, for F-pictures at a time `n+1`, a PI-picture
is generated co-located at this time using anchor or reference
frames at times `n` and `n+2`. Likewise for F-pictures at times,
`n+3`, and `n+4`, corresponding PI-pictures can be generated using
anchor frames at times `n+2` and `n+5`. This process may repeat for
each future F-picture as a PI-picture is synthesized to correspond
in time to each F-picture. The corresponding synthesized
PI-pictures can then be used as a third reference in the same or
similar way the two reference anchors were going to be used for
prediction. Some prediction partitions may use prediction
references directly while others may use them implicitly such as to
generate bi-prediction. Thus, synthesized PI-pictures can be used
for prediction, instead of the original F-pictures, with
multi-reference prediction and with two reference anchors.
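The distance-proportional weighting described above can be sketched
as follows (hypothetical Python; it scales an anchor-to-anchor
motion vector to the PI frame time and weights the two anchor
pixels by closeness, with the least-squares x/y factors omitted):

    def pi_scale_factors(t_past, t_pi, t_future):
        """Normalized distances from the PI time to each anchor."""
        span = float(t_future - t_past)
        return (t_pi - t_past) / span, (t_future - t_pi) / span

    def project_motion_vector(mv, t_past, t_pi, t_future):
        """Split a past-to-future motion vector (mv) into best fit
        vectors from each anchor to the co-located PI frame,
        proportional to the two temporal distances."""
        s_past, s_future = pi_scale_factors(t_past, t_pi, t_future)
        return ((mv[0] * s_past, mv[1] * s_past),
                (-mv[0] * s_future, -mv[1] * s_future))

    def pi_blend(pel_past, pel_future, t_past, t_pi, t_future):
        """Blend anchor pixels; the closer reference gets more weight."""
        s_past, s_future = pi_scale_factors(t_past, t_pi, t_future)
        return s_future * pel_past + s_past * pel_future

    # F-picture at n+1 between anchors at n and n+2, as in the example:
    print(project_motion_vector((8.0, -4.0), 0, 1, 2))
    print(pi_blend(100.0, 120.0, 0, 1, 2))   # equidistant: simple average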
[0185] Turning now to the system to implement these modifications
to reference frames, and as mentioned previously, with ever
increasing resolution of video to be compressed and expectation of
high video quality, the corresponding bitrate/bandwidth required
for coding using existing video coding standards such as H.264 or
even evolving standards such as H.265/HEVC, is relatively high. The
aforementioned standards use expanded forms of traditional
approaches to implicitly address the insufficient
compression/quality problem, but often the results are limited.
[0186] The proposed implementation improves video compression
efficiency by improving interframe prediction, which in turn
reduces interframe prediction difference (error signal) that needs
to be coded. The less the amount of interframe prediction
difference to be coded, the less the amount of bits required for
coding, which effectively improves the compression efficiency as it
now takes fewer bits to store or transmit the coded prediction
difference signal. Instead of being limited to motion predictions
only, the proposed NGV codec may be highly adaptive to changing
characteristics (such as gain, blur, dominant motion, registration,
resolution precision, motion trajectory, the like, or combinations
thereof, for example) of the content by employing, in addition or
in the alternative to motion compensation, approaches to explicitly
compensate for changes in the characteristics of the content. Thus,
by explicitly addressing the root cause of the problem the NGV
codec may address a key source of limitation of standards based
codecs, thereby achieving higher compression efficiency.
[0187] This change in interframe prediction output may be achieved
due to the ability of the proposed NGV codec to compensate for a wide
range of reasons for changes in the video content. Typical video
scenes vary from frame to frame due to many local and global
changes (referred to herein as characteristics). Besides local
motion, there are many other characteristics that are not
sufficiently addressed by current solutions that may be addressed
by the proposed implementation.
[0188] The proposed implementation may explicitly compute changes
in a set of characteristics (such as gain, blur, dominant motion,
registration, resolution precision, motion trajectory, the like, or
combinations thereof, for example) in addition to local motion, and
thus may do a better job of prediction from previous
pictures/slices than only using local motion prediction from
previous and next pictures/slices. Further, since there can be
error in any estimation procedure, from multiple past or multiple
past and future pictures/slices the NGV coder may choose the frame
that yields the best estimate by explicitly compensating for
differences in various characteristics.
[0189] In operation, the proposed implementation of the NGV coder
(e.g., encoder 100 and/or decoder 200) may operate so that
prediction mode and/or reference type data may be defined using
symbol-run coding or a codebook or the like. The prediction mode
and/or reference type data may be transform encoded using content
adaptive or discrete transform in various examples to generate
transform coefficients. Also as discussed, data associated with
partitions (e.g., the transform coefficients or quantized transform
coefficients), overhead data (e.g., indicators as discussed herein
for transform type, adaptive transform direction, and/or a
transform mode), and/or data defining the partitions and so on may
be encoded (e.g., via an entropy encoder) into a bitstream. The
bitstream may be communicated to a decoder, which may use the
encoded bitstream to decode video frames for display. On a local
basis (such as block-by-block within a macroblock or a tile, or on
a partition-by-partition within a tile or a prediction unit, or
fragments within a superfragment or region) the best mode may be
selected for instance based at least in part on Rate Distortion
Optimization (RDO) or based at least in part on pre-analysis of
video, and the identifier for the mode and needed references may be
encoded within the bitstream for use by the decoder.
[0190] As explained above, various prediction modes are allowed in
P- and F-pictures and are exemplified below, along with how they
relate to the reference types. Both the P-picture and F-picture
tiles are partitioned into smaller units, and a prediction mode
from among "skip", "auto", "inter", and "multi", is assigned to
each partition of a tile. The entire list of modes in Table 1 also
includes `intra` that refers to spatial prediction from neighboring
blocks as compared to temporal motion compensated prediction. The
"split" mode refers to a need for further division or further
partitioning. For partitions that use "inter" or "multi" mode,
further information about the used reference is needed and is shown
for P-pictures in Tables 2(a) and 2(b), respectively, while for
F-pictures in Tables 3(a) and 3(b) through 3(d), respectively.
[0191] Prediction modes and reference types analyzer 125 (FIG. 1)
may allow for selection of prediction modes from among "skip",
"auto", "inter", "multi", and "intra" as mentioned above, and for
each partition of a tile, all of which may apply to P- and
F-pictures; this is shown in Table 1 below. In addition to
prediction modes, it also allows for selection of reference types
that can be different depending on "inter" or "multi" mode, as well
as for P- and F-pictures; the detailed list of ref types is shown
in Tables 2(a) and 2(b) for P-pictures, and Tables 3(a), 3(b),
3(c), and 3(d) for F-pictures.
[0192] Tables 1 through 3(d), shown below, illustrate one example
of codebook entries for a current frame (curr_pic) being, or that
will be, reconstructed. A full codebook of entries may provide a
full or substantially full listing of all possible entries and
coding thereof. In some examples, the codebook may take into
account constraints as described above. In some examples, data
associated with a codebook entry for prediction modes and/or
reference types may be encoded in a bitstream for use at a decoder
as discussed herein.
TABLE 1. Prediction modes for partitions of a tile in P- and
F-pictures (already explained above):
  No.  Prediction mode
  0.   Intra
  1.   Skip
  2.   Split
  3.   Auto
  4.   Inter
  5.   Multi
TABLE 2(a). Ref types for partitions of a tile that have "inter"
mode in P-pictures:
  No.  Ref type for partitions with "inter" mode
  0.   MR0n (=past SR0)
  1.   MR1n
  2.   MR2n
  3.   MR3n
  4.   MR5n (past SR1)
  5.   MR6n (past SR2)
  6.   MR7n (past SR3)
  7.   MR0d
  8.   MR0g
TABLE 2(b). Ref types for partitions of a tile that have "multi"
mode in P-pictures (first ref: past none; second ref as listed):
  No.  Second ref type for partitions with "multi" mode
  0.   MR1n
  1.   MR2n
  2.   MR3n
where table 2(b) is directed to a specific combination of
references including a past reference without parameters and one of
the references on the table as indicated by the table heading.
TABLE 3(a). Ref types for partitions of a tile that have "inter"
mode in F-pictures:
  No.  Ref type for partitions with "inter" mode
  0.   MR0n
  1.   MR7n (=proj F)
  2.   MR3n (=future SR0)
  3.   MR1n
  4.   MR4n (=future SR1)
  5.   MR5n (=future SR2)
  6.   MR6n (=future SR3)
  7.   MR3d
  8.   MR0g/MR3g
where proj F refers to PI, and line 8, by one example, includes two
optional references.
TABLE 3(b). Ref types for partitions of a tile that have "multi"
mode and Dir 0 in F-pictures (first ref: past none; second ref as
listed):
  No.  Second ref type for partitions with "multi" mode and Dir 0
  0.   MR3n (=future SR0)
  1.   MR1n
  2.   MR4n (=future SR1)
  3.   MR5n (=future SR2)
  4.   MR6n (=future SR3)
  5.   MR7n (=proj F)
  6.   MR3d
  7.   MR3g
where Dir refers to a sub-mode that is a fixed, or partially fixed,
combination of references for multi-mode for F-frames, such that
Dir 0 above, and Dir 1 and Dir 2 below, each refer to a combination
of references. Thus, as shown in Table 3(b), Dir 0 may refer to a
combination of a past reference (which may be a particular
reference at a particular time, reference 3 at n+2 for example)
combined with one of the references from the table. The Dir entries
in the tables below are similar, as explained in the headings of
those tables.
TABLE 3(c). Ref types for partitions of a tile that have "multi"
mode and Dir 1 in F-pictures (first ref: MR0n; second ref as
listed):
  No.  Second ref type for partitions with "multi" mode and Dir 1
  0.   MR7n (=proj F)
TABLE 3(d)
Ref Types for Partitions of Tile that have "multi" mode and Dir 2 in F-pictures:
No.  Ref Types for partitions with "multi" mode and Dir 2 (first Ref: MR3n; second Ref:)
0.   MR7n (=proj F)
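By way of illustration only, the codebook indices of Tables 1 and 2(a) could be represented as enumerations, as in the following minimal C++ sketch; the type and function names (PredMode, PInterRef, ParsePInterRef) are illustrative assumptions, not part of the codec.

```cpp
// Minimal sketch: codebook indices of Tables 1 and 2(a) as C++ enums.
// Names (PredMode, PInterRef, ParsePInterRef) are illustrative assumptions.
#include <cstdint>

// Table 1: prediction modes for partitions of a tile in P- and F-pictures.
enum class PredMode : uint8_t { Intra = 0, Skip, Split, Auto, Inter, Multi };

// Table 2(a): ref types for partitions with "inter" mode in P-pictures.
enum class PInterRef : uint8_t {
  MR0n = 0,  // = past SR0
  MR1n, MR2n, MR3n,
  MR5n,      // past SR1
  MR6n,      // past SR2
  MR7n,      // past SR3
  MR0d,      // morphed reference 0, dominant motion
  MR0g       // morphed reference 0 (g assumed here to denote the gain morph)
};

// A decoder could map a parsed codebook index straight to the ref type,
// since the enum values follow the "No." column of Table 2(a).
inline PInterRef ParsePInterRef(unsigned idx) {
  return static_cast<PInterRef>(idx);
}
```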
[0193] Specific to dominant motion compensation, "inter" mode of
P-pictures supports a reference type called MR0d (morphed reference
0 dominant motion), and for "inter" mode of F-pictures, supported
reference types include MR0d and MR3d (morphed reference 3 dominant
motion). These codes are explained further below. Further, in
"multi" mode, MR3d is supported as one of the two references used
for a single current frame. Besides "inter" and "multi", DMC also
may be used in "auto" mode of NGV. A summary of modes and reference
type combinations where DMC is invoked is as follows.
[0194] Use the dominant motion compensated reference frame
prediction: [0195] F-Picture, auto mode, sub-mode 1, 2
[0196] Use blended prediction of multiple dominant motion
compensated reference frames: [0197] F-Picture, auto mode, sub-mode
3
[0198] Dominant motion compensated reference with differential
translational motion vector for prediction: [0199] P-Picture, inter
mode, par=DMC [0200] F-Picture, inter mode, par=DMC
[0201] Dominant motion compensated reference with differential
translational motion vector for prediction, blended with another
reference frame: [0202] F-Picture, multi mode, ref1=past_ref,
par1=none, ref2=futr_ref, par2=DMC
[0203] FIG. 6 is an illustrative diagram of an example encoder
prediction subsystem 330 for performing characteristics and motion
compensated prediction, arranged in accordance with at least some
implementations of the present disclosure. As illustrated, encoder
prediction subsystem 330 of encoder 600 may include decoded picture
buffer 119, morphing analyzer and generation module 120,
synthesizing analyzer and generation module 121, motion estimator
module 122, and/or characteristics and motion compensated precision
adaptive filtering predictor module 123.
[0204] As shown, the output of quality analyzer and quality
restoration filtering may be transmitted to decoded picture buffer
119. In some examples, the output of quality analyzer and quality
restoration filtering may be a final reconstructed frame that may
be used for prediction for coding other frames (e.g., the final
reconstructed frame may be a reference frame or the like). In
encoder 600, prediction operations may include inter- and/or
intra-prediction. As shown in FIG. 6, inter-prediction may be
performed by one or more modules including morphing analyzer and
generation module 120, synthesizing analyzer and generation module
121, and/or characteristics and motion compensated precision
adaptive filtering predictor module 123.
[0205] Morphing analyzer and generation module 120 may include a
morphing types analyzer (MTA) and a morphed pictures generator
(MPG) 610 as well as a morphed prediction reference (MPR) buffer
620. Morphing types analyzer (MTA) and a morphed pictures generator
(MPG) 610 may analyze a current picture to determine parameters for
changes in gain, changes in dominant motion, changes in
registration, and changes in blur with respect to a reference frame
or frames with which it is to be coded. The determined morphing
parameters may be quantized/de-quantized and used (e.g., by
morphing analyzer and generation module 120) to generate morphed
reference frames. Such generated morphed reference frames may be
stored in morphed prediction reference (MPR) buffer 620 and may be
used by motion estimator module 122 for computing motion vectors
for efficient motion (and characteristics) compensated prediction
of a current frame.
[0206] Synthesizing analyzer and generation module 121 may include
a synthesis types analyzer (STA) and synthesized pictures generator
(SPG) 630 as well as a synthesized prediction reference (SPR)
buffer 640. Synthesis types analyzer (STA) and synthesized pictures
generator 630 may generate super resolution (SR) pictures and
projected interpolation (PI) pictures or the like for determining
motion vectors for efficient motion compensated prediction in these
frames. Such generated synthesized reference frames may be stored
in synthesized prediction reference (SPR) buffer 640 and may be
used by motion estimator module 122 for computing motion vectors
for efficient motion (and characteristics) compensated prediction
of a current frame.
[0207] Motion estimator module 122 may generate motion vector data
based at least in part on morphed reference frame(s) and/or super
resolution (SR) pictures and projected interpolation (PI) pictures
along with the current frame. In some examples, motion estimator
module 122 may be considered an inter-prediction module. For
example, the motion vector data may be used for inter-prediction.
If inter-prediction is applied, characteristics and motion
filtering predictor module 123 may apply motion compensation as
part of the local decode loop as discussed.
[0208] FIG. 7 is an illustrative diagram of an example decoder
prediction subsystem 701 for performing characteristics and motion
compensated prediction, arranged in accordance with at least some
implementations of the present disclosure. As illustrated, decoder
prediction subsystem 701 of decoder 700 may include decoded picture
buffer 210, morphing analyzer and generation module 211,
synthesizing analyzer and generation module 212, and/or
characteristics and motion compensated precision adaptive filtering
predictor module 213.
[0209] As shown, the output of quality restoration filtering module
may be transmitted to decoded picture buffer (or frame selector
control) 210. In some examples, the output of quality restoration
filtering module may be a final reconstructed frame that may be
used for prediction for coding other frames (e.g., the final
reconstructed frame may be a reference frame or the like). As
discussed, compensation due to prediction operations may include
inter- and/or intra-prediction compensation. As shown,
inter-prediction compensation may be performed by one or more
modules including morphing analyzer and generation module 211,
synthesizing analyzer and generation module 212, and/or
characteristics and motion compensated precision adaptive filtering
predictor module 213.
[0210] Morphing analyzer and generation module 211 may include a
morphed pictures generator (MPG) 710 as well as a morphed
prediction reference (MPR) buffer 720. Morphed pictures generator
(MPG) 710 may use de-quantized morphing parameters (e.g.,
determined from input bitstream) to generate morphed reference
frames. Such generated morphed reference frames may be stored in
morphed prediction reference (MPR) buffer 720 and may be used by
characteristics and motion compensated precision adaptive filtering
predictor module 213.
[0211] Synthesizing analyzer and generation module 212 may include
a synthesized pictures generator (SPG) 730 as well as a synthesized
prediction reference (SPR) buffer 740. Synthesized pictures
generator 730 may be configured to generate one or more types of
synthesized prediction reference pictures such as super resolution
(SR) pictures and projected interpolation (PI) pictures or the like
based at least in part on parameters determined from input
bitstream 201. Such generated synthesized reference frames may be
stored in synthesized prediction reference (SPR) buffer 740 and may
be used by motion compensated filtering predictor module 213.
[0212] If inter-prediction is applied, characteristics and motion
compensated filtering predictor module 213 may apply motion
compensation based at least in part on morphed reference frame(s)
and/or super resolution (SR) pictures and projected interpolation
(PI) pictures along with the current frame.
[0213] Referring to FIG. 8, an illustrative diagram of another
example encoder prediction subsystem 330 for performing
characteristics and motion compensated prediction is arranged in
accordance with at least some implementations of the present
disclosure. As illustrated, encoder prediction subsystem 330 of
encoder 800 may include decoded picture buffer 119, morphing
analyzer and generation module 120, synthesizing analyzer and
generation module 121, motion estimator module 122, and/or
characteristics and motion compensated precision adaptive filtering
predictor module 123.
[0214] As shown, the output of quality analyzer and quality
restoration filtering may be transmitted to decoded picture buffer
119. In some examples, the output of quality analyzer and quality
restoration filtering may be a final reconstructed frame that may
be used for prediction for coding other frames (e.g., the final
reconstructed frame may be a reference frame or the like). In
encoder 800, prediction operations may include inter- and/or
intra-prediction. As shown in FIG. 8, inter-prediction may be
performed by one or more modules including morphing analyzer and
generation module 120, synthesizing analyzer and generation module
121, and/or characteristics and motion compensated precision
adaptive filtering predictor module 123.
[0215] Morphing analyzer and generation module 120 may include a
morphing types analyzer (MTA) and a morphed pictures generator
(MPG) 610 as well as a morphed prediction reference (MPR) buffer
620. Morphing types analyzer (MTA) and a morphed pictures generator
(MPG) 610 may be configured to analyze and/or generate one or more
types of modified prediction reference pictures.
[0216] For example, morphing types analyzer (MTA) and a morphed
pictures generator (MPG) 610 may include Gain Estimator and
Compensated Prediction Generator 805, Blur Estimator and
Compensated Prediction Generator 810, Dominant Motion Estimator and
Compensated Prediction Generator 815, Registration Estimator and
Compensated Prediction Generator 820, the like and/or combinations
thereof. Gain Estimator and Compensated Prediction Generator 805
may be configured to analyze and/or generate morphed prediction
reference pictures that are adapted to address changes in gain.
Blur Estimator and Compensated Prediction Generator 810 may be
configured to analyze and/or generate morphed prediction reference
pictures that are adapted to address changes in blur. Dominant
Motion Estimator and Compensated Prediction Generator 815 may be
configured to analyze and/or generate morphed prediction reference
pictures that are adapted to address changes in dominant motion.
Specifically, the dominant motion estimator and compensated
prediction generator 815 is used to compute global motion
parameters (dp) and apply them to a picture from the DPR buffers
119 to generate a GMC Morphed Reference Picture that is stored in
one of the MPR Picture Buffers (Local/Picture Buffers for Dominant
Motion Compensated Prediction). The output of that buffer is used
for block motion estimation and compensation. Registration Estimator
and Compensated Prediction Generator 820 may be configured to
analyze and/or generate morphed prediction reference pictures that
are adapted to address changes in registration.
[0217] Morphing types analyzer (MTA) and a morphed pictures
generator (MPG) 610 may store such generated morphed reference
frames in morphed prediction reference (MPR) buffer 620. For
example, morphed prediction reference (MPR) buffer 620 may include
Gain Compensated (GC) Picture/s Buffer 825, Blur Compensated (BC)
Picture/s Buffer 830, Dominant Motion Compensated (DC) Picture/s
Buffer 835, Registration Compensated (RC) Picture/s Buffer 840, the
like and/or combinations thereof. Gain Compensated (GC) Picture/s
Buffer 825 may be configured to store morphed reference frames that
are adapted to address changes in gain. Blur Compensated (BC)
Picture/s Buffer 830 may be configured to store morphed reference
frames that are adapted to address changes in blur. Dominant Motion
Compensated (DC) Picture/s Buffer 835 may be configured to store
morphed reference frames that are adapted to address changes in
dominant motion. Registration Compensated (RC) Picture/s Buffer 840
may be configured to store morphed reference frames that are
adapted to address changes in registration.
[0218] Synthesizing analyzer and generation module 121 may include
a synthesis types analyzer (STA) and synthesized pictures generator
630 as well as a synthesized prediction reference (SPR) buffer 640.
Synthesis types analyzer (STA) and synthesized pictures generator
630 may be configured to analyze and/or generate one or more types
of synthesized prediction reference pictures. For example,
synthesis types analyzer (STA) and synthesized pictures generator
630 may include Super Resolution Filter Selector & Prediction
Generator 845, Projection Trajectory Analyzer & Prediction
Generator 850, the like and/or combinations thereof. Super
Resolution Filter Selector & Prediction Generator 845 may be
configured to analyze and/or generate a super resolution (SR) type
of synthesized prediction reference pictures. Projection Trajectory
Analyzer & Prediction Generator 850 may be configured to
analyze and/or generate a projected interpolation (PI) type of
synthesized prediction reference pictures.
[0219] Synthesis types analyzer (STA) and synthesized pictures
generator 630 may generate super resolution (SR) pictures and
projected interpolation (PI) pictures or the like for efficient
motion compensated prediction in these frames. Such generated
synthesized reference frames may be stored in synthesized
prediction reference (SPR) buffer 640 and may be used by motion
estimator module 122 for computing motion vectors for efficient
motion (and characteristics) compensated prediction of a current
frame.
[0220] For example, synthesized prediction reference (SPR) buffer
640 may include Super Resolution (SR) Picture Buffer 855, Projected
Interpolation (PI) Picture Buffer 860, the like and/or combinations
thereof. Super Resolution (SR) Picture Buffer 855 may be configured
to store synthesized reference frames that are generated for super
resolution (SR) pictures. Projected Interpolation (PI) Picture
Buffer 860 may be configured to store synthesized reference frames
that are generated for projected interpolation (PI) pictures.
[0221] Motion estimator module 122 may generate motion vector data
based on morphed reference frame(s) and/or super resolution (SR)
pictures and projected interpolation (PI) pictures along with the
current frame. In some examples, motion estimator module 122 may be
considered an inter-prediction module. For example, the motion
vector data may be used for inter-prediction. If inter-prediction
is applied, characteristics and motion filtering predictor module
123 may apply motion compensation as part of the local decode loop
as discussed.
[0222] The prediction mode analyzer 125 (or Pred Modes & Ref
Types Analyzer & Selector), as explained above, chooses on a
local (block, tile, or partition) basis the best prediction from
among various types of inter modes and intra mode. Here the term
inter is used in its general sense and includes `inter` mode,
`multi` mode, `auto` mode, and `skip` mode. The chosen mode (and
sub-mode if applicable), morphing or synthesis parameters (dp, gp,
rp, sp, pp), reference info, and motion (mv, Δmv) and other
data are entropy coded as explained above and sent as part of an
encoded bitstream to the decoder.
[0223] FIG. 9 is an illustrative diagram of another example decoder
prediction subsystem 701 for performing characteristics and motion
compensated prediction, arranged in accordance with at least some
implementations of the present disclosure. As illustrated, decoder
prediction subsystem 701 may include decoded picture buffer 210,
morphing analyzer and generation module 211, synthesizing analyzer
and generation module 212, and/or characteristics and motion
compensated precision adaptive filtering predictor module 213.
[0224] As shown, the output of quality restoration filtering module
may be transmitted to decoded picture buffer 210. In some examples,
the output of quality restoration filtering module may be a final
reconstructed frame that may be used for prediction for coding
other frames (e.g., the final reconstructed frame may be a
reference frame or the like). As discussed, compensation due to
prediction operations may include inter- and/or intra-prediction
compensation. As shown, inter-prediction compensation may be
performed by one or more modules including morphing analyzer and
generation module 211, synthesizing analyzer and generation module
212, and/or characteristics and motion compensated precision
adaptive filtering predictor module 213.
[0225] Morphing generation module 211 may include a morphed
pictures generator (MPG) 710 as well as a morphed prediction
reference (MPR) buffer 720. Morphed pictures generator (MPG) 710
may use de-quantized morphing parameters (e.g., determined from
input bitstream) to generate morphed reference frames. For example,
morphed pictures generator (MPG) 710 may include Gain Compensated
Prediction Generator 905, Blur Compensated Prediction Generator
910, Dominant Motion Compensated Prediction Generator 915,
Registration Compensated Prediction Generator 920, the like and/or
combinations thereof. Gain Compensated Prediction Generator 905 may
be configured to generate morphed prediction reference pictures
that are adapted to address changes in gain as described in greater
detail below. Blur Compensated Prediction Generator 910 may be
configured to generate morphed prediction reference pictures that
are adapted to address changes in blur. Dominant Motion Compensated
Prediction Generator 915 may be configured to generate morphed
prediction reference pictures that are adapted to address changes
in dominant motion. Registration Compensated Prediction Generator
920 may be configured to generate morphed prediction reference
pictures that are adapted to address changes in registration.
[0226] Morphed pictures generator (MPG) 710 may store such
generated morphed reference frames in morphed prediction reference
(MPR) buffer 720. For example, morphed prediction reference (MPR)
buffer 720 may include Gain Compensated (GC) Picture/s Buffer 925,
Blur Compensated (BC) Picture/s Buffer 930, Dominant Motion
Compensated (DC) Picture/s Buffer 935, Registration Compensated
(RC) Picture/s Buffer 940, the like and/or combinations thereof.
Gain Compensated (GC) Picture/s Buffer 925 may be configured to
store morphed reference frames that are adapted to address changes
in gain. Blur Compensated (BC) Picture/s Buffer 930 may be
configured to store morphed reference frames that are adapted to
address changes in blur. Dominant Motion Compensated (DC) Picture/s
Buffer 935 may be configured to store morphed reference frames that
are adapted to address changes in dominant motion. Registration
Compensated (RC) Picture/s Buffer 940 may be configured to store
morphed reference frames that are adapted to address changes in
registration.
[0227] Synthesizing generation module 212 may include a synthesized
pictures generator 730 as well as a synthesized prediction
reference (SPR) buffer 740. Synthesized pictures generator 730 may
be configured to generate one or more types of synthesized
prediction reference pictures such as super resolution (SR)
pictures and projected interpolation (PI) pictures or the like
based on parameters determined from input bitstream 201. Such
generated synthesized reference frames may be stored in synthesized
prediction reference (SPR) buffer 740 and may be used by motion
compensated filtering predictor module 213. For example,
synthesized pictures generator 730 may include Super Resolution
Picture Generator 945, Projection Trajectory Picture Generator 950,
the like and/or combinations thereof. Super Resolution Picture
Generator 945 may be configured to generate a super resolution (SR)
type of synthesized prediction reference pictures. Projection
Trajectory Picture Generator 950 may be configured to generate a
projected interpolation (PI) type of synthesized prediction
reference pictures.
[0228] Synthesized pictures generator 730 may generate super
resolution (SR) pictures and projected interpolation (PI) pictures
or the like for efficient motion compensated prediction in these
frames. Such generated synthesized reference frames may be stored
in synthesized prediction reference (SPR) buffer 740 and may be
used by characteristics and motion compensated filtering predictor
module 213 for efficient motion (and characteristics) compensated
prediction of a current frame.
[0229] For example, synthesized prediction reference (SPR) buffer
740 may include Super Resolution (SR) Picture Buffer 955, Projected
Interpolation (PI) Picture Buffer 960, the like and/or combinations
thereof. Super Resolution (SR) Picture Buffer 955 may be configured
to store synthesized reference frames that are generated for super
resolution (SR) pictures. Projected Interpolation (PI) Picture
Buffer 960 may be configured to store synthesized reference frames
that are generated for projected interpolation (PI) pictures.
[0230] If inter-prediction is applied, characteristics and motion
compensated filtering predictor module 213 may apply motion
compensation based on morphed reference frame(s) and/or super
resolution (SR) pictures and projected interpolation (PI) pictures
along with the current frame.
[0231] Dominant Motion Compensation
[0232] Referring to FIG. 10, as mentioned above a reference frame
may be modified for dominant motion compensation to provide more
efficient and accurate global motion compensation. NGV video coding
addresses limitations of the current state of the art by novel
approaches to content partitioning, content adaptive prediction,
and content adaptive transform coding. Among the various approaches
for content adaptive prediction it includes a more sophisticated
approach to global motion compensation as compared to the MPEG-4,
part2 standard based technique discussed earlier.
[0233] One of the limitations of global motion compensation (GMC)
included in the MPEG-4 standard is that the computed GMC parameters
may not provide good prediction due to various reasons including
large distance between prediction frames, uncovered background,
mixing of global and local motion of objects, and simplistic
interpolation. With no way to correct the computed GMC parameters,
the only alternative that existed was to enable or disable the use
of GMC on a local basis. More specifically, it was determined
whether or not to use GMC on a block-by-block basis. This is an
overhead-expensive process.
[0234] In comparison, NGV video coding introduces the principle of
correcting GMC prediction with a correction vector. By
one example, an original or current video picture 1002 to be coded
(on the right) has a foreground object 1004 (a large star shaped
object) and a background 1006. A dominant motion compensated (DMC)
picture 1000 (on the left and also referred to as the decoded
reference frame) is first created by forming a GMC morphed picture
1008 rounded or fit within a rectangle 1010 as explained below. A
delta correction motion vector (.DELTA.mvx, .DELTA.mvy) 1016 then
may `fine tune` an adjusted (or morphed or warped) position of the
foreground object 1012 to a final position 1014. The motion vectors
shown herein point from the current frame 1002 to the corresponding
position on the reference frame to show where the region, portion,
or block comes from as per usual coding diagrams.
[0235] While a single delta correction motion vector 1016 of the
foreground star shaped object 1014 is shown, in reality there may
be at least two delta correction motion vectors at work since one
motion vector could be used for the background 1018 (including a
zero delta motion vector), and another motion vector used for the
foreground (such as the shown delta motion vector). In other
alternatives a single motion vector may be used on a block (such as
a macroblock or larger), on a tile (such as a 64x64 rectangle
or larger), or other partition or portion of a frame that may or
may not be formed by grouping blocks, tiles, or other units
together.
[0236] Referring to FIG. 11, by one approach, an example process
1100 is a computer-implemented method of video coding that
specifically performs dominant motion compensation. The process
1100 is arranged in accordance with at least some implementations
of the present disclosure. Process 1100 may include one or more
operations, functions or actions as illustrated by one or more
operations. Process 1100 may form at least part of a next
generation video coding process. By way of non-limiting example,
process 1100 may form at least part of a next generation video
encoding process as undertaken by coder system 100 and 200 of FIGS.
1-2 or dominant motion compensation coder sub-systems 1800 and 1900
of FIGS. 18-19, and/or any other coder system or subsystems
described herein.
[0237] Process 1100 may begin with "obtaining frames of pixel data
and having a current frame and a decoded reference frame to use as
a motion compensation reference frame for the current frame"
1102.
[0238] Thereafter, the process 1100 may comprise "forming a warped
global compensated reference frame by displacing at least one
portion of the decoded reference frame by using global motion
trajectories". This is explained in detail below. The at least one
portion may refer to single portion of the frame, many portions, or
the entire frame. The portion may be a block, a tile such as a
coding tree block, and/or a region or partition of the frame. The
region may or may not be associated with an object in the frame (or
in other words, an object shown on the image the frame provides),
and may or may not have a boundary that is shaped like the
object.
[0239] The process 1100 may also comprise "determining a motion
vector indicating the motion of the at least one portion and motion
from a position based on the warped global compensated reference
frame to a position at the current frame". This may be performed by
motion estimation calculations.
[0240] The process 1100 may also include "forming a prediction
portion based, at least in part, on the motion vectors and
corresponding to a portion on the current frame". Thus, in this
case, motion vectors may be applied to adjust the position of the
block, tile, region, or object, before the pixel values are used in
that portion to form a prediction that may be compared to the
corresponding area of an original frame to determine if there is
any residual that warrants coding.
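By way of illustration, a minimal C++ sketch of these operations of process 1100 follows, assuming a simple floating-point affine model with edge-clamped sampling; the Frame and Affine types and the WarpReference() and PredictBlock() helpers are hypothetical, not the codec's actual interfaces (the fixed-point forms appear in equations (10) through (27) below).

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Frame {
  int w, h;
  std::vector<uint8_t> pix;
  uint8_t at(int x, int y) const {           // edge-clamped pixel read
    x = std::min(std::max(x, 0), w - 1);
    y = std::min(std::max(y, 0), h - 1);
    return pix[size_t(y) * w + x];
  }
};

struct Affine { float a, b, c, d, e, f; };   // from the three trajectories

// Form the warped global compensated reference frame: each pixel is
// fetched through the affine map of eqs (10)-(11).
Frame WarpReference(const Frame& ref, const Affine& m) {
  Frame out{ref.w, ref.h, std::vector<uint8_t>(size_t(ref.w) * ref.h)};
  for (int i = 0; i < ref.h; ++i)
    for (int j = 0; j < ref.w; ++j) {
      float x = m.a * j + m.b * i + m.c;
      float y = m.d * j + m.e * i + m.f;
      out.pix[size_t(i) * ref.w + j] = ref.at(int(x + 0.5f), int(y + 0.5f));
    }
  return out;
}

// Determine-and-apply step: the block matched at delta MV (dmvx, dmvy)
// in the warped reference is copied out as the prediction portion.
void PredictBlock(const Frame& warpedRef, int bx, int by, int bs,
                  int dmvx, int dmvy, Frame& pred) {
  for (int i = 0; i < bs; ++i)
    for (int j = 0; j < bs; ++j)
      pred.pix[size_t(by + i) * pred.w + (bx + j)] =
          warpedRef.at(bx + j + dmvx, by + i + dmvy);
}
```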
[0241] Referring to FIG. 12, by yet another alternative, dominant
motion compensation includes performing local global motion
compensation on portions that are less than an entire frame before
using the pixel values as predictions and, in one approach, without
determining motion vectors at least for that portion. Specifically,
an example process 1200 is arranged in accordance with at least
some implementations of the present disclosure. Process 1200 may
include one or more operations, functions or actions as illustrated
by one or more operations. Process 1200 may form at least part of a
next generation video coding process. By way of non-limiting
example, process 1200 may form at least part of a next generation
video encoding process as undertaken by coder system 100 and 200 of
FIGS. 1-2, or dominant motion compensation coder sub-systems 2400
and 2500 of FIGS. 24-25, and/or any other coder system or
subsystems described herein.
[0242] The process 1200 may be a computer-implemented method for
video coding, and comprises "obtaining frames of pixel data and
having a current frame and a decoded reference frame to use as a
motion compensation reference frame for the current frame"
1202.
[0243] The process 1200 also may include "dividing the reference
frame into a plurality of portions that are less than the area of
the entire frame" 1204. Thus, the frame may be divided into
portions that are a uniform unit such as a block or a tile such as
a coding tree block, and so forth. Otherwise, the portion may be
object based, such as a foreground, a background, a moving object
in the frame, or any other object in the frame.
[0244] The process 1200 also may include "performing dominant
motion compensation comprising applying local global motion
compensation on at least one of the portions by displacing the at
least one portion of the decoded reference frame by using global
motion trajectories at a boundary of the portion" 1206.
Specifically, global motion trajectories may be placed at the
corners of each portion, or of selected portions, on the frame.
[0245] The process 1200 also may include "form a prediction portion
that corresponds to a portion on the current frame by using
the pixel values of the displaced portion" 1208. Thus, in this
case, the pixel values may be used directly from the warped GMC
picture without the use of motion vectors. The local GMC then
provides greater accuracy than applying GMC once to the entire
frame.
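A corresponding minimal sketch of the local GMC of process 1200, reusing the hypothetical Frame and Affine types from the sketch above, follows; each portion is warped by its own affine model and its pixels are used directly as the prediction, with no motion vector sent for the portion.

```cpp
// Local GMC (process 1200): warp only the given portion with its own
// affine model (trajectories at the portion's corners) and use the
// pixels directly as the prediction, with no motion vector.
void LocalGmcPredict(const Frame& ref, const Affine& m,
                     int px, int py, int pw, int ph, Frame& pred) {
  for (int i = 0; i < ph; ++i)
    for (int j = 0; j < pw; ++j) {
      // Map each pixel of the portion through the portion-local model.
      float x = m.a * (px + j) + m.b * (py + i) + m.c;
      float y = m.d * (px + j) + m.e * (py + i) + m.f;
      pred.pix[size_t(py + i) * pred.w + (px + j)] =
          ref.at(int(x + 0.5f), int(y + 0.5f));
    }
}
```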
[0246] A summary of some of the possible options and features
described in detail below is listed in the following table, for
both motion vector options and local global motion compensation
options:
FIG.  | DMC Technique                                                    | Pic GMC Par | Coded delta MV | Tile/Sub-CTB Merge Map | Tile/CTB/Sub-CTB Affine LMC Par | Local Affine Region Boundary
17    | Block based delta MVs                                            | 1 | 1 | 0 | 0 | 0
20-22 | Approx Region-layer based delta MVs                              | 1 | 1 | 1 | 0 | 0
23    | Region-layer based delta MVs                                     | 1 | 1 | 0 | 0 | 1
27    | Tile based Affine Motion Pars                                    | 0 | 0 | 0 | 1 | 0
28-30 | Approx Region-layer based Affine Motion Pars                     | 0 | 0 | 1 | 1 | 0
31    | Region-layer based Affine Motion Pars                            | 0 | 0 | 0 | 1 | 1
32-33 | Block Based delta MVs and Tile based Affine Motion Pars Combined | 1 | 1 | 0 | 1 | 0
[0247] Referring to FIG. 13, now in more detail, an example process
1300 is arranged in accordance with at least some implementations
of the present disclosure. Process 1300 may include one or more
operations, functions or actions as illustrated by one or more
operations 1302 to 1328 numbered evenly. Process 1300 may form at
least part of a next generation video coding process. By way of
non-limiting example, process 1300 may form at least part of a next
generation video encoding process as undertaken by coder system 100
or 200 of FIGS. 1-2 or dominant motion compensation coder
sub-systems 1800 or 1900 of FIGS. 18-19, and/or any other coder
system or subsystems
described herein.
[0248] Process 1300 may include first obtaining frames of
pixel data and having a current frame and a decoded reference frame
1302. As described with encoder 100, a video stream may be provided
to an encoder that has a decoding loop 135 in order to find
residuals and provide the quantized residuals to a decoder. Thus,
frames may be decoded, and used as decoded reference frames to
predict yet other frames. A morphing unit such as unit 120 may be
used to determine which frames are to be modified or morphed by
dominant motion compensation. These frames may already be divided
into units such as macroblocks, prediction blocks, and so
forth.
[0249] Referring to FIGS. 14-16, process 1300 may comprise creating
global motion compensation (GMC) warped frames 1304. One form of
the principle of generation of a GMC (morphed) picture given a
decoded reference picture 1400 and global motion trajectories is as
follows. GMC using an affine model involves use of six parameters
that are encoded as three motion trajectories 1404, 1406, and 1408
with one corresponding to each of the three corners of the
reference picture 1400, and the fourth corner treated as
unconstrained. Motion trajectories may be created 1306 by processes
that are well understood. The trajectories may be applied 1308 also
by processes as understood or by using the equations provided
herein as explained below. The resulting GMC warped frame or
picture 1402 appears warped compared to the reference picture 1400.
In other words, a Ref Picture `rectangle` results in a
quadrilateral GMC morphed or warped picture 1402 as shown when
applying the GMC parameter equations. Specifically, here the
quadrilateral itself is not referred to as a reference frame
yet.
[0250] A GMC morphed reference picture or frame 1500 may be formed
1310 from the GMC morphed picture 1402. This is performed to
provide a frame size for ease of computations and comparisons to
the original frame. This may include creating a larger padded
rectangle 1500 encompassing the GMC morphed picture (a trapezoid)
1402 using the top left coordinate as the reference point (or
starting point or connection point). The area outside the
quadrilateral 1402 but inside the rectangle 1500 may be filled by
padding 1506 which consists of simply copying pixels from the
right, top and bottom edges of the quadrilateral 1402, except for
corner pixels (areas of overlap) that may be filled by extending
both horizontally and vertically and averaging pixels. Areas where
the quadrilateral extends out of the rectangle 1500 may be cut or
snipped. By one approach, this GMC morphed reference frame is used
for motion compensation going forward. It will be understood that
the rectangle formed based on the warped picture itself also may be
referred to herein as the warped reference frame since it includes
warped pixel locations of an image.
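By way of illustration, the padding rule just described may be sketched as follows, assuming a coverage mask that marks which pixels of rectangle 1500 the quadrilateral 1402 actually covers; the function and its signature are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Fill pixels of the padded rectangle that the warped quadrilateral does
// not cover: copy the nearest covered pixel in the row and/or column, and
// average the two where both exist (the corner overlap areas).
void PadWarpedRect(std::vector<uint8_t>& img,
                   const std::vector<bool>& covered, int w, int h) {
  for (int y = 0; y < h; ++y)
    for (int x = 0; x < w; ++x) {
      if (covered[size_t(y) * w + x]) continue;
      int hx = -1, vy = -1;
      for (int d = 1; d < w && hx < 0; ++d) {     // nearest in this row
        if (x - d >= 0 && covered[size_t(y) * w + x - d]) hx = x - d;
        else if (x + d < w && covered[size_t(y) * w + x + d]) hx = x + d;
      }
      for (int d = 1; d < h && vy < 0; ++d) {     // nearest in this column
        if (y - d >= 0 && covered[size_t(y - d) * w + x]) vy = y - d;
        else if (y + d < h && covered[size_t(y + d) * w + x]) vy = y + d;
      }
      if (hx >= 0 && vy >= 0)                     // corner overlap: average
        img[size_t(y) * w + x] = uint8_t(
            (img[size_t(y) * w + hx] + img[size_t(vy) * w + x] + 1) / 2);
      else if (hx >= 0) img[size_t(y) * w + x] = img[size_t(y) * w + hx];
      else if (vy >= 0) img[size_t(y) * w + x] = img[size_t(vy) * w + x];
    }
}
```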
[0251] By another approach, a virtual GMC morphed picture 1600
(shown in dashed line) may optionally be formed 1312 to proceed
with the motion compensation. This may be provided when the system
has sufficient compute power to handle division efficiently since
such further warping results in significant computation load.
Otherwise, the motion compensation may continue with the warped
reference rectangle 1500 as explained above.
[0252] To provide the virtual GMC morphed picture 1600, the
reference picture may be extended to generate virtual reference
picture 1600 such that the width and height becomes a power of two.
For example if the reference picture 1500 has a width of 720 and
height of 480, then the virtual ref picture would have a width of
1024 and height of 512. As before, motion trajectories 1602, 1604,
and 1606 may be computed for each of the three vertices (with the
fourth vertex being unconstrained), and applied to the vertices of
the virtual reference picture 1600 (rather than being applied to
the reference picture 1400 as done before). The resulting warped
quadrilateral (due to application of motion trajectories) is also
shown and referred to as the virtual GMC morphed picture 1600. The
reason for using virtual GMC morphed picture 1600 instead of GMC
morphed picture 1402 for generation of a morphed reference picture
1600 has to do with the fact that the motion compensation process
often involves use of much higher precision (often 1/8th pixel
precision) than integer pixel, and thus may require interpolation
which requires division for scaling. By working with pictures that
are powers of 2, scaling related divisions simply become shifts and
are much computationally simpler for a decoder.
[0253] By a first optional approach, mathematically the affine
transform process is described by the following equations that use
affine parameters a, b, c, d, e, f to map a set of points (x, y) in
a previous picture to a modified set of points (x', y').
x_i' = a*x_i + b*y_i + c   (10)
y_i' = d*x_i + e*y_i + f   (11)
[0254] It will be understood for all equations herein that any of
(.), (*), or (x) simply refers to multiplication. Equations (10)
and (11) effectively modify or morph the reference frame so that it
can then be used for more efficient motion compensation for a
current frame being analyzed. This model is transmitted as three
motion trajectories, one for the top-left corner of the picture,
one for the top-right corner, and one for the bottom-left corner.
Affine parameters are calculated (in fixed point arithmetic) for a
virtual picture whose width and height are the nearest power-of-2
numbers greater than those of the coded picture. This removes
division operations at the decoder.
Formally, assume that for the three vertices (x0, y0), (x1, y1),
(x2, y2) the corresponding motion trajectories mt0, mt1, and mt2
are given and can be represented as (dx0, dy0), (dx1, dy1), and
(dx2, dy2), say in 1/8-pel units, where:
x0 = 0, y0 = 0
x1 = W*8, y1 = 0
x2 = 0, y2 = H*8
and where W is the width of a picture and H is the height of the
picture. Then, rounding W and H up to powers of 2, derive W' and H'
as follows:
W' = 2^r : W' >= W, 2^(r-1) < W   (12)
H' = 2^s : H' >= H, 2^(s-1) < H   (13)
The affine parameters A, B, C, D, E, F can then be calculated as
follows:
C = dx0   (14)
F = dy0   (15)
A = W' * ((x1 + dx1) - (x0 + dx0)) / W   (16)
B = W' * ((x2 + dx2) - (x0 + dx0)) / W   (17)
D = H' * ((y1 + dy1) - (y0 + dy0)) / H   (18)
E = H' * ((y2 + dy2) - (y0 + dy0)) / H   (19)
Other options to calculate the morphed or warped reference frame are
provided below.
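A minimal fixed-point sketch of equations (12) through (19) follows; the AffineParams type and ComputeAffine() helper are illustrative assumptions, and the integer divisions mirror the text's fixed-point intent.

```cpp
#include <cstdint>

struct AffineParams { int64_t A, B, C, D, E, F; int r, s; };

// Fixed-point affine parameters per eqs (12)-(19), with vertices
// (0,0), (W*8,0), (0,H*8) and trajectories (dx0,dy0), (dx1,dy1),
// (dx2,dy2) given in 1/8-pel units.
AffineParams ComputeAffine(int W, int H,
                           int dx0, int dy0, int dx1, int dy1,
                           int dx2, int dy2) {
  int r = 0, s = 0, Wp = 1, Hp = 1;
  while (Wp < W) { Wp <<= 1; ++r; }            // W' = 2^r >= W   (eq 12)
  while (Hp < H) { Hp <<= 1; ++s; }            // H' = 2^s >= H   (eq 13)
  AffineParams p{};
  p.r = r; p.s = s;
  p.C = dx0;                                          // eq (14)
  p.F = dy0;                                          // eq (15)
  p.A = int64_t(Wp) * ((W * 8 + dx1) - dx0) / W;      // eq (16), x1 = W*8
  p.B = int64_t(Wp) * (dx2 - dx0) / W;                // eq (17), x2 = 0
  p.D = int64_t(Hp) * (dy1 - dy0) / H;                // eq (18), y1 = 0
  p.E = int64_t(Hp) * ((H * 8 + dy2) - dy0) / H;      // eq (19), y2 = H*8
  return p;
}
```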
[0255] Process 1300 also may include defining frame portions for
motion vectors 1314. Here, three options are provided and described
below. In one option, motion vectors are provided on a
block-by-block basis, and may be defined for prediction blocks 1316
such as macroblocks or other prediction or coding units. By another
option, the frame may be divided into tiles, which may be blocks of
64x64 pixels or more. For this option, the tiles may be
grouped into regions associated with an object such as being part
of the background or foreground. A motion vector is then determined
for each region. By a third option, the regions may be defined
directly without any initial division into large tiles or blocks,
although the boundary of such regions may be defined to fit small
blocks (such as 4x4 pixels).
[0256] Referring to FIG. 17, the first option is block-based
dominant motion compensation, where each block uses a delta motion
vector (MV) correction with respect to an affine GMC reference
picture. This includes first using 1316 or obtaining blocks, such
as prediction macroblocks that may be 16x16 pixels or other size
block units of a current frame that are larger as shown, and that
are displaced on a warped reference frame.
[0257] By one example, a current frame 1702 with a star shaped
object 1704 to be coded may be divided into blocks A2, B2, C2 for
coding, and a GMC Morphed Reference frame 1700 may be derived using
a past decoded reference frame and GMC motion trajectories that
formed the warped quadrilateral 1708. The blocks of pixels A2, B2,
C2 from the current frame 1702 are matched during motion estimation
to find closest matches at A1, B1, and C1 respectively. The first
block match is offset by delta motion vector Δmv1 (1714), the
second block match is offset by delta motion vector Δmv2
(1716), and the third block match is offset by delta motion vector
Δmv3 (1718) in the warped and padded GMC Reference picture
1700. While only three blocks are shown, it is assumed that the
entire picture may be divided into blocks on a block grid, and for
each block, a delta motion vector can be computed that provides the
best match in the GMC reference picture 1700. Also, while blocks
are shown to be of medium size, generally the blocks can be large,
small, or each block can be one of a few permitted sizes, and
whatever size may be needed to provide the right tradeoff between
the reduction in Motion Compensated Prediction error versus the
cost of extra delta motion vector information that needs to be
coded and transmitted.
[0258] Once the blocks are established, motion estimation may be
performed 1326 to determine delta motion vectors based on the
warped position of the portion of the frame, which is a block in
this case. In a practical coding scenario (explained further via
FIGS. 18-19), it is expected that different blocks may use
different coding modes such as DMC, Gain, Register, SR, PI,
reference no parameters, or intra as described above, and that
maximize the reduction in prediction error with regard to coding
cost. Thus, in reality, only a small portion of blocks in a frame
may actually be coded with DMC and thereby require delta motion
vectors.
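By way of illustration, a minimal full-pel SAD search for the delta motion vector of operation 1326 may be sketched as follows, reusing the hypothetical Frame type from the earlier sketch; sub-pel refinement and rate-cost weighting, discussed in the text, are omitted.

```cpp
#include <climits>
#include <cstdlib>

// Full-pel SAD scan in a +/- range window of the GMC reference around
// the block's position; returns the best delta MV through dmvx/dmvy.
void SearchDeltaMv(const Frame& cur, const Frame& gmcRef, int bx, int by,
                   int bs, int range, int& dmvx, int& dmvy) {
  int best = INT_MAX;
  dmvx = dmvy = 0;
  for (int dy = -range; dy <= range; ++dy)
    for (int dx = -range; dx <= range; ++dx) {
      int sad = 0;
      for (int i = 0; i < bs; ++i)
        for (int j = 0; j < bs; ++j)
          sad += std::abs(int(cur.at(bx + j, by + i)) -
                          int(gmcRef.at(bx + j + dx, by + i + dy)));
      if (sad < best) { best = sad; dmvx = dx; dmvy = dy; }
    }
}
```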
[0259] To get higher prediction efficiency, the delta motion
vectors may be kept at 1/4 or 1/8th pel precision. To reduce the
cost of sending delta motion vectors, the motion vectors identified
herein as (Δmv) can be coded efficiently with prediction,
following a method similar to the coding of normal motion vectors.
The process 1300 used with blocks as described herein may not be
referred to as GMC prediction since it uses delta motion vectors
pointing to source blocks in a GMC reference picture 1700. Rather,
this is considered a type of dominant motion compensation (DMC),
and may be referred to as motion vector DMC. Other forms of DMC
exist as described below. This difference (between GMC and DMC),
however, is not minor. It forms an adjustment to the pixel
locations that may significantly decrease prediction error over
known GMC, providing more efficient coding.
[0260] Also, the method described herein is simpler than that
discussed for warped reference frame 1000 (FIG. 10) as warped
reference frame 1700 does not require separate knowledge of a
foreground or a background object, while the process illustrated
for a warped reference frame 1700 still extends the principle of
GMC to DMC.
[0261] Once the delta motion vectors are established, the process
1300 may continue with forming a prediction 1328 (or specifically a
prediction portion, in this case a prediction block) using the
portion identified by the motion vector. A simple technique such as
bilinear interpolation may be used for generating the necessary DMC
prediction block. More sophisticated methods can also be used as
follows:
[0262] The following is one method for generating a morphed
reference (MRef)
[0263] 1. (Ref Method) Morphed Reference Using Bilinear
Interpolation:
[0264] A, B, C, D, E, & F are affine parameters calculated from
the three motion trajectories transmitted.
x = (A*j + B*i + (C << r)) >> r   (20)
y = (D*j + E*i + (F << s)) >> s   (21)
where (j, i) is the current pixel location (on the current frame
being analyzed), << and >> are left and right bitwise shifts, and
(x, y) is the reference pixel coordinate in 1/8th-pel accuracy on
the morphed or modified reference frame.
p_y = y & 0x7   (22)
p_x = x & 0x7   (23)
y_0 = y >> 3   (24)
x_0 = x >> 3   (25)
where (x_0, y_0) is the integer pel location in the Ref Image
(reference frame), p_x, p_y is the 1/8th-pel phase, and & 0x7
refers to a bitwise AND with the value 7. These represent four
corner points used to find a weighted average value for a pixel in
the middle of them. Then, the morphed or modified reference is
constructed as follows:
MRef[i][j] = ((8-p_x)*(8-p_y)*Ref[y_0][x_0]
           + p_x*(8-p_y)*Ref[y_0][x_0+1]
           + p_y*(8-p_x)*Ref[y_0+1][x_0]
           + p_y*p_x*Ref[y_0+1][x_0+1] + 31) >> 6   (26), (27)
where MRef is the morphed reference frame.
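A minimal sketch of Ref Method 1, combining the affine mapping of equations (20)-(25) with the bilinear interpolation of equation (26), and reusing the hypothetical Frame and AffineParams types from the earlier sketches, follows.

```cpp
// Build the morphed reference MRef: affine map of eqs (20)-(25), then
// the 1/8-pel bilinear interpolation of eq (26). Edge handling relies
// on the clamped Frame::at() of the earlier sketch.
void BuildMorphedRef(const Frame& ref, const AffineParams& p, Frame& mref) {
  for (int i = 0; i < mref.h; ++i)
    for (int j = 0; j < mref.w; ++j) {
      int64_t x = (p.A * j + p.B * i + (p.C << p.r)) >> p.r;  // eq (20)
      int64_t y = (p.D * j + p.E * i + (p.F << p.s)) >> p.s;  // eq (21)
      int py = int(y & 0x7), px = int(x & 0x7);               // eqs (22)-(23)
      int y0 = int(y >> 3), x0 = int(x >> 3);                 // eqs (24)-(25)
      int v = ((8 - px) * (8 - py) * ref.at(x0, y0) +         // eq (26):
               px * (8 - py) * ref.at(x0 + 1, y0) +           // weights sum
               py * (8 - px) * ref.at(x0, y0 + 1) +           // to 64, hence
               py * px * ref.at(x0 + 1, y0 + 1) + 31) >> 6;   // the >> 6
      mref.pix[size_t(i) * mref.w + j] = uint8_t(v);
    }
}
```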
[0265] 2. Motion Compensated Morphed Reference Prediction using
Bilinear Interpolation & MC Filtering:
[0266] By another alternative to determine the morphed reference
and predictions, motion vectors and a variety of block sizes may be
factored into the equations as follows. (iMVx, iMVy) is the
transmitted motion vector in sub-pel units (f_s) for a block at
(j, i) of size (W_b x H_b). A, B, C, D, E, F are affine parameters
calculated from the three motion trajectories transmitted. Using
separable motion compensation (MC) filters with filter coefficients
h[f_s][N_t] of norm T, f_s is the sub-pel factor (e.g., 2 = half
pel, 4 = quarter pel, 8 = eighth pel), N_t is the number of MC
filter taps, and
i' = i + (iMVy / f_s)   (28)
j' = j + (iMVx / f_s)   (29)
p_i = iMVy & (f_s - 1)   (30)
p_j = iMVx & (f_s - 1)   (31)
where (j', i') is the integer motion adjusted current pixel
location in the Morphed Reference Image, and p_j, p_i are the
1/8th-pel phases in the Morphed Reference Image. To create an MRef
Image then:
x = (A*j' + B*i' + (C << r)) >> r   (32)
y = (D*j' + E*i' + (F << s)) >> s   (33)
where (x, y) is the reference pixel coordinate in 1/8th-pel
accuracy for location (j', i'):
p_y = y & 0x7   (34)
p_x = x & 0x7   (35)
y_0 = y >> 3, x_0 = x >> 3   (36)
where (x_0, y_0) is the integer pel location in the Ref Image, and
p_x, p_y is the 1/8th-pel phase.
MRef[i'][j'] = ((8-p_x)*(8-p_y)*Ref[y_0][x_0]
             + p_x*(8-p_y)*Ref[y_0][x_0+1]
             + p_y*(8-p_x)*Ref[y_0+1][x_0]
             + p_y*p_x*Ref[y_0+1][x_0+1] + 31) >> 6   (37)
tPred_h[m][n] = SUM_k( h[p_j][k] * MRef[i'+m][j'+n+k] ) / T   (38)
where:
m = [-N_t/2-1, H_b+N_t/2]   (39)
n = [0, W_b-1]   (40)
k = [-N_t/2-1, +N_t/2]   (41)
and,
Pred_ji[m][n] = SUM_k( h[p_i][k] * tPred_h[m+k][n] ) / T   (42)
where:
m = [0, H_b-1]   (43)
n = [0, W_b-1]   (44)
k = [-N_t/2-1, +N_t/2]   (45)
MRef is the morphed reference frame, tPred_h is the intermediate
horizontal interpolation, and Pred_ji is the final Motion
Compensated Morphed Reference Prediction. With the filter taps
re-indexed from 0, the same interpolation reads:
MRef[i'][j'] = ((8-p_x)(8-p_y)Ref[y_0][x_0] + p_x(8-p_y)Ref[y_0][x_0+1]
             + p_y(8-p_x)Ref[y_0+1][x_0] + p_y p_x Ref[y_0+1][x_0+1]
             + 31) >> 6   (46)
tPred_h[m][n] = (1/T) * SUM_{k=0}^{N_t-1} h[p_j][k] * MRef[i'+m][j'+n+k-N_t/2+1]   (47)
where:
m = [-N_t/2+1, H_b+N_t/2-1]   (48)
n = [0, W_b-1]   (49)
Pred_ji[m][n] = (1/T) * SUM_{k=0}^{N_t-1} h[p_i][k] * tPred_h[m+k-N_t/2+1][n]   (50)
where:
m = [0, H_b-1], and n = [0, W_b-1]   (51)
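The two-stage separable filtering of equations (38) and (42) may be sketched as follows, reusing the hypothetical Frame type; the coefficient arrays hj and hi stand for h[p_j] and h[p_i], and the tap geometry follows the re-indexed form of equations (47) and (50).

```cpp
#include <vector>

// Two-pass separable MC filtering per eqs (47) and (50): a horizontal
// pass into the intermediate tPred_h, then a vertical pass into the
// Hb x Wb prediction block. T is the filter norm, Nt the tap count.
void SeparableMcFilter(const Frame& mref, int j0, int i0, int Wb, int Hb,
                       int Nt, int T, const int* hj, const int* hi,
                       std::vector<int>& pred) {
  const int half = Nt / 2 - 1;                  // tap offset -Nt/2+1
  std::vector<int> tmp(size_t(Hb + Nt) * Wb);
  // Horizontal pass over Hb + Nt - 1 rows (extra rows feed vertical taps).
  for (int m = -half; m <= Hb + Nt / 2 - 1; ++m)
    for (int n = 0; n < Wb; ++n) {
      int acc = 0;
      for (int k = 0; k < Nt; ++k)
        acc += hj[k] * mref.at(j0 + n + k - half, i0 + m);
      tmp[size_t(m + half) * Wb + n] = acc / T;
    }
  // Vertical pass producing the final prediction block.
  pred.assign(size_t(Hb) * Wb, 0);
  for (int m = 0; m < Hb; ++m)
    for (int n = 0; n < Wb; ++n) {
      int acc = 0;
      for (int k = 0; k < Nt; ++k)
        acc += hi[k] * tmp[size_t(m + k) * Wb + n];
      pred[size_t(m) * Wb + n] = acc / T;
    }
}
```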
[0267] 3. Morphed Reference Using Block MC Filtering:
[0268] By yet another alternative, A, B, C, D, E, F are affine
parameters calculated from the three motion trajectories
transmitted. Using separable MC filters with filter coefficients
h[f_s][N_t] of norm T, f_s is the sub-pel factor (e.g., 2 = half
pel, 4 = quarter pel, 8 = eighth pel), and N_t is the number of MC
filter taps:
x = (A*j + B*i + (C << r)) >> r   (52)
y = (D*j + E*i + (F << s)) >> s   (53)
where (j, i) is every (W_s x H_s) sub-block location in the current
image (typically 4x4, 8x4, or 8x8 sub-blocks), and x and y are
reference pixel coordinates in 1/8th-pel accuracy.
p_y = y & 0x7   (54)
p_x = x & 0x7   (55)
y_0 = y >> 3   (56)
x_0 = x >> 3   (57)
where (x_0, y_0) is the integer pel location in the reference frame
(Ref Image), and p_x, p_y is the 1/8th-pel phase.
tPred_h[m][n] = SUM_k( h[p_x][k] * Ref[y_0+m][x_0+n+k] ) / T   (58)
m = [-N_t/2-1, H_s+N_t/2]   (59)
n = [0, W_s-1]   (60)
k = [-N_t/2-1, +N_t/2]   (61)
MRef[i+m][j+n] = SUM_k( h[p_y][k] * tPred_h[m+k][n] ) / T   (62)
m = [0, H_s-1]   (63)
n = [0, W_s-1]   (64)
k = [-N_t/2-1, +N_t/2]   (65)
where MRef is the morphed reference frame and tPred_h is the
intermediate horizontal interpolation. With the filter taps
re-indexed from 0, the same interpolation reads:
tPred_h[m][n] = (1/T) * SUM_{k=0}^{N_t-1} h[p_x][k] * Ref[y_0+m][x_0+n+k-N_t/2+1]   (66)
m = [-N_t/2+1, H_s+N_t/2-1]   (67)
n = [0, W_s-1]   (68)
MRef[i+m][j+n] = (1/T) * SUM_{k=0}^{N_t-1} h[p_y][k] * tPred_h[m+k-N_t/2+1][n]   (69)
m = [0, H_s-1]   (70)
n = [0, W_s-1]   (71)
[0269] 4. Motion Compensated Morphed Reference Prediction Using
Single Loop MC Filtering:
[0270] By yet a further alternative that factors in motion vectors
and variance in block size, (iMVx, iMVy) is the transmitted motion
vector in sub-pel units (f_s) for a block at (j, i) of size
(W_b x H_b). A, B, C, D, E, F are affine parameters calculated from
the three motion trajectories transmitted. Using separable MC
filters with filter coefficients h[f_s][N_t] of norm T, f_s is the
sub-pel factor (e.g., 2 = half pel, 4 = quarter pel, 8 = eighth
pel), and N_t is the number of MC filter taps.
i' = (i + u*H_s)*f_s + iMVy   (72)
j' = (j + v*W_s)*f_s + iMVx   (73)
where (j, i) is the current block pixel location, (u, v) is the
index of every (W_s x H_s) sub-block within the given current block
of (W_b x H_b), and a (W_s x H_s) sub-block is typically 4x4, 8x4,
or 8x8. Below, (j', i') is the motion adjusted current pixel
location in f_s sub-pel accuracy.
x = (A*j' + B*i' + ((C*f_s) << r)) >> (r+3)   (74)
y = (D*j' + E*i' + ((F*f_s) << s)) >> (s+3)   (75)
where x and y are reference pixel coordinates in f_s sub-pel
accuracy.
p_y = y & (f_s - 1)   (76)
p_x = x & (f_s - 1)   (77)
y_0 = y / f_s   (78)
x_0 = x / f_s   (79)
where (x_0, y_0) is the integer pel location in the Ref Image, and
p_x, p_y is the 1/8th-pel phase.
tPred_h[m][n] = SUM_k( h[p_x][k] * Ref[y_0+m][x_0+n+k] ) / T   (80)
m = [-N_t/2-1, H_s+N_t/2]   (81)
n = [0, W_s-1]   (82)
k = [-N_t/2-1, +N_t/2]   (83)
Pred_ji[u*H_s+m][v*W_s+n] = SUM_k( h[p_y][k] * tPred_h[m+k][n] ) / T
m = [0, H_s-1]   (84)
n = [0, W_s-1]   (85)
k = [-N_t/2-1, +N_t/2]   (86)
v = [0, W_b/W_s-1]   (87)
u = [0, H_b/H_s-1]   (88)
where tPred_h is the intermediate horizontal interpolation, and
Pred_ji is the final Motion Compensated Morphed Reference
Prediction for the block of size W_b x H_b at (j, i). With the
filter taps re-indexed from 0, the same interpolation reads:
tPred_h[m][n] = (1/T) * SUM_{k=0}^{N_t-1} h[p_x][k] * Ref[y_0+m][x_0+n+k-N_t/2+1]   (89)
for:
m = [-N_t/2+1, H_s+N_t/2-1]   (90)
n = [0, W_s-1]   (91)
Pred_ji[u*H_s+m][v*W_s+n] = (1/T) * SUM_{k=0}^{N_t-1} h[p_y][k] * tPred_h[m+k-N_t/2+1][n]   (92)
for:
m = [0, H_s-1]   (93)
n = [0, W_s-1]   (94)
u = [0, H_b/H_s-1]   (95)
v = [0, W_b/W_s-1]   (96)
[0271] As described above with FIG. 1, once the prediction for a
block or other portion is established, the best prediction is
chosen among the alternatives that are calculated for a particular
portion or block, if any. The pixel values of the best prediction
are compared to those corresponding pixel values of the original
frame, and the difference, if any, is a residual that is coded and
transmitted to the decoder.
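By way of illustration, the final differencing step may be sketched as follows, reusing the hypothetical Frame type from the earlier sketches; the residual is what proceeds to transform coding.

```cpp
#include <vector>

// Difference the chosen prediction block against the co-located original
// block; a nonzero residual is what gets transform coded and transmitted.
void ComputeResidual(const Frame& orig, const std::vector<int>& pred,
                     int bx, int by, int Wb, int Hb,
                     std::vector<int>& resid) {
  resid.resize(size_t(Wb) * Hb);
  for (int i = 0; i < Hb; ++i)
    for (int j = 0; j < Wb; ++j)
      resid[size_t(i) * Wb + j] =
          int(orig.at(bx + j, by + i)) - pred[size_t(i) * Wb + j];
}
```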
[0272] Referring now to FIG. 18, a portion or sub-system of an NGV
or modified HEVC encoder 1800 uses the block-based type of DMC
prediction 1316 using delta motion vectors as described above.
While the HEVC standard does not support global (or dominant)
motion compensation, or morphed or synthesized references, it does
support plain references and can be modified along the lines of
encoder sub-system 1800.
[0273] As compared to FIG. 1, which may share much of the same or
similar functionality, encoder 1800 provides a simplified
representation that focuses on DMC prediction. As discussed
earlier, decoded frames obtained from deblock filtering and QR
filtering are stored in Decoded Prediction Reference (DPR) Picture
Buffers 1802 (also referred to as multi reference frame store 119
of FIG. 1) for use by morphed reference analysis and generation
logic or synthesized reference analysis and generation logic. For
simplicity, other than DMC, the details of other components
employed for morphing are omitted so that here the morphed
reference generation logic is simply labeled as Other Morph
Analyzer, Generator & Picture/s Buffer 1808. Similarly, details
of synthesized reference analysis and generation are hidden so the
synthesized reference analysis and generation logic is labeled as
Other Synth Analyzer, Generator & Picture/s Buffer 1810.
[0274] In this figure GMC/DMC components are separated and shown
explicitly as compared to other morphing related components and
logic. The main components are Global or Dominant Motion Estimator
& Compensated Prediction Picture/Tile/Partition Generator 1804,
Dominant Motion Compensation Local/Picture Buffer 1806, and an
interpolation subset 1815 of a Motion Compensated Filtering
Predictor 1814, whereas the control logic includes global motion
trajectory information gmt and delta motion vectors Δmvs. It
will be noted that components of other alternative (non-block
based) processes for defining frame portions to be used with motion
vectors are shown in dashed line and are described later below.
[0275] In operation, decoded and filtered picture/s stored in DPR
buffer 1802 are input to the Global or Dominant Motion Estimator
& Compensated Prediction Picture/Tile/Part. Generator 1804
that, in the present block-based alternative, performs global
motion estimation (GME) producing global motion parameters
(represented as gmt trajectory) for an entire frame (where the
trajectories are located at the corners of the frame) and
generating GMC Ref Picture(s) or regions that are stored in
Local/Picture Buffer 1806 for Dominant Motion Compensation. Next,
the block motion estimator and partitions motion assembler 1812
performs block based motion estimation resulting in motion vectors
or delta motion vectors of blocks (or partitions), and these motion
vectors are used by motion compensated predictor 1814 (referred to
here as Bi-Tree Partitions Char and Motion Compensated Adaptive
Precision Filtering Predictor) that generates prediction blocks by
sub-pixel interpolation using interpolation unit 1815. The
alternative choices of various sub-pixel interpolators are
explained above (alternatives 1 to 4). The output prediction blocks
(partitions) from motion compensated predictor 1814 are fed to the
Prediction Modes and Reference Types Analyzer 1816. Also, intra
prediction blocks (or partitions) are input to the Prediction Modes
and reference types analyzer 1816 from an intra directional
prediction analyzer and generator 1818. On a block (or partition)
basis, the Pred Modes & Ref Types Analyzer 1816 determines the
best prediction block from various choices (e.g. DMC prediction is
one of the many choices available) and outputs it to a differencer
(in portion of the circuit not shown here) that generates a
prediction error for coding. Further, the entropy coder 1820, also
referred to as the Entropy Encoder Morphing and Synthesis
Parameters & MVs, encodes GMC/DMC parameters and data, i.e.,
gmt and Δmvs. For simplicity, other needed information such
as modes and reference information that identifies if a block
(partition) uses DMC or some other morphed or synthesized
prediction type (or intra type) and further, the picture to use as
reference is not shown.
[0276] As can be noted, the encoder sub-system 1800 can be readily
extended or modified for other video compression techniques. For
instance, the DMC approach discussed here can also be made to work
with the current H.264 and the upcoming HEVC standards by first
extending them. The HEVC standard does not support GMC or any of
the morphing or synthesized prediction modes of NGV, but it does
support multiple reference based prediction, so before DMC based
improved delta mvs can be added, GMC would need to be added first
to the HEVC standard. Further, the NGV codec uses Tiles and Bi-Tree
partitioning for motion compensation, but HEVC uses the concept of
Coding Tree Blocks (CTB) or Largest Coding Unit (LCU), quadtree
partitioning into Coding Units (CUs), and partitioning into
Prediction Units (PUs) based on a small codebook of 8, which are
functionally similar although they have outwardly different
processing structures. Mode and ref type information is available
in HEVC but would need to be extended to support GMC/DMC
extensions.
[0277] Referring to FIG. 19, a portion or subsystem of an NGV or
modified HEVC decoder 1900 may be complementary to the related
components of the encoder 1800. As compared to decoder 200, with
which it essentially shares the same or similar functionality,
decoder 1900 shows a simpler representation that focuses on DMC
prediction. Decoder 1900 uses dominant motion compensation of
blocks with delta MV correction. The HEVC standard does not support
global (or dominant) motion compensation, or morphed or synthesized
references, but does support plain references.
[0278] The entropy decoder 1901 (also called Entropy Decoder
Morphing and Synthesis Params & MVs) first decodes morphing and
synthesis parameters, motion vectors, delta motion vectors (shown),
and modes and reference type decisions (not shown). Decoded and
filtered picture/s may be stored in DPR buffer 1902 and are input
to Global or Dominant Motion Compensated Prediction
Picture/Tile/Partition Generator 1904 that uses decoded gmt
parameters to generate GMC Reference Picture(s) or regions that are
then stored in a Dominant Motion Compensated Local/Picture Buffer
1906. As in the case of the encoder, a motion compensated
prediction unit 1912, also called Bi-Tree Partitions Char and
Motion Compensated Adaptive Precision Filtering Predictor, is
specific to NGV coding, but in general any type of motion
compensated predictor could have been used. The set of
prediction-interpolation alternatives (1 to 4) described above may
be used by an interpolator unit or sub-system 1915 of the
prediction unit 1912. Upon decoding, use of the delta motion
vectors by the motion compensated prediction unit 1912 results in
dominant motion compensated blocks (partitions) that are sent to
the Prediction Mode Selector 1914 along with predictions from an
intra directional prediction generator 1916. The selector uses
decoded reference type and mode info to output, for each block
(partition), the best prediction block (partition). This is
identical, or similar, to the prediction process in the local
prediction loop at the encoder (the encoder also uses an
analysis/estimation process that is not part of the decoder).
[0279] Similar to encoder 1800, decoder 1900 also has a simplified
view of other morphing components. Thus, the morphing components
(Other Morph Generator & Picture/s Buffer 1908 and parameters
(mop)) and the synthesis components (Other Synth Generator &
Picture/s Buffer 1910 and parameters (syp)) are not elaborated upon
here. Further, as with
encoder 1800, the portion of decoder 1900 shown can be adapted to
work with extended H.264, or HEVC video standards. As mentioned
earlier, the HEVC standard does not include morphing and synthesis
types of the NGV codec and only shares the commonality of multiple
reference frame prediction, so the HEVC standard would have to be
extended to support the DMC prediction modes. The NGV and the HEVC
standard also differ in partitionings employed. That difference,
however, is mainly superficial since at a functional level the
processes employed are similar.
[0280] Referring to FIGS. 20-22, the process 1300 may alternatively
include divide 1318 the frame into tiles, and then group 1322 the
tiles into regions, before applying the delta motion vectors (note
herein motion vectors may mean delta motion vectors depending on
the context). Specifically, while the block based delta motion
vector type of DMC approach discussed via warped reference frame
1700 may be simple, it is not as efficient as it could be regarding
overhead management since delta motion vectors are provided on a
block basis in addition to the cost of coding global GMC trajectory
(gmt) parameters.
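As a rough illustration of this block-based variant, the following sketch searches a small window around each collocated block of the warped GMC reference for the delta motion vector that minimizes SAD. It is a minimal sketch only: the function name block_delta_mvs, the integer-pel search, and the block and search sizes are assumptions for illustration, not the sub-pixel (1/4 or 1/8 pel) estimation described herein.

```python
import numpy as np

def block_delta_mvs(current, warped_ref, block=8, search=4):
    """Per-block integer delta-mv search on a warped GMC reference.

    Returns an (H/block, W/block, 2) array of (dy, dx) delta mvs
    that minimize SAD against the collocated current-frame block.
    """
    h, w = current.shape
    mvs = np.zeros((h // block, w // block, 2), dtype=np.int32)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = current[by:by + block, bx:bx + block].astype(np.int32)
            best_sad, best = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # candidate block falls outside the frame
                    cand = warped_ref[y:y + block, x:x + block]
                    sad = np.abs(target - cand).sum()
                    if sad < best_sad:
                        best_sad, best = sad, (dy, dx)
            mvs[by // block, bx // block] = best
    return mvs
```

A production encoder would refine these integer vectors with the adaptive-precision sub-pixel interpolation (alternatives 1 to 4) discussed above.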
[0281] Three variations of warped reference frames 2000, 2100, and
2200, are provided to show the use of delta motion vectors for DMC
on other portions of a frame rather than, or in addition to, a
block-by-block basis. A warped reference frame 2000 (FIG. 20) may
be used to provide dominant motion compensation of approximate
region-layers having a group of merged Tiles or CTBs, where each
region uses a delta MV correction with respect to an affine GMC
reference picture. Thus, a significant reduction in overhead may
be obtained by using larger tiles instead of blocks, and more
importantly, grouping of tiles together to further reduce the
number of delta motion vectors that need to be sent. For instance,
a warped reference 2000 may have been formed from a warped
quadrilateral 2008 based on a decoded reference frame, and is used
as a reference for a current frame 2002 to be coded. Both the
current frame 2002 and the warped reference frame 2000 may be
divided into tiles, and here nine tiles are shown, but the number
of tiles may be different. By one approach, the tiles may be a
64.times.64 array of luma pixels and corresponding chroma pixels.
The tiles may be larger as well. As shown, the tiles may be grouped
as foreground tiles 2004 and background tiles 2006 on the current
frame.
[0282] Grouping tiles based on whether they predominantly include
foreground (FG) or background (BG) in a picture requires only two
delta mvs, one for correction of FG and the other for correction of
BG in the GMC Reference frame 2000. Thus, only two delta mvs 2014,
2016, one for FG and another for BG, need be coded and transmitted,
along with a binary mask that distinguishes FG from BG tiles. Since
tiles are often large, the overhead bits required for sending the
mask are often much fewer than those incurred in sending a single
delta motion vector with each block or tile. Further, if another
mode instead of DMC is to be used for certain tiles, then such
tiles do not need to be included in the FG/BG mask. In other words,
the positions of the tiles at the warped reference frame are still
adjusted by the delta motion vector tile-by-tile, such as for tile
2010 or 2012, except now the same delta motion vector is used for
each tile in the group or region. Thus, all foreground tiles have
the same vector and all the background tiles have the same vector.
Depending on how the computations proceed, this may well be the
same as moving the entire region together
with a single motion vector. The approach discussed thus far also
applies to the case of more than two regions (multi-object
segmentation) and requires sending multiple exclusive
tile-to-object/region maps instead of a single FG/BG map.
Generally, the portions, regions or region-layers may be groups of
tiles that have an association to the same object whether that
object is a background, a foreground, a moving object, or any other
object. It is contemplated that the tiles may be grouped by other
common associations such as to size or position on the frame
regardless of the image displayed by the frame, and so forth. This
applies to any of the groupings discussed herein. While the term
tiles from NGV has been used thus far, if DMC is used in
conjunction with HEVC, the same principle applies with the term
CTB/LCU replacing the term tile.
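To make the tile-grouping idea concrete, the following minimal sketch applies one shared delta mv per tile group, selected by a binary FG/BG merge map, to build a DMC prediction from the warped GMC reference. The 64.times.64 tile size, array-based inputs, and the name dmc_predict_groups are assumptions for illustration; frame dimensions are assumed to be multiples of the tile size.

```python
import numpy as np

def dmc_predict_groups(warped_ref, merge_map, mv_fg, mv_bg, tile=64):
    """merge_map[ty, tx] is 1 for FG tiles and 0 for BG tiles;
    mv_fg and mv_bg are the two shared (dy, dx) delta mvs."""
    h, w = warped_ref.shape
    pred = np.zeros_like(warped_ref)
    for ty in range(merge_map.shape[0]):
        for tx in range(merge_map.shape[1]):
            dy, dx = mv_fg if merge_map[ty, tx] else mv_bg
            y0, x0 = ty * tile, tx * tile
            # shift the collocated tile of the warped reference,
            # clamping so the source tile stays inside the frame
            ys = int(np.clip(y0 + dy, 0, h - tile))
            xs = int(np.clip(x0 + dx, 0, w - tile))
            pred[y0:y0 + tile, x0:x0 + tile] = warped_ref[ys:ys + tile,
                                                          xs:xs + tile]
    return pred
```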
[0283] Referring to FIG. 21, while warped reference frame 2000
shows a first variation of the delta mv type of DMC with increased
reduction of overhead, since tiles can be large in size, the
approximate FG/BG region classification based on a tile map can be
rather coarse, which in turn may not result in sufficient DMC
prediction (measured by reduction in DMC prediction error). Thus, a
warped reference frame 2100 may be used to perform dominant motion
compensation of approximate region-layers of merged Bi-tree
partitions of tiles or CTBs with each or multiple region-layers
using delta MV correction with respect to an affine GMC reference
frame in order to further reduce overhead. Specifically, a warped
reference frame 2100 may be used as a reference to code current
frame 2102. Current frame 2102 has foreground (FG) tiles 2108 and
background (BG) tiles 2110. In this arrangement, however, frame
2102 has a variation of tile sizes and permits tiles to be split
into two horizontally or vertically, allowing for border tiles 2104
and 2106 to be more accurate, and thus the approximate region
boundary to be more accurate, and hence the prediction to be more
accurate (resulting in further reduction of prediction error).
[0284] Specifically, current frame 2102 has some tiles 2108 and
2110 that are complete, be it in FG or BG regions, while many tiles
at the FG/BG border are split either horizontally (2104) or
vertically (2106) by bi-tree partitioning, allowing more accurate
FG/BG approximate region-layers to be constructed. Further, warped
reference frame 2100 is generated by first estimating global gmt
parameters for the entire current picture with regard to a
reference frame, then warping the reference frame using the
computed parameters to compute a warped GMC frame 2116, and then
padding it with a boundary extension to create the rectangular
warped GMC Reference frame 2100. In this GMC Reference frame 2100,
two delta motion vectors 2122 and 2124 are then computed, one for
approximate FG (full tiles and half tiles that form the approximate
foreground region), and the other for approximate BG (remaining
full tiles and half tiles that form the background region) such
that these delta mvs 2122 and 2124 further allow improved DMC
prediction instead of using collocated approximate FG/BG regions
(composed of tiles and bi-tree partitioned tiles). In this case,
warped and adjusted tile 2126 is used to position current frame
tile 2110, while warped and adjusted tile halves 2120 and 2122,
respectively, are references for current frame tile halves 2106 and
2108.
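The warped GMC reference itself may be formed along the following lines: an affine transform is solved from the three corner trajectories and applied by inverse mapping, with clipping standing in for the boundary extension. This is a hedged sketch under stated assumptions (nearest-neighbor sampling for brevity, and the helper names affine_from_gmt and warp_gmc_reference are illustrative), not the interpolation filters described herein.

```python
import numpy as np

def affine_from_gmt(src_pts, dst_pts):
    """Solve a 2x3 affine matrix from three (x, y) point correspondences."""
    A, b = [], []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        A += [[x, y, 1, 0, 0, 0], [0, 0, 0, x, y, 1]]
        b += [u, v]
    coeffs = np.linalg.solve(np.array(A, float), np.array(b, float))
    return coeffs.reshape(2, 3)

def warp_gmc_reference(ref, M):
    """Inverse-map every output pixel through the affine M (2x3)."""
    h, w = ref.shape
    Minv = np.linalg.inv(np.vstack([M, [0.0, 0.0, 1.0]]))[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    sx = Minv[0, 0] * xs + Minv[0, 1] * ys + Minv[0, 2]
    sy = Minv[1, 0] * xs + Minv[1, 1] * ys + Minv[1, 2]
    # clipping to the frame acts as a crude boundary-extension padding
    sx = np.clip(np.rint(sx), 0, w - 1).astype(int)
    sy = np.clip(np.rint(sy), 0, h - 1).astype(int)
    return ref[sy, sx]
```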
[0285] Referring to FIG. 22, by another alternative, a warped
reference frame 2200 may provide dominant motion compensation of
approximate region-layers of merged quad-tree partitions of tiles
or CTBs with each region-layer using delta MV correction with
respect to affine GMC Reference frame. Thus, in this variation,
current frame 2202 to be coded is permitted to have tiles split
into quads 2206 and 2208 (1/2 both horizontally and vertically),
allowing for border tiles to be more accurate, and thus the
approximate region boundary to be more accurate, and hence the
prediction to be more accurate (resulting in further reduction of
prediction error). Specifically, current frame 2202 has some tiles
2204 and 2210 that are complete, be it in FG or BG regions, while
many tiles at the FG/BG border are split both horizontally and
vertically (quad-tree partitions), allowing more accurate FG/BG
approximate region-layers to be constructed. Further, GMC Reference
frame 2200 is generated by first estimating global gmt parameters
for the entire current picture with regard to a reference picture,
then warping the reference picture using the computed parameters to
compute a warped GMC Picture 2114, and then padding it with a
boundary extension to create the rectangular GMC Reference
Picture (or warped reference frame) 2200. In this GMC Reference
frame 2200, two delta motion vectors 2222 and 2224 are then
computed, one for approximate FG (full tiles and quarter tiles that
form approx. foreground region), and the other for approximate BG
(remaining full tiles and quarter tiles that form background
region) such that these delta mvs further allow improved DMC
prediction instead of using collocated approx. FG/BG regions
(composed of tiles and quad-tree partitioned tiles). As can be
seen, here quarter tiles 2206 use warped and adjusted reference
quarter tiles 2216, and whole background tile 2210 uses the
reference whole tile 2220.
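One plausible way to construct such a split-tile merge map from a per-pixel foreground mask is sketched below; mixed border tiles are quartered (quad-tree) while pure tiles stay whole. The mask input, the tile size, and the (y, x, size, is_fg) entry format are assumptions for illustration.

```python
import numpy as np

def quad_merge_map(fg_mask, tile=64):
    """Return (y, x, size, is_fg) entries covering the frame, splitting
    tiles that straddle the FG/BG border into quarter-tiles."""
    entries = []
    h, w = fg_mask.shape
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            frac = fg_mask[y:y + tile, x:x + tile].mean()
            if frac == 0.0 or frac == 1.0:        # pure BG or pure FG tile
                entries.append((y, x, tile, frac == 1.0))
            else:                                  # border tile: quad split
                half = tile // 2
                for dy in (0, half):
                    for dx in (0, half):
                        sub = fg_mask[y + dy:y + dy + half,
                                      x + dx:x + dx + half]
                        entries.append((y + dy, x + dx, half,
                                        sub.mean() >= 0.5))
    return entries
```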
[0286] Referring again to FIGS. 18-19, encoder 1800 and decoder
1900 may be slightly modified to provide components to perform
approximate region-layer delta mv based DMC approach for frames
2000, 2100, and 2200 discussed above. The region-layers may be
merged Tiles or CTBs, or merged (Bi-/Quad-) tree partitions of
Tiles/CTBs, each using a delta MV correction. As noted, the HEVC
standard does not support Global (or Dominant) Motion Compensation,
or Morphed or Synthesized references, but does support plain
references. The encoder 1800 may use a Content Blocks Props
Analyzer & Approximate Region Segmenter 1822 that analyzes each
input picture
of a video sequence and segments it into approximate region-layers.
For the purpose of explanation assuming each picture is segmented
into two region-layers, an approximate foreground (FG) region-layer
and an approximate background (BG) region-layer, a merge map may be
provided and has data that carries the mapping of tiles (and
partitions of tiles) into one of the two categories. A mapper 1824
(also referred to as an approximate regions to tile/CTB and
partitions mapper) receives boundary data or parameters (also
referred to as Tile/CTB & Bi/Quadtree Partitions Boundary) in
order to construct the merge map.
[0287] Both the current picture (being processed) as well as the
past decoded reference picture (one of the pictures from DPR
Picture Buffer 1802) are input to the Global or Dominant Motion
Estimator & Compensated Prediction Picture Generator 1804 so
that delta mv DMC parameters can be computed for each of the two
approximate regions and the DMC Reference Picture can be generated. The
merge map is used by the motion compensated predictor (also called
the Bi-Tree Partitions Char. and Motion Compensated Adaptive
Precision (AP) Filtering Predictor) and by the block motion
estimator and partitions motion assembler 1812 (also called the
4.times.4 Block Motion Estimator 1/4 & 1/8 pel Accuracy and
Partitions Motion Assembler), of which the latter is used to
compute delta motion vectors and the former is used to compute
actual motion compensated DMC prediction using these delta mvs. The
result is a complete DMC Ref Picture. The DMC approximate region
based predictions using DMC Ref Pictures are fed to the Pred Modes
& Ref Types Analyzer 1816, along with the input from the intra
directional predictor 1818, and the process proceeds as described
above on a Tile or partition basis where the Pred Modes & Ref
Types Analyzer 1816 determines the best prediction from various
choices. Further, the entropy coder 1820 encodes DMC data, such as
gmt parameters, .DELTA.mvs and merge map for this alternative,
along with other data such as mvs, mop (morphing parameters) and
syp (synthesis parameters) and mode info, and so forth.
[0288] Referring to FIG. 19, decoder 1900 may be modified to
perform Dominant Motion Compensation of approximate Region-layers
(of merged Tiles/CTBs, or merged (Bi-/Quad-) tree partitions of
Tiles/CTBs) each using delta MV's correction. An entropy decoder
1901 decodes DMC data such as gmt parameters, delta mvs, and merge
map, as well as other data such as mvs, mops (morphing parameters),
and syp (synthesis parameters), and mode info (not shown). The
decoded merge map data is used by the motion compensation predictor
1912 (also called the Bi-Tree Char. and Motion Compensated Adaptive
Precision (AP) Filtering Predictor) to keep track of the type of
(FG/BG) region a tile or partitioned tile being processed belongs
to. The Global or Dominant Motion Compensated Prediction
Pictures/Tiles/Partitions Generator 1904 uses the decoded gmt
parameters on decoded picture/s from the DPR Picture Buffer 1902 to
generate a warped GMC Picture, which is then padded at the boundary
to generate a rectangular GMC Reference frame or picture. In this GMC
Reference picture, a delta motion vector for approximate FG region
is then applied to determine a DMC predicted approximate FG region
consisting of tiles and partitions. Likewise, a delta motion vector
for approximate BG region is then applied to determine DMC
predicted approximate BG region consisting of tiles and partitions.
In case of overlap of FG and BG regions, the averaged pixels of an
overlapping area are used as the reconstruction. Likewise, in the
case of holes between foreground and background, neighboring
background region boundary and foreground region boundary pixels
are averaged or extended to fill these holes. The result is a
complete DMC Reference Picture.
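The overlap-averaging and hole-filling step might look like the following minimal sketch, which assumes array-based coverage masks for the warped FG and BG regions; the naive row-wise fill stands in for the neighbor-averaging described above, and all names are illustrative.

```python
import numpy as np

def combine_regions(fg_pred, fg_cov, bg_pred, bg_cov):
    """fg_cov/bg_cov are boolean masks marking pixels covered by the
    warped FG and BG region predictions, respectively."""
    out = np.zeros(fg_pred.shape, dtype=np.float64)
    both = fg_cov & bg_cov
    out[both] = (fg_pred[both].astype(np.float64) + bg_pred[both]) / 2.0
    out[fg_cov & ~bg_cov] = fg_pred[fg_cov & ~bg_cov]
    out[bg_cov & ~fg_cov] = bg_pred[bg_cov & ~fg_cov]
    hole = ~(fg_cov | bg_cov)
    # naive hole fill: copy from the nearest covered pixel in the same row
    for y, x in zip(*np.nonzero(hole)):
        covered = np.nonzero(~hole[y])[0]
        if covered.size:
            out[y, x] = out[y, covered[np.argmin(np.abs(covered - x))]]
    return out
```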
[0289] The Prediction Mode Selector 1914 uses the tiles or
partitions based mode information sent by the encoder via the
bitstream to use tiles/partitions of either the DMC approximate FG
or BG regions from the DMC Reference Picture, or one of the several
available morphed predictors, or synthesized predictors, or intra
predictors. The resulting prediction is then added back (this
portion of the decoder is external) to the quantized prediction
error decoded at the decoder to reconstruct a final decoded video
picture.
[0290] Referring to FIGS. 23-25, process 1300 may also include
define object associated regions 1324, rather than defining tiles
and then grouping them into regions, and rather than using blocks
for motion vector based DMC. Here, dominant motion compensation of
segmented region-layers is performed with each region layer using a
delta MV correction with respect to an affine GMC reference frame
with the goal of further improving prediction accuracy/overhead
tradeoff. This variation explicitly uses regions (more precisely, a
collection of regions belonging to an object), called a
region-layer (RL). A current frame 2300 to be coded is explicitly
segmented into a foreground (FG) region-layer 2304 and a background
(BG) region-layer 2306, with the FG region-layer 2304 comprising a
head and shoulder view 2302 of a person, and the background
containing the rest of the picture including a star 2308. For the
entire picture, GMC parameters gmt are first computed and are used
to generate a GMC Reference frame or Picture (or warped reference
frame) 2310 as described previously. Next, the location of the FG
region-layer 2304 is determined in the GMC Reference frame 2322 and
a single correction delta mv (.DELTA.mvf) 2328 is computed for the
FG region, adjusting the position of the region-layer from a warped
position 2326 to an adjusted position 2330 by the delta motion
vector, such that it reduces the DMC prediction error for the FG
region-layer. Next, a single correction delta mv (.DELTA.mvb) 2318
is computed for the BG region such that the delta motion vector
adjusts the position of the star from a warped position 2314 to an
adjusted position 2316, which also reduces the DMC prediction error
for the background.
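One plausible way to compute such a single correction delta mv for a region-layer is a brute-force search that minimizes the mean absolute difference over the region's pixels, as sketched below. The boolean mask representation, the search range, and the function name are assumptions for illustration.

```python
import numpy as np

def region_delta_mv(current, warped_ref, region_mask, search=8):
    """Return the (dy, dx) shift of the warped reference that best
    predicts the masked region of the current frame."""
    h, w = current.shape
    ys, xs = np.nonzero(region_mask)
    best_mad, best = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = ys + dy, xs + dx
            ok = (ry >= 0) & (ry < h) & (rx >= 0) & (rx < w)
            if not ok.any():
                continue
            # reference sample at (y+dy, x+dx) predicts current (y, x)
            mad = np.abs(current[ys[ok], xs[ok]].astype(np.int64)
                         - warped_ref[ry[ok], rx[ok]]).mean()
            if mad < best_mad:
                best_mad, best = mad, (dy, dx)
    return best
```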
[0291] This variation of delta mv type DMC incurs additional cost
in region-layer representation in terms of a differential with
regard to a simple block-based (partition-based) merge map, but in
return allows for further reduction of DMC prediction error with
regard to the previous tile-based technique while requiring the
same number (two) of delta mvs. In fact, all three variations
(block, grouped tile, and whole region) offer different
tradeoffs in terms of complexity, overhead, and DMC prediction
error reduction. The block-based variation of warped reference
frame 1700 (FIG. 17) is the simplest in complexity, the
tile-grouping variation/s (FIGS. 20-22) offer a middle ground in
terms of approximate region-layer boundary, while the whole region
variation (FIG. 23) offers the opportunity for more accurate
reduction of DMC prediction error.
[0292] Referring to FIG. 24, an encoder 2400 is provided with
components to perform DMC of segmented region-layers where each
region-layer uses a delta mv correction. The encoder 2400 has a
Content Analyzer and Region Segmenter 2426 that analyzes each input
picture of a video sequence and segments it into region-layers. For
the purpose of explanation, assuming each picture is segmented into
two region-layers, a foreground (FG) region-layer and a background
(BG) region-layer, a region boundary (region bndry) has data that
carries the foreground boundary shape (the remaining being the
background). Both the current picture (being processed) as well as
the past decoded reference picture (one of the pictures from DPR
Picture Buffer 2402) are input to the Global or Dominant Motion
Estimator & Compensated Prediction Picture Generator 2404 so
that delta mv DMC parameters can be computed for each of the two
regions and to generate the DMC Reference Picture.
[0293] The region bndry map is used by the motion compensated
predictor 2414 (also called the Regions Char. and Motion
Compensated Adaptive Precision (AP) Filtering Predictor) and by the
regions motion estimator 2412 (also called the Regions Motion
Estimator 1/4 & 1/8 pel Accuracy) of which the latter is used
to compute delta motion vectors while the former is used to compute
actual motion compensated DMC prediction using these delta mvs. The
DMC region based predictions of FG and BG regions are generated
using computed motion vectors for offsetting in GMC Reference
frames, and resulting predictions are fed to the Prediction Modes
& Ref Types Analyzer 2416. The intra data from an intra
directional prediction analyzer and generator 2418 as well as
morphed and synthesized predictions may be provided to the analyzer
2416. On a local sub-region basis, the Prediction Modes & Ref
Types Analyzer 2416 may determine the best prediction from various
choices (e.g. DMC prediction is one of the many choices available),
and outputs it to a differencer (in a portion of the circuit not
shown here) that generates prediction error for coding. Further,
the entropy coder 2420 (called the Entropy Encoder Morphing and
Synthesis Parameters & MVs) encodes DMC data, such as gmt
parameters, delta mvs and region bndry, along with other data such
as mvs, mop (morphing parameters) and syp (synthesis parameters)
and mode info (not shown). As mentioned earlier, this type of
region based DMC makes the best sense in the context of an overall
region based video coder where a picture is divided into
region-layers, and, for ease of processing, each region may be
divided into sub-regions.
[0294] Referring to FIG. 25, a portion or sub-system of a region
based decoder 2500 is provided to perform Dominant Motion
Compensation of segmented Region-Layers, with each region-layer
using a delta MV correction. The decoder 2500 may have an entropy
decoder 2518 (also referred to as an Entropy Decoder Morphing &
Synthesis Params) that decodes DMC data such as gmt parameters,
delta motion vectors, and region bndry, as well as other data such
as mvs, mops (morphing parameters), and syp (synthesis parameters),
and mode info (not shown). The decoded region bndry data is used by
a Regions Char. and Motion Compensated Adaptive Precision (AP)
Filtering Predictor 2512 to determine if a sub-region is part of
the FG or BG region. The Global Motion Compensated Prediction
Pictures Generator 2504 uses the decoded gmt parameters on decoded
picture/s from DPR Picture Buffer 2502 to first generate a warped
GMC picture that is then padded to generate a rectangular GMC
Reference Picture. Then using decoded region bndry and delta motion
vectors, DMC prediction of an FG region as well as DMC prediction
of a BG region is generated. The Pred Mode Selector 2514 uses the
sub-region based mode information sent by the encoder via the
bitstream to use sub-regions of either DMC predicted FG or BG
regions from the GMC Reference Picture, or sub-regions of one of
the several available morphed predictors, or synthesized
predictors, or intra predictors. The resulting prediction is then
added back (this portion of decoder is external) to decoded
quantized prediction error decoded at the decoder to reconstruct a
final decoded video picture. The remaining components are similar
to those described before.
[0295] Referring to FIG. 26, a second type of DMC is referred to
as the local global motion trajectory (gmt) type of DMC, or really
global motion compensation applied locally, but is simply called
local global motion compensation for short. This may be performed
by computer implemented example process 2600 for local global
motion compensation. Example process 2600 is arranged in accordance
with at least some implementations of the present disclosure.
Process 2600 may include one or more operations, functions or
actions as illustrated by one or more operations 2602 to 2626
numbered evenly. Process 2600 may form at least part of a next
generation video coding process. By way of non-limiting example,
process 2600 may form at least part of a next generation video
encoding process as undertaken by coder system 100 or 200 of FIGS.
1-2 or coder sub-systems 1800 or 1900 of FIGS. 18-19, and/or any
other coder system or subsystems described herein.
[0296] Process 2600 first may include obtaining frames of pixel
data and having a current frame and a decoded reference frame 2602
as described previously. Process 2600 may then include define frame
portions for local global motion compensation 2604. Three
alternative ways to divide a frame into portions are provided and
generally look similar to the three divisions used with delta
motion vectors, except that here some key differences exist. Here,
the process 2600 may continue with divide the frame into tiles 2606
instead of blocks, where local GMC is applied to each tile. By
another alternative, the process 2600 may include divide the frame
into tiles 2608 (and/or sub-tiles 2610) and then group the tiles
into regions 2612 so that the same local global motion trajectories
are applied to each tile in the same region. Otherwise, the process
2600 may include define object-associated regions 2614 so that
local GMC will be applied to each region. Each of these options is
explained in detail below. Note that on both process FIGS. 13 and
26, tile is meant in the general sense of a very large block and
includes CTBs.
[0297] Referring to FIG. 27, dominant motion compensation of tiles
or CTBs may be performed by using affine motion parameters for
generation of affine local motion compensated (LMC) reference for
each or multiple tiles or CTBs. Particularly, a current frame 2702
may be divided into tiles (or rectangular regions) 2704.
[0298] The process 2600 may then continue with create dominant
motion compensated warped portions 2616. To accomplish this, a reference
frame 2700 has individual local gmt GMC morphed tiles (or
rectangular regions) that may be reconstructed. Thus, the process
2600 may continue with determine local global motion trajectories
2618 which also may be referred to as dominant motion trajectories
(dmts). In this variation, each tile (or rectangular region) is
allowed to have its own independent set of gmt DMC (or dmts)
parameters (each of three vertices of the tile can be provided
independent motion with a fourth corner being dependent) instead of
a single set of gmt GMC parameters for the entire frame. For
example, reference frame 2700 may have a tile 2706 that is a
reference for tile 2704 on the current frame 2702. The tile may be
displaced from a position on the initial reference frame by global
motion trajectories (gmt) 2712a, 2712b, and 2712c. Similarly, a
tile 2708 may be displaced by global motion trajectories 2714a,
2714b, and 2714c, while tile 2710 may be displaced by global motion
trajectories 2716a, 2716b, and 2716c, such that each tile or region
has its own gmt set (or dmts set). As mentioned above, the dmts or
local gmt may be obtained by known processes such as the affine
methods described above.
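The following short sketch illustrates why the fourth corner is dependent: the three moved vertices fully determine an affine map, which then fixes where the fourth corner lands. It reuses the illustrative affine_from_gmt() solver from the earlier sketch, and the inputs are assumptions for illustration.

```python
def fourth_corner(tile_corners, moved_three):
    """tile_corners: four (x, y) corners of a tile; moved_three: the new
    positions of the first three corners. Returns the dependent position
    of the fourth corner under the implied affine map."""
    M = affine_from_gmt(tile_corners[:3], moved_three)  # 2x3 affine
    x, y = tile_corners[3]
    return (M[0, 0] * x + M[0, 1] * y + M[0, 2],
            M[1, 0] * x + M[1, 1] * y + M[1, 2])
```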
[0299] Once the tile or region is displaced by the trajectories,
then averaging of pixels in overlapping areas of warped neighboring
tiles may be performed, followed by filling in holes also using
averaging of values from available nearby pixels to form the region
rectangle 2622. This is more sophisticated than the rather
simplistic boundary extension performed earlier to create a
rectangular morphed GMC reference frame. In another alternative,
extensions could be used instead to form tile or region rectangles.
The resulting filled picture is then the DMC Ref Picture for this
type of DMC and can be used for tile (or rectangular region) based
DMC motion compensation. As another option, a further virtual
region may be formed 2624 as with the virtual reference frame. The
process 2600 may then include form portion predictions using the
pixels from the warped portions (or tiles or regions) 2626.
[0300] While this method may seem to be motion overhead intensive
due to the need to send gmt parameters for each tile (or
rectangular region), it is important to note that in this type of
DMC: (i) there is no need to transmit picture-wide GMC gmt
parameters; (ii) gmt parameters are sent in the bitstream only for
tiles (rectangular regions) where they are most effective in
reducing DMC prediction error, and are not sent for every tile (or
rectangular region), and if gmt parameters of neighboring tiles
(regions) are not sent, simple extended padding instead of averaged
padding may be employed; and (iii) to further reduce the bit cost
of gmt parameters per tile, these parameters can be coded
differentially with regard to the immediately previous tile, which
may reduce the number of bits needed for each trajectory.
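Point (iii) could be realized with a simple differential scheme such as the sketch below, where the six trajectory components of each coded tile are sent as differences from the previously coded tile. The flat per-tile tuple representation is an assumption for illustration; an actual codec would entropy code the residuals.

```python
def diff_encode(tile_gmts):
    """tile_gmts: per-tile (dx0, dy0, dx1, dy1, dx2, dy2) trajectories."""
    prev, out = (0, 0, 0, 0, 0, 0), []
    for gmt in tile_gmts:
        out.append(tuple(c - p for c, p in zip(gmt, prev)))
        prev = gmt
    return out

def diff_decode(residuals):
    """Invert diff_encode by accumulating the residuals."""
    prev, out = (0, 0, 0, 0, 0, 0), []
    for r in residuals:
        prev = tuple(p + c for p, c in zip(prev, r))
        out.append(prev)
    return out
```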
[0301] Referring again to FIG. 18, encoder 1800 may be modified to
be a portion or sub-system of NGV/HEVC Extension-2a Encoder and to
have components to perform an approximate region-layer gmt based
DMC approach used with reference frame 2700 as discussed above. In
this example, the two inputs to the global or Dominant Motion
Estimator & Compensated Prediction Pictures/Tiles/Partitions
Generator 1804 may include the current picture (being processed) as
well as the past decoded reference picture (one of the pictures
from DPR Picture Buffer 1802) so that gmt DMC parameters can be
computed for each tile, and a DMC Reference Picture can be
generated. As mentioned earlier, tile-based DMC parameters can
result in areas of overlap that are resolved by averaging pixels in
overlapped areas, and holes that are resolved by averaging and
filling in a boundary from neighboring tiles. The tile-based DMC
predictions using DMC Reference Pictures may be fed directly from
the dominant motion compensated prediction local/picture buffer
1806 to the Prediction Modes & Ref Types Analyzer 1816 (in
addition to intra prediction input from the intra directional
prediction analyzer and generator 1818 as well as morphed and
synthesized predictions). On a tile basis, the Prediction Modes
& Ref Types Analyzer 1816 determines the best prediction from
various choices (e.g. DMC prediction is one of the many choices
available) and outputs it to a differencer (in a portion of the
circuit not shown here) that generates a prediction error for coding.
Further, for this alternative, the entropy coder 1820 encodes DMC
data, such as dmts parameters along with other data such as mvs,
mop (morphing parameters) and syp (synthesis parameters) and mode
info (not shown).
[0302] Referring to FIG. 19, as with the encoder 1800, the decoder
1900 may be modified as a portion of NGV/HEVC Extension-2a Decoder
with Dominant Motion Compensation of Tiles or CTBs each using
affine motion parameters. In this alternative, an entropy decoder
1901 decodes DMC data such as dmts parameters, as well as other
data such as mvs, mops (morphing parameters), and syp (synthesis
parameters), and mode info (not shown). The Global or Dominant
Motion Compensated Prediction Pictures/Tile/Partitions Generator
1904 uses the decoded dmts parameters to generate tile based warped
DMC predictions during which pixels in the areas of overlap are
reconstructed as average of pixels from overlapping tiles, and
while filling any holes (average of nearest boundary pixels of
tiles, or boundary extension on borders of picture) to create a
complete DMC Reference picture for prediction. The Prediction Mode
Selector 1914 may directly, or otherwise, receive the DMC reference
picture (or the warped tiles) from the dominant motion compensated
prediction local picture buffer 1906, and uses the tiles based mode
information sent by the encoder via the bitstream to use the warped
tiles of either the DMC Reference Picture, or one of the several
available morphed predictors, or synthesized predictors, or intra
predictors (intra directional prediction generator 1916). The
resulting prediction is then added back (this portion of the
decoder is not shown) to the quantized prediction error decoded
at the decoder to reconstruct a final decoded video picture.
Otherwise, the description of the components for this alternative
is as provided above for the other alternatives.
[0303] Referring to FIGS. 28-30, alternatively process 2600 may
continue with divide the frame into tiles 2608, and when available,
into sub-tiles 2610. The tiles may then be grouped into regions
2612 as with the delta motion vector option. In this alternative,
however, instead of a motion vector for each region, the tiles are
each warped with a set of dmts, as explained previously, to
create warped portions 2616, and the pixel values of the warped
tiles are used as predictions. Specifically, this option provides
Dominant Motion Compensation with approximate region-layers of
merged Tiles or CTBs each using affine motion parameters for
generation of an affine Semi-global Motion Compensated (SMC)
Reference.
[0304] In more detail, a current frame 2802 may be divided into
tiles (or rectangular regions). In this variation, a group of tiles
(or approximate region-layers) may have their own independent set
of gmt, or more specifically dmt, GMC parameters (each of three
vertices of each tile in the group of tiles can be provided the
same independent motion with fourth vertex of each tile being
dependent). In other words, although a set of motion trajectories
is still applied to each tile, all (or selected ones) of the tiles
in the same region receive the same trajectories. A frame may be
divided into two groups of tiles (or approximate region-layers),
one group corresponding to the background (BG), and the other group
corresponding to foreground (FG). In this scenario, two sets of gmt
GMC parameters, one for FG and the other for BG, are sent, along
with a tile (rectangular region) based FG/BG map (merge map), via
the bitstream to the decoder. The operation, however, may not be
limited to division of a picture into two groups of tiles (or
approximate region-layers), and the frame may instead be divided
into three, four, or more approximate region-layers where each
approximate region-layer has one gmt GMC parameter set that is used
and sent to the decoder. In any case, for all tiles that are
considered part of the FG group of tiles (approximate
region-layer), a single gmt GMC parameter set is computed, and
likewise for the complementary tiles that are part of the BG group
of tiles (approximate region-layer), a different gmt GMC parameter
set is computed.
[0305] This is illustrated by FIG. 28 where the current frame 2802
is divided into foreground tiles 2804 and background tiles 2806.
One background tile 2806 corresponds to a background tile 2808 of a
reference frame 2800. The background tile 2808 is displaced by
dominant motion trajectories 2814, while an adjacent background
tile 2810 is warped by trajectories 2816 that are the same or
similar to trajectories 2814 since both of these tiles are in a
background region. One foreground tile 2804 corresponds to a warped
tile 2812 on the reference frame 2800 and uses foreground
trajectories 2818 that are different from the background
trajectories 2814 and 2816.
[0306] To perform DMC motion compensation, a reference picture is
read from the DPR picture buffers, and the collocated tile
corresponding to each tile of an FG region-layer is warped at each
of the three vertices (the fourth being dependent) by using the FG
gmt parameters. Next, the process is repeated for each BG tile of
an approximate BG region-layer using the same picture and using the
BG gmt parameters. Thus, the warped reconstruction of each of the
approximate FG and BG region-layers is completed. For the
overlapping areas between warped approximate FG and BG
region-layers, an averaging process is used to reconstruct the
final pixels. Likewise, in an area of holes (an area not covered by
either of the approximate FG or BG region-layers), hole filling is
employed using averaged prediction from closest neighboring
boundary of approximate FG and BG region-layers. The resulting
filled picture is then the DMC Reference Picture, and can be used
for tile (or rectangular region) based DMC motion compensation.
[0307] Further, while the example of two region layer
classification into approximate FG/BG regions is provided, the
technique can easily be applied to more than two approximate
region-layers. Since in this technique there will always be two or
more approximate region-layers, instead of the term gmt, the term
dmt (dominant motion trajectory) parameters is used, as mentioned
above.
[0308] Referring to FIG. 29, Dominant Motion Compensation is
performed with approximate region-layers of merged Bi-tree
Partitioned Tiles or CTBs each using affine motion parameters for
generation of an affine SMC reference frame. Specifically, a
modification of the tile-based process for reference frame 2800 is
provided to improve the accuracy of approximate FG/BG region-layer
classification by horizontally or vertically splitting the full
tiles into half-tiles (such as that from Bi-Tree partitioning),
which can be used in addition to the full tiles. Thus, for
instance, a current frame 2902 to be coded may have foreground full
tiles 2904 and half-tiles 2906. Thus, the FG group of tiles (or
approximate region-layer) may include mostly full tiles but several
horizontal or vertical half-tiles as well. The background also may
have full tiles 2910 as well as horizontal or vertical half-tiles
2912 and 2914. A reference frame 2900 may include both full and
half-tiles 2916, 2922, and 2926. The background full tile 2916 is
shown to be warped from an original position 2920 by trajectories
2918, where each of the background tiles or half-tiles uses the same
trajectories 2918. The foreground tiles and half-tiles 2924, 2928,
and 2930 all use the same warping trajectories 2924. The FG/BG
segmentation (merge) map may require slightly higher bit totals due
to higher accuracy in the FG/BG approximation. Overall, there will
still be only one FG and one BG set of dmts motion parameters. In the
present example, the overall process of DMC Reference Picture
generation provides improved prediction due to higher FG/BG
accuracy.
[0309] Referring to FIG. 30, Dominant Motion Compensation is
performed with approximate region-layers of merged quad-tree
partitioned tiles or CTBs each using affine motion parameters for
generation of an affine SMC reference. Specifically, a modification
of the tile-based process for reference frame 2800 is provided to
improve the accuracy of the approximate FG/BG region-layer
classification by horizontally and vertically splitting the tiles
into quarter-tiles (such as from quad-tree partitioning) that can
be used in addition to full tiles. Thus, for instance, the current
frame 3002 may have an FG group of tiles (or approximate
region-layer) that includes mostly full tiles 3004, but several
quarter tiles 3006 and 3008 as well. The background may also
include full tiles 3010 as well as quarter tiles 3012. A reference
frame 3000 may include a corresponding full background tile 3018
that is shifted to a warped tile or rectangle 3014 by trajectories
3020, while trajectories 3022 warp a quarter-tile 3024 to a shifted
position 3016 as well. The FG/BG segmentation (or merge) map may
require a slightly higher bit total due to higher accuracy in the
FG/BG approximation. Overall, there will still be only one FG and
one BG set of dmts motion parameters. In the present example, the
overall process of DMC Reference Picture generation provides
improved prediction due to higher FG/BG accuracy.
[0310] Referring again to FIG. 18, encoder 1800 may be modified to
form a subsystem or portion of NGV/HEVC Extension-2b encoder to
perform Dominant Motion Compensation with approximate region-layers
of tiles, merged Bi-/Quad-tree Partitioned Tiles or CTBs each using
affine motion parameters. In other words, the encoder 1800 may be
modified to perform approximate region-layer gmt based DMC approach
as discussed with reference frames 2800, 2900, and 3000 (FIGS.
28-30), and similar to that already described above with reference
frames 2000, 2100, and 2200 that groups tiles into regions albeit
for delta motion vectors. The encoder 1800 may have a Content
Blocks Props Analyzer & Approximate Region Segmenter 1822 that
analyzes each input picture of a video sequence and segments it
into approximate region-layers. Here, it is assumed each picture is
segmented into two region-layers, an approximate foreground (FG)
region-layer and an approximate background (BG) region-layer, and a
merge map is provided to carry data mapping tiles (and partitions
of tiles) into one of the two categories.
[0311] As before, the merge map information is then input to the
Global or Dominant Motion Estimator & Compensated Prediction
Pictures/Tiles/Partitions Generator 1804, the other inputs of which
include both the current picture (being processed) as well as the
past decoded reference picture (one of the pictures from DPR
Picture Buffer 1802) so that gmt DMC parameters can be computed for
each of the two approximate regions and the DMC Reference
Picture can be generated. The DMC approximate region based
predictions using DMC Reference Pictures are fed directly to the
Prediction Modes & Ref Types Analyzer 1816 along with other
inputs such as intra prediction and morphed and synthesized
predictions. On a tile or partition basis, the Pred Modes & Ref
Types Analyzer 1816 determines the best prediction from various
choices (e.g. DMC prediction is one of the many choices available)
and outputs it to a differencer (in a portion of the circuit not
shown here) that generates prediction error for coding. Further,
the entropy coder 1820 encodes DMC data, such as dmts parameters
and the merge map, along with other data such as mvs, mop (morphing
parameters) and syp (synthesis parameters) and mode info (not
shown). The other components of FIG. 18 that are not mentioned here
are described above with other implementations.
[0312] Referring again to FIG. 19, a decoder 1900 may be modified
to be part of a sub-system or portion of NGV/HEVC Extension-2b
decoder to perform Dominant Motion Compensation with approximate
region-layers of Tiles, merged Bi-/Quad-tree Partitioned Tiles or
CTBs each using affine motion parameters. The entropy decoder
decodes DMC data such as dmts parameters and merge map, as well as
other data such as mvs, mops (morphing parameters), and syp
(synthesis parameters), and mode info (not shown). The decoded
merge map data is used by DMC predictor 1904 (Global or Dominant
Motion Compensated Prediction Pictures/Tiles/Partitions Generator),
and MC predictor 1912 (Bi-Tree Char. and Motion Compensated
Adaptive Precision (AP) Filtering Predictor). The Dominant Motion
Compensated Prediction Pictures/Tile/Partitions Generator 1904 also
uses the decoded dmts parameters on decoded picture/s from DPR
Picture Buffer 1902 to first generate DMC approximate FG and BG
regions. In case of overlap of regions, the DMC predictor 1904 also
generates reconstructed pixels in the area of overlap as an average
of pixels from the two approximate regions, and fills any holes
(average of nearest boundary pixels of two regions, or boundary
extension on borders of picture) to create a complete DMC Reference
picture for prediction. The Pred Mode Selector 1914 uses the
tiles/partitions-based mode information sent by the encoder via the
bitstream to use tiles/partitions of either DMC approx. FG or BG
regions from DMC Reference Picture, or one of the several available
morphed predictors, or synthesized predictors, or intra predictors.
The resulting prediction is then added back (this portion of the
decoder is external and not shown) to the quantized prediction
error decoded at the decoder to reconstruct a final decoded video
picture. The other components of FIG. 19 that are not mentioned
here are described above with other implementations.
[0313] Referring to FIG. 31, another variation of the gmt type of
DMC performs Dominant Motion Compensation with segmented
region-layers each using affine motion parameters for generation of
an Affine SMC reference. This variation continues process 2600 with
define object associated regions 2614 as the alternative to define
frame portions for local GMC. In this variation, a current frame
3100 is segmented into two or more region-layers. For instance, the
current frame 3100 may be segmented into a first region-layer 3104
corresponding to the foreground (FG) region-layer while the
remaining portions of the picture may be referred to as the
background (BG) region-layer 3102. Further, by one example, the FG
region-layer is enclosed in the tightest fitting bounding box or
rectangle, or any other convenient shape, and gmt DMC parameters
are calculated for either the entire bounding box or the bounding
box with masked background, with, for example, a past decoded
reference frame 3108 from a DPR picture buffer as a reference. As
an alternative, the boundary may be set at a certain distance from
the object rather than closest fit. Note other alternatives may
include having a boundary that matches or corresponds with (aligns
with) the shape of the object in the foreground, here a head and
shoulders 3106.
[0314] As with the other local global motion compensation
alternatives, the gmt (or dmt) DMC parameters 3120a, 3120b, and
3120c are applied to the vertices of the boundary, and specifically
the bounding box region in the reference frame 3108, to form a
warped bounding box region 3116 from the unwarped position 3114,
which represents the warped FG region-layer providing a warped
head and shoulder (object) position 3118. Similarly, a set of gmt
or dmt DMC parameters 3112a, 3112b, and 3112c are computed for the
BG region-layer 3124 by using the frame rectangle with the FG
region-layer 3116 masked, and the computed gmt DMC parameters
3112a, 3112b, and 3112c are then applied to vertices of the entire
frame 3126, resulting in a warped BG region-layer 3110. Since it is
possible for the two morphed region-layers 3110 and 3116 to
overlap, the area of overlap 3128 can be reconstructed by averaging
overlapping pixels from the two regions. Further, since the two
warped region-layers 3110 and 3116 can have holes 3130 within the
frame 3108, that area is filled by averaged interpolation from
neighboring pixels of both regions. Further, as before, any
unfilled area close to the frame border is boundary extended as
before. In this method, two sets of gmt DMC trajectories (one for
the FG region-layer and the other for the BG region-layer), as well
as an FG/BG segmentation boundary map, are sent via the bitstream
to the decoder. This particular variation is best used in the
context of an encoder that already uses region based coding.
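One way the per-region-layer gmt parameters might be estimated is masked least-squares affine fitting over point correspondences restricted to the layer (for the BG layer, the mask would exclude the FG region, mirroring the masking described above). The sketch below assumes matched feature points are already available, and all names are illustrative.

```python
import numpy as np

def masked_affine(src_pts, dst_pts, mask):
    """src_pts/dst_pts: (N, 2) arrays of matched (x, y) points; mask is a
    boolean image selecting the region-layer the points must lie in."""
    keep = mask[src_pts[:, 1].astype(int), src_pts[:, 0].astype(int)]
    s, d = src_pts[keep], dst_pts[keep]
    A = np.zeros((2 * len(s), 6))
    A[0::2, 0:2], A[0::2, 2] = s, 1.0   # rows for the u (x') equations
    A[1::2, 3:5], A[1::2, 5] = s, 1.0   # rows for the v (y') equations
    b = d.reshape(-1)
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs.reshape(2, 3)          # 2x3 affine for this region-layer
```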
[0315] Referring again to FIG. 24, the encoder 2400 may be modified
to be a sub-system or portion of an advanced region based encoder
to perform Dominant Motion Compensation with segmented
region-layers each using affine motion parameters. The encoder 2400
may have a Content Analyzer and Region Segmenter 2426 that analyzes
each input picture of a video sequence and segments it into
region-layers. For the purpose of explanation, assume each
picture is segmented into region-layers such as a two region-layer
structure with a foreground (FG) region-layer and a background (BG)
region-layer to create a region bndry with data that carries the
foreground boundary shape (the remainder being the background, for
example). The region bndry shape information is then input to the
global or Dominant Motion Estimator & Compensated Prediction
Pictures/Regions Generator 2404, the other inputs of which include
both the current picture (being processed) as well as the past
decoded reference picture (one of the pictures from DPR Picture
Buffer 2402) so that gmt DMC parameters can be computed for each of
the two regions and the DMC Reference Picture can be generated.
The DMC reference pictures are placed in the dominant motion
compensated prediction local/picture buffer 2406, and DMC region
based predictions formed by using the DMC Reference Pictures are
fed directly to the Prediction Modes & Ref Types Analyzer 2416
(as well as other inputs such as intra prediction and morphed and
synthesized predictions). On a local basis (such as sub-regions of
a region-layer), the Prediction Modes & Ref Types Analyzer 2416
determines the best prediction from various choices (e.g. DMC
prediction is one of the many choices available) and outputs it to
a differencer (in a portion of the circuit not shown here) that
generates a prediction error for coding. Further, the entropy coder
2420 encodes DMC data, such as dmts parameters and region bndry,
along with other data such as mop (morphing parameters) and syp
(synthesis parameters) and mode info (not shown). The other components of FIG.
24 that are not mentioned here are described above with other
implementations.
[0316] As mentioned earlier, this type of region based DMC makes
the best sense in the context of an overall region based video
coder where a frame is divided into region-layers and, for ease of
processing, each region is divided into sub-regions. While it is
possible for sub-regions to be tiles or blocks, they could also be
arbitrary in shape at some precision (say 4.times.4 block
accuracy). This applies whenever sub-region processing is mentioned
herein.
[0317] Referring again to FIG. 25, the decoder 2500 may be modified
to be part of a sub-system or portion of Advanced Region-based
Decoder to perform Dominant Motion Compensation with segmented
region-layers using affine motion parameters. The decoder 2500 may
have an entropy decoder 2518 (Entropy Decoder Morphing &
Synthesis Params) that decodes DMC data such as dmts parameters and
region bndry, as well as other data such as mvs, mops (morphing
parameters), and syp (synthesis parameters), and mode info (not
shown). The decoded region bndry data is used by the DMC predictor
2504 (Global or Dominant Motion Compensated Prediction
Pictures/Regions Generator), and MC predictor 2512 (Regions Char.
and Motion Compensated Adaptive Precision (AP) Filtering
Predictor). The Global or Dominant Motion Compensated Prediction
Pictures/Regions Generator 2504 also uses the decoded dmts
parameters on decoded picture(s) from the DPR Picture Buffer 2502
to first generate DMC FG and BG regions, and in case of overlap of
regions, generating reconstructed pixels in an area of overlap as
an average of pixels from the two regions, and filling holes
(average of nearest boundary pixels of two regions, or boundary
extension on borders of a picture) to create a complete DMC
Reference picture for prediction. The Prediction Mode Selector 2514
uses the sub-region based mode information sent by the encoder via
the bitstream to use sub-regions of either DMC FG or BG regions
from DMC Reference Picture, or sub-regions of one of the several
available morphed predictors, or synthesized predictors, or intra
predictors. The resulting prediction is then added back (this
portion of the decoder is external and not shown here) to the
quantized prediction error decoded at the decoder to reconstruct a
final decoded video picture. The components of FIG. 25 that are not
mentioned here are described above with other implementations.
[0318] Referring to FIGS. 32-33, by another implementation, the two
main DMC types: (1) delta motion vector-based DMC, and (2) local
global motion compensation DMC, are combined. There are several
ways of doing this. A simple approach would be to use block-based
delta mv type DMC on a tile basis, and also use local gmt type of
DMC on a tile basis, and based on a reduction of DMC prediction
error, choose the best DMC mode. In such an approach, for tiles
that use block-based delta mv type DMC for example, block-based
delta mvs and frame-based GMC gmt parameters would be sent via the
bitstream to the decoder, and for tiles that use local gmt type of
DMC, tile based gmt parameters would be sent in the bitstream. In
addition, a binary map indicating block delta mv DMC versus local
gmt type DMC selection on a tile basis also would be carried in the
bitstream for use by the decoder. This approach of simultaneously
using two types of DMC can be further explained with an encoder
3200 (FIG. 32) and a decoder 3300 (FIG. 33) as follows.
[0319] Referring to FIG. 32, an encoder 3200 may be part of a
sub-system or portion of a NGV/HEVC Extension-3 Encoder used to
perform Dominant Motion Compensation of blocks each using delta MV
correction as well as tiles or CTBs each using affine motion
parameters. Encoder 3200 may store decoded and filtered frames in a
DPR picture buffer 3202 for use by Global Motion Estimator &
Compensated Prediction Pictures Generator 3204 and Dominant Motion
Estimator & Compensated Prediction Picture Generator 3220, as
well as by Other Morph Analyzer Generator & Picture/s Buffer
3208 and Synth Analyzer, Generator & Picture/s Buffer 3210.
The high level operation of the two DMC operations was discussed
earlier and will not be repeated here. Thus, components of the
other encoders described herein that are similar to components of
the encoder 3200 operate similarly. The operation of DMC components
related to the dual operation of MV based DMC and local gmt based
DMC are described below.
[0320] For encoder 3200, by one approach, each of multiple tiles of
a frame from DPR buffer 3202 is input to the Dominant Motion
Estimator & Compensated Prediction Picture Generator 3220 for
computation of an independent set of gmt DMC parameters (each of
three vertices of the tile can be provided independent motion with
a fourth one being dependent) instead of a single set of gmt GMC
parameters for the entire picture. Using individual corresponding
gmt GMC parameters, each tile of a previous reference frame (from
DPR buffer 3202) is warped to generate individual local gmt GMC
morphed tiles such that some warped tiles result in overlapped
pixels and others result in holes not covered by any of the tiles.
For areas of overlap of tiles, reconstruction is done by averaging
of common pixels, and for the areas of holes, averaging from border
pixels of neighboring warped tiles is performed. In the case where
some tiles are coded by other coding modes, and thus are missing
gmt DMC parameters or are at the boundary of the picture, simple
boundary extension is performed. The resulting filled picture is
then the DMC Reference Picture and is stored in a Dominant Motion
Compensated Prediction Local buffer 3222, and can be used for tile
based DMC motion compensation. The term dmt (or its plural dmts)
may be used to refer to the local tile based gmt DMC parameters to
differentiate them from gmt itself.
[0321] With regard to motion vector-based DMC, the same or a
different reference frame from DPR buffer 3202 is input to Global
Motion Estimator & Compensated Prediction Picture Generator
3204 that performs global motion estimation (GME) producing global
motion parameters (represented as gmt trajectories) and generating
a GMC reference frame that is stored in Dominant Motion
Compensation Prediction Local/Picture Buffer 3206. Next, a block
motion estimation and partitions motion assembler 3212 may perform
tile based motion estimation resulting in delta motion vectors of
tiles that can be used for correction of motion, and that are used
by motion compensated predictor 3214 (referred to here as the
(Bi-Tree Partitions) Char and Motion Compensated Adaptive Precision
Filtering Predictor), which generates prediction tiles by
sub-pixel interpolation using the GMC Reference frame.
[0322] The output prediction tiles from the delta mv based DMC and
local gmt based DMC (as well as other morphed blocks/tiles and
synthesized blocks/tiles) are fed to the Prediction Modes &
Reference Types Analyzer 3216, along with inputs such as intra
predicted blocks/tiles from the intra directional prediction
analyzer and generator 3218. On a block or tile basis, the
Prediction Modes & Reference Types Analyzer 3216 determines the
best prediction block or tile from various choices. For example,
the available choices may include DMC prediction as one of the many
morphing choices available including one choice that is local gmt
DMC based and another choice that is delta motion vector DMC based.
The analyzer 3216 outputs the best prediction to a differencer (in
a portion of the circuit not shown here) that generates a
prediction error for coding. The analyzer 3216 also outputs a map of the DMC mode
selection information (dmsi) on a portion (tile or region or other
partition) basis that was used in the processing. Further, the
entropy coder 3224 (also called Entropy Encoder Morphing and
Synthesis Parameters & MVs) encodes GMC/DMC parameters and
data, such as the gmt, .DELTA.mvs, dmts, and dmsi.
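A minimal sketch of the per-tile mode decision follows: the SAD of the delta-mv based DMC prediction is compared against that of the local gmt based DMC prediction for each tile, and the winner is recorded in a binary dmsi-style map. The array inputs, tile size, and function name are assumptions for illustration.

```python
import numpy as np

def choose_dmc_mode(current, pred_delta_mv, pred_local_gmt, tile=64):
    """Return a binary map: 0 selects delta-mv DMC, 1 selects local-gmt
    DMC, per tile, by smaller SAD against the current frame."""
    h, w = current.shape
    dmsi = np.zeros((h // tile, w // tile), dtype=np.uint8)
    for ty in range(dmsi.shape[0]):
        for tx in range(dmsi.shape[1]):
            sl = np.s_[ty * tile:(ty + 1) * tile,
                       tx * tile:(tx + 1) * tile]
            cur = current[sl].astype(np.int64)
            sad_mv = np.abs(cur - pred_delta_mv[sl]).sum()
            sad_gmt = np.abs(cur - pred_local_gmt[sl]).sum()
            dmsi[ty, tx] = 0 if sad_mv <= sad_gmt else 1
    return dmsi
```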
[0323] Referring to FIG. 33, a decoder 3300 may be part of a
sub-system or portion of an NGV/HEVC Extension-3 decoder used to
perform Dominant Motion Compensation of blocks each using delta MV
corrections as well as tiles or CTBs each using affine motion
parameters. The decoder 3300 may receive and store decoded and
filtered frames in DPR picture buffer 3302 for use by Global Motion
Compensated Prediction Picture Generator 3204, Dominant Motion
Compensated Prediction Picture Generator 3218, Other Morph
Generator & Pictures Buffer 3208, and Synth Generator &
Pictures Buffer 3210. The high level operation of the two processes
(motion vector-based and local gmt based DMC) was explained earlier
with other implementations, and will not be repeated here. Thus,
components in the other implementations that are similar to the
components of decoder 3300 operate similarly.
[0324] Depending on the DMC mode of a tile as carried by the dmsi
map, either Global Motion Compensated Prediction Picture Generator
3204 along with block motion compensated predictor 3212 (shown here
as (Bi-Tree Partitions) Char. and Motion Compensated Adaptive
Precision (AP) Filtering Predictor), or Dominant Motion Compensated
Prediction Pictures/Tiles Generator 3218 is deployed to generate
the appropriate DMC motion compensation. However, for many tiles or
blocks, the DMC mode may not be used since the encoder selects the
best mode, which may have been one of the other morphed prediction
modes, one of the other synthesized prediction modes, or an intra
mode.
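A hypothetical decoder-side dispatch over the dmsi map could then look as follows (the mode strings, the tile layout, and the predict_tile helper sketched earlier and warp_tile helper sketched later below are all assumptions):

    def reconstruct_dmc_tiles(dmsi, params, ref, gmc_ref, tiles, tile=64):
        # For each tile index t at position (ty, tx), apply the DMC path the
        # encoder signaled; tiles coded with non-DMC modes (other morphs,
        # synthesis, or intra) are produced elsewhere in the decoder.
        preds = {}
        for t, (ty, tx) in tiles.items():
            mode = dmsi.get(t)
            if mode == "gmc_delta_mv":
                preds[t] = predict_tile(gmc_ref, ty, tx, tile, params.delta_mvs[t])
            elif mode == "local_gmt":
                preds[t] = warp_tile(ref, (ty, tx), params.dmts[t], tile)
        return preds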
[0325] Referring now to FIG. 34, an example video coding system
3500 and video coding process 3400 in operation to implement the
dual DMC process of encoder 3200 and decoder 3300, for example, are
arranged in accordance with at least some implementations of the
present disclosure. In the illustrated implementation, process 3400
may include one or more operations, functions or actions as
illustrated by one or more of actions 3401 to 3413. By way of
non-limiting example, process 3400 will be described herein with
reference to example video coding system 3500 including encoder 100
of FIG. 1 and decoder 200 of FIG. 2, as is discussed further below
with respect to FIG. 35. In various examples, process 3400 may be
undertaken by a system including both an encoder and decoder or by
separate systems with one system employing an encoder (and
optionally a decoder) and another system employing a decoder (and
optionally an encoder). It is also noted, as discussed above, that
an encoder may include a local decode loop employing a local
decoder as a part of the encoder system.
[0326] In the illustrated implementation, video coding system 3500
may include a processing unit such as a graphics processing unit
3520 with logic circuitry 3550, the like, and/or combinations
thereof. For example, logic circuitry 3550 may include encoder
system 100 of FIG. 1, or alternatively 3200 of FIG. 32 and/or
decoder system 200 of FIG. 2 or alternatively 3300 of FIG. 33, and
may include any modules as discussed with respect to any of the
encoder systems or subsystems described herein and/or decoder
systems or subsystems described herein. Although video coding
system 3500, as shown in FIG. 34, may include one particular set of
blocks or actions associated with particular modules, these blocks
or actions may be associated with different modules than the
particular modules illustrated here. Although process 3400, as
illustrated, is directed to encoding and decoding, the concepts
and/or operations described may be applied to encoding and/or
decoding separately, and, more generally, to video coding.
[0327] Process 3400 may begin with "receive input video frames of a
video sequence" 3401, where input video frames of a video sequence
may be received via encoder 100 for example. This may include a
current frame and a past or previous frame that is to be used as a
reference frame for motion compensation to reconstruct a prediction
of the current frame.
[0328] The process 3400 also comprises "perform prediction and
coding partitioning, quantization, and decoding loop to decode a
reference frame" 3402. Here, video frames are coded and then
decoded in a decoder loop at an encoder in order to provide coded
data that can be accurately obtained by a process that is repeated
at the decoder.
[0329] The process 3400 also comprises "perform frame-wide global
motion compensation to form a warped GMC reference frame" 3403. In
this case, global motion trajectories are applied at the corners of
the frame to warp the frame for delta motion-vector based DMC. The
process may be implemented as explained for the other
implementations described herein.
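For illustration, one way to derive the affine parameters from three corner trajectories is sketched below (an assumption: the trajectories are taken here as the displacements of the top-left, top-right, and bottom-left frame corners):

    def affine_from_trajectories(W, H, traj):
        # traj = [(dx0, dy0), (dx1, dy1), (dx2, dy2)]: displacements of the
        # top-left, top-right, and bottom-left corners, respectively, so the
        # model x = A*j + B*i + C, y = D*j + E*i + F reproduces them exactly.
        (dx0, dy0), (dx1, dy1), (dx2, dy2) = traj
        C, F = dx0, dy0
        A = (W - 1 + dx1 - C) / (W - 1)
        D = (dy1 - F) / (W - 1)
        B = (dx2 - C) / (H - 1)
        E = (H - 1 + dy2 - F) / (H - 1)
        return A, B, C, D, E, F

The resulting parameters can then drive a frame warp such as the affine_warp sketch above.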
[0330] The process 3400 also comprises "determine delta motion
vector for individual portions of the warped GMC reference" 3404.
This may include first defining the portion as blocks, tiles,
regions formed by tiles grouped together, or regions without
defining tiles first. This may also create a tile merge map or
region boundary map as needed. Motion estimation may then be
applied to determine the delta motion vector (Δmv) for each
portion. This may be performed on all blocks or other portions in a
frame, or merely on selected blocks or selected portions of a
frame.
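As one possible form of this step, a brute-force SAD block-matching search over the warped GMC reference (a simplified sketch; practical implementations use faster search patterns and sub-pel refinement):

    import numpy as np

    def delta_mv_search(cur, gmc_ref, ty, tx, tile, search=8):
        # Find the (dy, dx) displacement of the warped-reference tile that best
        # matches the co-located tile of the current frame (minimum SAD).
        target = cur[ty:ty + tile, tx:tx + tile].astype(int)
        best_sad, best_mv = None, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                if ty + dy < 0 or tx + dx < 0:
                    continue  # candidate window starts outside the frame
                cand = gmc_ref[ty + dy:ty + dy + tile,
                               tx + dx:tx + dx + tile].astype(int)
                if cand.shape != target.shape:
                    continue  # candidate window ends outside the frame
                sad = int(np.abs(cand - target).sum())
                if best_sad is None or sad < best_sad:
                    best_sad, best_mv = sad, (dy, dx)
        return best_mv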
[0331] The process 3400 may then continue with "determine motion
vector-based prediction for the individual portions" 3405. Thus, a
motion predictor such as those described above (such as predictor
123 or 1814) may determine a prediction portion (prediction block,
tile, sub-tile, region, sub-region, or other partition) that is
provided to a prediction selector. The prediction, if selected, may
be used for comparison to the original frame to determine whether a
difference or residual exists that warrants coding.
[0332] Alternatively, or additionally, the process 3400 may
comprise "perform local-global motion compensation on individual
portions" 3406. This process also is described above with the other
implementations, and the portions once again may be tiles, regions
formed by grouping tiles, or regions without tile groupings, and
may be object-associated regions. Here, the gmt, or more accurately
the dmt, are applied at the boundary, such as the corners or
vertices, of each portion where the compensation is desired. By one
form, this is applied to all of the portions in a frame, but it need
not always be. A tile merge map or region boundary map may also be
formed as needed.
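A sketch of the local variant (a hypothetical helper, reusing affine_from_trajectories and numpy from the earlier sketches; here the dmt trajectories are applied at the tile's own corners rather than the frame corners):

    def warp_tile(ref, tile_pos, dmt, tile=64):
        # Build an affine model from the tile's three corner trajectories and
        # warp only that tile's pixels out of the reference frame.
        ty, tx = tile_pos
        A, B, C, D, E, F = affine_from_trajectories(tile, tile, dmt)
        h, w = ref.shape
        out = np.zeros((tile, tile), dtype=ref.dtype)
        for i in range(tile):
            for j in range(tile):
                x = min(max(int(A * j + B * i + C) + tx, 0), w - 1)
                y = min(max(int(D * j + E * i + F) + ty, 0), h - 1)
                out[i, j] = ref[y, x]
        return out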
[0333] The process 3400 may also comprise "determine local GMC
based prediction for the individual portions" 3407. Thus, the pixel
values of the warped portions are used as predictions, and provided
to a prediction selector on a portion (block, tile, sub-tile,
region, sub-region, or other partition) basis as described
previously.
[0334] The process 3400 also comprises "select best prediction for
coding of individual portions" 3408. Particularly, the prediction
selector compares the different predictions to the original frame
and selects the best fit, or may use other criteria. If a
difference exists, the difference or residual is placed in the
bitstream for transmission to the decoder.
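As a simple stand-in for the selector's decision rule (a sketch using plain SAD; an actual coder would use a rate-distortion cost over these candidates):

    import numpy as np

    def select_best_prediction(target, candidates):
        # candidates: mapping of mode name -> prediction tile; choose the mode
        # whose prediction has the minimum sum of absolute differences.
        def sad(pred):
            return int(np.abs(pred.astype(int) - target.astype(int)).sum())
        best_mode = min(candidates, key=lambda m: sad(candidates[m]))
        return best_mode, candidates[best_mode]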
[0335] The process 3400 also comprises "transmit motion data and
dominant motion compensation parameters to decoder" 3409. The
motion data may include residuals and motion vectors, and the
dominant motion compensation parameters may include the indicator
or map of the selected prediction (dmsi), any tile merge map and/or
region boundary map, the dmt and/or gmt trajectories, as well as the
delta motion vectors (Δmvs), and these may be provided on a
block, tile, region or other portion basis as needed. Where the
same trajectories or motion vectors are applied to all or multiple
portions in a region, the values may only need to be sent once with
an explanatory map.
[0336] The process 3400 also comprises "receive and decode
bitstream" 3410. This may include parsing the bitstream into
dominant motion compensation data, motion or other residual data,
and image or frame data, and then entropy decoding the data.
[0337] The process 3400 also comprises "perform inverse decoding
operations to obtain dominant motion compensation parameters with
dominant or global motion compensation trajectories, delta motion
vectors, prediction selection map, merge map, and/or boundary map"
3411. This finally reconstructs the values for each type of data.
[0338] The process 3400 also comprises "perform dominant (local
global) or motion vector based global motion compensation to obtain
DMC reference frames or DMC portions" 3412. Thus, the DMC reference
frames may be reconstructed, and the same frame or frame portion
may be provided with alternative DMC reference frames or portions.
This may include applying local global motion trajectories to a
boundary to warp individual portions, and alternatively, applying
delta motion vectors to portions on a warped reference frame, in
order to obtain the pixel values at the resulting prediction
portions.
[0339] The process 3400 also comprises "provide the predictions to
a predictions selector" 3413 where the best prediction for a frame
portion is selected and used to form the final frame for display,
storage, further encoding, and so forth.
[0340] Various components of the systems described herein may be
implemented in software, firmware, and/or hardware and/or any
combination thereof. For example, various components of system 300
may be provided, at least in part, by hardware of a computing
System-on-a-Chip (SoC) such as may be found in a computing system
such as, for example, a smart phone. Those skilled in the art may
recognize that systems described herein may include additional
components that have not been depicted in the corresponding
figures. For example, the systems discussed herein may include
additional components such as bit stream multiplexer or
de-multiplexer modules and the like that have not been depicted in
the interest of clarity.
[0341] While implementation of the example processes herein may
include the undertaking of all operations shown in the order
illustrated, the present disclosure is not limited in this regard
and, in various examples, implementation of the example processes
herein may include the undertaking of only a subset of the
operations shown and/or in a different order than illustrated.
[0342] Some additional and/or alternative details related to
processes 1300, 2600, and 3400, and other processes discussed herein
may be illustrated in one or more examples of implementations
discussed herein and, in particular, with respect to FIG. 35
below.
[0343] FIG. 35 is an illustrative diagram of example video coding
system 3500, arranged in accordance with at least some
implementations of the present disclosure. In the illustrated
implementation, video coding system 3500 may include imaging
device(s) 3501, video encoder 100, video decoder 200 (and/or a
video coder implemented via logic circuitry 3550 of processing
unit(s) 3520), an antenna 3502, one or more processor(s) 3503, one
or more memory store(s) 3504, and/or a display device 3505.
[0344] As illustrated, imaging device(s) 3501, antenna 3502,
processing unit(s) 3520, logic circuitry 3550, video encoder 100,
video decoder 200, processor(s) 3503, memory store(s) 3504, and/or
display device 3505 may be capable of communication with one
another. As discussed, although illustrated with both video encoder
100 and video decoder 200, video coding system 3500 may include
only video encoder 100 or only video decoder 200 in various
examples.
[0345] As shown, in some examples, video coding system 3500 may
include antenna 3502. Antenna 3502 may be configured to transmit or
receive an encoded bitstream of video data, for example. Further,
in some examples, video coding system 3500 may include display
device 3505. Display device 3505 may be configured to present video
data. As shown, in some examples, logic circuitry 3550 may be
implemented via processing unit(s) 3520. Processing unit(s) 3520
may include application-specific integrated circuit (ASIC) logic,
graphics processor(s), general purpose processor(s), or the like.
Video coding system 3500 also may include optional processor(s)
3503, which may similarly include application-specific integrated
circuit (ASIC) logic, graphics processor(s), general purpose
processor(s), or the like. In some examples, logic circuitry 3550
may be implemented via hardware, video coding dedicated hardware,
or the like, and processor(s) 3503 may implement general purpose
software, operating systems, or the like. In addition, memory
store(s) 3504 may be any type of memory such as volatile memory
(e.g., Static Random Access Memory (SRAM), Dynamic Random Access
Memory (DRAM), etc.) or non-volatile memory (e.g., flash memory,
etc.), and so forth. In a non-limiting example, memory store(s)
3504 may be implemented by cache memory. In some examples, logic
circuitry 3550 may access memory store(s) 3504 (for implementation
of an image buffer for example). In other examples, logic circuitry
3550 and/or processing unit(s) 3520 may include memory stores
(e.g., cache or the like) for the implementation of an image buffer
or the like.
[0346] In some examples, video encoder 100 implemented via logic
circuitry may include an image buffer (e.g., via either processing
unit(s) 3520 or memory store(s) 3504) and a graphics processing
unit (e.g., via processing unit(s) 3520). The graphics processing
unit may be communicatively coupled to the image buffer. The
graphics processing unit may include video encoder 100 as
implemented via logic circuitry 3550 to embody the various modules
as discussed with respect to FIG. 1 and/or any other encoder system
or subsystem described herein. For example, the graphics processing
unit may include coding partitions generator logic circuitry,
adaptive transform logic circuitry, content pre-analyzer, encode
controller logic circuitry, adaptive entropy encoder logic
circuitry, and so on. The logic circuitry may be configured to
perform the various operations as discussed herein.
[0347] Video decoder 200 may be implemented in a similar manner as
implemented via logic circuitry 3550 to embody the various modules
as discussed with respect to decoder 200 of FIG. 2 and/or any other
decoder system or subsystem described herein.
[0348] In some examples, antenna 3502 of video coding system 3500
may be configured to receive an encoded bitstream of video data. As
discussed, the encoded bitstream may include data associated with
the coding partition (e.g., transform coefficients or quantized
transform coefficients, optional indicators (as discussed), and/or
data defining the coding partition (e.g., data associated with
defining bi-tree partitions or k-d tree partitions using a
symbol-run coding or codebook technique or the like)). Video coding
system 3500 may also include video decoder 200 coupled to antenna
3502 and configured to decode the encoded bitstream.
[0349] In some implementations, the decoder system may include a
video decoder configured to decode an encoded bitstream. In some
examples, the video decoder may be further configured to receive
the bitstream.
[0350] In some embodiments, features described herein may be
undertaken in response to instructions provided by one or more
computer program products. Such program products may include signal
bearing media providing instructions that, when executed by, for
example, a processor, may provide the functionality described
herein. The computer program products may be provided in any form
of one or more machine-readable media. Thus, for example, a
processor including one or more processor core(s) may undertake one
or more features described herein in response to program code
and/or instructions or instruction sets conveyed to the processor
by one or more machine-readable media. In general, a
machine-readable medium may convey software in the form of program
code and/or instructions or instruction sets that may cause any of
the devices and/or systems described herein to implement at least
portions of the features described herein.
[0351] FIG. 36 is an illustrative diagram of an example system
3600, arranged in accordance with at least some implementations of
the present disclosure. In various implementations, system 3600 may
be a media system although system 3600 is not limited to this
context. For example, system 3600 may be incorporated into a
personal computer (PC), laptop computer, ultra-laptop computer,
tablet, touch pad, portable computer, handheld computer, palmtop
computer, personal digital assistant (PDA), cellular telephone,
combination cellular telephone/PDA, television, smart device (e.g.,
smart phone, smart tablet or smart television), mobile internet
device (MID), messaging device, data communication device, cameras
(e.g. point-and-shoot cameras, super-zoom cameras, digital
single-lens reflex (DSLR) cameras), and so forth.
[0352] In various implementations, system 3600 includes a platform
3602 coupled to a display 3620. Platform 3602 may receive content
from a content device such as content services device(s) 3630 or
content delivery device(s) 3640 or other similar content sources. A
navigation controller 3650 including one or more navigation
features may be used to interact with, for example, platform 3602
and/or display 3620. Each of these components is described in
greater detail below.
[0353] In various implementations, platform 3602 may include any
combination of a chipset 3605, processor 3610, memory 3612, antenna
3613, storage 3614, graphics subsystem 3615, applications 3616
and/or radio 3618. Chipset 3605 may provide intercommunication
among processor 3610, memory 3612, storage 3614, graphics subsystem
3615, applications 3616 and/or radio 3618. For example, chipset
3605 may include a storage adapter (not depicted) capable of
providing intercommunication with storage 3614.
[0354] Processor 3610 may be implemented as Complex Instruction
Set Computer (CISC) or Reduced Instruction Set Computer (RISC)
processors, x86 instruction set compatible processors, multi-core,
or any other microprocessor or central processing unit (CPU). In
various implementations, processor 3610 may be dual-core
processor(s), dual-core mobile processor(s), and so forth.
[0355] Memory 3612 may be implemented as a volatile memory device
such as, but not limited to, a Random Access Memory (RAM), Dynamic
Random Access Memory (DRAM), or Static RAM (SRAM).
[0356] Storage 3614 may be implemented as a non-volatile storage
device such as, but not limited to, a magnetic disk drive, optical
disk drive, tape drive, an internal storage device, an attached
storage device, flash memory, battery backed-up SDRAM (synchronous
DRAM), and/or a network accessible storage device. In various
implementations, storage 3614 may include technology to increase
the storage performance or provide enhanced protection for valuable
digital media when multiple hard drives are included, for example.
[0357] Graphics subsystem 3615 may perform processing of images
such as still or video for display. Graphics subsystem 3615 may be
a graphics processing unit (GPU) or a visual processing unit (VPU),
for example. An analog or digital interface may be used to
communicatively couple graphics subsystem 3615 and display 3620.
For example, the interface may be any of a High-Definition
Multimedia Interface, DisplayPort, wireless HDMI, and/or wireless
HD compliant techniques. Graphics subsystem 3615 may be integrated
into processor 3610 or chipset 3605. In some implementations,
graphics subsystem 3615 may be a stand-alone device communicatively
coupled to chipset 3605.
[0358] The graphics and/or video processing techniques described
herein may be implemented in various hardware architectures. For
example, graphics and/or video functionality may be integrated
within a chipset. Alternatively, a discrete graphics and/or video
processor may be used. As still another implementation, the
graphics and/or video functions may be provided by a general
purpose processor, including a multi-core processor. In further
embodiments, the functions may be implemented in a consumer
electronics device.
[0359] Radio 3618 may include one or more radios capable of
transmitting and receiving signals using various suitable wireless
communications techniques. Such techniques may involve
communications across one or more wireless networks. Example
wireless networks include (but are not limited to) wireless local
area networks (WLANs), wireless personal area networks (WPANs),
wireless metropolitan area network (WMANs), cellular networks, and
satellite networks. In communicating across such networks, radio
3618 may operate in accordance with one or more applicable
standards in any version.
[0360] In various implementations, display 3620 may include any
television type monitor or display. Display 3620 may include, for
example, a computer display screen, touch screen display, video
monitor, television-like device, and/or a television. Display 3620
may be digital and/or analog. In various implementations, display
3620 may be a holographic display. Also, display 3620 may be a
transparent surface that may receive a visual projection. Such
projections may convey various forms of information, images, and/or
objects. For example, such projections may be a visual overlay for
a mobile augmented reality (MAR) application. Under the control of
one or more software applications 3616, platform 3602 may display
user interface 3622 on display 3620.
[0361] In various implementations, content services device(s) 3630
may be hosted by any national, international and/or independent
service and thus accessible to platform 3602 via the Internet, for
example. Content services device(s) 3630 may be coupled to platform
3602 and/or to display 3620. Platform 3602 and/or content services
device(s) 3630 may be coupled to a network 3660 to communicate
(e.g., send and/or receive) media information to and from network
3660. Content delivery device(s) 3640 also may be coupled to
platform 3602 and/or to display 3620.
[0362] In various implementations, content services device(s) 3630
may include a cable television box, personal computer, network,
telephone, Internet enabled devices or appliance capable of
delivering digital information and/or content, and any other
similar device capable of unidirectionally or bidirectionally
communicating content between content providers and platform 3602
and/or display 3620, via network 3660 or directly. It will be
appreciated that the content may be communicated unidirectionally
and/or bidirectionally to and from any one of the components in
system 3600 and a content provider via network 3660. Examples of
content may include any media information including, for example,
video, music, medical and gaming information, and so forth.
[0363] Content services device(s) 3630 may receive content such as
cable television programming including media information, digital
information, and/or other content. Examples of content providers
may include any cable or satellite television or radio or Internet
content providers. The provided examples are not meant to limit
implementations in accordance with the present disclosure in any
way.
[0364] In various implementations, platform 3602 may receive
control signals from navigation controller 3650 having one or more
navigation features. The navigation features of controller 3650 may
be used to interact with user interface 3622, for example. In
various embodiments, navigation controller 3650 may be a pointing
device that may be a computer hardware component (specifically, a
human interface device) that allows a user to input spatial (e.g.,
continuous and multi-dimensional) data into a computer. Many
systems such as graphical user interfaces (GUI), and televisions
and monitors allow the user to control and provide data to the
computer or television using physical gestures.
[0365] Movements of the navigation features of controller 3650 may
be replicated on a display (e.g., display 3620) by movements of a
pointer, cursor, focus ring, or other visual indicators displayed
on the display. For example, under the control of software
applications 3616, the navigation features located on navigation
controller 3650 may be mapped to virtual navigation features
displayed on user interface 3622. In various embodiments,
controller 3650 may not be a separate component but may be
integrated into platform 3602 and/or display 3620. The present
disclosure, however, is not limited to the elements or in the
context shown or described herein.
[0366] In various implementations, drivers (not shown) may include
technology to enable users to instantly turn on and off platform
3602 like a television with the touch of a button after initial
boot-up, when enabled, for example. Program logic may allow
platform 3602 to stream content to media adaptors or other content
services device(s) 3630 or content delivery device(s) 3640 even
when the platform is turned "off." In addition, chipset 3605 may
include hardware and/or software support for 5.1 surround sound
audio and/or high definition 7.1 surround sound audio, for example.
Drivers may include a graphics driver for integrated graphics
platforms. In various embodiments, the graphics driver may comprise
a peripheral component interconnect (PCI) Express graphics
card.
[0367] In various implementations, any one or more of the
components shown in system 3600 may be integrated. For example,
platform 3602 and content services device(s) 3630 may be
integrated, or platform 3602 and content delivery device(s) 3640
may be integrated, or platform 3602, content services device(s)
3630, and content delivery device(s) 3640 may be integrated, for
example. In various embodiments, platform 3602 and display 3620 may
be an integrated unit. Display 3620 and content service device(s)
3630 may be integrated, or display 3620 and content delivery
device(s) 3640 may be integrated, for example. These examples are
not meant to limit the present disclosure.
[0368] In various embodiments, system 3600 may be implemented as a
wireless system, a wired system, or a combination of both. When
implemented as a wireless system, system 3600 may include
components and interfaces suitable for communicating over a
wireless shared media, such as one or more antennas, transmitters,
receivers, transceivers, amplifiers, filters, control logic, and so
forth. An example of wireless shared media may include portions of
a wireless spectrum, such as the RF spectrum and so forth. When
implemented as a wired system, system 3600 may include components
and interfaces suitable for communicating over wired communications
media, such as input/output (I/O) adapters, physical connectors to
connect the I/O adapter with a corresponding wired communications
medium, a network interface card (NIC), disc controller, video
controller, audio controller, and the like. Examples of wired
communications media may include a wire, cable, metal leads,
printed circuit board (PCB), backplane, switch fabric,
semiconductor material, twisted-pair wire, co-axial cable, fiber
optics, and so forth.
[0369] Platform 3602 may establish one or more logical or physical
channels to communicate information. The information may include
media information and control information. Media information may
refer to any data representing content meant for a user. Examples
of content may include, for example, data from a voice
conversation, videoconference, streaming video, electronic mail
("email") message, voice mail message, alphanumeric symbols,
graphics, image, video, text and so forth. Data from a voice
conversation may be, for example, speech information, silence
periods, background noise, comfort noise, tones and so forth.
Control information may refer to any data representing commands,
instructions or control words meant for an automated system. For
example, control information may be used to route media information
through a system, or instruct a node to process the media
information in a predetermined manner. The implementations,
however, are not limited to the elements or in the context shown or
described in FIG. 36.
[0370] As described above, system 3600 may be embodied in varying
physical styles or form factors. FIG. 37 illustrates
implementations of a small form factor device 3700 in which system
3600 may be embodied. In various embodiments, for example, device
3700 may be implemented as a mobile computing device having
wireless capabilities. A mobile computing device may refer to any
device having a processing system and a mobile power source or
supply, such as one or more batteries, for example.
[0371] As described above, examples of a mobile computing device
may include a personal computer (PC), laptop computer, ultra-laptop
computer, tablet, touch pad, portable computer, handheld computer,
palmtop computer, personal digital assistant (PDA), cellular
telephone, combination cellular telephone/PDA, television, smart
device (e.g., smart phone, smart tablet or smart television),
mobile internet device (MID), messaging device, data communication
device, cameras (e.g. point-and-shoot cameras, super-zoom cameras,
digital single-lens reflex (DSLR) cameras), and so forth.
[0372] Examples of a mobile computing device also may include
computers that are arranged to be worn by a person, such as a wrist
computer, finger computer, ring computer, eyeglass computer,
belt-clip computer, arm-band computer, shoe computers, clothing
computers, and other wearable computers. In various embodiments,
for example, a mobile computing device may be implemented as a
smart phone capable of executing computer applications, as well as
voice communications and/or data communications. Although some
embodiments may be described with a mobile computing device
implemented as a smart phone by way of example, it may be
appreciated that other embodiments may be implemented using other
wireless mobile computing devices as well. The embodiments are not
limited in this context.
[0373] As shown in FIG. 37, device 3700 may include a housing 3702,
a display 3704 which may include a user interface 3710, an
input/output (I/O) device 3706, and an antenna 3708. Device 3700
also may include navigation features 3712. Display 3704 may include
any suitable display unit for displaying information appropriate
for a mobile computing device. I/O device 3706 may include any
suitable I/O device for entering information into a mobile
computing device. Examples for I/O device 3706 may include an
alphanumeric keyboard, a numeric keypad, a touch pad, input keys,
buttons, switches, rocker switches, microphones, speakers, voice
recognition device and software, and so forth. Information also may
be entered into device 3700 by way of microphone (not shown). Such
information may be digitized by a voice recognition device (not
shown). The embodiments are not limited in this context.
[0374] While implementation of the example processes herein may
include the undertaking of all operations shown in the order
illustrated, the present disclosure is not limited in this regard
and, in various examples, implementation of the example processes
herein may include the undertaking of only a subset of the
operations shown and/or in a different order than illustrated.
[0375] In addition, any one or more of the operations discussed
herein may be undertaken in response to instructions provided by
one or more computer program products. Such program products may
include signal bearing media providing instructions that, when
executed by, for example, a processor, may provide the
functionality described herein. The computer program products may
be provided in any form of one or more machine-readable media.
Thus, for example, a processor including one or more processor
core(s) may undertake one or more of the operations of the example
processes herein in response to program code and/or instructions or
instruction sets conveyed to the processor by one or more
machine-readable media. In general, a machine-readable medium may
convey software in the form of program code and/or instructions or
instruction sets that may cause any of the devices and/or systems
described herein to implement at least portions of the video
systems as discussed herein.
[0376] As used in any implementation described herein, the term
"module" refers to any combination of software logic, firmware
logic and/or hardware logic configured to provide the functionality
described herein. The software may be embodied as a software
package, code and/or instruction set or instructions, and
"hardware", as used in any implementation described herein, may
include, for example, singly or in any combination, hardwired
circuitry, programmable circuitry, state machine circuitry, and/or
firmware that stores instructions executed by programmable
circuitry. The modules may, collectively or individually, be
embodied as circuitry that forms part of a larger system, for
example, an integrated circuit (IC), system on-chip (SoC), and so
forth. For example, a module may be embodied in logic circuitry for
the implementation via software, firmware, or hardware of the
coding systems discussed herein.
[0377] Various embodiments may be implemented using hardware
elements, software elements, or a combination of both. Examples of
hardware elements may include processors, microprocessors,
circuits, circuit elements (e.g., transistors, resistors,
capacitors, inductors, and so forth), integrated circuits,
application specific integrated circuits (ASIC), programmable logic
devices (PLD), digital signal processors (DSP), field programmable
gate array (FPGA), logic gates, registers, semiconductor device,
chips, microchips, chip sets, and so forth. Examples of software
may include software components, programs, applications, computer
programs, application programs, system programs, machine programs,
operating system software, middleware, firmware, software modules,
routines, subroutines, functions, methods, procedures, software
interfaces, application program interfaces (API), instruction sets,
computing code, computer code, code segments, computer code
segments, words, values, symbols, or any combination thereof.
Determining whether an embodiment is implemented using hardware
elements and/or software elements may vary in accordance with any
number of factors, such as desired computational rate, power
levels, heat tolerances, processing cycle budget, input data rates,
output data rates, memory resources, data bus speeds and other
design or performance constraints.
[0378] One or more aspects of at least one embodiment may be
implemented by representative instructions stored on a
machine-readable medium which represents various logic within the
processor, which when read by a machine causes the machine to
fabricate logic to perform the techniques described herein. Such
representations, known as "IP cores" may be stored on a tangible,
machine readable medium and supplied to various customers or
manufacturing facilities to load into the fabrication machines that
actually make the logic or processor.
[0379] While certain features set forth herein have been described
with reference to various implementations, this description is not
intended to be construed in a limiting sense. Hence, various
modifications of the implementations described herein, as well as
other implementations, which are apparent to persons skilled in the
art to which the present disclosure pertains are deemed to lie
within the spirit and scope of the present disclosure.
[0380] The following examples pertain to further
implementations.
[0381] In one example, a computer-implemented method for video
coding comprising: obtaining frames of pixel data and having a
current frame and a decoded reference frame to use as a motion
compensation reference frame for the current frame; forming a
warped global compensated reference frame by displacing at least
one portion of the decoded reference frame by using global motion
trajectories; determining a motion vector indicating the motion of
the at least one portion and motion from a position based on the
warped global compensated reference frame to a position at the
current frame; and forming a prediction portion based, at least in
part, on the motion vectors and corresponding to a portion on the
current frame.
[0382] By another example, the method may further comprise
wherein the at least one portion is at least one of: (1) a block of
pixels used as a unit to divide the current frame and the reference
frame into a plurality of the blocks; (2) at least one tile of
pixels, each tile being at least 64×64 pixels, and used as a
unit to divide the current frame and the reference frame into a
plurality of the tiles; the method (2) comprising at least one of:
(a) grouping tiles together based on common association with an
object in the frame to form the at least one portion; and forming a
single motion vector for each group of tiles, (b) grouping the
tiles based on a merge map transmittable from an encoder to a
decoder; or (3) a region of pixels shaped and sized depending on an
object associated with the region, wherein a boundary of the region
is at least one of: a shape that resembles the shape of the object
associated with the region, and a rectangle placed around the
object associated with the region; wherein the region is associated
with at least one of: a background of the frame, a foreground of
the frame, and a moving object in the frame; the method comprising
defining the region based on a boundary map transmittable from an
encoder to a decoder; wherein forming a warped global compensated
reference frame comprises using the global motion trajectories at
the outer corners of the frame; wherein forming a warped global
compensated reference frame comprises using an affine or
perspective global motion compensation method; wherein the at least
one portion comprises a frame divided into a background and a
foreground, and wherein determining motion vectors comprises
providing the background and foreground each with one motion
vector; the method comprising performing dominant motion
compensation comprising locally applied global motion compensation
so that at least one other set of global motion trajectories are
used at corners of at least one region on the frame that is less
than the entire frame to form a displaced region; and using the
pixel values of the displaced region to form a prediction region
that corresponds to a region on the current frame; the method
comprising at least one of: (a) performing local global motion
compensation on multiple regions of the frame by using a different
set of global motion trajectories on each region; (b) wherein each
region is a tile, and dividing the frame into the tiles, and
wherein each tile has a set of global motion trajectories; (c)
providing the option to perform local global motion compensation on
a fraction of a tile in addition to entire tiles; wherein each
region is shaped and sized depending on an object associated with
the region; wherein the object is one of: a foreground, a
background, and an object moving in the frame; the method
comprising providing the option on the at least one region on a
region-by-region basis to select a prediction formed by: (1) a
motion vector to form a prediction for the at least one region and
using global motion compensation applied to the entire frame, or
(2) applying local global motion compensation with a set of global
motion trajectories at the region and using displaced pixel values
of the region to form a prediction; the method comprising applying
local global motion compensation with a set of global motion
trajectories applied at a region of the reference frame that has an
area less than the entire reference frame, and using motion vectors
to form a prediction for the at least one region; the method
comprising providing the option to select a mode for a frame among:
(1) use the dominant motion compensated reference frame prediction,
(2) use blended prediction of multiple dominant motion compensated
reference frames, (3) use dominant motion compensated reference
with differential translational motion vector for prediction, and
(4) use dominant motion compensated reference with differential
translational motion vector for prediction, blended with another
reference frame; the method comprising at least one of (a) to
(c), for which illustrative code sketches follow after item (c):
[0383] (a) performing motion compensated morphed reference
prediction using bilinear interpolation and a motion compensation
(MC) filter to form a morphed reference frame MRef, with tPred_h as
the intermediate horizontal interpolation and Pred_ji as the final
motion compensated morphed reference prediction:

$$MRef[i'][j'] = \big((8-p_x)(8-p_y)\,Ref[y_0][x_0] + p_x(8-p_y)\,Ref[y_0][x_0+1] + p_y(8-p_x)\,Ref[y_0+1][x_0] + p_y p_x\,Ref[y_0+1][x_0+1] + 31\big) \gg 6$$

$$tPred_h[m][n] = \frac{1}{T'}\sum_{k=0}^{N_t-1} h[p_j][k]\,MRef[i'+m][j'+n+k-N_t/2+1], \quad m \in [-N_t/2+1,\ H_b+N_t/2-1],\ n \in [0,\ W_b-1]$$

$$Pred_{ji}[m][n] = \frac{1}{T'}\sum_{k=0}^{N_t-1} h[p_i][k]\,tPred_h[m+k-N_t/2+1][n], \quad m \in [0,\ H_b-1],\ n \in [0,\ W_b-1]$$
[0384] and where:
[0385] (iMVx, iMVy) is the transmitted motion vector in Sub-Pel
Units (f_s) for a block at (j, i) of size (W_b×H_b); A, B, C, D, E,
& F are affine parameters calculated from the three Motion
trajectories transmitted; using separable motion compensation (MC)
Filters with filter coefficients h[f_s][N_t] of norm T, f_s is the
Sub-Pel Factor (e.g. 2=Half Pel, 4=Quarter Pel, 8=Eighth Pel),
where N_t is the number of MC Filter Taps, and
i' = i + (iMVy/f_s)
j' = j + (iMVx/f_s)
p_i = iMVy & (f_s - 1)
p_j = iMVx & (f_s - 1)
[0386] (j', i') is the integer motion adjusted current pixel
location in the Morphed Reference Image, and p_j, p_i are the 1/8th
pel phases in the Morphed Reference Image;
x = (A*j' + B*i' + (C << r)) >> r
y = (D*j' + E*i' + (F << s)) >> s
where (x, y) is the reference pixel coordinate in 1/8th Pel
accuracy for location (j', i')
p_y = y & 0x7
p_x = x & 0x7
y_0 = y >> 3
x_0 = x >> 3
[0387] where (x_0, y_0) is the integer pel location in the Ref
Image and p_x, p_y is the 1/8th pel phase;
MRef[i'][j'] = ((8-p_x)*(8-p_y)*Ref[y_0][x_0] + p_x*(8-p_y)*Ref[y_0][x_0+1] + p_y*(8-p_x)*Ref[y_0+1][x_0] + p_y*p_x*Ref[y_0+1][x_0+1] + 31) >> 6
tPred_h[m][n] = SUM_k(h[p_j][k]*MRef[i'+m][j'+n+k])/T,
[0388] where m=[-N_t/2-1, H_b+N_t/2], where n=[0, W_b-1], where
k=[-N_t/2-1, N_t/2],
Pred_ji[m][n] = SUM_k(h[p_i][k]*tPred_h[m+k][n])/T,
[0389] where m=[0, H_b-1], where n=[0, W_b-1], where
k=[-N_t/2-1, +N_t/2];
[0390] (b) performing morphed reference prediction using block
motion compensation (MC) filtering to form a morphed reference
frame MRef, with tPred_h as the intermediate horizontal
interpolation:

$$tPred_h[m][n] = \frac{1}{T'}\sum_{k=0}^{N_t-1} h[p_x][k]\,Ref[y_0+m][x_0+n+k-N_t/2+1], \quad m \in [-N_t/2+1,\ H_s+N_t/2-1],\ n \in [0,\ W_s-1]$$

$$MRef[i+m][j+n] = \frac{1}{T'}\sum_{k=0}^{N_t-1} h[p_y][k]\,tPred_h[m+k-N_t/2+1][n], \quad m \in [0,\ H_s-1],\ n \in [0,\ W_s-1]$$
[0391] and where A, B, C, D, E, & F are affine parameters
calculated from the three Motion trajectories transmitted; using
separable MC Filters with filter coefficients h[f_s][N_t] of
norm T; f_s is the Sub-Pel Factor (e.g. 2=Half Pel, 4=Quarter Pel,
8=Eighth Pel), and where N_t is the number of MC Filter Taps
x = (A*j + B*i + (C << r)) >> r
y = (D*j + E*i + (F << s)) >> s
[0392] (j, i) is every (W_s×H_s) sub-block location in the current
image, and x and y are reference pixel coordinates in 1/8th Pel
accuracy;
p_y = y & 0x7
p_x = x & 0x7
y_0 = y >> 3
x_0 = x >> 3
[0393] (x_0, y_0) is the integer pel location in the reference
frame (Ref Image); p_x, p_y is the 1/8th pel phase.
tPred_h[m][n] = SUM_k(h[p_x][k]*Ref[y_0+m][x_0+n+k])/T,
[0394] m=[-N_t/2-1, H_s+N_t/2], n=[0, W_s-1],
k=[-N_t/2-1, +N_t/2]; and
MRef[i+m][j+n] = SUM_k(h[p_y][k]*tPred_h[m+k][n])/T,
[0395] m=[0, H_s-1], n=[0, W_s-1], k=[-N_t/2-1,
+N_t/2]; and
[0396] (c) performing motion compensated morphed reference
prediction using single loop motion compensation (MC) filtering to
form a morphed reference (MRef) and predictions, with tPred_h as the
intermediate horizontal interpolation and Pred_ji as the final
motion compensated morphed reference prediction for a block of size
W_b×H_b at (j, i):

$$tPred_h[m][n] = \frac{1}{T'}\sum_{k=0}^{N_t-1} h[p_x][k]\,Ref[y_0+m][x_0+n+k-N_t/2+1], \quad \text{for } m \in [-N_t/2+1,\ H_s+N_t/2-1],\ n \in [0,\ W_s-1]$$

$$Pred_{ji}[uH_s+m][vW_s+n] = \frac{1}{T'}\sum_{k=0}^{N_t-1} h[p_y][k]\,tPred_h[m+k-N_t/2+1][n], \quad \text{for } m \in [0,\ H_s-1],\ n \in [0,\ W_s-1],\ u \in [0,\ H_b/H_s-1],\ v \in [0,\ W_b/W_s-1]$$
[0397] and where:
[0398] (iMVx, iMVy) is the transmitted Motion Vector in Sub-Pel
Units (f_s) for a block at (j, i) of size (W_b×H_b);
A, B, C, D, E, & F are affine parameters calculated from the
three Motion trajectories transmitted; using separable MC Filters
with filter coefficients h[f_s][N_t] of norm T, f_s is the
Sub-Pel Factor (e.g. 2=Half Pel, 4=Quarter Pel, 8=Eighth Pel), and
N_t is the number of MC Filter Taps;
i' = (i + u*H_s)*f_s + iMVy
j' = (j + v*W_s)*f_s + iMVx
[0399] where (j, i) is the current block pixel location and (u, v) is
the index of every (W_s×H_s) sub-block within the given
current block of (W_b×H_b). Below, (j', i') is the motion adjusted
current pixel location in f_s sub-pel accuracy,
x = (A*j' + B*i' + ((C*f_s) << r)) >> (r+3)
y = (D*j' + E*i' + ((F*f_s) << s)) >> (s+3)
[0400] where x & y are reference pixel coordinates in f_s
sub-pel accuracy
p_y = y & (f_s - 1)
p_x = x & (f_s - 1)
y_0 = y/f_s
x_0 = x/f_s
[0401] where (y_0, x_0) is the integer pel location in the Ref
Image and p_x, p_y is the sub-pel phase;
tPred_h[m][n] = SUM_k(h[p_x][k]*Ref[y_0+m][x_0+n+k])/T,
m=[-N_t/2-1, H_s+N_t/2],
n=[0, W_s-1],
k=[-N_t/2-1, +N_t/2]
Pred_ji[u*H_s+m][v*W_s+n] = SUM_k(h[p_y][k]*tPred_h[m+k][n])/T,
m=[0, H_s-1],
n=[0, W_s-1],
k=[-N_t/2-1, +N_t/2],
v=[0, W_b/W_s-1],
u=[0, H_b/H_s-1].
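To make the index arithmetic of (a) to (c) above concrete, the following Python sketches mirror the formulas; the filter table h, the norm T, and all boundary handling are assumptions, and only the core arithmetic of each variant is shown. For the bilinear step of (a):

    def bilinear_1_8pel(ref, x, y):
        # x, y are reference coordinates in 1/8th-pel units; blend the four
        # neighboring integer-pel samples with weights from the 3-bit phases,
        # as in the MRef formula of (a).
        x0, y0 = x >> 3, y >> 3        # integer-pel location
        px, py = x & 0x7, y & 0x7      # 1/8th-pel phases
        return ((8 - px) * (8 - py) * ref[y0][x0]
                + px * (8 - py) * ref[y0][x0 + 1]
                + py * (8 - px) * ref[y0 + 1][x0]
                + py * px * ref[y0 + 1][x0 + 1]
                + 31) >> 6

For the separable two-pass filtering of (b), with simplified tap offsets:

    def separable_mc_filter(ref, x0, y0, h_px, h_py, T, Ws, Hs):
        # Horizontal pass over an extended row range, then a vertical pass,
        # matching the tPred_h / MRef pair of (b).
        Nt = len(h_px)
        tpred = [[sum(h_px[k] * ref[y0 + m][x0 + n + k] for k in range(Nt)) // T
                  for n in range(Ws)]
                 for m in range(Hs + Nt)]   # extra rows feed the vertical pass
        return [[sum(h_py[k] * tpred[m + k][n] for k in range(Nt)) // T
                 for n in range(Ws)]
                for m in range(Hs)]

And for the single-loop variant of (c), which folds the transmitted motion vector and the affine model into one coordinate computation per sub-block before reusing the separable filter:

    def single_loop_mc(ref, j, i, Wb, Hb, Ws, Hs, fs, iMVx, iMVy, affine, h, T, r, s):
        # h is assumed to be a table of filter tap sets indexed by sub-pel phase,
        # and the affine parameters A..F are assumed pre-scaled by r and s.
        A, B, C, D, E, F = affine
        pred = [[0] * Wb for _ in range(Hb)]
        for u in range(Hb // Hs):
            for v in range(Wb // Ws):
                ip = (i + u * Hs) * fs + iMVy   # motion adjusted location in
                jp = (j + v * Ws) * fs + iMVx   # fs sub-pel accuracy
                x = (A * jp + B * ip + ((C * fs) << r)) >> (r + 3)
                y = (D * jp + E * ip + ((F * fs) << s)) >> (s + 3)
                x0, y0 = x // fs, y // fs                  # integer-pel
                px, py = x & (fs - 1), y & (fs - 1)        # sub-pel phases
                sub = separable_mc_filter(ref, x0, y0, h[px], h[py], T, Ws, Hs)
                for m in range(Hs):
                    for n in range(Ws):
                        pred[u * Hs + m][v * Ws + n] = sub[m][n]
        return pred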
[0402] By another approach, a computer-implemented method for video
coding, comprising: obtaining frames of pixel data and having a
current frame and a decoded reference frame to use as a motion
compensation reference frame for the current frame; dividing the
reference frame into a plurality of portions that are less than the
area of the entire frame; performing dominant motion compensation
comprising applying local global motion compensation on at least
one of the portions by displacing the at least one portion of the
decoded reference frame by using global motion trajectories at a
boundary of the portion; and forming a prediction portion that
corresponds to a portion on the current frame, and by using the
pixel values of the displaced portion.
[0403] By yet another approach, the method may also be comprising
performing local global motion compensation on a plurality of the
portions by using a different set of global motion trajectories on
each portion of the plurality of portions; wherein each portion is
a tile, the method comprising dividing the frame into the tiles,
and wherein each tile has a set of global motion trajectories; the
method comprising providing the option to perform local global
motion compensation on a fraction of a tile in addition to entire
tiles; wherein local global motion compensation trajectories are
provided to half-tiles or quarter-tiles; the method comprising at
least one of: (a) grouping a plurality of the tiles into a region,
and applying the same global motion trajectories on the tiles
within the same region, and different sets of global motion
trajectories depending on the region, and (b) grouping a plurality
of the portions into a region, and applying the same global motion
trajectories on the portions within the same region, and different
sets of global motion trajectories depending on the region; wherein
each portion is shaped and sized depending on an object associated
with the portion; wherein the object is one of: a foreground, a
background, and an object moving in the frame; wherein the portion
is a rectangle placed about the object; the method comprising
forming a portion of the background of the reference frame, and a
portion of the foreground of the reference frame each with a
different set of local global motion trajectories for each
portion.
[0404] By a further example, a coder comprising: an image buffer;
and a graphics processing unit configured to: obtain frames of
pixel data and having a current frame and a decoded reference frame
to use as a motion compensation reference frame for the current
frame; divide the reference frame into a plurality of portions that
are less than the area of the entire frame; perform dominant motion
compensation comprising applying local global motion compensation
on at least one of the portions by displacing the at least one
portion of the decoded reference frame by using global motion
trajectories at a boundary of the portion; and form a prediction
portion that corresponds to a portion on the current frame and by
using the pixel values of the displaced portion.
[0405] By a further example, the coder may have the graphics
processing unit configured to: perform local global motion
compensation on a plurality of the portions by using a different
set of global motion trajectories on each portion of the plurality
of portions; wherein each portion is a tile, the graphics
processing unit configured to divide the frame into the tiles, and
wherein each tile has a set of global motion trajectories; the
graphics processing unit configured to provide the option to
perform local global motion compensation on a fraction of a tile in
addition to entire tiles; wherein local global motion compensation
trajectories are provided to half-tiles or quarter-tiles; the
graphics processing unit configured to at least one of: (a) group a
plurality of the tiles into a region, and apply the same global
motion trajectories on the tiles within the same region, and
different sets of global motion trajectories depending on the
region; and (b) group a plurality of the portions into a region,
and apply the same global motion trajectories on the portions
within the same region, and different sets of global motion
trajectories depending on the region; wherein each portion is
shaped and sized depending on an object associated with the
portion; wherein the object is one of: a foreground, a background,
and an object moving in the frame; wherein the portion is a
rectangle placed about the object; the graphics processing unit
configured to form a portion of the background of the reference
frame, and a portion of the foreground of the reference frame each
with a different set of local global motion trajectories for each
portion.
[0406] By yet another approach, a coder may comprise: an image
buffer; and a graphics processing unit configured to: obtain frames
of pixel data and having a current frame and a decoded reference
frame to use as a motion compensation reference frame for the
current frame; form a warped global compensated reference frame by
displacing at least one portion of the decoded reference frame by
using global motion trajectories; determine a motion vector
indicating the motion of the at least one portion and motion from a
position based on the warped global compensated reference frame to
a position at the current frame; and form a prediction portion
based, at least in part, on the motion vectors and corresponding to
a portion on the current frame.
[0407] By yet a further approach, the coder may comprise wherein
the at least one portion is at least one of: (1) a block of pixels
used as a unit to divide the current frame and the reference frame
into a plurality of the blocks; (2) at least one tile of pixels,
each tile being at least 64×64 pixels, and used as a unit to
divide the current frame and the reference frame into a plurality
of the tiles; the graphics processing unit of (2) being configured
to at least one of: (a) group tiles together based on common
association with an object in the frame to form the at least one
portion; and form a single motion vector for each group of tiles,
(b) group the tiles based on a merge map transmittable from an
encoder to a decoder; (3) a region of pixels shaped and sized
depending on an object associated with the region, wherein a
boundary of the region of (3) is at least one of: a shape that
resembles the shape of the object associated with the region, and a
rectangle placed around the object associated with the region;
wherein the region is associated with at least one of: a background
of the frame, a foreground of the frame, and a moving object in the
frame; the graphics processing unit being configured to define the
region based on a boundary map transmittable from an encoder to a
decoder; wherein form a warped global compensated reference frame
comprises using the global motion trajectories at the outer corners
of the frame; wherein form a warped global compensated reference
frame comprises using an affine or perspective global motion
compensation method. The coder wherein the at least one portion
comprises a frame divided into a background and a foreground, and
wherein determining motion vectors comprises providing the
background and foreground each with one motion vector; the graphics
processing unit configured to perform dominant motion compensation
comprising locally applied global motion compensation so that at
least one other set of global motion trajectories are used at
corners of at least one region on the frame that is less than the
entire frame to form a displaced region; and use the pixel values
of the displaced region to form a prediction region that
corresponds to a region on the current frame; the graphics
processing unit configured to at least one of: perform local global
motion compensation on multiple regions of the frame by using a
different set of global motion trajectories on each region; wherein
each region is a tile, and dividing the frame into the tiles, and
wherein each tile has a set of global motion trajectories; provide
the option to perform local global motion compensation on a
fraction of a tile in addition to entire tiles; wherein each region
is shaped and sized depending on an object associated with the
region; wherein the object is one of: a foreground, a background,
and an object moving in the frame; the graphics processing unit
being configured to provide the option on the at least one region
on a region-by-region basis to select a prediction formed by: (1) a
motion vector to form a prediction for the at least one region and
using global motion compensation applied to the entire frame, or
(2) apply local global motion compensation with a set of global
motion trajectories at the region and using displaced pixel values
of the region to form a prediction; the graphics processing unit
configured to apply local global motion compensation with a set of
global motion trajectories applied at a region of the reference
frame that has an area less than the entire reference frame, and
use motion vectors to form a prediction for the at least one
region; the graphics processing unit configured to provide the
option to select a mode for a frame among: (1) use the dominant
motion compensated reference frame prediction, (2) use blended
prediction of multiple dominant motion compensated reference
frames, (3) use dominant motion compensated reference with
differential translational motion vector for prediction, and (4)
use dominant motion compensated reference with differential
translational motion vector for prediction, blended with another
reference frame; the graphics processing unit configured to at
least one of (a) to (c):
[0408] (a) perform motion compensated morphed reference prediction
using bilinear interpolation and a motion compensation (MC) filter to
form a morphed reference frame MRef, with tPred_h as the
intermediate horizontal interpolation and Pred_ji as the final
motion compensated morphed reference prediction:

$$MRef[i'][j'] = \big((8-p_x)(8-p_y)\,Ref[y_0][x_0] + p_x(8-p_y)\,Ref[y_0][x_0+1] + p_y(8-p_x)\,Ref[y_0+1][x_0] + p_y p_x\,Ref[y_0+1][x_0+1] + 31\big) \gg 6$$

$$tPred_h[m][n] = \frac{1}{T'}\sum_{k=0}^{N_t-1} h[p_j][k]\,MRef[i'+m][j'+n+k-N_t/2+1], \quad m \in [-N_t/2+1,\ H_b+N_t/2-1],\ n \in [0,\ W_b-1]$$

$$Pred_{ji}[m][n] = \frac{1}{T'}\sum_{k=0}^{N_t-1} h[p_i][k]\,tPred_h[m+k-N_t/2+1][n], \quad m \in [0,\ H_b-1],\ n \in [0,\ W_b-1]$$
and where:
[0409] (iMVx, iMVy) is the transmitted motion vector in Sub-Pel
Unit (f.sub.s) for a block at (j, i) of size
(W.sub.b.times.H.sub.b); A, B, C, D, E, & F are affine
parameters calculated from the three Motion trajectories
transmitted; using separable motion compensation (MC) Filters with
filter coefficients h[f.sub.s][N.sub.t] of norm T, fs is the
Sub-Pel Factor (e.g. 2=Half Pel, 4=Quarter Pel, 8=Eighth Pel),
where N.sub.t is the number MC Filter Taps, and
i'=i+(iMVy/f.sub.s).
j'=j+(iMVx/f.sub.s)
p.sub.i=iMVy & (f.sub.s-1)
p.sub.j=iMvx & (f.sub.s-1)
[0410] where (j', i') is the integer-motion-adjusted current pixel
location in the morphed reference image, and p_j, p_i are the
1/8th-pel phases in the morphed reference image;
x = (A*j' + B*i' + (C << r)) >> r
y = (D*j' + E*i' + (F << s)) >> s
[0411] where (x, y) is the reference pixel coordinate in 1/8th-pel
accuracy for location (j', i');
p_y = y & 0x7
p_x = x & 0x7
y_0 = y >> 3
x_0 = x >> 3
[0412] where (x_0, y_0) is the integer-pel location in the
reference image (Ref) and p_x, p_y are the 1/8th-pel phases;
MRef[i'][j'] = ((8-p_x)*(8-p_y)*Ref[y_0][x_0] + p_x*(8-p_y)*Ref[y_0][x_0+1] + p_y*(8-p_x)*Ref[y_0+1][x_0] + p_y*p_x*Ref[y_0+1][x_0+1] + 31) >> 6
tPred_h[m][n] = SUM_k(h[p_j][k] * MRef[i'+m][j'+n+k]) / T,
[0413] where m = [-N_t/2+1, H_b+N_t/2-1], n = [0, W_b-1], and
k = [-N_t/2+1, N_t/2];
Pred_ji[m][n] = SUM_k(h[p_i][k] * tPred_h[m+k][n]) / T,
[0414] where m = [0, H_b-1], n = [0, W_b-1], and
k = [-N_t/2+1, N_t/2];
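For concreteness, the following minimal Python sketch transcribes the bilinear fetch of (a): the affine map of [0410], the phase and integer-pel split of [0411] and [0412], and the four-tap blend forming one MRef sample. It assumes a padded 2-D reference plane `ref` and pre-scaled affine parameters A through F with shifts r and s; all names are illustrative, not the patent's.

```python
def morphed_ref_sample(ref, jp, ip, A, B, C, D, E, F, r, s):
    """One sample MRef[ip][jp] of the morphed reference: affine map to
    1/8-pel coordinates, then a bilinear blend of four integer-pel taps."""
    # Affine mapping of the motion-adjusted location (jp, ip) to
    # 1/8-pel reference coordinates, as in paragraph [0410].
    x = (A * jp + B * ip + (C << r)) >> r
    y = (D * jp + E * ip + (F << s)) >> s
    px, py = x & 0x7, y & 0x7        # 1/8-pel phases
    x0, y0 = x >> 3, y >> 3          # integer-pel base location
    # The four bilinear weights sum to 64, so the +31 offset and >> 6
    # implement rounded division by 64 (offset value as given above).
    return ((8 - px) * (8 - py) * ref[y0][x0]
            + px * (8 - py) * ref[y0][x0 + 1]
            + py * (8 - px) * ref[y0 + 1][x0]
            + py * px * ref[y0 + 1][x0 + 1] + 31) >> 6
```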
[0415] (b) perform morphed reference prediction using block motion
compensation (MC) filtering to form a morphed reference frame MRef,
with tPred_h as the intermediate horizontal interpolation:
$$t\mathrm{Pred}_h[m][n] = \frac{1}{T}\sum_{k=0}^{N_t-1} h[p_x][k]\,\mathrm{Ref}[y_0+m]\!\left[x_0+n+k-\tfrac{N_t}{2}+1\right], \quad m = \left[-\tfrac{N_t}{2}+1,\ H_s+\tfrac{N_t}{2}-1\right],\ n = [0,\ W_s-1]$$
$$\mathrm{MRef}[i+m][j+n] = \frac{1}{T}\sum_{k=0}^{N_t-1} h[p_y][k]\, t\mathrm{Pred}_h\!\left[m+k-\tfrac{N_t}{2}+1\right][n], \quad m = [0,\ H_s-1],\ n = [0,\ W_s-1]$$
[0416] and where A, B, C, D, E, and F are affine parameters
calculated from the three motion trajectories transmitted, using
separable MC filters with filter coefficients h[f_s][N_t] of norm
T, where f_s is the sub-pel factor (e.g., 2 = half pel, 4 = quarter
pel, 8 = eighth pel) and N_t is the number of MC filter taps;
x = (A*j + B*i + (C << r)) >> r
y = (D*j + E*i + (F << s)) >> s
[0417] where (j, i) is every (W_s × H_s) sub-block location in the
current image, and x and y are reference pixel coordinates in
1/8th-pel accuracy;
p_y = y & 0x7
p_x = x & 0x7
y_0 = y >> 3
x_0 = x >> 3
[0418] where (x_0, y_0) is the integer-pel location in the
reference frame (Ref image) and p_x, p_y are the 1/8th-pel phases.
tPred_h[m][n] = SUM_k(h[p_x][k] * Ref[y_0+m][x_0+n+k]) / T,
[0419] where m = [-N_t/2+1, H_s+N_t/2-1], n = [0, W_s-1], and
k = [-N_t/2+1, N_t/2]; and
MRef[i+m][j+n] = SUM_k(h[p_y][k] * tPred_h[m+k][n]) / T,
[0420] where m = [0, H_s-1], n = [0, W_s-1], and
k = [-N_t/2+1, N_t/2];
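A minimal sketch of the separable filtering in (b), assuming a reference plane `ref` padded so every tap is in range, a filter bank `h[phase][tap]` of norm T with N_t taps, and a sub-block anchored at integer-pel (x0, y0); names are illustrative, and floor division stands in for the normalization by T:

```python
def mc_filter_subblock(ref, x0, y0, px, py, h, T, Nt, Ws, Hs):
    """Horizontal then vertical Nt-tap filtering of one Ws x Hs
    sub-block, mirroring the tPred_h / MRef pair above."""
    # Horizontal pass at phase px; Nt - 1 extra rows are produced so
    # the vertical pass has full support.
    tpred = [[sum(h[px][k] * ref[y0 + m][x0 + n + k - Nt // 2 + 1]
                  for k in range(Nt)) // T
              for n in range(Ws)]
             for m in range(-Nt // 2 + 1, Hs + Nt // 2)]
    # Vertical pass at phase py over the intermediate rows; list row
    # m + k of tpred corresponds to source row m + k - Nt/2 + 1.
    return [[sum(h[py][k] * tpred[m + k][n] for k in range(Nt)) // T
             for n in range(Ws)]
            for m in range(Hs)]
```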
[0421] and (c) perform motion compensated morphed reference
prediction using single-loop motion compensation (MC) filtering to
form a morphed reference (MRef) and predictions, with tPred_h as
the intermediate horizontal interpolation and Pred_ji as the final
motion compensated morphed reference prediction for a block of size
W_b × H_b at (j, i):
$$t\mathrm{Pred}_h[m][n] = \frac{1}{T}\sum_{k=0}^{N_t-1} h[p_x][k]\,\mathrm{Ref}[y_0+m]\!\left[x_0+n+k-\tfrac{N_t}{2}+1\right], \quad m = \left[-\tfrac{N_t}{2}+1,\ H_s+\tfrac{N_t}{2}-1\right],\ n = [0,\ W_s-1]$$
$$\mathrm{Pred}_{ji}[uH_s+m][vW_s+n] = \frac{1}{T}\sum_{k=0}^{N_t-1} h[p_y][k]\, t\mathrm{Pred}_h\!\left[m+k-\tfrac{N_t}{2}+1\right][n], \quad m = [0,\ H_s-1],\ n = [0,\ W_s-1],\ u = [0,\ H_b/H_s-1],\ v = [0,\ W_b/W_s-1]$$
[0422] and where:
[0423] (iMVx, iMVy) is the transmitted motion vector in sub-pel
units (f_s) for a block at (j, i) of size (W_b × H_b); A, B, C, D,
E, and F are affine parameters calculated from the three motion
trajectories transmitted, using separable MC filters with filter
coefficients h[f_s][N_t] of norm T, where f_s is the sub-pel factor
(e.g., 2 = half pel, 4 = quarter pel, 8 = eighth pel) and N_t is
the number of MC filter taps;
i' = (i + u*H_s)*f_s + iMVy
j' = (j + v*W_s)*f_s + iMVx
[0424] where (j, i) is the current block pixel location, (u, v) is
the index of each (W_s × H_s) sub-block within the given current
block of size (W_b × H_b), and (j', i') is the motion-adjusted
current pixel location in f_s sub-pel accuracy;
x = (A*j' + B*i' + ((C*f_s) << r)) >> (r+3)
y = (D*j' + E*i' + ((F*f_s) << s)) >> (s+3)
[0425] where x and y are reference pixel coordinates in f_s sub-pel
accuracy;
p_y = y & (f_s - 1)
p_x = x & (f_s - 1)
y_0 = y / f_s
x_0 = x / f_s
[0426] where (y_0, x_0) is the integer-pel location in the
reference image and p_x, p_y are the f_s sub-pel phases;
tPred_h[m][n] = SUM_k(h[p_x][k] * Ref[y_0+m][x_0+n+k]) / T,
where m = [-N_t/2+1, H_s+N_t/2-1], n = [0, W_s-1], and
k = [-N_t/2+1, N_t/2];
Pred_ji[u*H_s+m][v*W_s+n] = SUM_k(h[p_y][k] * tPred_h[m+k][n]) / T,
where m = [0, H_s-1], n = [0, W_s-1], k = [-N_t/2+1, N_t/2],
u = [0, H_b/H_s-1], and v = [0, W_b/W_s-1].
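The single-loop variant in (c) differs from (a) and (b) only in how the coordinates are formed: the translational motion vector and the affine warp are folded together before one filtering pass. A minimal sketch of that coordinate setup follows, with all names illustrative; the returned values feed the same separable filter shown in the sketch after (b):

```python
def single_loop_coords(j, i, u, v, iMVx, iMVy, Ws, Hs, fs,
                       A, B, C, D, E, F, r, s):
    """Combine the transmitted MV and the affine warp into one set of
    sub-pel reference coordinates for sub-block (u, v) of the block
    at (j, i), as in paragraphs [0423]-[0426]."""
    # Motion-adjusted current pixel location in fs sub-pel accuracy.
    ip = (i + u * Hs) * fs + iMVy
    jp = (j + v * Ws) * fs + iMVx
    # Affine mapping; the extra >> 3 folds the model's 1/8-pel scaling
    # into fs sub-pel accuracy.
    x = (A * jp + B * ip + ((C * fs) << r)) >> (r + 3)
    y = (D * jp + E * ip + ((F * fs) << s)) >> (s + 3)
    px, py = x & (fs - 1), y & (fs - 1)   # fs sub-pel phases
    x0, y0 = x // fs, y // fs             # integer-pel base location
    return x0, y0, px, py
```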
[0427] By one implementation, at least one computer readable memory
comprises instructions that, when executed by a computing device,
cause the computing device to: obtain frames of pixel data,
including a current frame and a decoded reference frame to use as a
motion compensation reference frame for the current frame; divide
the reference frame into a plurality of portions that are each less
than the area of the entire frame; perform dominant motion
compensation comprising applying local global motion compensation
on at least one of the portions by displacing the at least one
portion of the decoded reference frame by using global motion
trajectories at a boundary of the portion; and form a prediction
portion that corresponds to a portion on the current frame by using
the pixel values of the displaced portion.
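As a rough illustration of [0427], the sketch below displaces one rectangular portion of a reference frame using corner displacement trajectories. The three-corner affine interpolation and nearest-pel sampling are simplifying assumptions standing in for the sub-pel filtering described above; all names are illustrative.

```python
import numpy as np

def warp_portion(ref, x0, y0, w, h, traj):
    """Displace one w x h portion anchored at (x0, y0) using (dx, dy)
    trajectories at its top-left, top-right and bottom-left corners."""
    (dxa, dya), (dxb, dyb), (dxc, dyc) = traj
    out = np.zeros((h, w), dtype=ref.dtype)
    for yy in range(h):
        for xx in range(w):
            fx = xx / max(w - 1, 1)
            fy = yy / max(h - 1, 1)
            # Affine interpolation of the corner displacements.
            dx = dxa + fx * (dxb - dxa) + fy * (dxc - dxa)
            dy = dya + fx * (dyb - dya) + fy * (dyc - dya)
            # Nearest-pel sample, clamped to the frame bounds.
            sx = min(max(int(round(x0 + xx + dx)), 0), ref.shape[1] - 1)
            sy = min(max(int(round(y0 + yy + dy)), 0), ref.shape[0] - 1)
            out[yy, xx] = ref[sy, sx]
    return out
```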
[0428] By another implementation, the computer readable memory may
also include instructions that cause the computing device to:
perform local global motion compensation on a plurality of the
portions by using a different set of global motion trajectories on
each portion of the plurality of portions; wherein each portion is
a tile, the instructions causing the computing device to divide the
frame into the tiles, and wherein each tile has a set of global
motion trajectories; the instructions causing the computing device
to provide the option to perform local global motion compensation
on a fraction of a tile in addition to entire tiles; wherein local
global motion compensation trajectories are provided to half-tiles
or quarter-tiles; the instructions causing the computing device to
perform at least one of: (a) group a plurality of the tiles into a
region, and apply the same global motion trajectories on the tiles
within the same region and different sets of global motion
trajectories depending on the region; and (b) group a plurality of
the portions into a region, and apply the same global motion
trajectories on the portions within the same region and different
sets of global motion trajectories depending on the region; wherein
each portion is shaped and sized depending on an object associated
with the portion; wherein the object is one of: a foreground, a
background, and an object moving in the frame; wherein the portion
is a rectangle placed about the object; and the instructions
causing the computing device to form a portion of the background of
the reference frame and a portion of the foreground of the
reference frame, each with a different set of local global motion
trajectories.
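A trivial sketch of the per-region assignment in [0428], assuming a tile-to-region map and a per-region trajectory table; both names are illustrative assumptions (the patent leaves the data layout open):

```python
def trajectories_for_tile(tile_row, tile_col, region_map, region_traj):
    """region_map[r][c] gives a region id for each tile; region_traj
    maps a region id to that region's set of global motion
    trajectories, so every tile in one region shares one set."""
    return region_traj[region_map[tile_row][tile_col]]
```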
[0429] By a further example, at least one computer readable memory
comprises instructions that, when executed by a computing device,
cause the computing device to: obtain frames of pixel data,
including a current frame and a decoded reference frame to use as a
motion compensation reference frame for the current frame; form a
warped global compensated reference frame by displacing at least
one portion of the decoded reference frame by using global motion
trajectories; determine a motion vector indicating the motion of
the at least one portion from a position based on the warped global
compensated reference frame to a position at the current frame; and
form a prediction portion based, at least in part, on the motion
vector and corresponding to a portion on the current frame.
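To make [0429] concrete, a hedged sketch of finding such a motion vector by exhaustive SAD search of the warped (global motion compensated) reference around a block's position; the search range and the SAD cost are illustrative choices, not the patent's motion estimator.

```python
import numpy as np

def best_mv(cur_block, warped_ref, bx, by, search=8):
    """Return the integer (dx, dy) minimizing SAD between cur_block
    and the warped reference, searched around (bx, by)."""
    h, w = cur_block.shape
    best, best_sad = (0, 0), None
    src = cur_block.astype(np.int64)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + h > warped_ref.shape[0] or \
               x + w > warped_ref.shape[1]:
                continue
            sad = int(np.abs(src - warped_ref[y:y + h, x:x + w]).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best = sad, (dx, dy)
    return best
```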
[0430] By yet a further example, the computer readable memory may
also comprise wherein the at least one portion is at least one of:
(1) a block of pixels used as a unit to divide the current frame
and the reference frame into a plurality of the blocks; (2) at
least one tile of pixels, each tile being at least 64 × 64 pixels
and used as a unit to divide the current frame and the reference
frame into a plurality of the tiles, the instructions causing the
computing device of (2) to at least one of: (a) group tiles
together based on common association with an object in the frame to
form the at least one portion, and form a single motion vector for
each group of tiles, or (b) group the tiles based on a merge map
transmittable from an encoder to a decoder; and (3) a region of
pixels shaped and sized depending on an object associated with the
region, wherein a boundary of the region is at least one of: a
shape that resembles the shape of the object associated with the
region, and a rectangle placed around the object associated with
the region; wherein the region is associated with at least one of:
a background of the frame, a foreground of the frame, and a moving
object in the frame; the instructions causing the computing device
to define the region based on a boundary map transmittable from an
encoder to a decoder; wherein forming a warped global compensated
reference frame comprises using the global motion trajectories at
the outer corners of the frame; wherein forming a warped global
compensated reference frame comprises using an affine or
perspective global motion compensation method. The memory may also
include wherein the at least one portion comprises a frame divided
into a background and a foreground, and wherein determining motion
vectors comprises providing the background and the foreground each
with one motion vector; the instructions causing the computing
device to perform dominant motion compensation comprising locally
applied global motion compensation so that at least one other set
of global motion trajectories is used at corners of at least one
region on the frame that is less than the entire frame to form a
displaced region, and to use the pixel values of the displaced
region to form a prediction region that corresponds to a region on
the current frame; the instructions causing the computing device to
at least one of: (a) perform local global motion compensation on
multiple regions of the frame by using a different set of global
motion trajectories on each region; (b) divide the frame into
tiles, wherein each region is a tile and each tile has a set of
global motion trajectories; and (c) provide the option to perform
local global motion compensation on a fraction of a tile in
addition to entire tiles; wherein each region is shaped and sized
depending on an object associated with the region; wherein the
object is one of: a foreground, a background, and an object moving
in the frame; the instructions causing the computing device to
provide the option, on a region-by-region basis, to select a
prediction formed by: (1) using a motion vector to form a
prediction for the at least one region with global motion
compensation applied to the entire frame, or (2) applying local
global motion compensation with a set of global motion trajectories
at the region and using the displaced pixel values of the region to
form a prediction; the instructions causing the computing device to
apply local global motion compensation with a set of global motion
trajectories applied at a region of the reference frame that has an
area less than the entire reference frame, and to use motion
vectors to form a prediction for the at least one region; and the
instructions causing the computing device to provide the option to
select a mode for a frame among: (1) use the dominant motion
compensated reference frame prediction, (2) use blended prediction
of multiple dominant motion compensated reference frames, (3) use
dominant motion compensated reference with differential
translational motion vector for prediction, and (4) use dominant
motion compensated reference with differential translational motion
vector for prediction, blended with another reference frame.
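A hedged sketch of the per-frame mode choice among the four options above: each candidate prediction is built and the cheapest under a simple SAD cost is kept. A real encoder would use a rate-distortion cost; the candidate set, the mode names, and the cost here are illustrative assumptions.

```python
import numpy as np

def pick_dominant_motion_mode(src_frame, candidates):
    """candidates: dict mapping a mode name (e.g. 'dominant',
    'blended', 'dominant+mv', 'dominant+mv+blend') to a predicted
    frame; returns the mode with the lowest SAD against the source."""
    src = src_frame.astype(np.int64)
    return min(candidates,
               key=lambda m: int(np.abs(src - candidates[m]).sum()))
```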
[0431] In a further example, at least one machine readable medium
may include a plurality of instructions that, in response to being
executed on a computing device, cause the computing device to
perform the method according to any one of the above examples.
[0432] In a still further example, an apparatus may include means
for performing the methods according to any one of the above
examples.
[0433] The above examples may include specific combinations of
features. However, the above examples are not limited in this
regard and, in various implementations, the above examples may
include undertaking only a subset of such features, undertaking a
different order of such features, undertaking a different
combination of such features, and/or undertaking additional
features beyond those features explicitly listed. For example, all
features described with respect to the example methods may be
implemented with respect to the example apparatus, the example
systems, and/or the example articles, and vice versa.
* * * * *