U.S. patent application number 17/109972 was filed with the patent office on 2021-03-25 for motion candidate derivation based on spatial neighboring block in sub-block motion vector prediction.
The applicants listed for this patent are Beijing Bytedance Network Technology Co., Ltd. and Bytedance Inc. The invention is credited to Hongbin LIU, Yue WANG, Kai ZHANG, and Li ZHANG.
United States Patent Application: 20210092436
Kind Code: A1
ZHANG; Li; et al.
March 25, 2021
MOTION CANDIDATE DERIVATION BASED ON SPATIAL NEIGHBORING BLOCK IN
SUB-BLOCK MOTION VECTOR PREDICTION
Abstract
Devices, systems and methods for the simplification of sub-block
motion candidate lists for video coding are described. In a
representative aspect, a method for video processing includes
determining, during a conversion between a current block and a
bitstream representation of the current block, a temporal motion
vector prediction candidate for a sub-block of the current block.
The temporal motion vector prediction candidate is completely
determined based on K neighboring blocks of the current block, K
being a positive integer. The method also includes performing the
conversion based on the temporal motion vector prediction candidate
for the sub-block.
Inventors: ZHANG; Li (San Diego, CA); ZHANG; Kai (San Diego, CA); LIU; Hongbin (Beijing, CN); WANG; Yue (Beijing, CN)

Applicants:
Beijing Bytedance Network Technology Co., Ltd.; Beijing; CN
Bytedance Inc.; Los Angeles, CA; US

Family ID: 1000005266128
Appl. No.: 17/109972
Filed: December 2, 2020
Related U.S. Patent Documents
Application Number: PCT/IB2019/059109, filed Oct 24, 2019 (parent of the present application, 17/109972)
Current U.S. Class: 1/1
Current CPC Class: H04N 19/52 20141101; H04N 19/176 20141101; H04N 19/184 20141101
International Class: H04N 19/52 20060101; H04N 19/176 20060101; H04N 19/184 20060101
Foreign Application Data
Date | Code | Application Number
Oct 24, 2018 | CN | PCT/CN2018/111587
Dec 28, 2018 | CN | PCT/CN2018/124984
Claims
1. A method for video processing, comprising: constructing, during
a conversion between a current block of visual media data and a
bitstream representation of the current block, a merge candidate
list, wherein the current block comprises at least one subblock;
checking one spatial neighboring block in a pre-defined relative
position compared to the current block to determine a temporal
motion vector offset to locate at least one collocated region in a
collocated picture, wherein the at least one collocated region is
used to determine a temporal motion vector prediction candidate to
be added in the merge candidate list; determining, based on the
merge candidate list, motion information for the at least one
subblock; and performing the conversion based on the motion
information.
2. The method of claim 1, wherein the merge candidate list is a
subblock merge candidate list and the temporal motion vector
prediction candidate comprises a subblock based temporal motion
vector prediction candidate.
3. The method of claim 1, wherein a position relation of the
current block with respect to the spatial neighboring block is the same
as a position relation of a video block with respect to a spatial
neighboring block checked in a non-subblock merge candidate list
construction process of the video block.
4. The method of claim 1, wherein the spatial neighboring block is
adjacent to a bottom-left corner of the current block.
5. The method of claim 1, wherein the spatial neighboring block is
a spatial neighboring block A1.
6. The method of claim 5, wherein the spatial neighboring block A1
covers a luma location (xCb-1, yCb+cbHeight-1), wherein (xCb,yCb)
is a luma location of the top-left sample of the current block
relative to the top-left luma sample of a current picture
comprising the current block and cbHeight is a height of the
current block.
7. The method of claim 6, wherein the temporal motion vector offset
is determined without checking any spatial neighboring block other
than the spatial neighboring block A1.
8. The method of claim 1, wherein the spatial neighboring block is
coded prior to performing the conversion of the current block.
9. The method of claim 1, wherein the spatial neighboring block
which is determined to be available according to the checking
result is within a same tile as the current block.
10. The method of claim 2, wherein the temporal motion vector
prediction candidate is used to derive motion information for the
at least one sub-block of the current block.
11. The method of claim 1, wherein a size of a sub-block of the
current block is 8×8.
12. The method of claim 1, wherein a size of a sub-block of the
current block is the same as a block size.
13. The method of claim 1, wherein checking the spatial neighboring
block comprises: determining whether the spatial neighboring block
is available to determine the temporal motion vector offset.
14. The method of claim 1, wherein the conversion comprises
encoding the current block into the bitstream representation.
15. The method of claim 1, wherein the conversion comprises
decoding the current block from the bitstream representation.
16. An apparatus for processing video data comprising a processor
and a non-transitory memory with instructions thereon, wherein the
instructions upon execution by the processor, cause the processor
to: construct, during a conversion between a current block of
visual media data and a bitstream representation of the current
block, a merge candidate list, wherein the current block comprises
at least one subblock; check one spatial neighboring block in a
pre-defined relative position compared to the current block to
determine a temporal motion vector offset to locate at least one
collocated region in a collocated picture, wherein the at least one
collocated region is used to determine a temporal motion vector
prediction candidate to be added in the merge candidate list;
determine, based on the merge candidate list, motion information
for the at least one subblock; and perform the conversion based on
the motion information.
17. The apparatus of claim 16, wherein the spatial neighboring
block is adjacent to a bottom-left corner of the current block.
18. The apparatus of claim 16, wherein the spatial neighboring
block is a spatial neighboring block A1.
19. A non-transitory computer-readable storage medium storing
instructions that cause a processor to: construct, during a
conversion between a current block of visual media data and a
bitstream representation of the current block, a merge candidate
list, wherein the current block comprises at least one subblock;
check one spatial neighboring block in a pre-defined relative
position compared to the current block to determine a temporal
motion vector offset to locate at least one collocated region in a
collocated picture, wherein the at least one collocated region is
used to determine a temporal motion vector prediction candidate to
be added in the merge candidate list; determine, based on the
merge candidate list, motion information for the at least one
subblock; and perform the conversion based on the motion
information.
20. A non-transitory computer-readable recording medium storing a
bitstream representation which is generated by a method performed
by a video processing apparatus, wherein the method comprises:
constructing, during a conversion between a current block of visual
media data and a bitstream representation of the current block, a
merge candidate list, wherein the current block comprises at least
one subblock; checking one spatial neighboring block in a
pre-defined relative position compared to the current block to
determine a temporal motion vector offset to locate at least one
collocated region in a collocated picture, wherein the at least one
collocated region is used to determine a temporal motion vector
prediction candidate to be added in the merge candidate list;
determining, based on the merge candidate list, motion information
for the at least one subblock; and performing the conversion based
on the motion information.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of International
Application No. PCT/IB2019/059109, filed on Oct. 24, 2019, which
claims priority to and the benefit of International Patent
Application No. PCT/CN2018/111587, filed on Oct. 24, 2018 and
International Patent Application No. PCT/CN2018/124984, filed on
Dec. 28, 2018. All the aforementioned patent applications are
hereby incorporated by reference in their entireties.
TECHNICAL FIELD
[0002] This patent document is directed generally to image and
video coding technologies.
BACKGROUND
[0003] In spite of the advances in video compression, digital video
still accounts for the largest bandwidth use on the internet and
other digital communication networks. As the number of connected
user devices capable of receiving and displaying video increases,
it is expected that the bandwidth demand for digital video usage
will continue to grow.
SUMMARY
[0004] Devices, systems and methods related to digital video
coding, and specifically, to simplifying sub-block motion candidate
lists for video coding are described. The described methods may be
applied to both the existing video coding standards (e.g., High
Efficiency Video Coding (HEVC)) and future video coding standards
or video codecs.
[0005] In one representative aspect, the disclosed technology can
be used to provide a method for video processing. The method
includes determining, during a conversion between a current block
of visual media data and a bitstream representation of the current
block, a temporal motion vector prediction candidate for at least a
sub-block of the current block and performing the conversion based
on the temporal motion vector prediction candidate for the
sub-block. The temporal motion vector prediction candidate is
determined based on K neighboring blocks of the current block, K
being a positive integer.
[0006] In another representative aspect, the disclosed technology
may be used to provide a method for video processing. This method
includes determining, during a conversion between a current block
of a video and a bitstream representation of the video, a temporal
motion vector prediction candidate based on a temporal neighboring
block of the current block. The temporal neighboring block is
identified based on motion information of a spatial neighboring
block selected from one or more spatial neighboring blocks that are
different from at least one spatial neighboring block used in a
merge list construction process of a video block. The method also
includes performing the conversion based on the temporal motion
vector prediction candidate.
[0007] In another representative aspect, the disclosed technology
may be used to provide a method for video processing. This method
includes maintaining, for a conversion between a current block of a
video and a bitstream representation of the video, a table of
motion candidates based on past conversions of the video and the
bitstream representation; deriving a temporal motion vector
prediction candidate based on the table of motion candidates; and
performing the conversion based on the temporal motion vector
prediction candidate.
[0008] In another representative aspect, the disclosed technology
may be used to provide a method for video processing. This method
includes determining, for a conversion between a current block of a
video and a bitstream representation of the video, one or more
temporal motion vector prediction candidates for the current block
and performing the conversion based on the one or more temporal
motion vector prediction candidates. The one or more temporal
motion vector prediction candidates can be determined by
identifying a first temporal adjacent block of the current block
based on an initial motion vector, wherein the first temporal
adjacent block includes invalid motion information, and examining
additional temporal adjacent blocks to obtain the one or more
temporal motion vector prediction candidates.
[0009] In another representative aspect, the disclosed technology
may be used to provide a method for video processing. This method
includes determining, for a conversion between a current block of a
video and a bitstream representation of the video, one or more
temporal motion vector prediction candidates for the current block.
The one or more temporal motion vector prediction candidates
comprise a default temporal motion vector prediction candidate. The
method also includes performing the conversion based on the one or
more temporal motion vector prediction candidates.
[0010] In another representative aspect, the disclosed technology
may be used to provide a method for video processing. This method
includes determining, for a conversion between a current block of a
video and a bitstream representation of the video, a sub-block
level merge candidate list that includes at least one sub-block
coding type. The method also includes performing the conversion
based on the sub-block level merge candidate list.
[0011] In another representative aspect, the disclosed technology
may be used to provide a method for video processing that includes
determining, for a conversion between a current block of a video
and a bitstream representation of the video, a sub-block level
coding technique based on an indication that is signaled in a
picture header, a picture parameter set (PPS), a slice header, or a
tile group header. The method also includes performing the
conversion based on the sub-block level coding technique.
[0012] In another representative aspect, the disclosed technology
may be used to provide a method for video processing that includes
determining, for a conversion between a current block of a video
and a bitstream representation of the video, a sub-block level
temporal motion candidate using a derivation process applicable to
a block level temporal motion vector prediction candidate
conversion between the current block and the bitstream
representation, and performing the conversion based on the
sub-block level temporal motion candidate.
[0013] In another representative aspect, the disclosed technology
may be used to provide a method for video processing that includes
determining, for a conversion between a current block of a video
and a bitstream representation of the video, a block level temporal
motion vector prediction candidate using a derivation process
applicable to a sub-block level temporal motion candidate
conversion between the current block and the bitstream
representation, and performing the conversion based on the block
level temporal motion vector prediction candidate.
[0014] In another representative aspect, the disclosed technology
may be used to provide a method for video processing. This method
includes selecting, for sub-block level processing of a current
video block, motion information associated with a spatial
neighboring block, deriving, based on the motion information, a
motion vector prediction candidate, adding the motion vector
prediction candidate to a sub-block based merge list that is
different from a merge list, where the sub-block based merge list
excludes block-level prediction candidates, and reconstructing the
current video block or decoding other video blocks based on the
motion vector prediction candidate.
[0015] In another representative aspect, the disclosed technology
may be used to provide a method for video processing. This method
includes deriving, for sub-block level processing of a current
video block, a motion vector prediction candidate, assigning a
merge index to a type of the motion vector prediction candidate,
and adding the motion vector prediction candidate and the merge
index to a sub-block based merge list that is different from a
merge list, where the sub-block based merge list excludes
block-level prediction candidates.
[0016] In yet another representative aspect, the disclosed
technology may be used to provide a method for video processing.
This method includes deriving, for sub-block level processing of a
current video block, a motion vector prediction candidate, and
adding, based on an adaptive ordering, the motion vector prediction
candidate to a sub-block based merge list that is different from a
merge list, where the sub-block based merge list excludes
block-level prediction candidates.
[0017] In another example aspect, a method of video processing is
disclosed. The method includes determining a default motion
candidate for a sub-block based coding mode for a conversion
between a current video block and a bitstream representation of the
current video block using one of the following: (a) a
uni-prediction candidate that is derived by scaling a starting
motion candidate to a reference picture index within a reference
picture list X; or (b) a bi-prediction candidate that is derived by
scaling to reference picture indexes within two reference picture
lists; or (c) a candidate in either (a) or (b) depending on a picture
type or a slice type of the current video block; or (d) a candidate
derived for a temporal motion vector predictor (TMVP) process of
the current video block.
[0018] In yet another representative aspect, the above-described
method is embodied in the form of processor-executable code and
stored in a computer-readable program medium.
[0019] In yet another representative aspect, a device that is
configured or operable to perform the above-described method is
disclosed. The device may include a processor that is programmed to
implement this method.
[0020] In yet another representative aspect, a video decoder
apparatus may implement a method as described herein.
[0021] The above and other aspects and features of the disclosed
technology are described in greater detail in the drawings, the
description and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] FIG. 1 shows an example of sub-block based prediction.
[0023] FIGS. 2A and 2B show examples of the simplified 4-parameter
affine model and the simplified 6-parameter affine model,
respectively.
[0024] FIG. 3 shows an example of an affine motion vector field
(MVF) per sub-block.
[0025] FIGS. 4A and 4B show example candidates for the AF_MERGE
affine motion mode.
[0026] FIG. 5 shows an example of candidate positions for affine
merge mode.
[0027] FIG. 6 shows another example of candidate positions for
affine merge mode.
[0028] FIG. 7 shows an example of one coding unit (CU) with
sub-blocks and neighboring blocks of the CU.
[0029] FIG. 8 shows yet another example of candidate positions for
affine merge mode.
[0030] FIG. 9 shows an example of spatial neighboring blocks used for alternative temporal motion vector prediction (ATMVP) temporal block identification.
[0031] FIG. 10 shows an example of identifying an alternative
starting point for ATMVP.
[0032] FIG. 11 shows a flowchart of an example method for video
coding in accordance with the disclosed technology.
[0033] FIG. 12 shows a flowchart of another example method for
video coding in accordance with the disclosed technology.
[0034] FIG. 13 shows a flowchart of yet another example method for
video coding in accordance with the disclosed technology.
[0035] FIG. 14 is a block diagram of an example of a hardware
platform for implementing a visual media decoding or a visual media
encoding technique described in the present document.
[0036] FIG. 15 shows an example of how to identify the represented
block for default motion derivation.
[0037] FIG. 16 is a block diagram of an example video processing
system in which disclosed techniques may be implemented.
[0038] FIG. 17 is a flowchart representation of a method for video
processing in accordance with the present disclosure.
[0039] FIG. 18 is a flowchart representation of another method for
video processing in accordance with the present disclosure.
[0040] FIG. 19 is a flowchart representation of another method for
video processing in accordance with the present disclosure.
[0041] FIG. 20 is a flowchart representation of another method for
video processing in accordance with the present disclosure.
[0042] FIG. 21 is a flowchart representation of another method for
video processing in accordance with the present disclosure.
[0043] FIG. 22 is a flowchart representation of another method for
video processing in accordance with the present disclosure.
[0044] FIG. 23 is a flowchart representation of another method for
video processing in accordance with the present disclosure.
[0045] FIG. 24A is a flowchart representation of another method for
video processing in accordance with the present disclosure.
[0046] FIG. 24B is a flowchart representation of yet another method
for video processing in accordance with the present disclosure.
DETAILED DESCRIPTION
[0047] Due to the increasing demand for higher-resolution video, video coding methods and techniques are ubiquitous in modern technology. Video codecs typically include an electronic circuit or
software that compresses or decompresses digital video, and are
continually being improved to provide higher coding efficiency. A
video codec converts uncompressed video to a compressed format or
vice versa. There are complex relationships between the video
quality, the amount of data used to represent the video (determined
by the bit rate), the complexity of the encoding and decoding
algorithms, sensitivity to data losses and errors, ease of editing,
random access, and end-to-end delay (latency). The compressed
format usually conforms to a standard video compression
specification, e.g., the High Efficiency Video Coding (HEVC)
standard (also known as H.265 or MPEG-H Part 2), the Versatile
Video Coding (VVC) standard to be finalized, or other current
and/or future video coding standards.
[0048] Sub-block based prediction was first introduced into video coding standards by the High Efficiency Video Coding (HEVC) standard. With sub-block based prediction, a block, such as a Coding Unit (CU) or a Prediction Unit (PU), is divided into several non-overlapping sub-blocks. Different sub-blocks may be assigned
different motion information, such as reference index or motion
vector (MV), and motion compensation (MC) is performed individually
for each sub-block. FIG. 1 shows an example of sub-block based
prediction.
[0049] Embodiments of the disclosed technology may be applied to
existing video coding standards (e.g., HEVC, H.265) and future
standards to reduce hardware implementation complexity or improve
coding performance. Section headings are used in the present
document to improve readability of the description and do not in
any way limit the discussion or the embodiments (and/or
implementations) to the respective sections only.
1. Examples of the Joint Exploration Model (JEM)
[0050] In some embodiments, future video coding technologies are
explored using a reference software known as the Joint Exploration
Model (JEM). In JEM, sub-block based prediction is adopted in
several coding tools, such as affine prediction, alternative
temporal motion vector prediction (ATMVP), spatial-temporal motion
vector prediction (STMVP), bi-directional optical flow (BIO), and Frame-Rate Up Conversion (FRUC). Affine prediction has also been
adopted into VVC.
1.1 Examples of Affine Prediction
[0051] In HEVC, only a translation motion model is applied for motion compensation prediction (MCP). In the real world, however, there are many kinds of motion, e.g., zooming in/out, rotation, perspective motion, and other irregular motions. In VVC, a
simplified affine transform motion compensation prediction is
applied. As shown in FIGS. 2A and 2B, the affine motion field of
the block is described by two (in the 4-parameter affine model that
uses the variables a, b, e and f) or three (in the 6-parameter
affine model that uses the variables a, b, c, d, e and f) control
point motion vectors, respectively.
[0052] The motion vector field (MVF) of a block is described by the
following equation with the 4-parameter affine model and
6-parameter affine model respectively:
$$\begin{cases} mv^h(x,y) = ax - by + e = \dfrac{mv_1^h - mv_0^h}{w}\,x - \dfrac{mv_1^v - mv_0^v}{w}\,y + mv_0^h \\ mv^v(x,y) = bx + ay + f = \dfrac{mv_1^v - mv_0^v}{w}\,x + \dfrac{mv_1^h - mv_0^h}{w}\,y + mv_0^v \end{cases} \quad \text{Eq. (1)}$$

$$\begin{cases} mv^h(x,y) = ax + cy + e = \dfrac{mv_1^h - mv_0^h}{w}\,x + \dfrac{mv_2^h - mv_0^h}{h}\,y + mv_0^h \\ mv^v(x,y) = bx + dy + f = \dfrac{mv_1^v - mv_0^v}{w}\,x + \dfrac{mv_2^v - mv_0^v}{h}\,y + mv_0^v \end{cases} \quad \text{Eq. (2)}$$
[0053] Herein, (mv_0^h, mv_0^v) is the motion vector of the top-left corner control point (CP), (mv_1^h, mv_1^v) is the motion vector of the top-right corner control point, and (mv_2^h, mv_2^v) is the motion vector of the bottom-left corner control point; (x, y) represents the coordinate of a representative point relative to the top-left sample within the current block. The CP motion vectors may be signaled (as in the affine AMVP mode) or derived on-the-fly (as in the affine merge mode). w and h are the width and height of the current block. In practice, the division is implemented by a right-shift with a rounding operation. In VTM, the representative point is defined to be the center position of a sub-block; e.g., when the coordinate of the top-left corner of a sub-block relative to the top-left sample within the current block is (xs, ys), the coordinate of the representative point is defined to be (xs+2, ys+2).
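As an illustration of Equations (1) and (2), the following sketch evaluates the affine motion vector field at the representative point (xs+2, ys+2) of each 4×4 sub-block. It uses floating-point arithmetic for clarity (the normative design is division-free, as described below); function and variable names are illustrative, not taken from any codec source.

```python
def affine_mv(x, y, w, h, mv0, mv1, mv2=None):
    """Evaluate the affine MVF of Eq. (1) or Eq. (2) at point (x, y).

    mv0, mv1, mv2 are the (horizontal, vertical) control-point MVs of the
    top-left, top-right and bottom-left corners; w, h are the block width
    and height. mv2 is None for the 4-parameter model (Eq. 1) and given
    for the 6-parameter model (Eq. 2).
    """
    dhx = (mv1[0] - mv0[0]) / w          # per-pixel horizontal change of mv^h
    dhy = (mv1[1] - mv0[1]) / w          # per-pixel horizontal change of mv^v
    if mv2 is None:                      # 4-parameter: vertical terms mirror horizontal
        dvx, dvy = -dhy, dhx
    else:                                # 6-parameter: independent vertical terms
        dvx = (mv2[0] - mv0[0]) / h
        dvy = (mv2[1] - mv0[1]) / h
    return (dhx * x + dvx * y + mv0[0],
            dhy * x + dvy * y + mv0[1])


def subblock_mvs(w, h, mv0, mv1, mv2=None, sb=4):
    """Per-sub-block MVs taken at each sub-block's center (xs+2, ys+2)."""
    return [[affine_mv(xs + sb // 2, ys + sb // 2, w, h, mv0, mv1, mv2)
             for xs in range(0, w, sb)]
            for ys in range(0, h, sb)]
```

With identical control-point MVs the model degenerates to pure translation, so every sub-block receives the same MV, as expected.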
[0054] In a division-free design, Equations (1) and (2) are implemented as:

$$\begin{cases} iDMvHorX = (mv_1^h - mv_0^h) \ll (S - \log_2(w)) \\ iDMvHorY = (mv_1^v - mv_0^v) \ll (S - \log_2(w)) \end{cases} \quad \text{Eq. (3)}$$

[0055] For the 4-parameter affine model shown in Equation (1):

$$\begin{cases} iDMvVerX = -iDMvHorY \\ iDMvVerY = iDMvHorX \end{cases} \quad \text{Eq. (4)}$$

[0056] For the 6-parameter affine model shown in Equation (2):

$$\begin{cases} iDMvVerX = (mv_2^h - mv_0^h) \ll (S - \log_2(h)) \\ iDMvVerY = (mv_2^v - mv_0^v) \ll (S - \log_2(h)) \end{cases} \quad \text{Eq. (5)}$$

[0057] And thus, the motion vectors may be derived as:

$$\begin{cases} mv^h(x,y) = \mathrm{Normalize}(iDMvHorX \cdot x + iDMvVerX \cdot y + (mv_0^h \ll S),\, S) \\ mv^v(x,y) = \mathrm{Normalize}(iDMvHorY \cdot x + iDMvVerY \cdot y + (mv_0^v \ll S),\, S) \end{cases} \quad \text{Eq. (6)}$$

$$\mathrm{Normalize}(Z, S) = \begin{cases} (Z + \mathrm{Off}) \gg S & \text{if } Z \geq 0 \\ -((-Z + \mathrm{Off}) \gg S) & \text{otherwise} \end{cases}, \qquad \mathrm{Off} = 1 \ll (S - 1) \quad \text{Eq. (7)}$$
[0058] Herein, S represents the calculation precision; e.g., in VVC, S=7. In VVC, the MV used in MC for a sub-block with the top-left sample at (xs, ys) is calculated by Equation (6) with x=xs+2 and y=ys+2.
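The division-free derivation of Equations (3) through (7) can be sketched as follows, with S=7 as stated in the text. The function names mirror the variables in the equations but are otherwise illustrative, and MVs are plain integers in the internal precision:

```python
S = 7  # calculation precision; S=7 in VVC per the text


def normalize(z, s=S):
    """Eq. (7): rounding right-shift that rounds half away from zero."""
    off = 1 << (s - 1)
    return (z + off) >> s if z >= 0 else -((-z + off) >> s)


def affine_mv_int(x, y, log2w, log2h, mv0, mv1, mv2=None):
    """Division-free evaluation of the affine MVF, Eqs. (3)-(6)."""
    idmv_hor_x = (mv1[0] - mv0[0]) << (S - log2w)    # Eq. (3)
    idmv_hor_y = (mv1[1] - mv0[1]) << (S - log2w)
    if mv2 is None:                                   # Eq. (4), 4-parameter
        idmv_ver_x, idmv_ver_y = -idmv_hor_y, idmv_hor_x
    else:                                             # Eq. (5), 6-parameter
        idmv_ver_x = (mv2[0] - mv0[0]) << (S - log2h)
        idmv_ver_y = (mv2[1] - mv0[1]) << (S - log2h)
    # Eq. (6): accumulate in S-bit precision, then normalize back
    return (normalize(idmv_hor_x * x + idmv_ver_x * y + (mv0[0] << S)),
            normalize(idmv_hor_y * x + idmv_ver_y * y + (mv0[1] << S)))
```

The rounding in `normalize` is symmetric about zero, which is why the negative branch negates, shifts, and negates again rather than shifting the negative value directly.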
[0059] To derive the motion vector of each 4×4 sub-block, the motion vector of the center sample of each sub-block, as shown in FIG. 3, is calculated according to Equation (1) or (2), and rounded to 1/16 fractional accuracy. Then the motion compensation interpolation filters are applied to generate the prediction of each sub-block with the derived motion vector.
[0060] An affine model can be inherited from a spatially neighbouring affine-coded block, such as the left, above, above-right, bottom-left or above-left neighbouring block, as shown in FIG. 4A. For example, if the bottom-left neighbouring block A in FIG. 4A is coded in affine mode, as denoted by A0 in FIG. 4B, the Control Point (CP) motion vectors mv_0^N, mv_1^N and mv_2^N of the top-left, top-right and bottom-left corners of the neighbouring CU/PU containing block A are fetched. The motion vectors mv_0^C, mv_1^C and mv_2^C (the last of which is only used for the 6-parameter affine model) of the top-left, top-right and bottom-left corners of the current CU/PU are then calculated based on mv_0^N, mv_1^N and mv_2^N.
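One common way to realize this inheritance is to evaluate the neighbour's affine model at the current block's corner positions. The sketch below illustrates that idea in floating point for a 6-parameter neighbour model; the argument layout and the evaluate-at-corners approach are an illustrative assumption, not the normative VVC derivation:

```python
def inherit_control_points(n_pos, n_w, n_h, n_mv0, n_mv1, n_mv2,
                           c_pos, c_w, c_h):
    """Derive the current block's CP MVs (mv0C, mv1C, mv2C) by evaluating
    the neighbouring CU's 6-parameter affine model (Eq. 2) at the current
    block's corners. n_pos / c_pos are top-left picture positions of the
    neighbouring and current blocks; n_w, n_h, c_w, c_h are their sizes.
    """
    def at(px, py):
        # Position relative to the neighbour's top-left corner
        x, y = px - n_pos[0], py - n_pos[1]
        return ((n_mv1[0] - n_mv0[0]) / n_w * x
                + (n_mv2[0] - n_mv0[0]) / n_h * y + n_mv0[0],
                (n_mv1[1] - n_mv0[1]) / n_w * x
                + (n_mv2[1] - n_mv0[1]) / n_h * y + n_mv0[1])

    cx, cy = c_pos
    return (at(cx, cy),            # mv0C: top-left corner of current block
            at(cx + c_w, cy),      # mv1C: top-right corner
            at(cx, cy + c_h))      # mv2C: bottom-left corner
```

Because the affine field is linear, sampling it at the current block's corners yields control points that reproduce the neighbour's motion field over the current block.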
[0061] In some embodiments (e.g., VTM-2.0), if the current block is affine coded, the top-left sub-block LT (e.g., a 4×4 block in VTM) stores mv0 and the top-right sub-block RT stores mv1. If the current block is coded with the 6-parameter affine model, the bottom-left sub-block LB stores mv2; otherwise (with the 4-parameter affine model), LB stores mv2'. The other sub-blocks store the MVs used for MC.
[0062] In some embodiments, when a CU is coded with affine merge mode, e.g., in AF_MERGE mode, it gets the first block coded with affine mode from the valid neighbouring reconstructed blocks. The selection order for the candidate block is from left, above, above-right, bottom-left to above-left, as shown in FIG. 4A.
[0063] The derived CP MVs mv_0^C, mv_1^C and mv_2^C of the current block can be used as the CP MVs in the affine merge mode, or they can be used as the MVP for the affine inter mode in VVC. It should be noted that for the merge mode, if the current block is coded with affine mode, after deriving the CP MVs of the current block, the current block may be further split into multiple sub-blocks and each sub-block derives its motion information based on the derived CP MVs of the current block.
2. Example Embodiments
[0064] Different from VTM wherein only one affine spatial
neighboring block may be used to derive affine motion for a block,
a separate list of affine candidates is constructed for the
AF_MERGE mode.
[0065] (1) Insert Inherited Affine Candidates into Candidate
List
[0066] In an example, an inherited affine candidate is a candidate derived from a valid neighboring reconstructed block coded with affine mode.
[0067] As shown in FIG. 5, the scan order for the candidate blocks is A1, B1, B0, A0 and B2. When a block is selected (e.g., A1), a two-step procedure is applied:
[0068] (a) Firstly, use the three corner motion vectors of the CU covering the block to derive two/three control points of the current block; and
[0069] (b) Secondly, based on the control points of the current block, derive the sub-block motion for each sub-block within the current block.
[0070] (2) Insert Constructed Affine Candidates
[0071] In some embodiments, if the number of candidates in the affine merge candidate list is less than MaxNumAffineCand, constructed affine candidates are inserted into the candidate list.
[0072] A constructed affine candidate is a candidate constructed by combining the neighboring motion information of each control point.
[0073] The motion information for the control points is derived first from the specified spatial neighbors and the temporal neighbor shown in FIG. 5. CPk (k=1, 2, 3, 4) represents the k-th control point. A0, A1, A2, B0, B1, B2 and B3 are spatial positions for predicting CPk (k=1, 2, 3); T is the temporal position for predicting CP4.
[0074] The coordinates of CP1, CP2, CP3 and CP4 are (0, 0), (W, 0), (0, H) and (W, H), respectively, where W and H are the width and height of the current block.
[0075] The motion information of each control point is obtained according to the following priority order: [0076] For CP1, the checking priority is B2 → B3 → A2. B2 is used if it is available. Otherwise, if B2 is unavailable, B3 is used. If both B2 and B3 are unavailable, A2 is used. If all three candidates are unavailable, the motion information of CP1 cannot be obtained. [0077] For CP2, the checking priority is B1 → B0; [0078] For CP3, the checking priority is A1 → A0; [0079] For CP4, T is used.
[0080] Secondly, the combinations of control points are used to
construct the motion model.
[0081] Motion vectors of three control points are needed to compute
the transform parameters of the 6-parameter affine model. The three
control points can be selected from one of the following four
combinations ({CP1, CP2, CP4}, {CP1, CP2, CP3}, {CP2, CP3, CP4},
{CP1, CP3, CP4}). For example, CP1, CP2 and CP3 can be used
to construct a 6-parameter affine motion model, denoted as Affine
(CP1, CP2, CP3).
[0082] Motion vectors of two control points are needed to compute
the transform parameters of the 4-parameter affine model. The two
control points can be selected from one of the following six
combinations ({CP1, CP4}, {CP2, CP3}, {CP1, CP2}, {CP2, CP4}, {CP1,
CP3}, {CP3, CP4}). For example, CP1 and CP2 can be used
to construct a 4-parameter affine motion model, denoted as Affine
(CP1, CP2).
[0083] The combinations of constructed affine candidates are
inserted into the candidate list in the following order: [0084] {CP1,
CP2, CP3}, {CP1, CP2, CP4}, {CP1, CP3, CP4}, {CP2, CP3, CP4}, {CP1,
CP2}, {CP1, CP3}, {CP2, CP3}, {CP1, CP4}, {CP2, CP4}, {CP3,
CP4}
[0085] (3) Insert Zero Motion Vectors
[0086] If the number of candidates in the affine merge candidate list
is less than MaxNumAffineCand, zero motion vectors are inserted into
the candidate list until the list is full.
3. Examples of Advanced Temporal Motion Vector Prediction
(ATMVP)
[0087] In some existing implementations, advanced temporal motion
vector prediction (ATMVP) was included in the benchmark set
(BMS)-1.0 reference software. It derives multiple sets of motion
information for the sub-blocks of one coding unit (CU) based on the
motion information of the collocated blocks in temporal neighboring
pictures.
Although it improves the efficiency of temporal motion vector
prediction, the following complexity issues are identified for the
existing ATMVP design: [0088] The collocated pictures of different
ATMVP CUs may not be the same if multiple reference pictures are
used. This means the motion fields of multiple reference pictures
need to be fetched. [0089] The motion information of each ATMVP CU
is always derived based on 4.times.4 units, resulting in multiple
invocations of motion derivation and motion compensation for each
4.times.4 sub-block inside one ATMVP CU.
3.1 Examples of Simplified Collocated Block Derivation with One
Fixed Collocated Picture
[0090] In this example method, one simplified design is described
to use the same collocated picture as in HEVC, which is signaled at
the slice header, as the collocated picture for ATMVP derivation.
At the block level, if the reference picture of a neighboring block
is different from this collocated picture, the MV of the block is
scaled using the HEVC temporal MV scaling method, and the scaled MV
is used in ATMVP.
[0091] Denote the motion vector used to fetch the motion field in
the collocated picture R.sub.col as MV.sub.col. To minimize the
impact due to MV scaling, the MV in the spatial candidate list used
to derive MV.sub.col is selected in the following way: if the
reference picture of a candidate MV is the collocated picture, this
MV is selected and used as MV.sub.col without any scaling.
Otherwise, the MV having a reference picture closest to the
collocated picture is selected to derive MV.sub.col with
scaling.
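The selection rule above can be sketched as follows. This Python fragment is illustrative only and not part of the disclosure; candidates are modeled as (MV, reference POC) pairs, and the HEVC-style temporal MV scaling is reduced to a simple linear POC ratio.

```python
def select_mv_col(spatial_candidates, col_poc, cur_poc):
    """spatial_candidates: list of (mv, ref_poc) pairs from the spatial
    candidate list. Returns MVcol used to fetch the motion field in the
    collocated picture (POC col_poc)."""
    # Prefer a candidate whose reference picture IS the collocated picture:
    for mv, ref_poc in spatial_candidates:
        if ref_poc == col_poc:
            return mv                     # used without any scaling
    # Otherwise pick the candidate whose reference picture is closest to
    # the collocated picture and scale it (linear POC-distance scaling).
    mv, ref_poc = min(spatial_candidates, key=lambda c: abs(c[1] - col_poc))
    scale = (cur_poc - col_poc) / (cur_poc - ref_poc)
    return (mv[0] * scale, mv[1] * scale)
```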
3.2 Examples of Adaptive ATMVP Sub-Block Size
[0092] In this example method, the slice-level adaptation of the
sub-block size is supported for ATMVP motion derivation. In some
cases, the ATMVP is also known as sub-block temporal motion vector
prediction (sbTMVP). Specifically, one default sub-block size that
is used for the ATMVP motion derivation is signaled at sequence
level. Additionally, one flag is signaled at slice-level to
indicate if the default sub-block size is used for the current
slice. If the flag is false, the corresponding ATMVP sub-block size
is further signaled in the slice header for the slice.
3.3 Examples of a Simplified ATMVP Derivation
[0093] In some embodiments, ATMVP predicts the motion vectors of
the sub-CUs within a CU in two steps. The first step is to identify
the corresponding block in the collocated picture signaled at the
slice header. The second step is to split the current CU into
sub-CUs and obtain the motion information of each sub-CU from the
block corresponding to each sub-CU in the collocated picture.
[0094] In the first step, the collocated block is identified by
always scanning the MVs of the spatial merge candidates twice (once
for each list). The construction of merge candidates list is
performed by checking
A.sub.1.fwdarw.B.sub.1.fwdarw.B.sub.0.fwdarw.A.sub.0.fwdarw.ATMVP.fwdarw.-
B.sub.2.fwdarw.TMVP, as shown in FIG. 6. Therefore, the number of
MVP candidates in the merge list is up to 4 before ATMVP, which
means that in the worst case, the scanning process in the first
step needs to check all the 4 candidate blocks for each list.
[0095] To simplify the neighboring blocks' scanning process, the
method restricts the scanning process for deriving the collocated
block to a single pass, which means that only the first available
candidate in the merge list is checked. If the candidate does not
satisfy the condition of the ATMVP neighboring block scan in the
current VVC working draft (neither of the motion vectors associated
with list 0 and list 1 points to the collocated picture), a zero
motion vector is used to derive the collocated block in the
collocated picture. In this method, the checking process is
performed at most once. Such a motion vector (e.g., in the current
design, it could be the motion associated with one spatial neighboring
block, or a zero motion vector) is called the starting point MV for
ATMVP.
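The single-pass check described above can be sketched as follows; the fragment is illustrative only and not part of the disclosure, with the first merge candidate modeled by its List 0/List 1 MVs and reference POCs.

```python
def starting_point_mv(first_cand, col_poc):
    """first_cand: (mv_l0, ref_poc_l0, mv_l1, ref_poc_l1) of the first
    available merge candidate, or None if the merge list is empty.
    Returns the starting point MV for ATMVP."""
    if first_cand is not None:
        mv_l0, poc_l0, mv_l1, poc_l1 = first_cand
        if mv_l0 is not None and poc_l0 == col_poc:
            return mv_l0              # list 0 MV points to collocated picture
        if mv_l1 is not None and poc_l1 == col_poc:
            return mv_l1              # list 1 MV points to collocated picture
    return (0, 0)                     # fall back to the zero motion vector
```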
3.4 Derivation of Sub-Blocks' Motion Information
[0096] Two steps are performed in order to fill in all motion
information of different sub-blocks.
[0097] 1. Find the default motion information: [0098] 1. Identify a
block based on the center position within the current block and the
starting point MV in the collocated picture (i.e., the block covering
(x0+W/2+(SPMV_X>>K), y0+H/2+(SPMV_Y>>K)), wherein (x0,
y0) is the top-left sample's coordinate, (W, H) are the block's
width and height, respectively, (SPMV_X, SPMV_Y) is the starting
point MV, K represents the motion vector's precision, and
(SPMV_X>>K, SPMV_Y>>K) denotes the integer MV). [0099]
2. If the identified block is intra coded, the ATMVP process is
terminated and the ATMVP candidate is set to unavailable. [0100] 3.
Otherwise (the identified block is inter coded), the motion information
of the identified block may be utilized to derive the default motion
information (e.g., scaled to certain reference pictures). The
default motion information could be either uni-prediction or
bi-prediction depending on the reference pictures.
[0101] FIG. 15 shows an example of how to identify the represented
block for default motion derivation. The block covering the
position (filled circle) in the collocated picture is the
represented block for default motion derivation.
[0102] 2. If the default motion is found, then for each sub-block, the
center position of the sub-block and the starting point MV are used
to locate a representative block in the collocated picture. [0103]
1. If the representative block is coded in inter mode, the motion
information of the representative block is utilized to derive the
final sub-block motion (i.e., scaled to certain reference
pictures). [0104] 2. If the representative block is coded in
intra mode, the sub-block's motion is set to the default motion
information.
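The two steps above can be sketched as follows. The fragment is illustrative only and not part of the disclosure; `col_pic.motion_at` is a hypothetical accessor that returns the stored MV at a position in the collocated picture, or None where the block is intra coded or unavailable.

```python
def atmvp_fill_subblocks(x0, y0, w, h, sp_mv, k, col_pic, sb=8):
    """Fill per-sub-block motion for an ATMVP candidate.
    sp_mv is the starting point MV stored in 1/2**k sample precision."""
    dx, dy = sp_mv[0] >> k, sp_mv[1] >> k          # integer part of the MV
    # Step 1: default motion from the block covering the shifted center.
    default = col_pic.motion_at(x0 + w // 2 + dx, y0 + h // 2 + dy)
    if default is None:
        return None                                 # ATMVP unavailable
    # Step 2: per-sub-block motion from each representative block,
    # falling back to the default motion where unavailable.
    out = {}
    for sy in range(0, h, sb):
        for sx in range(0, w, sb):
            cx = x0 + sx + sb // 2 + dx             # shifted sub-block center
            cy = y0 + sy + sb // 2 + dy
            mv = col_pic.motion_at(cx, cy)
            out[(sx, sy)] = mv if mv is not None else default
    return out
```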
4. Examples of Spatial-Temporal Motion Vector Prediction
(STMVP)
[0105] In the STMVP method, the motion vectors of the sub-CUs are
derived recursively, following raster scan order. FIG. 7 shows an
example of one CU with four sub-blocks and neighboring blocks.
Consider an 8.times.8 CU which contains four 4.times.4 sub-CUs A,
B, C, and D. The neighbouring 4.times.4 blocks in the current frame
are labelled as a, b, c, and d.
[0106] The motion derivation for sub-CU A starts by identifying its
two spatial neighbours. The first neighbour is the N.times.N block
above sub-CU A (block c). If block c is not available or is
intra coded, the other N.times.N blocks above sub-CU A are checked
(from left to right, starting at block c). The second neighbour is
a block to the left of sub-CU A (block b). If block b is not
available or is intra coded, other blocks to the left of sub-CU A
are checked (from top to bottom, starting at block b). The motion
information obtained from the neighbouring blocks for each list is
scaled to the first reference frame for a given list. Next,
temporal motion vector predictor (TMVP) of sub-block A is derived
by following the same procedure of TMVP derivation as specified in
HEVC. The motion information of the collocated block at location D
is fetched and scaled accordingly. Finally, after retrieving and
scaling the motion information, all available motion vectors (up to
3) are averaged separately for each reference list. The averaged
motion vector is assigned as the motion vector of the current
sub-CU.
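The final averaging step can be sketched as follows; the fragment is illustrative only and not part of the disclosure, and assumes the three inputs have already been scaled per the procedure above (None marks an unavailable predictor).

```python
def stmvp_subcu_mv(above_mv, left_mv, tmvp_mv):
    """Average the available motion vectors (up to 3) for one reference
    list and return the result as the sub-CU's motion vector."""
    avail = [mv for mv in (above_mv, left_mv, tmvp_mv) if mv is not None]
    if not avail:
        return None                    # no predictor for this list
    n = len(avail)
    return (sum(mv[0] for mv in avail) / n,
            sum(mv[1] for mv in avail) / n)
```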
5. Example Embodiments of Affine Merge Candidate Lists
5.1 Example Embodiments
[0107] In the affine merge mode of VTM-2.0.1, only the first
available affine neighbour can be used to derive motion information
of affine merge mode. In some embodiments, a candidate list for
affine merge mode is constructed by searching valid affine
neighbours and combining the neighbor motion information of each
control point.
[0108] The affine merge candidate list is constructed as following
steps:
[0109] (1) Insert Inherited Affine Candidates
[0110] Inherited affine candidate means that the candidate is
derived from the affine motion model of its valid neighbor affine
coded block. In the common base, as shown in FIG. 8, the scan order
for the candidate positions is: A1, B1, B0, A0 and B2.
[0111] After a candidate is derived, a full pruning process is
performed to check whether the same candidate has already been
inserted into the list. If a same candidate exists, the derived
candidate is discarded.
[0112] (2) Insert Constructed Affine Candidates
[0113] If the number of candidates in affine merge candidate list
is less than MaxNumAffineCand (set to 5 in this example),
constructed affine candidates are inserted into the candidate list.
Constructed affine candidate means the candidate is constructed by
combining the neighbor motion information of each control
point.
[0114] The motion information for the control points is first derived
from the specified spatial neighbors and the temporal neighbor
shown in FIG. 8. CPk (k=1, 2, 3, 4) represents the k-th control
point. A0, A1, A2, B0, B1, B2 and B3 are the spatial positions for
predicting CPk (k=1, 2, 3); T is the temporal position for predicting
CP4.
[0115] The coordinates of CP1, CP2, CP3 and CP4 are (0, 0), (W, 0),
(0, H) and (W, H), respectively, where W and H are the width and
height of the current block.
[0116] The motion information of each control point is obtained
according to the following priority order: [0117] For CP1, the
checking priority is B.sub.2.fwdarw.B.sub.3.fwdarw.A.sub.2. B.sub.2
is used if it is available. Otherwise, if B.sub.3 is available,
B.sub.3 is used. If both B.sub.2 and B.sub.3 are unavailable,
A.sub.2 is used. If all three candidates are unavailable, the
motion information of CP1 cannot be obtained. [0118] For CP2, the
checking priority is B1.fwdarw.B0; [0119] For CP3, the checking
priority is A1.fwdarw.A0; [0120] For CP4, T is used.
[0121] Secondly, the combinations of control points are used to
construct the motion model.
[0122] Motion information of three control points is needed to
construct a 6-parameter affine candidate. The three control points
can be selected from one of the following four combinations ({CP1,
CP2, CP4}, {CP1, CP2, CP3}, {CP2, CP3, CP4}, {CP1, CP3, CP4}).
Combinations {CP1, CP2, CP3}, {CP2, CP3, CP4}, {CP1, CP3, CP4} are
converted to a 6-parameter motion model represented by top-left,
top-right and bottom-left control points.
[0123] Motion information of two control points is needed to
construct a 4-parameter affine candidate. The two control points
can be selected from one of the following six combinations ({CP1,
CP4}, {CP2, CP3}, {CP1, CP2}, {CP2, CP4}, {CP1, CP3}, {CP3, CP4}).
Combinations {CP1, CP4}, {CP2, CP3}, {CP2, CP4}, {CP1, CP3}, {CP3,
CP4} are converted to a 4-parameter motion model represented by
top-left and top-right control points.
[0124] The combinations of constructed affine candidates are
inserted into the candidate list in the following order: [0125] {CP1,
CP2, CP3}, {CP1, CP2, CP4}, {CP1, CP3, CP4}, {CP2, CP3, CP4}, {CP1,
CP2}, {CP1, CP3}, {CP2, CP3}, {CP1, CP4}, {CP2, CP4}, {CP3,
CP4}
[0126] For reference picture list X (X being 0 or 1) of a
combination, the reference picture index with the highest usage ratio
among the control points is selected as the reference picture index of
list X, and motion vectors pointing to a different reference picture
are scaled.
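The reference index selection can be sketched as follows; the fragment is illustrative only and not part of the disclosure, and breaking ties toward the smaller index is an assumption not stated in the text.

```python
from collections import Counter

def select_ref_idx(cp_ref_indices):
    """cp_ref_indices: the list-X reference indices used by the control
    points of one combination. Returns the index with the highest usage
    ratio (ties broken toward the smaller index, as an assumption)."""
    counts = Counter(cp_ref_indices)
    best = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return best[0]
```

Motion vectors of control points whose reference index differs from the selected one would then be scaled to the selected reference picture.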
[0127] After a candidate is derived, a full pruning process is
performed to check whether the same candidate has already been
inserted into the list. If a same candidate exists, the derived
candidate is discarded.
[0128] (3) Padding with Zero Motion Vectors
[0129] If the number of candidates in the affine merge candidate list
is less than 5, zero motion vectors with zero reference indices are
inserted into the candidate list until the list is full.
[0130] Therefore, the complexity of this separate affine merge list
is generated as follows:
TABLE-US-00001
  Merge list size:                   5
  Max inherited affine candidate:    1
  Max constructed affine candidate:  6
  Max candidate comparison:          0
  MV scaling:                        2
  Additional buffer:                 2x
6. Examples of a Sub-Block Merge Candidate List
[0131] In some embodiments, all the sub-block related motion
candidates are put in a separate merge list in addition to the
regular merge list for non-sub-block merge candidates. For example,
the separate merge list holding the sub-block related motion
candidates is named the `sub-block merge list`. In one example, the
sub-block merge list includes affine merge candidates, an ATMVP
candidate, and/or a sub-block based STMVP candidate.
6.1 Example Embodiments
[0132] In some embodiments, the ATMVP merge candidate in the normal
merge list is moved to the first position of the affine merge list,
such that all the merge candidates in the new list (e.g., the
sub-block based merge candidate list) are based on sub-block coding
tools.
7. Drawbacks of Existing Methods
[0133] The idea of using the first available spatial merge
candidate is beneficial when the ATMVP candidate is added
to the regular merge mode. When the ATMVP candidate is added to the
sub-block based merge list, it still requires going through the
regular merge list construction process, which defeats the
motivation of adding ATMVP to the sub-block based merge list, that
is, reducing the interaction between the sub-block merge list and the
regular merge list. In the worst case, it still requires checking
the availability of four spatial neighboring blocks and checking
whether each is intra coded or not.
[0134] In some embodiments, an ATMVP candidate is always inserted
into the merge list before affine motion candidates, which may not
be efficient for sequences with affine motion.
[0135] In some embodiments, an ATMVP candidate may be unavailable
after checking the temporal block in the co-located picture.
Therefore, a given merge index, e.g., equal to 0, may represent
either an ATMVP candidate or an affine merge candidate, which is
not compatible with a simplified hardware implementation.
8. Example Methods for Simplifying Sub-Block Motion Candidate
Lists
[0136] Embodiments of the disclosed technology that simplify the
generation of sub-block motion candidate lists, which may improve
video coding efficiency and enhance both existing and future video
coding standards, are elucidated in the following examples described
for various implementations. In the following examples, which should
not be construed to be limiting, the term `ATMVP` is not restricted
to `sub-block based ATMVP`; it could also represent the
`non-sub-block based ATMVP`, which could also be interpreted as a
TMVP candidate. In addition, the following methods may also be
applied to other motion candidate list construction processes, such
as the AMVP candidate list, or the regular merge list with
non-sub-block merge candidates.
[0137] Furthermore, in the following examples, a motion category is
defined as including all motion candidates derived with the same
coding tool. In other words, for each coding tool, such as affine,
ATMVP, or STMVP, the corresponding motion candidates belong to a
single motion category.
Example 1
[0138] Instead of finding the first available merge candidate in
the regular merge list for the ATMVP candidate derivation, the
motion information of only one spatial neighboring block may be
accessed and utilized in the ATMVP candidate derivation process.
For example, if the motion information of the only spatial
neighboring block is available, the ATMVP candidate can be
determined based on such motion information. As another example, if
the motion information of the only spatial neighboring block is not
available, the ATMVP candidate can be determined based on default
motion information, such as a zero motion vector.
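The access pattern of this example, together with the checking-order and first-K variants described in the following paragraphs, can be sketched as follows; the fragment is illustrative only and not part of the disclosure, with neighboring blocks modeled as a name-to-MV mapping.

```python
def atmvp_source_mv(neighbors, order=("A1", "B1", "B0", "A0", "B2"), k=1):
    """neighbors: dict mapping a spatial position name to its MV, or to
    None if that block is unavailable. Only the first k positions in
    `order` are accessed (k=1 gives the single-block case); the first
    available MV is used, otherwise default (zero) motion."""
    for pos in order[:k]:
        mv = neighbors.get(pos)
        if mv is not None:
            return mv
    return (0, 0)       # default motion information (zero motion vector)
```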
[0139] In some embodiments, the only spatial neighboring block is
defined as the first spatial neighboring block to be checked in the
regular merge list, such as A.sub.1 depicted in FIG. 5.
[0140] In some embodiments, the only spatial neighbouring block is
defined as the first available spatial neighbouring block in a
checking order, such as A1, B1, B0, A0, B2. For example, when a
neighboring block exists and has been coded when coding the current
block, it is treated as available. In some embodiments, when a
neighboring block exists in the same tile and has been coded when
coding the current block, it is treated as available. In one
example, the neighbouring blocks to be checked in order are A1,
B1.
[0141] In some embodiments, the only spatial neighboring block may
be different from those used in the regular merge mode derivation
process.
[0142] In some embodiments, the motion information of the first K
spatial neighbouring blocks may be accessed. In one example, K is
equal to 2 or 3.
[0143] The checking order of spatial neighbouring blocks may be the
same or different from that used in the regular merge list
derivation process.
Example 2
[0144] In some embodiments, the ATMVP candidates may be derived
from a temporal block identified by motion information of a spatial
neighboring block of the coding unit that is not used in the
regular merge list derivation process.
[0145] In some embodiments, the spatial neighboring blocks used in
the ATMVP process can be totally different from those used in the
regular merge list derivation process. For example, blocks B3, A2,
A3 in FIG. 5 can be used.
[0146] In some embodiments, part of the spatial neighboring blocks
used in the ATMVP process may be the same as those used in the
regular merge list derivation process while the remaining are
different. For example, blocks A1, B1, B3, A2, A3 as shown in FIG.
5 can be used.
[0147] In some embodiments, the motion information of selected
spatial neighboring block(s) may be further scaled before
identifying the temporal block.
Example 3
[0148] In some embodiments, instead of relying on motion
information of a neighboring block, a History-based MV Prediction
(HMVP) candidate fetched from a HMVP table or list can be used to
derive the ATMVP candidate. History-based Motion Vector Prediction
(HMVP) methods, e.g., as described in PCT/CN2018/105193 and
PCT/CN2018/093987, use previously coded motion information for
prediction. That is, an ATMVP candidate can be derived based on a
table of motion candidates (e.g., can include ATMVP candidates and
non-ATMVP candidates) derived during the video processing. The
derived ATMVP candidate for the current coding unit can be used to
update the table of motion candidates. For example, the derived
ATMVP candidate can be added to the table after pruning is
performed. Subsequent processing can be performed based on the
updated table of motion candidates.
[0149] In some embodiments, scaling may be applied to the HMVP
candidate.
Example 4
[0150] In some embodiments, usage of neighbouring block(s) or
HMVP(s) to derive the ATMVP candidate may be adaptive.
[0151] In some embodiments, which block(s) are used may be signaled
from the encoder to the decoder in VPS/SPS/PPS/slice header/tile
group header/tile/CTU/CU/PU/CTU row.
[0152] In some embodiments, which block(s) are used may depend on
the width and/or height of the current block. FIG. 9 shows examples
of spatial neighboring blocks used for ATMVP temporal block
identification.
Example 5
[0153] When the temporal block identified in the ATMVP process (e.g.,
pointed to by the (scaled) motion vector from the first available
merge candidate in the current design, or by a zero motion vector)
cannot return a valid ATMVP candidate (e.g., the temporal block
is intra-coded), more temporal blocks may be searched until one or
multiple ATMVP candidates are found.
[0154] The bottom-right neighbor of the identified temporal block may
be further checked. An example is depicted in FIG. 10, which shows an
alternative starting point identified by the bottom-right block of
the starting point found by the existing design.
[0155] In some embodiments, a searching order may be defined, e.g.,
from the neighboring left, above, right, bottom of the identified
temporal block; then non-adjacent left, above, right, bottom of the
identified temporal block with a step, and so on.
[0156] In one example, all the temporal blocks to be checked shall
be within a certain region, such as within the same CTU as the
identified temporal block; or within the same CTU row of the
identified temporal block.
[0157] In some embodiments, if there is no available ATMVP
candidate after checking the identified temporal block and/or more
temporal blocks, a default ATMVP candidate may be utilized.
[0158] In one example, the default ATMVP candidate may be defined as
a motion candidate inherited from a spatial neighboring block. In
some embodiments, the motion candidate inherited from a spatial
neighboring block may be further scaled.
[0159] In some embodiments, the default ATMVP candidate may be
derived from the starting point MV.
[0160] i. Example 1 may be utilized to find the starting point
MV.
[0161] ii. In one example, the starting point MV may be a motion
vector associated with a spatial adjacent, non-adjacent, or
temporal block whose corresponding reference picture is the
collocated reference picture.
[0162] iii. In one example, the starting point MV may be a motion
vector associated with the first spatial block whose
corresponding reference picture is the collocated reference
picture.
[0163] iv. In one example, the starting point MV may be a zero
motion vector.
[0164] v. In one example, the starting point MV may be defined in
the same way as in the current VVC design: if the first
spatial neighboring block (e.g., with checking order A1, B1, B0,
A0, B2) is inter-coded and its motion vector of List X (X
being 0 or 1) points to the collocated picture, the starting
point MV is set to the associated MV of the first spatial
neighboring block for List X; otherwise, the starting point MV is
set to the zero motion vector.
[0165] vi. In one example, when the associated motion of the
represented block identified by the starting point MV and the
center position of current block is unavailable (e.g., the
represented block is intra-coded or the represented block is
unavailable (e.g., out of the restricted region)), the starting
point MV is treated as the motion information of the represented
block. In some embodiments, default motion information is derived
from the starting point MV (i.e., from the motion information of
the represented block).
[0166] vii. In some embodiments, furthermore, for any sub-block, if
the associated motion of its represented block identified by the
starting point MV and the center position of current sub-block is
unavailable, the starting point MV is treated as the motion
information of the represented block and utilized to derive the
sub-block motion.
[0167] In one example, the default ATMVP candidate may be set to zero
motion vectors. In some embodiments, furthermore, the reference
picture associated with the ATMVP candidate may be set to the
collocated picture.
Example 6
[0168] When a motion vector is utilized to derive default motion
information for the ATMVP candidate (i.e., default ATMVP
candidate), a uni-prediction candidate may be derived by scaling a
starting motion vector to a reference picture index within the
reference picture list X. That is, the default ATMVP candidate is a
uni-prediction candidate.
[0169] In one example, the reference picture index is set to 0.
[0170] In one example, the reference picture index is set to the
smallest reference picture index that is corresponding to a
short-term reference picture.
[0171] In one example, the reference picture index is set to the
one that is used by TMVP candidate for reference picture list
X.
[0172] In one example, the reference picture list X is set to List
0 or list 1.
[0173] In one example, the reference picture list X is dependent on
slice/picture type and/or the reference picture list that
collocated picture is from.
[0174] In one example, X is set to List (B Slice/picture ?
1-getColFromL0Flag( ): 0). The function getColFromL0Flag( ) returns
1 when collocated picture is from List 0; and returns 0 when
collocated picture is from List 1.
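The list selection rule of this example can be sketched as follows; the fragment is illustrative only and not part of the disclosure, and simply mirrors the expression X = (B slice/picture ? 1-getColFromL0Flag( ) : 0).

```python
def default_atmvp_list_x(is_b_slice, col_from_l0):
    """Select reference picture list X for the uni-prediction default
    ATMVP candidate. col_from_l0 models getColFromL0Flag(): True when
    the collocated picture is from List 0."""
    if not is_b_slice:
        return 0                      # P slice/picture: always List 0
    return 1 - (1 if col_from_l0 else 0)   # B: the opposite list
```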
Example 7
[0175] When a motion vector is utilized to derive default motion
information for the ATMVP candidate (i.e., default ATMVP
candidate), a bi-prediction candidate may be derived by scaling the
motion vector to certain reference picture indices within two
reference picture lists. That is, default ATMVP candidate is a
bi-prediction candidate.
[0176] For each reference picture, a certain reference picture
index is selected. In one example, it may be defined to be the same
as that used for the target reference picture index (e.g., 0 in
current VVC design) of TMVP candidate.
Example 8
[0177] Whether to set the default motion information to a uni- or
bi-prediction candidate may depend on the picture/slice type.
In some embodiments, it may depend on the block dimensions. In one
example, if there are fewer than 64 samples, uni-prediction default
motion information is utilized in the ATMVP process.
Example 9
[0178] The final merge candidate list includes at least one
candidate for each motion category. A motion category can be a
temporal motion vector prediction candidate category, an affine
motion candidate category, or other types of categories. In some
embodiments, at least one ATMVP candidate is always included in the
merge candidate list. In some embodiments, at least one affine
candidate is always included in the merge candidate list.
Example 10
[0179] A merge index may be assigned to a given motion category.
When the merge index is known, the decoder can be ready to load
information from a branch corresponding to this motion
category.
[0180] For example, a merge index within the range [m, n], inclusive,
may correspond to ATMVP candidates. A merge index within the range
[k, l], inclusive, may correspond to affine candidates. In one
example, m=n=0, k=1, l>=k.
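The index-to-category mapping can be sketched as follows; the fragment is illustrative only and not part of the disclosure, and the range bounds are hypothetical defaults (m=n=0 for ATMVP, with affine candidates on an assumed range [1, 4]).

```python
def category_for_merge_index(idx, m=0, n=0, k=1, l=4):
    """Map a merge index to its motion category so the decoder can load
    information from the corresponding branch. The [m, n] and [k, l]
    ranges are inclusive; the defaults are illustrative assumptions."""
    if m <= idx <= n:
        return "ATMVP"
    if k <= idx <= l:
        return "affine"
    return "other"
```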
[0181] In some embodiments, the assigned index(s) may be adaptive.
In one example, the assigned index(s) may be signaled from the
encoder to the decoder in VPS/SPS/PPS/slice header/tile group
header/tile/CTU/CU/PU/CTU row. In one example, the assigned
index(s) may depend on the width and/or height of the current
block.
Example 11
[0182] When multiple ATMVP candidates are added to the merge
candidate list (e.g., the sub-block merge candidate list), affine
motion candidates can be added before all ATMVP candidates. In some
embodiments, ATMVP candidates and affine motion candidates may be
inserted in an interleaved way, e.g., one or more affine motion
candidates are placed before an ATMVP candidate and some after.
Example 12
[0183] The order of affine motion candidates and non-affine motion
candidates (e.g., ATMVP and/or STMVP candidates) may be adaptively
changed from block to block, or from tile to tile, or from picture
to picture, or from sequence to sequence.
The adaptive order may depend on the neighboring blocks'
coded information and/or the coded information of the current block.
In one example, if all or a majority of the selected neighboring
blocks are coded with affine mode, affine motion candidates may be
added before other non-affine motion candidates.
[0185] The adaptive order may depend on the number of available
affine motion candidates and/or number of non-affine candidates. In
one example, if the ratio between number of available affine motion
candidates and non-affine candidates is larger than a threshold,
affine motion candidates may be inserted before non-affine motion
candidates.
[0186] The adaptive order may be only applicable to the first K
affine motion candidates (e.g., K is set to 1). In this case, only
the first K affine motion candidates may be adaptively decided
whether to be inserted before or after non-affine motion
candidates.
[0187] When there are more than two categories (the current design
has only affine and ATMVP candidates), the adaptive order of
inserting different motion candidates can still be applied.
Example 13
[0188] An indication of sub-block related technologies can be
signaled in the picture header/PPS/slice header/tile group header.
When the indication indicates that a sub-block related technology is
disabled, there is no need to signal any related information for
that technology at the block level.
[0189] In one example, an indication (such as a flag) of ATMVP at
picture header/slice header/tile header may be signaled.
[0190] In one example, an indication (such as a flag) of affine at
picture header/slice header/tile header may be signaled.
Example 14
[0191] The order of motion candidates for different motion
categories (e.g., ATMVP, affine, STMVP) may be pre-defined or
signaled in SPS/VPS/PPS/picture header/tile group header/slice
header, etc.
[0192] In one example, a flag may be signaled to indicate whether
affine motion candidates should be after all non-affine motion
candidates.
[0193] In one example, a flag may be signaled to indicate whether
ATMVP motion candidates should be before all affine motion
candidates.
Example 15
[0194] It is desirable to unify the ATMVP sub-block motion
derivation process and TMVP process. In one example, the sub-block
motion derivation process reuses the TMVP process. In one example,
the TMVP process reuses the sub-block motion derivation process.
Example 16
[0195] For the sub-block merge candidate list, the ATMVP candidate
can always be available and the temporal information is disallowed
to derive affine candidates. In one example, a merge index to the
sub-block merge candidate list equal to 0 always corresponds
to an ATMVP candidate. In one example, a merge index to the sub-block
merge candidate list unequal to 0 always corresponds to an
affine candidate.
9. Additional Embodiment Examples
[0196] This section gives an embodiment of how to make the ATMVP
candidate always available. The changes compared to the
latest VVC specification are bold (for newly added) or italicized
(for deleted).
9.1 Example #1 (a Uni-Prediction Default ATMVP Candidate to Fill in
Sub-Blocks if Needed)
[0197] 8.3.4.4 Derivation process for subblock-based temporal
merging base motion data (note: default motion information) Inputs
to this process are: [0198] the location (xCtb, yCtb) of the
top-left sample of the luma coding tree block that contains the
current coding block, [0199] the location (xColCtrCb, yColCtrCb) of
the top-left sample of the collocated luma coding block that covers
the below-right center sample. [0200] the availability flags
availableFlagA.sub.0, availableFlagA.sub.1, availableFlagB.sub.0,
and availableFlagB.sub.1 of the neighbouring coding units, [0201]
the reference indices refIdxLXA.sub.0, refIdxLXA.sub.1,
refIdxLXB.sub.0, and refIdxLXB.sub.1 of the neighbouring coding
units, [0202] the prediction list utilization flags
predFlagLXA.sub.0, predFlagLXA.sub.1, predFlagLXB.sub.0, and
predFlagLXB.sub.1 of the neighbouring coding units, [0203] the
motion vectors in 1/16 fractional-sample accuracy mvLXA.sub.0,
mvLXA.sub.1, mvLXB.sub.0, and mvLXB.sub.1 of the neighbouring
coding units. Outputs of this process are: [0204] the motion
vectors ctrMvL0 and ctrMvL1, [0205] the prediction list utilization
flags ctrPredFlagL0 and ctrPredFlagL1, [0206] the reference indices
ctrRefIdxL0 and ctrRefIdxL1, [0207] the temporal motion vector
tempMV. The variable tempMv is set as follows:
[0207] tempMv[0]=0 (8-501)
tempMv[1]=0 (8-502)
The variable currPic specifies the current picture. The variable
availableFlagN is set equal to FALSE, and the following applies:
[0208] When availableFlagA.sub.0 is equal to 1, the following
applies: [0209] availableFlagN is set equal to TRUE, [0210]
refIdxLXN is set equal to refIdxLXA.sub.0 and mvLXN is set equal to
mvLXA.sub.0, for X being replaced by 0 and 1. [0211] When
availableFlagN is equal to FALSE and availableFlagB.sub.0 is equal
to 1, the following applies: [0212] availableFlagN is set equal to
TRUE, [0213] refIdxLXN is set equal to refIdxLXB.sub.0 and mvLXN is
set equal to mvLXB.sub.0, for X being replaced by 0 and 1. [0214]
When availableFlagN is equal to FALSE and availableFlagB.sub.1 is
equal to 1, the following applies: [0215] availableFlagN is set
equal to TRUE. [0216] refIdxLXN is set equal to refIdxLXB.sub.1 and
mvLXN is set equal to mvLXB.sub.1, for X being replaced by 0 and 1.
[0217] When availableFlagN is equal to FALSE and
availableFlagA.sub.1 is equal to 1, the following applies: [0218]
availableFlagN is set equal to TRUE. [0219] refIdxLXN is set equal
to refIdxLXA.sub.1 and mvLXN is set equal to mvLXA.sub.1, for X
being replaced by 0 and 1. tempMV is set to the zero motion vector.
When availableFlagN is equal to TRUE, the following applies: [0220]
If all of the following conditions are true, tempMV is set equal to
mvL1N: [0221] predFlagL1N is equal to 1, [0222]
DiffPicOrderCnt(ColPic, RefPicList1[refIdxL1N]) is equal to 0,
[0223] DiffPicOrderCnt(aPic, currPic) is less than or equal to 0
for every picture aPic in every reference picture list of the
current slice, [0224] slice_type is equal to B, [0225]
collocated_from_l0_flag is equal to 0. [0226] Otherwise if all of
the following conditions are true, tempMV is set equal to mvL0N:
[0227] predFlagL0N is equal to 1, [0228] DiffPicOrderCnt(ColPic,
RefPicList0[refIdxL0N]) is equal to 0. The location (xColCb,
yColCb) of the collocated block inside ColPic is derived as
follows.
[0228]
xColCb=Clip3(xCtb, Min(CurPicWidthInSamplesY-1, xCtb+(1<<CtbLog2SizeY)+3), xColCtrCb+(tempMv[0]>>4)) (8-503)
yColCb=Clip3(yCtb, Min(CurPicHeightInSamplesY-1, yCtb+(1<<CtbLog2SizeY)-1), yColCtrCb+(tempMv[1]>>4)) (8-504)
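The clipping in equations (8-503) and (8-504) offsets the collocated-center position by tempMv (given in 1/16-pel accuracy, hence the right shift by 4) and clamps it to the current CTU row. A minimal sketch, with variable names that paraphrase the specification and a hand-rolled Clip3 helper:

```python
def clip3(lo, hi, v):
    # Clip3(x, y, z) as in the VVC spec: clamp z to the range [x, y].
    return lo if v < lo else hi if v > hi else v

def collocated_position(x_ctb, y_ctb, x_col_ctr, y_col_ctr,
                        temp_mv, pic_w, pic_h, ctb_log2_size):
    """Sketch of (8-503)/(8-504): shift the collocated-center sample by
    the temporal motion vector and clip to the CTU row. Illustrative
    only; not the normative process."""
    x = clip3(x_ctb,
              min(pic_w - 1, x_ctb + (1 << ctb_log2_size) + 3),
              x_col_ctr + (temp_mv[0] >> 4))
    y = clip3(y_ctb,
              min(pic_h - 1, y_ctb + (1 << ctb_log2_size) - 1),
              y_col_ctr + (temp_mv[1] >> 4))
    return x, y
```

Note the asymmetry: the horizontal bound extends 3 samples past the CTU (allowing access to the column immediately right of it), while the vertical bound stays strictly inside the CTU row.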
The array colPredMode is set equal to the prediction mode array
CuPredMode of the collocated picture specified by ColPic.
[0229] The motion vectors ctrMvL0 and ctrMvL1, the prediction list
utilization flags ctrPredFlagL0 and ctrPredFlagL1, and the
reference indices ctrRefIdxL0 and ctrRefIdxL1 are derived as
follows: [0230] Set ctrPredFlagL0=0, ctrPredFlagL1=0. [0231] If
colPredMode[xColCb][yColCb] is equal to MODE_INTER, the following
applies: [0232] The variable currCb specifies the luma coding block
covering (xCtrCb,yCtrCb) inside the current picture. [0233] The
variable colCb specifies the luma coding block covering the
modified location given by ((xColCb>>3)<<3,
(yColCb>>3)<<3) inside the ColPic. [0234] The luma
location (xColCb, yColCb) is set equal to the top-left sample of
the collocated luma coding block specified by colCb relative to the
top-left luma sample of the collocated picture specified by ColPic.
[0235] The derivation process for temporal motion vector prediction
in subclause 8.3.2.12 is invoked with currCb, colCb, (xColCb,
yColCb), centerRefIdxL0, and sbFlag set equal to 1 as inputs and
the output being assigned to ctrMvL0 and ctrPredFlagL0. [0236] The
derivation process for temporal motion vector prediction in
subclause 8.3.2.12 is invoked with currCb, colCb, (xColCb, yColCb),
centerRefIdxL1, and sbFlag set equal to 1 as inputs and the output
being assigned to ctrMvL1 and ctrPredFlagL1. [0237] If both
ctrPredFlagL0 and ctrPredFlagL1 are equal to 0, the following
applies: [0238] Set target reference picture list index
X=slice.isInterB() ? 1-slice.getColFromL0Flag() : 0 [0239] Scale
tempMV to reference picture list X and reference picture index
equal to 0 and set ctrMvLX to the scaled tempMV. [0240]
ctrPredFlagLX=1. [0241] Otherwise, the following applies:
[0241] ctrPredFlagL0=0 (8-505)
ctrPredFlagL1=0 (8-506)
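In the Example #1 fallback, when neither list produced center motion, the target reference picture list is chosen as X = isInterB() ? 1 - getColFromL0Flag() : 0 and tempMV is scaled to reference index 0 of that list. A minimal sketch of the list selection (the MV scaling step itself is omitted; names are illustrative):

```python
def default_uni_target_list(is_b_slice: bool, col_from_l0_flag: int) -> int:
    """Target list X for the uni-prediction default ATMVP candidate:
    for B slices, the list opposite to the one the collocated picture
    comes from; for P slices, list 0. Sketch only."""
    return (1 - col_from_l0_flag) if is_b_slice else 0
```

Picking the list opposite to collocated_from_l0_flag means the default candidate predicts from the side not already represented by the collocated picture.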
Example #2 (a Bi-Prediction Default ATMVP Candidate to Fill in
Sub-Blocks if Needed)
8.3.4.4 Derivation Process for Subblock-Based Temporal Merging Base
Motion Data (Note: Default Motion Information)
[0242] Inputs to this process are: [0243] the location (xCtb, yCtb)
of the top-left sample of the luma coding tree block that contains
the current coding block, [0244] the location (xColCtrCb,
yColCtrCb) of the top-left sample of the collocated luma coding
block that covers the below-right center sample. [0245] the
availability flags availableFlagA.sub.0, availableFlagA.sub.1,
availableFlagB.sub.0, and availableFlagB.sub.1 of the neighbouring
coding units, [0246] the reference indices refIdxLXA.sub.0,
refIdxLXA.sub.1, refIdxLXB.sub.0, and refIdxLXB.sub.1 of the
neighbouring coding units, [0247] the prediction list utilization
flags predFlagLXA.sub.0, predFlagLXA.sub.1, predFlagLXB.sub.0, and
predFlagLXB.sub.1 of the neighbouring coding units, [0248] the
motion vectors in 1/16 fractional-sample accuracy mvLXA.sub.0,
mvLXA.sub.1, mvLXB.sub.0, and mvLXB.sub.1 of the neighbouring
coding units. Outputs of this process are: [0249] the motion
vectors ctrMvL0 and ctrMvL1, [0250] the prediction list utilization
flags ctrPredFlagL0 and ctrPredFlagL1, [0251] the reference indices
ctrRefIdxL0 and ctrRefIdxL1, [0252] the temporal motion vector
tempMV. The variable tempMv is set as follows:
[0252] tempMv[0]=0 (8-501)
tempMv[1]=0 (8-502)
The variable currPic specifies the current picture. The variable
availableFlagN is set equal to FALSE, and the following applies:
[0253] When availableFlagA.sub.0 is equal to 1, the following
applies: [0254] availableFlagN is set equal to TRUE, [0255]
refIdxLXN is set equal to refIdxLXA.sub.0 and mvLXN is set equal to
mvLXA.sub.0, for X being replaced by 0 and 1. [0256] When
availableFlagN is equal to FALSE and availableFlagB.sub.0 is equal
to 1, the following applies: [0257] availableFlagN is set equal to
TRUE, [0258] refIdxLXN is set equal to refIdxLXB.sub.0 and mvLXN is
set equal to mvLXB.sub.0, for X being replaced by 0 and 1. [0259]
When availableFlagN is equal to FALSE and availableFlagB.sub.1 is
equal to 1, the following applies: [0260] availableFlagN is set
equal to TRUE. [0261] refIdxLXN is set equal to refIdxLXB.sub.1 and
mvLXN is set equal to mvLXB.sub.1, for X being replaced by 0 and 1.
[0262] When availableFlagN is equal to FALSE and
availableFlagA.sub.1 is equal to 1, the following applies: [0263]
availableFlagN is set equal to TRUE. [0264] refIdxLXN is set equal
to refIdxLXA.sub.1 and mvLXN is set equal to mvLXA.sub.1, for X
being replaced by 0 and 1. tempMV is set to the zero motion vector.
When availableFlagN is equal to TRUE, the following applies: [0265]
If all of the following conditions are true, tempMV is set equal to
mvL1N: [0266] predFlagL1N is equal to 1, [0267]
DiffPicOrderCnt(ColPic, RefPicList1[refIdxL1N]) is equal to 0,
[0268] DiffPicOrderCnt(aPic, currPic) is less than or equal to 0
for every picture aPic in every reference picture list of the
current slice, [0269] slice_type is equal to B, [0270]
collocated_from_l0_flag is equal to 0. [0271] Otherwise if all of
the following conditions are true, tempMV is set equal to mvL0N:
[0272] predFlagL0N is equal to 1, [0273] DiffPicOrderCnt(ColPic,
RefPicList0[refIdxL0N]) is equal to 0. The location (xColCb,
yColCb) of the collocated block inside ColPic is derived as
follows.
[0273]
xColCb=Clip3(xCtb, Min(CurPicWidthInSamplesY-1, xCtb+(1<<CtbLog2SizeY)+3), xColCtrCb+(tempMv[0]>>4)) (8-503)
yColCb=Clip3(yCtb, Min(CurPicHeightInSamplesY-1, yCtb+(1<<CtbLog2SizeY)-1), yColCtrCb+(tempMv[1]>>4)) (8-504)
The array colPredMode is set equal to the prediction mode array
CuPredMode of the collocated picture specified by ColPic. The
motion vectors ctrMvL0 and ctrMvL1, the prediction list utilization
flags ctrPredFlagL0 and ctrPredFlagL1, and the reference indices
ctrRefIdxL0 and ctrRefIdxL1 are derived as follows: [0274] Set
ctrPredFlagL0=0, ctrPredFlagL1=0. [0275] If
colPredMode[xColCb][yColCb] is equal to MODE_INTER, the following
applies: [0276] The variable currCb specifies the luma coding block
covering (xCtrCb,yCtrCb) inside the current picture. [0277] The
variable colCb specifies the luma coding block covering the
modified location given by ((xColCb>>3)<<3,
(yColCb>>3)<<3) inside the ColPic. [0278] The luma
location (xColCb, yColCb) is set equal to the top-left sample of
the collocated luma coding block specified by colCb relative to the
top-left luma sample of the collocated picture specified by ColPic.
[0279] The derivation process for temporal motion vector prediction
in subclause 8.3.2.12 is invoked with currCb, colCb, (xColCb,
yColCb), centerRefIdxL0, and sbFlag set equal to 1 as inputs and
the output being assigned to ctrMvL0 and ctrPredFlagL0. [0280] The
derivation process for temporal motion vector prediction in
subclause 8.3.2.12 is invoked with currCb, colCb, (xColCb, yColCb),
centerRefIdxL1, and sbFlag set equal to 1 as inputs and the output
being assigned to ctrMvL1 and ctrPredFlagL1. [0281] If both
ctrPredFlagL0 and ctrPredFlagL1 are equal to 0, the following
applies: [0282] Set target reference picture list index X=0 [0283]
Scale tempMV to reference picture list X and reference picture
index equal to 0 and set ctrMvLX to the scaled tempMV. [0284]
ctrPredFlagLX=1. [0285] If the current slice/picture is a B slice, [0286]
1. Set target reference picture list index X=1 [0287] 2. Scale
tempMV to reference picture list X and reference picture index
equal to 0 and set ctrMvLX to the scaled tempMV. [0288] 3.
ctrPredFlagLX=1. [0289] Otherwise, the following applies:
[0289] ctrPredFlagL0=0 (8-505)
ctrPredFlagL1=0 (8-506)
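Example #2 differs from Example #1 only in its fallback: list 0 is filled unconditionally, and list 1 is filled as well when the current slice is a B slice, yielding a bi-prediction default. A minimal sketch of which lists receive the scaled tempMV (illustrative names, scaling omitted):

```python
def default_bi_fill_lists(is_b_slice: bool):
    """Lists whose ctrMvLX is set to the scaled tempMV under the
    Example #2 bi-prediction fallback. Sketch only."""
    lists = [0]           # list 0 is always filled
    if is_b_slice:
        lists.append(1)   # B slices additionally fill list 1
    return lists
```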
Example #3 (ATMVP Candidate Starting Point MV from One Spatial
Block)
8.3.4.4 Derivation Process for Subblock-Based Temporal Merging Base
Motion Data (Note: Default Motion Information)
[0290] Inputs to this process are: [0291] the location (xCtb, yCtb)
of the top-left sample of the luma coding tree block that contains
the current coding block, [0292] the location (xColCtrCb,
yColCtrCb) of the top-left sample of the collocated luma coding
block that covers the below-right center sample. [0293] the
availability flags availableFlagA.sub.0, availableFlagA.sub.1,
availableFlagB.sub.0, and availableFlagB.sub.1 of the neighbouring
coding units, [0294] the reference indices refIdxLXA.sub.0,
refIdxLXA.sub.1, refIdxLXB.sub.0, and refIdxLXB.sub.1 of the
neighbouring coding units, [0295] the prediction list utilization
flags predFlagLXA.sub.0, predFlagLXA.sub.1, predFlagLXB.sub.0, and
predFlagLXB.sub.1 of the neighbouring coding units, [0296] the
motion vectors in 1/16 fractional-sample accuracy mvLXA.sub.0,
mvLXA.sub.1, mvLXB.sub.0, and mvLXB.sub.1 of the neighbouring
coding units. Outputs of this process are: [0297] the motion
vectors ctrMvL0 and ctrMvL1, [0298] the prediction list utilization
flags ctrPredFlagL0 and ctrPredFlagL1, [0299] the reference indices
ctrRefIdxL0 and ctrRefIdxL1, [0300] the temporal motion vector
tempMV. The variable tempMv is set as follows:
[0300] tempMv[0]=0 (8-501)
tempMv[1]=0 (8-502)
The variable currPic specifies the current picture. The variable
availableFlagN is set equal to FALSE, and the following applies:
[0301] When availableFlagA.sub.0 is equal to 1, the following
applies: [0302] availableFlagN is set equal to TRUE, [0303]
refIdxLXN is set equal to refIdxLXA.sub.0 and mvLXN is set equal to
mvLXA.sub.0, for X being replaced by 0 and 1. [0304] When
availableFlagN is equal to FALSE and availableFlagB.sub.0 is equal
to 1, the following applies: [0305] availableFlagN is set equal to
TRUE, [0306] refIdxLXN is set equal to refIdxLXB.sub.0 and mvLXN is
set equal to mvLXB.sub.0, for X being replaced by 0 and 1. [0307]
When availableFlagN is equal to FALSE and availableFlagB.sub.1 is
equal to 1, the following applies: [0308] availableFlagN is set
equal to TRUE. [0309] refIdxLXN is set equal to refIdxLXB.sub.1 and
mvLXN is set equal to mvLXB.sub.1, for X being replaced by 0 and 1.
[0310] When availableFlagN is equal to FALSE and
availableFlagA.sub.1 is equal to 1, the following applies: [0311]
availableFlagN is set equal to TRUE. [0312] refIdxLXN is set equal
to refIdxLXA.sub.1 and mvLXN is set equal to mvLXA.sub.1, for X
being replaced by 0 and 1. When availableFlagN is equal to TRUE,
the following applies: [0313] . . .
Example #4 (Alignment of Sub-Block and TMVP Process)
8.3.4.4 Derivation Process for Subblock-Based Temporal Merging Base
Motion Data
[0314] Inputs to this process are: [0315] the location (xCtb, yCtb)
of the top-left sample of the luma coding tree block that contains
the current coding block, [0316] the location (xColCtrCb,
yColCtrCb) of the top-left sample of the collocated luma coding
block that covers the below-right center sample. [0317] the
availability flags availableFlagA.sub.0, availableFlagA.sub.1,
availableFlagB.sub.0, and availableFlagB.sub.1 of the neighbouring
coding units, [0318] the reference indices refIdxLXA.sub.0,
refIdxLXA.sub.1, refIdxLXB.sub.0, and refIdxLXB.sub.1 of the
neighbouring coding units, [0319] the prediction list utilization
flags predFlagLXA.sub.0, predFlagLXA.sub.1, predFlagLXB.sub.0, and
predFlagLXB.sub.1 of the neighbouring coding units, [0320] the
motion vectors in 1/16 fractional-sample accuracy mvLXA.sub.0,
mvLXA.sub.1, mvLXB.sub.0, and mvLXB.sub.1 of the neighbouring
coding units. Outputs of this process are: [0321] the motion
vectors ctrMvL0 and ctrMvL1, [0322] the prediction list utilization
flags ctrPredFlagL0 and ctrPredFlagL1, [0323] the reference indices
ctrRefIdxL0 and ctrRefIdxL1, [0324] the temporal motion vector
tempMV. The variable tempMv is set as follows:
[0324] tempMv[0]=0 (8-501)
tempMv[1]=0 (8-502)
The variable currPic specifies the current picture. The variable
availableFlagN is set equal to FALSE, and the following applies:
[0325] . . . When availableFlagN is equal to TRUE, the following
applies: [0326] If all of the following conditions are true, tempMV
is set equal to mvL1N: [0327] predFlagL1N is equal to 1, [0328]
DiffPicOrderCnt(ColPic, RefPicList1[refIdxL1N]) is equal to 0,
[0329] DiffPicOrderCnt(aPic, currPic) is less than or equal to 0
for every picture aPic in every reference picture list of the
current slice, [0330] slice_type is equal to B, [0331]
collocated_from_l0_flag is equal to 0. [0332] Otherwise if all of
the following conditions are true, tempMV is set equal to mvL0N:
[0333] predFlagL0N is equal to 1, [0334] DiffPicOrderCnt(ColPic,
RefPicList0[refIdxL0N]) is equal to 0. The location (xColCb,
yColCb) of the collocated block inside ColPic is derived as
follows.
[0334]
xColCb=Clip3(xCtb, Min(CurPicWidthInSamplesY-1, xCtb+(1<<CtbLog2SizeY)+3), xColCtrCb+(tempMv[0]>>4)) (8-503)
yColCb=Clip3(yCtb, Min(CurPicHeightInSamplesY-1, yCtb+(1<<CtbLog2SizeY)-1), yColCtrCb+(tempMv[1]>>4)) (8-504)
The array colPredMode is set equal to the prediction mode array
CuPredMode of the collocated picture specified by ColPic. The
motion vectors ctrMvL0 and ctrMvL1, the prediction list utilization
flags ctrPredFlagL0 and ctrPredFlagL1, and the reference indices
ctrRefIdxL0 and ctrRefIdxL1 are derived as follows: [0335] If
colPredMode[xColCb][yColCb] is equal to MODE_INTER, the following
applies: [0336] The variable currCb specifies the luma coding block
covering (xCtrCb,yCtrCb) inside the current picture. [0337] The
variable colCb specifies the luma coding block covering the
modified location given by ((xColCb>>3)<<3,
(yColCb>>3)<<3) inside the ColPic. [0338] The luma
location (xColCb, yColCb) is set equal to the top-left sample of
the collocated luma coding block specified by colCb relative to the
top-left luma sample of the collocated picture specified by ColPic.
[0339] The derivation process for temporal motion vector prediction
in subclause 8.3.2.12 is invoked with currCb, colCb, (xColCb,
yColCb), centerRefIdxL0, and sbFlag set equal to 1 as inputs and
the output being assigned to ctrMvL0 and ctrPredFlagL0. [0340] The
derivation process for temporal motion vector prediction in
subclause 8.3.2.12 is invoked with currCb, colCb, (xColCb, yColCb),
centerRefIdxL1, and sbFlag set equal to 1 as inputs and the output
being assigned to ctrMvL1 and ctrPredFlagL1. [0341] Otherwise, the
following applies:
[0341] ctrPredFlagL0=0 (8-505)
ctrPredFlagL1=0 (8-506)
8.3.2.11 Derivation Process for Temporal Luma Motion Vector
Prediction
[0342] Inputs to this process are: [0343] . . . Outputs of this
process are: [0344] the motion vector prediction mvLXCol in 1/16
fractional-sample accuracy, [0345] the availability flag
availableFlagLXCol. The variable currCb specifies the current luma
coding block at luma location (xCb, yCb). The variables mvLXCol and
availableFlagLXCol are derived as follows: [0346] If
slice_temporal_mvp_enabled_flag is equal to 0, both components of
mvLXCol are set equal to 0 and availableFlagLXCol is set equal to
0. [0347] Otherwise (slice_temporal_mvp_enabled_flag is equal to
1), the following ordered steps apply: [0348] 1. The bottom right
collocated motion vector is derived as follows:
[0348] xColBr=xCb+cbWidth (8-330)
yColBr=yCb+cbHeight (8-331) [0349] If yCb>>CtbLog2SizeY is
equal to yColBr>>CtbLog2SizeY, yColBr is less than
pic_height_in_luma_samples and xColBr is less than
pic_width_in_luma_samples, the following applies: [0350] The
variable colCb specifies the luma coding block covering the
modified location given by ((xColBr>>3)<<3,
(yColBr>>3)<<3) inside the collocated picture specified
by ColPic. [0351] The luma location (xColCb, yColCb) is set equal
to the top-left sample of the collocated luma coding block
specified by colCb relative to the top-left luma sample of the
collocated picture specified by ColPic. [0352] The derivation
process for collocated motion vectors as specified in clause
8.3.2.12 is invoked with currCb, colCb, (xColCb, yColCb), refIdxLX
and sbFlag set equal to 0 as inputs, and the output is assigned to
mvLXCol and availableFlagLXCol. [0353] Otherwise, both components
of mvLXCol are set equal to 0 and availableFlagLXCol is set equal
to 0. [0354] 2. When availableFlagLXCol is equal to 0, the central
collocated motion vector is derived as follows:
[0354] xColCtr=xCb+(cbWidth>>1) (8-332)
yColCtr=yCb+(cbHeight>>1) (8-333) [0355] The variable colCb
specifies the luma coding block covering the modified location
given by ((xColCtr>>3)<<3, (yColCtr>>3)<<3)
inside the collocated picture specified by ColPic. [0356] The luma
location (xColCb, yColCb) is set equal to the top-left sample of
the collocated luma coding block specified by colCb relative to the
top-left luma sample of the collocated picture specified by ColPic.
[0357] The derivation process for collocated motion vectors as
specified in clause 8.3.2.12 is invoked with currCb, colCb,
(xColCb, yColCb), refIdxLX and sbFlag set equal to 0 as inputs, and
the output is assigned to mvLXCol and availableFlagLXCol.
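Subclause 8.3.2.11 probes at most two collocated positions: first the bottom-right position (8-330)/(8-331), used only when it stays inside the picture and within the same CTU row as the current block, and then the central position (8-332)/(8-333). The position logic can be sketched as follows (illustrative only; fetching the actual collocated MV is not modelled):

```python
def tmvp_candidate_positions(x_cb, y_cb, cb_w, cb_h,
                             pic_w, pic_h, ctb_log2_size):
    """Candidate collocated positions in probing order for temporal
    luma MV prediction, rounded to the 8x8 motion grid as in
    ((x >> 3) << 3). Sketch of the position checks only."""
    positions = []
    x_br, y_br = x_cb + cb_w, y_cb + cb_h
    # bottom-right position: must lie in the picture and in the same CTU row
    if (y_cb >> ctb_log2_size) == (y_br >> ctb_log2_size) \
            and y_br < pic_h and x_br < pic_w:
        positions.append(((x_br >> 3) << 3, (y_br >> 3) << 3))
    # central position: always probed if the bottom-right one failed
    x_ctr, y_ctr = x_cb + (cb_w >> 1), y_cb + (cb_h >> 1)
    positions.append(((x_ctr >> 3) << 3, (y_ctr >> 3) << 3))
    return positions
```

The same-CTU-row check is what keeps temporal motion fetches within a bounded region of the collocated picture, which matters for hardware line buffers.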
8.3.2.12 Derivation Process for Collocated Motion Vectors
[0358] Inputs to this process are: [0359] a variable currCb
specifying the current coding block, [0360] a variable colCb
specifying the collocated coding block inside the collocated
picture specified by ColPic, [0361] a luma location (xColCb,
yColCb) specifying the top-left sample of the collocated luma
coding block specified by colCb relative to the top-left luma
sample of the collocated picture specified by ColPic, [0362] a
reference index refIdxLX, with X being 0 or 1, [0363] a flag
indicating a subblock temporal merging candidate sbFlag. Outputs of
this process are: [0364] the motion vector prediction mvLXCol in
1/16 fractional-sample accuracy, [0365] the availability flag
availableFlagLXCol. The variable currPic specifies the current
picture. The arrays predFlagL0Col[x][y], mvL0Col[x][y] and
refIdxL0Col[x][y] are set equal to PredFlagL0[x][y], MvL0[x][y] and
RefIdxL0[x][y], respectively, of the collocated picture specified
by ColPic, and the arrays predFlagL1Col[x][y], mvL1Col[x][y] and
refIdxL1Col[x][y] are set equal to PredFlagL1[x][y], MvL1[x][y] and
RefIdxL1[x][y], respectively, of the collocated picture specified
by ColPic. The variables mvLXCol and availableFlagLXCol are derived
as follows: [0366] If colCb is coded in an intra prediction mode,
both components of mvLXCol are set equal to 0 and
availableFlagLXCol is set equal to 0. [0367] Otherwise, the motion
vector mvCol, the reference index refIdxCol and the reference list
identifier listCol are derived as follows: [0368] If sbFlag is
equal to 0, availableFlagLXCol is set to 1 and the following
applies: [0369] If predFlagL0Col[xColCb][yColCb] is equal to 0,
mvCol, refIdxCol and listCol are set equal to
mvL1Col[xColCb][yColCb], refIdxL1Col[xColCb][yColCb] and L1,
respectively. [0370] Otherwise, if predFlagL0Col[xColCb][yColCb] is
equal to 1 and predFlagL1Col[xColCb][yColCb] is equal to 0, mvCol,
refIdxCol and listCol are set equal to mvL0Col[xColCb][yColCb],
refIdxL0Col[xColCb][yColCb] and L0, respectively. [0371] Otherwise
(predFlagL0Col[xColCb][yColCb] is equal to 1 and
predFlagL1Col[xColCb][yColCb] is equal to 1), the following
assignments are made: [0372] If NoBackwardPredFlag is equal to 1,
mvCol, refIdxCol and listCol are set equal to
mvLXCol[xColCb][yColCb], refIdxLXCol[xColCb][yColCb] and LX,
respectively. [0373] Otherwise, mvCol, refIdxCol and listCol are
set equal to mvLNCol[xColCb][yColCb], refIdxLNCol[xColCb][yColCb]
and LN, respectively, with N being the value of
collocated_from_l0_flag. [0374] Otherwise (sbFlag is equal to 1),
the following applies: [0375] If PredFlagLXCol[xColCb][yColCb] is
equal to 1, mvCol, refIdxCol, and listCol are set equal to
mvLXCol[xColCb][yColCb], refIdxLXCol[xColCb][yColCb], and LX,
respectively, and availableFlagLXCol is set to 1. [0376] Otherwise
(PredFlagLXCol[xColCb][yColCb] is equal to 0), the following
applies: [0377] If DiffPicOrderCnt(aPic, currPic) is less than or
equal to 0 for every picture aPic in every reference picture list
of the current slice and PredFlagLYCol[xColCb][yColCb] is equal to
1, mvCol, refIdxCol, and listCol are set to
mvLYCol[xColCb][yColCb], refIdxLYCol[xColCb][yColCb] and LY,
respectively, with Y being equal to !X, where X is the value of X
that this process is invoked for, and availableFlagLXCol is set to
1. [0378] Otherwise, both components of mvLXCol are set to 0 and
availableFlagLXCol is set equal to 0.
[0379] When availableFlagLXCol is equal to TRUE, mvLXCol and
availableFlagLXCol are derived as follows:
[0380] . . . (remaining details similar to the current version of
VVC specification).
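The list-selection logic of 8.3.2.12 branches on sbFlag: for the regular TMVP path (sbFlag=0) the collocated block's populated list, backward-prediction status, and collocated_from_l0_flag decide which list supplies mvCol, while the sub-block path (sbFlag=1) takes list X directly when it is populated. A simplified sketch (the low-delay LY fallback and the POC-based scaling that follow are omitted; array shapes are assumptions of this sketch):

```python
def select_collocated(pred_flag_col, mv_col, ref_idx_col,
                      X, sb_flag, no_backward, col_from_l0):
    """Sketch of the 8.3.2.12 list selection. pred_flag_col, mv_col and
    ref_idx_col are 2-element sequences indexed by reference list.
    Returns (mvCol, refIdxCol, listCol), or None when unavailable."""
    if sb_flag == 0:
        if pred_flag_col[0] == 0:
            n = 1                      # only list 1 populated
        elif pred_flag_col[1] == 0:
            n = 0                      # only list 0 populated
        else:
            # bi-predicted collocated block: pick list X when no
            # backward prediction exists, else follow collocated_from_l0
            n = X if no_backward else col_from_l0
        return mv_col[n], ref_idx_col[n], n
    # sbFlag == 1: use list X only when it is populated
    if pred_flag_col[X] == 1:
        return mv_col[X], ref_idx_col[X], X
    return None
```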
[0381] The examples described above may be incorporated in the
context of the methods described below, e.g., methods 1100, 1200
and 1300, which may be implemented at a video decoder and/or video
encoder.
[0382] FIG. 11 shows a flowchart of an example method for video
processing. The method 1100 includes, at step 1110, selecting, for
sub-block level processing of a current video block, motion
information associated with a spatial neighboring block.
[0383] In some embodiments, and in the context of Example 1, the
spatial neighboring block is a first spatial neighboring block that
is checked in the sub-block based merge list.
[0384] In some embodiments, and in the context of Example 4,
selecting the spatial neighboring block is based on signaling in a
video parameter set (VPS), a sequence parameter set (SPS), a
picture parameter set (PPS), a slice header, a tile group header, a
coding tree unit (CTU), a tile, a coding unit (CU), a prediction
unit (PU) or a CTU row. In other embodiments, selecting the spatial
neighboring block is based on a height or a width of the current
video block.
[0385] The method 1100 includes, at step 1120, deriving, based on
the motion information, a motion vector prediction candidate.
[0386] In some embodiments, and in the context of Example 2,
deriving the motion vector prediction candidate includes the steps
of identifying, based on the motion information, a temporal
neighboring block, and deriving the motion vector prediction
candidate based on the temporal neighboring block. In some
embodiments, the motion information is scaled prior to the
identifying the temporal neighboring block.
[0387] In some embodiments, and in the context of Example 2, the
identifying the temporal neighboring block includes the steps of
performing a sequential multi-step search over each of a plurality
of temporal neighboring blocks, and terminating the sequential
multi-step search upon identifying a first of the plurality of
temporal neighboring blocks that returns at least one valid motion
vector prediction candidate. In one example, the sequential
multi-step search is over one or more temporal blocks in a coding
tree unit (CTU) that comprises the identified temporal neighboring
block. In another example, the sequential multi-step search is over
one or more temporal blocks in a single row of a coding tree unit
(CTU) that comprises the identified temporal neighboring block.
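The early-terminating search described above can be sketched generically: iterate over an ordered set of temporal neighboring blocks and stop at the first one that yields a valid candidate. The `blocks` sequence and `derive` callback below stand in for the ordered block list and the per-block derivation of the text; both are illustrative placeholders:

```python
def find_temporal_candidate(blocks, derive):
    """Sequential multi-step search with early termination: `derive`
    returns a candidate or None, and the search stops at the first
    block that yields a valid candidate."""
    for block in blocks:
        cand = derive(block)
        if cand is not None:
            return cand
    return None
```

Terminating at the first hit bounds the worst-case number of collocated-picture accesses, which is the practical motivation for restricting the search to one CTU or one CTU row.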
[0388] In some embodiments, and in the context of Example 3, the
motion information is replaced by a history-based motion vector
prediction (HMVP) candidate prior to deriving the motion vector
prediction candidate. In an example, the HMVP candidate is scaled
prior to deriving the motion vector prediction candidate.
[0389] The method 1100 includes, at step 1130, adding the motion
vector prediction candidate to a sub-block based merge list that is
different from a merge list and excludes block-level prediction
candidates.
[0390] The method 1100 includes, at step 1140, reconstructing the
current video block or decoding other video blocks based on the
motion vector prediction candidate.
[0391] FIG. 12 shows a flowchart of an example method for video
processing. The method 1200 includes, at step 1210, deriving, for
sub-block level processing of a current video block, a motion
vector prediction candidate.
[0392] The method 1200 includes, at step 1220, assigning a merge
index to a type of the motion vector prediction candidate.
[0393] The method 1200 includes, at step 1230, adding the motion
vector prediction candidate and the merge index to a sub-block
based merge list that is different from a merge list and excludes
block-level prediction candidates.
[0394] In some embodiments, and in the context of Example 7, the
method 1200 further includes the steps of determining the type of
motion information associated with the current video block, and
reconstructing the current video block or decoding other video
blocks based on one or more motion vector prediction candidates
from the sub-block based merge list, wherein the one or more motion
vector prediction candidates are selected based on the type. In one
example, the merge index within a first range corresponds to one or
more alternative temporal motion vector prediction (ATMVP)
candidates. In another example, the merge index within a second
range corresponds to one or more affine candidates. In yet another
example, the merge index is based on signaling in a video parameter
set (VPS), a sequence parameter set (SPS), a picture parameter set
(PPS), a slice header, a tile group header, a coding tree unit
(CTU), a tile, a coding unit (CU), a prediction unit (PU) or a CTU
row. In yet another example, the type of the motion vector
prediction candidate is an affine motion vector prediction
candidate, an alternative temporal motion vector prediction (ATMVP)
candidate or a spatial-temporal motion vector prediction (STMVP)
candidate.
[0395] In some embodiments, and in the context of Example 8, adding
the motion vector prediction candidate to the sub-block based merge
list is based on an adaptive ordering. In one example, one or more
alternative temporal motion vector prediction (ATMVP) candidates
are added to the sub-block based merge list prior to any affine
motion vector prediction candidates. In another example, one or
more affine motion vector prediction candidates are added to the
sub-block based merge list prior to any alternative temporal motion
vector prediction (ATMVP) candidates.
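The two orderings in the paragraph above amount to concatenating the ATMVP and affine candidate groups in one direction or the other, steered by a flag that the text says may be signaled or inferred. A minimal sketch with illustrative list and flag names:

```python
def order_subblock_merge_list(atmvp_cands, affine_cands, atmvp_first=True):
    """Adaptive ordering of the sub-block based merge list: ATMVP
    candidates before affine candidates, or the reverse. Sketch only."""
    if atmvp_first:
        return atmvp_cands + affine_cands
    return affine_cands + atmvp_cands
```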
[0396] FIG. 13 shows a flowchart of an example method for video
processing. The method 1300 includes, at step 1310, deriving, for
sub-block level processing of a current video block, a motion
vector prediction candidate.
[0397] The method 1300 includes, at step 1320, adding, based on an
adaptive ordering, the motion vector prediction candidate to a
sub-block based merge list that is different from a merge list and
excludes block-level prediction candidates.
[0398] In some embodiments, and in the context of Example 9, the
adaptive ordering is based on coded information of the current
block. In other embodiments, the adaptive ordering is based on
coded information of one or more neighboring blocks of the current
block. In yet other embodiments, the adaptive ordering is based on
signaling in a video parameter set (VPS), a sequence parameter set
(SPS), a picture parameter set (PPS), a slice header, a tile group
header, a coding tree unit (CTU), a tile, a coding unit (CU), a
prediction unit (PU) or a CTU row. In yet other embodiments, the
adaptive ordering is based on a first number of available affine
motion vector prediction candidates and/or a second number of
available non-affine motion vector prediction candidates.
[0399] In some embodiments, e.g., as disclosed in Items 6-8 and
15-16 in section 8, an example video processing method includes
determining a default motion candidate for a sub-block based coding
mode for a conversion between a current video block and a bitstream
representation of the current video block using one of the
following: (a) a uni-prediction candidate that is derived by
scaling a starting motion candidate to a reference picture index
within a reference picture list X; or (b) a bi-prediction candidate
that is derived by scaling to reference picture indexes within two
reference picture lists; or (c) a candidate in either (a) or (b)
depending on a picture type or a slice type of the current video
block; or (d) a candidate derived for a temporal motion vector
predictor (TMVP) process of the current video block. For example,
under option (a), the starting motion vector could be a motion
vector associated with a block pointing to a collocated picture, a
motion vector of the first spatially neighboring block that points
to a collocated picture, a zero motion vector, or another choice of
motion vector. Additional features and
implementation options are described in Section 8, items 6-8 and
15-16.
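Option (c) above, selecting between the uni-prediction default of option (a) and the bi-prediction default of option (b) by slice type, can be sketched as a simple switch. The string labels and function name are illustrative assumptions:

```python
def choose_default_candidate(slice_type: str) -> str:
    """Option (c) sketch: a bi-prediction default for B slices, a
    uni-prediction default otherwise (P slices)."""
    if slice_type not in ("P", "B"):
        raise ValueError("expected slice_type 'P' or 'B'")
    return "bi-prediction" if slice_type == "B" else "uni-prediction"
```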
10. Example Implementations of the Disclosed Technology
[0400] FIG. 14 is a block diagram of a video processing apparatus
1400. The apparatus 1400 may be used to implement one or more of
the methods described herein. The apparatus 1400 may be embodied in
a smartphone, tablet, computer, Internet of Things (IoT) receiver,
and so on. The apparatus 1400 may include one or more processors
1402, one or more memories 1404 and video processing hardware 1406.
The processor(s) 1402 may be configured to implement one or more
methods (including, but not limited to, methods 1100, 1200 and
1300) described in the present document. The memory (memories) 1404
may be used for storing data and code used for implementing the
methods and techniques described herein. The video processing
hardware 1406 may be used to implement, in hardware circuitry, some
techniques described in the present document.
[0401] FIG. 16 is a block diagram showing an example video
processing system 1600 in which various techniques disclosed herein
may be implemented. Various implementations may include some or all
of the components of the system 1600. The system 1600 may include
input 1602 for receiving video content. The video content may be
received in a raw or uncompressed format, e.g., 8 or 10 bit
multi-component pixel values, or may be in a compressed or encoded
format. The input 1602 may represent a network interface, a
peripheral bus interface, or a storage interface. Examples of
network interface include wired interfaces such as Ethernet,
passive optical network (PON), etc. and wireless interfaces such as
Wi-Fi or cellular interfaces.
[0402] The system 1600 may include a coding component 1604 that may
implement the various coding or encoding methods described in the
present document. The coding component 1604 may reduce the average
bitrate of video from the input 1602 to the output of the coding
component 1604 to produce a coded representation of the video. The
coding techniques are therefore sometimes called video compression
or video transcoding techniques. The output of the coding component
1604 may be either stored, or transmitted via a communication
connection, as represented by the component 1606. The stored or
communicated bitstream (or coded) representation of the video
received at the input 1602 may be used by the component 1608 for
generating pixel values or displayable video that is sent to a
display interface 1610. The process of generating user-viewable
video from the bitstream representation is sometimes called video
decompression. Furthermore, while certain video processing
operations are referred to as "coding" operations or tools, it will
be appreciated that the coding tools or operations are used at an
encoder and corresponding decoding tools or operations that reverse
the results of the coding will be performed by a decoder.
[0403] Examples of a peripheral bus interface or a display
interface may include universal serial bus (USB) or high definition
multimedia interface (HDMI) or Displayport, and so on. Examples of
storage interfaces include SATA (serial advanced technology
attachment), PCI, IDE interface, and the like. The techniques
described in the present document may be embodied in various
electronic devices such as mobile phones, laptops, smartphones or
other devices that are capable of performing digital data
processing and/or video display.
[0404] FIG. 17 is a flowchart representation of a method 1700 for
video processing in accordance with the present disclosure. The
method 1700 includes, at operation 1710, determining, during a
conversion between a current block of a video and a bitstream
representation of the video, a temporal motion vector prediction
candidate for at least a sub-block of the current block. The
temporal motion vector prediction candidate is determined based on
K neighboring blocks of the current block, K being a positive
integer. The method 1700 includes, at operation 1720, performing
the conversion based on the temporal motion vector prediction
candidate for the sub-block.
[0405] In some embodiments, the temporal motion vector prediction
candidate is completely determined based on K neighboring blocks of
the current block. In some embodiments, K=1. In some embodiments,
K=2 or 3. In some embodiments, the temporal motion vector
prediction candidate is determined without checking all motion
candidates in a merge list of the current block. In some
embodiments, one of the K spatial neighboring blocks is the same as
a first spatial neighboring block checked in a merge list
construction process of a video block. In some embodiments, a
spatial neighboring block of the video block is adjacent to a
bottom-left corner of the current block. In some embodiments, at
least one of the K spatial neighboring blocks is different from
spatial neighboring blocks checked in a merge list construction
process of a video block. In some embodiments, the K spatial
neighboring blocks are determined by checking a plurality of
available spatial neighboring blocks in a first order.
[0406] In some embodiments, the method further includes determining
that a spatial neighboring block is available in case the spatial
neighboring block is coded prior to performing the conversion of
the current block. In some embodiments, the spatial neighboring
block is within a same tile as the current block. In some
embodiments, the plurality of available spatial neighboring blocks
includes a first block adjacent to a bottom-left corner of the
current block and a second block adjacent to a top-right corner of
the current block. In some embodiments, the method includes
checking the K spatial neighboring blocks of the current block in a
first order, wherein spatial neighboring blocks in a block-based
merge list construction process of a video block are checked in a
second order, the second order being different than the first
order. In some embodiments, K is equal to 1, and the first order
indicates that a first spatial neighboring block adjacent to a
bottom-left corner of the current block is to be checked while the
second order indicates that a second spatial neighboring block
adjacent to an above-right corner of a video block is to be
checked.
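The availability check and ordered neighbor scan described in the preceding paragraphs can be sketched as follows. This is an illustrative sketch only; the dictionary-based block representation and its field names are assumptions, not part of the disclosure.

```python
# Illustrative sketch: selecting K spatial neighboring blocks by
# checking candidate positions in a "first order" and testing
# availability (coded already, and in the same tile).

def is_available(block, current_tile_id):
    """A neighbor is available if it was coded prior to the current
    block and lies within the same tile."""
    return (block is not None and block["coded"]
            and block["tile"] == current_tile_id)

def select_k_neighbors(candidates_in_first_order, current_tile_id, k=1):
    """Return the first k available neighbors in the checking order."""
    chosen = []
    for block in candidates_in_first_order:
        if is_available(block, current_tile_id):
            chosen.append(block)
            if len(chosen) == k:
                break
    return chosen
```

For K=1 with the first order starting at the bottom-left neighbor, the scan stops at the first available block, so no other positions are examined.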
[0407] In some embodiments, the temporal motion vector prediction
candidate includes an Alternative Temporal Motion Vector Prediction (ATMVP)
candidate. In some embodiments, the method includes identifying a
temporal block according to motion information of the K spatial
neighboring blocks and deriving motion information of the sub-block
based on the motion information of the identified temporal block.
In some embodiments, the method further includes identifying a
second video block in a different picture according to motion
information of the K neighboring blocks and deriving temporal
motion information of a sub-block based on the second video block.
In some embodiments, a sub-block size is 8×8. In some
embodiments, a sub-block size is the same as a block size.
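The two ATMVP-style steps above, identifying a temporal block from a spatial neighbor's motion and deriving per-sub-block motion from it, can be sketched as follows. The grid-based model of the collocated picture's motion field is an assumption for illustration only.

```python
# Illustrative sketch: (1) use a spatial neighbor's motion vector to
# locate a displaced position in a collocated picture; (2) fetch the
# motion stored there for each sub-block of the current block.

def derive_sub_block_motion(block_x, block_y, block_w, block_h,
                            neighbor_mv, collocated_motion, sub=8):
    """Map each sub-block's top-left corner to the motion stored at the
    displaced position in the collocated picture's sub x sub grid."""
    dx, dy = neighbor_mv                 # step 1: temporal displacement
    out = {}
    for y in range(block_y, block_y + block_h, sub):
        for x in range(block_x, block_x + block_w, sub):
            key = ((x + dx) // sub, (y + dy) // sub)  # step 2: lookup
            out[(x, y)] = collocated_motion.get(key, (0, 0))
    return out
```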
[0408] In some embodiments, the conversion comprises encoding the
current block to generate the bitstream representation. In some
embodiments, the conversion comprises decoding the bitstream
representation to generate the current block.
[0409] FIG. 18 is a flowchart representation of a method 1800 for
video processing in accordance with the present disclosure. The
method 1800 includes, at operation 1810, determining, during a
conversion between a current block of a video and a bitstream
representation of the video, a temporal motion vector prediction
candidate based on a temporal neighboring block of the current
block. The temporal neighboring block is identified based on motion
information of a spatial neighboring block selected from one or
more spatial neighboring blocks that are different from at least
one spatial neighboring block used in a merge list construction
process of a video block. The method 1800 also includes, at
operation 1820, performing the conversion based on the temporal
motion vector prediction candidate.
[0410] In some embodiments, the temporal motion vector prediction
candidate includes an Alternative Temporal Motion Vector Prediction
(ATMVP) candidate. In some embodiments, the one or more spatial
neighboring blocks are different from all candidates in the merge
list of the current block. In some embodiments, the one or more
spatial neighboring blocks include a block adjacent to a top-left
corner of the current block. In some embodiments, a subset of the
one or more spatial neighboring blocks is the same as one or more
candidates that are derived from a merge list construction process
of a video block. In some embodiments, the one or more spatial
neighboring blocks include a first block adjacent to a bottom-left
corner of the current block or a second block adjacent to a
top-right corner of the current block.
[0411] In some embodiments, the motion information is scaled before
the temporal neighboring block is identified. In some embodiments,
the spatial neighboring block is selected based on information in a
video parameter set (VPS), a sequence parameter set (SPS), a
picture parameter set (PPS), a slice header, a tile group header, a
coding tree unit (CTU), a tile, a coding unit (CU), a prediction
unit (PU) or a CTU row. In some embodiments, the spatial
neighboring block is selected based on a height or a width of the
current block.
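The scaling of the spatial neighbor's motion information before the temporal neighboring block is identified can be sketched with conventional POC-distance scaling. This is illustrative only; the rounding is simplified relative to any real codec.

```python
# Illustrative sketch: scale a spatial neighbor's motion vector so
# that it points to the collocated picture, using the ratio of
# picture-order-count (POC) distances.

def scale_to_collocated(mv, cur_poc, neighbor_ref_poc, col_poc):
    tb = cur_poc - col_poc            # distance to the collocated picture
    td = cur_poc - neighbor_ref_poc   # distance to the neighbor's reference
    if td == 0:
        return mv
    return (round(mv[0] * tb / td), round(mv[1] * tb / td))
```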
[0412] FIG. 19 is a flowchart representation of a method 1900 for
video processing in accordance with the present disclosure. The
method 1900 includes, at operation 1910, maintaining, for a
conversion between a current block of a video and a bitstream
representation of the video, a table of motion candidates based on
past conversions of the video and the bitstream representation. The
method 1900 includes, at operation 1920, deriving a temporal motion
vector prediction candidate based on the table of motion
candidates. The method 1900 also includes, at operation 1930,
performing the conversion based on the temporal motion vector
prediction candidate.
[0413] In some embodiments, the temporal motion vector prediction
candidate includes an Alternative Temporal Motion Vector Prediction
(ATMVP) candidate. In some embodiments, the temporal motion vector
prediction candidate is scaled prior to the conversion. In some
embodiments, the method includes updating the table of motion
candidates based on the temporal motion vector prediction
candidate. In some embodiments, the method includes performing a
subsequent conversion of the video and the bitstream representation
using the updated table of motion candidates. In some embodiments,
deriving the temporal motion vector prediction candidate further
comprises deriving the temporal motion vector prediction candidate
based on spatial neighboring blocks of a second video block.
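The table maintenance described above can be sketched as a first-in-first-out list with redundancy pruning, in the spirit of a history-based candidate table. The table size and pruning rule below are assumptions for illustration.

```python
# Illustrative sketch: maintain a table of motion candidates based on
# past conversions; on update, prune any identical entry, append the
# newest candidate, and evict the oldest entry on overflow.

def update_table(table, candidate, max_size=6):
    table = [c for c in table if c != candidate]  # redundancy pruning
    table.append(candidate)                       # newest at the tail
    if len(table) > max_size:
        table.pop(0)                              # FIFO eviction
    return table
```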
[0414] FIG. 20 is a flowchart representation of a method 2000 for
video processing in accordance with the present disclosure. The
method 2000 includes, at operation 2010, determining, for a
conversion between a current block of a video and a bitstream
representation of the video, one or more temporal motion vector
prediction candidates for the current block. The method 2000 also
includes, at operation 2020, performing the conversion based on the
one or more temporal motion vector prediction candidates. The one
or more temporal motion vector prediction candidates can be
determined by identifying a first temporal adjacent block of the
current block based on an initial motion vector and examining
additional temporal adjacent blocks to obtain the one or more
temporal motion vector prediction candidates. The first temporal
adjacent block includes invalid motion information.
[0415] In some embodiments, the one or more temporal motion vector
prediction candidates include an Alternative Temporal Motion Vector
Prediction (ATMVP) candidate. In some embodiments, the first
temporal adjacent block is intra-coded. In some embodiments, the
additional temporal adjacent blocks comprise a second temporal
adjacent block that includes a starting point positioned adjacent
to a bottom-right corner of a starting point of the first adjacent
temporal block.
[0416] In some embodiments, the additional temporal adjacent blocks
are identified based on a sequential multi-step search of blocks
associated with the first temporal adjacent block. In some
embodiments, the sequential multi-step search comprises examining
spatial adjacent blocks of the first temporal adjacent block in an
order of left, above, right, and bottom. In some embodiments, the
sequential multi-step search further comprises examining spatial
non-adjacent blocks that are one step away from the first temporal
adjacent block in an order of left, above, right, and bottom. In
some embodiments, the additional temporal adjacent blocks are
positioned within a region associated with the first temporal
adjacent block. In some embodiments, the region includes a Coding
Tree Unit (CTU) associated with the first temporal adjacent block.
In some embodiments, the region includes a single row of the CTU
associated with the first temporal adjacent block.
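The sequential multi-step search described above, examining positions in the order left, above, right, and bottom at increasing step distances from the first temporal adjacent block, can be sketched as follows. The step offsets in block units are assumptions for illustration.

```python
# Illustrative sketch: when the first temporal adjacent block has
# invalid motion information, search its neighbors in the order
# left, above, right, bottom, then one further step away, and so on.

def multi_step_search(start, has_valid_motion, max_steps=2):
    """Return the first position (in search order) with valid motion,
    or None if the search is exhausted."""
    x, y = start
    for step in range(1, max_steps + 1):
        # left, above, right, bottom at the current step distance
        for dx, dy in ((-step, 0), (0, -step), (step, 0), (0, step)):
            pos = (x + dx, y + dy)
            if has_valid_motion(pos):
                return pos
    return None  # fall back to a default candidate
```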
[0417] FIG. 21 is a flowchart representation of a method 2100 for
video processing in accordance with the present disclosure. The
method 2100 includes, at operation 2110, determining, for a
conversion between a current block of a video and a bitstream
representation of the video, one or more temporal motion vector
prediction candidates for the current block. The one or more
temporal motion vector prediction candidates comprise a default
temporal motion vector prediction candidate. The method 2100
includes, at operation 2120, performing the conversion based on the
one or more temporal motion vector prediction candidates.
[0418] In some embodiments, the default temporal motion vector
prediction candidate is determined after identifying a first
temporal adjacent block of the current block based on an initial
motion vector. The first temporal adjacent block includes invalid
motion information. In some embodiments, the default temporal
motion vector is inherited from a spatial neighboring block of the
current block. In some embodiments, the default temporal motion
vector is scaled. In some embodiments, the default temporal motion
vector prediction candidate is derived based on a starting point
motion vector (or an initial motion vector). The starting point
motion vector (or the initial motion vector) is either associated
with a spatial adjacent block of the current block or a zero motion
vector. In some embodiments, the starting point motion vector is
completely determined based on motion information associated with
one or more spatial adjacent blocks of the current block. In some
embodiments, the starting point motion vector is associated with a
block whose corresponding reference picture is collocated with a
reference picture of the current block. In some embodiments, the
block includes a spatial adjacent block of the current block, a
spatial non-adjacent block of the current block, or a temporal
adjacent block of the current block.
[0419] In some embodiments, in case a first spatial adjacent block
selected from spatial adjacent blocks of the current block
according to a sequential order is inter-coded and a first motion
vector of the first spatial adjacent block is directed to a
collocated picture of the current block, the starting motion vector
is determined to be the first motion vector, and wherein the
starting motion vector is determined to be a zero motion vector
otherwise. In some embodiments, the default temporal motion vector
prediction candidate is used in case motion information of a
represented block, which is identified by the starting point motion
vector and a center position of the block, is unavailable. The
represented block is a block that covers a point corresponding to
the starting point motion vector in a collocated picture. In some
embodiments, the starting point motion
vector is used to derive sub-block motion.
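The starting motion vector rule above, using the first spatial adjacent block examined in a sequential order when it is inter-coded and points to the collocated picture, and a zero motion vector otherwise, can be sketched as follows. The neighbor representation is hypothetical.

```python
# Illustrative sketch: determine the starting motion vector from the
# first spatial adjacent block in the sequential checking order.

def starting_motion_vector(spatial_neighbors, collocated_poc):
    for nb in spatial_neighbors:
        if nb is None:
            continue
        if nb["inter"] and nb["ref_poc"] == collocated_poc:
            return nb["mv"]     # first MV directed to the collocated picture
        break                   # only the first examined block is considered
    return (0, 0)               # zero motion vector otherwise
```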
[0420] In some embodiments, the default temporal motion vector is a
uni-prediction candidate derived by scaling a motion vector to a
reference picture index within a reference picture list X, X being
0 or 1. In some embodiments, the reference picture index is 0. In
some embodiments, the reference picture index is a smallest
reference picture index that corresponds to a short-term reference
picture. In some embodiments, X is determined based on a slice or a
picture associated with the current block.
[0421] In some embodiments, the default temporal motion vector is a
bi-prediction candidate derived by scaling a motion vector to a
reference picture index within a reference picture list. In some
embodiments, for each reference picture in the reference picture
list, the reference picture index is the same as a target reference
picture index of a temporal motion vector prediction candidate. In
some embodiments, whether the default temporal motion vector is a
uni-prediction candidate or a bi-prediction candidate is determined
based on a picture type or a slice type associated with the current
block. In some embodiments, whether the default temporal motion
vector is a uni-prediction candidate or a bi-prediction candidate
is determined based on a size of the current block.
[0422] FIG. 22 is a flowchart representation of a method 2200 for
video processing in accordance with the present disclosure. The
method 2200 includes, at operation 2210, determining, for a
conversion between a current block of a video and a bitstream
representation of the video, a sub-block level merge candidate list
that includes at least one sub-block coding type. The method 2200
includes, at operation 2220, performing the conversion based on the
sub-block level merge candidate list.
[0423] In some embodiments, the at least one sub-block coding type
comprises a sub-block based temporal motion vector prediction
coding type. In some embodiments, at least one sub-block coding
type comprises an affine motion prediction coding type. In some
embodiments, each of at least one sub-block coding type is assigned
with a range of merge indices. In some embodiments, the merge index
within a first range corresponds to the sub-block based temporal
motion vector prediction coding type. In some embodiments, the
first range includes a single value of 0. In some embodiments, the
merge index within a second range corresponds to the affine motion
prediction coding type. In some embodiments, the second range
excludes a value of 0.
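The merge-index range assignment above, a first range containing only the value 0 for the sub-block based temporal motion vector prediction coding type and a second range excluding 0 for the affine coding type, can be sketched as:

```python
# Illustrative sketch: map a merge index in the sub-block level merge
# candidate list to its coding type. The upper bound is an assumption.

def coding_type_for_merge_index(idx, max_idx=4):
    if idx == 0:
        return "sbTMVP"    # first range: the single value 0
    if 1 <= idx <= max_idx:
        return "affine"    # second range: excludes the value 0
    raise ValueError("merge index out of range")
```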
[0424] In some embodiments, a motion candidate of the sub-block
based temporal motion vector prediction coding type is always
available in the sub-block level merge candidate list. In some
embodiments, temporal information is only allowed to derive motion
candidates of the sub-block based temporal motion vector prediction
coding type. In some embodiments, the range of merge indices for a
coding type is signaled in a video parameter set (VPS), a sequence
parameter set (SPS), a picture parameter set (PPS), a slice header,
a tile group header, a coding tree unit (CTU), a tile, a coding
unit (CU), a prediction unit (PU) or a CTU row. In some
embodiments, the range of merge indices for a coding type is based
on a width or a height of the current block.
[0425] In some embodiments, motion candidates of the at least one
sub-block coding type are added to the sub-block level merge
candidate list based on an adaptive ordering. In some embodiments,
the adaptive ordering indicates that a motion candidate of the
sub-block based temporal motion vector prediction coding type is
added to the sub-block level merge candidate list prior to a motion
candidate of the affine motion prediction coding type. In some
embodiments, the adaptive ordering indicates a motion candidate of
the sub-block based temporal motion vector prediction coding type
and a motion candidate of the affine motion prediction type are
added to the sub-block level merge candidate list in an interleaved
manner. In some embodiments, the adaptive ordering is based on
coded information of the current block or neighboring blocks of the
current block. In some embodiments, in case a majority of the
neighboring blocks of the current block is affine coded, the
adaptive ordering indicates that a motion candidate of the affine
motion prediction coding type is added to the sub-block level merge
candidate list prior to motion candidates of other types. In some
embodiments, the adaptive ordering is based on a ratio of affine
motion candidates to non-affine motion candidates in the sub-block
level merge candidate list. In some embodiments, in case the ratio
is greater than a threshold, the adaptive ordering indicates that a
motion candidate of the affine motion coding type is added to the
sub-block level merge candidate list prior to motion candidates of
other types. In some embodiments, the adaptive ordering is
applicable to first K affine motion candidates in the sub-block
level merge candidate list, K being a positive integer. In some
embodiments, the adaptive ordering is signaled in a video
parameter set (VPS), a sequence parameter set (SPS), a picture
parameter set (PPS), a slice header, a tile group header, a coding
tree unit (CTU), a tile, a coding unit (CU), a prediction unit (PU)
or a CTU row.
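The ratio-based adaptive ordering above can be sketched as follows. This is illustrative only; the threshold value and the type labels are assumptions.

```python
# Illustrative sketch: order candidate types adaptively based on how
# many neighboring blocks are affine coded.

def adaptive_order(neighbor_types, ratio_threshold=0.5):
    """Put affine candidates first when the fraction of affine-coded
    neighbors exceeds the threshold; otherwise sbTMVP comes first."""
    affine = sum(1 for t in neighbor_types if t == "affine")
    if neighbor_types and affine / len(neighbor_types) > ratio_threshold:
        return ["affine", "sbTMVP"]
    return ["sbTMVP", "affine"]
```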
[0426] FIG. 23 is a flowchart representation of a method 2300 for
video processing in accordance with the present disclosure. The
method 2300 includes, at operation 2310, determining, for a
conversion between a current block of a video and a bitstream
representation of the video, a sub-block level coding technique
based on an indication that is signaled in a picture header, a
picture parameter set (PPS), a slice header, or a tile group
header. The method 2300 includes, at operation 2320, performing the
conversion based on the sub-block level coding technique.
[0427] In some embodiments, the sub-block level coding technique
comprises a sub-block based temporal motion vector prediction
coding technique. In some embodiments, the sub-block level coding
technique comprises an affine coding technique. In some
embodiments, the indication indicates that the sub-block coding
technique is disabled.
[0428] In some embodiments, the sub-block level motion derivation
process and the block level motion derivation process can be
unified. FIG. 24A is a flowchart representation of a method 2400
for video processing in accordance with the present disclosure. The
method 2400 includes, at operation 2410, determining, for a
conversion between a current block of a video and a bitstream
representation of the video, a sub-block level temporal motion
candidate using a derivation process applicable to a block level
temporal motion vector prediction candidate conversion between the
current block and the bitstream representation. The method 2400
also includes, at operation 2420, performing the conversion based
on the sub-block level temporal motion candidate. FIG. 24B is a
flowchart representation of a method 2450 for video processing in
accordance with the present disclosure. The method 2450 includes,
at operation 2460, determining, for a conversion between a current
block of a video and a bitstream representation of the video, a
block level temporal motion vector prediction candidate using a
derivation process applicable to a sub-block level temporal motion
candidate conversion between the current block and the bitstream
representation. The method 2450 also includes, at operation 2470,
performing the conversion based on the block level temporal motion
vector prediction candidate.
[0429] In some embodiments, the conversion in the above methods
comprises encoding the current block to generate the bitstream
representation. In some embodiments, the conversion in the above
methods comprises decoding the bitstream representation to generate
the current block.
[0430] Some embodiments of the disclosed technology include making
a decision or determination to enable a video processing tool or
mode. In an example, when the video processing tool or mode is
enabled, the encoder will use or implement the tool or mode in the
processing of a block of video, but may not necessarily modify the
resulting bitstream based on the usage of the tool or mode. That
is, a conversion from the block of video to the bitstream
representation of the video will use the video processing tool or
mode when it is enabled based on the decision or determination. In
another example, when the video processing tool or mode is enabled,
the decoder will process the bitstream with the knowledge that the
bitstream has been modified based on the video processing tool or
mode. That is, a conversion from the bitstream representation of
the video to the block of video will be performed using the video
processing tool or mode that was enabled based on the decision or
determination.
[0431] Some embodiments of the disclosed technology include making
a decision or determination to disable a video processing tool or
mode. In an example, when the video processing tool or mode is
disabled, the encoder will not use the tool or mode in the
conversion of the block of video to the bitstream representation of
the video. In another example, when the video processing tool or
mode is disabled, the decoder will process the bitstream with the
knowledge that the bitstream has not been modified using the video
processing tool or mode that was disabled based on the decision or
determination.
[0432] From the foregoing, it will be appreciated that specific
embodiments of the presently disclosed technology have been
described herein for purposes of illustration, but that various
modifications may be made without deviating from the scope of the
invention. Accordingly, the presently disclosed technology is not
limited except as by the appended claims.
[0433] Implementations of the subject matter and the functional
operations described in this patent document can be implemented in
various systems, digital electronic circuitry, or in computer
software, firmware, or hardware, including the structures disclosed
in this specification and their structural equivalents, or in
combinations of one or more of them. Implementations of the subject
matter described in this specification can be implemented as one or
more computer program products, i.e., one or more modules of
computer program instructions encoded on a tangible and
non-transitory computer readable medium for execution by, or to
control the operation of, data processing apparatus. The computer
readable medium can be a machine-readable storage device, a
machine-readable storage substrate, a memory device, a composition
of matter effecting a machine-readable propagated signal, or a
combination of one or more of them. The term "data processing unit"
or "data processing apparatus" encompasses all apparatus, devices,
and machines for processing data, including by way of example a
programmable processor, a computer, or multiple processors or
computers. The apparatus can include, in addition to hardware, code
that creates an execution environment for the computer program in
question, e.g., code that constitutes processor firmware, a
protocol stack, a database management system, an operating system,
or a combination of one or more of them.
[0434] A computer program (also known as a program, software,
software application, script, or code) can be written in any form
of programming language, including compiled or interpreted
languages, and it can be deployed in any form, including as a
stand-alone program or as a module, component, subroutine, or other
unit suitable for use in a computing environment. A computer
program does not necessarily correspond to a file in a file system.
A program can be stored in a portion of a file that holds other
programs or data (e.g., one or more scripts stored in a markup
language document), in a single file dedicated to the program in
question, or in multiple coordinated files (e.g., files that store
one or more modules, sub programs, or portions of code). A computer
program can be deployed to be executed on one computer or on
multiple computers that are located at one site or distributed
across multiple sites and interconnected by a communication
network.
[0435] The processes and logic flows described in this
specification can be performed by one or more programmable
processors executing one or more computer programs to perform
functions by operating on input data and generating output. The
processes and logic flows can also be performed by, and apparatus
can also be implemented as, special purpose logic circuitry, e.g.,
an FPGA (field programmable gate array) or an ASIC (application
specific integrated circuit).
[0436] Processors suitable for the execution of a computer program
include, by way of example, both general and special purpose
microprocessors, and any one or more processors of any kind of
digital computer. Generally, a processor will receive instructions
and data from a read only memory or a random access memory or both.
The essential elements of a computer are a processor for performing
instructions and one or more memory devices for storing
instructions and data. Generally, a computer will also include, or
be operatively coupled to receive data from or transfer data to, or
both, one or more mass storage devices for storing data, e.g.,
magnetic, magneto optical disks, or optical disks. However, a
computer need not have such devices. Computer readable media
suitable for storing computer program instructions and data include
all forms of nonvolatile memory, media and memory devices,
including by way of example semiconductor memory devices, e.g.,
EPROM, EEPROM, and flash memory devices. The processor and the
memory can be supplemented by, or incorporated in, special purpose
logic circuitry.
[0437] It is intended that the specification, together with the
drawings, be considered exemplary only, where exemplary means an
example. Additionally, the use of "or" is intended to include
"and/or", unless the context clearly indicates otherwise.
[0438] While this patent document contains many specifics, these
should not be construed as limitations on the scope of any
invention or of what may be claimed, but rather as descriptions of
features that may be specific to particular embodiments of
particular inventions. Certain features that are described in this
patent document in the context of separate embodiments can also be
implemented in combination in a single embodiment. Conversely,
various features that are described in the context of a single
embodiment can also be implemented in multiple embodiments
separately or in any suitable subcombination. Moreover, although
features may be described above as acting in certain combinations
and even initially claimed as such, one or more features from a
claimed combination can in some cases be excised from the
combination, and the claimed combination may be directed to a
subcombination or variation of a subcombination.
[0439] Similarly, while operations are depicted in the drawings in
a particular order, this should not be understood as requiring that
such operations be performed in the particular order shown or in
sequential order, or that all illustrated operations be performed,
to achieve desirable results. Moreover, the separation of various
system components in the embodiments described in this patent
document should not be understood as requiring such separation in
all embodiments.
[0440] Only a few implementations and examples are described and
other implementations, enhancements and variations can be made
based on what is described and illustrated in this patent
document.
* * * * *