U.S. patent application number 14/348626 was filed with the patent office on 2014-08-28 for transmitting apparatus and method thereof for video processing.
The applicant listed for this patent is Telefonaktiebolaget L M Ericsson (publ). Invention is credited to Bo Burman, Jonatan Samuelsson, Rickard Sjoberg, Magnus Westerlund.
Application Number | 20140241439 14/348626 |
Document ID | / |
Family ID | 48793513 |
Filed Date | 2014-08-28 |
United States Patent Application | 20140241439 |
Kind Code | A1 |
Samuelsson; Jonatan; et al. | August 28, 2014 |
Transmitting Apparatus and Method Thereof for Video Processing
Abstract
The present invention relates to a method and a transmitting
apparatus for encoding a bitstream representing a sequence of
pictures of a video stream comprising a processor and memory, said
memory containing instructions executable by said processor whereby
said transmitting apparatus is operative to: send a syntax element,
wherein a value of the syntax element is indicative of restrictions
that are enforced on the bitstream in a way that guarantees a
certain level of parallelism for decoding the bitstream.
Inventors: |
Samuelsson; Jonatan (Stockholm, SE)
Burman; Bo (Upplands Vasby, SE)
Sjoberg; Rickard (Stockholm, SE)
Westerlund; Magnus (Kista, SE) |
Applicant: |
Name | City | State | Country | Type |
Telefonaktiebolaget L M Ericsson (publ) | Stockholm | | SE | |
Family ID: |
48793513 |
Appl. No.: |
14/348626 |
Filed: |
June 27, 2013 |
PCT Filed: |
June 27, 2013 |
PCT NO: |
PCT/SE2013/050803 |
371 Date: |
March 31, 2014 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
61666056 | Jun 29, 2012 | |
Current U.S.
Class: |
375/240.26 |
Current CPC
Class: |
H04N 19/196 20141101;
H04N 19/436 20141101; H04N 19/70 20141101; H04N 19/46 20141101;
H04N 19/156 20141101 |
Class at
Publication: |
375/240.26 |
International
Class: |
H04N 19/70 20060101
H04N019/70; H04N 19/46 20060101 H04N019/46 |
Claims
1-33. (canceled)
34. A method for encoding a bitstream representing a sequence of
pictures of a video stream comprising sending a syntax element,
wherein a value of the syntax element is indicative of restrictions
that are enforced on the bitstream in a way that guarantees a
certain level of parallelism for decoding the bitstream.
35. The method according to claim 34, wherein the value of the
syntax element is equal to the level of parallelism.
36. The method according to claim 35, wherein the value of the
syntax element is used to impose a restriction of the number of
bytes or bits per slice, tile or wavefront.
37. The method according to claim 34, wherein the value of the
syntax element is used to impose one restriction of at least (a)
and (b) on the bitstream, and wherein the restriction (a) is that a
maximum number of luma samples per slice is restricted as a
function of the picture size and as a function of the value of the
syntax element, and the restriction (b) is that a maximum number of
luma samples per tile is restricted as a function of the picture
size and as a function of the value of the syntax element.
38. The method according to claim 37, wherein the following is
performed in order to fulfill the restriction (a): dividing
treeblocks of the pictures into slices such that each division
consists of consecutive treeblocks in raster scan order and such
that each division does not contain more luma samples than what is
allowed by the restriction (a).
39. The method according to claim 37, wherein the following is
performed in order to fulfill the restriction (b): dividing the
pictures into tiles with horizontal and vertical lines at sample
positions that are integer multiples of treeblock size such that
each tile does not contain more luma samples than what is allowed
by the restriction (b).
40. The method according to claim 37, wherein the value of the
syntax element is signaled as a highest possible value for which
every slice in the sequence fulfills the restriction (a).
41. The method according to claim 37, wherein the value of the
syntax element is signaled as a highest possible value for which
every tile in the sequence fulfills the restriction (b).
42. The method according to claim 34, wherein the value of the
syntax element is used to impose one restriction of (a), (b) and
(c) on the bitstream, and wherein the restriction (a) is that a
maximum number of luma samples per slice is restricted as a
function of the picture size and as a function of the value of the
syntax element, the restriction (b) is that a maximum number of
luma samples per tile is restricted as a function of the picture
size and as a function of the value of the syntax element, and the
restriction (c) is that wavefronts are used and treeblock size and
picture height and picture width are restricted jointly as a
function of the value of the syntax element and as a function of
the picture size.
43. The method according to claim 42, wherein for pictures to be
encoded with wavefronts, and for the entire sequence, the following
is performed in order to fulfill the restriction (c): encoding the
sequence of pictures of the video stream with wavefronts and once
for the entire sequence, given the resolution of the pictures in
the video stream, selecting treeblock size to fulfill requirement
of the restriction (c).
44. The method according to claim 42, wherein the value of the
syntax element is signaled as the highest possible
value for which the restriction (c) is fulfilled.
45. The method according to claim 34, wherein the level of the
parallelism, referred to as parallelism, is determined as a
function of the value of the syntax element.
46. The method according to claim 45, wherein the level of the
parallelism, referred to as parallelism, is equal to ((value of
syntax element)/4)+1.
47. The method according to claim 45, wherein when the value of the
syntax element indicates restrictions that are enforced on the
bitstream such that at least two parallel processes can be used for
decoding, one of the following conditions must be fulfilled:
condition (a): if neither tiles nor wavefronts are used within the
sequence to be encoded, the maximum number of luma samples in a
slice should be less than or equal to floor(picture width*picture
height/parallelism); condition (b): if tiles but not wavefronts are
used within the sequence to be encoded, the maximum number of luma
samples in a tile should be less than or equal to floor(picture
width*picture height/parallelism); and condition (c): if wavefronts
but not tiles are used within the sequence to be encoded, the
syntax elements indicating picture width and picture height and the
variable indicating treeblock size are restricted such that:
(2*(picture height/treeblock size)+(picture width/treeblock size))*
treeblock size*treeblock size ≤ floor(maximum frame
size/parallelism), wherein floor(x) implies the largest integer
less than or equal to x.
48. The method according to claim 34, wherein the syntax element is
sent in a sequence parameter set.
49. A transmitting apparatus for encoding a bitstream representing
a sequence of pictures of a video stream comprising a processor and
memory, said memory containing instructions executable by said
processor, whereby said transmitting apparatus is operative to send
a syntax element, wherein a value of the syntax element is
indicative of restrictions that are enforced on the bitstream in a
way that guarantees a certain level of parallelism for decoding the
bitstream.
50. The transmitting apparatus according to claim 49, wherein the
transmitting apparatus is operative to set the value of the syntax
element equal to the level of parallelism.
51. The transmitting apparatus according to claim 50, wherein the
transmitting apparatus is operative to use the value of the syntax
element to impose a restriction of the number of bytes or bits per
slice, tile or wavefront.
52. The transmitting apparatus according to claim 49,
wherein the transmitting apparatus is operative to use the value of
the syntax element to impose one restriction of at least (a) and
(b) on the bitstream, wherein the restriction (a) is that a maximum
number of luma samples per slice is restricted as a function of the
picture size and as a function of the value of the syntax element,
and the restriction (b) is that a maximum number of luma samples
per tile is restricted as a function of the picture size and as a
function of the value of the syntax element.
53. The transmitting apparatus according to claim 52, wherein the
transmitting apparatus is operative to perform the following in
order to fulfill the restriction (a): divide treeblocks of the
pictures into slices such that each division consists of consecutive
treeblocks in raster scan order and such that each division does
not contain more luma samples than what is allowed by the
restriction (a).
54. The transmitting apparatus according to claim 52, wherein the
transmitting apparatus is operative to perform the following in
order to fulfill the restriction (b): divide the pictures into
tiles with horizontal and vertical lines at sample positions that
are integer multiples of treeblock size such that each tile does
not contain more luma samples than what is allowed by the
restriction (b).
55. The transmitting apparatus according to claim 52, wherein the
transmitting apparatus is operative to signal the value of the
syntax element as a highest possible value for which every slice in
the sequence fulfills the restriction (a).
56. The transmitting apparatus according to claim 52, wherein the
transmitting apparatus is operative to signal the value of the
syntax element as a highest possible value for which every tile in
the sequence fulfills the restriction (b).
57. The transmitting apparatus according to claim 49, wherein the
transmitting apparatus is operative to use the value of the syntax
element to impose one restriction of (a), (b) and (c) on the
bitstream, and wherein the restriction (a) is that a maximum number
of luma samples per slice is restricted as a function of the
picture size and as a function of the value of the syntax element,
the restriction (b) is that a maximum number of luma samples per
tile is restricted as a function of the picture size and as a
function of the value of the syntax element, and the restriction
(c) is that wavefronts are used and treeblock size and picture
height and picture width are restricted jointly as a function of
the value of the syntax element and as a function of the picture
size.
58. The transmitting apparatus according to claim 57, wherein the
transmitting apparatus is operative to, for pictures to be encoded
with wavefronts, and for the entire sequence, perform the following
in order to fulfill the restriction (c): encode the sequence of
pictures of the video stream with wavefronts and once for the
entire sequence, given the resolution of the pictures in the video
stream, select treeblock size to fulfill the restriction (c).
59. The transmitting apparatus according to claim 57, wherein the
transmitting apparatus is operative to signal the value of the
syntax element as the highest possible value for which
the restriction (c) is fulfilled.
60. The transmitting apparatus according to claim 57, wherein the
transmitting apparatus is operative to encode the sequence of the
pictures of the video stream using a tile configuration that is
suitable for the encoder, and to signal the value of the syntax
element as a highest possible value for which every tile in the
sequence fulfills the restriction (b).
61. The transmitting apparatus according to claim 57, wherein the
transmitting apparatus is operative to indicate the value of the
syntax element according to the restriction (c), to encode the
pictures of the video stream having a picture height equal to
pic_height_in_luma_samples and a picture width equal to
pic_width_in_luma_samples and a treeblock size equal to CtbSize
using wavefront parallel processing, WPP, and to signal the value
of the syntax element as a highest possible value for which the
restriction (c) is fulfilled.
62. The transmitting apparatus according to claim 49, wherein the
transmitting apparatus is operative to determine the level of the
parallelism, referred to as parallelism, as a function of the value
of the syntax element.
63. The transmitting apparatus according to claim 62, wherein the
level of the parallelism, referred to as parallelism, is equal to
((value of syntax element)/4)+1.
64. The transmitting apparatus according to claim 62, wherein the
transmitting apparatus is configured to, when the value of the
syntax element indicates restrictions that are enforced on the
bitstream such that at least two parallel processes can be used for
decoding, ensure that one of the following conditions must be
fulfilled: condition (a): if neither tiles nor wavefronts are used
within the sequence to be encoded, the maximum number of luma
samples in a slice should be less than or equal to floor(picture
width*picture height/parallelism); condition (b): if tiles but not
wavefronts are used within the sequence to be encoded, the maximum
number of luma samples in a tile should be less than or equal to
floor(picture width*picture height/parallelism); and condition
(c): if wavefronts but not tiles are used within the sequence to be
encoded, the syntax elements indicating picture width and picture
height and the variable indicating treeblock size are restricted such
that: (2*(picture height/treeblock size)+(picture width/treeblock
size))*treeblock size*treeblock size ≤ floor(maximum frame
size/parallelism), wherein floor(x) implies the largest integer
less than or equal to x.
65. The transmitting apparatus according to claim 49, wherein the
transmitting apparatus is configured to send the syntax element in
a sequence parameter set.
66. The transmitting apparatus according to claim 49, wherein the
transmitting apparatus is a user device.
Description
TECHNICAL FIELD
[0001] The embodiments relate to a method and a transmitting
apparatus for improving coding performance when parallel
encoding/decoding is possible.
BACKGROUND
[0002] High Efficiency Video Coding (HEVC) is a video coding
standard being developed in the Joint Collaborative Team on Video
Coding (JCT-VC). JCT-VC is a collaborative project between the Moving
Picture Experts Group (MPEG) and the International Telecommunication
Union Telecommunication Standardization Sector (ITU-T). Currently,
an HEVC Model (HM) is defined that includes a number of tools and
is considerably more efficient than H.264/Advanced Video Coding
(AVC).
[0003] HEVC is a block-based hybrid video codec that uses both
inter prediction (prediction from previously coded pictures) and
intra prediction (prediction from previously coded pixels in the same
picture). Each picture is divided into square treeblocks
(corresponding to macroblocks in H.264/AVC) that can be of size
16×16, 32×32 or 64×64 pixels. A variable CtbSize
is used to denote the size of the treeblocks, expressed as the number
of pixels in one dimension, i.e. 16, 32 or 64.
[0004] Regular slices are similar to those in H.264/AVC. Each regular
slice is encapsulated in its own Network Abstraction Layer (NAL)
unit, and in-picture prediction (intra sample prediction, motion
information prediction, coding mode prediction) and entropy coding
dependency across slice boundaries are disabled. Thus a regular
slice can be reconstructed independently from other regular slices
within the same picture. Since the treeblock, which is a basic unit
in HEVC, can be of a relatively big size, e.g. 64×64, a
concept of "fine granularity slices" is included in HEVC to allow
for Maximum Transmission Unit (MTU) size matching through slice
boundaries within a treeblock, as a special form of regular slices.
The slice granularity is signaled in a picture parameter set,
whereas the address of a fine granularity slice is still signaled
in a slice header.
[0005] The regular slice is the only tool that can be used for
parallelization in H.264/AVC. Parallelization implies that parts of
a single picture can be encoded and decoded in parallel, as
illustrated in FIG. 1, where threaded decoding is performed using
slices. Parallelization based on regular slices does not require much
inter-processor or inter-core communication. However, for the same
reason, regular slices can require some coding overhead due to the
bit cost of the slice header and due to the lack of prediction
across the slice border. Further, regular slices (in contrast to
some of the other tools mentioned below) also serve as the key
mechanism for bitstream partitioning to match MTU size
requirements, due to the in-picture independence of regular slices
and that each regular slice is encapsulated in its own NAL unit. In
many cases, the goal of parallelization and the goal of MTU size
matching place contradicting demands on the slice layout in a
picture. The realization of this situation led to the development
of the parallelization tools mentioned below.
[0006] In wavefront parallel processing (WPP), the picture is
partitioned into single rows of treeblocks. Entropy decoding and
prediction are allowed to use data from treeblocks in other
partitions. Parallel processing is possible through parallel
decoding of rows of treeblocks, where the start of the decoding of
a row is delayed by two treeblocks, so as to ensure that data related
to a treeblock above and to the right of the subject treeblock is
available before the subject treeblock is decoded. Using this
staggered start (which appears like a wavefront when represented
graphically as illustrated in FIG. 2), parallelization is possible
with up to as many processors/cores as the picture contains
treeblock rows. Due to the permissiveness of in-picture prediction
between neighboring treeblock rows within a picture, the required
inter-processor/inter-core communication to enable in-picture
prediction can be substantial. The WPP partitioning does not result
in the production of additional NAL units compared to when it is
not applied, thus WPP cannot be used for MTU size matching. A
wavefront segment contains exactly one line of treeblocks.
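The staggered start described above can be illustrated with a small scheduling sketch. It assumes, purely for illustration, that every treeblock takes one time unit to decode and that one core is available per treeblock row; the function name `wavefront_schedule` and its parameters are hypothetical and not part of HEVC.

```python
def wavefront_schedule(rows, cols, delay=2):
    """Return the earliest time step at which each treeblock can start.

    Assumptions (illustrative only): one core per treeblock row, unit
    decoding time per treeblock, and each row starting `delay` treeblocks
    after the row above it.
    """
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # The treeblock to the left in the same row must be finished.
            left_done = start[r][c - 1] + 1 if c > 0 else 0
            if r > 0:
                # The treeblock above and to the right must be finished
                # (clamped at the right picture edge).
                dep_col = min(c + delay - 1, cols - 1)
                above_done = start[r - 1][dep_col] + 1
            else:
                above_done = 0
            start[r][c] = max(left_done, above_done)
    return start


schedule = wavefront_schedule(rows=3, cols=5)
total_steps = schedule[-1][-1] + 1
```

Under these assumptions each row starts two time units after the row above it, so the whole picture finishes in cols + 2*(rows - 1) time units instead of rows*cols.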
[0007] Tiles define horizontal and vertical boundaries that
partition a picture into tile columns and rows. That implies that
the tiles in HEVC divide a picture into areas with a defined width
and height as illustrated in FIG. 3. Each area of the tiles
consists of an integer number of treeblocks that are processed in
raster scan order. The tiles themselves are processed in raster
scan order throughout the picture. The exact tile configuration or
tile information (number of tiles, width and height of each tile,
etc.) can be signaled in a sequence parameter set (SPS) and in a
picture parameter set (PPS). The tile information contains the
width, height and position of each tile in a picture. This means
that if the coordinates of a block are known, it is also known which
tile the block belongs to.
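As a minimal sketch of the last point, the lookup from block coordinates to tile can be done with a binary search over the tile boundaries. The function `tile_of_block` and the example layout are hypothetical; coordinates are assumed to be expressed in treeblock units.

```python
from bisect import bisect_right


def tile_of_block(x, y, col_starts, row_starts):
    """Return (tile_col, tile_row) for the treeblock at column x, row y.

    col_starts / row_starts list the first treeblock column / row of each
    tile column / tile row, in ascending order and starting at 0.
    """
    tile_col = bisect_right(col_starts, x) - 1
    tile_row = bisect_right(row_starts, y) - 1
    return tile_col, tile_row


# Hypothetical layout: tile columns start at treeblock columns 0, 4 and 8,
# tile rows start at treeblock rows 0 and 3.
col_starts = [0, 4, 8]
row_starts = [0, 3]
position = tile_of_block(5, 3, col_starts, row_starts)
```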
[0008] For simplicity, restrictions on the application of the
different picture partitioning schemes are specified in HEVC. Tiles
and WPP may not be applied at the same time. Furthermore, for each
slice and tile, either or both of the following conditions must be
fulfilled: 1) all coded treeblocks in a slice belong to the same
tile; 2) all coded treeblocks in a tile belong to the same
slice.
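One possible reading of the slice and tile restriction can be sketched as a check over a per-treeblock assignment. The function name and the list-based representation are assumptions made for illustration; the check treats the restriction as requiring that any slice and tile that share treeblocks are nested one inside the other.

```python
from collections import defaultdict


def partitioning_is_allowed(slice_of, tile_of):
    """Check the slice/tile restriction for one picture.

    slice_of[i] and tile_of[i] give the slice and tile of treeblock i.
    Every slice and tile that share treeblocks must satisfy: the slice
    lies inside the tile, or the tile lies inside the slice.
    """
    slices, tiles = defaultdict(set), defaultdict(set)
    for i, (s, t) in enumerate(zip(slice_of, tile_of)):
        slices[s].add(i)
        tiles[t].add(i)
    for s_blocks in slices.values():
        for t_blocks in tiles.values():
            overlap = s_blocks & t_blocks
            if overlap and not (s_blocks <= t_blocks or t_blocks <= s_blocks):
                return False
    return True


# One slice covering two whole tiles is allowed.
ok = partitioning_is_allowed([0, 0, 0, 0], [0, 0, 1, 1])
# A slice that starts in the middle of one tile and runs into the next is not.
bad = partitioning_is_allowed([0, 1, 1, 1], [0, 0, 1, 1])
```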
[0009] The Sequence Parameter Set (SPS) holds information that is
valid for an entire coded video sequence. Specifically it holds the
syntax elements profile_idc and level_idc that are used to indicate
which profile and level a bitstream conforms to. Profiles and
levels specify restrictions on bitstreams and hence limits on the
capabilities needed to decode the bitstreams. Profiles and levels
may also be used to indicate interoperability points between
individual decoder implementations. The level enforces restrictions
on the bitstream for example on the Picture size (denoted MaxLumaFS
expressed in luma samples) and sample rate (denoted MaxLumaPR
expressed in luma samples per second) as well as max bit rate
(denoted MaxBR expressed in bits per second) and max coded picture
buffer size (denoted Max CPB size expressed in bits).
[0010] The Picture Parameter Set (PPS) holds information that is
valid for some (or all) pictures in a coded video sequence. The PPS
comprises syntax elements that control the usage of wavefronts and
tiles, and these are required to have the same value in all PPSs that
are active in the same coded video sequence.
[0011] Moreover, both HEVC and H.264 define a video usability
information (VUI) syntax structure, that can be present in a
sequence parameter set and contains parameters that do not affect
the decoding process, i.e. do not affect the pixel values.
Supplemental Enhancement Information (SEI) is another structure
that can be present in any access unit and that contains
information that does not affect the decoding process.
[0012] Hence, as mentioned above, compared to H.264/AVC, HEVC
provides better possibilities for parallelization. Parallelization
implies that parts of a single picture can be encoded and decoded
in parallel. Specifically tiles and WPP are tools developed for
parallelization purposes. Both were originally designed for encoder
parallelization but they may also be used for decoder
parallelization.
[0013] When tiles are being used for encoder parallelism, the
encoder first chooses a tile partitioning. Since tile boundaries
break all predictions between the tiles, the encoder can assign the
encoding of multiple tiles to multiple threads. As soon as there
are at least two tiles, multiple thread encoding can be done.
[0014] Accordingly, in this context, the fact that a number of
threads can be used, implies that the actual workload of the
encoding/decoding process can be divided into separate "processes"
that are performed independently of each other, i.e. they can be
performed in parallel in separate threads.
[0015] HEVC defines two types of entry points for parallel
decoding. Entry points can be used by a decoder to find the
position in the bitstream where the bits for a tile or substream
start. The first type is entry point offsets. These are listed in
the slice header and indicate the starting points of one or more tiles
that are contained in the slice. The second type is entry point
markers, which separate tiles in the bitstream. An entry point
marker is a specific codeword (start code) which cannot occur
anywhere else in the bitstream.
[0016] Thus, for decoder parallelism to work, there need to be
entry points in the bitstream. For parallel encoding, there do
not need to be entry points; the encoder can just stitch the
bitstream together after the encoding of the tiles/substreams is
complete. However, the decoder needs to know where each tile starts
in the bitstream in order to decode in parallel. If an encoder only
wants to encode in parallel but does not want to enable parallel
decoding, it could omit the entry points, but if it also wants to
enable decoding in parallel it must insert entry points.
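How a decoder might use entry point offsets can be sketched as follows. The byte layout here is a simplification assumed for illustration (each offset is taken to be the size in bytes of one substream) and does not reproduce the actual HEVC slice header syntax; `split_substreams` is a hypothetical helper.

```python
def split_substreams(slice_data, entry_point_offsets):
    """Split slice data into substreams at the given byte offsets.

    entry_point_offsets[i] is assumed here to be the size in bytes of
    substream i; the last substream takes the remaining bytes.
    """
    substreams, pos = [], 0
    for size in entry_point_offsets:
        substreams.append(slice_data[pos:pos + size])
        pos += size
    substreams.append(slice_data[pos:])
    return substreams


# Three substreams of 4, 3 and 5 bytes; two offsets are enough to find them.
parts = split_substreams(b"AAAABBBCCCCC", [4, 3])
```

Once the substreams are located, each one can be handed to a separate thread, which is what makes the offsets necessary for parallel decoding but not for parallel encoding.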
SUMMARY
[0017] The object of the embodiments of the present invention is to
improve the performance when parallel encoding/decoding is
available. That is achieved by sending a syntax element, wherein a
value of the syntax element is indicative of restrictions that are
enforced on the bitstream in a way that guarantees a certain level
of parallelism for decoding the bitstream. A decoder, receiving an
encoded bitstream that is encoded according to the indication from
the syntax element, can use this indication when deciding how it
could decode the encoded bitstream and whether the bitstream can be
decoded.
[0018] According to a first aspect of the embodiments a method for
encoding a bitstream representing a sequence of pictures of a video
stream is provided. In the method, a syntax element is sent,
wherein a value of the syntax element is indicative of restrictions
that are enforced on the bitstream in a way that guarantees a
certain level of parallelism for decoding the bitstream.
[0019] According to a second aspect a transmitting apparatus for
encoding a bitstream representing a sequence of pictures of a video
stream is provided. The transmitting apparatus comprises a
processor and memory. Said memory contains instructions executable
by said processor whereby said transmitting apparatus is operative
to send a syntax element wherein a value of the syntax element is
indicative of restrictions that are enforced on the bitstream in a
way that guarantees a certain level of parallelism for decoding the
bitstream.
[0020] An advantage with at least some embodiments, is that they
provide means of indicating that a bitstream can be decoded in
parallel. This enables a decoder, that is capable of decoding in
parallel, to find out before starting decoding whether it will be
possible for the decoder to decode the stream or not. There may for
example be a decoder that can decode 720p on a single thread but
can decode 1080p provided that each picture is split into at least 4
independently decodable regions (i.e. it can be decoded in 4
parallel threads). By using the embodiments, the decoder will know
whether each picture is split or not, i.e. whether it can be
decoded in parallel or not.
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] FIG. 1 illustrates an example of threaded decoding using
slices according to prior art.
[0022] FIG. 2 illustrates an example of threaded decoding using
wavefronts according to prior art.
[0023] FIG. 3 illustrates an example of threaded decoding using
tiles according to prior art.
[0024] FIG. 4 illustrates schematically a transmitting apparatus
configured to transmit the syntax element to a receiving apparatus
according to an embodiment of the present invention.
[0025] FIG. 5 exemplifies parallelism levels according to prior
art.
[0026] FIG. 5 illustrates flowcharts of the method according to the
present invention.
[0027] FIG. 6 illustrates schematically a transmitting apparatus
according to embodiments of the present invention.
[0028] FIG. 7 illustrates schematically an implementation of the
transmitting apparatus according to embodiments of the present
invention.
DETAILED DESCRIPTION
[0029] In this specification, the term processing unit is used to
refer to the unit that encodes or decodes a coded video sequence.
In practice that might for example correspond to a CPU, a GPU, a
DSP, an FPGA, or any type of specialized chip. It might also
correspond to a unit that contains multiple CPUs, GPUs etc. or
combinations of those. The term core is used to refer to one part
of the processing unit that is capable of performing (parts of) the
encoding/decoding process in parallel with other cores of the same
processor. In practice that might for example correspond to a
logical core in a CPU, a physical core in a multi-core CPU, one CPU
in a multi-CPU architecture or a single chip in a processing board
with multiple chips.
[0030] The term thread is used to denote some processing steps that
can be performed in parallel with some other processing steps.
Multiple threads can thus be executed in parallel. When the number
of cores in a processor is greater than or equal to the number of
threads that are possible to execute, all those threads can be
executed in parallel. Otherwise, some threads will start after
others have finished or alternatively time-division multiplexing
can be applied. One particular thread cannot be executed on
multiple cores at the same time. When the decoding process at any
point is divided into different parts (steps or actions) that are
independent of each other and therefore can be performed in
parallel we say that each such part constitutes a thread.
[0031] It is said that a bitstream supports a certain level of
parallelism if it is created in such a way that it is possible to
decode certain parts of it in parallel. In this text the focus is
on parallelism within pictures, but the embodiments are not limited to
that. They could be applied to any type of parallelism within video
sequences.
[0032] A decoding process, for a picture that consists of three
slices A, B and C, defines how to decode the picture in sequential
order:
[0033] 1. Decode A.
[0034] 2. Decode B.
[0035] 3. Decode C.
[0036] 4. Perform deblocking inside and between treeblocks.
[0037] However, if a processor P has 4 cores (called i, ii, iii and
iv) the decoding can be performed as follows:
[0038] 1. Core i decodes slice A, core ii decodes slice B, core iii
decodes slice C in parallel.
[0039] 2. When all slices are decoded, deblocking inside and between
treeblocks is performed by any core (since all four cores will be
free).
[0040] This is possible because the slices are independently
decodable.
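The two-step parallel procedure can be sketched with a thread pool. The functions `decode_slice` and `deblock` are placeholders introduced only for illustration and perform no real decoding; the sketch shows the structure (parallel slice decoding followed by a single deblocking pass), not an actual decoder.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_slice(name):
    # Placeholder for real slice decoding; returns a label only.
    return f"decoded {name}"


def deblock(decoded_slices):
    # Placeholder for deblocking inside and between treeblocks.
    return f"deblocked {len(decoded_slices)} slices"


def decode_picture(slice_names, cores=4):
    # Step 1: independently decodable slices are decoded in parallel.
    with ThreadPoolExecutor(max_workers=cores) as pool:
        decoded = list(pool.map(decode_slice, slice_names))
    # Step 2: deblocking runs only after all slices are decoded.
    return decoded, deblock(decoded)


decoded, result = decode_picture(["A", "B", "C"])
```

Leaving the `with` block waits for all slice tasks, which corresponds to the requirement that deblocking starts only when every slice has been decoded.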
[0041] It might be the case that each core in processor P in the
example above is capable of processing a certain number of luma
samples. When the decoder that uses P for decoding reports its
capabilities, e.g. in the form of a level, it would be forced to
report its capabilities based on the single-core decoding performance,
as it would have to be prepared for bitstreams that are not constructed
for parallel decoding, e.g. ones that contain only a single slice.
[0042] A decoder that could decode a higher level given
that the bitstream supports parallel decoding can thus be forced to
restrict its conformance claims to a lower level. That can be
avoided by conveying a parallel decoding property of the stream by
using a syntax element according to embodiments of the present
invention.
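The capability decision can be sketched with a deliberately simplified model. It assumes, hypothetically, that each core can process a fixed maximum number of luma samples per picture and that the work divides evenly over the usable threads; `can_decode` and its parameters are illustrative names, not part of any specification.

```python
def can_decode(width, height, parallelism, max_luma_per_core, cores):
    """Return True if the decoder can handle pictures of this size.

    Simplified model (an assumption): the picture's luma samples are
    spread evenly over min(parallelism, cores) threads, and each core
    handles at most max_luma_per_core samples per picture.
    """
    usable_threads = min(parallelism, cores)
    total = width * height
    per_core_load = -(-total // usable_threads)  # ceiling division
    return per_core_load <= max_luma_per_core


# A core sized for 720p (1280 * 720 luma samples), four cores available.
core_limit = 1280 * 720
single = can_decode(1920, 1080, 1, core_limit, cores=4)
split4 = can_decode(1920, 1080, 4, core_limit, cores=4)
```

With these numbers a 1080p stream without any parallelism guarantee exceeds the single-core limit, whereas the same stream with a guaranteed parallelism of 4 fits, which mirrors the 720p and 1080p example given in the summary.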
[0043] Therefore, a method for encoding a bitstream representing a
sequence of pictures of a video stream is provided. In the method,
a syntax element is sent, e.g. in the SPS, wherein a value of the
syntax element is indicative of restrictions that are enforced on
the bitstream in a way that guarantees a certain level of
parallelism for decoding the bitstream. That implies that the value
of the syntax element can be used for determining the level of
parallelism that the bitstream is encoded with. As illustrated in
FIG. 4, the syntax element 440 is sent in a SPS 430 from a
transmitting apparatus 400 to a receiving apparatus 450 via
respective in/out-put units 405,455. The transmitting apparatus 400
comprises an encoder 410 for encoding the bitstream with a level of
parallelism indicated by the syntax element 440. A syntax element
managing unit 420 determines the level of parallelism that should
be used. The level of parallelism can be determined based on
information 470 of the receiving apparatus' 450 decoder 460
capabilities relating to parallel decoding.
[0044] The syntax element is also referred to as parallelism_idc
which also could be denoted minSpatialSegmentation_idc. The syntax
element is valid for a sequence of pictures of a video stream which
can be a set of pictures or an entire video sequence.
[0045] According to one embodiment, the value of the syntax element
is set equal to the level of parallelism, wherein the level of
parallelism indicates the number of threads that can be used.
Referring to FIG. 5, where one picture is divided into four
independent parts of equal spatial size that can be decoded in
parallel, the level of parallelism is four; where the other picture
is divided into two equally sized independent parts that can be
decoded in parallel, the level of parallelism is two.
[0046] The value of the syntax element is determined as illustrated
in the flowchart of FIG. 6, according to this embodiment. One way
of determining the value of the syntax element is to use
information from the decoder about the decoder capabilities
relating to parallel decoding.
[0047] Another way is that the encoder chooses the level of
parallelism purely for encoder purposes and provides the parallel
information to decoders since that may be useful for some
decoders.
[0048] Another way is to assume a certain decoder design for the
decoders that may decode the stream. If, for example, it is known
that 80% of all smartphones have 4 cores, the encoder can always use
4 independent parts.
[0049] When the value of the syntax element has been determined 601,
one of the restrictions, referred to as e.g. a, b and c, is imposed
603 by performing the steps specified below to fulfill 602 the
requirements of that restriction. Then
the syntax element is sent 604 to the decoder of the receiver and
the picture is encoded according to the value of the syntax
element.
[0050] According to one embodiment, the value of the syntax element
is used to impose one restriction of at least a and b on the
bitstream:
[0051] a: A maximum number of luma samples per slice is
restricted as a function of the picture size and as a function of
the value of the syntax element.
[0052] b: A maximum number of luma samples per tile is restricted
as a function of the picture size and as a function of the value of
the syntax element.
[0053] According to a further embodiment, the value of the syntax
element is used to impose one restriction of a, b and c on the
bitstream:
[0054] a: A maximum number of luma samples per slice is
restricted as a function of the picture size and as a function of
the value of the syntax element.
[0055] b: A maximum number of luma samples per tile is restricted
as a function of the picture size and as a function of the value of
the syntax element.
[0056] c: Wavefronts are used and treeblock size, picture height and
picture width are restricted jointly as a function of the value of
the syntax element and as a function of the picture size.
[0057] According to an embodiment, the requirements of restrictions
a,b or a,b,c are fulfilled 602 by the following steps as
illustrated in FIG. 7a.
[0058] Hence, an encoder may be configured to perform the following
steps in order to fulfill requirement a.
[0059] Multiple pictures constituting a video sequence are encoded
and the following is performed:
[0060] The treeblocks of the pictures are divided 602a into slices
(such that each division consists of consecutive treeblocks in
raster scan order) such that each division does not contain more
luma samples than what is allowed by restriction a.
[0061] The treeblocks are encoded in their respective slices (in
which the encoding of the different slices is performed
sequentially or in parallel).
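The slice division for restriction a can be sketched as a greedy pass over the treeblocks in raster scan order. This is a minimal illustration only, not the encoder's actual procedure. The function name, the parameter max_luma_per_slice (the budget allowed by restriction a) and the simplification that every treeblock covers ctb_size*ctb_size luma samples (partial treeblocks at picture edges are ignored) are assumptions made for the example.

```python
def split_into_slices(num_treeblocks, ctb_size, max_luma_per_slice):
    """Group consecutive treeblocks (raster scan order) into slices.

    Returns a list of (first_treeblock, last_treeblock) index pairs such
    that no slice holds more luma samples than max_luma_per_slice.
    Assumption: every treeblock covers ctb_size * ctb_size luma samples.
    """
    luma_per_treeblock = ctb_size * ctb_size
    treeblocks_per_slice = max_luma_per_slice // luma_per_treeblock
    if treeblocks_per_slice < 1:
        raise ValueError("budget is smaller than a single treeblock")

    slices = []
    first = 0
    while first < num_treeblocks:
        # Fill the slice up to the budget, or to the end of the picture.
        last = min(first + treeblocks_per_slice, num_treeblocks) - 1
        slices.append((first, last))
        first = last + 1
    return slices
```

Each returned pair could then be encoded as one slice, sequentially or in parallel.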
[0062] An encoder may be configured to perform the following steps
in order to fulfill restriction b.
[0063] Multiple pictures constituting a video sequence are encoded
and the following is performed:
[0064] The pictures are divided 602b into tiles (with horizontal
and vertical lines at sample positions that are integer multiples
of CtbSize (CtbSize is equal to treeblock size)) such that each
tile does not contain more luma samples than what is allowed by
restriction b.
[0065] The treeblocks are encoded in their respective tiles (in
which the encoding of the different tiles is performed
sequentially or in parallel).
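One possible way to find a tile division for restriction b is to search over uniform tile grids whose boundaries fall on multiples of CtbSize. The sketch below is an illustration under assumptions: the function name, the parameter max_luma_per_tile (the budget allowed by restriction b), the even spreading of treeblocks over columns and rows, and the preference for the grid with the fewest tiles are not taken from the description.

```python
import math

def choose_tile_grid(pic_width, pic_height, ctb_size, max_luma_per_tile):
    """Find a uniform tile grid whose largest tile fits max_luma_per_tile.

    Tile boundaries lie at integer multiples of ctb_size. Returns
    (tile_columns, tile_rows) for the grid with the fewest tiles found,
    or None if even one treeblock per tile exceeds the budget.
    """
    width_in_ctbs = math.ceil(pic_width / ctb_size)
    height_in_ctbs = math.ceil(pic_height / ctb_size)
    best = None
    for cols in range(1, width_in_ctbs + 1):
        for rows in range(1, height_in_ctbs + 1):
            # Upper bound on the widest column and tallest row when the
            # treeblocks are spread as evenly as possible.
            tile_w = min(math.ceil(width_in_ctbs / cols) * ctb_size, pic_width)
            tile_h = min(math.ceil(height_in_ctbs / rows) * ctb_size, pic_height)
            if tile_w * tile_h <= max_luma_per_tile:
                if best is None or cols * rows < best[0] * best[1]:
                    best = (cols, rows)
                break  # more rows would only add tiles for this column count
    return best
```

For a 1920x1080 picture with CtbSize 64 and a budget of a quarter of the picture, the search returns one tile column and five tile rows, because 1080 is not a multiple of 64 and four equal tiles on treeblock boundaries would exceed the budget.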
[0066] An encoder may be configured to perform the following steps
in order to fulfill restriction c.
[0067] Multiple pictures constituting a video sequence are encoded
602c with wavefronts (setting tiles_or_entropy_coding_sync_idc to 2
in each PPS) and once for the entire sequence the following is
performed:
[0068] Given the resolution (the height and the width in luma
samples) of the (pictures in the) video sequence, the CtbSize is
selected such that restriction c is fulfilled.
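The CtbSize selection can be sketched as follows, assuming that restriction c takes the form given for condition C later in this description, i.e. (2*(height/CtbSize)+(width/CtbSize))*CtbSize*CtbSize ≤ floor(MaxLumaFS/X), and assuming real-valued division so that the left-hand side simplifies to (2*height+width)*CtbSize. The function name, the candidate sizes 64, 32 and 16, and the preference for the largest admissible size are illustrative assumptions.

```python
from fractions import Fraction

def select_ctb_size(pic_width, pic_height, max_luma_fs, parallelism,
                    candidates=(64, 32, 16)):
    """Return the largest candidate CtbSize satisfying the assumed
    wavefront condition, or None if no candidate satisfies it.

    Assumed condition (real-valued division):
        (2 * pic_height + pic_width) * ctb_size <= floor(max_luma_fs / X)
    """
    # Floor division by an exact Fraction yields an int.
    budget = max_luma_fs // Fraction(parallelism)
    for ctb_size in sorted(candidates, reverse=True):
        if (2 * pic_height + pic_width) * ctb_size <= budget:
            return ctb_size
    return None
```

The value passed as max_luma_fs depends on the level in use; any concrete number used with this sketch is an example input, not a value given in this description.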
[0069] There are several different reasons why an encoder needs to
fulfill one (or more) of the requirements for a certain value of
the syntax element. The value of the syntax element is also
referred to as value X. One such reason could be that the encoder
is encoding a video sequence for a decoder that has indicated that
it needs a specific value X or for a decoder that needs the value X
to be in a specific set of values (such as higher than or equal to
a certain value).
[0070] If there is no specific requirement for the value X, an
encoder might want to set the value X to as high a number as possible
in order to aid decoders and in order to produce a bitstream that
as many decoders as possible will be able to decode. As an example,
consider the case where an encoder splits each picture into 4 equal
parts, which means that X=4.0. It is acceptable for the encoder to send
X=2.0 and thereby indicate that the pictures are split into at least 2
parts, but it could be advantageous for the decoder if the encoder
sends the highest possible value of X, i.e. 4.0 in this example.
[0071] An encoder may be configured to perform the following steps
in order to indicate the value X according to restriction a.
[0072] 1. Multiple pictures constituting a video sequence are
encoded using one or more slices per picture.
[0073] 2. The value X is signaled as the highest possible value for
which every slice in the sequence fulfills restriction a.
[0074] An encoder may be configured to perform the following steps
in order to indicate the value X according to restriction b.
[0075] 1. Multiple pictures constituting a video sequence are
encoded using a tile configuration that is suitable for the
encoder.
[0076] 2. The value X is signaled as the highest possible value for
which every tile in the sequence fulfills restriction b.
[0077] An encoder may be configured to perform the following steps
in order to indicate the value of X according to restriction c.
[0078] 1. Multiple pictures constituting a video sequence with
height equal to pic_height_in_luma_samples and width equal to
pic_width_in_luma_samples and treeblock size equal to CtbSize are
encoded using WPP.
[0079] 2. The value X is signaled as the highest possible value for
which restriction c is fulfilled.
[0080] According to a further embodiment, the syntax element
denoted parallelism_idc indicates the restrictions that are enforced
on the bitstream in that the value X, also referred to as
parallelism, is equal to (parallelism_idc/4)+1. Similar to the
embodiments described above, the level of parallelism indicates the
number of threads that can be used. This is exemplified in FIG. 5
by a picture which is divided into four independent
parts that can be decoded in parallel, wherein the level of
parallelism is four, and by a picture which is divided into two
independent parts that can be decoded in parallel, wherein the
level of parallelism is two.
[0081] Thus the value X (i.e. the value of the syntax element) can
be used for determining the level of parallelism that the bitstream
is encoded with by calculating value
X=parallelism=(parallelism_idc/4)+1=(syntax element/4)+1.
[0082] This embodiment presents a more specific version of the
previous embodiment and all different aspects from the previous
embodiments are not repeated in this embodiment, however, they can
be combined and/or restricted in any suitable fashion.
[0083] According to this embodiment, the level of parallelism is
calculated as:
Parallelism=(parallelism_idc/m)+1
[0084] When Parallelism is greater than 1 the constraints specified
below apply. The preferred value of m is 4 but other values can
alternatively be used.
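The mapping from parallelism_idc to Parallelism can be illustrated with a short sketch. It assumes that the division is real-valued, which the earlier examples X=4.0 and X=2.0 suggest but which is not stated explicitly. The function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def parallelism_from_idc(parallelism_idc, m=4):
    """Parallelism = (parallelism_idc / m) + 1, kept exact as a Fraction.

    Assumption: the division is real-valued, not integer division.
    """
    return Fraction(parallelism_idc, m) + 1

def constraints_apply(parallelism_idc, m=4):
    """The constraints only apply when Parallelism is greater than 1."""
    return parallelism_from_idc(parallelism_idc, m) > 1
```

With m equal to 4, a parallelism_idc of 12 gives a Parallelism of 4, a parallelism_idc of 2 gives 1.5, and a parallelism_idc of 0 gives 1, in which case no constraint applies.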
[0085] When Parallelism is greater than 1 701, one of the following
conditions must be fulfilled 702 as illustrated in the flowchart of
FIG. 7b:
[0086] A. tiles_or_entropy_coding_sync_idc is equal to 0 in each
picture parameter set activated within the coded video sequence and
the maximum number of luma samples in a slice is less than or equal
to
floor(pic_width_in_luma_samples*pic_height_in_luma_samples/parallelism).
[0087] B. tiles_or_entropy_coding_sync_idc is equal to 1 in each
picture parameter set activated within the coded video sequence and
the maximum number of luma samples in a tile is less than or equal
to
floor(pic_width_in_luma_samples*pic_height_in_luma_samples/parallelism).
[0088] C. tiles_or_entropy_coding_sync_idc is equal to 2 in each
picture parameter set activated within the coded video sequence and
the syntax elements pic_width_in_luma_samples,
pic_height_in_luma_samples and the variable CtbSize are restricted
such that:
(2*(pic_height_in_luma_samples/CtbSize)+(pic_width_in_luma_samples/CtbSize))*CtbSize*CtbSize ≤ floor(MaxLumaFS/parallelism)
[0089] It should be noted that floor (x) implies the largest
integer less than or equal to x.
pic_width_in_luma_samples is a picture width and
pic_height_in_luma_samples is a picture height.
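Conditions A, B and C can be checked with a small helper such as the one below. It is a sketch under assumptions: the function and parameter names are hypothetical, max_segment_luma stands for the largest slice (condition A) or tile (condition B) in luma samples, and the divisions in condition C are treated as real-valued so that the left-hand side simplifies to (2*height+width)*CtbSize.

```python
from fractions import Fraction

def condition_fulfilled(sync_idc, pic_width, pic_height, parallelism,
                        max_segment_luma=None, ctb_size=None,
                        max_luma_fs=None):
    """Check condition A (sync_idc 0), B (sync_idc 1) or C (sync_idc 2).

    sync_idc plays the role of tiles_or_entropy_coding_sync_idc.
    """
    p = Fraction(parallelism)
    if p <= 1:
        return True  # the constraints only apply when Parallelism > 1
    if sync_idc in (0, 1):
        if max_segment_luma is None:
            raise ValueError("max_segment_luma is required for A and B")
        # floor(width * height / parallelism)
        return max_segment_luma <= (pic_width * pic_height) // p
    if sync_idc == 2:
        if ctb_size is None or max_luma_fs is None:
            raise ValueError("ctb_size and max_luma_fs are required for C")
        # Assumption: real-valued division inside the condition.
        return (2 * pic_height + pic_width) * ctb_size <= max_luma_fs // p
    raise ValueError("sync_idc must be 0, 1 or 2")
```

For a 1920x1080 picture and a Parallelism of 4, a largest slice of 518400 luma samples fulfills condition A, while 518401 does not.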
[0090] The value X, denoted parallelism, can be used by a decoder
to calculate the maximum number of luma samples to be processed by
one thread, making the assumption that the decoder maximally
utilizes the parallel decoding information. It should be noted that
there might be inter-dependencies between the different threads
e.g. deblocking across tile and slice boundaries or entropy
synchronization. To aid decoders in planning the decoding workload
distribution it is recommended that encoders set the value of
parallelism_idc to the highest possible value for which one of the
three conditions above is fulfilled. For the case when
tiles_or_entropy_coding_sync_idc=2 that means setting
parallelism_idc=floor(4*MaxLumaFS/(2*((pic_height_in_luma_samples/CtbSize)+(pic_width_in_luma_samples/CtbSize))*CtbSize*CtbSize))-4
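For conditions A and B, the corresponding highest value of parallelism_idc can be derived in the same manner. The sketch below is a derivation made for illustration and is not stated in this form in the description. With S denoting the largest slice or tile in luma samples, S ≤ floor(width*height/Parallelism) together with Parallelism=(parallelism_idc/m)+1 gives parallelism_idc ≤ m*width*height/S-m. The function name is hypothetical and no upper bound on parallelism_idc is assumed.

```python
def highest_idc_for_segments(pic_width, pic_height, largest_segment_luma, m=4):
    """Highest parallelism_idc for which condition A or B still holds.

    largest_segment_luma is the largest slice (A) or tile (B) in the
    sequence, counted in luma samples.
    """
    pic_size = pic_width * pic_height
    if not 0 < largest_segment_luma <= pic_size:
        raise ValueError("segment size must be in 1..picture size")
    # floor(m * pic_size / S) - m, computed with integer arithmetic.
    return (m * pic_size) // largest_segment_luma - m
```

For a 1920x1080 picture split into four equal parts of 518400 luma samples, the result is 12, which corresponds to a Parallelism of 4.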
[0091] The condition A in this embodiment is a special case of
restriction a and the encoder steps that are presented for
restriction a apply for condition A.
[0092] The condition B in this embodiment is a special case of
restriction b and the encoder steps that are presented for
restriction b apply for condition B.
[0093] The condition C in this embodiment is a special case of
restriction c and the encoder steps that are presented for
restriction c apply for condition C.
[0094] Specifically, an encoder may be configured to perform the
following steps in order to indicate the value of X according to
condition C.
[0095] 1. Multiple pictures constituting a video sequence with
height equal to pic_height_in_luma_samples and width equal to
pic_width_in_luma_samples and treeblock size equal to CtbSize are
encoded using WPP.
[0096] 2. The value X is signaled, by means of the syntax element,
as the highest possible value for which restriction c is
fulfilled, i.e.
value X=floor(MaxLumaFS/(2*((pic_height_in_luma_samples/CtbSize)+(pic_width_in_luma_samples/CtbSize))*CtbSize*CtbSize))
[0097] According to a further embodiment, the value of the syntax
element is used to impose a restriction of the number of bytes per
slice, tile or wavefront.
[0098] Accordingly, there may be a restriction on the maximum
number of bytes (or bits) in one slice, tile and/or
wavefront-substream as a function of the value X and optionally as
a function of MaxBR and/or Max CPB size. MaxBR is the maximum
bitrate for a level and CPB size is a Coded Picture Buffer
size.
[0099] The restriction could be combined with one or more of the
restrictions above.
[0100] Specifically, an encoder may be configured to perform the
following steps in order to fulfill restriction a and a requirement
on the maximum number of bytes in a slice.
[0101] 1. Multiple pictures constituting a video sequence are
encoded and the following is performed:
[0102] 2. The treeblocks of a slice are encoded right up until
before the start of the first treeblock T that fulfills one or more
of the following conditions:
[0103] i. if it would have been encoded in the same slice it would
have resulted in that the slice would have contained more luma
samples than allowed by restriction a.
[0104] ii. if it would have been encoded in the same slice it would
have resulted in that the number of bytes in the slice would be
more than what is allowed by the requirement on the maximum number
of bytes in a slice.
[0105] 3. Instead of including that specific treeblock T in the
slice that is currently coded the encoder completes the slice
before T and begins a new slice with T as the first treeblock. The
process continues from step two until the entire picture has been
encoded.
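Steps 1 to 3 above can be sketched as a single pass that closes the current slice just before the first treeblock T that would break either limit. The sketch makes simplifying assumptions: the coded size of each treeblock is taken as known in advance, whereas a real encoder only learns it while encoding, and a single treeblock that alone exceeds a limit is placed in a slice of its own. All names are hypothetical.

```python
def split_slices_with_byte_limit(treeblocks, max_luma_per_slice,
                                 max_bytes_per_slice):
    """Split treeblocks into slices under a luma and a byte limit.

    treeblocks: list of (luma_samples, coded_bytes) in raster scan order.
    Returns a list of slices, each a list of treeblock indices.
    """
    slices = []
    current = []
    luma = size = 0
    for index, (tb_luma, tb_bytes) in enumerate(treeblocks):
        too_many_samples = luma + tb_luma > max_luma_per_slice   # condition i
        too_many_bytes = size + tb_bytes > max_bytes_per_slice   # condition ii
        if current and (too_many_samples or too_many_bytes):
            # Complete the slice before T and start a new one with T.
            slices.append(current)
            current, luma, size = [], 0, 0
        current.append(index)
        luma += tb_luma
        size += tb_bytes
    if current:
        slices.append(current)
    return slices
```

With five treeblocks of 4096 luma samples and 100 bytes each, a luma limit of 8192 and a byte limit of 250, the treeblocks are grouped as two, two and one. Lowering the byte limit to 150 puts every treeblock in its own slice.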
[0106] According to further aspects, a transmitting apparatus is
provided. The transmitting apparatus comprises an encoder for
encoding the bitstream. The encoder comprises a processor and
in/out-put section configured to perform the method steps as
described above. Thus the encoder is configured to encode the
bitstream e.g. with a level of parallelism according to the syntax
element, i.e. to use the value of the syntax element to impose a
restriction according to the embodiments. The in/out-put section is
configured to send the syntax element. Moreover, the encoder can be
implemented by a computer wherein the processor of the encoder is
configured to execute software code portions stored in a memory,
wherein the software code portions, when executed by the processor,
perform the respective encoder methods above.
[0107] Accordingly, a transmitting apparatus 400 for encoding a
bitstream representing a sequence of pictures of a video stream is
provided as illustrated in FIG. 8. The transmitting apparatus 400
comprises as described above an encoder 410 and a syntax element
managing unit 420. According to one implementation the encoder and
the syntax element managing unit are implemented by a computer 800
comprising a processor 810, also referred to as a processing unit,
and a memory 820. Thus, the transmitting apparatus 400 according to
this aspect comprises a processor 810 and memory 820. Said memory
820 contains instructions executable by said processor 810 whereby
said transmitting apparatus 400 is operative to send a syntax
element, wherein a value of the syntax element is indicative of
restrictions that are enforced on the bitstream in a way that
guarantees a certain level of parallelism for decoding the
bitstream.
[0108] The transmitting apparatus 400 is operative to set the value
of the syntax element equal to the level of parallelism. That may
be achieved by a syntax element managing unit 420 e.g. implemented
by the processor.
[0109] Further, the transmitting apparatus 400 may be operative to
use the value of the syntax element to impose one restriction of a,
b and c on the bitstream by e.g. using the syntax element managing
unit:
[0110] a: A maximum number of luma samples per slice is restricted
as a function of the picture size and as a function of the value of
the syntax element.
[0111] b: A maximum number of luma samples per tile is restricted
as a function of the picture size and as a function of the value of
the syntax element.
[0112] c: Wavefronts are used and treeblock size and picture height
and picture width are restricted jointly as a function of the value
of the syntax element and as a function of the picture size.
[0113] In addition, according to different embodiments, the
transmitting apparatus 400 is operative to: perform different
actions relating to the division into treeblock sizes or selection
of treeblock size in order to fulfill requirement of the
restrictions referred to as a,b and c as explained previously.
[0114] According to a further embodiment, the transmitting
apparatus, by using the in/out-put unit, is operative to signal the
value of the syntax element as a highest possible value for which
every slice in the sequence fulfills requirement of the restriction
a.
[0115] According to a further embodiment, the transmitting
apparatus, by using the in/out-put unit, is operative to signal the
value of the syntax element as a highest possible value for which
every tile in the sequence fulfills requirement of the restriction
b.
[0116] According to a further embodiment, the transmitting
apparatus, by using the in/out-put unit, is operative to signal the
value of the syntax element as the highest possible
value for which requirement of the restriction c is fulfilled.
[0117] According to a further embodiment, the transmitting
apparatus, e.g. by using the syntax element managing unit, is
operative to determine the level of the parallelism as a function
of the value of the syntax element, which could be exemplified
by
Parallelism=((value of syntax element)/4)+1.
[0118] Hence the transmitting apparatus may be configured to: when
the value of the syntax element indicates restrictions that are
enforced on the bitstream such that at least two parallel processes
can be used for decoding, i.e. parallelism>1, ensure that one of
the following conditions is fulfilled:
[0119] Condition a: If neither tiles nor wavefronts are used within
the sequence to be encoded, the maximum number of luma samples in a
slice should be less than or equal to floor (picture width*picture
height/Parallelism)
[0120] Condition b: If tiles but not wavefronts are used within the
sequence to be encoded, the maximum number of luma samples in a
tile should be less than or equal to floor (picture width*picture
height/Parallelism)
[0121] Condition c: If wavefronts but not tiles are used within the
sequence to be encoded, the syntax elements indicating picture
width and picture height and the variable indicating treeblock size are
restricted such that:
(2*(picture height/treeblock size)+(picture width/treeblock
size))*treeblock size*treeblock size ≤ floor(maximum frame
size/Parallelism), wherein floor (x) implies the largest integer
less than or equal to x.
[0122] The in/out-put unit of the transmitting apparatus may be
configured to send the syntax element in a sequence parameter
set.
[0123] As a yet further alternative, the transmitting apparatus is
operative to use the value of the syntax element to impose a
restriction of the number of bytes or bits per slice, tile or
wavefront, e.g. by using the syntax element managing unit.
[0124] The encoder of the transmitting apparatus and the decoder of
the receiving apparatus, respectively, can be implemented in
devices such as video cameras, displays, tablets, digital TV
receivers, network nodes etc. Accordingly, the embodiments apply to
a transmitting apparatus and any element that operates on a
bitstream (such as a network-node or a Media Aware Network
Element). The transmitting apparatus may for example be located in
a user device e.g. a video camera in e.g. a mobile device.
[0125] The embodiments are not limited to HEVC but may be applied
to any extension of HEVC such as a scalable extension or multiview
extension or to a different video codec.