U.S. patent application number 13/174978 was filed with the patent office on 2011-07-01 and published on 2012-03-01 for a method and apparatus for generating a stereoscopic image.
This patent application is currently assigned to Sony Corporation. The invention is credited to Hideki Ando and Jonathan Richard THORPE.
Publication Number | 20120050485 |
Application Number | 13/174978 |
Family ID | 43013424 |
Filed Date | 2011-07-01 |
Publication Date | 2012-03-01 |
United States Patent Application | 20120050485 |
Kind Code | A1 |
THORPE; Jonathan Richard; et al. | March 1, 2012 |
METHOD AND APPARATUS FOR GENERATING A STEREOSCOPIC IMAGE
Abstract
Described is a method of producing a first stereoscopic image having
a first left eye component and a first right eye component, by
mixing a second stereoscopic image
having a second left eye component and a second right eye component
wherein depth information is associated with the second left eye
component and depth information is associated with the second right
eye component with a third image having depth information
associated therewith, the method comprising the steps of: at each
pixel position of the first left eye component, comparing the depth
information associated with the second left eye component and the
third image at that pixel position, and at each pixel position of
the first right eye component, comparing the depth information
associated with the second right eye component and the third image
at that pixel position; and determining the foreground pixel for
the first left eye component and the first right eye component at
the pixel position on the basis of said comparisons.
Inventors: THORPE; Jonathan Richard (Abbotts Barton, GB); Ando; Hideki (Tokyo, JP)
Assignee: Sony Corporation, Tokyo, JP
Family ID: 43013424
Appl. No.: 13/174978
Filed: July 1, 2011
Current U.S. Class: 348/46; 348/E13.074
Current CPC Class: H04N 2213/005 20130101; H04N 13/156 20180501
Class at Publication: 348/46; 348/E13.074
International Class: H04N 13/02 20060101 H04N013/02
Foreign Application Data
Date | Code | Application Number |
Aug 31, 2010 | GB | 1014406.1 |
Claims
1. A method of producing a first stereoscopic image having a first
left eye component and a first right eye component, by mixing a
second stereoscopic image having a second left eye component and a
second right eye component wherein depth information is associated
with the second left eye component and depth information is
associated with the second right eye component with a third image
having depth information associated therewith, the method
comprising the steps of; at each pixel position of the first left
eye component, comparing the depth information associated with the
second left eye component and the third image at that pixel
position, and at each pixel position of the first right eye
component, comparing the depth information associated with the
second right eye component and the third image at that pixel
position; and determining the foreground pixel for the first left
eye component and the first right eye component at the pixel
position on the basis of said comparisons.
2. A method according to claim 1, wherein the foreground pixel is
determined in accordance with the same depth value being selected
for the first left eye component and the first right eye
component.
3. A method according to claim 1, wherein the foreground pixel is
determined in accordance with depth information selected from the
depth information of the second left eye component or the second
right eye component and the respective third image.
4. A method according to claim 1, wherein the third image is a
stereoscopic image having a third left eye component and a third
right eye component, whereby the third left eye component has depth
information associated therewith and the third right eye component
has depth information associated therewith.
5. A method according to claim 1, wherein the same depth value is a
mean value of the second left or right eye component depth
information and the third image depth information at that pixel
position.
6. A method according to claim 1, further comprising selecting the
same depth value for the generation of a plurality of frames of the
first stereoscopic image.
7. A method according to claim 1, comprising calculating the
intensity of each pixel in either the second left or right eye
component and the third image and selecting the foreground pixel
for the first left or right eye component respectively on the basis
of the calculated intensity.
8. A method according to claim 7, wherein the component with the
lowest intensity is selected as the foreground pixel at that pixel
position in the first stereoscopic image.
9. A method according to claim 1 further comprising outputting
depth information associated with each pixel in the mixed first
image.
10. A method of producing a first image by mixing a second image of
a captured first scene having depth information, relating to the
depth of a pixel in the first scene associated therewith and a
third image of a captured second scene having depth information,
relating to the depth of a pixel in the captured second scene
associated therewith, wherein the first image is mixed using the
depth information from the second image as a key.
11. A method according to claim 9, wherein the first and second
images are stereoscopic images.
12. A method according to claim 1, wherein the depth information is
provided from either a depth map or a disparity map.
13. A computer program containing computer readable instructions
which, when loaded onto a computer, configure the computer to
perform the method according to claim 1.
14. A storage medium configured to store the computer program of
claim 13 therein or thereon.
15. An apparatus for producing a first stereoscopic image having a
first left eye component and a first right eye component, by mixing
a second stereoscopic image having a second left eye component and
a second right eye component wherein depth information is
associated with the second left eye component and depth information
is associated with the second right eye component with a third
image having depth information associated therewith, the apparatus
comprising; a left eye comparator operable to, at each pixel
position of the first left eye component, compare the depth
information associated with the second left eye component and the
third image at that pixel position, and a right eye comparator
operable to, at each pixel position of the first right eye
component, compare the depth information associated with the second
right eye component and the third image at that pixel position; and
a controller operable to determine the foreground pixel for the
first left eye component and the first right eye component at the
pixel position on the basis of said comparisons.
16. An apparatus according to claim 15, wherein the foreground
pixel is determined in accordance with the same depth value being
selected for the first left eye component and the first right eye
component.
17. An apparatus according to claim 15, wherein the foreground
pixel is determined in accordance with depth information selected
from the depth information of the second left eye component or the
second right eye component and the respective third image.
18. An apparatus according to claim 15 wherein the third image is a
stereoscopic image having a third left eye component and a third
right eye component, whereby the third left eye component has depth
information associated therewith and the third right eye component
has depth information associated therewith.
19. An apparatus according to claim 15, wherein the same depth
value is a mean value of the second left or right eye component
depth information and the third image depth information at that
pixel position.
20. An apparatus according to claim 15, further comprising a
selector operable to select the same depth value for the generation
of a plurality of frames of the first stereoscopic image.
21. An apparatus according to claim 15, comprising an intensity
calculator operable to calculate the intensity of each pixel in
either the second left or right eye component and the third image
and selecting the foreground pixel for the first left or right eye
component respectively on the basis of the calculated
intensity.
22. An apparatus according to claim 21, wherein the component with
the lowest intensity is selected as the foreground pixel at that
pixel position in the first stereoscopic image.
23. An apparatus according to claim 15 further comprising an
outputter operable to output depth information associated with each
pixel in the mixed first image.
24. An apparatus for producing a first image by mixing a second
image of a captured first scene having depth information, relating
to the depth of a pixel in the first scene associated therewith and
a third image of a captured second scene having depth information,
relating to the depth of a pixel in the captured second scene
associated therewith, wherein the first image is mixed using the
depth information from the second image as a key.
25. An apparatus according to claim 24, wherein the first and
second images are stereoscopic images.
26. An apparatus according to claim 15, wherein the depth
information is provided from either a depth map or a disparity map.
Description
BACKGROUND OF THE INVENTION
[0001] 1. Field of the Invention
[0002] The present invention relates generally to a method and
apparatus for generating a stereoscopic image.
[0003] 2. Description of the Prior Art
[0004] As 3D television and cinematography are becoming popular, 3D
editing effects are being increasingly used.
[0005] One 2D effect that is commonly used is multiplexing one
image into another, second, image in 2D. An example of this is
shown in FIG. 3, where a first image 300 and a second image 305 are
to be mixed together. As can be seen in the resultant image 310,
the toy bear and house from the first image 300 appear over the
mask in the second image 305. In order to achieve this effect, a
depth map of each pixel in each image is used to ensure that the
positioning of artefacts in the resultant image appears correct. It
is important to ensure that when two scenes are edited together,
the mixed image appears to have artefacts in the correct physical
space. In other words, it is necessary to know which artefact
should be placed in the foreground and which should be placed in
the background.
[0006] A prior art apparatus for achieving this is shown in FIG. 1.
In FIG. 1, the first image 300 and the corresponding first depth
map 1010 are fed into the mixing apparatus 1000. Additionally, the
second image 305 and the second depth map 1020 are also fed into
the mixing apparatus 1000. The depth of each pixel is compared from
the first and second depth maps 1010 and 1020 in a map comparator
1025. This comparison results in the correct placing of each pixel
in the resultant image. In other words, from the depth map it is
possible to determine whether the pixel from the first image should
be placed behind or in front of a corresponding pixel from the
second image.
[0007] At each pixel position, the map comparator 1025 instructs a
multiplexer 1035 to select for display either the pixel from the
first image 300 or the pixel from the second image 305. This
generates the mixed image 310. Further, the map comparator 1025
selects the depth corresponding to the selected pixel. This depth
value is fed out of the mixing apparatus 1000 and forms the
resultant depth map 1045 for the mixed image.
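The per-pixel selection performed by the map comparator and multiplexer can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the apparatus itself: it assumes that a smaller depth value means the pixel is nearer the viewer, that both images and depth maps have the same dimensions, and that ties go to the first image. The function name `mix_2d` is hypothetical.

```python
def mix_2d(image_a, depth_a, image_b, depth_b):
    """Depth-keyed mix of two equally sized 2D images.

    Assumption: a smaller depth value means the pixel is nearer the viewer.
    Images and depth maps are lists of rows. Returns (mixed_image, mixed_depth).
    """
    mixed_image = []
    mixed_depth = []
    for row_a, drow_a, row_b, drow_b in zip(image_a, depth_a, image_b, depth_b):
        out_row = []
        out_depth = []
        for pa, da, pb, db in zip(row_a, drow_a, row_b, drow_b):
            # Comparator step: keep the nearer pixel (ties go to the first image)
            # and carry its depth into the resultant depth map.
            if da <= db:
                out_row.append(pa)
                out_depth.append(da)
            else:
                out_row.append(pb)
                out_depth.append(db)
        mixed_image.append(out_row)
        mixed_depth.append(out_depth)
    return mixed_image, mixed_depth
```

Calling `mix_2d` on a one-row pair of images returns both the mixed row and the depth of each selected pixel, mirroring the mixed image 310 and the resultant depth map 1045.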
[0008] As noted above, 3D editing is increasingly required, so
there is a need to adapt this technique for 3D editing.
[0009] It is an aim of the present invention to try and adapt the
above mixing technique to the 3D scenario.
SUMMARY OF THE INVENTION
[0010] According to a first aspect, there is provided a method of
producing a first stereoscopic image having a first left eye
component and a first right eye component, by mixing a second
stereoscopic image having a second left eye component and a second
right eye component wherein depth information is associated with
the second left eye component and depth information is associated
with the second right eye component with a third image having depth
information associated therewith, the method comprising the steps
of; at each pixel position of the first left eye component,
comparing the depth information associated with the second left eye
component and the third image at that pixel position, and at each
pixel position of the first right eye component, comparing the
depth information associated with the second right eye component
and the third image at that pixel position; and determining the
foreground pixel for the first left eye component and the first
right eye component at the pixel position on the basis of said
comparisons.
[0011] The foreground pixel may be determined in accordance with
the same depth value being selected for the first left eye
component and the first right eye component.
[0012] The foreground pixel may be determined in accordance with
depth information selected from the depth information of the second
left eye component or the second right eye component and the
respective third image.
[0013] The third image may be a stereoscopic image having a third
left eye component and a third right eye component, whereby the
third left eye component has depth information associated therewith
and the third right eye component has depth information associated
therewith.
[0014] The same depth value may be a mean value of the second left
or right eye component depth information and the third image depth
information at that pixel position.
[0015] The method may further comprise selecting the same depth
value for the generation of a plurality of frames of the first
stereoscopic image.
[0016] The method may further comprise calculating the intensity of
each pixel in either the second left or right eye component and the
third image and selecting the foreground pixel for the first left
or right eye component respectively on the basis of the calculated
intensity.
[0017] The component with the lowest intensity may be selected as
the foreground pixel at that pixel position in the first
stereoscopic image.
[0018] The method may further comprise outputting depth information
associated with each pixel in the mixed first image.
[0019] According to another aspect, there is provided a method of
producing a first image by mixing a second image of a captured
first scene having depth information, relating to the depth of a
pixel in the first scene associated therewith and a third image of
a captured second scene having depth information, relating to the
depth of a pixel in the captured second scene associated therewith,
wherein the first image is mixed using the depth information from
the second image as a key.
[0020] The first and second images may be stereoscopic images.
[0021] The depth information may be provided from either a depth
map or a disparity map.
[0022] There is also provided a computer program containing
computer readable instructions which, when loaded onto a computer,
configure the computer to perform the method according to any one
of the above.
[0023] There is also provided a storage medium configured to store
the computer program therein or thereon.
[0024] According to another aspect, there is provided an apparatus
for producing a first stereoscopic image having a first left eye
component and a first right eye component, by mixing a second
stereoscopic image having a second left eye component and a second
right eye component wherein depth information is associated with
the second left eye component and depth information is associated
with the second right eye component with a third image having depth
information associated therewith, the apparatus comprising; a left
eye comparator operable to, at each pixel position of the first
left eye component, compare the depth information associated with
the second left eye component and the third image at that pixel
position, and a right eye comparator operable to, at each pixel
position of the first right eye component, compare the depth
information associated with the second right eye component and the
third image at that pixel position; and a controller operable to
determine the foreground pixel for the first left eye component and
the first right eye component at the pixel position on the basis of
said comparisons.
[0025] The foreground pixel may be determined in accordance with
the same depth value being selected for the first left eye
component and the first right eye component.
[0026] The foreground pixel may be determined in accordance with
depth information selected from the depth information of the second
left eye component or the second right eye component and the
respective third image.
[0027] The third image may be a stereoscopic image having a third
left eye component and a third right eye component, whereby the
third left eye component has depth information associated therewith
and the third right eye component has depth information associated
therewith.
[0028] The same depth value may be a mean value of the second left
or right eye component depth information and the third image depth
information at that pixel position.
[0029] The apparatus may further comprise a selector operable to
select the same depth value for the generation of a plurality of
frames of the first stereoscopic image.
[0030] The apparatus may further comprise an intensity calculator
operable to calculate the intensity of each pixel in either the
second left or right eye component and the third image and
selecting the foreground pixel for the first left or right eye
component respectively on the basis of the calculated
intensity.
[0031] The component with the lowest intensity may be selected as
the foreground pixel at that pixel position in the first
stereoscopic image.
[0032] The apparatus may further comprise an outputter operable to
output depth information associated with each pixel in the mixed
first image.
[0033] According to another aspect, there is provided an apparatus
for producing a first image by mixing a second image of a captured
first scene having depth information, relating to the depth of a
pixel in the first scene associated therewith and a third image of
a captured second scene having depth information, relating to the
depth of a pixel in the captured second scene associated therewith,
wherein the first image is mixed using the depth information from
the second image as a key.
[0034] The first and second images may be stereoscopic images.
[0035] The depth information may be provided from either a depth
map or a disparity map.
BRIEF DESCRIPTION OF THE DRAWINGS
[0036] The above and other objects, features and advantages of the
invention will be apparent from the following detailed description
of illustrative embodiments which is to be read in connection with
the accompanying drawings, in which:
[0037] FIG. 1 shows a prior art multiplexing apparatus for 2D image
signals;
[0038] FIG. 2 shows a multiplexing apparatus for 3D image
signals;
[0039] FIG. 3 shows a prior art resultant image signal from the
apparatus of FIG. 1;
[0040] FIG. 4 shows a resultant image signal from the apparatus of
FIG. 2;
[0041] FIG. 5 shows a multiplexing apparatus for 3D image signals
according to embodiments of the present invention;
[0042] FIG. 6 shows a more detailed diagram of a multiplexing
co-ordinator of FIG. 5;
[0043] FIG. 7 shows a detailed diagram showing the generation of a
disparity map according to embodiments of the present
invention;
[0044] FIG. 8 shows a detailed diagram of a scan line for the
generation of a disparity map according to embodiments of the
present invention; and
[0045] FIG. 9 shows a detailed diagram of a horizontal position vs
dissimilarity matrix showing a part occluded object.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0046] FIG. 2 shows an apparatus which may implement the above
mixing technique in the 3D scenario. In the 3D scenario, the first
image 300 has a left eye image 300A and a right eye image 300B. The
left eye image is the version of the first image that is intended
for the viewer's left eye and the right eye image is the version of
the first image that is intended for the viewer's right eye. The
left eye image 300A is a horizontally displaced version of the
right eye image 300B. In every other respect, the left and right
images would ideally be identical in non-occluded areas. In the
case of determining the depth of each pixel in each image, it is
possible to do this in two ways. The first is to generate a depth
map for each image. This provides a depth value for each pixel in
the image. The second is to generate a disparity map which provides
details of the difference between pixels in the left eye image 300A
and the right eye image 300B. In the example of FIG. 2, a depth map
1010A is provided for the left eye image and a depth map 1020A is
provided for the right eye image. From these depth maps, it is
possible to calculate a disparity map which provides the difference
in pixel position between corresponding pixels in the left eye
image and the right eye image. However, as the skilled person will
appreciate, to calculate disparity maps, camera parameters such as
the angle of field and the interocular distance are also
required.
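As an illustration of how camera parameters enter the conversion from depth to disparity, the following Python sketch derives a disparity from a single depth value. It assumes an idealised pinhole camera pair with parallel optical axes, for which the disparity in pixels is the focal length in pixels multiplied by the interocular distance and divided by the depth. This camera model, and the function and parameter names, are assumptions made for the example and are not taken from the application.

```python
import math


def focal_length_px(image_width_px, horizontal_fov_deg):
    """Focal length in pixels from the image width and horizontal angle of field."""
    half_fov = math.radians(horizontal_fov_deg) / 2.0
    return (image_width_px / 2.0) / math.tan(half_fov)


def depth_to_disparity(depth_m, interocular_m, image_width_px, horizontal_fov_deg):
    """Disparity in pixels for a point at depth_m metres.

    Assumes parallel pinhole cameras: disparity = f_px * baseline / depth.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    f_px = focal_length_px(image_width_px, horizontal_fov_deg)
    return f_px * interocular_m / depth_m
```

Under these assumptions nearer points produce larger disparities, which is why either a depth map or a disparity map can serve as the depth information.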
[0047] Similarly, the second image 305 has a left eye image 305A
intended for the viewer's left eye and a right eye image 305B
intended for the viewer's right eye. Again a depth map for each of
the left eye image and the right eye image is provided in 1010B and
1020B. So, in order to implement the mixing editing in 3D, two 2D
apparatuses 1000 of FIG. 1 are used. This arrangement is shown in
detail in FIG. 2.
[0048] In FIG. 2, there is shown a mixing apparatus 1000A which
generates the left eye image and a mixing apparatus 1000B which
generates the right eye image. The left and right eye images
should, ideally for unoccluded objects, be identical except for
horizontal displacement. The depth map for the left eye version of
the first image 1010A and the depth map for the left eye version of
the second image 1020A are provided to the mixing apparatus for the
left eye image. Similarly, the depth map for the right eye version
of the first image 1010B and the depth map for the right eye
version of the second image 1020B are provided to the mixing
apparatus 1000B. As the left eye version of the first image and the
right eye version of the first image are of the same scene, the
objects within that scene should be at the same depth. Similarly,
as the left eye version of the second image and the right eye version
of the second image are of the same scene, all objects within that
scene should be at the same depth. However, the depth maps for each
of the left hand version of the first and second image and the
right hand version of the first and second image are all generated
independently of one another.
[0049] As the depth maps are not always perfectly accurate, the
arrangement of FIG. 2 has a previously unrecognised problem,
illustrated in FIG. 4, which has been addressed.
[0050] In the mixed left hand image created by mixing apparatus
1000A, at pixels near the boundary between the house from the first
image 300A and the mask from the second image 305A, the mixed depth
map may take values at this point from the depth map for the first
image. However, at the corresponding pixels in the mixed right hand
image, the mixed depth map may take values from the depth map for
the second image. The resultant image is shown in detail in FIG.
4.
[0051] Specifically, in FIG. 4, an area showing the intersection of
the mask with the house is shown in detail. In the mixed left eye
image 310A, the boundary between the house and the mask has one
profile (405A 410A). However, in the mixed right eye image 310B,
the boundary (405B 410B) between the house and the mask should have
an identical, although horizontally displaced, profile, but it does
not. This means that in some parts of the boundary in one
eye, the mask will look to be in front of the house, whereas in the
same parts of the boundary in the other eye, the mask will look to
be behind the house. This discrepancy will cause discomfort for the
viewer when they view the image in 3D.
[0052] Embodiments of the present invention aim to address this
issue. Further, the depth maps created for each image are
computationally expensive to produce if the depth map is to be
accurate. Clearly, it is advantageous to further improve the
accuracy of depth maps to improve the enjoyment of the user and to
help avoid discrepancies occurring in the images. It is also an aim
of embodiments of the present invention to address this issue.
[0053] The apparatus of FIG. 5 shows a multiplexing apparatus 500
for 3D image signals according to an embodiment of the present
invention. In FIG. 5, like reference numerals refer to like
features explained with reference to FIG. 2. The function of the
like features will not be explained hereinafter.
[0054] As can be seen from FIG. 5, the apparatus according to
embodiments of the present invention contains all the features of
FIG. 2 with an additional multiplexor coordinator 600.
Additionally, the function of the multiplexor coordinator 600 means
that the mixed depth map for the left hand image 5045A and the
mixed depth map for the right hand image 5045B, and the resultant
left and right hand mixed images 510A and 510B will be different to
those of FIG. 2.
[0055] The multiplexor coordinator 600 is connected to both the
left eye mixing apparatus 1000A and the right eye mixing apparatus
1000B. The function of the multiplexor coordinator 600 will be
described with reference to FIG. 6.
[0056] The multiplexor coordinator 600 is provided with the depth
map for the left hand version of the first image 605 and the depth
map for the left hand version of the second image 610. Similarly, the
multiplexor coordinator 600 is provided with the depth map for the
right hand version of the first image 615 and the depth map for the
right hand version of the second image 620. A detailed description
of the production of a disparity map (from which the depth map is
created) will be provided later, although it should be noted that
the invention is not so limited and any appropriately produced
depth map or disparity map may be used in embodiments of the
present invention.
[0057] As would be appreciated by the skilled person, although the
foregoing is explained with reference to a depth map, there would
need to be logic included which selects corresponding pixels in
each of the left and right eye image. In other words, the left eye
image and the right eye image are displaced from one another and so
there is included in FIG. 6 (although not shown), logic which
determines which pixels correspond to which other pixels. This type
of logic is known and so will not be explained hereinafter. In this
case, the depth information may be disparity information.
[0058] The depth map for the left hand version of the first image
605 is compared with the depth map for the left hand version of the
second image 610 in a depth comparator for the left eye image 625.
The depth comparator for the left eye image 625 determines, for
each pixel position along a scan line, whether the resultant left
eye image should have the appropriate pixel from the left hand
version of the first image or the appropriate pixel from the left
hand version of the second image as the foreground pixel.
Similarly, the depth comparator for the right eye image 630
determines, for each pixel position along a scan line, whether the
resultant right eye image should have the appropriate pixel from
the right hand version of the first image or the appropriate pixel
from the right hand version of the second image as the foreground
pixel.
[0059] The output of each comparator may be a depth value which
indicates the difference in depth values. Alternatively, the output
from each comparator may be any other type of value which indicates
to a subsequent multiplexor controller 635 which of the depth maps
each comparator selects. For example, the output from each depth
comparator may be a 1 or 0 identifying which depth map should be
used. The selection made by the depth comparator for the left eye
image 625 and the selection made by the depth comparator for the
right eye image 630 are input in a multiplexor controller 635. The
output of the multiplexor controller 635 is a signal which controls
the mixing apparatus for the left eye 1000A and the mixing apparatus
for the right eye 1000B to use the same pixel as the foreground pixel
for each corresponding pixel pair. In other words, the perceived
depth of a pixel in the left eye resultant image, and the perceived
depth of the corresponding (or horizontally displaced) pixel in the
right eye resultant image is the same. This addresses the problem
noted above where the corresponding pixels in the left and right
eye versions of the mixed image have different depths and thus
different pixels are used as the foreground pixel.
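The coordination of the two comparators can be sketched in Python as follows. The sketch assumes that each comparator outputs a signed depth difference (depth of the first image minus depth of the second image), that a smaller depth means nearer the viewer, and that the two input lists are already aligned so that index i in each list refers to a corresponding pixel pair. Where the comparators disagree, the sketch trusts the comparator with the larger depth margin; that tie-break policy and the name `coordinate_selection` are assumptions for illustration only.

```python
def coordinate_selection(diff_left, diff_right):
    """Return one shared source choice per corresponding pixel pair.

    diff_left / diff_right: signed depth differences (first image minus
    second image) from the left and right eye comparators, already aligned
    so that each index refers to a corresponding pixel pair.
    Output: 0 = use the first image, 1 = use the second image, for both eyes.
    """
    shared = []
    for dl, dr in zip(diff_left, diff_right):
        choice_left = 0 if dl <= 0 else 1
        choice_right = 0 if dr <= 0 else 1
        if choice_left == choice_right:
            shared.append(choice_left)
        else:
            # Disagreement: follow the comparator with the larger depth
            # margin (an assumed policy, one of several possible ones).
            shared.append(choice_left if abs(dl) >= abs(dr) else choice_right)
    return shared
```

Because one list of choices drives both mixers, corresponding pixels in the left and right eye outputs always come from the same source image.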
[0060] Where there is disagreement in the depth maps for a given
pixel, the multiplexor controller 635 selects one of the depth maps
as the depth of the pixel. This is in dependence on the value of
the output from each comparator. In one embodiment, the multiplexor
controller 635 applies that depth value to the pixel in the other
mixing apparatus. Alternatively, the output pixel may be selected
purely on the basis of the output from each comparator.
[0061] In order to generate a depth signal the multiplexor
controller 635 may work in a number of different ways. Firstly, the
multiplexor controller 635 may simply select one depth map value
from one of the versions of the first image and use this as the
depth in the other version of the first image. Similarly, the
multiplexor controller 635 may simply select one depth map value
from one of the versions of the second image and use this as the
depth in the other version of the second image. Alternatively, the
multiplexor controller 635 can calculate the error in the depth of
each result and select the depth which has the lowest error.
Techniques for determining this are known to the skilled person.
Additionally, the selection may be random. Alternatively, the same
depth value may be used for a predetermined number of subsequent
frames. This stops the change of foreground pixels between
successive frames which would cause discomfort. The pixels with the
lowest intensity may be selected as being the foreground object.
This will again stop the user feeling discomfort. As a further
alternative, a depth which is the mean average of the two
dissimilar values may be selected as the depth of the corresponding
pixels.
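Two of the options listed above, taking the mean of the two dissimilar values and holding a value for a predetermined number of frames, can be sketched in Python. The names `mean_depth` and `HeldDepth` are hypothetical, and the holding logic is only one plausible reading of "the same depth value may be used for a predetermined number of subsequent frames".

```python
def mean_depth(depth_left, depth_right):
    """Mean of the two dissimilar depth values for a corresponding pixel pair."""
    return (depth_left + depth_right) / 2.0


class HeldDepth:
    """Reuse a resolved depth value for a fixed number of frames."""

    def __init__(self, hold_frames):
        if hold_frames < 1:
            raise ValueError("hold_frames must be at least 1")
        self.hold_frames = hold_frames
        self._value = None
        self._remaining = 0

    def update(self, candidate):
        """Return the depth to use for this frame.

        A new candidate is accepted only when the previous hold has expired;
        otherwise the held value is returned unchanged.
        """
        if self._remaining == 0:
            self._value = candidate
            self._remaining = self.hold_frames
        self._remaining -= 1
        return self._value
```

With `hold_frames` set to 3, a candidate accepted on one frame is also returned for the next two frames, so the foreground choice cannot flip between successive frames within that window.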
[0062] If the multiplexor controller 635 simply selects the correct
pixel on the basis of the outputs of the comparators, a simple
instruction instructing the respective mixers 1000A and 1000B to use
the same pixel may be issued.
[0063] Although the above has been described with reference to
mixing two 3D images, the invention is not so limited. For example,
it is possible to use the above technique to mix a 2D image (such
as a logo) with a 3D image. For each pixel in the 2D image a depth
is provided. Indeed, with the above technique two images can be
edited together using the depth plane. For example, one image may
wipe to a second image using the depth plane. This will be referred
to hereinafter as a "z-wipe".
[0064] Z-Wipe
[0065] Although the foregoing has been explained with reference to
stereo pairs, the selection of a foreground pixel given a depth map
for two images which are to be mixed together is not so limited. By
mixing two images using the depth plane information, it is possible
to perform numerous effects using the depth plane of the image. For
example, it is possible to wipe from one image to another image
using the depth plane. In other words, it is possible to create an
editing technique where it appears to the viewer that one image
blends into another image from behind. Additionally, it is possible
to wipe from one image to another image only at a certain position
in the depth plane. Alternatively, one may use the depth plane as a
key for editing effects. For example, it may be possible to place
one image over another image only at one depth value. This may be
useful during live broadcasts where presently chroma keying
(commonly called blue or green screening) is used. One image, such
as a weather map, would be located at a depth position and the
above technique would select, for each pixel position, whether the
image of the weather presenter or the weather map would be in the
foreground. Clearly, many other editing techniques could be
envisaged using the depth plane as would be appreciated by the
skilled person.
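By way of illustration only, the per-pixel foreground selection and a single frame of a z-wipe might be sketched as follows. The function names, the list-of-lists image layout and the convention that a smaller depth value means nearer the camera are assumptions made for this sketch and are not taken from the embodiments.

```python
from typing import List

Image = List[List[int]]        # hypothetical layout: one intensity per pixel
DepthMap = List[List[float]]   # assumption: smaller value = nearer the camera


def depth_key_mix(img_a: Image, depth_a: DepthMap,
                  img_b: Image, depth_b: DepthMap) -> Image:
    """For each pixel position, keep the pixel of whichever image is nearer."""
    mixed = []
    for row_a, row_da, row_b, row_db in zip(img_a, depth_a, img_b, depth_b):
        mixed.append([pa if da <= db else pb
                      for pa, da, pb, db in zip(row_a, row_da, row_b, row_db)])
    return mixed


def z_wipe_frame(img_a: Image, depth_a: DepthMap,
                 img_b: Image, plane_depth: float) -> Image:
    """One frame of a z-wipe: image B is placed on a flat plane at plane_depth.

    Moving plane_depth towards the camera over successive frames makes
    image B appear to come through image A from behind.
    """
    flat_depth = [[plane_depth] * len(row) for row in img_b]
    return depth_key_mix(img_a, depth_a, img_b, flat_depth)


if __name__ == "__main__":
    a = [[1, 1, 1]]
    a_depth = [[1.0, 5.0, 9.0]]
    b = [[2, 2, 2]]
    for plane in (10.0, 6.0, 0.0):
        print(plane, z_wipe_frame(a, a_depth, b, plane))
```

The weather map example corresponds to calling `z_wipe_frame` with a fixed `plane_depth`, so that the map sits at one depth position and the presenter appears in front of it wherever the presenter is nearer.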
[0066] Depth Map Generation
[0067] As noted above, in embodiments of the present invention, the
depth map will be generated. The depth of each pixel point in the
image can be generated using a number of predetermined algorithms,
such as Scale Invariant Feature Transform (SIFT). However, these
depth maps are either very densely populated and accurate but slow
to produce, or not so densely populated but quick and
computationally efficient to produce. There is thus a need to
improve the accuracy and density of produced depth maps whilst
still ensuring that the depth maps are produced computationally
efficiently. An aim of embodiments of the present invention is to
address this.
[0068] FIG. 7 shows a stereo image pair 700 captured using a
stereoscopic camera having a parallel lens arrangement. In the left
eye image 705, there is a cube 720A and a cylinder 715A. As will be
apparent, from the left eye image 705, the cylinder 715A is
slightly occluded by the cube 720A. In other words, in the left eye
image 705 the cube 720A is positioned in front of the cylinder 715A
and hides part of the cylinder 715A from view. The right eye image
710 captures the same scene
as the left eye image 705 but from a slightly different
perspective. As can be seen, the cube 720B is still located in
front of the cylinder 715B but in the right eye image 710, the cube
720B does not occlude the cylinder 715B. In fact there is a small
portion of background 740B between the cube 720B and cylinder 715B.
As will be also seen, the left side of the cube 725A is visible in
the left eye image 705 but is not visible in the right eye image
710. Similarly, the right side of the cube 725B is visible in the
right eye image 710 but is not visible in the left eye image
705.
[0069] In order to determine the depth of each pixel in the left
eye image 705 and the right eye image 710, the disparity between
corresponding pixels needs to be determined. In other words, one
pixel position in the left eye image 705 will correspond to a part
of the scene. The same part of the scene will be at a pixel
position in the right hand image 710 different to the pixel
position in the left eye image 705. The difference in the number of
pixels is termed the disparity and will give an indication of the
depth of the part of the scene from the camera capturing the image.
This, over the entire image, provides the depth map for the
image.
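The description states only that disparity gives an indication of depth. For rectified images from a parallel camera pair, one commonly used relation is depth = focal length x baseline / disparity. The sketch below assumes that relation; the function name, parameter names and units are hypothetical and are not taken from the embodiments.

```python
def depth_from_disparity(disparity_px: float,
                         focal_length_px: float,
                         baseline_m: float) -> float:
    """Convert a horizontal disparity (in pixels) to a distance (in metres).

    Assumes an idealised parallel, rectified camera pair. A disparity of
    zero (or less) is treated as a point at infinity.
    """
    if disparity_px <= 0:
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Larger disparity means the part of the scene is closer to the camera.
    for d in (5, 10, 20):
        print(d, depth_from_disparity(d, focal_length_px=1000.0, baseline_m=0.065))
```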
[0070] In embodiments of the present invention, the same scan line
is taken from the left eye image 730A and the right eye image 730B.
The same scan line is used because, in epipolar rectified
stereoscopic images, only horizontal disparity should exist. In
other words, the left and right eye images should be vertically
coincident, with disparity occurring only in the horizontal
direction. It should be noted that, to ensure that a single pixel
scan line can be used, the images are epipolar rectified during
preprocessing. However, the invention is not so limited. Although a
scan line one pixel deep will be described, a scan line of any depth
may be used. A deeper scan line may be useful to increase the
stability of the results.
[0071] The results for the left eye scan line 735A and the right eye
scan line 735B are shown in FIG. 8. As can be seen in the left hand
scan line 735A, and looking in the x direction, the background
changes to the left side of the cube 725A at point PL1. The left
side of the cube 725A changes to the front face of the cube 720A at
point PL2. The front face of the cube 720A changes to the cylinder
715A at point PL3. The cylinder 715A changes to the background
again at point PL4.
[0072] As can be seen in the right hand scan line 735B, and looking
in the x-direction, the background changes to the face of the cube
720B at point PR1. The face of the cube 720B changes to the right
side of the cube 725B at point PR2. The right side of the cube 725B
changes to the background at point PR3. The background changes to
the cylinder 715B at point PR4 and the cylinder changes to the
background at point PR5.
[0073] In the left eye image, points PL1 to PL4 are detected and in
the right eye image, points PR1 to PR5 are detected. In order to
detect these points, the change in intensity between horizontally
adjacent pixels is measured. If the change in intensity is above a
threshold, the point is detected. Although the intensity difference
is used in embodiments, the invention is not so limited and the
change in luminance or colour or indeed any image property may be
used to detect the change point. Methods of determining the change
point exist in the art and so will not be described hereinafter.
It is next necessary to detect in the left and right scan lines
which segments correspond to the most forward object, i.e. the
object closest to the camera. In the example of FIG. 7, segment
720A in the left eye image 705 and segment 720B in the right eye
image 710 need to be detected. This is because the most forward
object in an image will not be occluded in either the left or right
image, assuming of course that neither segment of the most forward
object extends beyond the scan line.
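A minimal sketch of the change point detection described above is given below, assuming a scan line held as a list of intensity values. The function name and the convention that a change point is reported at the first pixel after the change are assumptions of this sketch.

```python
from typing import List


def find_change_points(scan_line: List[int], threshold: int) -> List[int]:
    """Return the x positions at which the intensity difference between
    horizontally adjacent pixels exceeds the threshold.

    Each reported position is the index of the first pixel after the change.
    """
    points = []
    for x in range(1, len(scan_line)):
        if abs(scan_line[x] - scan_line[x - 1]) > threshold:
            points.append(x)
    return points


if __name__ == "__main__":
    line = [10, 10, 10, 80, 80, 80, 200, 200, 10, 10]
    print(find_change_points(line, threshold=30))
```

Any other image property, such as luminance or colour, could be substituted for the intensity values without changing the structure of the loop.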
[0074] In order to reduce the amount of computation required to
determine the corresponding segments, the disparity between each
change point in the left eye image (PL1 to PL4) and each change
point in the right eye image (PR1 to PR5) is determined. This is
better seen in FIG. 8. This determination of the disparity enables
certain segments which cannot correspond to each other to be
ignored in calculating corresponding pixels. Taking the position of
a change point on the scan line for the left eye image, only change
points appearing to the left hand side of that same position on the
scan line for the right eye image can correspond to the change
point in the left hand image. Therefore, for each change point in
the left hand scan line, only change points in the right hand scan
line that lie to the left hand side of that position will be
compared. For example, when finding a
change point in the right hand scan line that corresponds to change
point PL2, only PR1 can be the corresponding change point.
Similarly, when finding a change point that corresponds to point
PL3, it is only necessary to check the similarity between change
point PL3 and change points PR1, PR2, PR3 and PR4.
[0075] In fact, the amount of computation may be reduced further by
only checking change points in the right hand image scan line that
are within a predetermined distance from the change point in the
left hand image that is under test. For example, to find the change
point in the right hand image that corresponds to PL3, only the
change points that lie within an upper disparity threshold are
checked. In other words, only the change points in the right hand
scan line that are within a certain number of pixels to the left of
the position of the change point in the left hand scan line are
checked. The threshold may be selected according to the depth budget
of the images, the interocular distance of the viewer, or any other
suitable metric.
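The two pruning rules above (candidates must lie to the left of the left hand change point, and within an upper disparity threshold) might be sketched as follows. The function name is hypothetical, and allowing a disparity of exactly zero is an assumption of this sketch.

```python
from typing import List


def candidate_matches(left_point: int,
                      right_points: List[int],
                      max_disparity: int) -> List[int]:
    """Return the right hand change points that may correspond to left_point.

    A candidate is kept only if it lies at or to the left of left_point
    and no more than max_disparity pixels away from it.
    """
    return [xr for xr in right_points
            if 0 <= left_point - xr <= max_disparity]


if __name__ == "__main__":
    right_points = [10, 30, 45, 50, 60]
    print(candidate_matches(50, right_points, max_disparity=25))
```

Only the surviving candidates then need their adjacent pixels compared for similarity, which is where the saving in computation arises.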
[0076] A method for improving the segmentation process will be
described. In order to obtain accurate segmentation, the use of a
mean shift algorithm is known. However, as would be appreciated by
the skilled person, although accurate, the mean shift algorithm is
processor intensive. This makes the mean shift algorithm difficult
to implement in real time video. In order to improve the
segmentation, therefore, it is possible to use a less intensive
algorithm to obtain an idea where the segment boundaries lie in an
image, and then apply the mean shift algorithm to those boundary
areas to obtain a more accurate position for each segment
boundary.
[0077] So, in one embodiment, the input image may have a simple
edge detection algorithm applied thereto to obtain an approximate
location for edges in the image.
[0078] After edge detection, the edge detected image is then
subject to dilation filtering. This provides two types of area. The
first type is areas which are contiguous; these are deemed to belong
to the same segment. The second type is areas surrounding the
detected edges. It is the second type of area that is then
subjected to the mean shift algorithm. This improves the accuracy
of the results from the edge detection process whilst still being
computationally efficient.
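The first two stages of this embodiment might be sketched as below. The sketch stops at producing the band of pixels surrounding the detected edges; the mean shift step itself is not shown. The simple neighbour-difference edge detector, the square dilation window and all function names are assumptions of this sketch rather than details of the embodiment.

```python
from typing import List

Mask = List[List[bool]]


def edge_mask(image: List[List[int]], threshold: int) -> Mask:
    """Mark pixels whose right or lower neighbour differs by more than threshold."""
    h, w = len(image), len(image[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            right = abs(image[y][x] - image[y][x + 1]) if x + 1 < w else 0
            down = abs(image[y][x] - image[y + 1][x]) if y + 1 < h else 0
            mask[y][x] = max(right, down) > threshold
    return mask


def dilate(mask: Mask, radius: int) -> Mask:
    """Grow every marked pixel into a (2 * radius + 1) square window."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = True
    return out


if __name__ == "__main__":
    image = [[0, 0, 0, 100, 100, 100] for _ in range(3)]
    band = dilate(edge_mask(image, threshold=50), radius=1)
    # True marks the boundary band to be refined by mean shift;
    # False marks contiguous areas deemed to belong to one segment.
    for row in band:
        print(row)
```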
[0079] A further embodiment for improving segmentation will
now be described. After edge detection of the input image, the edge
detected image is divided into smaller regions. These regions may
be of the same size, or may be of different sizes. Then the
dilation filtering may be applied to the image region by region
(rather than just along the edges as previously). After the
dilation filtering, the mean shift algorithm is applied to the
areas which were subjected to dilation filtering. The segmentation
is now complete.
[0080] In order to determine the forward most object, the pixels
adjacent to the change point in the left hand scan line are
compared to the pixels adjacent to the appropriate change points in
the right hand scan line. "Adjacent" in this specification may mean
directly adjacent i.e. the pixel next to the change point.
Alternatively, "adjacent" may mean in this specification within a
small number of pixels such as two or three pixels of the change
point, or indeed may mean within a larger number of pixels of the
change point. For forward most objects, or segments, the pixels to
the right hand side of point PL2 and PR1 will be most similar and
the pixels to the left of point PL3 and PR2 will be most similar.
In other words, the pixels at either end of the segment will be
most similar. After all the change points in the left hand scan
line and the right hand scan line have been calculated and compared
with one another, the forward most segment is established.
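As a toy illustration of this comparison, the sketch below scores every pair of left and right segments by the similarity of the pixels just inside each end, and returns the best-scoring pair. Here "adjacent" is taken to mean the pixel immediately inside the segment, similarity is measured as an absolute intensity difference, and change points are indexed at the first pixel after the change; these choices, and the function name, are assumptions of the sketch.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # (start change point, end change point)


def forward_most_segment(left: List[int], right: List[int],
                         left_points: List[int],
                         right_points: List[int]
                         ) -> Optional[Tuple[Segment, Segment]]:
    """Return the (left segment, right segment) pair whose two ends match best."""
    best, best_cost = None, None
    left_segments = list(zip(left_points, left_points[1:]))
    right_segments = list(zip(right_points, right_points[1:]))
    for sl, el in left_segments:
        for sr, er in right_segments:
            # A right hand change point can only correspond if it lies
            # at or to the left of the left hand change point.
            if sr > sl or er > el:
                continue
            # Pixel just right of the start, and just left of the end.
            cost = (abs(left[sl] - right[sr])
                    + abs(left[el - 1] - right[er - 1]))
            if best_cost is None or cost < best_cost:
                best, best_cost = ((sl, el), (sr, er)), cost
    return best


if __name__ == "__main__":
    # background, cube side, cube front, part occluded cylinder, background
    left = [10, 10, 10, 10, 50, 50, 120, 120, 120, 120,
            200, 210, 220, 10, 10, 10, 10, 10]
    # background, cube front, cube side, background gap, cylinder, background
    right = [10, 10, 120, 120, 120, 120, 60, 10, 190, 200,
             210, 220, 10, 10, 10, 10, 10, 10]
    print(forward_most_segment(left, right, [4, 6, 10, 13], [2, 6, 7, 8, 12]))
```

In this synthetic scan line pair the cube front face matches at both ends, whereas the part occluded cylinder matches well at only one end, which is the property the paragraph above relies upon.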
[0081] The validity of the selection of the forward most segment in
each image may be verified using the values of disparity of pixels
adjacent to the forward most segment in each image. As the forward
most segment is closest to the camera in each image, the disparity
between the pixel to the left of change point PL2 and its
corresponding pixel in the right hand scan line will be less than
or equal to the disparity between the pixel to the right of change
point PL2 and its corresponding pixel in the right hand scan line.
Similarly, the disparity between the pixel to the right of change
point PL3 and its corresponding pixel in the right hand scan line
will be less than or equal to the disparity between the pixel to
the left of change point PL3 and its corresponding pixel in the
right hand scan line. Similarly, the disparity between the pixel to
the left of change point PR1 and its corresponding pixel in the
left hand scan line will be less than or equal to the disparity
between the pixel to the right of change point PR1 and its
corresponding pixel in the left hand scan line. Similarly, the
disparity between the pixel to the right of change point PR2 and
its corresponding pixel in the left hand scan line will be less
than or equal to the disparity between the pixel to the left of
change point PR2 and its corresponding pixel in the left hand scan
line.
[0082] After determining the most forward object and verifying the
result, it is possible to determine a part occluded object. A part
occluded object is an object which is part visible to either the
left or right hand eye image, but is partly overlapped in the other
eye image. Cylinder 715A is therefore part occluded in the left eye
image and is not occluded in the right eye image. As the skilled
person will appreciate, where there is part occlusion of an object,
there is no disparity information available because one image (the
left eye in this example) does not include the object for
comparison purposes. Therefore, it is necessary to estimate the
disparity. This is explained with reference to FIG. 9.
[0083] FIG. 9 shows a dissimilarity map for each pixel position on
a scan line. In other words, FIG. 9 shows a map which for each
pixel position along the x-axis shows how similar, or dissimilar,
pixels at a given disparity from the pixel position are. So, in
FIG. 9, the x axis shows pixel positions on a scan line for, say,
the left eye image (although the invention is not so limited). The
y axis shows the similarity between the pixel at that position on
the scan line in the left eye image and each pixel position at
increasing disparity in the right eye image. The maximum disparity
is set by the depth budget of the scene as previously noted.
[0084] Looking at the origin of the dissimilarity map (in the
bottom left corner of the map), only one pixel has a disparity
value. This is because at this position in the left hand image, all
pixels to the left of this point (i.e. having a disparity of one)
will be out of bounds of the left hand scan line and so cannot be
measured. This is indicated by a hashed line.
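A dissimilarity map of this kind might be built as sketched below, for a left hand scan line compared against a right hand scan line. The use of absolute intensity difference as the dissimilarity measure, the use of `None` for positions that fall out of bounds, and the function name are assumptions of this sketch.

```python
from typing import List, Optional


def dissimilarity_map(left: List[int], right: List[int],
                      max_disparity: int) -> List[List[Optional[int]]]:
    """Return rows indexed by disparity, columns indexed by pixel position.

    Entry [d][x] compares left[x] with right[x - d]. Positions where
    x - d falls off the scan line cannot be measured and are None.
    """
    rows = []
    for d in range(max_disparity + 1):
        row = []
        for x in range(len(left)):
            xr = x - d
            row.append(abs(left[x] - right[xr]) if xr >= 0 else None)
        rows.append(row)
    return rows


if __name__ == "__main__":
    left = [5, 7, 9]
    right = [7, 9, 1]
    for d, row in enumerate(dissimilarity_map(left, right, max_disparity=1)):
        print(d, row)
```

At pixel position 0 only the zero-disparity entry is populated, which mirrors the single value available at the origin of the map.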
[0085] As would be appreciated, the change points in the map are
shown as thick black lines at each pixel position in the left hand
scan line compared with the right hand image. It would be
appreciated though that this is only an example and a comparison of
any scan line with any image is envisaged. As can be seen, the
non-occluded segment (which is closest to the camera) is determined
in accordance with the previous explanation. However, as noted
before, the segment to the immediate left of the non-occluded
segment in the left scan line and to the immediate right of the
non-occluded segment in the right scan line may be part
occluded.
[0086] In order to determine the disparity at any point in the
occluded area, it is necessary to determine which section of the
part occluded segment is occluded and which part is visible.
Therefore, the similarity of the left hand pixel nearest to the
right hand edge of the part occluded segment is determined. As can
be seen from section 905 these values are so dissimilar, that there
is no correlation. This indicates that this section of the part
occluded segment is occluded. Such analysis takes place for all
pixel positions in the segment to the immediate left of the forward
most object in the left scan line.
[0087] As can be seen, the similarity map shows that a number of
pixels within the part occluded segment have high similarity (or
low dissimilarity) values. The pixel at position 910 is closest to
the most forward segment which shows the most similarity.
Additionally, pixel position 915 is the right hand pixel closest to
the left hand edge of the part occluded segment. In order to
determine the disparity at any point within the part occluded
segment, therefore, a straight line, for example, is drawn between
pixel position 910 and pixel position 915. The disparity for each
pixel position is then estimated from this straight line.
Although a straight line is shown, the invention is not limited to
this. The disparity line may be determined in accordance with the
measured levels of dissimilarity or levels of similarity. For
example, the line may be defined by a least squares error
technique. Indeed, any suitable technique is envisaged.
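The straight line estimate might be sketched as follows, assuming the two anchor pixels are given as (position, disparity) pairs. The function name and argument layout are hypothetical; a least squares fit over the measured dissimilarity values could be substituted for the straight line.

```python
from typing import List


def interpolate_disparity(x0: int, d0: float, x1: int, d1: float) -> List[float]:
    """Estimate the disparity at every pixel position from x0 to x1 inclusive
    by drawing a straight line between (x0, d0) and (x1, d1)."""
    if x1 < x0:
        raise ValueError("x1 must not be less than x0")
    if x1 == x0:
        return [d0]
    slope = (d1 - d0) / (x1 - x0)
    return [d0 + slope * (x - x0) for x in range(x0, x1 + 1)]


if __name__ == "__main__":
    # Hypothetical anchors standing in for pixel positions 915 and 910.
    print(interpolate_disparity(10, 1.0, 14, 3.0))
```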
[0088] It is envisaged that the above method may be performed on a
computer. The computer may be run using computer software
containing computer readable instructions. The computer readable
instructions may be stored on a storage medium such as a magnetic
disk or an optical disc such as a CD-ROM or indeed may be stored on
a network or a solid state memory.
[0089] Moreover, although the foregoing has been described with
reference to a stereoscopic image captured using a parallel
arrangement of camera lenses, the invention is not so limited. The
stereoscopic image may be captured using any arrangement of lenses.
However, it should be converted into parallel images according to
embodiments of the present invention.
[0090] Although the foregoing has mentioned two examples for the
provision of depth information, the invention is in no way limited to
depth maps and disparity maps. Indeed any kind of depth information
may be used.
[0091] Although illustrative embodiments of the invention have been
described in detail herein with reference to the accompanying
drawings, it is to be understood that the invention is not limited
to those precise embodiments, and that various changes and
modifications can be effected therein by one skilled in the art
without departing from the scope and spirit of the invention as
defined by the appended claims.
* * * * *