U.S. patent application number 10/499332 was published by the patent office on 2005-01-20 for segmentation of images using the watershed method.
Invention is credited to Watson, Alistair Imeson.
United States Patent Application 20050013483
Kind Code: A1
Watson, Alistair Imeson
January 20, 2005
Segmentation of images using the watershed method
Abstract
By using homotopic transformations, specifically the watershed
transformation, within feature space formed by the red, green and
blue components, or any other set of attributes, of the pixels of
an image, a unique number of colours can be found for that image.
Each colour is given a label and the image is then segmented into
regions of uniform labels. The colour or characteristic of each
segment can then be altered or used to identify an object. This
can be applied to any scalable (either integer or real) data
set.
Inventors: Watson, Alistair Imeson (Dunblane, GB)
Correspondence Address: Breiner & Breiner, 115 North Henry Street, PO Box 19290, Alexandria, VA 22320-0290, US
Family ID: 9927826
Appl. No.: 10/499332
Filed: June 17, 2004
PCT Filed: December 18, 2002
PCT No.: PCT/GB02/05754
Current U.S. Class: 382/164
Current CPC Class: G06T 2207/20152 20130101; G06K 9/4652 20130101; G06T 7/90 20170101; G06K 9/38 20130101; G06K 9/342 20130101; G06T 7/155 20170101; G06T 7/11 20170101; G06T 2207/20156 20130101
Class at Publication: 382/164
International Class: G06K 009/00
Foreign Application Data
Date | Code | Application Number
Dec 18, 2001 | GB | 0130210.8
Claims
1. A method using a computer for segmenting an image into a small number of homogeneous regions on the basis of colour, the method comprising the steps of:
1. Digitising a source image to generate a digitised image file, comprising an n-dimensional map of m-tuples, each of which represents the colour value at that point in the image.
2. Forming an m-dimensional histogram of colour frequency in the digitised image file.
3. Sorting the entries in the histogram by height.
4. Choosing a point that has the highest histogram value attained by any unlabelled point.
5. Assigning a unique label to this point as follows:
i. If no previously labelled point is adjacent, assign a new label and geodesic distance 0.
ii. If there are adjacent elements with the same label, assign this label and determine the geodesic distance to be the same (if the histogram value is the same) or one greater than (otherwise) the least value held by the neighbour having the highest histogram value.
iii. If there are neighbours that have different labels, ignore this point at present.
6. Finding any other points having the same height and treating them as per step 5.
7. Assigning to each remaining point at this level (as ignored at 5.iii, i.e. each point which is not uniquely connected) the label of whichever of its neighbours has the lowest geodesic distance.
8. Repeating steps 4 to 7 until all points within the feature space have been assigned a label.
9. Assigning to each pixel in the image the label that is found for that combination of component values within the feature space.
10. Giving each region of uniform labels a unique label to identify that segment.
2. A method for image segmentation which comprises:
1. creating or digitising an image, consisting of an N-dimensional map of data, each element of which contains the M-dimensional colour data (the value at that point in the image of each of the M colour components);
2. optionally transforming the image from its original colour space to another with the same or differing dimension (M), and/or adding further components (to increase M) generated algorithmically from the original data, e.g. `texture`, and/or scaling each component by the same or different amounts to alter the number of unique values in that band.
3. Making a histogram of frequency of each unique combination of
component values (`colour`), preferably by constructing a hash
table, wherein elements of the histogram are
`neighbours`/`adjacent` if their component values differ by no more
than one in no more than a specified number of dimensions.
4. optionally sorting the histogram's entries into buckets, ordered
by histogram value, largest first.
5. For each histogram value found, starting from the highest and
working down, consider the set of colours SC with that histogram
value, performing steps 6 and 7.
6. For each colour C in SC: {If there are no neighbour colours that already have a class, make a new class, for which this colour is the seed point, with geodesic distance 0. OR If all neighbours share one class: add this element to that class; determine the geodesic distance for this element by choosing the adjacent element with the lowest geodesic distance, and assigning the same distance, if they have the same histogram value, or that distance+1 otherwise. OR If there are neighbours of 2 or more classes, leave unassigned for the moment.} This leaves unassigned only those colours at this histogram level which have neighbours of more than one class.
7. For each remaining unassigned colour C in SC: {Add C to the class of the neighbour having the lowest geodesic distance, assigning the same distance, if they have the same histogram value, or that distance+1 otherwise.} After this, all the colours in the image have been assigned a class.
8. optionally removing classes that represent only a small number of elements of the image, by reassigning their colours to (a) nearby class(es),
9. optionally assigning each element in the image the label of the class that its colour belongs to, and
10. optionally assigning each connected (in image space) set of
elements of the image all having colours of the same class a unique
label to identify that region.
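The adjacency rule recited in step 3 above (component values differing by no more than one in no more than a specified number of dimensions) can be expressed as a small predicate. This is an illustrative Python sketch, not part of the application; the function name and parameter are invented for exposition:

```python
def adjacent(a, b, max_dims=1):
    """True if colours a and b are `neighbours`: their component values
    differ by no more than one, in no more than max_dims dimensions."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return (a != b
            and all(d <= 1 for d in diffs)
            and sum(1 for d in diffs if d != 0) <= max_dims)
```

With max_dims=1 this is 4-connectivity in a 2-dimensional feature space; raising max_dims admits diagonal neighbours as well.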
Description
[0001] In recent years, vast strides have been made in the field of
computer-assisted image processing. The creation and manipulation
of images has proved a boon to many engaged in the graphic arts
field, industrial monitoring, and surveillance, but there are still
problems in the initial stages of rendering an already existing
image into processable form. The classic approach to securing a
computerised image is to scan a photographic original to form a
file in which data are stored representing properties of a large
number of portions of the image, so-called pixels. Each pixel is
characterised by a number of parameters corresponding to colour and
intensity, and the file contains data relating to the location of
each pixel so that when the file is called up by an appropriate
program, the image is displayed on screen. Most recently, the
process of scanning has been supplemented by the development of
so-called digital cameras, which produce an image file directly.
[0002] In order to process the image to the form desired by the
user, it often needs to be broken down into different parts, for
example those corresponding to background and displayed object, in
order to change the colour balance of the background without
affecting that of other parts of the image. This process of
segmentation is time-consuming and requires a high degree of skill.
Attempts to automate the process have been made, but they do not
work well or easily, as the intellectually comprehensible pieces of
an image, clear to any human viewer, are simply not easily
identifiable by a computer.
[0003] The present invention seeks to provide a method of analysing
the data in an image file to yield information quite independent of
human intervention. It seeks to enable patterns or structures
within a data set, if such exist, to be revealed and used, both to
describe the data and to make predictions if such patterns recur.
The method does not depend upon superimposed assumptions based upon
current theory and knowledge. It is implemented by the use of a
computer system, and thus can enable that computer system to
receive or gather data about the external world, either by
importation of a picture of that world, or by direct input from a
suitable camera system, and then to analyse such data without
preconceptions. The system can thus arrive at an understanding of
the world and, if desired, enable accurate predictions to be made
about the world. Such a system can thus be seen as forming a basis
for machine intelligence.
[0004] In accordance with the invention, a digitised image is
segmented on the basis of colour by a method as set out in the main
claim. An alternative approach is set out in claim 2.
[0005] The invention essentially uses homotopic transformations,
specifically the watershed transformation, within feature space
formed by the colour components of the pixels of an image. These
may be represented in many colour spaces, such as HLS (Hue,
Lightness, Saturation), RGB (Red, Green, Blue), or CMYK (Cyan,
Magenta, Yellow, Key). Images may be processed in their original
colour space, or transformed into a different one before
processing. Furthermore, additional channels of information may be
generated algorithmically from the data, for example by performing
an analysis relating to texture for each pixel, and included in the
classification process. Additionally, rather than performing only
one classification, in all the dimensions at once, there exists the
option of performing several classifications, each in a subspace of
the feature space, and of then making reference to some or all of
the classifications in the segmentation process. This enables a
unique number of colours, groups of colours, or contiguous regions
in feature space, hereinafter called classes, to be found for that
image. Each class is given a label and the image is then segmented
into regions of uniform labels. The colour of each segment can
then, for example, be altered or used to identify an object.
[0006] The method can be applied to any scalable (either integer or
real) data set. While the usual number of dimensions for the
histogram is three, it is of course possible to use more, or fewer,
if desired, but use of more than three dimensions materially
increases the amount of computing power and computer memory
required to carry out the necessary analysis. It should be noted
that in using the method, the lattice resolution and connectivity
will both affect the number of sets (as defined below) found.
[0007] In order to understand the theory behind the method, it is
necessary to bear in mind the following definitions:
[0008] Lattice--A set of nodes and connections between the nodes.
A Euclidean lattice is a square grid pattern, with the nodes being
the intersections, and the lines between them representing the
connections. For a simple 2-dimensional Euclidean lattice, there
are two possible ways of defining the connectivity. a) The
4-connected lattice consists of allowing nodes to be adjacent only
if they differ by only one on one dimension. This means that each
node on the lattice has 4 neighbours, hence the name. b) The
8-connected lattice consists of allowing nodes to be adjacent if
they differ by only one on any number of dimensions, including the
case where the node differs by one on both dimensions. Thus the
connectivity includes the diagonal nodes and there are in this case
8 nearest neighbours, and hence the name. The nature of any lattice
and its connectivity can be defined for any number of dimensions by
extension of the 2-dimensional case.
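The two connectivity conventions can be made concrete as neighbour-generating functions. This is a minimal Python sketch for illustration; the function names are not part of the application:

```python
def neighbours_4(node):
    """4-connected lattice: adjacent nodes differ by one on exactly one
    dimension."""
    x, y = node
    return [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]

def neighbours_8(node):
    """8-connected lattice: adjacent nodes differ by at most one on each
    dimension (diagonals included), excluding the node itself."""
    x, y = node
    return [(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]
```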
[0009] Geodesic distance--Any distance on a lattice must be
measured along a continuous string of adjacent nodes. The string of
adjacent nodes is known as a path. The geodesic distance between
any 2 nodes is the length of the shortest of all possible paths
between those nodes.
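Because every connection on the lattice has unit length, the geodesic distance can be computed by breadth-first search. A Python sketch, assuming a 4-connected lattice restricted to a given set of nodes (names are illustrative):

```python
from collections import deque

def geodesic_distance(start, goal, nodes):
    """Length of the shortest path from start to goal, moving only
    between 4-connected neighbours that belong to `nodes`."""
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        (x, y), d = frontier.popleft()
        if (x, y) == goal:
            return d
        for nb in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if nb in nodes and nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return None  # unreachable: start and goal lie in different sets
```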
[0010] Set--A connected collection of nodes: by moving only between
adjacent nodes of the collection, you can get from any member of the
set to any other.
[0011] Jordan's Lemma--A process (such as adding/subtracting a node
to a set) preserves homotopy if nodes that were previously
connected by a path remain connected, and those that were not
connected by a path remain disconnected. Thus, if there exist n
sets before the operation, there must only be n sets after the
operation. Processes which satisfy the homotopy condition, if used
exclusively, will guarantee that if there are n sets in a data set
represented on a given lattice, then only n sets will be
`discovered` or revealed at the end of applying the processes.
[0012] Monotonic--A series of numbers in which each number is
either as great (small) as, or greater (smaller) than, its
predecessor.
[0013] Fall-set--This is a path in which the numerical value of
the nodes varies monotonically, starting from a high value. If the
path were followed from its high end to its low end, then by
analogy with water flowing down a hillside this would describe how
the water flowed. The nodes to which no water flows, and from which
water flows, define a `watershed`. Hence, the algorithms that
discover these nodes are known as `watershed` algorithms.
[0014] The method of the invention is based on the idea of a
fall-set, geodesic distance, and Jordan's Lemma. Jordan's Lemma
gives the legitimate transformations of sets such that homotopy
(number of sets) is preserved. Definition of a set: All those
points that are connected continuously. Continuity is defined as
being equivalent to adjacency on the lattice. Therefore, a set is
all those points that are connected by a string of points that are
members of the set. The Lemma states that a point can be added (or
subtracted) provided that no point previously unconnected becomes
connected (or any point previously connected becomes disconnected).
A transformation that satisfies this restriction will conserve
homotopy or keep the number of sets constant. By selecting as a
seed point the highest point of the histogram which must belong to
a set and then adding to it adjacent points that are lower and
uniquely connected to that initial point (thus satisfying Jordan's
Lemma), the fall set for that seed point will be discovered and
defined. The identification by the algorithm of other seed points
that are independent and uniquely definable ensures that if there
are n sets (groups, classes) in the feature space only n will be
found. By using the geodesic measure from the seed point for every
point in the set, those points on the lattice that may be connected
to several sets, and are therefore boundary points, can be assigned
to the group they are nearest to in terms of geodesic distance from
the seed point of that group.
[0015] The process is thus capable of discovering the number of
homogeneous colours there are in any image. Once the number and
precise definition of the groups is known then the original image
can be segmented into homogeneous regions.
[0016] The method of the invention provides segmentation in a
fashion which is qualitatively distinguished from previous
approaches, which have used statistical decision theory.
In statistical decision theory, the number and mathematical
description of the sets within the data, or the distributions
within a feature space are assumed to be known. Further, it is
assumed that the shape of each set is expressible as a continuous
mathematical function. All these assumptions are totally erroneous.
There is no rational procedure to calculate the number of colours
in an image. The shape of the sets within the feature space
(histogram) is never regular either in outline or in profile. The
data are discrete and in no respect can such data be treated as
continuous. At best, the traditional statistical decision theory
approach is a very poor approximation to the actual distributions
within feature space, and this inevitably leads to errors in
assigning pixels to a set. In contrast, the method according to the
invention describes the data accurately and therefore cannot be
bettered, merely equalled in accuracy. It is the only method that
can find an answer to the problem of `how many sets or colours are
there in any one image?`. The answer for any given resolution
(radiometric) and lattice connectivity is unique.
[0017] Instances of a data set, measured on at least interval
scales, can be represented within a feature space, the axes of
which are the dimensions of the data. Once the data is so
represented, then the watershed algorithm can be used to describe
precisely the number, size and shape of the independent sets
(classes, groups) of the data. This is the basis of knowledge. Once
the division of the feature space into its constituent sets
(classes) has been completed, then it only remains to assign
significance to each set and then to make predictions on the basis
of a point belonging to a particular set. The watershed algorithm
is thus a rigorous method of describing data for the purpose of
prediction. It can also be used to segment (determine to which
colour group a particular pixel belongs) any data set. It is unique
in being able to describe any data set precisely in terms of the
number, shape, and size of its components. No other method can ever
do better than the watershed solution; at best it can only duplicate it.
[0018] The watershed algorithm used in the present invention can be
applied to any data set that has the following characteristics:
[0019] 1. The number of instances is large.
[0020] 2. The component measures for each instance within the data
set are all at least interval.
[0021] These are not severe restrictions and therefore the method
can be applied to most data sets.
[0022] The invention is illustrated by way of the following
example, which is, to conserve space and aid clarity of
understanding, concerned with a data set microscopic compared with
any real data set representing a pixellated image. However, it is
believed that it serves to illustrate how the method is applied.
The example describes the processing in simple terms, but in real
implementation, all of the processes are carried out using
appropriate computer programming.
EXAMPLE
[0023] Let us suppose we have a small image of 5 by 5 pixels, and
that we have the red and green values for each pixel. This will
define a 2-dimensional problem, and we would like to segment the
image into its constituent `colours`, purely on the basis of the
pattern of adjacent points within the feature space defined by the
values of the two colours, red and green. Tables 1 and 2 show the
components of an image, one for the red component, and the other
for the green component.
1TABLE 1
Red Component
4 3 4 4 5
5 4 1 2 1
5 3 0 1 1
1 1 0 1 2
0 1 1 1 1
[0024]
2TABLE 2
Green Component
1 1 1 0 1
2 1 5 4 3
1 0 4 5 3
5 4 5 3 4
5 5 4 4 4
[0025] The first step is to construct a hash table (the most
economical method of storing sparse data) containing in the
following order:
[0026] The hash table position, the red component, the green
component, the number of pixels having these 2 components, the
label for the class (set). The hash function in this example is
L = Mod(R + 7*G, 13)
[0027] where R is the red value, G the green value, and L is the
remainder after dividing by 13 the result of R+7*G. (1, 7, and 13
are prime numbers: the hashing technique of storing data works best
with prime numbers, and the choice of the prime numbers in the
hashing function depends on the range of values to be stored.) Each
pixel is taken in turn and the hash table is constructed with the
entry for each combination being incremented each time that
combination occurs, so that the final value will give the height of
the histogram for that combination. As each pixel is entered into
the hash table, the red and green components stored at that key are
checked, and if either differs (i.e. another colour already occupies
that field) the hash table key is incremented until an empty field is
found, into which the colour is added. The result is shown
in Hash Table 1:
3 HASH TABLE 1
1
2 0 4 1
3 1 4 4
4 2 4 2
5 3 0 1
6 4 0 1
7 5 2 1
8 3 1 1
9 0 5 2
10 1 3 3
11 1 5 4
12 4 1 3
13 5 1 2
[0028]
4 HASH TABLE 2
3 1 4 4
11 1 5 4
10 1 3 3
12 4 1 3
4 2 4 2
9 0 5 2
13 5 1 2
2 0 4 1
5 3 0 1
6 4 0 1
7 5 2 1
8 3 1 1
1
[0029]
5 HASH TABLE 3
1
2 0 4 1 g
3 1 4 4 g
4 2 4 2 g
5 3 0 1 r
6 4 0 1 r
7 5 2 1 r
8 3 1 1 r
9 0 5 2 g
10 1 3 3 g
11 1 5 4 g
12 4 1 3 r
13 5 1 2 r
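The histogram construction of paragraph [0027] can be reproduced in a few lines of Python; here a dictionary stands in for the hash table, so the open-addressed layout and collision probing of Hash Table 1 are handled internally rather than reproduced explicitly (a sketch for illustration, not the application's own code):

```python
# The 5 by 5 example image of Tables 1 and 2, in row-major order.
red = [4, 3, 4, 4, 5,
       5, 4, 1, 2, 1,
       5, 3, 0, 1, 1,
       1, 1, 0, 1, 2,
       0, 1, 1, 1, 1]
green = [1, 1, 1, 0, 1,
         2, 1, 5, 4, 3,
         1, 0, 4, 5, 3,
         5, 4, 5, 3, 4,
         5, 5, 4, 4, 4]

# Count each (red, green) combination: the final value is the height
# of the feature-space histogram for that colour.
histogram = {}
for r, g in zip(red, green):
    histogram[(r, g)] = histogram.get((r, g), 0) + 1
```

The resulting counts agree with the fourth column of Hash Table 1: twelve distinct colours accounting for all 25 pixels.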
[0030] The feature space will be considered as a Euclidean lattice
that is four-connected. In order to facilitate the procedure, the
hash table is reordered using the fourth column (histogram height)
such that the highest values occur at the beginning of the table.
The result is as shown in Hash Table 2. Note that if two hash table
entries differ by 1 in their red or green components (but not both)
then they are considered to be adjacent. If a candidate entry is
being considered as belonging to an already existing group, then
provided it is adjacent to one entry of the group and not adjacent
to any entry that belongs to another group, then it can be added to
that group. This procedure conforms to Jordan's Lemma.
[0031] Starting at the beginning of the table, column 4,
representing the histogram value for each colour, is scanned to
find the maximum value. The maximum is found to be 4, and there are
two such fields to consider.
[0032] These differ in colour by 1 in one dimension only and are
thus adjacent, and are given the same label g. Column 4 of the hash
table is scanned for any field containing the histogram value 3.
There are two. The first (position 10 in table 1) differs in colour
from one of the 4s (position 3 in table 1) by 1 in only the green
component and therefore can be assigned to the same group g. The
second 3 (position 12 in table 1) differs by more than one from all
the already examined entries and therefore this position is not
adjacent to the g labelled positions, and is thus part of another
set, and is given the label r. There being no more 3s to consider,
column 4 of Hash Table 1 is next scanned for 2s. There are four,
namely positions 4, 9, 10, and 13. That in position 4 differs by
one in one dimension from that in position 3 and is therefore
assigned label g. Position 9's entry is adjacent to position 11's,
position 10's is adjacent to position 3's, and position 13's is
adjacent to position 12's. None of these positions is adjacent to
positions that belong to more than one group and therefore each can
be added without violating Jordan's Lemma. Each is given the
appropriate label. All the entries with histogram value 2 have been
accounted for, so the 1s are located in column 4. There are five, in
positions 2, 5, 6, 7, and 8 of Hash Table 1. Position 2's is
adjacent to two, those at positions 3 and 9 of Hash Table 1. Both
these have the same label, so that in position 2 can be assigned to
that label. Position 5's entry is not adjacent to that of any
already labelled position so is left unlabelled. Positions 6, 7,
and 8 have entries which are adjacent to an already labelled
position and are therefore assigned the same label. Position 5 is
again examined, and its entry is now found to be adjacent to that
of two positions, 8 and 6. Both these have the same label so the
position acquires that label, r. The result can be seen in Hash
Table 3.
[0033] If a position is found to be adjacent to two (or more)
positions which do not have the same label, it is a boundary node,
and is given the label of the set to which it is closest, in the
following sense. The geodesic distance to the first assigned
position is computed for each of the sets and the smallest found.
If the distances are equal the position is assigned to the first
found group. The assignment of boundary points is not necessary to
the procedure--it is a convenience that ensures a classification
label is attached to every data combination. It is of course
possible to leave these boundary combinations unlabelled and deal
with them in some other manner.
[0034] The appropriate label can now be assigned to each pixel by
computing the hash table key for that pixel and reading the label
from the hash table. The result is as follows:
6TABLE 6
Final `Segmented` Image
r r r r r
r r g g g
r r g g g
g g g g g
g g g g g
[0035]
7TABLE 7
Feature Space (Histogram)
  0 1 2 3 4 5
0 0 0 0 0 1 2
1 0 0 0 3 4 4
2 0 0 0 0 2 0
3 1 1 0 0 0 0
4 1 3 0 0 0 0
5 0 2 1 0 0 0
[0036] Table 7 is the histogram or feature space, as it would
normally be represented. Using the fall set idea it is easy to see
that there are just two sets in the data. When more dimensions are
used the difference in time taken to scan the two tables (13 checks
for the hash table, and 25 for the histogram in this case) becomes
increasingly great, such that only a hash table method is
feasible.
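The whole procedure of the example can be sketched end to end. This is an illustrative Python implementation, not the application's own code: it uses the 4-connected feature-space adjacency of paragraph [0030], defers colours that have no labelled neighbour for re-examination within a histogram level (as in the treatment of position 5 above), resolves boundary colours by the lowest-distance neighbour as in step 7 of claim 1, and favours clarity over the hash-table economies discussed above:

```python
from collections import Counter

# The 5 by 5 example image of Tables 1 and 2 (red and green components).
red = [4, 3, 4, 4, 5,
       5, 4, 1, 2, 1,
       5, 3, 0, 1, 1,
       1, 1, 0, 1, 2,
       0, 1, 1, 1, 1]
green = [1, 1, 1, 0, 1,
         2, 1, 5, 4, 3,
         1, 0, 4, 5, 3,
         5, 4, 5, 3, 4,
         5, 5, 4, 4, 4]

hist = Counter(zip(red, green))  # feature-space histogram heights

def neighbours(colour):
    """4-connected adjacency in feature space: differ by 1 in one component."""
    r, g = colour
    return [(r - 1, g), (r + 1, g), (r, g - 1), (r, g + 1)]

label = {}   # colour -> class label
dist = {}    # colour -> geodesic distance from the class seed point
classes = 0  # number of classes (sets) created so far

for h in sorted(set(hist.values()), reverse=True):
    pending = sorted(c for c in hist if hist[c] == h)
    while pending:
        # 1. Label every colour whose labelled neighbours all share one
        #    class (adding it cannot merge sets, so Jordan's Lemma holds).
        progressed, deferred = False, []
        for c in pending:
            labelled = [n for n in neighbours(c) if n in label]
            if len({label[n] for n in labelled}) == 1:
                n0 = min(labelled, key=lambda n: dist[n])
                label[c] = label[n0]
                dist[c] = dist[n0] if hist[n0] == h else dist[n0] + 1
                progressed = True
            else:
                deferred.append(c)
        pending = deferred
        if progressed or not pending:
            continue
        # 2. Boundary colours (neighbours in two or more classes) join the
        #    class of their neighbour with the lowest geodesic distance.
        boundary = [c for c in pending if any(n in label for n in neighbours(c))]
        if boundary:
            for c in boundary:
                labelled = [n for n in neighbours(c) if n in label]
                n0 = min(labelled, key=lambda n: dist[n])
                label[c] = label[n0]
                dist[c] = dist[n0] if hist[n0] == h else dist[n0] + 1
                pending.remove(c)
            continue
        # 3. No pending colour touches a labelled one: seed a new class.
        seed = pending.pop(0)
        label[seed], dist[seed] = classes, 0
        classes += 1

segmented = [label[(r, g)] for r, g in zip(red, green)]
```

Running this on the example data discovers exactly two classes and reproduces Table 6, with class 0 corresponding to g and class 1 to r.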
* * * * *