U.S. patent application number 09/159094, for an image segmentation apparatus and method, was published by the patent office on 2001-08-23.
Invention is credited to LI, XING, MEYERS, MICHAEL E., TSE, FRANCIS K..
Application Number | 09/159094 |
Publication Number | 20010016072 |
Kind Code | A1 |
Family ID | 22571062 |
Publication Date | August 23, 2001 |
First Named Inventor | LI, XING; et al. |
IMAGE SEGMENTATION APPARATUS AND METHOD
Abstract
The present invention provides a method and apparatus for
classifying image data. A white class detector may receive input
image data and determine a white class of a specified pixel based
on local roughness, a peak count or valley count, and a brightness.
A pattern detection device may detect a pattern of pixels across a
scan line or matrix of input data and compare the pattern with a
plurality of predetermined pixel patterns.
Inventors: |
LI, XING; (WEBSTER, NY)
; MEYERS, MICHAEL E.; (FAIRPORT, NY) ; TSE,
FRANCIS K.; (ROCHESTER, NY) |
Correspondence
Address: |
OLIFF & BERRIDGE
P.O. BOX 19928
ALEXANDRIA, VA 22320
|
Family ID: |
22571062 |
Appl. No.: |
09/159094 |
Filed: |
September 23, 1998 |
Current U.S.
Class: |
382/224 |
Current CPC
Class: |
G06T 2207/10008
20130101; G06T 7/136 20170101; G06T 2207/30176 20130101; G06T 7/11
20170101; H04N 1/40062 20130101 |
Class at
Publication: |
382/224 |
International
Class: |
G06K 009/62; G06K
009/40 |
Claims
What is claimed is:
1. An apparatus for classifying image data comprising: an input
device that receives the image data; and a white class detector
that receives the input image data and determines a white class of
a specified pixel based on local roughness of a pixel, one of a
peak count and a valley count within a neighborhood of the pixel
and a brightness of the specified pixel.
2. The apparatus of claim 1, further comprising a pattern detection
device that receives the input image data and detects a pattern of
pixels across one scan line or matrix of the input image data, the
pattern detection device compares the pattern of pixels with a
plurality of predetermined pixel patterns in order to classify the
data.
3. The apparatus of claim 2, further comprising a classification
output device that outputs a signal corresponding to the
classification of the input image data based on at least the
predetermined pixel pattern corresponding to the pattern detected
by the pattern detection device.
4. A method for classifying image data comprising the steps of:
receiving the image data; determining a brightness of a pixel under
consideration; determining a local roughness of the pixel;
obtaining one of a peak count and a valley count within a
neighborhood of the pixel under consideration; and determining a
white class of the pixel under consideration based on the
determined brightness, the determined local roughness and the
obtained peak count or the obtained valley count.
5. The method of claim 4, further comprising the step of detecting
a pattern of pixels across one scan line or matrix of the input
image data, and comparing the pattern of pixels with a plurality of
predetermined pixel patterns in order to classify the data.
6. An apparatus for classifying image data comprising: an input
device that receives the image data; a pattern detection device
that detects a pattern of pixels across one scan line or a matrix
of pixels in the image data, and that compares the pattern with a
plurality of predetermined pixel patterns; and a classification
device that outputs a classification signal based on the comparison
of the pattern detection device.
8. The apparatus of claim 7, further comprising a white class
detector that receives the input image data and determines a white
class of a specified pixel based on local roughness of a pixel, one
of a peak count and a valley count within a neighborhood of the
pixel and a brightness of the specified pixel.
Description
BACKGROUND OF THE INVENTION
[0001] 1. Field of Invention
[0002] This invention relates to an image processing method and
system. More particularly, this invention classifies input image
pixels into different classifications prior to output.
[0003] 2. Description of Related Art
[0004] In digital reproduction of documents such as in the digital
copier environment, a document is first optically scanned and
converted to a gray scale image. In the case of color reproduction,
the document may be converted to a gray scale image of several
separations, such as the R, G and B separations. In order to
produce a hard copy of the scanned and digitized image, the image
has to be further processed according to the requirements of the
marking engine. For example, if the marking engine is capable of
bi-level printing, then the image has to be rendered into a 1-bit
bit map for printing. To preserve the appearance of a gray scale
image in a binary output, often some digital halftoning process is
used in which the multi-bit input image is screened with a periodic
array. However, if the original image itself contains a halftone
screen, objectionable moire patterns may occur due to interference
between the original and the new screens. Also, while dot screen
halftoning may be good for rendering continuous tone originals, it
may degrade the quality of text and line drawings.
Often a document contains different types of images. In order to
achieve optimal image quality in document reproduction, a system
capable of automatically identifying different types of images
within a page is needed. For example, if an image part is
identified as halftone, then some kind of low-pass filtering may be
applied prior to halftone screening so the gray scale appearance
can be preserved without introducing moire patterns. For text areas,
a sharpness enhancement filter could be applied, and other rendering
techniques such as thresholding or error diffusion could be used.
[0005] Early work on image segmentation for the purpose of document
reproduction dates back to the 1970s. U.S. Pat. No. 4,194,221, the
subject matter of which is incorporated herein by reference,
discloses a method for automatic multimode reproduction. It employs
autocorrelation in halftone detection. Since then, a lot of work
has been published in the area of image segmentation. For example,
U.S. Pat. No. 4,740,843, the subject matter of which is incorporated
herein by reference, discloses a method of halftone image detection
by measuring the distance between successive gray level maxima.
U.S. Pat. No. 5,341,277, the subject matter of which
is incorporated herein by reference, discloses a dot image
discrimination method that counts density change points within an
area. One disadvantage that is common to the existing image
segmentation systems is the rigidity of the system structure.
Usually the system only provides several programmable parameters
used for thresholds in detecting video maximum/minimum, halftone
dot counting, etc. It does not provide much flexibility to support
processing/rendering optimization or to cope with changing
requirements. Existing segmentation systems have other shortcomings
related to using a fixed threshold in halftone dot detection, using
a simple average in halftone dot counting, etc., which can result in
misclassification in certain areas.
SUMMARY OF THE INVENTION
[0006] The present invention provides a method and apparatus for
classifying image data. In one embodiment, a video peak/valley
counter may count one of peaks and valleys within a window of the
input image data. A local roughness device may determine a local
roughness of the input image data. In one embodiment, the input
image data may be classified based on the count from the video
peak/valley counter and the local roughness from the local
roughness device.
[0007] In one embodiment, a neighborhood average gray value may be
determined for the input image data. A pixel under consideration
may be evaluated to determine if it is a peak or valley based on
whether its brightness is greater or less than a peak threshold
value or valley threshold value, which are based on the
neighborhood average gray value.
[0008] In one embodiment, a peak/valley detection device may
determine one of a peak and a valley count within a window of the
image data around a pixel under consideration. A neighborhood
checking device may check whether any video peaks or valleys are
located within a neighborhood of the pixel under consideration.
[0009] In one embodiment, a halftone dot count of a window may be
determined. If the determined halftone dot count is less than a
predetermined number, then a neighborhood of the pixel under
consideration is checked for any peaks and valleys. The data is
then classified based on the number of peaks and valleys if there
are any peaks or valleys within the neighborhood.
[0010] In one embodiment, pixels within a window may be evaluated
to determine respective peaks and valleys. Each of the pixels
within the window may be evaluated unless any pixel within a
neighborhood of a desired pixel has previously been classified as a
peak or valley.
[0011] In one embodiment, a processing device may determine a peak
or valley within a window of the image data. The window may include
a neighborhood of pixels about a specified pixel. The processing
device may determine the peaks and valleys within the window unless
a pixel within the neighborhood has been determined to be a peak or
valley.
[0012] Other objects, advantages and salient features of the
invention will become apparent from the following detailed
description taken in conjunction with the annexed drawings which
disclose preferred embodiments of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] The invention will be described with reference to the
following drawings in which like reference numerals refer to like
elements and wherein:
[0014] FIG. 1 shows one example of a video matrix;
[0015] FIG. 2 shows one embodiment of the present invention;
[0016] FIG. 3 shows a two-dimensional look-up table in accordance
with the present invention;
[0017] FIG. 4 shows eight patterns;
[0018] FIGS. 5A and 5B show examples of a video context window;
[0019] FIG. 6 shows one example of a neighborhood for a pixel under
consideration;
[0020] FIG. 7 shows another example of a neighborhood for a pixel
under consideration;
[0021] FIGS. 8A and 8B show video peaks and valleys in a 24x8
window; and
[0022] FIG. 9 shows a plot of threshold and video average in
accordance with the present invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0023] In accordance with one embodiment of the present invention,
segmentation may classify an image, on a per pixel basis, into one
of several possible classifications. For example, input video
pixels may be classified, on a pixel-by-pixel basis, into one of 32
different image types. This classification may be known as tags or
effect pointers. The tags may be used by downstream image
processing to specify different filtering, rendering and other
operations based on the classification.
[0024] The present invention preferably accomplishes this by
looking at a 5x5 (fast scan by slow scan) pixel context and
determining various characteristics such as the presence and
magnitudes of edges (horizontal or vertical), the average value of
the video, the minimum and maximum values of the video, etc. Many
of the functions in the segmentation process are easily defined in
terms of the 5x5 video matrix. This matrix is shown in FIG. 1 and
can be viewed as a "window" that slides across and down the input
image. The center pixel, V22, is the pixel being
processed/classified. The pixels may be generally referenced as
Vij, where i is the slow scan index and j is the fast scan
index. A video matrix of 5x5 is used herein as a preferred
example; however, video matrices other than 5x5 are also
within the scope of this invention.
[0025] The shift array may develop the video context matrix as the
input image moves through the processing architecture. The outputs
of this module may be fed into the various modules that need all or
some of this context. The data is preferably stored in a buffer or
buffer-like device prior to and during preprocessing. Accordingly,
as soon as the value of pixel V44 is available, the
classification of pixel V22 can be started. A shift array may
be used that brings in the current scan line and the four previous
scan lines that have been stored in the scan line buffers.
[0026] FIG. 2 shows a possible architecture for the image
segmentation apparatus and method according to the present
invention. One skilled in the art would understand that processing
may occur on a special purpose computer, a programmed
microprocessor or microcontroller and peripheral integrated circuit
elements, an ASIC or other integrated circuit, a hardwired
electronic or logic circuit such as a discrete element circuit, a
programmable logic device such as a PLD, PLA, FPGA or PAL, or the
like. Furthermore, specific algorithms may be accomplished using
software in combination with specific hardware.
[0027] As shown in FIG. 2, two major features that may be extracted
for segmentation in accordance with an embodiment of the present
invention are video peak/valley count within a window containing
the pixel being classified and the local roughness. Local roughness
may represent the degree of gray level discontinuity computed as a
combination of some gradient operators. One example of local
roughness is the difference between the maximum and minimum of nine
3x3 window sums within a 5x5 video context. Other
methods of determining the local roughness are also within the
scope of the present invention. On the other hand, a pixel may be
considered as a video peak or valley if its gray level is the
highest or lowest in the neighborhood and the gray level difference
between the pixel and the neighborhood average is greater than a
certain threshold. Other methods of determining video peaks and/or
valleys are also within the scope of the present invention.
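The roughness measure described above (the maximum minus the minimum of the nine 3x3 window sums inside the 5x5 context) can be sketched in C. This is an illustrative sketch of that one example, not the patented implementation; the fill_step helper and the step-edge values below are test scaffolding introduced here.

```c
#include <limits.h>

/* Test helper: columns j < split get `left`, columns j >= split get `right`. */
void fill_step(int v[5][5], int left, int right, int split)
{
    for (int i = 0; i < 5; i++)
        for (int j = 0; j < 5; j++)
            v[i][j] = (j >= split) ? right : left;
}

/* Local roughness: difference between the maximum and the minimum of
 * the nine 3x3 window sums that fit inside the 5x5 video context. */
int local_roughness(int v[5][5])
{
    int max_sum = INT_MIN, min_sum = INT_MAX;
    for (int r = 0; r <= 2; r++) {        /* top-left corner of each 3x3 */
        for (int c = 0; c <= 2; c++) {
            int sum = 0;
            for (int i = 0; i < 3; i++)
                for (int j = 0; j < 3; j++)
                    sum += v[r + i][c + j];
            if (sum > max_sum) max_sum = sum;
            if (sum < min_sum) min_sum = sum;
        }
    }
    return max_sum - min_sum;
}
```

A flat area yields roughness 0, while a vertical gray-level step between columns 1 and 2 yields a large value, which is what makes the measure useful for edge and text detection.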
[0028] Several lines of peak and valley patterns may be recorded in
scan line buffers for computing peak/valley count within a defined
window. In accordance with one embodiment, peak/valley count and
local roughness may be used as indices to form a two-dimensional
look-up table (hereafter also called a classification table) as a
basis to classify data. FIG. 3 shows one example of a
two-dimensional look-up table that uses five roughness levels and
twelve peak/valley count levels. This results in sixty
classification table entries (i.e., 5.times.12=60). Depending on a
location within the look- up table, the video data may be mapped to
certain classes such as low frequency halftone, high frequency
halftone, smooth continuous tone, rough continuous tone, edge, text
on halftone, etc. Depending on the class, the input data may be
processed differently.
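A minimal sketch of such a two-dimensional classification table in C follows. The table contents here just encode each entry's position and the index clamping is an assumption made for illustration; a real table would hold programmable class codes such as "low frequency halftone" or "edge".

```c
#define ROUGH_LEVELS 5
#define COUNT_LEVELS 12

/* Illustrative table: entry = roughness level * 12 + count level. */
void fill_demo_table(unsigned char t[ROUGH_LEVELS][COUNT_LEVELS])
{
    for (int r = 0; r < ROUGH_LEVELS; r++)
        for (int c = 0; c < COUNT_LEVELS; c++)
            t[r][c] = (unsigned char)(r * COUNT_LEVELS + c);
}

/* Look up the class for a pixel from its quantized roughness level and
 * peak/valley count level, clamping out-of-range indices. */
int classify(unsigned char t[ROUGH_LEVELS][COUNT_LEVELS],
             int rough_level, int count_level)
{
    if (rough_level < 0) rough_level = 0;
    if (rough_level > ROUGH_LEVELS - 1) rough_level = ROUGH_LEVELS - 1;
    if (count_level < 0) count_level = 0;
    if (count_level > COUNT_LEVELS - 1) count_level = COUNT_LEVELS - 1;
    return t[rough_level][count_level];
}
```

Because the table assignment is programmable, changing the rendering behavior for a class is a matter of rewriting table entries rather than changing logic.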
[0029] The two-dimensional look-up table allows for flexibility in
processing and rendering decision making, which in turn makes it
possible to use a smaller video context and intermediate results
buffer in the segmentation, and at the same time improve the image
quality in areas such as stochastic screens, line screens, etc.
[0030] The look-up table (i.e., classification table) may be
complemented with some special classes. One of them is the "edge
class", which tries to identify line art and kanji areas that
could be missed by the look-up table. Another special class is the
"white class", which makes use of the absolute gray level information
in addition to peak/valley count and roughness. The "default class"
shown in FIG. 2 may be used for the borders of an image. The
classification look-up table output may be multiplexed with the
special classes to produce the final classification of a pixel
(i.e., class output). The classification table assignment may be
programmable, which allows for more flexibility in rendering
adjustment.
[0031] As described above with respect to one embodiment, major
features that may be extracted for segmentation include local
roughness and video peak/valley count. The local roughness may be
the difference between the maximum and minimum of the nine 3x3
window sums within a 5x5 video context, which extracts edge
and text information effectively in most cases. However, the
roughness detector is not sensitive to some line art and kanji
text patterns and is prone to false detection of video
peaks/valleys. To limit these problems, in one embodiment pattern
detection may be introduced to complement other parts of the
segmentation algorithm. This technique looks at the pattern of
pixels across a scan line (or matrix) in order to classify the
data. FIG. 4 shows eight examples of patterns which may be used in
the segmentation algorithm of the present invention. In this
figure, the up and down arrows denote transitions required
across one line of the 5x5 video context, while the remaining
symbol indicates that a certain threshold has to be met. These
patterns are best defined according to certain parameters that
must be met to classify the pixels as corresponding to a pattern.
One embodiment of these parameters is described below. One skilled in the art
would understand how these rule-based parameters are implemented in
the algorithm of the present invention.
[0032] Vi,j represents the pixel at the ith row and the jth
column, with the row being in the fast scan direction and the column
being in the slow scan direction.
[0033] Next, sumSs[j] is defined as the sum of the five pixels in
the jth column, i.e., sumSs[j] = Vi,j summed over i = 0 to 4, for 0 <= j <= 4.
[0034] Another variable array is updownFs[j], 0<=j<=3. This
array is used to signal transitions. The assignment is determined as
follows: if the following conditions are met
[0035] i) Vi,j<=Vi,j+1 for all i
[0036] ii) sumSs[j]<sumSs[j+1]
[0037] then updownFs[j] is assigned the value 1; on the other hand,
if the following conditions are met
[0038] i) Vi,j>=Vi,j+1 for all i
[0039] ii) sumSs[j]>sumSs[j+1]
[0040] then updownFs[j] is assigned the value 2. Otherwise,
updownFs[j] will be neither 1 nor 2.
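The transition-flag assignment above can be sketched in C as follows; set_col is a test helper introduced here and the function names are illustrative, not from the patent.

```c
/* Test helper: set every pixel in column j of the 5x5 context to val. */
void set_col(int v[5][5], int j, int val)
{
    for (int i = 0; i < 5; i++) v[i][j] = val;
}

/* sumSs[j]: sum of the five pixels in column j. */
int sum_ss(int v[5][5], int j)
{
    int s = 0;
    for (int i = 0; i < 5; i++) s += v[i][j];
    return s;
}

/* updownFs[j] for 0 <= j <= 3: returns 1 if every pixel is
 * non-decreasing from column j to j+1 and the column sum strictly
 * rises; 2 if every pixel is non-increasing and the column sum
 * strictly falls; otherwise 0. */
int updown_fs(int v[5][5], int j)
{
    int up = 1, down = 1;
    for (int i = 0; i < 5; i++) {
        if (v[i][j] > v[i][j + 1]) up = 0;
        if (v[i][j] < v[i][j + 1]) down = 0;
    }
    if (up && sum_ss(v, j) < sum_ss(v, j + 1)) return 1;
    if (down && sum_ss(v, j) > sum_ss(v, j + 1)) return 2;
    return 0;
}
```

Note that a pair of identical columns returns 0: both per-pixel conditions hold, but neither column-sum inequality is strict.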
[0041] Patterns 1-8 shown in FIG. 4 are then classified as
corresponding to a specific pattern based on several variables. In
a preferred embodiment, this pattern classification is as
follows:
[0042] Pattern 1.
[0043] i) updownFs[0]=2
[0044] ii) updownFs[2]=1
[0045] iii) min(sumSs[0],sumSs[3])-min(sumSs[1],sumSs[2])
>SUMDIF1
[0046] Pattern 2.
[0047] i) updownFs[1]=2
[0048] ii) updownFs[3]=1
[0049] iii) min(sumSs[1],sumSs[4])- min(sumSs[2],sumSs[3])
>SUMDIF1
[0050] Pattern 3.
[0051] i) updownFs[0]=1
[0052] ii) updownFs[1]=1
[0053] iii) updownFs[2]=2
[0054] iv) updownFs[3]=2
[0055] v) sumSs[2]-sumSs[0]>SUMDIF2
[0056] vi) sumSs[2]-sumSs[4]>SUMDIF2
[0057] Pattern 4.
[0058] i) sumSs[3]-sumSs[2]>SUMDIF3
[0059] ii) sumSs[2]-sumSs[1]>SUMDIF3
[0060] Pattern 5.
[0061] i) updownFs[1]=1
[0062] ii) updownFs[2]=2
[0063] iii)
max(sumSs[1],sumSs[2])-max(sumSs[0],sumSs[3])>SUMDIF4
[0064] Pattern 6.
[0065] i) updownFs[1]=1
[0066] ii) updownFs[3]=2
[0067] iii)
max(sumSs[2],sumSs[3])-max(sumSs[1],sumSs[4])>SUMDIF4
[0068] Pattern 7.
[0069] i) updownFs[0]=2
[0070] ii) updownFs[1]=2
[0071] iii) updownFs[2]=1
[0072] iv) updownFs[3]=1
[0073] v) sumSs[0]-sumSs[2]>SUMDIF5
[0074] vi) sumSs[4]-sumSs[3]>SUMDIF5
[0075] Pattern 8.
[0076] i) sumSs[1]-sumSs[2]>SUMDIF6
[0077] ii) sumSs[2]-sumSs[3]>SUMDIF6
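As one example, the Pattern 3 conditions can be expressed directly in C. The helpers mirror the sumSs and updownFs definitions above; the SUMDIF2 values used in the usage below are illustrative assumptions, not parameter values from the patent.

```c
/* Test helper: set every pixel in column j of the 5x5 context to val. */
void set_col(int v[5][5], int j, int val)
{
    for (int i = 0; i < 5; i++) v[i][j] = val;
}

int sum_ss(int v[5][5], int j)
{
    int s = 0;
    for (int i = 0; i < 5; i++) s += v[i][j];
    return s;
}

int updown_fs(int v[5][5], int j)
{
    int up = 1, down = 1;
    for (int i = 0; i < 5; i++) {
        if (v[i][j] > v[i][j + 1]) up = 0;
        if (v[i][j] < v[i][j + 1]) down = 0;
    }
    if (up && sum_ss(v, j) < sum_ss(v, j + 1)) return 1;
    if (down && sum_ss(v, j) > sum_ss(v, j + 1)) return 2;
    return 0;
}

/* Pattern 3: the video rises across the two left column pairs, falls
 * across the two right pairs, and the center column sum exceeds both
 * end column sums by more than SUMDIF2 (a bright ridge in the middle). */
int is_pattern3(int v[5][5], int sumdif2)
{
    return updown_fs(v, 0) == 1 && updown_fs(v, 1) == 1
        && updown_fs(v, 2) == 2 && updown_fs(v, 3) == 2
        && sum_ss(v, 2) - sum_ss(v, 0) > sumdif2
        && sum_ss(v, 2) - sum_ss(v, 4) > sumdif2;
}
```

A context whose columns ramp 0, 100, 200, 100, 0 satisfies Pattern 3 for moderate SUMDIF2, much like the example of FIG. 5B.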
[0078] FIGS. 5A and 5B show examples of a 5x5 video context
window with the pixel values shown. The numbers (200, 50, etc.) are
gray levels of the pixels. By using such a pattern detector, text
areas can be detected that would not be detected by a
roughness detector. As can be seen, FIG. 5A corresponds to Pattern
1 shown in FIG. 4 while FIG. 5B shows an example of a video context
window corresponding to Pattern 3.
[0079] The patterns described above identify certain gray level
transitions in the fast scan direction. Patterns in the slow scan
direction may also be detected in accordance with the present
invention.
[0080] In processing and rendering of background areas, a white
class may be desirable that makes use of the absolute gray value of
the pixel as well. In accordance with one embodiment of the present
invention, three features may be used in detecting the white class,
namely, the brightness, the roughness and the halftone dot count.
Prior segmentation schemes rely only on peak/valley count and
background threshold to determine white class. However, including
both roughness and halftone dot count to detect a white class adds
flexibility to the algorithm without a significant cost
increase.
[0081] In order to qualify as white, first the gray level of the
pixel under classification should be greater than a predetermined
value. The predetermined value may be a default number or may be
obtained by computing the lead edge histogram of the image and
detecting the peak at the light end of the histogram. The histogram
of the whole page may also be used if it is available. Second, the
local roughness of the pixel should also be below a certain
threshold. As discussed above, the local roughness may be the
difference between the maximum and minimum of the nine 3x3
window sums within a 5x5 video context. Third, the halftone
dot count within a defined neighborhood of the pixel should be
small enough.
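The three white-class conditions can be combined into a short C predicate. This is a sketch under the assumption that all three thresholds are programmable parameters (as the text suggests); the values used in the usage below are illustrative only.

```c
/* A pixel qualifies as "white" only if (1) its gray level exceeds the
 * background threshold, (2) its local roughness is below the smoothness
 * threshold, and (3) the halftone dot count within its neighborhood is
 * small enough. */
int is_white(int gray, int roughness, int dot_count,
             int gray_thresh, int rough_thresh, int dot_thresh)
{
    return gray > gray_thresh
        && roughness < rough_thresh
        && dot_count < dot_thresh;
}
```

For example, with an assumed background threshold of 220, a roughness limit of 40 and a dot limit of 2, a bright smooth pixel passes while a bright-but-rough pixel or a bright pixel in a dotted area fails.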
[0082] As is well known to one skilled in the art, low-frequency
halftone images are often processed and rendered differently than
other types of pictorials such as high-frequency halftones,
continuous tones, etc. For example, high-frequency images may be
converted to continuous tone images using a low-pass filter and
then re-screened for printing. Low-frequency halftones, on the
other hand, are often rendered with error diffusion. There are many
circumstances in which a non-low-frequency area could be classified
as a low-frequency halftone if the video peak/valley count is the
only criterion. Stochastic screen originals and line screen
originals are some typical examples. Also, some part of
high-frequency halftones may be misclassified as low-frequency
image areas due to the missing peaks/valleys in some local areas.
False detection of low-frequency halftone may result in severe
artifacts. One embodiment of the present invention uses a
neighborhood checking mechanism to reduce the false detection of
low-frequency halftones.
[0083] As discussed above, with respect to one embodiment, the
halftone dot count within a window (e.g., 24 columns by 8 lines)
and the local roughness may be used as indices to form the look-up
table. The entries of the look-up table are then mapped to a certain
number of classes. Neighborhood checking may be performed when the
halftone dot count within the window is smaller than a programmable
parameter. The algorithm checks a defined neighborhood of any
peak/valley within the window. FIG. 6 shows one example, in which
the shaded pixels form the neighborhood of the black pixel. If there
are peaks/valleys in the neighborhood of a peak/valley, then
instead of using the original halftone dot count, a special index
may be given to the pixel under detection. This neighborhood
checking ensures that unless the video peaks/valleys within a
window are some distance apart, the pixel will not be considered as
a low-frequency halftone. That is, if there are closely located
peaks/valleys within a window, then the pixel is not part of a
low-frequency halftone but rather may be a high-frequency
halftone.
[0084] FIG. 7 shows a similar embodiment in which the black pixel
is under detection and the shaded pixels are neighboring pixels of
interest. Using appropriate logic such as hardware, the pixel under
detection will not be considered as a peak or a valley if any of
the neighboring pixels of interest (i.e., the shaded pixels) are
peaks or valleys. In other words, because the neighboring pixels of
interest have all been examined by the time the pixel under
detection is reached, the logic is easy to implement in hardware.
The logic guarantees that if a pixel is detected as a peak or
valley, then none of its eight immediate neighbors will be peaks or
valleys.
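A sketch of this neighbor-suppression logic in C, assuming a simple row-major map of already-made peak/valley decisions (1 = marked); the map layout and function name are illustrative assumptions.

```c
/* Reject the candidate at (r, c) if any of its eight immediate
 * neighbors is marked as a peak or valley in the mark map. Unvisited
 * pixels are 0, so in scan order this guarantees that no two adjacent
 * pixels are both marked. */
int suppressed(const unsigned char *marks, int width, int height,
               int r, int c)
{
    for (int dr = -1; dr <= 1; dr++) {
        for (int dc = -1; dc <= 1; dc++) {
            if (dr == 0 && dc == 0) continue;
            int rr = r + dr, cc = c + dc;
            if (rr < 0 || rr >= height || cc < 0 || cc >= width) continue;
            if (marks[rr * width + cc]) return 1;
        }
    }
    return 0;
}
```

In a 4-wide, 3-high map with only position (1, 1) marked, the candidate at (0, 0) is suppressed while the candidate at (1, 3) is not.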
[0085] In one embodiment of the present invention, a triple window
may be used to determine peak/valley counts and the halftone dot
count average associated with the pixel under classification. For
example, the peak/valley information in a window of 24 columns by 8
lines around a pixel is examined. FIG. 8A shows the 24x8
window divided into three smaller (8x8) windows. In each of
the 8x8 windows, the greater of the peak and valley counts is
chosen to represent the halftone dot count of that window. This
improves the accuracy of counting in the area with major gray level
changes.
[0086] Given the halftone dot counts of the three 8.times.8
windows, a set of rules may be used to determine the final halftone
dot count associated with the pixel under classification. The
halftone dot counts of the three windows are considered dotLeft,
dotMid and dotRight respectively, and dotCount is the final
halftone dot count. The rules can be described by the following
C-like statement,
    if (dotLeft > dotMid && dotMid < dotRight) {
        dotCount = min(dotLeft, dotRight) * 2;
    } else if (dotLeft < dotMid && dotMid > dotRight) {
        if (min(dotLeft, dotRight) == LOWCOUNT) {
            dotCount = dotMid + dotRight;
        } else {
            dotCount = max(dotMid, dotRight) * 2;
        }
    } else {
        if (dotMid >= LOWCOUNT) {
            dotCount = dotMid + max(dotLeft, dotRight);
        } else {
            dotCount = dotMid + min(dotLeft, dotRight);
        }
    }
[0087] where LOWCOUNT is a programmable parameter.
[0088] Using the above rules instead of simple averaging improves
the halftone dot count in areas of peak/valley misdetection or
where peak/valley density transitions occur.
[0089] More specifically, FIG. 8A shows an example of a video peak
pattern in a 24x8 window and FIG. 8B shows an example of a
video valley pattern in a 24x8 window. The dotLeft, dotMid
and dotRight counts are 4, 4 and 5 respectively. If LOWCOUNT is set
to 3, then dotCount, which reflects two thirds of the halftone dot
count within the 24x8 window, is 9 (i.e., dotMid +
max(dotLeft, dotRight) = 4 + 5). This type of rule-based
calculation generally works better than simple averaging in
transitional areas.
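The rule set can be packaged as a C function, a direct transcription of the C-like statement above with illustrative helper names; the first assertion in the usage below reproduces the worked example of this paragraph.

```c
static int min2(int a, int b) { return a < b ? a : b; }
static int max2(int a, int b) { return a > b ? a : b; }

/* Combine the three 8x8 sub-window dot counts into a single dotCount
 * for the pixel under classification. LOWCOUNT is a programmable
 * parameter. */
int dot_count(int dotLeft, int dotMid, int dotRight, int lowcount)
{
    if (dotLeft > dotMid && dotMid < dotRight) {
        /* dip in the middle window: trust the outer windows */
        return min2(dotLeft, dotRight) * 2;
    } else if (dotLeft < dotMid && dotMid > dotRight) {
        if (min2(dotLeft, dotRight) == lowcount)
            return dotMid + dotRight;
        else
            /* bump in the middle window */
            return max2(dotMid, dotRight) * 2;
    } else {
        if (dotMid >= lowcount)
            return dotMid + max2(dotLeft, dotRight);
        else
            return dotMid + min2(dotLeft, dotRight);
    }
}
```

With dotLeft = 4, dotMid = 4, dotRight = 5 and LOWCOUNT = 3, the monotonic (else) branch applies and the result is 4 + 5 = 9, matching the FIG. 8A/8B example.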
[0090] The gray difference between a peak or valley and its
neighbors is greatest in mid-tone areas as compared to highlight or
shadow areas. Thus, one embodiment of the present invention links
the threshold for peak/valley detection to the neighborhood average
gray value so as to reduce misclassification. For example, a pixel
may be considered as a video peak/valley if its gray level is the
highest/lowest in the neighborhood and also, the gray level
difference between the pixel and the neighborhood average is
greater than a certain threshold. The qualifying conditions of
peak/valley and the definition of neighborhood may vary. If the
threshold for peak/valley detection is set too high, then some
halftone dots in the highlight or shadow area may be missed. On the
other hand, if the threshold is set too low, then some potential
noise or non-halftone video gray level variation could be falsely
identified as halftone dots. To reduce misclassification, the
threshold for peak/valley detection may be tied to the neighborhood
average gray value.
[0091] The threshold-video average correlation may be established
through statistical analysis. The implementation may be a look-up
table or some simple formula. For example, the
Threshold-Video_Average plot shown in FIG. 9 may be represented by
the following equation:

Threshold = C1 - Video_Average/16, for Video_Average > MT

Threshold = C2 + Video_Average^2/2048, for Video_Average <= MT

[0092] with C1=21, C2=5 and MT=128.
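The piecewise relation with the stated constants can be written as a small C function; the use of integer arithmetic here is an implementation assumption.

```c
/* Threshold/video-average relation of FIG. 9 with the constants given
 * in the text: C1 = 21, C2 = 5, MT = 128. The threshold peaks in the
 * mid-tones and falls off toward the highlight and shadow ends. */
int pv_threshold(int video_average)
{
    const int C1 = 21, C2 = 5, MT = 128;
    if (video_average > MT)
        return C1 - video_average / 16;
    return C2 + (video_average * video_average) / 2048;
}
```

With these constants the two branches meet at the mid-tone point (both give 13 around a video average of 128), so the mapping has no jump at MT.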
[0093] A 16-entry table may be used to achieve similar results.
[0094] An implementation of this approach has been tested with the
video average calculated as the average of the eight immediate
neighbors of the pixel under detection for peak/valley.
[0095] In at least one embodiment of the present invention, video
averaging is used to reduce the possibility of misclassifying peaks
and valleys.
[0096] Halftone dots, when present in the form of video peaks,
generally occur in areas with relatively low average gray value.
The converse is true for halftone dots in the form of video valleys.
By limiting peak detection to areas where the video average is
below a certain threshold and limiting valley detection to areas
where the video average is above a certain threshold, some false
detection can be prevented. The threshold for peak detection may be
set to be greater than the threshold for valley detection by some
margin, making sure that halftone dots will not be missed. The
video average could be the average gray level of a 5x5 window.
[0097] While the invention has been described in relation to
preferred embodiments, many modifications and variations are
apparent from the description of the invention, and all such
modifications and variations are intended to be within the scope of
the present invention as defined in the appended claims.
* * * * *