Segmentation Of A Multi-column Document Lyubarskiy; Dmitry Arturovich [ABBYY Development LLC]


Patent Application Summary

U.S. patent application number 14/508276 was published by the patent office on 2015-07-16 for segmentation of a multi-column document. The applicant listed for this patent is ABBYY Development LLC. Invention is credited to Dmitry Arturovich Lyubarskiy.

Application Number	20150199821 14/508276
Document ID	/
Family ID	53521530
Publication Date	2015-07-16

United States Patent Application	20150199821
Kind Code	A1
Lyubarskiy; Dmitry Arturovich	July 16, 2015

SEGMENTATION OF A MULTI-COLUMN DOCUMENT

Abstract

A method for detecting a logical structure of an image of a document. The method includes identifying objects in the image of the document, constructing a graph dividing the identified objects in the image of the document, detecting an optimal path through the graph to locate regions in the image of the document, and dividing the image of the document into the regions based at least in part on the detected optimal path.

Inventors:

Lyubarskiy; Dmitry Arturovich; (Moscow, RU)

Applicant:

Name	City	State	Country	Type
ABBYY Development LLC	Moscow		RU

Family ID:

53521530

Appl. No.:

14/508276

Filed:

October 7, 2014

Current U.S. Class:	382/173
Current CPC Class:	G06K 9/00463 20130101; G06F 40/284 20200101; G06F 40/268 20200101; G06F 40/211 20200101; G06K 17/00 20130101
International Class:	G06T 7/00 20060101 G06T007/00; G06K 9/00 20060101 G06K009/00

Foreign Application Data

Date	Code	Application Number
Jan 15, 2014	RU	2014101124

Claims

1. A method for detecting a logical structure of an image of a document, the method comprising: identifying objects in the image of the document; constructing a division graph of the identified objects in the image of the document; detecting at least one optimal path through the division graph to locate regions in the image of the document; and dividing the image of the document into the regions based at least in part on the detected optimal path.

2. The method of claim 1, wherein constructing the division graph is performed based on constructing an area diagram for the identified objects.

3. The method of claim 1, further comprising constructing an adjacency graph based on the division graph, wherein constructing the adjacency graph comprises: assigning graph vertices corresponding to the identified objects; identifying pairs of adjacent objects corresponding to pairs of objects divided by at least one edge in the division graph; and joining each pair of adjacent objects by an edge of the adjacency graph.

4. The method of claim 2, wherein constructing the division graph further comprises: identifying graph vertices and graph edges on the area diagram; and creating a minimal graph relative to the identified graph vertices and graph edges.

5. The method of claim 1, wherein detecting the optimal path further comprises assigning weight values to edges of the division graph, wherein the weight values are based upon analyzing mutual characteristics of at least one pair of identified objects.

6. The method of claim 5, further comprising summing the assigned weight values to determine a total weight value for paths in the division graph, wherein the at least one optimal path is a path with the best total weight value.

7. The method of claim 1, wherein detecting the at least one optimal path comprises: constructing at least one sub-graph of the division graph; identifying connected components in the sub-graph; constructing at least one path relative to the identified connected components; and filtering the at least one path.

8. The method of claim 1, wherein detecting the optimal path further comprises correcting the at least one optimal path by adding one or more patches.

9. The method of claim 3, wherein dividing the image of the document into regions further comprises: removing at least one graph edge from the adjacency graph corresponding to the at least one optimal path; identifying connected components in the adjacency graph; constructing regions relative to the connected components in the adjacency graph; and dividing the image of the document in the constructed regions.

10. A system to detect a logical structure of an image of a document, the system comprising: a memory configured to store processor-executable instructions; and a processor operatively coupled to the memory, wherein the processor is configured to: identify objects in the image of the document; construct a division graph of the identified objects in the image of the document; detect at least one optimal path through the division graph to locate regions in the image of the document; and divide the image of the document into the regions based at least in part on the detected optimal path.

11. The system of claim 10, wherein the processor is further configured to construct an area diagram for the identified objects.

12. The system of claim 10, wherein the processor is further configured to: construct an adjacency graph based on the division graph, wherein, to construct the adjacency graph, the processor is further configured to: assign graph vertices corresponding to the identified objects; identify pairs of adjacent objects corresponding to pairs of objects divided by at least one edge in the division graph; and join each pair of the adjacent objects by an edge of the adjacency graph.

13. The system of claim 11, wherein the processor is further configured to identify graph vertices and graph edges on the area diagram.

14. The system of claim 10, wherein the processor is further configured to assign weight values to edges of the division graph, wherein the weight values are based upon analyzing mutual characteristics of at least one pair of identified objects.

15. The system of claim 14, wherein the processor is further configured to sum the assigned weight values to determine a total weight value for paths in the division graph, wherein the at least one optimal path is a path with the best total weight value.

16. The system of claim 10, wherein the processor is further configured to: construct at least one sub-graph of the division graph; identify connected components in the sub-graph; construct at least one path relative to the identified connected components; and filter the at least one path.

17. The system of claim 10, wherein the processor is further configured to correct the at least one optimal path by adding one or more patches.

18. The system of claim 12, wherein the processor is further configured to: remove at least one graph edge from the adjacency graph corresponding to the at least one optimal path; identify connected components in the adjacency graph; construct regions relative to the connected components in the adjacency graph; and divide the image of the document in the constructed regions.

19. A non-transitory computer-readable storage medium having computer-readable instructions stored therein, the instructions being executable by a processor of a computing system, wherein the instructions comprise: instructions to identify objects in the image of the document; instructions to construct a division graph of the identified objects in the image of the document; instructions to detect an optimal path through the division graph to locate regions in the image of the document; and instructions to divide the image of the document into the regions based at least in part on the detected optimal path.

20. The non-transitory computer-readable storage medium of claim 19, further comprising: instructions to construct an area diagram for the identified objects.

21. The non-transitory computer-readable storage medium of claim 19, further comprising: instructions to construct an adjacency graph based on the division graph, wherein constructing the adjacency graph further comprises: instructions to assign graph vertices corresponding to the identified objects; instructions to identify pairs of adjacent objects corresponding to pairs of objects divided by at least one edge in the division graph; and instructions to join each pair of adjacent objects by an edge of the adjacency graph.

22. The non-transitory computer-readable storage medium of claim 20, further comprising: instructions to identify graph vertices and graph edges on the area diagram; and instructions to create a minimal graph based on the identified graph vertices and graph edges.

23. The non-transitory computer-readable storage medium of claim 19, further comprising instructions to assign weight values to edges of the division graph, wherein the weight values are based upon analyzing mutual characteristics of at least one pair of identified objects.

24. The non-transitory computer-readable storage medium of claim 23, further comprising instructions to sum the assigned weight values to determine a total weight value for paths in the division graph, wherein the at least one optimal path is a path with the best total weight value.

25. The non-transitory computer-readable storage medium of claim 19, further comprising: instructions to construct at least one sub-graph of the division graph; instructions to identify connected components in the sub-graph; instructions to construct at least one path relative to the identified connected components; and instructions to filter the at least one path.

26. The non-transitory computer-readable storage medium of claim 19, further comprising: instructions to correct the at least one optimal path by adding one or more patches.

27. The non-transitory computer-readable storage medium of claim 21, further comprising: instructions to remove at least one graph edge from the adjacency graph corresponding to the at least one optimal path; instructions to identify connected components in the adjacency graph; instructions to construct regions relative to the connected components in the adjacency graph; and instructions to divide the image of the document in the constructed regions.

Description

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of priority under 35 USC 119 to Russian Patent Application No. 2014101125, filed Jan. 15, 2014; the disclosure of which is incorporated herein by reference.

FIELD OF TECHNOLOGY

[0002] The subject matter of the present application relates to a method and system in the field of image processing, specifically the analysis of documents with a complex spatial layout.

BACKGROUND

[0003] Recognition of the internal content of photographed or scanned documents remains a challenge. The basic principle of existing segmentation methods is to search for a document's basic (logical) structure, such as column regions, incuts, and headers, and then to analyze the identified regions internally (for example, by identifying text lines in a column region). Known segmentation methods are often able to identify only rectangular blocks in a document (e.g., rectangular columns of text and incuts with primitive shapes). Such methods require the gaps between columns and incuts to be sufficiently large to correctly determine whether a given fragment of the document belongs to a column or to an incut. Misclassification occurs when there are isolated objects (for example, sparse table cells) inside columns or incuts and when regions are not rectangular.

[0004] Existing methods of document analysis are unable to accurately and reliably perform segmentation of document images. Therefore, there is a need to develop a new method to analyze the internal content of documents.

SUMMARY

[0005] In one aspect, the present disclosure is related to a method for detecting a basic (logical) structure of an image of a document. The method includes identifying objects in the image of the document, constructing an area diagram for the identified objects, constructing a division graph based on the area diagram, and constructing an adjacency graph based on the division graph. In some implementations, constructing the adjacency graph includes assigning graph vertices corresponding to the identified objects, identifying pairs of adjacent objects corresponding to pairs of objects divided by at least one edge in the division graph, and joining each pair of adjacent objects by an edge of the adjacency graph. The method further includes identifying graph vertices and graph edges on the area diagram and creating a minimal graph based on the identified graph vertices and graph edges. The method further includes detecting an optimal path through the division graph to locate regions in the image of the document, and dividing the image of the document into the regions based at least in part on the detected optimal path.

[0006] In some implementations, the method further includes assigning weight and/or penalty values to edges of the division graph. In an implementation, the weight values are based upon a comparison of at least two identified objects. The method further includes summing the assigned weight values to determine a total weight value used to detect the optimal path. In an illustrative embodiment, the optimal path is the path with the best total weight value. The method further includes constructing a sub-graph of the division graph to restrict the search area for the desired path, identifying connected components in the sub-graph, constructing at least one path relative to the identified connected components, and filtering the paths. In some implementations, the method further includes correcting the constructed path by adding one or more patches. The method further includes removing at least one graph edge from the adjacency graph corresponding to the optimal path, identifying connected components in the adjacency graph, constructing regions relative to the connected components in the adjacency graph, and dividing the image of the document into the constructed regions.
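The final step summarized in the preceding paragraph (removing adjacency-graph edges that correspond to the optimal path, then taking connected components as regions) can be illustrated with a minimal Python sketch. It assumes that objects are hashable ids, that adjacency edges are given as frozenset pairs, and that the optimal path is supplied as the list of object pairs separated by its division-graph edges; the function name split_into_regions and this input format are hypothetical, not taken from the disclosure.

```python
from collections import defaultdict


def split_into_regions(objects, adjacency_edges, path_edges):
    """Group objects into regions after cutting the adjacency graph along a path.

    objects         -- iterable of object ids (adjacency-graph vertices)
    adjacency_edges -- iterable of frozenset({a, b}) pairs of adjacent objects
    path_edges      -- (a, b) pairs of objects separated by the division-graph
                       edges on the optimal path (hypothetical input format)
    """
    cut = {frozenset(pair) for pair in path_edges}
    neighbours = defaultdict(set)
    for edge in adjacency_edges:
        if edge in cut:
            continue  # remove adjacency edges crossed by the optimal path
        a, b = tuple(edge)
        neighbours[a].add(b)
        neighbours[b].add(a)

    regions, seen = [], set()
    for start in objects:
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        regions.append(component)
    return regions
```

For two columns whose lines L1, L2 and R1, R2 are adjacent both vertically and across the gutter, cutting along the gutter path (which separates L1 from R1 and L2 from R2) leaves two regions, {L1, L2} and {R1, R2}.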

[0007] In another aspect, the present disclosure is related to a system to detect a basic structure of a document. The system includes a memory configured to store processor-executable instructions and a processor operatively coupled to the memory. In some implementations, the processor is configured to identify objects in the image of the document, construct an area diagram for the identified objects, construct a division graph based on the area diagram, and construct an adjacency graph based on the division graph. In some implementations, the processor is further configured to assign graph vertices corresponding to the identified objects, identify pairs of adjacent objects corresponding to pairs of objects divided by at least one edge in the division graph, and join each pair of adjacent objects by an edge of the adjacency graph. The processor is further configured to identify graph vertices and graph edges on the area diagram, and create a minimal graph based on the identified graph vertices and graph edges. The processor is further configured to detect an optimal path through the division graph to locate regions in the image of the document, and to divide the image of the document into the regions based at least in part on the detected optimal path.

[0008] In some implementations, the processor is further configured to assign weight or penalty values to edges of the division graph. In an implementation, the weight values are based upon a comparison of at least one pair of identified objects. The processor is further configured to sum the assigned weight values to determine a total weight value used to detect the optimal path. In some implementations, the optimal path is the path with the best total weight value. The processor is further configured to construct a sub-graph of the division graph to restrict the search area for the desired path, identify connected components in the sub-graph, construct at least one path relative to the identified connected components, and filter the paths. In some implementations, the processor is further configured to correct the constructed path by adding one or more patches. The processor is further configured to remove at least one graph edge from the adjacency graph corresponding to the optimal path, identify connected components in the adjacency graph, construct regions relative to the connected components in the adjacency graph, and divide the image of the document into the constructed regions.

[0009] In another aspect, the present disclosure is related to a non-transitory computer-readable storage medium having computer-readable instructions stored therein, the instructions being executable by a processor of a computing system. The instructions include instructions to identify objects in the image of the document, instructions to construct an area diagram for the identified objects, instructions to construct a division graph based on the area diagram, and instructions to construct an adjacency graph based on the division graph. In some implementations, the instructions further include instructions to assign graph vertices corresponding to the identified objects, instructions to identify pairs of adjacent objects corresponding to pairs of objects divided by at least one edge in the division graph, and instructions to join each pair of adjacent objects by an edge of the adjacency graph. The instructions further include instructions to identify graph vertices and graph edges on the area diagram and instructions to create a minimal graph based on the identified graph vertices and graph edges. The instructions further include instructions to detect an optimal path through the division graph to locate regions in the image of the document, and instructions to divide the image of the document into the regions based at least in part on the detected optimal path.

[0010] In some implementations, the instructions further include instructions to assign weight or penalty values to edges of the division graph. In an implementation, the penalties are based upon a comparison of mutual characteristics of at least one pair of identified objects. The instructions further include instructions to sum the assigned weight values to determine a total weight value used to detect the optimal path. In some implementations, the optimal path is the path with the best total weight value. The instructions further include instructions to construct a sub-graph of the division graph to restrict the search area for the desired path, instructions to identify connected components in the sub-graph, instructions to construct at least one path relative to the identified connected components, and instructions to filter the paths. In some implementations, the instructions further include instructions to correct the constructed path by adding one or more patches. The instructions further include instructions to remove at least one graph edge from the adjacency graph corresponding to the at least one optimal path, instructions to identify connected components in the adjacency graph, instructions to construct regions relative to the connected components in the adjacency graph, and instructions to divide the image of the document into the constructed regions.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] The foregoing and other objects, aspects, features, and advantages of the disclosure will become more apparent and better understood by referring to the following description taken in conjunction with the accompanying drawings, in which:

[0012] FIG. 1 is a flow diagram of a method for detecting a logical structure of a document in accordance with an illustrative embodiment;

[0013] FIG. 2 is a flow diagram of a method for constructing an area diagram in accordance with an illustrative embodiment;

[0014] FIG. 3 is a flow diagram of a method for constructing a division graph in accordance with an illustrative embodiment;

[0015] FIG. 4 is a flow diagram of a method for detecting an optimal path in accordance with an illustrative embodiment;

[0016] FIG. 5 is a flow diagram of a method for dividing a document into regions in accordance with an illustrative embodiment;

[0017] FIG. 6 illustrates an example of an image of a document with distorted columns in accordance with an illustrative embodiment;

[0018] FIG. 7 illustrates an example of an image of a document with incuts in accordance with an illustrative embodiment;

[0019] FIG. 8 illustrates examples of objects in an image of a document in accordance with an illustrative embodiment;

[0020] FIG. 9 illustrates a first example of a graph of objects in an image of a document in accordance with an illustrative embodiment;

[0021] FIG. 10 illustrates a second example of a graph of objects in an image of a document in accordance with an illustrative embodiment; and

[0022] FIG. 11 is a block diagram of a system for detecting a logical structure of a document according to an illustrative embodiment.

DETAILED DESCRIPTION

[0023] In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention can be practiced without these specific details. In other instances, structures and devices are shown only in block diagram form in order to avoid obscuring the invention.

[0024] Reference in this specification to "one implementation" or "an implementation" means that a particular feature, structure, or characteristic described in connection with the implementation is included in at least one implementation of the invention. The appearances of the phrase "in one implementation" in various places in the specification are not necessarily all referring to the same implementation, nor are separate or alternative implementations mutually exclusive of other implementations. Moreover, various features are described which may be exhibited by some implementations and not by others. Similarly, various requirements are described which may be requirements for some implementations but not other implementations.

[0025] Segmentation is an important step in document image recognition. The segmentation process makes it possible to identify different regions on document images. Regions may be columns, pictures, tables, text blocks, headers, etc.

[0026] The present disclosure is directed to methods and systems to discover (or identify) the basic structure of a document. In some implementations, the document may contain text columns, images, tables, etc. The present disclosure is also directed to the discovery (identification) of the borders of objects of any complexity located in a document, e.g., objects that partially or completely cut into columns. In some implementations, the discovery of the borders includes the discovery of gaps between columns, for example in an image of a document. The gaps in images of documents may be curved, narrow, or distorted. In some implementations, the objects in a document may be arbitrarily shaped, which makes their borders harder to identify. An example of a document with curved columns (62) is shown in FIG. 6. A document may include various regions such as text blocks, pictures, columns, tables, diagrams, etc. The methods and systems described herein may divide the document into these regions so that their logical relationship can be properly understood.

[0027] There may be two types of regions in a document: column regions and non-column regions. Column regions may be understood to be sections of the document containing text that is shaped like a column. A document may have one or more columns of text. Non-column regions, in some implementations, may be sections of a document containing pictures, framed text, headers, or tables and may be referred to hereinafter as "incuts."

[0028] An incut may be a region of the document that contains text, framed text, a header, a table, and/or a picture. In some implementations, the main feature of an incut is that it partially or completely cuts into a column of the main text. Hereinafter, an incut that partially cuts into a column may be referred to as "partial." Additionally, hereinafter, an incut that completely cuts into a column may be referred to as "complete." Note that the main text of a column may be located either to the left or to the right of a partial incut. Thus, there may be two types of partial incuts: a right partial incut and a left partial incut.

[0029] FIG. 7 illustrates an example of an image of a document with incuts. In FIG. 7, an image of a document includes an incut (71), a first column (72), a middle column (73), a third column (74), a complete incut (75), and a border of the middle column (76). The complex incut (71) in FIG. 7 is an incut of a different type for each column. For the first column (72), the incut (71) is a partial right (rectangular) incut. For the middle column (73), the incut (71) is a complete incut. For the third column (74), the incut (71) is a partial left (non-rectangular) incut.

[0030] In some implementations, incuts may be located at the top, bottom, or middle of a text column. For an upper incut, the main text may be located below the incut; thus, the incut is adjacent to the upper part of the main text. For a lower incut, the main text may be located above the incut, and the incut may be adjacent to the lower part of the main text. In the case of a middle incut, the text may be located above, below, and/or to one side (to the right or left), depending on whether the incut is a right incut or a left incut. For example, the "International Business" incut (75) in FIG. 7 is an upper (complete) incut for all of the columns. In some implementations, middle complete incuts can split a column into two disconnected parts. In one implementation, headers may be considered a special case of upper complete incuts.

[0031] FIG. 1 shows a flow diagram of a method for detecting a logical structure of a document. The method may be implemented on a computing device (e.g., a user device). In one implementation, the method is encoded on a computer-readable medium that contains instructions that, when executed by the computing device, cause the computing device to perform operations of the method. At block 100, an image of a document is received. In some implementations, the image of the document (or frame) may be obtained electronically using any known method. In an implementation, the image may be obtained from the memory of an electronic device or from any other accessible source. The image of the document may be received by a processor configured to receive the image of the document and perform the methods described herein.

[0032] At block 105, objects in the image of the document are identified. In some implementations, in order to find the borders between regions, objects can first be identified in the document (105). Objects in a document can, for example, be fragments of text lines, lines, picture fragments, pictures, words, separators, etc. FIG. 8 illustrates examples of text objects (82) in an image of a document. FIG. 8 depicts fragments of text lines, which may be objects (82). Object borders (84) are denoted with a black line.

[0033] In the present disclosure the image of the document is segmented into regions using an area Voronoi diagram (hereinafter "AVD"), and the identified objects are used as the Voronoi areas. Referring back to FIG. 1, at block 110, an AVD is constructed. An area Voronoi diagram may be constructed by any convenient method. For the purposes of segmentation, an approximate area Voronoi diagram may be constructed using, for example, the method shown in FIG. 2.

[0034] FIG. 2 shows a flow diagram of a method for constructing an approximate area Voronoi diagram. At block 200, an image with identified objects is received. At block 210, each object is initially approximated with a set of points. In some implementations, these points may be referred to as approximating points. FIG. 8 shows examples of objects (82) and one of the ways to approximate them with a set of points (86). After approximating points have been found for all objects, at block 220, a Voronoi diagram is built for them. The set of points that are closer to a given approximating point than to any other approximating point may be referred to as the Voronoi cell of that approximating point. At block 230, the borders between cells belonging to the same object are eliminated. At block 240, the resulting approximate area Voronoi diagram is obtained. It is worth noting that the AVD contains useful supplemental information related to the topology of objects; for example, it may be used to refine the borders between objects. In some implementations, this information is captured in a special data structure referred to as a division graph.
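The steps of FIG. 2 can be imitated on a pixel grid, as in the minimal Python sketch below. It is an illustration under stated assumptions, not the disclosed implementation: objects are taken to be axis-aligned bounding boxes, each box is approximated by points sampled along its border every `step` pixels (one possible scheme, assumed here), and instead of building an exact Voronoi diagram, every pixel is labeled with the object that owns its nearest approximating point. Labeling pixels by object rather than by point is what merges the cells of one object (block 230). The names approximate_points, approximate_avd, and border_pixels are hypothetical.

```python
def approximate_points(box, step=2):
    """Block 210 (assumed scheme): sample points along the border of a box."""
    x0, y0, x1, y1 = box
    xs = list(range(x0, x1 + 1, step)) + [x1]
    ys = list(range(y0, y1 + 1, step)) + [y1]
    top_bottom = {(x, y) for x in xs for y in (y0, y1)}
    left_right = {(x, y) for x in (x0, x1) for y in ys}
    return sorted(top_bottom | left_right)


def approximate_avd(boxes, width, height, step=2):
    """Label each pixel with the object owning its nearest approximating point."""
    sites = [(p, obj) for obj, box in boxes.items()
             for p in approximate_points(box, step)]
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            # Blocks 220-230: the nearest point decides the Voronoi cell; keeping
            # only the owning object's id merges all cells of that object.
            _, owner = min(sites, key=lambda s: (s[0][0] - x) ** 2 + (s[0][1] - y) ** 2)
            row.append(owner)
        grid.append(row)
    return grid


def border_pixels(grid):
    """Pixels whose right or lower neighbour belongs to a different object."""
    border = set()
    for y, row in enumerate(grid):
        for x, label in enumerate(row):
            right = x + 1 < len(row) and row[x + 1] != label
            below = y + 1 < len(grid) and grid[y + 1][x] != label
            if right or below:
                border.add((x, y))
    return border
```

The brute-force nearest-point search is quadratic and only suitable for small examples; a real implementation would more likely compute an exact point Voronoi diagram and then drop the edges between cells of the same object, as the figure describes.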

[0035] FIG. 3 shows a flow diagram of a method for constructing a division graph (block 120 in FIG. 1). At block 300, an area Voronoi diagram is received. In some implementations, the edges in a division graph represent continuous sections of the borders between Voronoi cells of the area Voronoi diagram. The edges in the division graph may begin and end where the corresponding border lines branch. A branch point may be a point where three or more Voronoi cells meet. In some implementations, an edge carries additional information; for example, the edge may record the pair of objects divided by the corresponding border line.

[0036] FIG. 9 schematically illustrates a first example of a division graph. FIG. 9 shows the edge Eab (91), which divides adjacent cells A (92) and B (93). The vertices in the division graph can be the branch points (102) and the terminal points (104) on the AVD, as schematically illustrated in FIG. 10, which illustrates a second example of a division graph. In some implementations, the vertices corresponding to terminal points may be referred to as terminal vertices. Terminal vertices can be vertices located along the outer border of the graph. In an implementation, only one edge can extend from a terminal vertex. In some implementations, if the objects have more or less random shapes, then exactly three edges originate from each non-terminal vertex. This follows from the properties of a Voronoi diagram, in which, in the generic case, exactly three cells meet at each branch point.
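On a pixel-labeled approximation of the AVD (each pixel holds the id of the object whose cell contains it, an assumed representation), branch and terminal vertices can be located as in the hypothetical Python sketch below. It inspects each lattice corner between pixels: a corner touched by three or more distinct cells is treated as a branch point, and a corner on the image edge where two different cells meet is treated as a terminal point. The function name find_vertices is illustrative only.

```python
def find_vertices(grid):
    """Return (branch_points, terminal_points) as sets of lattice corners (cx, cy)."""
    height, width = len(grid), len(grid[0])
    branch, terminal = set(), set()
    for cy in range(height + 1):
        for cx in range(width + 1):
            # Labels of the pixels touching this corner that lie inside the image.
            around = [grid[y][x]
                      for y in (cy - 1, cy) for x in (cx - 1, cx)
                      if 0 <= y < height and 0 <= x < width]
            labels = set(around)
            if len(around) == 4 and len(labels) >= 3:
                branch.add((cx, cy))    # three or more cells meet here
            elif len(around) == 2 and len(labels) == 2:
                terminal.add((cx, cy))  # a border line reaches the image edge
    return branch, terminal
```

For a grid in which cells A and B sit side by side above a full-width cell C, the sketch finds one branch point where the three cells meet and three terminal points where the borders leave the image, consistent with the three edges per non-terminal vertex described above.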

[0037] Referring back to FIG. 3, at blocks 310 and 320, the branch points and terminal points on the AVD can be used when constructing the division graph. In some implementations, each of these points corresponds to a graph vertex (310), and each segment that divides the Voronoi cells corresponds to a graph edge (320). At block 330, the resulting graph may be reduced to a minimal homeomorphic graph. In some implementations, to reduce the resulting graph to a minimal homeomorphic graph, vertices with only two incident edges may be eliminated and the two corresponding edges may be joined into one (330). In one implementation, edges are joined only if they divide the same pair of objects. The resulting graph is the division graph (340).
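The reduction at block 330 can be sketched in Python as repeated contraction of degree-2 vertices. This is a minimal sketch assuming that each edge is stored as a tuple (u, v, divided_pair), where divided_pair is a frozenset of the two object ids the edge separates; the function name reduce_division_graph is hypothetical. Following the implementation described above, a vertex is eliminated only when both of its edges divide the same pair of objects.

```python
from collections import defaultdict


def reduce_division_graph(edges):
    """Contract degree-2 vertices whose two edges divide the same object pair.

    edges -- list of (u, v, divided_pair) tuples; divided_pair is a frozenset
             of the two object ids separated by the edge (assumed format).
    """
    edges = dict(enumerate(edges))
    next_id = len(edges)
    changed = True
    while changed:
        changed = False
        incident = defaultdict(list)
        for eid, (u, v, _) in edges.items():
            incident[u].append(eid)
            incident[v].append(eid)
        for vertex, eids in incident.items():
            if len(eids) != 2 or eids[0] == eids[1]:
                continue  # not a simple degree-2 vertex (or a self-loop)
            (a1, b1, p1), (a2, b2, p2) = edges[eids[0]], edges[eids[1]]
            if p1 != p2:
                continue  # keep the vertex: the edges separate different objects
            end1 = b1 if a1 == vertex else a1  # far endpoint of the first edge
            end2 = b2 if a2 == vertex else a2  # far endpoint of the second edge
            del edges[eids[0]], edges[eids[1]]
            edges[next_id] = (end1, end2, p1)  # one joined edge replaces two
            next_id += 1
            changed = True
            break  # incidence lists are now stale; rebuild them
    return list(edges.values())
```

For example, a chain 1-2-3-4 whose three segments all separate objects A and B collapses into a single edge from 1 to 4, while vertex 4, where three borders meet, is kept.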

[0038] Now referring back to FIG. 1, at block 130, an adjacency graph is constructed. In some implementations, an adjacency graph of objects may be a dual representation of the AVD. Unlike the division graph, the vertices of the adjacency graph may have a well-defined physical meaning: they correspond to the objects in the image of the document themselves. The edges of the adjacency graph connect adjacent objects, while the edges of the division graph divide them. In some implementations, a division graph may be used to construct an adjacency graph. In one embodiment, the procedure to construct an adjacency graph using a division graph may include identifying the vertices of the adjacency graph. In some implementations, the vertices of the adjacency graph may correspond to the objects identified in the document image, and two vertices of the adjacency graph may be joined by an edge if the objects that correspond to these vertices are divided by at least one edge in the division graph (the supplementary information of an edge of the division graph indicates the objects that it divides). In some implementations, it is not possible to unambiguously reconstruct a division graph from an adjacency graph of objects, because the adjacency graph does not indicate whether two edges in the division graph must be adjacent (i.e., share a common vertex).
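The construction of the adjacency graph from the supplementary information on division-graph edges can be sketched in a few lines of Python. The sketch assumes the same hypothetical edge format (u, v, divided_pair) with divided_pair a frozenset of two object ids; the name build_adjacency_graph is illustrative.

```python
def build_adjacency_graph(objects, division_edges):
    """Join two objects if at least one division-graph edge divides them.

    objects        -- iterable of object ids (the adjacency-graph vertices)
    division_edges -- iterable of (u, v, divided_pair) tuples (assumed format)
    """
    adjacency = {obj: set() for obj in objects}
    for _u, _v, pair in division_edges:
        a, b = tuple(pair)
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency
```

The division-graph vertices u and v are ignored entirely, which is exactly why the reverse reconstruction is ambiguous: three borders meeting at one branch point and three disconnected borders separating the same object pairs produce the same adjacency graph.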

[0039] Still referring to FIG. 1, at block 140, a path search is conducted. Once the division graph and the adjacency graph have been constructed, paths are searched for in the division graph.

[0040] FIG. 4 shows a flow diagram of a method for detecting an optimal path. In some implementations, regions in a document may be text blocks, pictures, columns, tables, diagrams, etc. Most often, the borders between regions in documents are fairly large and generally continuous areas of whitespace, for example, the separation between columns or the borders of incuts. In terms of the division graph, such a space may be a fairly long continuous path (i.e., a sequence of adjacent edges) in the division graph.

[0041] In an implementation, a path in the division graph can be a sequence of edges in which the end of one edge (a vertex) is the beginning of the next edge. In some implementations, a path in the division graph may terminate either at a terminal vertex or at a vertex of some other path. FIG. 7 illustrates an example of an image of a document, and it is readily apparent that a set of paths (76) will "cut" the document, thereby splitting its content into the desired regions.

[0042] In an implementation, to correctly split the document into regions, an optimal path is defined. To do this, referring back to FIG. 4, at block 400, the edges of the division graph are weighted. In some implementations, the edges of the division graph are weighted based on an analysis of the consistency of the objects between which the edges pass. The analysis may also include a comparison of characteristics of the identified objects in the image of the document. For example and without limitation, the analysis may include a comparison of the objects' text quality, baseline positions, heights, etc. In some implementations, based on the results, an edge separating the objects being examined may be assigned a weight or a penalty. In one implementation, the penalty (weight) may be a numeric value. In some implementations, the weight values may be based upon analyzing mutual and/or similar characteristics of at least one pair of identified objects. For example, if two objects look very much like pieces of the same text line, then an edge that separates these two objects may be assigned a large penalty. In other implementations, if the types of two objects are not consistent (e.g., a text object and a picture), then the edges between these two objects may be assigned a smaller penalty.

[0043] In some implementations, the size of the penalty may depend on the distance between objects. In general, the penalties may differ from one subtask to another. For example, in one implementation, when searching for specific types of incuts, one may expect column text on at least one side of an edge, so edges that separate non-textual objects may be assigned a large penalty. In another implementation, such a large penalty would not be appropriate when searching for paths between columns (inter-column paths), because an edge may separate two pictures, each of which belongs to its own column.
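A penalty of this kind could be computed as in the sketch below. Several details are hypothetical choices made only for illustration: the object attributes (kind, height, baseline), the whitespace gap argument, and all numeric constants. The text above fixes only the qualitative behavior:

- a large penalty for objects that look like pieces of one text line;
- a smaller penalty for objects of inconsistent types;
- a penalty that may depend on distance.

```python
from dataclasses import dataclass


@dataclass
class PageObject:
    kind: str        # "text" or "picture" (hypothetical attribute)
    height: float    # object height in pixels, must be > 0
    baseline: float  # y-coordinate of the baseline in pixels


def edge_penalty(a: PageObject, b: PageObject, gap: float) -> float:
    """Hypothetical penalty for cutting between objects a and b separated by `gap` pixels."""
    if a.kind != b.kind:
        base = 1.0  # inconsistent types (text vs. picture): cheap to separate
    elif a.kind == "text":
        height_ratio = min(a.height, b.height) / max(a.height, b.height)
        same_line = abs(a.baseline - b.baseline) <= 0.25 * max(a.height, b.height)
        # matching height and baseline suggest two pieces of one text line
        base = (10.0 if same_line else 3.0) * height_ratio
    else:
        base = 2.0  # two non-text objects
    return base / (1.0 + gap / 10.0)  # wider whitespace makes the cut cheaper
```

Giving subtasks different penalties, as described above, would amount to swapping in a different function of this shape. One example is raising the penalty for edges between two non-text objects when searching for incuts.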

[0044] In some implementations, finding the paths in the division graph assumes that a path must pass along a border between neighboring objects in the document. The gap between neighboring regions is greater than, for example, the spacing between text lines within a single column. Therefore, once the edges of the division graph have been weighted, the edges dividing neighboring regions have a sufficiently small weight (penalty) relative to the edges dividing objects within a single region. Thus, in an implementation, to find a path that passes along the border between regions, it is desirable to find the path with the smallest penalty. In some implementations, the path with the smallest penalty may be an optimal path. In an implementation, the optimal path may be the path with the best total weight value, i.e., the smallest total penalty or the highest total score.

[0045] In an implementation, the process of constructing the optimal path may begin with a search for "good" edges or for edges with terminal vertices. A "good" edge may refer to an edge with the smallest penalty. In one implementation, a "good" edge may refer to an edge with a penalty below a predetermined threshold. Initially, a path has no penalty, because it includes no edges. With the addition of each edge, the path's penalty increases by the penalty of the added edge. In some implementations, Dijkstra's algorithm, for example, may be used to find the path with the smallest penalty. Thus, in an implementation, the path obtained by adding the edges with the smallest penalty may be the border between two or more regions of an image of a document. In some implementations, such a path may run between terminal vertices and/or sections of other paths in the document image. In general, any scoring system may be selected to sum, total, and/or otherwise quantify a score value for an edge. For example, in one implementation, the quality of an edge and/or path could be scored instead of assigning penalties. Thus, one may construct the path by adding the edges with the highest quality weight (score). The method described next uses a penalty system as an example.
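The smallest-penalty search can be illustrated with a textbook Dijkstra implementation over non-negative edge penalties. Two things are assumptions for this sketch: the graph representation (a dict mapping each vertex to a list of (neighbor, penalty) pairs) and the function name. Passing terminal vertices, or vertices of already constructed paths, as goals yields a path that ends at either kind of vertex.

```python
import heapq
from itertools import count


def min_penalty_path(graph, start, goals):
    """Dijkstra's algorithm over non-negative edge penalties.

    graph: vertex -> list of (neighbor, penalty) (hypothetical format).
    Returns (total_penalty, path) for the cheapest path from start to any
    vertex in goals, or None if no goal is reachable.
    """
    goals = set(goals)
    best = {start: 0.0}  # an empty path has no penalty
    previous = {}
    order = count()  # tie-breaker so vertices never need to be compared
    heap = [(0.0, next(order), start)]
    while heap:
        cost, _, vertex = heapq.heappop(heap)
        if cost > best[vertex]:
            continue  # stale heap entry
        if vertex in goals:
            path = [vertex]
            while path[-1] != start:
                path.append(previous[path[-1]])
            return cost, path[::-1]
        for neighbor, penalty in graph.get(vertex, ()):
            candidate = cost + penalty  # adding an edge adds its penalty
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = vertex
                heapq.heappush(heap, (candidate, next(order), neighbor))
    return None
```

With a scoring system instead of penalties, the same search can be used after converting each score into a non-negative cost. Dijkstra's algorithm cannot maximize a sum of scores directly.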

[0046] In some implementations, a subgraph of the division graph may be created. A subgraph may be a graph that contains some subset of the vertices of the division graph and some subset of the edges incident to them in the division graph. In some implementations, a subgraph may be defined in an area of the document image through which the desired path is expected to pass. In such implementations, the path search takes place only within the subgraph. One advantage of using subgraphs is that they can accelerate the discovery of regions in a document.
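A subgraph restricted to an area of the page can be as simple as the vertices inside a rectangle plus the edges between them. The following details are assumptions for illustration: the rectangular area, the vertex coordinates, the adjacency-list format (vertex -> list of (neighbor, penalty)), and the function name. The text does not prescribe how the area is chosen.

```python
def subgraph_in_box(graph, coords, box):
    """Keep the vertices inside box = (x0, y0, x1, y1) and the edges between them.

    graph: vertex -> list of (neighbor, penalty); coords: vertex -> (x, y).
    Both formats are hypothetical.
    """
    x0, y0, x1, y1 = box
    inside = {v for v, (x, y) in coords.items() if x0 <= x <= x1 and y0 <= y <= y1}
    # keep only edges whose endpoints both lie inside the area
    return {v: [(w, p) for w, p in graph.get(v, ()) if w in inside] for v in inside}
```

A path search run on the returned mapping never leaves the chosen area, which is where the speed-up comes from.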

[0047] The method presented makes it possible to segment documents of arbitrary complexity and with a wide variety of logical structures. Constructing paths for a multi-column page with incuts, similar to the document illustrated in FIG. 7, can be difficult. This sort of structure is typical of magazines and newspapers, and segmenting it is one of the most complex segmentation tasks. An example of constructing inter-column paths and paths that isolate incuts is described below.

[0048] In some implementations, path construction may include several stages. First, one or more inter-column graphs (i.e., sub-graphs) may be constructed. In some implementations, the inter-column graphs are constructed using the division graph. To construct the inter-column sub-graph, various hypotheses about how to partition the page into columns may be tested, allowing for columns of either uniform or varying width. In some implementations, the inter-column graph may be searched for edges with small penalties, for example, penalties below a certain threshold value. In an implementation, edges with small penalties may be most suitable to serve as borders between two columns. In some implementations, the edges identified as having small penalties may become sections of the eventual optimal path.

[0049] In an implementation, to optimize the search for the separating path, an undirected graph may be turned into a directed graph. Initially, the sub-graph may be undirected, i.e., its edges have no direction. To produce a directed graph from an undirected graph, each undirected edge can be replaced with two directed edges pointing in opposite directions. In some implementations, the search is restricted, for example, to paths that move only from top to bottom. In one implementation, the directed edges that point in the opposite direction may then be eliminated. If no reliable sections of inter-column paths (i.e., path sections made of edges with small penalties) are found for a given column-partitioning hypothesis, another way of partitioning the document into columns may be considered.
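The conversion to a directed graph that admits only top-to-bottom moves can be sketched as follows. The sketch makes these hypothetical assumptions:

- image coordinates in which y grows downward;
- undirected edges given as (u, v, penalty) tuples;
- as a simplification, edges whose endpoints lie at the same height are dropped in both directions.

The function name is illustrative.

```python
def top_to_bottom_digraph(edges, coords):
    """Build directed arcs that only move down the page.

    edges: iterable of undirected (u, v, penalty) tuples (hypothetical format).
    coords: vertex -> (x, y), with y growing downward.
    Returns vertex -> list of (successor, penalty).
    """
    arcs = {v: [] for v in coords}
    for u, v, penalty in edges:
        for a, b in ((u, v), (v, u)):        # replace each edge by two opposite arcs
            if coords[b][1] > coords[a][1]:  # keep only the arc pointing downward
                arcs[a].append((b, penalty))
    return arcs
```

Because every arc strictly increases y, the resulting graph has no cycles, so a path search on it cannot wander back up the page.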

[0050] In some implementations, in addition to inter-column paths, a search for complete and partial incuts may be conducted. In an implementation, partial incuts may be located where there are breaks in inter-column paths. Vertices where an inter-column path is broken may be treated as initial vertices of the paths of partial incuts.

[0051] In places where no partial incuts are found, there is a chance of finding the path of complete incuts. In some implementations, to find the path of a complete incut, it may be desirable to examine edges located between the borders of the break in the inter-column paths. For example, it is possible to find edges whose weights are above the average weight (or whose penalties are below the average penalty) of edges within the column being analyzed (because the distance between the incut and the main column text is greater than the distance between the text lines inside the column).

[0052] In some implementations, to construct paths in the division graph, the method may include looking through the initial vertices and the end vertices (i.e., where a path ends) of the already constructed paths and trying to start a new path from one of these vertices. In other implementations, "good" edges may be identified and used as initial sections from which a new path is constructed by successively adding adjacent edges.

[0053] Now referring to FIG. 4, at block 440, an inter-column graph is constructed. In some implementations, as already stated, a tentatively constructed inter-column graph (440) may be used when constructing inter-column paths. The inter-column graph may include only those edges that may be elements of an inter-column path. In an implementation, "horizontal edges" (i.e., edges that divide horizontally overlapping objects) may be eliminated from the graph. Additionally, in some implementations, edges that do not fall on the borders of the analyzed columns may be eliminated.

[0054] At block 450, the inter-column graph may be searched for all connected components. A connected component may refer to a set of vertices of the graph such that any two vertices in the set are joined by a path, and no vertex in the set is joined by a path to a vertex outside the set. Next, at block 460, the inter-column paths may be compared to the connected components found. In some implementations, the inter-column paths may be compared to the connected components using Dijkstra's algorithm.
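Finding connected components (block 450) is a standard breadth-first search. The sketch assumes that the inter-column graph is given as a symmetric mapping from each vertex to its neighbors. The name connected_components is illustrative.

```python
from collections import deque


def connected_components(neighbors):
    """Return the connected components of an undirected graph as lists of vertices.

    neighbors: vertex -> iterable of adjacent vertices (hypothetical format);
    the adjacency must be symmetric and every neighbor must also be a key.
    """
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:  # breadth-first traversal from start
            vertex = queue.popleft()
            component.append(vertex)
            for other in neighbors[vertex]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(component)
    return components
```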

[0055] At block 470, all or some of the paths may be filtered. In some implementations, paths may be filtered based on absolute and/or relative characteristics. In an implementation that uses absolute characteristics, short paths, curved paths, and/or paths with large penalties may be considered suspect and filtered out. In some implementations, relative filtering accounts for the fact that a given interval may contain several paths that are each good but are incompatible with one another. For example, in one implementation, if a table is located under an inter-column path, it may contain a good long vertical path that is horizontally incompatible with the inter-column path. In some implementations, which of the two paths is spurious may be determined, for example, by a comparative analysis of which path is horizontally closer to the center of the inter-column division.
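The absolute and relative filters could look like the sketch below. Several choices here are hypothetical:

- curvature is measured as path length divided by the straight-line distance between the endpoints;
- the relative comparison uses the mean x-coordinate of each path;
- every threshold value is illustrative.

The text above only names the criteria.

```python
import math


def passes_absolute_filter(points, penalty, min_length=100.0,
                           max_curvature=1.3, max_penalty=50.0):
    """Reject short, curved, or expensive paths; thresholds are hypothetical.

    points: (x, y) vertices along the path, in order.
    """
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    chord = math.dist(points[0], points[-1])
    if length < min_length or penalty > max_penalty or chord == 0:
        return False
    return length / chord <= max_curvature  # 1.0 means perfectly straight


def closer_to_center(path_a, path_b, center_x):
    """Relative filter: of two incompatible paths, keep the one nearer the center."""
    def mean_x(points):
        return sum(x for x, _ in points) / len(points)

    if abs(mean_x(path_a) - center_x) <= abs(mean_x(path_b) - center_x):
        return path_a
    return path_b
```

In the table example above, the inter-column path would normally lie nearer the center of the inter-column division than the table's internal vertical path, so the comparison keeps the inter-column path.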

[0056] Still referring to FIG. 4, at block 410, a tentatively constructed column sub-graph, which includes only those edges that can separate an incut from a column of text, may be used during the construction of paths of partial incuts. In some implementations, the creation of such a sub-graph may account for the fact that at least one object adjacent to an edge of the searched path will most likely be a text object (the text of a column).

[0057] The process of constructing paths of partial incuts (420) seeks to find a path between the broken sections of an inter-column path that has already been found. The process (420) may successively add edges located between the vertices of the broken inter-column path to construct a path that connects these vertices. In an implementation, if an upper/lower incut is sought, the process seeks a path from the initial/end vertex of the appropriate inter-column path to one of the terminal vertices in the column sub-graph.

[0058] First, the path of the partial incut is found using Dijkstra's algorithm. In some implementations, the path suggested by Dijkstra's algorithm may be referred to as a base path. After the base path (420) has been found, at block 430, it may be corrected. In an implementation, correcting the base path may include expanding it (430). In some implementations, the base path may need to be corrected because the incut may be sparse or compound. A sparse incut may consist of objects located far from one another, and a compound incut may combine, e.g., a picture and its caption. For a sparse incut, the base path may be detected incorrectly, i.e., the path could erroneously pass through the incut.

[0059] To address this problem, a patch technique is proposed. In some implementations, patches may be used to correctly adjust the constructed path, for example, in the case of sparse incuts. A patch may be an edge or a section of a path with a small penalty that is a candidate for inclusion in the path. In some implementations, one indication that one or several edges should be added as a patch is that the edge or edges divide objects that are essentially different (e.g., a part of a picture and a text object, or text objects of different heights).

[0060] After patches have been constructed, an attempt may be made to add them to the incut's base path. The base path with one or more patches added may be referred to as a final path. In some implementations, the system may add several patches to find the final path. In an implementation, a set of hypotheses of patch paths can be created, and the best among them can be selected. This selection may be based on two criteria. First, it is preferable for the incut to encompass as large a territory as possible. Second, the penalties of the patches are considered, because the final path should be "good" and should therefore contain edges with small penalties.
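Selecting the best patch hypothesis can be framed as a small search over subsets of candidate patches. The sketch rests on illustrative assumptions. Each patch is assumed to come with a precomputed gain in enclosed area and a penalty, and the two criteria are assumed to combine linearly through a hypothetical penalty_weight. Exhaustive enumeration is practical only for a handful of patches.

```python
from itertools import combinations


def best_patch_set(base_area, patches, penalty_weight=1.0):
    """Pick the subset of patches that maximizes enclosed area minus weighted penalty.

    patches: list of (name, area_gain, penalty) tuples (hypothetical format).
    Returns (score, names_of_chosen_patches); the empty set scores base_area.
    """
    best = (base_area, ())  # hypothesis: the base path with no patches
    for k in range(1, len(patches) + 1):
        for combo in combinations(patches, k):
            area = base_area + sum(gain for _, gain, _ in combo)
            penalty = sum(p for _, _, p in combo)
            score = area - penalty_weight * penalty
            if score > best[0]:
                best = (score, tuple(name for name, _, _ in combo))
    return best
```

For example, with a base area of 100 and patches ("p1", 30, 5), ("p2", 2, 10), and ("p3", 20, 1), the best hypothesis adds p1 and p3 for a score of 144. Patch p2 costs more than the area it adds.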

[0061] Overall, the method of creating paths of complete incuts may be similar to the method of constructing paths of partial incuts. In some implementations, incut candidates may be searched based on an enumeration of the end/initial vertices of the incut's path. In an implementation, the final candidate may be selected heuristically based on the following criteria: a) distance from a location of a break in inter-column paths (the smaller the distance, the more preferable the candidate path); b) the quality of the path itself (the lower the likelihood that the candidate path "cuts" through a paragraph of columnar text, for example, the better).

[0062] In some implementations, the identified text columns can be checked for the presence of incuts that do not protrude from the column borders. To do this, in an implementation, the content of the column may be analyzed for the presence of sections of separating paths with small penalties. If a column of text contains a "good" section of a path, then in some implementations, the column may contain a partial incut. In an implementation, an attempt to construct the path of a partial incut can be made based on this section.

[0063] FIG. 5 shows a flow diagram of a method for dividing a document into regions. The constructed system of paths is not by itself enough to understand the logical structure of a document and to obtain a segmentation result, so additional stages are performed to identify regions within the document. In some implementations, after the system of paths in the division graph has been found, objects can be assigned to the regions to which they belong. In an implementation, the adjacency graph (120) may be used for this purpose: the edges that correspond to the constructed system of paths are removed from it. Each edge of the division graph stores information about the objects that it divides (i.e., the objects located on both sides of the edge). For each edge of the paths (480), information about the objects that the edge divides may be extracted, and the edge of the adjacency graph that joins these objects may be removed (510). Next, at block 520, any known method may be used to identify the connected components of the resulting adjacency graph. The connected components correspond to the desired regions and contain precisely the objects that belong to these regions. At block 530, the resulting regions are identified on the document image and displayed in the segmentation results. Finally, the image of the document can be divided into regions (160).
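Blocks 510 and 520 can be combined into a single traversal. The traversal skips every adjacency edge whose object pair is divided by a path edge and collects the objects that remain connected. The sketch makes two hypothetical assumptions about formats: the adjacency graph is a dict of object -> set of adjacent objects, and path edges are (u, v, pair) tuples. The name segment_regions is illustrative.

```python
from collections import deque


def segment_regions(adjacency, path_edges):
    """Group objects into regions after cutting the adjacency graph along the paths.

    adjacency: object -> set of adjacent objects (hypothetical format).
    path_edges: division-graph edges (u, v, pair) lying on the constructed paths;
    pair holds the two objects that the edge divides.
    """
    cut = {frozenset(pair) for _u, _v, pair in path_edges}  # adjacency edges to remove
    regions, seen = [], set()
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        region, queue = set(), deque([start])
        while queue:  # breadth-first search that never crosses a cut edge
            obj = queue.popleft()
            region.add(obj)
            for other in adjacency[obj]:
                if other not in seen and frozenset((obj, other)) not in cut:
                    seen.add(other)
                    queue.append(other)
        regions.append(region)
    return regions
```

For a two-column page where objects A and B form one column and C and D the other, cutting the adjacency edges A-C and B-D yields the regions {A, B} and {C, D}.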

[0064] The foregoing description of illustrative embodiments has been presented for purposes of illustration and description. It is not intended to be exhaustive or limiting with respect to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the disclosed embodiments. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents.

[0065] FIG. 11 shows a system 1100 that may detect a logical structure of a document using the techniques described above, in accordance with some embodiments of the disclosure. The system 1100 typically includes at least one processor 1102 coupled to a memory 1104. The processor 1102 may represent one or more processors (e.g., microprocessors). The memory 1104 may represent random access memory (RAM) devices comprising a main storage of the system 1100 and/or any supplemental levels of memory, e.g., cache memories, non-volatile or back-up memories (e.g., programmable or flash memories), read-only memories, etc. In addition, the memory 1104 may include memory storage physically located elsewhere in the system 1100 (e.g., any cache memory in the processor 1102) as well as any storage capacity used as a virtual memory (e.g., as stored on a mass storage device 1110).

[0066] In some implementations, the system 1100 receives a number of inputs and outputs for communicating information externally. The system 1100 may include one or more user input devices 1106 (e.g., a keyboard, a mouse, a scanner, etc.) and a display 1108 (e.g., a Liquid Crystal Display (LCD) panel) for interfacing with a user/operator. For additional storage, the system 1100 may also include one or more mass storage devices 1110, e.g., a floppy or other removable disk drive, a hard disk drive, a Direct Access Storage Device (DASD), an optical drive (e.g., a Compact Disk (CD) drive, a Digital Versatile Disk (DVD) drive, etc.) and/or a tape drive, among others. Furthermore, the system 1100 may include an interface with one or more networks 1112 (e.g., a local area network (LAN), a wide area network (WAN), a wireless network, and/or the Internet, among others) to permit the communication of information with other computers coupled to the networks. It should be appreciated that the system 1100 typically includes suitable analog and/or digital interfaces between the processor 1102 and each of the components 1104, 1106, 1108 and 1112, as is well known in the art.

[0067] The system 1100 operates under the control of an operating system 1114 and executes various computer software applications, components, programs, objects, modules, etc., indicated collectively by reference number 1116, to perform the segmentation techniques described above.

[0068] In general, the routines executed to implement the embodiments of the disclosure may be implemented as part of an operating system or a specific application, component, program, object, module or sequence of instructions referred to as "computer programs." The computer programs typically comprise one or more instructions set at various times in various memory and storage devices in a computer. When read and executed by one or more processors in a computer, these instructions cause the computer to perform the operations necessary to execute elements involving the various aspects of the disclosure. Moreover, while the invention has been described in the context of fully functioning computers and computer systems, those skilled in the art will appreciate that the various embodiments of the disclosure are capable of being distributed as a program product in a variety of forms. The disclosure applies equally regardless of the particular type of machine or computer-readable media used to actually effect the distribution. Examples of computer-readable media include but are not limited to recordable type media and transmission type media. Recordable type media include volatile and non-volatile memory devices, floppy and other removable disks, hard disk drives, and optical disks (e.g., Compact Disk Read-Only Memories (CD-ROMs), Digital Versatile Disks (DVDs), etc.), among others. Transmission type media include digital and analog communication links. Computer-readable media, as included within the present disclosure, include only non-transitory media (i.e., do not include transitory signals-in-space).

[0069] Although the present disclosure has been provided with reference to specific exemplary embodiments, it is evident that various modifications can be made to these embodiments without departing from the initial spirit of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative sense rather than in a restrictive sense.
