U.S. Patent No. 8,103,081 [Application No. 12/401,430] was granted by the patent office on January 24, 2012, for "Classification of samples." This patent grant is currently assigned to Cambridge Research & Instrumentation, Inc. Invention is credited to Kirk William Gossage and Tyna A. Hope.
United States Patent 8,103,081
Gossage, et al.
January 24, 2012
Classification of samples
Abstract
Methods disclosed herein include: (a) determining positions of a
plurality of cells based on one or more images of the cells; (b)
for at least some of the plurality of cells, generating a matrix
that includes two-dimensional information about positions of
neighboring cells, and determining one or more numerical features
based on the information in the matrix; and (c) classifying the at
least some of the plurality of cells as belonging to at least one
of multiple classes based on the numerical features.
Inventors: Gossage, Kirk William (Milford, CT); Hope, Tyna A. (Wakefield, MA)
Assignee: Cambridge Research & Instrumentation, Inc. (Woburn, MA)
Family ID: 41164021
Appl. No.: 12/401,430
Filed: March 10, 2009
Prior Publication Data
US 20090257640 A1, published Oct. 15, 2009
Related U.S. Patent Documents
Application No. 61/035,240, filed Mar. 10, 2008
Application No. 61/045,402, filed Apr. 16, 2008
Current U.S. Class: 382/133; 600/407; 356/36
Current CPC Class: G06K 9/0014 (20130101); G06T 7/11 (20170101); G06K 9/4642 (20130101); G06T 2207/30024 (20130101); G06T 2207/10056 (20130101)
Current International Class: G06K 9/00 (20060101)
Field of Search: 382/128-134,162; 356/36,300,369,453,465; 600/407,476
References Cited
Other References
Coleman et al., "Syntactic structure analysis in uveal melanomas," Brit. J. Ophthalmology 78: 871-874 (1994).
Choi et al., "Minimum spanning trees (MST) as a tool for describing tissue architecture when grading bladder carcinoma," Proc. 8th Int. Conf. on Image Analysis and Processing (San Remo, Italy), pp. 615-620 (1995).
Geusebroek et al., "Segmentation of Tissue Architecture by Distance Graph Matching," Cytometry 35(1): 12-22 (1999).
Sudbo et al., "New algorithms based on the Voronoi Diagram applied in a pilot study on normal mucosa and carcinomas," Analytical Cellular Pathology 21(2): 71-86 (2000).
Sudbo et al., "Prognostic Value of Graph Theory-Based Tissue Architecture Analysis in Carcinomas of the Tongue," Laboratory Investigation 80(12) (2000).
Gunduz et al., "The cell-graphs of cancer," Bioinformatics 20 Supp. 1: i145-i151 (2004).
Takahashi et al., "Support Systems for Histopathologic Diagnosis of Hepatocellular Carcinoma Using Nuclear Positions," Proc. 2nd Annual IASTED Conf. Biomed. Eng., pp. 219-223 (2004).
Demir et al., "Augmented cell-graphs for automated cancer diagnosis," Bioinformatics 21 Supp. 2: ii7-ii12 (2005).
Demir et al., "Learning the Topological Properties of Brain Tumors," IEEE/ACM Trans. Comp. Biol. Bioinf. 2(3): 262-270 (2005).
Demir et al., "Spectral analysis of cell-graphs for automated cancer diagnosis," 4th Conf. on Modeling and Simulation in Biology, Medicine, and Biomedical Engineering (Linkoping, Sweden) (2005).
Landini et al., "Quantification of Local Architecture Changes Associated with Neoplastic Progression in Oral Epithelium using Graph Theory," Fractals in Biology and Medicine IV (Losa et al., eds.), pp. 193-201 (Birkhauser, 2005).
Petushi et al., "Large-scale computations on histology images reveal grade-differentiating parameters for breast cancer," BMC Med. Imaging, pp. 6-14 (2006).
Bilgin et al., "Cell-Graph Mining for Breast Tissue Modeling and Classification," IEEE Eng. Med. Biol. Soc. 1: 5311-5314 (2007).
Doyle et al., "Automated Grading of Prostate Cancer Using Architectural and Textural Image Features," IEEE International Symposium on Biomedical Imaging (ISBI), pp. 1284-1287 (2007).
Gunduz-Demir, "Mathematical modeling of the malignancy of cancer using graph evolution," Mathematical Biosciences 209(2): 514-527 (2007).
Lin et al., "Automated image analysis methods for 3-D quantification of the neurovascular unit from multichannel confocal microscope images," Cytometry A 66A(1): 9-23.
Primary Examiner: Sohn; Seung C
Attorney, Agent or Firm: Fish & Richardson P.C.
Parent Case Text
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to U.S. Provisional Patent
Application Ser. No. 61/035,240, filed on Mar. 10, 2008, and to
U.S. Provisional Patent Application Ser. No. 61/045,402, filed on
Apr. 16, 2008. The entire contents of each of these provisional
applications are incorporated by reference herein.
Claims
What is claimed is:
1. A method, comprising: determining positions of a plurality of
cells based on one or more images of the cells; for at least some
of the plurality of cells, generating a matrix comprising
two-dimensional information about positions of neighboring cells,
and determining one or more numerical features based on the
information in the matrix; and classifying the at least some of the
plurality of cells as belonging to at least one of multiple classes
based on the numerical features.
2. The method of claim 1, wherein determining positions of a
plurality of cells comprises determining positions of nuclei of the
cells.
3. The method of claim 2, wherein the positions of the nuclei are
determined automatically from the one or more images.
4. The method of claim 2, wherein the matrix comprises information
about positions of the nuclei of neighboring cells relative to the
nucleus of a cell corresponding to the matrix.
5. The method of claim 4, wherein the matrix comprises information
about positions of the nuclei of neighboring cells as a function of
distance between the neighboring nuclei and the nucleus of the cell
corresponding to the matrix.
6. The method of claim 4, wherein the matrix comprises information
about positions of the nuclei of neighboring cells as a function of
angular orientation relative to the nucleus of the cell
corresponding to the matrix.
7. The method of claim 4, wherein the matrix comprises information
about positions of the nuclei of neighboring cells as a function of
distance along a first coordinate axis between the neighboring
nuclei and the nucleus of the cell corresponding to the matrix, and
wherein the matrix further comprises information about positions of
the neighboring cells as a function of distance along a second
coordinate axis between the neighboring nuclei and the nucleus of
the cell corresponding to the matrix, the second coordinate axis
being orthogonal to the first coordinate axis.
8. The method of claim 1, wherein determining positions of a
plurality of cells comprises determining positions of one or more
structural features of the cells.
9. The method of claim 8, wherein the one or more structural
features comprises a cellular membrane.
10. The method of claim 8, wherein the one or more structural
features comprises two or more structural features.
11. The method of claim 1, wherein the two-dimensional information
comprises position information as a function of two quantities, and
wherein the matrix comprises a first dimension corresponding to one
of the quantities and a second dimension corresponding to the other
of the quantities.
12. The method of claim 11, wherein the two-dimensional information
varies as a function of one of the quantities along the first
dimension of the matrix, and the two-dimensional information varies
as a function of the other quantity along the second dimension of
the matrix.
13. The method of claim 1, wherein the at least one of multiple
classes comprises two classes.
14. The method of claim 13, wherein the two classes correspond to
cancerous cells and non-cancerous cells.
15. The method of claim 1, wherein the at least one of multiple
classes comprises more than two classes.
16. The method of claim 1, wherein the one or more numerical
features comprises at least one one-dimensional feature derived
from a distribution of the positions of neighboring cells.
17. The method of claim 16, wherein the distribution is derived
from the elements of the matrix.
18. The method of claim 17, wherein the distribution is derived by
summing elements along one dimension of the matrix.
19. The method of claim 16, wherein the at least one
one-dimensional feature is derived from a distribution of the
positions of neighboring cells as a function of distance between
the neighboring cells and a cell corresponding to the matrix.
20. The method of claim 16, wherein the at least one
one-dimensional feature is derived from a distribution of the
positions of neighboring cells as a function of angular orientation
relative to a cell corresponding to the matrix.
21. The method of claim 16, wherein the at least one
one-dimensional feature comprises a mean of the positions of
neighboring cells.
22. The method of claim 16, wherein the at least one
one-dimensional feature comprises a standard deviation of the
positions of neighboring cells.
23. The method of claim 16, wherein the at least one
one-dimensional feature comprises a median of the positions of
neighboring cells.
24. The method of claim 16, wherein the at least one
one-dimensional feature comprises a mode of the positions of
neighboring cells.
25. The method of claim 1, wherein the one or more numerical
features comprises at least one two-dimensional feature derived
from a distribution of the positions of neighboring cells.
26. The method of claim 25, wherein the distribution is derived
from the elements of the matrix.
27. The method of claim 25, wherein the at least one
two-dimensional feature is derived from a distribution of the
positions of neighboring cells as a function of distance between
the neighboring cells and a cell corresponding to the matrix, and
as a function of angular orientation of the neighboring cells
relative to the cell corresponding to the matrix.
28. The method of claim 25, wherein the at least one
two-dimensional feature is derived from a distribution of the
positions of neighboring cells as a function of distance between
the neighboring cells and a cell corresponding to the matrix along
a first coordinate direction, and as a function of distance between
the neighboring cells and the cell corresponding to the matrix
along a second coordinate direction orthogonal to the first
coordinate direction.
29. The method of claim 25, wherein the at least one
two-dimensional feature comprises a measure of entropy based on the
distribution of the positions of neighboring cells.
30. The method of claim 25, wherein the at least one
two-dimensional feature comprises a measure of uniformity based on
the distribution of the positions of neighboring cells.
31. The method of claim 25, wherein the at least one
two-dimensional feature comprises a measure of density based on the
distribution of the positions of neighboring cells.
32. The method of claim 1, wherein the one or more numerical
features comprises at least one one-dimensional feature and at
least one two-dimensional feature, the features being derived from
a distribution of the positions of neighboring cells.
33. The method of claim 32, wherein the at least one
one-dimensional feature and the at least one two-dimensional
feature are derived from a distribution of the positions of
neighboring cells, as a function of relative distance and angular
orientation between the neighboring cells and a cell that
corresponds to the matrix.
34. The method of claim 32, wherein the at least one
one-dimensional feature and the at least one two-dimensional
feature are derived from a distribution of the positions of
neighboring cells, as a function of relative distance along each of
two orthogonal coordinate directions, between the neighboring cells
and a cell that corresponds to the matrix.
35. The method of claim 32, wherein the at least one
one-dimensional feature comprises at least one of a mean, a
standard deviation, a median, and a mode of a distribution of the
positions of neighboring cells, as a function of distance between
the neighboring cells and a cell that corresponds to the
matrix.
36. The method of claim 32, wherein the at least one
one-dimensional feature comprises at least one of a mean, a
standard deviation, a median, and a mode of a distribution of the
positions of neighboring cells, as a function of an angular
orientation of the neighboring cells relative to a cell that
corresponds to the matrix.
37. The method of claim 32, wherein the at least one
two-dimensional feature comprises at least one of a measure of
entropy, a measure of uniformity, and a measure of density, based
on the distribution of the positions of neighboring cells.
38. The method of claim 1, further comprising determining pixel
texture information for the at least some of the plurality of
cells, and classifying the at least some of the plurality of cells
based on the pixel texture information.
39. The method of claim 38, wherein the pixel texture information
comprises first-order pixel texture information.
40. The method of claim 39, wherein the first-order pixel texture
information comprises one or more of a mean, a median, a mode, a
standard deviation, and a surface area that are determined based on
intensity values of pixels in regions of the one or more images
that correspond to the cells.
41. The method of claim 38, wherein the pixel texture information
comprises second-order pixel texture information.
42. The method of claim 41, wherein the second-order pixel texture
information comprises one or more of a measure of entropy, a
measure of uniformity, and a measure of density that are determined
based on intensity values of pixels in regions of the one or more
images that correspond to the cells.
43. The method of claim 1, wherein the one or more images of the
cells are derived from a set of multispectral sample images.
44. The method of claim 43, wherein the set of multispectral sample
images are spectrally unmixed to produce the one or more images of
the cells.
45. The method of claim 1, wherein the one or more images of the
cells are derived from a set of red-green-blue (RGB) sample
images.
46. The method of claim 45, wherein the set of RGB sample images
comprises a single RGB image.
47. The method of claim 45, wherein the set of RGB sample images
comprises two or more RGB images.
48. The method of claim 45, wherein the set of RGB sample images
are spectrally unmixed to produce the one or more images of the
cells.
49. The method of claim 45, wherein the set of RGB sample images
are decomposed to produce the one or more images of the cells
without spectral unmixing.
50. The method of claim 49, wherein the set of RGB sample images
are decomposed to produce the one or more images of the cells, and
wherein the decomposition comprises optical density conversion of
the set of RGB sample images.
51. The method of claim 1, wherein the one or more images of the
cells comprises a single image derived from a set of multispectral
sample images.
52. The method of claim 1, wherein the one or more images of the
cells comprises a single image derived from a set of RGB sample
images.
53. A method, comprising: determining positions of a plurality of
cells based on one or more images of the cells; for at least some
of the plurality of cells, determining a distribution of
neighboring cells as a function of relative angular orientation of
the neighboring cells, and determining one or more numerical
features from the distribution; and classifying the at least some
of the plurality of cells as belonging to at least one of multiple
classes based on the numerical features.
54. The method of claim 53, wherein determining positions of a
plurality of cells comprises determining positions of nuclei of the
cells.
55. The method of claim 54, wherein the positions of the nuclei are
determined automatically from the one or more images.
56. The method of claim 53, wherein determining positions of a
plurality of cells comprises determining positions of one or more
structural features of the cells.
57. The method of claim 53, further comprising determining a second
distribution of neighboring cells as a function of relative
distance to the neighboring cells, determining one or more
numerical features from the second distribution, and classifying
the at least some of the plurality of cells based on numerical
features determined from the second distribution.
58. The method of claim 53, further comprising, for each of the at
least some of the plurality of cells, generating a matrix that
comprises information about the relative angular orientation of
neighboring cells.
59. The method of claim 58, wherein the matrix further comprises
information about relative distance to the neighboring cells.
60. The method of claim 58, wherein the distribution of neighboring
cells as a function of relative angular orientation of the
neighboring cells is determined from elements of the matrix.
61. The method of claim 53, wherein the at least one of multiple
classes comprises two classes.
62. The method of claim 61, wherein the two classes correspond to
cancerous cells and non-cancerous cells.
63. The method of claim 53, wherein the at least one of multiple
classes comprises more than two classes.
64. The method of claim 53, wherein the one or more numerical
features comprises a mean of the positions of neighboring cells as
a function of the relative angular orientation of the neighboring
cells.
65. The method of claim 53, wherein the one or more numerical
features comprises a standard deviation of the positions of
neighboring cells as a function of the relative angular orientation
of the neighboring cells.
66. The method of claim 53, wherein the one or more numerical
features comprises a median of the positions of neighboring cells
as a function of the relative angular orientation of the
neighboring cells.
67. The method of claim 53, wherein the one or more numerical
features comprises a mode of the positions of neighboring cells as
a function of the relative angular orientation of the neighboring
cells.
68. The method of claim 57, wherein the one or more numerical
features determined from the second distribution comprises a mean
of the positions of neighboring cells as a function of the relative
distance to the neighboring cells.
69. The method of claim 57, wherein the one or more numerical
features determined from the second distribution comprises a
standard deviation of the positions of neighboring cells as a
function of the relative distance to the neighboring cells.
70. The method of claim 57, wherein the one or more numerical
features determined from the second distribution comprises a median
of the positions of neighboring cells as a function of the relative
distance to the neighboring cells.
71. The method of claim 57, wherein the one or more numerical
features determined from the second distribution comprises a mode
of the positions of neighboring cells as a function of the relative
distance to the neighboring cells.
72. The method of claim 53, further comprising, for each of the at
least some of the plurality of cells, determining one or more
numerical features from a two-dimensional distribution of positions
of neighboring cells, and classifying the at least some of the
plurality of cells based on the one or more numerical features
determined from the two-dimensional distribution.
73. The method of claim 72, wherein the one or more numerical
features determined from the two-dimensional distribution comprises
a measure of entropy.
74. The method of claim 72, wherein the one or more numerical
features determined from the two-dimensional distribution comprises
a measure of uniformity.
75. The method of claim 72, wherein the one or more numerical
features determined from the two-dimensional distribution comprises
a measure of density.
76. The method of claim 53, further comprising determining pixel
texture information for the at least some of the plurality of
cells, and classifying the at least some of the plurality of cells
based on the pixel texture information.
77. The method of claim 76, wherein the pixel texture information
comprises first-order pixel texture information.
78. The method of claim 77, wherein the first-order pixel texture
information comprises one or more of a mean, a median, a mode, a
standard deviation, and a surface area that are determined based on
intensity values of pixels in regions of the one or more images
that correspond to the cells.
79. The method of claim 76, wherein the pixel texture information
comprises second-order pixel texture information.
80. The method of claim 79, wherein the second-order pixel texture
information comprises one or more of a measure of entropy, a
measure of uniformity, and a measure of density that are determined
based on intensity values of pixels in regions of the one or more
images that correspond to the cells.
81. The method of claim 53, wherein the one or more images of the
cells are derived from a set of multispectral sample images.
82. The method of claim 81, wherein the set of multispectral sample
images are spectrally unmixed to produce the one or more images of
the cells.
83. The method of claim 53, wherein the one or more images of the
cells are derived from a set of red-green-blue (RGB) sample
images.
84. The method of claim 83, wherein the set of RGB sample images
comprises a single RGB image.
85. The method of claim 83, wherein the set of RGB sample images
comprises two or more RGB images.
86. The method of claim 83, wherein the set of RGB sample images
are spectrally unmixed to produce the one or more images of the
cells.
87. The method of claim 83, wherein the set of RGB sample images
are decomposed to produce the one or more images of the cells
without spectral unmixing.
88. The method of claim 87, wherein the set of RGB sample images
are decomposed to produce the one or more images of the cells, and
wherein the decomposition comprises optical density conversion of
the set of RGB sample images.
89. The method of claim 53, wherein the one or more images of the
cells comprises a single image derived from a set of multispectral
sample images.
90. The method of claim 53, wherein the one or more images of the
cells comprises a single image derived from a set of RGB sample
images.
91. An apparatus, comprising: an imaging system configured to
obtain one or more images of a sample comprising cells; and an
electronic processor configured to: determine positions of a
plurality of cells in the sample based on the one or more images of
the sample; for at least some of the plurality of cells, generate a
matrix comprising two-dimensional information about positions of
neighboring cells, and determine one or more numerical features
based on the information in the matrix; and classify the at least
some of the plurality of cells as belonging to at least one of
multiple classes based on the numerical features.
92. An apparatus, comprising: an imaging system configured to
obtain one or more images of a sample comprising cells; and an
electronic processor configured to: determine positions of a
plurality of cells in the sample based on the one or more images of
the sample; for at least some of the plurality of cells, determine
a distribution of neighboring cells as a function of relative
angular orientation of the neighboring cells, and determine one or
more numerical features from the distribution; and classify the at
least some of the plurality of cells as belonging to at least one
of multiple classes based on the numerical features.
93. A computer program product configured to cause an electronic
processor to: determine positions of a plurality of cells in a
sample based on one or more images of the sample; for at least some
of the plurality of cells, generate a matrix comprising
two-dimensional information about positions of neighboring cells,
and determine one or more numerical features based on the
information in the matrix; and classify the at least some of the
plurality of cells as belonging to at least one of multiple classes
based on the numerical features.
94. A computer program product configured to cause an electronic
processor to: determine positions of a plurality of cells in a
sample based on one or more images of the sample; for at least some
of the plurality of cells, determine a distribution of neighboring
cells as a function of relative angular orientation of the
neighboring cells, and determine one or more numerical features
from the distribution; and classify the at least some of the
plurality of cells as belonging to at least one of multiple classes
based on the numerical features.
Description
TECHNICAL FIELD
This disclosure relates to classification of biological samples,
and in particular, to classification of disease states in
cells.
BACKGROUND
Manual inspection and classification of biological samples can be
both time consuming and prone to errors that arise from the
subjective judgment of a human technician. As an alternative,
automated classification systems can be used to examine biological
samples such as tissue biopsies to provide information for clinical
diagnosis and treatment.
SUMMARY
In general, in a first aspect, the disclosure features a method
that includes: (a) determining positions of a plurality of cells
based on one or more images of the cells; (b) for at least some of
the plurality of cells, generating a matrix that includes
two-dimensional information about positions of neighboring cells,
and determining one or more numerical features based on the
information in the matrix; and (c) classifying the at least some of
the plurality of cells as belonging to at least one of multiple
classes based on the numerical features.
Embodiments of the method can include one or more of the following
features.
Determining positions of a plurality of cells can include
determining positions of nuclei of the cells. The positions of the
nuclei can be determined automatically from the one or more
images.
The matrix can include information about positions of the nuclei of
neighboring cells relative to the nucleus of a cell corresponding
to the matrix. The matrix can include information about positions
of the nuclei of neighboring cells as a function of distance
between the neighboring nuclei and the nucleus of the cell
corresponding to the matrix. Alternatively, or in addition, the
matrix can include information about positions of the nuclei of
neighboring cells as a function of angular orientation relative to
the nucleus of the cell corresponding to the matrix.
The matrix can include information about positions of the nuclei of
neighboring cells as a function of distance along a first
coordinate axis between the neighboring nuclei and the nucleus of
the cell corresponding to the matrix, and the matrix can include
information about positions of the neighboring cells as a function
of distance along a second coordinate axis between the neighboring
nuclei and the nucleus of the cell corresponding to the matrix, the
second coordinate axis being orthogonal to the first coordinate
axis.
Determining positions of a plurality of cells can include
determining positions of one or more structural features of the
cells. The one or more structural features can include a cellular
membrane. The one or more structural features can include two or
more structural features.
The two-dimensional information can include position information as
a function of two quantities, where the matrix includes a first
dimension corresponding to one of the quantities and a second
dimension corresponding to the other of the quantities. The
two-dimensional information can vary as a function of one of the
quantities along the first dimension of the matrix, and the
two-dimensional information can vary as a function of the other
quantity along the second dimension of the matrix.
The at least one of multiple classes can include two classes. The
two classes can correspond to cancerous cells and non-cancerous
cells.
The at least one of multiple classes can include more than two
classes.
The one or more numerical features can include at least one
one-dimensional feature derived from a distribution of the
positions of neighboring cells. The distribution can be derived
from the elements of the matrix. For example, the distribution can
be derived by summing elements along one dimension of the
matrix.
The at least one one-dimensional feature can be derived from a
distribution of the positions of neighboring cells as a function of
distance between the neighboring cells and a cell corresponding to
the matrix. The at least one one-dimensional features can be
derived from a distribution of the positions of neighboring cells
as a function of angular orientation relative to a cell
corresponding to the matrix.
The at least one one-dimensional feature can include a mean of the
positions of neighboring cells. Alternatively, or in addition, the
at least one one-dimensional feature can include a standard
deviation of the positions of neighboring cells. Alternatively, or
in addition, the at least one one-dimensional feature can include a
median of the positions of neighboring cells. Alternatively, or in
addition, the at least one one-dimensional feature can include a
mode of the positions of neighboring cells.
The one or more numerical features can include at least one
two-dimensional feature derived from a distribution of the
positions of neighboring cells. The distribution can be derived
from the elements of the matrix.
The at least one two-dimensional feature can be derived from a
distribution of the positions of neighboring cells as a function of
distance between the neighboring cells and a cell corresponding to
the matrix, and as a function of angular orientation of the
neighboring cells relative to the cell corresponding to the
matrix.
The at least one two-dimensional feature can be derived from a
distribution of the positions of neighboring cells as a function of
distance between the neighboring cells and a cell corresponding to
the matrix along a first coordinate direction, and as a function of
distance between the neighboring cells and the cell corresponding
to the matrix along a second coordinate direction orthogonal to the
first coordinate direction.
The at least one two-dimensional feature can include a measure of
entropy based on the distribution of the positions of neighboring
cells. Alternatively, or in addition, the at least one
two-dimensional feature can include a measure of uniformity based
on the distribution of the positions of neighboring cells.
Alternatively, or in addition, the at least one two-dimensional
feature can include a measure of density based on the distribution
of the positions of neighboring cells.
The one or more numerical features can include at least one
one-dimensional feature and at least one two-dimensional feature,
the features being derived from a distribution of the positions of
neighboring cells. The at least one one-dimensional feature and the
at least one two-dimensional feature can be derived from a
distribution of the positions of neighboring cells, as a function
of relative distance and angular orientation between the
neighboring cells and a cell that corresponds to the matrix.
Alternatively, or in addition, the at least one one-dimensional
feature and the at least one two-dimensional feature can be derived
from a distribution of the positions of neighboring cells, as a
function of relative distance along each of two orthogonal
coordinate directions, between the neighboring cells and a cell
that corresponds to the matrix. The at least one one-dimensional
feature can include at least one of a mean, a standard deviation, a
median, and a mode of a distribution of the positions of
neighboring cells, as a function of distance between the
neighboring cells and a cell that corresponds to the matrix.
Alternatively, or in addition, the at least one one-dimensional
feature can include at least one of a mean, a standard deviation, a
median, and a mode of a distribution of the positions of
neighboring cells, as a function of an angular orientation of the
neighboring cells relative to a cell that corresponds to the
matrix. The at least one two-dimensional feature can include at
least one of a measure of entropy, a measure of uniformity, and a
measure of density, based on the distribution of the positions of
neighboring cells.
The method can include determining pixel texture information for
the at least some of the plurality of cells, and classifying the at
least some of the plurality of cells based on the pixel texture
information. The pixel texture information can include first-order
pixel texture information. The first-order pixel texture
information can include one or more of a mean, a median, a mode, a
standard deviation, and a surface area that are determined based on
intensity values of pixels in regions of the one or more images
that correspond to the cells.
The pixel texture information can include second-order pixel
texture information. The second-order pixel texture information can
include one or more of a measure of entropy, a measure of
uniformity, and a measure of density that are determined based on
intensity values of pixels in regions of the one or more images
that correspond to the cells.
The one or more images of the cells can be derived from a set of
multispectral sample images. The set of multispectral sample images
can be spectrally unmixed to produce the one or more images of the
cells.
The one or more images of the cells can be derived from a set of
red-green-blue (RGB) sample images. The set of RGB sample images
can include a single RGB image. Alternatively, the set of RGB
sample images can include two or more RGB images. The set of RGB
sample images can be spectrally unmixed to produce the one or more
images of the cells. The set of RGB sample images can be decomposed
to produce the one or more images of the cells without spectral
unmixing. The set of RGB sample images can be decomposed to produce
the one or more images of the cells, and the decomposition can
include optical density conversion of the set of RGB sample
images.
The one or more images of the cells can include a single image
derived from a set of multispectral sample images. The one or more
images of the cells can include a single image derived from a set
of RGB sample images.
The method can also include any of the other steps and/or features
disclosed herein, as appropriate.
In another aspect, the disclosure features a method that includes:
(a) determining positions of a plurality of cells based on one or
more images of the cells; (b) for at least some of the plurality of
cells, determining a distribution of neighboring cells as a
function of relative angular orientation of the neighboring cells,
and determining one or more numerical features from the
distribution; and (c) classifying the at least some of the
plurality of cells as belonging to at least one of multiple classes
based on the numerical features.
Embodiments of the method can include one or more of the following
features.
Determining positions of a plurality of cells can include
determining positions of nuclei of the cells. The positions of the
nuclei can be determined automatically from the one or more
images.
Determining positions of a plurality of cells can include
determining positions of one or more structural features of the
cells.
The method can include determining a second distribution of
neighboring cells as a function of relative distance to the
neighboring cells, determining one or more numerical features from
the second distribution, and classifying the at least some of the
plurality of cells based on numerical features determined from the
second distribution.
The method can include, for each of the at least some of the
plurality of cells, generating a matrix that includes information
about the relative angular orientation of neighboring cells. The
matrix can include information about relative distance to the
neighboring cells.
The distribution of neighboring cells as a function of relative
angular orientation of the neighboring cells can be determined from
elements of the matrix.
The at least one of multiple classes can include two classes. The
two classes can correspond to cancerous cells and non-cancerous
cells.
The at least one of multiple classes can include more than two
classes.
The one or more numerical features can include a mean of the
positions of neighboring cells as a function of the relative
angular orientation of the neighboring cells. Alternatively, or in
addition, the one or more numerical features can include a standard
deviation of the positions of neighboring cells as a function of
the relative angular orientation of the neighboring cells.
Alternatively, or in addition, the one or more numerical features
can include a median of the positions of neighboring cells as a
function of the relative angular orientation of the neighboring
cells. Alternatively, or in addition, the one or more numerical
features can include a mode of the positions of neighboring cells
as a function of the relative angular orientation of the
neighboring cells.
The one or more numerical features determined from the second
distribution can include a mean of the positions of neighboring
cells as a function of the relative distance to the neighboring
cells. Alternatively, or in addition, the one or more numerical
features determined from the second distribution can include a
standard deviation of the positions of neighboring cells as a
function of the relative distance to the neighboring cells.
Alternatively, or in addition, the one or more numerical features
determined from the second distribution can include a median of the
positions of neighboring cells as a function of the relative
distance to the neighboring cells. Alternatively, or in addition,
the one or more numerical features determined from the second
distribution can include a mode of the positions of neighboring
cells as a function of the relative distance to the neighboring
cells.
The method can include, for each of the at least some of the
plurality of cells, determining one or more numerical features from
a two-dimensional distribution of positions of neighboring cells,
and classifying the at least some of the plurality of cells based
on the one or more numerical features determined from the
two-dimensional distribution. The one or more numerical features
determined from the two-dimensional distribution can include a
measure of entropy. Alternatively, or in addition, the one or more
numerical features determined from the two-dimensional distribution
can include a measure of uniformity. Alternatively, or in addition,
the one or more numerical features determined from the
two-dimensional distribution can include a measure of density.
The method can include determining pixel texture information for
the at least some of the plurality of cells, and classifying the at
least some of the plurality of cells based on the pixel texture
information. The pixel texture information can include first-order
pixel texture information. The first-order pixel texture
information can include one or more of a mean, a median, a mode, a
standard deviation, and a surface area that are determined based on
intensity values of pixels in regions of the one or more images
that correspond to the cells.
The pixel texture information can include second-order pixel
texture information. The second-order pixel texture information can
include one or more of a measure of entropy, a measure of
uniformity, and a measure of density that are determined based on
intensity values of pixels in regions of the one or more images
that correspond to the cells.
The one or more images of the cells can be derived from a set of
multispectral sample images. The set of multispectral sample images
can be spectrally unmixed to produce the one or more images of the
cells.
The one or more images of the cells can be derived from a set of
red-green-blue (RGB) sample images. The set of RGB sample images
can include a single RGB image. Alternatively, the set of RGB
sample images can include two or more RGB images.
The set of RGB sample images can be spectrally unmixed to produce
the one or more images of the cells.
The set of RGB sample images can be decomposed to produce the one
or more images of the cells without spectral unmixing. The set of
RGB sample images can be decomposed to produce the one or more
images of the cells, and the decomposition can include optical
density conversion of the set of RGB sample images.
The one or more images of the cells can include a single image
derived from a set of multispectral sample images. The one or more
images of the cells can include a single image derived from a set
of RGB sample images.
The method can also include any of the other steps and/or features
disclosed herein, as appropriate.
In a further aspect, the disclosure features an apparatus that
includes an imaging system configured to obtain one or more images
of a sample that includes cells, and an electronic processor
configured to: (a) determine positions of a plurality of cells in
the sample based on the one or more images of the sample; (b) for
at least some of the plurality of cells, generate a matrix that
includes two-dimensional information about positions of neighboring
cells, and determine one or more numerical features based on the
information in the matrix; and (c) classify the at least some of
the plurality of cells as belonging to at least one of multiple
classes based on the numerical features.
Embodiments of the apparatus can include any of the features
disclosed herein, as appropriate.
In another aspect, the disclosure features an apparatus that
includes an imaging system configured to obtain one or more images
of a sample that includes cells, and an electronic processor
configured to: (a) determine positions of a plurality of cells in
the sample based on the one or more images of the sample; (b) for
at least some of the plurality of cells, determine a distribution
of neighboring cells as a function of relative angular orientation
of the neighboring cells, and determine one or more numerical
features from the distribution; and (c) classify the at least some
of the plurality of cells as belonging to at least one of multiple
classes based on the numerical features.
Embodiments of the apparatus can include any of the features
disclosed herein, as appropriate.
In a further aspect, the disclosure features a computer program
product configured to cause an electronic processor to: (a)
determine positions of a plurality of cells in a sample based on
one or more images of the sample; (b) for at least some of the
plurality of cells, generate a matrix that includes two-dimensional
information about positions of neighboring cells, and determine one
or more numerical features based on the information in the matrix;
and (c) classify the at least some of the plurality of cells as
belonging to at least one of multiple classes based on the
numerical features.
Embodiments of the computer program product can include any of the
features and/or steps disclosed herein, as appropriate.
In another aspect, the disclosure features a computer program
product configured to cause an electronic processor to: (a)
determine positions of a plurality of cells in a sample based on
one or more images of the sample; (b) for at least some of the
plurality of cells, determine a distribution of neighboring cells
as a function of relative angular orientation of the neighboring
cells, and determine one or more numerical features from the
distribution; and (c) classify the at least some of the plurality
of cells as belonging to at least one of multiple classes based on
the numerical features.
Embodiments of the computer program product can include any of the
features and/or steps disclosed herein, as appropriate.
Unless otherwise defined, all technical and scientific terms used
herein have the same meaning as commonly understood by one of
ordinary skill in the art to which this disclosure belongs.
Although methods and materials similar or equivalent to those
described herein can be used in the practice or testing of the
present disclosure, suitable methods and materials are described
below. All publications, patent applications, patents, and other
references mentioned herein are incorporated by reference in their
entirety. In case of conflict, the present specification, including
definitions, will control. In addition, the materials, methods, and
examples are illustrative only and not intended to be limiting.
The details of one or more embodiments are set forth in the
accompanying drawings and the description below. Other features and
advantages will be apparent from the description, drawings, and
claims.
DESCRIPTION OF DRAWINGS
FIG. 1 is a schematic diagram showing a position matrix for a
nucleus.
FIG. 2A is an image of a sample where regions of the image have
been classified by an automated classification system.
FIG. 2B is an image of the same sample shown in FIG. 2A, where
regions of the image have been classified by a manually supervised
classification system.
FIG. 3 is a receiver operator characteristic curve showing a
relationship between a true positive classification rate and a
false positive classification rate for automated classification of
a sample.
FIG. 4 is a graph showing classification accuracy as a function of
cutoff for automated classification of a sample.
FIG. 5 is an image of a tissue sample that has been treated with a
DAB immunohistochemical (IHC) assay and counterstained with
hematoxylin.
FIG. 6 is an image of a second tissue sample treated with a DAB
IHC assay and counterstained with hematoxylin.
FIG. 7A is an image of a tissue sample showing cancerous regions
identified by a pixel texture-based classifier.
FIG. 7B is an image of the same tissue sample as in FIG. 7A showing
cancerous regions identified by a technician.
FIG. 8 is a receiver operator characteristic curve derived from
classification of sample images using a classifier built with both
relative nuclear position-based features and texture-based
features.
FIG. 9 is a chart showing relative contributions of 14 different
features to classifier accuracy and to a splitting metric.
FIG. 10 is a chart showing relative contributions of nine different
features to classifier accuracy and to a splitting metric.
Like reference symbols in the various drawings indicate like
elements.
DETAILED DESCRIPTION
In certain assessment protocols, biological samples are stained
with one or more stains, and then images of the stained samples are
used to classify regions of the sample into various classes
according to criteria such as spectral properties of the various
regions and local variations in image intensity. Suitable methods
for acquiring spectral images of samples and classifying the
samples according to such criteria are disclosed, for example, in:
U.S. patent application Ser. No. 11/342,272 entitled "CLASSIFYING
IMAGE FEATURES" by Richard Levenson et al., filed on Jan. 27, 2006,
and published as U.S. Patent Publication No. US 2006/0245631; and
U.S. patent application Ser. No. 11/861,060 entitled "SAMPLE
IMAGING AND CLASSIFICATION" by Richard Levenson et al., filed on
Sep. 25, 2007. The entire contents of each of the foregoing patent
applications are incorporated herein by reference.
When a biological sample includes a plurality of cells that are to
be classified, the classification can be based on certain
structural arrangements of the cells within the tissue. This
structural information can supplement, and in some embodiments replace, information derived directly from variations in the spatial and/or spectral intensities of individual pixels in images of the sample.
In the methods disclosed herein, structural information derived
from the determination of the positions of cellular nuclei in a
biological tissue sample can be used to classify individual cells
as belonging to one of a variety of classes (e.g., cancerous or
non-cancerous), and the classification results can then be used for
clinical diagnosis and treatment applications. This classification
can be useful on its own, or can be used as an aid to a technician,
e.g., by directing the technician's attention to particular
structures of interest in the sample, and/or by making a
provisional assessment of the sample that is later reviewed and
approved or rejected by the technician.
In some embodiments, information derived from the arrangement of
nuclear positions can be used to classify cells. This nuclear
tissue architecture information provides quantitative measures of
patterns and shapes formed by the distribution of nuclei within the
sample. It is important to distinguish the nuclear tissue
architecture information, which derives from statistics of the
relative positions of nuclei in the sample, from ordinary "pixel
texture" information, which derives from the statistics of
intensity values at regular locations in a pixel grid.
Information derived from nuclear position measurements can include
both one-dimensional and two-dimensional statistics regarding the
arrangement of nuclei within the sample. One-dimensional
information can include, for example, any information about the
distribution of neighboring nuclei relative to the nucleus of a
cell to be classified, for which the distribution can be expressed
as a function of a single variable; this variable can be the
distance between nuclei and the cell to be classified, or the
angular orientation of imaginary line segments that connect the
nuclei to that cell. Measures which can be derived from a
one-dimensional distribution include statistical measurements of
the distance distribution or of the angular distribution, such as
the mean, median, mode, standard deviation, uniformity, and/or
other statistical indices.
Two-dimensional information can include, for example, any
information about the distribution of neighboring nuclei relative
to the nucleus of a cell to be classified, for which the
distribution can be expressed as a function of two variables; for
example, the variables can be the distance between nuclei and the
cell to be classified, and the angular orientation of imaginary
line segments that connect the nuclei to that cell. Two-dimensional
information can be represented by a position matrix determined for
a cell, which describes the relative position of the surrounding
nuclei in terms of their angular and positional distribution. In
some embodiments, relative nuclear position information can be
expressed in Cartesian terms (e.g., displacement in X and
displacement in Y directions). Suitable "nuclear texture" measures
can include a position matrix showing the distribution of distances
to neighboring nuclei and angular orientations of neighboring
nuclei, from which statistical measures can be derived.
The measures derived from relative nuclear position distributions
can then be used as input to an automated classifier that assigns
each cell to a class based on one-dimensional information,
two-dimensional information, or a combination of one-dimensional
and two-dimensional information.
Information derived from nuclear position can also be combined with
other information about the image such as pixel texture,
brightness, and other information, and classification can be based
on the combined information to obtain better classification
accuracy than is possible with either information set used
alone.
Although the present examples are concerned with nuclear position,
and use nuclear position to provide an estimate of cell location,
other estimates of cell location can also be used in place of, or
in addition to, nuclear position, for particular applications. For
example, in samples where cell membranes are visible, membrane locations can be used to develop position matrices in addition to, or instead of, the nuclear positions described in the present examples.
The methods disclosed herein include derivation of information
based on nuclear positions in sample images; provided the staining
procedures are controlled well enough that
nuclear positions can be accurately determined from the images, the
derived information is typically not substantially affected by
modest variations in staining or tissue preparation. Classification
of cells using position information (e.g., nuclear position
information) can therefore be less sensitive to variations in
staining procedures than classification methods that rely more
heavily on staining density and/or on spectral properties of
regions of a sample.
As a first step in the classification of cells using nuclear
position information, one or more images of the sample are acquired
using an optical imaging system. In some embodiments, the set of
acquired images includes multispectral images, and the images are
spectrally unmixed to obtain a new set of images. Typically, one or
more members of the new set of images is then selected for
analysis. For example, the images can be spectrally unmixed to
decompose the measured images into a new set of images, each of
which corresponds to a single spectral contribution such as a
specific stain applied to the sample (e.g., hematoxylin), or to
another component such as an autofluorescence component. Methods
for spectral unmixing are disclosed, for example, in: U.S. patent
application Ser. No. 10/669,101 entitled "SPECTRAL IMAGING OF DEEP
TISSUE" by Richard Levenson et al., filed on Sep. 23, 2003, and
published as U.S. Patent Publication No. US 2005/0065440; and in
PCT Patent Application No. PCT/US2004/031609 entitled "SPECTRAL
IMAGING OF BIOLOGICAL SAMPLES" by Richard Levenson et al., filed on
Sep. 23, 2004, and published as PCT Patent Publication No.
WO2005/040769. The entire contents of each of the foregoing patent
applications are incorporated herein by reference.
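The cited applications describe the unmixing methods in detail; purely as a rough illustration of the general technique, the sketch below solves a per-pixel non-negative least-squares problem, assuming the component ("endmember") spectra are known in advance. The function and variable names here are illustrative, not taken from the cited applications.

```python
import numpy as np
from scipy.optimize import nnls  # non-negative least squares

def unmix(cube, endmembers):
    """cube: (H, W, B) multispectral image stack with B spectral bands.
    endmembers: (B, K) matrix whose columns are the K component spectra
    (e.g., one per stain, plus autofluorescence).
    Returns an (H, W, K) stack of component abundance images."""
    h, w, b = cube.shape
    k = endmembers.shape[1]
    out = np.zeros((h, w, k))
    for i in range(h):
        for j in range(w):
            # Solve min ||E a - s||^2 subject to a >= 0 for pixel spectrum s.
            out[i, j], _ = nnls(endmembers, cube[i, j])
    return out
```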
Alternatively, conventional color (RGB) images can be decomposed
into components using an estimate of the color of each stain,
preferably using techniques of optical density conversion to
approximately linearize the effect of the various stains, and
reduce interaction (spectral cross-talk) effects. In some
embodiments, a single monochrome or color image can be used without
color decomposition or spectral unmixing steps.
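For the RGB route, a minimal sketch of the optical density conversion step, assuming the Beer-Lambert relation OD = -log10(I/I0) with a white level I0 of 255 for 8-bit images; the stain color matrix M mentioned in the closing comment is hypothetical.

```python
import numpy as np

def to_optical_density(rgb, i0=255.0):
    """Convert an 8-bit transmitted-light RGB image to optical density.
    OD = -log10(I / I0) is approximately linear in stain concentration,
    which reduces interaction (spectral cross-talk) between stains."""
    i = np.maximum(rgb.astype(float), 1.0)  # clamp to avoid log(0)
    return -np.log10(i / i0)

# With an estimated 3 x K matrix M whose columns are the unit OD color
# vectors of the K stains (M is hypothetical here), per-pixel stain
# amounts follow from a least-squares solve:
#   amounts = np.linalg.pinv(M) @ od_pixel
```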
Following the selection of one or more suitable images, the selected images are analyzed to determine the positions of the nuclei.
In some embodiments, for example, the nuclear positions can be
determined manually by a system operator (e.g., via a computer
display that shows the selected image(s) and permits the operator
to indicate positions of individual nuclei). In certain
embodiments, identification of nuclear positions is performed
automatically by a computer-based algorithm (e.g., a formula-based
algorithm, or a machine-learning algorithm such as a trained neural
network and/or a genetic algorithm). Nuclear positions are
determined for all cells within a selected region of interest of
the image to be analyzed. When the positions of the nuclei have
been determined, each nucleus within the region of interest is
chosen in turn and the distribution of its neighboring nuclei is
assessed.
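The text leaves the automated algorithm open (formula-based or machine-learning); as one simple stand-in, the sketch below thresholds a nuclear-counterstain image plane and takes connected-component centroids. The threshold parameter and the choice of plane are assumptions of this sketch.

```python
from scipy import ndimage

def nucleus_positions(nuclear_plane, threshold):
    """Locate nuclei by thresholding a nuclear-counterstain image plane
    (e.g., an unmixed hematoxylin component) and taking the centroid of
    each connected component. Returns a list of (y, x) positions."""
    mask = nuclear_plane > threshold
    labels, n = ndimage.label(mask)
    return ndimage.center_of_mass(mask, labels, range(1, n + 1))
```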
To assess the distribution of neighboring nuclei, a position matrix
is constructed. An exemplary position matrix for a particular cell
is shown in FIG. 1. The position matrix can be constructed
automatically according to operator-specified parameters such as a maximum distance d_max at which another nucleus can be considered a neighbor, a distance resolution Δd for the matrix (Δd = 2 pixels in FIG. 1), and an angular resolution Δθ for the matrix (Δθ = 20 degrees in FIG. 1). The elements in the position matrix are initially set to 0, and
for each surrounding cell that meets the angle and distance
criteria for a given matrix element, the matrix element's value is
incremented by 1.
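A minimal sketch of this construction, with distance bins along the rows and angle bins along the columns. The Δd and Δθ defaults follow the FIG. 1 example; d_max and the exact binning conventions (e.g., the angular reference direction) are illustrative assumptions.

```python
import numpy as np

def position_matrix(center, neighbors, d_max=30.0, dd=2.0, dtheta=20.0):
    """Build a position matrix for one nucleus: rows bin the distance to a
    neighboring nucleus (resolution dd, up to d_max) and columns bin its
    angular orientation (resolution dtheta, in degrees). Starts from zeros
    and increments the element whose distance/angle criteria each
    neighboring nucleus meets."""
    n_d = int(np.ceil(d_max / dd))
    n_t = int(np.ceil(360.0 / dtheta))
    m = np.zeros((n_d, n_t), dtype=int)
    cx, cy = center
    for x, y in neighbors:
        d = np.hypot(x - cx, y - cy)
        if 0 < d <= d_max:
            theta = np.degrees(np.arctan2(y - cy, x - cx)) % 360.0
            row = min(int(d // dd), n_d - 1)
            col = min(int(theta // dtheta), n_t - 1)
            m[row, col] += 1
    return m
```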
Nuclear position information can be derived directly from images of
the sample or from the position matrices for each nucleus in the
sample. Nuclear position information typically includes, for
example, a variety of one-dimensional statistical measures of the
distribution of neighboring nuclei such as the mean, standard
deviation, energy, entropy, and density, these measures being
expressed as a function of a single variable, such as distance or
angular orientation of surrounding nuclei. In some embodiments,
this information is determined by collapsing the position matrices
into one-dimensional histograms (e.g., one histogram expressed as a
function of distance, the other as a function of angular
orientation). In other embodiments, some or all of the nuclear
position information is determined directly from the statistics of
the two-dimensional distribution of positions for the neighboring
nuclei.
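Continuing the sketch above, the following illustrates collapsing a position matrix m into the two one-dimensional histograms and extracting simple measures from each; the precise measure definitions here are assumptions.

    radial_hist  <- rowSums(m)    # counts as a function of distance
    angular_hist <- colSums(m)    # counts as a function of angular orientation

    hist_stats <- function(h) {
      p <- h / sum(h)                          # normalize counts to a distribution
      bins <- seq_along(h)
      mu <- sum(bins * p)
      c(mean = mu,
        sd = sqrt(sum((bins - mu)^2 * p)),
        uniformity = sum(p^2))                 # "energy" of the histogram
    }

    hist_stats(radial_hist)
    hist_stats(angular_hist)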
Exemplary two-dimensional measures derived from the position
matrices can include the following:

entropy = ΣΣ p log p
uniformity = ΣΣ p²
density = ΣΣ [p² / ((k+1)Δd)]

where the dual sums run over all position matrix elements p, and k is
the number of columns (or rows) over which the position matrix
information extends. In addition, Haralick et al. describe measures
that can be derived from two-dimensional distribution matrices (such
as the position matrices discussed above) in "Textural Features for
Image Classification," IEEE Transactions on Systems, Man, and
Cybernetics, Vol. SMC-3, pp. 610-621 (1973), the entire contents of
which are incorporated herein by reference. The measures described
therein can be applied to the nuclear position matrices disclosed
herein rather than to the gray-level co-occurrence matrices used in
that paper.
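As a worked illustration of the two-dimensional measures defined above, and continuing the earlier sketch, the following computes entropy, uniformity, and density from a position matrix m. Treating k as a constant across the sums and skipping zero elements in the entropy term are interpretive assumptions.

    p <- m[m > 0]                   # skip zero elements so log is defined
    entropy    <- sum(p * log(p))
    uniformity <- sum(m^2)
    k <- ncol(m)                    # columns over which the matrix extends
    delta_d <- 20
    density <- sum(m^2 / ((k + 1) * delta_d))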
The various one-dimensional and two-dimensional measures are used
as components in a feature vector associated with each cell to be
classified. In addition to one-dimensional and two-dimensional
measures derived from the position matrices and/or images, the
feature vector can also include other measures that complement the
position measures to improve classification accuracy. For example,
in some embodiments, the feature vector for each cell also includes
measures derived from pixel texture information in regions
surrounding the cell. In general, a wide variety of pixel texture
information can be included in the feature vector. First-order
pixel texture measures derived from a circular region around the
nuclear center can include: mean, median, mode, standard deviation,
and surface area. Other pixel texture measures can also be derived
including, for example, normalized variance and/or measures derived
from two-dimensional gray-level co-occurrence matrices. These
pixel-based measures can be calculated using pixel intensity values
from individual images in the sample image set, such as the signal
strength in one spectral component in a set of unmixed spectral
images derived from a multispectral image stack or RGB image.
Pixel-based measures can also be calculated based on pixel
intensity values from one plane of a multispectral image stack or
RGB image, or based on pixel intensity values from an image which
represents a mixture or combination of several individual images,
such as a mean signal strength or summed signal strength. The
foregoing automated analysis is repeated for all cells to be
classified, to obtain feature vectors for each of the cells.
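For illustration only, a minimal R sketch of the first-order pixel texture measures over a circular region around a nuclear center; img is an assumed intensity matrix (e.g., one unmixed plane), cy and cx index its rows and columns, and the definitions of mode and surface area are assumptions.

    texture_features <- function(img, cy, cx, radius = 5) {
      in_circle <- (row(img) - cy)^2 + (col(img) - cx)^2 <= radius^2
      v <- img[in_circle]                               # pixels inside the circle
      mode_v <- as.numeric(names(which.max(table(v))))  # most frequent intensity
      c(mean = mean(v), median = median(v), mode = mode_v,
        sd = sd(v), surface_area = sum(in_circle))
    }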
A variety of different automated classifiers can be used to
automatically classify sample cells based on the feature vector
information derived from sample images. In particular,
machine-learning classifiers such as neural network-based
classifiers can be used to classify cells based on nuclear position
information, or on nuclear position information and pixel texture
information derived from a small region centered on the cell. Other
classifiers can also be used, including randomForest-based
classifiers and genetic algorithm-based classifiers.
As a first step in using a classifier, the classifier is trained to
recognize certain classes of cells based upon feature vectors
developed from previously-classified standards. The previously
classified standards can include, for example, selected regions of
the sample that have been classified manually by a trained
technician. Alternatively, or in addition, the standards can
include other images which have been correctly classified either by
a technician or by another automated classifier.
Random forest-based classifiers can be used to classify sample
cells using features derived from images as disclosed herein. In
particular, a classifier implemented in the R statistical
programming language and based on the randomForest package can be
used. The random forest classifier (RFC) is an ensemble
classification system, and uses CART trees as the units of the
ensemble. The RFC does not require a separate data set for
validation, due to the manner in which the ensemble is created.
Each CART tree is created using a bootstrap sample of the original
data set; cases used in the bootstrap sample are referred to as
"in-bag" cases, and those not included are referred to as
"out-of-bag" cases. When the classifier reports the prediction
error on a per tree basis, only those cases that were out-of-bag
for that tree are used in the evaluation. The out-of-bag error
estimate for the classifier is the average error estimate for all
trees in the ensemble.
The randomForest package includes a number of adjustable
parameters. In particular, the number of trees (ntree) and the
voting method by the forest (cutoff) can be adjusted. The ntree
parameter can be adjusted to accommodate computer memory
restrictions and/or allow for modeling of complex relationships.
The cutoff parameter can be adjusted to force the classifier to
favor one type of error over another (e.g., false positive errors
can be favored or disfavored relative to false negative errors).
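For illustration, a minimal sketch of training with the randomForest package in R, exposing the ntree and cutoff parameters discussed above; the data frame train, with a two-level factor column class, is an assumed layout.

    library(randomForest)
    rfc <- randomForest(class ~ ., data = train,
                        ntree = 500,            # number of CART trees in the ensemble
                        cutoff = c(0.5, 0.5),   # per-class voting thresholds
                        importance = TRUE)      # record feature contributions
    print(rfc)   # reports the out-of-bag confusion matrix and error estimate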
The randomForest package generates a confusion matrix illustrating
the performance of the classifier based on predicted values of the
out-of-bag error. Information about the relative contributions of
the various features that function as input to the classifier is
provided in the form of a ranking of the importance of the various
features. Receiver operator characteristic curves can be calculated
to provide a more complete assessment of classifier performance,
and the area under the receiver operator characteristic curves can
be determined. These curves can be generated, for example, using
the ROCR package implemented in the R statistical language.
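Continuing the sketch above, the receiver operator characteristic curve and its area could be computed with the ROCR package as follows; using the out-of-bag class votes as scores, and the class label "cancer", are assumptions.

    library(ROCR)
    votes <- predict(rfc, type = "prob")[, "cancer"]   # out-of-bag vote fractions
    pred  <- prediction(votes, train$class)
    roc   <- performance(pred, "tpr", "fpr")           # true vs. false positive rates
    plot(roc)
    auc <- performance(pred, "auc")@y.values[[1]]      # area under the curve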
Once trained, the classifier can be used to classify cells in a
sample of interest based on the nuclear position information and
other information such as pixel texture information. Typically, as
discussed above, the elements of the feature vector for each cell
correspond to measures derived from the positions of neighboring
nuclei, and/or to measures derived from pixel texture information.
Each feature vector functions as input to the classifier, which
then operates on the elements of the feature vector and determines
a classification for the associated cell. These operations are
repeated until classification of all desired cells in the sample is
complete.
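For illustration, classification of a new sample then reduces to a predict call on its feature vectors; sample_features, an assumed data frame with one row per cell and columns matching the training features, is hypothetical.

    cell_classes <- predict(rfc, newdata = sample_features)  # one class per cell
    table(cell_classes)    # number of cells assigned to each class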
The methods disclosed herein can be used to classify a wide variety
of different types of samples. In some embodiments, for example,
the methods can be used to classify samples taken from prostate
glands. In certain embodiments, the methods can be used to classify
other types of samples, including samples extracted from other body
tissues. The methods can be used to classify portions of the
samples according to various classification schemes. For example,
portions of the samples can be classified as either cancer cells,
or something other than cancer cells. Tissue samples can be
classified as either cancerous or benign. In general, samples (and
portions thereof) can be classified into a wide variety of
different classes based on the nature of particular
applications.
The steps described above in connection with various methods for
collecting, processing, analyzing, interpreting, and displaying
information from samples can be implemented in computer programs
using standard programming techniques. Such programs are designed
to execute on programmable computers or specifically designed
integrated circuits, each comprising an electronic processor, a
data storage system (including memory and/or storage elements), at
least one input device, and at least one output device, such as a
display or printer. The program code is applied to input data
(e.g., images from the detector) to perform the functions described
herein and generate output information (e.g., images showing
classified regions of samples, statistical information about sample
components, etc.), which is applied to one or more output devices.
Each such computer program can be implemented in a high-level
procedural or object-oriented programming language, or an assembly
or machine language. Furthermore, the language can be a compiled or
interpreted language. Each such computer program can be stored on a
computer readable storage medium (e.g., CD ROM or magnetic
diskette) that when read by a computer can cause the processor in
the computer to perform the analysis and control functions
described herein.
EXAMPLES
The disclosure is further described in the following examples,
which are not intended to limit the scope of the claims.
Example 1
A set of experiments was performed on images of a tissue section to
assess the accuracy of automated classification, and specifically
to evaluate the classification of samples on the basis of relative
nuclear position information, pixel texture information in the
region surrounding the nuclei, and a combination of the two.
Analysis of the first image, shown in FIG. 5, led to the manual
identification of 397 cellular nuclei. With parameters
d_max = 100 pixels, Δd = 20 pixels, Δθ = 45
degrees, and a circular radius of 5 pixels for assessing
statistical measures of pixel textures surrounding the nuclei,
various features for each of the identified nuclei were determined
and submitted to a validated randomForest automated classifier for
classification of each cell as either normal or cancerous. When
both nuclear position information and pixel texture information
were used in the classification, the error rate was 16% and the area
under the receiver operator characteristic curve, AUC, was 0.85.
When only measures derived from the position matrices (e.g.,
nuclear position information) were used in the classification, the
error rate was 15% and AUC was 0.83. For the image shown in FIG. 5,
it was apparent that pixel texture information did not contribute
significantly to the accuracy of classification.
A similar classification procedure was applied to a portion of a
second image, shown in FIG. 6. Analysis of the second image led to
the manual identification of 1132 cellular nuclei, of which 724
were identified as belonging to cancerous cells and 408 were
identified as belonging to non-cancerous cells. With parameters
d_max = 100 pixels, Δd = 20 pixels, Δθ = 45
degrees, and a circular radius of 5 pixels for statistical measures
of pixel textures surrounding the nuclei, various features for each
of the identified nuclei were determined and submitted to a
validated randomForest automated classifier. When both nuclear
position information and pixel textural information were used in
the classification, the error rate was 17% and AUC was 0.875. When
only measures derived from the position matrices were used in the
classification, the error rate was 35% and AUC was 0.68. For the
image shown in FIG. 6, it was apparent that classification on the
basis of both nuclear position information and pixel textural
information significantly enhanced the accuracy of the
classification results.
Example 2
In a second study, four individual images of a sample were acquired
under 20× magnification in a microscope imaging system
(Aperio Technologies Inc., Vista, Calif.), and each of the color
images from the microscope imaging system was decomposed to yield
an image corresponding to the hematoxylin component in the sample.
One image was selected for analysis from among the four hematoxylin
component images, and positions of cell nuclei in the selected
image were identified manually by a system operator. Position
matrices were constructed for each of the identified nuclei.
Parameters used to construct the position matrices were as follows:
d_max = 100 pixels, Δd = 20 pixels, and Δθ = 20
degrees. The position matrices were collapsed to one-dimensional
histograms in both distance and angular orientation, and
uniformity, mean, and standard deviation measures were extracted
from each of the histograms. Two-dimensional measures extracted
from the position matrices were entropy, uniformity, and
density.
In addition, pixel intensity-based statistical measures were
derived from circular regions centered upon the nuclei of each of
the cells to be classified. The pixel-based measures included the
mean, median, mode, standard deviation, and surface area of pixel
intensities in the circular regions. For this analysis, the radius
of the circular region was 20 pixels.
The nuclear position information and pixel textural information
derived from the image analysis for each cell was then submitted to
an automated classifier, and the sample cells were classified. Two
different automated classifiers were used. The first was a
randomForest classifier, which used nuclear position information
from two different subset regions of one image for training and
validation, respectively. The second automated classifier used was
a neural network-based classifier that incorporated
cross-validation by splitting the input data into groups containing
90% and 10% for training and test, respectively. Selection of
points was performed at random for each trial. The classification
results were compared to manually-supervised classification results
for the same image to evaluate the accuracy of the classification
method.
Operating on one image selected from the set of four acquired
images and using the randomForest classifier, sample cells were
classified and then compared to the manually-supervised
classification results. Based on the comparison, the accuracy of
the automated classification was estimated to be 85% on average.
The classification was repeated six times, with accuracies ranging
from 85.2% to 85.8%, depending on the random seed used for the
classifier. The AUC was found to be 0.885, and the top 50% of
the features were the mode, the angular standard deviation, the
angular uniformity, the mean, the median, the entropy, and the
density. Some of these features--for example, angular measures--can
be derived from one-dimensional analysis of the position
information, while others (such as entropy) are derived from the
two-dimensional information in the position matrices.
FIGS. 2A and 2B show a comparison between automated classification
results, indicated by shading, and the manually-supervised
classification results, for one of the hematoxylin component
images. The image in FIG. 2B shows manually-supervised
classification of an image of the sample. The image shown in FIG.
2A shows automated classification of the same image. In FIG. 2B,
classification was based on pixel texture information derived from
the image. Erroneously classified regions of the sample are shown
circled in black in FIG. 2B.
Operating on a second image selected from the set of four acquired
images and again using the randomForest classifier, a comparison
between the automated classification of sample cells and the
manually-supervised classification indicated an estimated automated
classification accuracy of 93% on average (classification
repeated twice, with identical accuracy each time). The AUC was
found to be 0.97, and the top 50% of the features were the standard
deviation, the angular standard deviation, the surface area, the
mean, the angular uniformity, the density, and the radial
uniformity.
To assess the effects of using multiple images, the image data from
all four of the hematoxylin component images was combined to form a
single data set, and this data set was analyzed and classified
according to the same procedure used for each of the individual
images. The results from the randomForest classifier indicated an
automated classification accuracy of 84.5% on average
(classification repeated three times with different seeds, with
accuracies of 84.5%, 84.8%, and 84.7%). An analysis of the
classification error revealed that the error rate was a result of
cancerous cells classified as non-cancerous (5%), and non-cancerous
cells classified as cancerous (41%). For this classification trial,
the AUC was found to be 0.89, and the top 50% of the features were
the standard deviation, the mode, the angular standard deviation,
the mean, the angular uniformity, the median, and the surface area.
FIG. 3 is a receiver operator characteristic curve showing the
variation of the true positive identification rate as a function of
the false positive identification rate for classification based on
the combined image data.
The combined image data from the four images was also classified
using the neural network-based classifier with cross-validation.
Ten classification trials were run, with accuracy rates ranging
from 73% to 81%.
To evaluate the relative contributions of nuclear position
information and pixel texture information to the overall accuracy
of automated classification, a randomForest classifier was created
and validated first using only nuclear position information, and
then using only pixel textural information from the circular
regions surrounding the nuclear centers. Automated classification
based only on nuclear position information yielded an accuracy of
77% with AUC of 0.77 following comparison to the
manually-supervised classification; automated classification based
only on pixel texture information yielded an accuracy of 79% with
AUC of 0.85. Classifying based on either information set
individually yielded poorer results than classifying based on the
combination of nuclear position information and pixel texture
information.
The particular data set represented by the images includes an
over-representation of cancerous cells relative to non-cancerous
cells, as evidenced by the relatively high false-positive error
rate (e.g., 41%). FIG. 4 shows a plot of accuracy as a function of
cutoff, and further suggests that a more balanced data set would
yield higher classification accuracy.
To assess the effects on classification accuracy of a more balanced
data set, an additional experiment using the combined image data
from the four acquired images was performed. A random group of
cancer cells were removed from consideration to create a
class-balanced reduced data set, and the reduced data set was
automatically classified with the randomForest classifier.
Classification accuracy, determined by comparing the automatic
classification results to the manually-supervised classification
results, was determined to be 81%. The error rate at which
cancerous cells were classified as non-cancerous was 15%, and the
error rate at which non-cancerous cells were classified as
cancerous was 22%. The AUC was measured as 0.88, and the top 50% of
the features were mode, angular standard deviation, mean, standard
deviation, median, entropy, and radial uniformity. Using the
randomForest classifier and only these features, a classification
accuracy rate of 79.5% was achieved.
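For illustration, the class-balancing step described in this example could be performed as in the following sketch; the data frame and label names are assumptions.

    non_idx    <- which(data$class == "non_cancer")
    cancer_idx <- which(data$class == "cancer")
    keep <- sample(cancer_idx, length(non_idx))   # random subset of cancer cells
    balanced <- data[c(non_idx, keep), ]          # equal counts of both classes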
Example 3
In a third study, immunohistochemical images of both cancerous and
non-cancerous prostate gland were classified using the methods
disclosed herein to investigate the discriminatory ability of the
methods. Eight tissue microarray samples were obtained from eight
different patients. The samples, each about 5 microns thick and 600
microns in diameter, were embedded in paraffin and stained with
hematoxylin and DAB. Images of each of the samples were
obtained.
In each of the eight images, nuclear centers were manually
identified. In total, 8931 nuclear centers were identified, with
4391 (about 49%) designated as non-cancerous. Each of the eight
images was also classified using pixel texture-based methods to
identify a region of interest (e.g., a region that includes
prostate gland cells) in each image. The images were spectrally
unmixed into planes corresponding to the hematoxylin and DAB
immunohistochemical stains. Using pixel texture-based methods, an
automated classification system can guide a system operator to a
region of interest in a sample, without requiring the operator's
manual intervention. FIG. 7A shows an example of a region of
interest identified in one sample using pixel texture-based
classification methods. In FIG. 7A, cancerous cells identified
using the texture-based methods are shaded medium grey, and labeled
"C." For comparison, an image of the same region of interest in the
sample is shown in FIG. 7B; cancerous regions identified by a
pathologist are shaded medium grey and labeled "C."
In each of the eight images, following identification of a suitable
region of interest, the methods disclosed herein were used to
classify cells within the region of interest. Position matrices
were constructed with parameter values d_max = 100 pixels,
Δd = 20 pixels, and Δθ = 20 degrees. Features were
extracted from the position matrices as discussed above. For each
nuclear center, a total of 14 features were obtained: nine features
were extracted from relative nuclear position matrices, and five
features (mean, median, mode, surface area, and standard deviation)
were derived from nuclear texture information (with a nuclear
texture radius=20 pixels). Nuclear texture information was
determined from the hematoxylin plane of the images. Using a
randomForest algorithm, classifiers were developed and validated.
The number of trees was set to 500. Independent classifiers were
built with all 14 features, with only the features extracted from
relative nuclear position co-occurrence matrices, and with only the
features derived from nuclear texture information, to evaluate the
contribution of different types of features to overall classifier
performance.
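For illustration, the three independent classifiers of this example could be built on feature subsets as sketched below; the feature-name vectors, including the pos_ prefix for position-matrix features, are assumptions.

    library(randomForest)
    position_feats <- grep("^pos_", names(train), value = TRUE)  # nine position features
    texture_feats  <- c("mean", "median", "mode", "surface_area", "sd")
    fit_subset <- function(feats) {
      randomForest(x = train[, feats], y = train$class, ntree = 500)
    }
    rf_all      <- fit_subset(c(position_feats, texture_feats))  # all 14 features
    rf_position <- fit_subset(position_feats)                    # position features only
    rf_texture  <- fit_subset(texture_feats)                     # texture features only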
Table 1 shows the confusion matrix obtained from a classifier built
with all 14 features.
TABLE 1

                 Non cancer    Cancer    Class Error Rate
    Non cancer         3400       991                0.23
    Cancer              838      3702                0.18
Operating on all of the images, the out-of-bag error rate was
20.5%. The receiver operator characteristic curve for this
classification is shown in FIG. 8. The area under the curve is
0.88. Each of the 14 features was assessed for its relative
contribution to the classifier (based on the accuracy of the
classification results obtained), and for its influence on the
randomForest splitting metric. FIG. 9 shows the results of these
assessments. On the left side of FIG. 9, each of the 14 features is
ranked according to its relative contribution to the accuracy of
the classifier. On the right side of FIG. 9, each of the 14
features is ranked according to its influence on the splitting
metric.
Table 2 shows the confusion matrix obtained from a classifier built
with only the nine features derived from relative nuclear position
co-occurrence matrices.
TABLE 2

                 Non cancer    Cancer    Class Error Rate
    Non cancer         3228      1163                0.26
    Cancer             1004      3536                0.22
The out-of-bag error rate for this classifier, operating on all
eight of the images, was 24.3%. The area under the receiver
operator characteristic curve was 0.83. FIG. 10 shows the results of
assessing the contributions of each of the nine features to the
classifier. On the left side of FIG. 10, the features are ranked in
order of their contribution to the accuracy of the classifier. On
the right side of FIG. 10, the features are ranked in order of
their influence on the splitting metric.
Table 3 shows the confusion matrix obtained from a classifier built
with only the five features derived from nuclear texture
information.
TABLE 3

                 Non cancer    Cancer    Class Error Rate
    Non cancer         2758      1633                0.37
    Cancer             2158      2382                0.48
For this classifier operating on all eight images, the out-of-bag
error rate was 42.5%, and the area under the receiver operator
characteristic curve was 0.60.
The classification results for the three classifiers in this study
indicate that, for small data sets, the use of features derived both
from relative nuclear position co-occurrence matrices and from
nuclear texture information (e.g., all 14 features, as discussed
above) assists in the discrimination between normal and cancerous
prostate gland cells, and yields improved classification performance
(e.g., accuracy) relative to classification based on nuclear texture
information alone.
Other Embodiments
A number of embodiments have been described. Nevertheless, it will
be understood that various modifications may be made without
departing from the spirit and scope of the disclosure. For example,
in some embodiments, nuclear position and/or pixel texture
information can be combined with other information such as spectral
information to further increase the accuracy of automated
classification results. Multiple sample images, each having
different spectral information, can be analyzed and the information
therefrom combined with nuclear position and/or pixel texture
information. Features derived from any of these different types of
information can be used to classify samples.
The methods disclosed herein can be used with a variety of
classification schemes and/or scoring systems. For example, in
certain embodiments, the methods disclosed herein can be used in
Gleason scoring for classifying samples that include cancerous
cells such as prostate cells. In Gleason scoring, structural
features of a sample are identified and used to assign a certain
score (e.g., a Gleason score) to an image of the sample for
purposes of identifying potentially cancerous samples. Structural
features that are used to assign the Gleason score include, for
example, a relative organization of nuclei in certain regions of
the sample. Methods for Gleason scoring are disclosed, for example,
in: A. De la Taille et al., "Evaluation of the interobserver
reproducibility of Gleason grading of prostatic adenocarcinoma
using tissue microarrays," Hum Pathol., vol. 34, No. 5. (May 2003),
pp. 444-449; and E. B. Smith et al., "Gleason scores of prostate
biopsy and radical prostatectomy specimens over the past 10 years.
Is there evidence for systematic upgrading?" Cancer, vol. 94, no.
8, 2002, pp. 2282-2287. The entire contents of each of these
references is incorporated herein by reference.
Other embodiments are within the scope of the following claims.
* * * * *