U.S. patent application number 11/680,069, for fast gridding of irregular data, was filed with the patent office on 2007-02-28 and published on 2007-09-13. The application is assigned to AMBERCORE SOFTWARE INC. Invention is credited to Sylvain De Margerie, Yuriy Monastirev, Alexander Montastirev and Boris Voribiov.
United States Patent Application 20070211077
Kind Code: A1
Voribiov, Boris; et al.
Published: September 13, 2007
Application Number: 11/680069
Family ID: 38478480
FAST GRIDDING OF IRREGULAR DATA
Abstract
A method of fast gridding of irregular data has been developed
for spatial interpolation of large irregular spatial point data
sets; for example, building a 3D geographic terrain grid surface
from billions of irregularly spaced xyz coordinates on the earth's
surface. The method typically yields many orders of magnitude gain
in computational speed. For example, to produce a
gridded data set (having M rows and N columns) from P irregularly
located sampling points, the computational steps required can be
reduced from a number of the order of O(M.times.N.times.P) to a
lesser number of the order of O(M.times.N+P) operations. The method
achieves this by ensuring that each of the P sampling points is
visited only once. This is particularly significant since spatial
data collection devices typically collect data points in the
billions. The method described is readily extendible to any number
of dimensions.
Inventors: Voribiov, Boris (Ottawa, CA); Monastirev, Yuriy (Kharkov, UA); Montastirev, Alexander (Kharkov, UA); De Margerie, Sylvain (Ottawa, CA)
Correspondence Address: AMBERCORE SOFTWARE INC., 2081 Merivale Rd., Suite 1300, Ottawa, ON K2G-1G9
Assignee: AMBERCORE SOFTWARE INC. (Ottawa, CA)
Family ID: 38478480
Appl. No.: 11/680069
Filed: February 28, 2007
Related U.S. Patent Documents
Provisional Application No. 60/767,191, filed March 9, 2006.
Current U.S. Class: 345/606; 345/611
Current CPC Class: G06T 3/4007 (2013.01)
Class at Publication: 345/606; 345/611
International Class: G09G 5/00 (2006.01)
Claims
1. A method for producing an M.times.N gridded data set from P
irregularly located sampling points which requires only
O(M.times.N+P) operations comprising the steps of: visiting each of
the P sampling points only once; computing for each sampling point,
corresponding grid indices in an M.times.N grid pattern by taking
the modulus of the point coordinates; matching each sampling point
to one of a single grid index and a pattern of
m.sub.i.times.n.sub.i grid indices distributed around the sample
point; accumulating values in a first array and incrementing
counters in a second array where multiple points are matched to the
same grid index; scanning the grid a single time, once all P
sampling points have been processed, and calculating the grid
values using one of averaging and interpolation based on
surrounding values; and saving the second array as a point density
by-product.
2. The method of claim 1, further comprising the step of
pre-processing in which the sampling points are aggregated.
3. The method of any of claims 1 and 2, further comprising the step
of post processing which fills holes in the gridded data set with
smoothed, interpolated values.
4. The method of any of claims 1, 2 and 3, further comprising the
step of post processing which fills holes in the gridded data set
with one of an arbitrary null value, an average value and an
interpolated value based on the values of the gridded data
bordering each of the holes.
5. The method of any of claims 1, 2, 3 and 4, wherein the gridded
data set has three or more dimensions.
6. The method of any of claims 1, 2, 3, 4 and 5, wherein each grid
index in the gridded data set has two or more associated values.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This non-provisional application claims the benefit of U.S.
Provisional Application No. 60/767,191, filed March 9, 2006.
BACKGROUND OF THE INVENTION
Introduction
[0002] The gradual accumulation of spatial data in all fields of
science, together with the increasing capacity of data acquisition
equipment, such as LiDAR (Light Detection and Ranging) for
airborne topographic mapping, is giving rise to an exponentially
growing volume of information. This growth outpaces Moore's law,
which doubles computation capacity only every 18 months or so. The
common approach, when carrying out analyses on computing platforms,
is to throw more computational resources at the problem. That
approach is reaching its scalability limit and, once again, the
development of smart and efficient algorithms is becoming paramount
for the practical processing of gargantuan data sets.
[0003] Spatial analysis based on regular rectangular grids is one
method of improving the efficiency of data processing and modeling.
The regular grid affords innumerable advantages in statistical
analysis, frequency domain analysis, linear or non-linear modeling
and so on. In addition, the regularity of rectangular grids allows
the use of hardware assisted vector processing techniques which
further leverage Moore's law.
Problem Statement
[0004] Most collected data is sampled in irregular or semi-regular
patterns; an important step in data processing is therefore the
marshalling of the data into a uniform spatial grid. This is done
by generating a grid G of M by N cells from P distributed sampling
points, each characterized by its value Z.sub.p and its coordinates
X.sub.p and Y.sub.p. The prototypical gridding method uses Inverse
Distance Weighting (IDW) to interpolate a value at each grid point,
involving the following computation for each grid cell G.sub.m,n:
G.sub.m,n = [.SIGMA..sub.p=1.sup.P Z.sub.p / sqrt((X.sub.p - m.DELTA.x).sup.2 + (Y.sub.p - n.DELTA.y).sup.2)] / [.SIGMA..sub.p=1.sup.P 1 / sqrt((X.sub.p - m.DELTA.x).sup.2 + (Y.sub.p - n.DELTA.y).sup.2)]
[0005] This requires P compound operations for each of the
M.times.N cells, for a total of M.times.N.times.P compound
operations. Other techniques, such as kriging, triangulated
irregular network (TIN) and spline interpolation, also involve the
sequential revisiting of
each of the P sampling points for each of the M.times.N grid
points.
[0006] If the grid is to approximate the spatial resolution of the
original data set, one would expect the number of grid points
M.times.N to be proportionate to the number of sampling points P.
The processing power needed for gridding a data set therefore grows
with P.sup.2. Clearly such brute-force gridding techniques become
untenable with very large data sets.
[0007] Two factors affect the scalability of large data set
analysis: processing power (i.e. the number of operations required)
and memory usage (i.e. the number and size of storage units).
Normally one of these is compromised to optimize the other. A
modest geographic data set by today's standards might be a digital
terrain model of an urban area having point samples at an average
resolution of 1 m over an area of 10 km.times.10 km (100 km.sup.2).
This is 100,000,000 data points, each with an X, Y and Z value,
which if represented in double precision translate to 2.3 GB of
information.
[0008] The popular LiDAR (Light Detection and Ranging) technology
now commonly delivers as many as 1 to 2 billion X, Y, Z coordinates
in a single surveyed region. This represents 20 to 40 GB of raw
information, which will stress the memory capacity of many current
computer systems, especially if more fields are needed for the
storage of intermediate results.
[0009] With processing requirements of the order of O(P.sup.2), and
assuming only one operation per point, current hardware requires
hundreds of hours of calculation. This clearly places interactive
processing out of reach and reduces the feasibility of exploratory
analysis, that is, performing "what if" scenarios to aid in
decision making.
BRIEF SUMMARY OF THE INVENTION
[0010] A method of fast gridding of irregular data has been
developed for spatial interpolation of large irregular spatial
point data sets; for example, building a 3D geographic terrain grid
surface from billions of irregularly spaced xyz coordinates on the
earth's surface. The method typically yields many orders of
magnitude gain in computational speed. For example,
to produce a gridded data set (having M rows and N columns) from P
irregularly located sampling points, the computational steps
required can be reduced from a number of the order of
O(M.times.N.times.P) to a lesser number of the order of
O(M.times.N+P) operations. The method achieves this by ensuring
that each of the P sampling points is visited only once. This is
particularly significant since spatial data collection devices
typically collect data points in the billions. The method described
is readily extendible to any number of dimensions.
BRIEF DESCRIPTION OF THE DRAWING
[0011] FIG. 1 is a schematic representation of an exemplary
embodiment of a method for fast gridding of irregular data.
DETAILED DESCRIPTION OF THE INVENTION
Single Pass Point Processing
[0012] In an exemplary embodiment of a fast gridding method 100,
when building a surface grid (having M rows and N columns) from P
irregularly located sampling points, the fast gridding method 100
reduces the processing power required for data gridding to the
order of O(P). This is accomplished through an algorithm where each
individual point, p, is visited only once 102. For each such point,
corresponding grid indices are easily computed by taking the
modulus of the point coordinates 104. This essentially reverses the
traditional processing order and takes immediate advantage of the
regularity of the grid. Instead of scanning each grid point and
then searching through all the samples to find a match, all the
data points are scanned and, through simple arithmetic, the indices
of the corresponding grid points are found.
[0013] In many instances, several sampling points may be matched to
the same grid index and some means of estimating a value at the
grid point from multiple nearby samples is necessary. To accomplish
this, two arrays are formed and initialized to zero, and for each
sample point the following assignments are made 108: SUM.sub.m,n,
where the contribution of each sample assigned to (m, n) is
cumulated; and COUNT.sub.m,n, which is incremented each time a
sample is assigned to (m, n).
[0014] Once all P sampling points are processed, the grid can be
scanned once to calculate the grid value Z.sub.m,n for each grid
cell 110. The estimation of Z.sub.m,n is an O(M.times.N) process
using standard techniques such as averaging:
Z.sub.m,n=SUM.sub.m,n/COUNT.sub.m,n
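A minimal sketch of this single-pass accumulation and averaging is shown below in NumPy. The function name, grid origin (x0, y0) and cell sizes (dx, dy) are assumptions for illustration, not part of the described method's interface; unassigned cells are marked NaN.

```python
import numpy as np

def fast_grid_average(x, y, z, m_rows, n_cols, x0, y0, dx, dy):
    """Single-pass gridding: visit each sample once, accumulate
    SUM and COUNT per cell, then average in one O(M*N) grid scan."""
    SUM = np.zeros((m_rows, n_cols))
    COUNT = np.zeros((m_rows, n_cols), dtype=np.int64)
    # Grid indices from point coordinates by integer truncation.
    col = ((x - x0) / dx).astype(int)
    row = ((y - y0) / dy).astype(int)
    inside = (row >= 0) & (row < m_rows) & (col >= 0) & (col < n_cols)
    # np.add.at accumulates correctly when several samples hit one cell.
    np.add.at(SUM, (row[inside], col[inside]), z[inside])
    np.add.at(COUNT, (row[inside], col[inside]), 1)
    # One grid scan: average where COUNT > 0; NaN marks unassigned cells.
    Z = np.where(COUNT > 0, SUM / np.maximum(COUNT, 1), np.nan)
    return Z, COUNT
```

Each of the P samples is touched exactly once, so the total cost is O(M.times.N + P), matching the claim.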
[0015] With appropriately selected M and N the resulting grid will
have a spatial resolution comparable to the original data but since
the coordinates X.sub.m,n, and Y.sub.m, n do not need to be stored
explicitly, the storage requirements are reduced by 66%. Note that
the value of Z can be stored at the same location as SUM, so that
COUNT, which in most cases is a single byte array, is the only
temporary storage required.
[0016] The method 100, where for each sample point a single grid
location (m, n) is computed, is likely to leave unassigned values
of Z.sub.m,n, the filling of which will be discussed later.
[0017] An alternative exemplary embodiment of the method 100
extends further by allowing the assignment of one sample point to
several grid points distributed in m.sub.i.times.n.sub.i patterns
about the sample point, where m.sub.i.times.n.sub.i defines a small
region of influence or footprint for each sample point 106. The
number of operations required by the fast gridding method 100 is
then proportional to P.times.m.sub.i.times.n.sub.i; however,
m.sub.i.times.n.sub.i is for all practical purposes fixed, or
selected independently of the number of sampling points, so the
computational load remains O(P). The case of assigning each sample
point to m.sub.i.times.n.sub.i grid points will result in a
smoothed field, equivalent to having passed the gridded data
through an m.sub.i,n.sub.i moving-average filter. Spatial details
will be lost, but the number of unassigned blank spots will have
been reduced.
[0018] In another alternative exemplary embodiment of the method
100, the resolution of the grid can be augmented a priori to
m.sub.i.times.M by n.sub.i.times.N, each sample point assigned a
footprint of m.sub.i.times.n.sub.i, and finally the high resolution
grid can be decimated back to M.times.N by simple averaging. At the
cost of more temporary memory usage, the result retains the initial
spatial resolution but still provides some gap-filling ability.
Interpolation
[0019] In an alternative exemplary embodiment of the method 100,
other more advanced estimation techniques for Z.sub.m,n can be
incorporated based on the interpolation of surrounding values using
standard interpolation techniques. The following shows how
classical interpolation techniques can be integrated at this stage
110 into the method 100 in such a way that the requirement for
only one pass is still satisfied. Examples include:
[0020] Inverse Distance Weighting (IDW), which has the advantage
that any sample point very close to a grid point will have much
more influence on its value, thus greatly reducing the footprint
smoothing effect but retaining the gap filling ability. The
resulting grid is very close to what would be obtained by the
traditionally applied IDW technique. The IDW calculation is adapted
as follows:
[0021] WSUM.sub.m,n cumulates Z.sub.p/D.sub.p,m,n for each
sample assigned to (m, n);
[0022] WCOUNT.sub.m,n cumulates 1/D.sub.p,m,n for each sample
assigned to (m, n),
[0023] where D.sub.p,m,n is the distance between the sampling
point p and the grid point (m, n).
[0024] Final estimates of the gridded values:
Z.sub.m,n=WSUM.sub.m,n/WCOUNT.sub.m,n
For most practical cases the IDW technique is recommended with
(m.sub.i, n.sub.i)=(3, 3).
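The adapted single-pass IDW can be sketched as below; a hypothetical NumPy implementation with a (3, 3) footprint by default. The parameter names and the small eps guard against a sample lying exactly on a grid node are assumptions for illustration.

```python
import numpy as np

def idw_grid(x, y, z, m_rows, n_cols, x0, y0, dx, dy, fp=3, eps=1e-12):
    """Single-pass IDW: each sample contributes to an fp x fp footprint
    of grid cells around it, weighted by inverse distance."""
    WSUM = np.zeros((m_rows, n_cols))
    WCOUNT = np.zeros((m_rows, n_cols))
    h = fp // 2
    for xp, yp, zp in zip(x, y, z):          # each sample visited once
        c0 = int((xp - x0) / dx)
        r0 = int((yp - y0) / dy)
        for r in range(max(r0 - h, 0), min(r0 + h + 1, m_rows)):
            for c in range(max(c0 - h, 0), min(c0 + h + 1, n_cols)):
                # distance from the sample to the grid node (x0+c*dx, y0+r*dy)
                d = np.hypot(xp - (x0 + c * dx), yp - (y0 + r * dy))
                w = 1.0 / (d + eps)          # eps guards a sample on a node
                WSUM[r, c] += w * zp
                WCOUNT[r, c] += w
    Z = np.where(WCOUNT > 0, WSUM / np.where(WCOUNT > 0, WCOUNT, 1), np.nan)
    return Z
```

A sample very close to a node dominates that node's weighted sum, reproducing the reduced-smoothing behaviour described above.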
[0025] Closest estimate, in which case only the point closest to
the grid point is retained in order to assign a value to that grid
point. This results in no smoothing and forces each grid value to
agree exactly with the nearest sample. This technique is the
fastest but, in regions of dense sampling, it is more prone to
aliasing high frequency signals into the grid. Generally the
resulting grid will contain more high frequency energy than with
other techniques. The Closest estimate calculation for each sample
point, p, requires two arrays:
[0026] Z.sub.m,n is assigned Z.sub.p if D.sub.p,m,n<MIN.sub.m,n
for each sample assigned to (m, n);
[0027] MIN.sub.m,n is assigned D.sub.p,m,n if
D.sub.p,m,n<MIN.sub.m,n, where D.sub.p,m,n is as before;
[0028] MIN.sub.m,n must be initialized to some large value >
sqrt((m.sub.i.DELTA.x).sup.2 + (n.sub.i.DELTA.y).sup.2).
[0029] After scanning all the sampling points, Z is obtained
directly.
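The closest-estimate pass can be sketched as follows; an illustration only, in which the initialization constant `big` stands in for the "some large value" bound given above.

```python
import numpy as np

def closest_grid(x, y, z, m_rows, n_cols, x0, y0, dx, dy):
    """Single-pass 'closest estimate': each cell keeps the value of the
    nearest sample seen so far; MIN holds the running nearest distance."""
    big = np.hypot(dx, dy) * 10.0            # exceeds any in-cell distance
    Z = np.full((m_rows, n_cols), np.nan)
    MIN = np.full((m_rows, n_cols), big)
    for xp, yp, zp in zip(x, y, z):          # each sample visited once
        c = int((xp - x0) / dx)
        r = int((yp - y0) / dy)
        if 0 <= r < m_rows and 0 <= c < n_cols:
            d = np.hypot(xp - (x0 + c * dx), yp - (y0 + r * dy))
            if d < MIN[r, c]:                # keep only the closest sample
                MIN[r, c] = d
                Z[r, c] = zp
    return Z
```

After the scan, Z is obtained directly, with no smoothing: every assigned cell agrees exactly with its nearest sample.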
[0030] Minimum or Maximum estimates are especially useful when
extremes of the sampling points are most relevant. This occurs in
hydrography, for example, where navigation maps must portray the
shallowest depth of water available in an area to assure that ships
do not run aground. This technique purposefully produces a biased
estimate Z, is very fast and is memory efficient. For each sample
point, p, the following simple assignment is made:
[0031] if Z.sub.p<Z.sub.m,n and a Minimum estimate is required
then Z.sub.m,n, is assigned Z.sub.p
[0032] if Z.sub.p>Z.sub.m,n and a Maximum estimate is required
then Z.sub.m,n, is assigned Z.sub.p
[0033] (Z.sub.m,n must be initialized to some very large or very
small value, as appropriate)
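A minimal sketch of the minimum/maximum estimate follows; the `mode` switch and the NaN marking of cells that received no sample are assumed conventions for the illustration.

```python
import numpy as np

def extreme_grid(x, y, z, m_rows, n_cols, x0, y0, dx, dy, mode="min"):
    """Single-pass minimum/maximum estimate: each cell retains the
    smallest (min) or largest (max) sample value assigned to it."""
    init = np.inf if mode == "min" else -np.inf   # per the text's init rule
    Z = np.full((m_rows, n_cols), init)
    for xp, yp, zp in zip(x, y, z):
        c = int((xp - x0) / dx)
        r = int((yp - y0) / dy)
        if 0 <= r < m_rows and 0 <= c < n_cols:
            if (mode == "min" and zp < Z[r, c]) or \
               (mode == "max" and zp > Z[r, c]):
                Z[r, c] = zp
    Z[np.isinf(Z)] = np.nan                  # cells that received no sample
    return Z
```

For the hydrographic example above, `mode="min"` would retain the shallowest depth in each cell.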
Pre-Processing
[0034] The objective of the fast gridding method 100 is to produce
a gridded data set preserving as much as possible the information
richness of the original irregular sample data set. The following
problems which arise from this objective can be solved in the
pre-processing stage: [0035] 1. an appropriate resolution for the
grid must be chosen; [0036] 2. points may need to be aggregated
into portions small enough to be processed.
Choosing an Appropriate Grid Resolution
[0037] At the time of sample data aggregation, the data must be
scanned to obtain minimum and maximum X and Y coordinates which
define a bounding rectangle for the grid. If the sampling points
can be assumed to be uniformly distributed, a grid with the same
number of cells as the number of sampling points should preserve
the essential spatial features of the data.
[0038] Thus the resolution can be derived 112 from:
D.sub.x = D.sub.y = Sqrt(((Max(X.sub.p) - Min(X.sub.p)) * (Max(Y.sub.p) - Min(Y.sub.p))) / P) / K
[0039] And the grid dimensions can be set 112 as:
M=(Max(X.sub.p)-Min(X.sub.p))/D.sub.x
N=(Max(Y.sub.p)-Min(Y.sub.p))/D.sub.y
[0040] By default K=1, but if the data set is particularly clumpy
then K should be set to a larger value to obtain a finer
resolution. Conversely, a coarser resolution grid is obtained by
setting K to a value between 0 and 1.
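The resolution and grid-dimension formulas above can be sketched as a small helper; it assumes, per the text, that the bounding rectangle is taken from the data's minimum and maximum coordinates.

```python
import math

def grid_resolution(xs, ys, k=1.0):
    """Derive cell size Dx = Dy and grid dimensions M, N so that the
    number of cells is roughly proportional to the number of samples.
    K > 1 gives a finer grid; 0 < K < 1 gives a coarser grid."""
    p = len(xs)
    wx = max(xs) - min(xs)                   # bounding-rectangle width
    wy = max(ys) - min(ys)                   # bounding-rectangle height
    d = math.sqrt(wx * wy / p) / k           # Dx = Dy from the formula above
    m = math.ceil(wx / d)
    n = math.ceil(wy / d)
    return d, m, n
```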
Point Aggregation
[0041] Although the described gridding method 100 is relatively
memory efficient, there are some data sets that are so large (i.e.
P>P.sub.max) that it is not practical to keep the entire
sampling data set or the complete grid in memory at once (because
doing so can result in so much virtual memory page swapping as to
completely degrade performance). In an exemplary embodiment of the
method 100, where the number of points is exceedingly large, the
fast gridding method 100 pre-aggregates sample points into coarse
tiles 114. This aggregation is very fast and requires O(P)
operations to simply assign the samples to a rectangular
sub-domain. Often the input data is already organized in such
fashion either exactly or approximately.
[0042] After performing point aggregation, gridding can take place
tile by tile. In the case where the region of influence is greater
than one cell, i.e. (m.sub.i, n.sub.i) is not equal to (1,1), up to
nine tiles of the gridded data set must be kept in memory at once
to ensure seamless transition across tile boundaries. This still
greatly reduces the amount of memory required, and relieves
scalability issues with respect to the size of the sampling data
sets.
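The O(P) tile pre-aggregation can be sketched as below; a simplified in-memory illustration, whereas data sets of the sizes discussed would stream tiles to disk.

```python
from collections import defaultdict

def aggregate_tiles(points, tile_w, tile_h):
    """O(P) pre-aggregation: assign each (x, y, z) sample to a coarse
    rectangular tile keyed by integer tile coordinates."""
    tiles = defaultdict(list)
    for x, y, z in points:
        key = (int(x // tile_w), int(y // tile_h))   # sub-domain index
        tiles[key].append((x, y, z))
    return tiles
```

Gridding can then proceed tile by tile, keeping at most the nine tiles around the current one resident when the footprint spans cell boundaries.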
[0043] To achieve further memory economy, the fast gridding method
100 can store real variables in the form of more compact integers
with fixed point representation. The calculations described above
are performed with floating point arithmetic but the result can be
stored in fixed point representation. As gridding progresses
through the tiled data set, any tile which is completed is
immediately converted to fixed point representation and transcribed
to permanent storage. Thus at most nine tiles need to be kept in
floating point representation at any one time.
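The fixed-point conversion of completed tiles might look like the following; the scale factor and the sentinel used to mark empty cells are assumptions for the sketch, not values specified in the text.

```python
import numpy as np

SENTINEL = np.int32(-(2**31) + 1)            # assumed marker for empty cells

def to_fixed_point(tile, scale=100):
    """Quantize a completed floating-point tile to compact int32 fixed
    point (e.g. centimetres for metre units); NaN cells -> sentinel."""
    q = np.round(np.nan_to_num(tile) * scale)
    return np.where(np.isnan(tile), SENTINEL, q).astype(np.int32)

def from_fixed_point(fixed, scale=100):
    """Invert the quantization, restoring NaN for empty cells."""
    z = fixed.astype(np.float64) / scale
    z[fixed == SENTINEL] = np.nan
    return z
```

The calculations themselves stay in floating point; only the finished tile is transcribed to the compact representation.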
Post-Processing
[0044] The nature of irregular data usually results in gaps in the
generated surface grid. The fast gridding method 100 accommodates
the efficient filling of these gaps:
[0045] 1. Small gaps in the grid resulting from regions of low
sampling density are filled 116; [0046] 2. Larger gaps in the grid
can be filled where no data points exist 118; [0047] 3. Finally,
the accuracy of the resulting gridded surface in reproducing the
sample data surface must be assessed by calculating the normalized
deviation 120.
Gap Filling
[0048] The fast gridding method 100 optionally processes gaps in
two passes:
[0049] In the first pass, any empty grid cell is assigned a value
equal to the average of m.sub.j.times.n.sub.j non-empty surrounding
cells. This will fill small holes 116 with interpolated, smoothed
values and in cases where the sample data points are relatively
uniformly distributed will result in a complete grid. In other
cases large empty areas may remain.
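The first gap-filling pass can be sketched as follows; a straightforward illustration using a 3 x 3 neighbourhood for m.sub.j.times.n.sub.j (a production version would vectorize the loop).

```python
import numpy as np

def fill_small_gaps(Z, fp=3):
    """First gap-filling pass: assign each empty (NaN) cell the average
    of the non-empty cells in its fp x fp neighbourhood."""
    filled = Z.copy()
    h = fp // 2
    m, n = Z.shape
    for r in range(m):
        for c in range(n):
            if np.isnan(Z[r, c]):
                window = Z[max(r - h, 0):r + h + 1,
                           max(c - h, 0):c + h + 1]
                good = window[~np.isnan(window)]
                if good.size:                # fill only if a neighbour has data
                    filled[r, c] = good.mean()
    return filled
```

Reading from the original grid while writing to a copy keeps filled values from propagating into later cells within the same pass; cells with no non-empty neighbours remain NaN for the second pass.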
[0050] In the second pass larger gaps can be filled in one of
several ways 118: [0051] 1. assign an arbitrary distinctive value
to unassigned regions as an indication that no data is present;
[0052] 2. compute the average value of the cells bounding the
unassigned region and assign this value to the empty cells
(appropriate, for example, to represent lakes); [0053] 3. use the
IDW technique, or other interpolation techniques, to interpolate
the interior field from the values of cells bounding the unassigned
region, producing a smooth surface that will not overshoot or
undershoot the bounding region.
[0054] Beyond these automatic gap filling techniques, other
modeling techniques can be used to fill larger gaps in the post
processing phase.
Conclusion
[0055] The ability of the fast gridding method 100 to efficiently
produce regular gridded data sets from irregularly distributed
sampling points has been described. The method 100 imposes an O(P)
computational load which is a vast improvement over more
conventional methods and affords orders of magnitude improvement in
speed.
[0056] Although the method 100 has been described in the context of
2D geographical data, it is equally applicable in 3D such as for
medical imaging or in N dimensional problems such as may be
encountered in various domains of science. Similarly, the dependent
variable Z can be a real number scalar, an integer, a vector or any
other combination of attributes.
[0057] The efficient and scalable gridding of irregular data is but
one algorithm in the larger library available for the modeling and
analysis of huge regular data sets.
* * * * *