U.S. patent application number 10/357290 was filed with the patent office on 2003-02-03 and published on 2003-09-11 as publication number 20030172053 for a system and method for mining data.
Invention is credited to Fairweather, John.
United States Patent Application 20030172053
Kind Code: A1
Application Number: 10/357290
Family ID: 27663215
Inventor: Fairweather, John
Publication Date: September 11, 2003
System and method for mining data
Abstract
A system and method for extracting data, hereinafter referred to as MitoMine.TM., that produces a strongly-typed, ontology-defined collection referencing (and cross-referencing) all extracted records. The input to the mining process can be any data source, such as a text file delimited into a set of possibly dissimilar records. MitoMine.TM. contains parser routines and post-processing functions, known as `munchers`. The parser routines can be accessed
either via a batch mining process or as part of a running server
process connected to a live source. Munchers can be registered on a
per data-source basis in order to process the records produced,
possibly writing them to an external database and/or a set of
servers. The present invention also embeds an interpreted, ontology-based language within a compiler/interpreter (for the source format) such that the statements of the embedded language are
executed as a result of the source compiler `recognizing` a given
construct within the source and extracting the corresponding source
content. In this way, the execution of the statements in the
embedded program will occur in a sequence that is dictated wholly
by the source content. This system and method therefore make it possible to bulk-extract free-form data from such sources as CD-ROMs, the web, etc., and have the resultant structured data loaded into an ontology-based system.
Inventors: Fairweather, John (Santa Monica, CA)

Correspondence Address:
Kendall I. Thiessen
Gibson, Dunn & Crutcher LLP
Suite 4100
1801 California Street
Denver, CO 80202
US

Family ID: 27663215
Appl. No.: 10/357290
Filed: February 3, 2003
Related U.S. Patent Documents

Application Number: 60353487
Filing Date: Feb 1, 2002
Current U.S. Class: 1/1; 707/999.001
Current CPC Class: G06F 8/427 20130101; Y10S 707/99931 20130101; Y10S 707/99942 20130101; Y10S 707/99933 20130101; Y10S 707/913 20130101; G06F 9/4493 20180201; Y10S 707/966 20130101; G06K 13/0825 20130101
Class at Publication: 707/1
International Class: G06F 017/30; G06F 007/00
Claims
1) A system for the extraction of data from a variety of sources
into a single unifying ontology, comprising: a) an ontology based
environment, such environment including an ontology description
language (ODL) and a run-time accessible types system; b) logically
connected thereto, an extensible parsing environment, wherein such
parsing environment supports customized reverse-polish plug-in
operators; c) logically connected thereto, a configurable outer
parser capable of accepting a BNF (or equivalent) specification
describing the source data format; d) an embedded inner parser
capable of executing statements and performing actions directly on
the objects and types described by the system ontology.
2) The system of claim 1, further comprising a memory system
logically connected thereto, for storing and managing persistent
data being processed by the system.
3) The system of claim 1, wherein the ontology-based environment is
the ontology environment described in the Ontology Patent.
4) The system of claim 1, wherein the run-time accessible types
system is the run-time accessible types system described in the
Types Patent.
5) The system of claim 1, wherein the parsing environment is the
parsing environment described in the Parse Patent.
6) The system of claim 2, wherein the memory system uses the memory
system and model described in the Memory Patent.
7) The system of claim 1, wherein the outer parser is capable of
accepting a BNF specification containing specifications for
embedded plug-ins.
8) The system of claim 7, wherein the outer parser is capable of
accepting a BNF specification containing specification(s) for
embedded plug-ins that are passed textual strings consisting of
interpretable source statements.
9) The system of claim 7, wherein the embedded inner parser is
capable of being invoked by one or more of the plug-in(s) in the outer
parser, wherein the specification for such plug-ins is stored in
the BNF specification.
10) The system of claim 1, further comprising a line processor
function, wherein such function permits pre-examination and
alteration of the lines appearing in the source data before such
data is processed by the parser.
11) The system of claim 2, further comprising one or more
post-processor functions (munchers), wherein such munchers are
registered on the system and are able to perform arbitrary
operations on a collection of extracted ontological records prior
to their instantiation into persistent storage by the memory
system.
12) The system of claim 1, wherein both parsers are logically
connected to a common environment, wherein such common environment
contains a set of assignable and readable registers whose type
adapts automatically to any data item assigned to such registers
and which registers are persistent in the common environment so that
they may be used to store the current state of the parsers.
13) The system of claim 1, wherein the system can be invoked to mine
data from a given source based on actions in a user interface.
14) The system of claim 1, wherein the system can be registered
with a running server process connected to a data source in order
to allow that server to extract ontological information from the
data source on a continuous, unattended basis.
15) A method for extracting data from a variety of sources into
a single unifying ontology, comprising the steps of: a) receiving
source data; b) parsing the source format with an outer parser,
wherein such outer parser includes an embedded parser for an
interpreted ontology description language (ODL); c) parsing the
source data with the outer parser and embedded parser using the
parsed source format; d) passing statements in an embedded language
to the embedded parser; e) responsive to one or more actions by the
outer parser, executing one or more statements in the embedded
language.
16) The method of claim 15, wherein the step of parsing includes
the step of receiving a BNF specification.
17) The method of claim 16, wherein the step of parsing includes
the step of receiving a BNF specification that includes
specifications for embedded plug-ins.
18) The method of claim 17, further comprising the step of passing
one or more textual strings to the embedded plug-ins specified in
the received BNF specification.
19) The method of claim 18, wherein the step of passing one or more
textual strings to the embedded plug-ins includes passing
interpretable ODL statements.
20) The method of claim 18, further comprising the step of executing one or more ODL statements.
21) The method of claim 20, wherein the step of executing one or
more statements comprises ordering the execution of such statements
based on the progress of the step of parsing.
22) The method of claim 15, further comprising the step of altering
the source data before such data is parsed by the parsers.
23) The method of claim 15, further comprising the step of creating
a collection of ontological records.
24) The method of claim 23, further comprising the step of
instantiating the collection of ontological records created by the
method into persistent storage.
25) The method of claim 24, further comprising the step of
registering one or more munchers on the system, wherein such
munchers are able to perform arbitrary operations on the collection
of ontological records prior to their instantiation into persistent
storage.
26) The method of claim 15, further comprising the step of
registering the method with a server process connected to a data
source, such that the server process is able to extract ontological
information from the data source on a continuous, unattended basis.
Description
BACKGROUND OF THE INVENTION
[0001] The data ingestion and conversion process is generally known
as data mining, and the creation of robust systems to handle this problem is the subject of much research that has spawned the creation of many specialized languages (e.g., Perl) intended to
make this process easier. Unfortunately, while there have been some
advances, the truth of the matter is that none of these `mining`
languages really provides anything more than a string manipulation
library embedded into the language syntax itself. In other words,
such languages are nothing more than shorthand for the equivalent
operations written as a series of calls to a powerful subroutine
library. A prerequisite for any complex data processing
application, specifically a system capable of processing and
analyzing disparate data sources, is a system that can convert the structured, semi-structured, and unstructured information sources
into their equivalent representation in the target ontology,
thereby unifying all sources and allowing cross-source
analysis.
[0002] For example, in a current generation data-extraction script,
the code involved in the extraction basically works its way through
the text from beginning to end trying to recognize delimiting
tokens and once having done so to extract any text within the
delimiters and then assign it to the output data structure. When
there is a one-to-one match between source data and target
representation, this is a simple and effective strategy. As we
widen the gap between the two, however, such as by introducing
multiple inconsistent sources, increasing the complexity of the
source, nesting information in the source to multiple levels, cross
referencing arbitrarily to other items within the source, and
distributing and interspersing the information necessary to
determine an output item within a source, the situation rapidly
becomes completely unmanageable by this technique, and highly
vulnerable to the slightest change in source format or target data
model. This mismatch is at the heart of all problems involving the
need for multiple different systems to intercommunicate meaningful
information, and makes conventional attempts to mine such
information prohibitively expensive to create and maintain.
Unfortunately for conventional mining techniques, much of the most
valuable information that might be used to create truly intelligent
systems comes from publishers of various types. Publishing houses
make their money from the information that they aggregate, and thus
are not in the least bit interested in making such information
available in a form that is susceptible to standard data mining
techniques. Furthermore, most publishers deliberately introduce
inconsistencies and errors into their data in order both to detect
intellectual property rights violations by others, and to make
automated extraction as difficult as possible. Each publisher, and
indeed each title from any given publisher, uses different formats,
and has an arrangement that is custom tailored to the needs of
whatever the publication is. The result is that we are faced with a
variety of source formats on CD-ROMs, databases, web sites, and
other legacy systems that completely stymie standard techniques for
acquisition and integration. Very few truly useful sources are
available in a nice neat tagged form such as XML and thus to rely
on markup languages such as XML to aid in data extraction is a
woefully inadequate approach in real-world situations.
[0003] One of the basic problems that makes the extraction process
difficult is that the control-flow based program that is doing the
extraction has no connection to the data itself (which is simply
input) and must therefore invest huge amounts of effort extracting
and keeping track of its `state` in order to know what it should do
with information at any given time. What is needed, then, is a
system in which the content of the data itself actually determines
the order of execution of statements in the mining language and
automatically keeps track of the current state. In such a system,
whenever an action was required of the extraction code, the data
would `tell` it to take that action, and all of the complexity
would melt away. Assuming such a system is further tied to a target
system ontology, the mining problem would become quite simple.
Ideally, such a solution would tie the mining process to compiler
theory, since that is the most powerful formalized framework available
for mapping source textual content into defined actions and state
in a rigorous and extensible manner. It would also be desirable to
have an interpreted language that is tied to the target ontology
(totally different from the source format), and for which the order
of statement execution could be driven by the source data content.
SUMMARY OF INVENTION
[0004] The system of this invention takes the data mining process
to a whole new level of power and versatility by recognizing that,
at the core of our past failings in this area, lies the fact that
conventional control-flow based programming languages are simply
not suited to the desired system, and must be replaced at the
fundamental level by a more flexible approach to software system
generation. There are two important characteristics of the present
invention that help create this paradigm shift. The first is that,
in the preferred embodiment, the system of the present invention
includes a system ontology such that the types and fields of the
ontology can be directly manipulated and assigned within the
language without the need for explicit declarations. For example,
to assign a value to a field called "notes.sourceNotes" of a type,
the present invention would only require the statement
"notes.sourceNotes=". An ontology is an explicit formal
specification of how to represent the objects, concepts and other
entities that are assumed to exist in some area of interest and the
relationships that hold among them. The second, and one of the most
fundamental characteristics, is that the present invention gives up
on the idea of a control-flow based programming language (i.e., one
where the order of execution of statements is determined by the
order of those statements within the program) in order to
dramatically simplify the extraction of data from a source. In
other words, the present invention represents a radical departure
from all existing "control" notions in programming.
[0005] The present invention, hereinafter referred to as
MitoMine.TM., is a generic data extraction capability that produces a strongly-typed, ontology-defined collection referencing (and cross-referencing) all extracted records. The input to the mining process tends to be some form of text file delimited into a set of possibly dissimilar records. MitoMine.TM. contains parser routines and post-processing functions, known as `munchers`. The parser routines can
be accessed either via a batch mining process or as part of a
running server process connected to a live source. Munchers can be
registered on a per data-source basis in order to process the
records produced, possibly writing them to an external database
and/or a set of servers. The present invention embeds an interpreted, ontology-based language within a compiler/interpreter (for the source format) such that the statements of the embedded
language are executed as a result of the source compiler
`recognizing` a given construct within the source and extracting
the corresponding source content. In this way, the execution of the
statements in the embedded program will occur in a sequence that is
dictated wholly by the source content. This system and method therefore make it possible to bulk-extract free-form data from such sources as CD-ROMs, the web, etc., and have the resultant structured data loaded into an ontology-based system.
[0006] In the preferred embodiment, a MitoMine.TM. parser is
defined using three basic types of information:
[0007] 1) A named source-specific lexical analyzer
specification
[0008] 2) A named BNF specification for parsing the source
[0009] 3) A set of predefined plug-in functions capable of
interpreting the source information via C** statements.
[0010] Other improvements and extensions to this system will be
defined herein.
BRIEF DESCRIPTION OF THE FIGURES
[0011] [NONE]
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
[0012] The present invention builds upon, and in the preferred embodiment uses, a number of other key technologies and concepts. For example, the following patent applications (which
are expressly incorporated herein) disclose all the components
necessary to build up a system capable of auto-generating all user
interface, storage tables, and querying behaviors required in order
to create a system directly from the specifications given in an
ontology description language (ODL). These various building-block
technologies have been previously described in the following patent
applications:
[0013] 1) Appendix 1--Memory Patent
[0014] 2) Appendix 2--Lexical Patent
[0015] 3) Appendix 3--Parser Patent
[0016] 4) Appendix 4--Types Patent
[0017] 5) Appendix 5--Collections Patent
[0018] 6) Appendix 6--Ontology Patent
[0019] In the Parser Patent, a system was described that permits
execution of the statements in the embedded program in a sequence
that is dictated wholly by the source content, in that the `reverse
polish` operators within that system are executed as the source
parse reaches an appropriate state and, as further described in
that patent, these operators are passed a plug-in hint string when
invoked. In the preferred embodiment, the plug-in hint string will
be the source for the interpreted ontology-based language and the
plug-ins themselves will invoke an inner level parser in order to
execute these statements. The Ontology Patent introduced an
ontology based language that is an extension of the C language
known as C*. This is the preferred ontology based language for the
present invention. We will refer to the embedded form of this
language as C**, the extra `*` symbol being intended to imply the
additional level of indirection created by embedding the language
within a source format interpreter. The output of a mining process
will be a set of ontology-defined types (see Types Patent) within a flat data-model collection (see Memory Patent and Collections Patent) suitable for instantiation to persistent storage and subsequent query and access via the ontology (see the Ontology Patent).
[0020] In the preferred embodiment, a MitoMine.TM. parser is
defined using three basic types of information:
[0021] 1) A named source-specific lexical analyzer
specification
[0022] 2) A named BNF specification for parsing the source
[0023] 3) A set of predefined plug-in functions capable of
interpreting the source information via C** statements.
[0024] The BNF format may be based upon any number of different BNF
specifications. MitoMine.TM. provides the following additional
built-in parser plug-ins which greatly facilitate the process of
extracting unstructured data into run-time type manager
records:
<@1:1>
<@1:2>
[0025] These two plug-ins delimit the start and end of an arbitrary
possibly multi-lined string to be assigned to the field designated
by the following call to <@1:5:fieldPath=$>. This is the
method used to extract large arbitrary text fields. The token
sequence for these plug-ins is always of the form
<@1:1><1:String><@1:2>; that is, any text
occurring after the appearance of the <@1:1> plug-in on the
top of the parsing stack will be converted into a single string
token (token #1) which will be assigned on the next <@1:5>
plug-in. The arbitrary text will be terminated by the occurrence of
any terminal in the language (defined in the LEX specification)
whose value is above 128. Thus the following snippet of BNF will
cause the field `pubName` to be assigned whatever text occurs
between the token <PUBLICATION> and <VOLUME/ISSUE> in
the input file:
<PUBLICATION><@1:1><1:String><@1:2><@1:5:pubName=$>
<VOLUME/ISSUE><3:DecInt><@1:5:volume=$>
[0026] In the preferred embodiment, when extracting these arbitrary
text fields, all trailing and leading white space is removed from
the string before assignment, and all occurrences of LINE_FEED are
removed to yield a valid text string. The fact that tokens below
128 will not terminate the arbitrary text sequence is important in
certain situations where a particular string is a terminal in the
language and yet might also occur within such a text sequence where
it should not be considered to have any special significance. All
such tokens can be assigned token numbers below 128 in the LEX
specification thus ensuring that no confusion arises. The
occurrence of another <@1:1> or a <@1:4> plug-in causes
any previous <1:String> text accumulated to be discarded. A
<@1:5> causes execution of C** statements that generally cause the extracted information to be assigned to the specified field, and then clears the accumulated text. If a plug-in hint
consisting of a decimal number follows the <@1:1> as in
<@1:1:4> that number specifies the maximum number of lines of
input that will be consumed by the plug-in (four in this example).
This is a useful means to handle input where the line number or
count is significant.
<@1:3>
[0027] In the preferred embodiment, the occurrence of this plug-in
indicates that the extraction of a particular record initiated by
the <@1:4> plug-in is complete and should be added to the
collection of records extracted.
<@1:4:typeName>
[0028] In the preferred embodiment, the occurrence of the plug-in
above indicates that the extraction of a new record of the type
specified by the `typeName` string is to begin. The "typename" will
preferably match a known type manager type either defined elsewhere
or within the additionally type definitions supplied as part of the
parser specification.
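Taken together with the <@1:3> plug-in above, a complete record is therefore bracketed in the BNF roughly as follows (a hypothetical fragment; the `Publication` type name and the intervening productions are illustrative only):
record      ::= <@1:4:Publication> record_body <@1:3>
record_body ::= <PUBLICATION><@1:1><1:String><@1:2><@1:5:pubName=$>
The <@1:4:Publication> plug-in starts a new record of the named type, the intervening productions fill in its fields, and the <@1:3> plug-in adds the completed record to the output collection.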
<@1:5:C** assignment(s)>
[0029] In the preferred embodiment, the plug-in above is used to
assign values to either a field or a register. Within the assigned
expression, the previously extracted field value may be referred to
as `$`. Fields may be expressed as a path to sub-fields of the
structure to any depth using normal type manager path notation
(same as for C). As an example, the field specifier
"description[$aa].u.equip.specifications" refers to a field within
the parent structure that is within an array of unions. The symbol
`$aa` is a register designator. There are 26*26 registers `$aa` to
`$zz` which may be used to hold the results of calculations
necessary to compute field values. A single character register
designator may also be used instead; thus `$a` is the same as `$aa`, `$b` is the same as `$ba`, etc. Register names may optionally be
followed by a text string (no spaces) in order to improve
readability (as in $aa:myIndex) but this text string is ignored by
the C** interpreter. The use of registers to store extracted
information and context is key to handling the distributed nature
of information in published sources. In the example above, `$a` is
being used as an index into the array of `description` fields. To
increment this index a "<@1:5:$a=$a+1>" plug-in call would be
inserted in the appropriate part of the BNF (presumably after
extraction of an entire `description` element). All registers are
initially set to zero (integer) when the parse begins, thereafter
their value is entirely determined by the <@1:5> plug-ins
that occur during the extraction process. If a register is assigned
a real or string value, it adopts that type automatically until a
value of another type is assigned to it. Expressions may include
calls to functions (of the form $FuncName), which provide a
convenient means of processing the inputs extracted into certain
data types for assignment. These functions provide capabilities
comparable to the string processing libraries commonly found with
older generation data mining capabilities.
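By way of illustration, a BNF fragment that extracts a repeated sequence of description elements into successive entries of the `description` array might look roughly as follows (a hypothetical sketch; the <DESCRIPTION> terminal and the field path are illustrative only and are not taken from any appendix):
description_elem ::= <DESCRIPTION><@1:1><1:String><@1:2>
                     <@1:5:description[$a].u.equip.specifications=$>
                     <@1:5:$a=$a+1>
Each time the production is recognized, the captured text is assigned to the array entry indexed by register `$a`, and the register is then incremented so that the next occurrence fills the following entry.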
[0030] When assigning values to fields, the <@1:5> plug-in
performs intelligent type conversions, for example:
[0031] 1) If the token is a <1:String> and the field is a `charHdl`, a handle is created and assigned to the field. Similarly
for a `charPtr`. If the field is a fixed length character array,
the string is copied into it. If it won't fit, a bounds error is
flagged. If the field is already non-empty (regardless of type)
then the <@1:5> plugin appends any new text to the end of the
field value (if possible). Note that registers do not append
automatically unless you use the syntax $a=$a+"string".
[0032] 2) If the field is numeric, appropriate type conversions
from the extracted value occur. Range checking could be automatic.
Multiple assignments may be separated by semi-colons. The full
syntax supported within the `assignment` string is defined by the
system BNF language "MitoMine" (described below).
[0033] Note that because the order of commutative operator (e.g.,
"+") evaluation is guaranteed to be left-to-right, multiple
non-parenthesized string concatenation operations can be safely
expressed as a single statement as in:
fieldname="Hello"+$FirstCapOnly($a)+"do you
like"+$b+".backslash.n"
[0034] The <@1:5> plug-in may also be used to support limited
conditional statements which may be performed using the `if` and
`ifelse` keywords. The effect of the `if` is to conditionally skip
the next element of the production that immediately follows the
<@1:5> containing the `if` (there can be only one statement
within an `if` or `ifelse` block). For example:
<@1:5:if(1==0)><@1:4:typeName>
[0035] would cause the <@1:4> plug-in to be discarded without
interpretation. Similarly:
<@1:5:ifelse(1==0)><@1:4:typeName1><@1:4:typeName2>
[0036] causes execution of the second of the two <@1:4>
plug-ins while:
<@1:5:ifelse(0==0)><@1:5:$a=$a+1;$b=1><@1:5:$a=$a-1;$b=0>
[0037] causes execution of the first block to increment $a and
assign $b to 1.
[0038] More significantly, since it is possible to discard any
element from the production in this manner, the prudent use of
conditional <@1:5> evaluation can be used to modify the
recognized syntax of the language. Consider the following
production:
myProduction::=<@1:5:ifelse($a>=0)>positive_prod negative_prod
[0039] In this example, the contents of register `$a` is
determining which of two possible productions will get evaluated
next. This can be a very powerful tool for solving non-context-free
language ambiguities (normally intractable to this kind of parser)
by remembering the context in one of the registers and then
resolving the problem later when it occurs. The results of misusing
this capability can be very confusing and the reader is referred to
the incorporated materials of the Parser Patent for additional
details. That having been said, the following simplified guidelines
should help to ensure correctness:
[0040] For any production of the form:
prod::=<@1:5:ifelse (expression)> thenClause elseClause
[0041] Ensure:
[0042] 1) FIRST(thenClause)==FIRST(elseClause)
[0043] 2) Either both thenClause and elseClause are NULLABLE, or
neither is
[0044] 3) If elseClause is not NULLABLE, and if necessary
(depending on other occurrences of thenClause),
[0045] include a production elsewhere {that may never be executed}
to ensure that FOLLOW(thenClause) includes FOLLOW(elseClause)
[0046] For any production of the form:
prod::=prevElement<@1:5:if(expression)>thenClause
nextElement
[0047] Ensure that if thenClause is not NULLABLE, and if necessary
(depending on other occurrences of nextElement), include a
production elsewhere {that may never be executed} to ensure that
FIRST(nextElement) is entirely contained within
FOLLOW(prevElement).
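As an illustration of the first form, the following hypothetical production set satisfies these guidelines: both clauses begin with the same terminal (so their FIRST sets match), neither clause is NULLABLE, and, provided the two clauses are used only in this production, their FOLLOW sets coincide (the <NOTES> terminal, field path, and register names are illustrative only):
opt_notes  ::= <@1:5:ifelse($b==1)> keep_notes skip_notes
keep_notes ::= <NOTES><@1:1><1:String><@1:2><@1:5:notes.sourceNotes=$>
skip_notes ::= <NOTES><@1:1><1:String><@1:2><@1:5:$c=$>
When register `$b` holds 1, the captured text is assigned to the `notes.sourceNotes` field; otherwise it is merely captured into register `$c` and the record field is left untouched.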
[0048] Note that all plug-ins may contain multiple lines of text by
use of the <cont> symbol (see Parser patent). This may be
required in the case where a <@1:5> statement exceeds the
space available on a single line (e.g., many parameters to a
function). The maximum size of any given plug-in text in the
preferred embodiment is 8 KB.
[0049] The present invention also permits the specification of the
language specific parser to include any user dialogs and warnings
that might be required for the parser concerned, any additional
type definitions that might be required as part of parser
operation, and any custom annotations and scripts (see Collections
Patent) that might be necessary.
[0050] Within the <@1:5> plug-in, in addition to supporting
conditionals, additive, multiplicative and assignment operators,
this package preferably provides a number of built-in functions
that may be useful in manipulating extracted values in order to
convert them to a form suitable for assignment to typed fields.
These functions are loosely equivalent to the string processing
library of conventional mining languages. Function handlers may be
registered (via a registry API--see Parser Patent for further
details) to provide additional built-in functions. In the built-in function descriptions below, the type of a given parameter is indicated between square brackets. The meaning of these symbols is as follows:
[0051] [I]--Integer value (64 bit)
[0052] [F]--Floating point value (double)
[0053] [S]--String value
[0054] The following is a partial list of predefined built-in
functions that have been found to be useful in different data
mining situations. New functions may be added to this list and it
is expected that use of the system will often include the step of
adding new functions. In such a case, if a feature is not provided,
it can be implemented and registered as part of any particular
parser definition. On the other hand, none of the features listed
below are required meaning that a much smaller set of functions
could also be used. In the preferred embodiment, however, the
following functions (or ones having similar functionality) would be
available.
[0055] 1) [F] $Date( )
[0056] get current date/time into a date-double
[0057] 2) [F] $StringToDate([S] dateString,[S] calendar)
[0058] convert "dateString" to date/time double, current date if
date string format invalid. The currently supported calendar values
are "G"--Gregorian, "J"--Julian etc. Note that in the Gregorian
calendar you may specify the date string in a wide variety of
formats, in any other calendar it must be in the following format:
"yyyy:mm:dd [hh:mm[:ss] [AM/PM]]"
[0059] 3) [S] $TextAfter([S] srcStr,[S] delimStr)
[0060] Return the string portion after the specified delimiter sequence. Returns "" if not found.
[0061] 4) [S] $TextBefore([S] srcStr,[S] delimStr)
[0062] Return the string portion before the specified delimiter
sequence. Returns "" if not found.
[0063] 5) [S] $TextBetween([S] srcStr,[S] startStr,[S] endStr)
[0064] Return the string portion between the specified delimiter sequences. Returns "" if not found.
[0065] 6) [I] $Integer([S] aString)
[0066] Convert the specified string to an integer (decimal or
hex)
[0067] 7) [F] $Real([S] aString)
[0068] Convert the specified string to a real number
[0069] 6) [I] $IntegerWithin([S] aString,[I] n)
[0070] Extract the n'th integer (decimal or hex, n=1 . . . ) within
the specified arbitrary string
[0071] 7) [F] $RealWithin([S] aString,[I] n)
[0072] Extract the n'th real (n=1 . . . ) within the specified
arbitrary string
[0073] 8) [S] $StripMarkup([S] aString)
[0074] Strip any Markup language tags out of a string to yield
plain text.
[0075] 9) [S] $SourceName( )
[0076] Inserts the current value of `languageName`
[0077] 10) [S] $SetPersRefInfo([S] aString)
[0078] This function allows you to append to the contents of the
`stringH` field of a persistent reference field rather than
assigning to the name. The function result is equal to `aString`
but the next assignment made by the parser will be to the `stringH`
sub-field, not the `name` sub-field.
[0079] 11) [S] $FirstCapOnly([S] aString)
[0080] Converts a series of words in upper/lower case such that
each word starts with an upper case character and all subsequent
characters are lower case.
[0081] 12) [S] $TextNotAfter([S] srcStr,[S] delimStr)
[0082] Similar in operation to $TextBefore( ) except if `delimStr`
is not found, the original string is returned un-altered.
[0083] 13) [S] $TextNotBefore([S] srcStr,[S] delimStr)
[0084] Similar in operation to $TextAfter( ) except if `delimStr`
is not found, the original string is returned un-altered.
[0085] 14) [S] $TextNotBetween([S] srcStr,[S] startStr,[S]
endStr)
[0086] Returns what remains after removing the string portion
between the specified delimiter sequences (and the delimiter
sequences themselves). If the sequence is not found, the original
string is returned un-altered.
[0087] 15) [S] $TruncateText([S] srcStr,[I] numChars)
[0088] Truncates the source string to the specified number of
characters.
[0089] 16) [S] $TextBeforeNumber([S] srcStr)
[0090] This function is similar in operation to $TextBefore( ) but
the `delimStr` is taken to be the first numeric digit
encountered.
[0091] 17) [S] $TextWithout([S] srcStr,[S] sequence)
[0092] This function removes all occurrences of the specified
sequence from the source string.
[0093] 18) [S] $WordNumber([S] srcStr,[I] number)
[0094] This function gets the specified word (starting from 1) from
the source string. If `number` is negative, the function counts
backward from the last word in the source string.
[0095] 19) [S] $Ask([S] promptStr)
[0096] This function prompts the user with the specified string and allows the user to enter a textual response, which is returned as the function result.
[0097] 20) [S] $TextWithoutBlock([S] srcStr,[S] startDelim,[S]
endDelim)
[0098] This function removes all occurrences of the delimited text
block (including delimiters) from the source string.
[0099] 21) [S] $ReplaceSequence([S] srcStr,[S] sequence,[S]
nuSequence)
[0100] This function replaces all occurrences of the target sequence
by the sequence `nuSequence` within the given string.
[0101] 22) [S] $AppendIfNotPresent([S] srcStr,[S] endDelim)
[0102] This function determines if `srcStr` ends in `endDelim` and
if not appends `endDelim` to `srcStr` returning the result.
[0103] 23) [S] $ProperNameFilter([S] srcStr,[I] wordMax,[S]
delim)
[0104] This function performs the following processing (in order)
designed to facilitate the removal of extraneous strings of text
from `delim` separated lists of proper names (i.e., capitalized
first letter words):
[0105] a) if the first non-white character in a `delim` bounded
block is not upper case, remove the entire string up to and
including the trailing occurrence of `delim` (or end of string).
[0106] b) for any `delim` bounded block, strip off all trailing
words that start with lower case letters.
[0107] c) if more than `wordMax` words beginning with a lower case
letter occur consecutively between two occurrences of `delim`,
terminate the block at the point where the consecutive words
occur.
[0108] 24) [S] $Sprintf([S] formatStr, . . . )
[0109] This function performs a C language sprintf( ) function,
returning the generated string as its result.
[0110] 25) [S] $ShiftChars([S] srcStr,[I] delta)
[0111] This function shifts the character encoding of all elements
of `srcStr` by the amount designated in `delta` returning the
shifted string as a result. This functionality can be useful for
example when converting between upper and lower case.
[0112] 26) [S] $FlipChars([S] srcStr)
[0113] This function reverses the order of all characters in
`srcStr`.
[0114] 27) [S] $ReplaceBlockDelims([S] srcStr,[S] startDelim,[S]
endDelim,[S] nuStartDelim,[S] nuEndDelim,[I] occurrence,[I]
reverse)
[0115] This function replaces the start and end delimiters of one
or more delimited blocks of text by the new delimiters specified.
If `occurrence` is zero, all blocks found are processed, otherwise just the block specified (starting from 1). If `reverse` is non-zero (i.e., 1), this function first locates the ending delimiter
and then works backwards looking for the start delimiter. Often if
the start delimiter is something common like a space character
(e.g., looking for the last word of a sentence), the results of
this may be quite different from those obtained using
`reverse`=0.
[0116] 28) [S] $RemoveIfFollows([S] srcStr,[S] endDelim)
[0117] This function determines if `srcStr` ends in `endDelim` and
if so removes `endDelim` from `srcStr` returning the result.
[0118] 29) [S] $RemoveIfStarts([S] srcStr,[S] startDelim)
[0119] This function determines if `srcStr` starts with
`startDelim` and if so removes `startDelim` from `srcStr` returning
the result.
[0120] 30) [S] $PrependIfNotPresent([S] srcStr,[S] startDelim)
[0121] This function determines if `srcStr` starts with
`startDelim` and if not prepends `startDelim` to `srcStr` returning
the result.
[0122] 31) [S] $NoLowerCaseWords([S] srcStr)
[0123] This function eliminates all words beginning with lower case
letters from `srcStr` returning the result.
[0124] 32) [S] $ReplaceBlocks([S] srcStr,[S] startDelim,[S]
endDelim,[I] occurrence,[S] nuSequence)
[0125] This function replaces one or all blocks delimited by the
specified delimiter sequences with the replacement sequence
specified. If `occurrence` is zero, all blocks are replaced,
otherwise the occurrence is a one-based index to the block to
replace.
[0126] 33) [S] $AppendIfNotFollows([S] srcStr,[S] endDelim)
[0127] This function determines if `srcStr` ends in `endDelim` and
if not appends `endDelim` to `srcStr` returning the result.
[0128] 34) [I] $WordCount([S] srcStr)
[0129] This function counts the number of words in the source
string, returning the numeric result.
[0130] 35) [S] $PreserveParagraphs([S] srcStr)
[0131] This function eliminates all line termination characters
(replacing them by spaces) in the source string other than those
that represent paragraph breaks. Source text has often been
formatted to fit into a fixed page width (e.g., 80 characters) and
since we wish the captured text to re-size to fit whatever display
area is used, it is often necessary to eliminate the explicit line
formatting from large chunks of text using this function. A
paragraph is identified by a line termination immediately followed
by a tab or space character (also works with spaces for right
justified scripts), all other explicit line formatting is
eliminated. The resulting string is returned.
[0132] 36) [I] $StringSetIndex([S] srcStr,[I] ignoreCase,[S]
setStr1 . . . [S] setStrN)
[0133] This function compares `srcStr` to each of the elements in
the set of possible match strings supplied, returning the index
(starting from 1) of the match string found, or zero if no match is
found. If `ignoreCase` is non-zero, the comparisons are case
insensitive, otherwise they are exact.
[0134] 37)[S] $IndexStringSet([I] index,[S] setStr1 . . . [S]
setStrN)
[0135] This function selects a specific string from a given set of
strings by index (1-based), returning as a result the selected
string. If the index specified is out of range, an empty string is
returned.
[0136] 38) [S] $ReplaceChars([S] srcStr,[S] char,[S] nuChar)
[0137] This function replaces all occurrences of `char` in the
string by `nuChar` returning the modified string as a result.
[0138] 39) [S] $Sentence([S] srcStr,[I] index)
[0139] This function extracts the designated sentence (indexing
starts from 0) from the string, returning as a result the sentence.
If the index specified is negative, the index counts backwards from
the end (i.e., -1 is the last sentence etc.). A sentence is
identified by any sequence of text terminated by a period.
[0140] 40) [S] $FindHyperlink([S] srcStr,[S] domain, [I] index)
[0141] This function will extract the index'th hyperlink in the
hyperlink domain specified by `domain` that exists in `srcStr` (if
any) and return as a result the extracted hyperlink name. This
technique can be used to recognize known things (e.g., city or
people names) in an arbitrary block of text. If no matching
hyperlink is found, the function result will be an empty
string.
[0142] 41) [S] $AssignRefType([S] aString)
[0143] This function allows you to assign directly to the typeID
sub-field of a persistent reference field rather than assigning to
the name. The function result is equal to `aString` but the next
assignment made by the parser will be to the typeID sub-field
`aString` is assumed to be a valid type name), not the `name`
sub-field.
[0144] 42) [I] $RecordCount( )
[0145] This function returns the number of records created so far
during the current mining process.
[0146] 43) [S] $Exit([S] aReason)
[0147] Calling this function causes the current parsing run to exit
cleanly, possibly displaying a reason for the exit (to the console)
as specified in the `aReason` string (NULL if no reason given).
[0148] 44) [I] $MaxRecords( )
[0149] This function returns the maximum number of records to be
extracted for this run. This value can either be set by calling
$SetMaxRecords( ) or it may be set by external code calling
MN_SetMaxRecords( ).
[0150] 45) [I] $SetMaxRecords([I] max)
[0151] This function sets the maximum number of records to be
extracted for this run. See $MaxRecords( ) for details.
[0152] 46) [I] $FieldSize([S] fieldName)
[0153] This function returns the size in bytes of the field
specified in the currently active type record as set by the
preceding <@1:4:typeName> operator. Remember that variable
sized string fields (i.e., char @fieldName) and similar will return
a size of sizeof(Ptr), not the size of the string within it.
[0154] 47) [I] $TextContains([S] srcText,[S] subString)
[0155] This function returns 0 if the `srcText` does not contain `subString`, otherwise it returns the character index within `srcText` where `subString` starts, plus 1.
[0156] 48) [I] $ZapRegisters([S] minReg,[S] maxReg)
[0157] This function empties the contents of all registers starting
from `minReg` and ending on `maxReg`. The parameters are simply the
string equivalent of the register name (e.g., "$aa"). When
processing multiple records, the use of $ZapRegisters( ) is often
more convenient than explicit register assignments to ensure that
all the desired registers start out empty as record processing
begins. The result is the count of the number of non-empty
registers that were zapped.
[0158] 49) [I] $CRCString([S] srcText)
[0159] This function performs a 32-bit CRC similar to ANSI X3.66 on
the text string supplied, returning the integer CRC result. This can be useful when you want to turn an arbitrary (i.e., non-alphanumeric) string into a form that is (probably!) unique for name-generating or discriminating purposes.
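As a brief illustration of how several of these built-ins can be composed within a single <@1:5> assignment (a hypothetical fragment; the field path and the "Background:" delimiter are illustrative only):
<@1:5:notes.sourceNotes=$StripMarkup($TextAfter($,"Background:"))+" ["+$SourceName()+"]">
Here everything up to and including the delimiter is discarded from the captured text `$`, any markup tags are stripped from the remainder, and the name of the mining language is appended in brackets before the result is assigned to the `notes.sourceNotes` field.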
[0160] Note that parameters to routines may be either constants (of
integer, real or string type), field specifiers referring to fields
within the current record being extracted, registers, $ (the
currently extracted field value), or evaluated expressions which
may include embedded calls to other functions (built-in or
otherwise). This essentially creates a complete programming
language for the extraction of data into typed structures and
collections. The C** programming language provided by the
<@1:5> plug-ins differs from a conventional programming
language in that the order of execution of the statements is
determined by the BNF for the language and the contents of the data
file being parsed. In the preferred embodiment, the MitoMine.TM.
parser is capable of recognizing and evaluating the following token
types:
[0161] 3--DecInt--syntax as for a C strtoul( ) call but ignores
embedded commas.
[0162] 4--Real--real--as for C strtod( )
[0163] 5--Real--real scientific format--as for C strtod( )
[0164] The plug-in 5 MitoMine.TM. parser, in addition to
recognizing registers, $, $function names, and type field
specifications, can also preferably recognize and assign the
following token types:
[0165] 2--character constant (as for C)
[0166] 7--Hex integer (C format)
[0167] 3--decimal integer (as for C strtoul)
[0168] 10--octal integer (as for strtoul)
[0169] 4--real (as for strtod)
[0170] 5--real with exponent (as for strtod)
[0171] 12--string constant (as for C).
[0172] Character constants can be a maximum of 8 characters long; during input, they are not sign extended. The following custom
parser options would preferably be supported:
[0173] kTraceAssignments (0x00010000)--Produces a trace of all
<@1:5> assignments on the console
[0174] kpLineTrace (0x00020000)--Produces a line trace on the
console
[0175] kTraceTokens (0x00040000)--Produces a trace of each token
recognized
[0176] These options may be specified for a given parser language
by adding the corresponding hex value to the parser options line.
For example, the specification below would set
kTraceAssignments+kpLineTrace options in addition to those
supported by the basic parse package:
=0x30000+kPreserveBNFsymbols+kBeGreedyParser
[0177] The lexical analyzer options line can also be used to
specify additional white-space and delimiter characters to the
lexical analyzer as a comma-separated list. For example, the specification below would cause the characters `a` and `b` to be
treated as whitespace (see LX_AddWhiteSpace) and the characters `Y`
and `Z` to be treated as delimiters (see LX_AddDelimiter).
=kNoCaseStates+whitespace(a,b)+delimiter(Y,Z)
[0178] Appendix A provides a sample of the BNF and LEX
specifications that define the syntax of the <@1:5> plug-in
(i.e., C**) within MitoMine.TM. (see Parser Patent for further
details). Note that most of the functionality of C** is already
provided by the predefined plug-in functions (plug-in 0) supplied
by the basic parser package. A sample implementation of the
<@1:5> plug-in one and a sample implementation of a
corresponding resolver function are also provided.
[0179] As described previously, the lexical and BNF specifications
for the outermost parser vary depending on the source being
processed (example given below), however the outer parser also has
a single standard plug-in and resolver. A sample implementation of
the standard plug-in one and a sample implementation of a
corresponding resolver function are also provided in Appendix
A.
[0180] The listing below gives the API interface to the
MitoMine.TM. capability for the preferred embodiment, although other forms are obviously possible. Appendix A provides the sample pseudo
code for the API interface.
[0181] In the preferred embodiment, a function, hereinafter called
MN_MakeParser( ), initializes an instance of the MitoMine.TM. and
returns a handle to the parser database which is required by all
subsequent calls. A `parserType` parameter could be provided to
select a particular parsing language to be loaded (see PS_LoadBNF)
and used.
[0182] In the preferred embodiment, a function, hereinafter called
MN_SetRecordAdder( ) determines how (or if) records once parsed are
added to the collection. The default record adder creates a set of
named lists where each list is named after the record type it
contains.
[0183] In the preferred embodiment, a function, hereinafter called
MN_SetMineFunc( ), sets the custom mine function handler for a
MitoMine.TM. parser. Additional functions could also be defined
over and above those provided by MitoMine.TM. within the <@1:5:
. . . > plugin context. A sample mine function handler
follows:
static Boolean myFunc (                         // custom function handler
    ET_ParseHdl  aParseDB,                      // IO:handle to parser DB
    int32        aContextID                     // I:context
)                                               // R:TRUE for success
{
    p = (myContextPtr)aContextID;               // get our context pointer
    opCount = PS_GetOpCount(aParseDB,TOP);      // get # of operands
    tokp = PS_GetToken(aParseDB,opCount);       // get fn name
    for ( i = 0 ; i < opCount ; i++ )
        if ( !PS_EvalIdent(aParseDB,i) )        // eval all elements on stack
        { res = NO; goto BadExit; }
    if ( !US_strcmp(tokp,"$myFuncName") )       // function name
    {
        -- check operand count and type
        -- implement function
        -- set resulting value into stack 'opCount' e.g.:
        PS_SetiValue(aParseDB,opCount,result);
    } else if ( !US_strcmp(tokp,"$another function") )
        ...
}
[0184] In the preferred embodiment, a function, hereinafter called
MN_SetMaxRecords( ), sets the maximum number of records to be mined
for a MitoMine.TM. parser. This is the number returned by the built-in function $MaxRecords( ). If the maximum number of
records is not set (i.e., is zero), all records are mined until the
input file(s) is exhausted.
[0185] In the preferred embodiment, a function, hereinafter called
MN_SetMineLineFn( ), sets the MitoMine.TM. line processing function
for a given MitoMine.TM. parser. A typical line processing function
might appear as follows:
static void myLineFn (                  // Built-in debugging mine-line fn
    ET_ParseHdl aParseDB,               // I:Parser DB
    int32       aContextID,             // I:Context
    int32       lineNum,                // I:Current line number
    charPtr     lineBuff,               // IO:Current line buffer
    charPtr     aMineLineParam          // I:String parameter to function
)                                       // R:void
[0186] These functions can be used to perform many different useful tasks, such as altering the input stream before the parse sees it, adjusting parser debugging settings, etc. The
`aMineLineParam` parameter above is an arbitrary string and can be
formatted any way you wish in order to transfer the necessary
information to the line processing function. The current value of
this parameter is set using MN_SetMineLineParam( ).
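For instance, a line processing function might be used to blank out a repeating page header before the outer parser ever sees it. The sketch below follows the signature shown above; the header-matching logic and the use of the standard C string routines are illustrative assumptions only:
static void myHeaderStripper (          // strip repeating page headers
    ET_ParseHdl aParseDB,               // I:Parser DB
    int32       aContextID,             // I:Context
    int32       lineNum,                // I:Current line number
    charPtr     lineBuff,               // IO:Current line buffer
    charPtr     aMineLineParam          // I:String parameter to function
)                                       // R:void
{
    // 'aMineLineParam' (set via MN_SetMineLineParam) is assumed here to hold
    // the header text to suppress; any line that starts with it is emptied so
    // the parser never sees it
    if ( aMineLineParam && strncmp(lineBuff, aMineLineParam, strlen(aMineLineParam)) == 0 )
        lineBuff[0] = '\0';
}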
[0187] In the preferred embodiment, a function, hereinafter called
MN_SetMineLineParam( ), sets the string parameter to a MitoMine.TM.
line processing function.
[0188] In the preferred embodiment, two functions, hereinafter called MN_SetParseTypeDB( ) and MN_GetParseTypeDB( ), can be used
to associate a type DB (probably obtained using
MN_GetMineLanguageTypeDB) with a MitoMine.TM. parser. This is
preferable so that the plug-ins associated with the extraction
process can determine type information for the structures unique to
the language. In the preferred embodiment, the function
MN_GetParseTypeDB( ) would return the current setting of the parser
type DB.
[0189] In the preferred embodiment, a function, hereinafter called
MN_SetFilePath( ), sets the current file path associated with a
MitoMine.TM. parser.
[0190] In the preferred embodiment, a function, hereinafter called
MN_GetFilePath( ), gets the current file path associated with a
MitoMine.TM. parser.
[0191] In the preferred embodiment, a function, hereinafter called MN_SetCustomContext( ), may be used to set the custom context value associated with a given MitoMine.TM. parser. Because MitoMine.TM.
itself uses the parser context (see PS_SetContextID), it provides
this alternative API to allow custom context to be associated with
a parser.
[0192] In the preferred embodiment, a function, hereinafter called
MN_GetCustomContext( ), may be used to get the custom context value
associated with a given MitoMine.TM. parser. Because MitoMine.TM.
itself uses the parser context (see PS_SetContextID), it provides
this alternative API to allow custom context to be associated with
a parser.
[0193] In the preferred embodiment, a function, hereinafter called
MN_GetParseCollection( ), returns the collection object associated
with a parser. MN_SetParseCollection( ) allows this value to be
altered. By calling MN_SetParseCollection( . . . ,NULL) it is
possible to detach a collection from the parser in cases where you
wish the collection to survive the parser teardown process.
[0194] In the preferred embodiment, a function, hereinafter called MN_SetParseCollection( ), allows the collection object associated with a parser to be altered. By calling MN_SetParseCollection( . . . ,NULL) it is possible to detach a collection from the parser. This would be useful in cases where it is preferable to permit the collection to survive the parser teardown process.
[0195] In the preferred embodiment, a function, hereinafter called
MN_GetMineLanguageTypeDB( ), returns a typeDB handle to the type DB
describing the structures utilized by the specified mine language.
If the specified typeDB already exists, it is simply returned,
otherwise a new type DB is created by loading the type definitions
from the designated MitoMine.TM. type specification file.
[0196] In the preferred embodiment, a function, hereinafter called
MN_KillParser( ), disposes of the Parser database created by
MN_MakeParser( ). A matching call to MN_KillParser( ) must exist
for every call to MN_MakeParser( ). This call would also invoke
MN_CleanupRecords( ) for the associated collection.
[0197] In the preferred embodiment, a function, hereinafter called
MN_Parse( ), invokes the MitoMine.TM. parser to process the
designated file. The function is passed a parser database created
by a call to MN_MakeParser( ). When all calls to MN_Parse( ) are
complete, the parser database must be disposed using MN_KillParser(
).
[0198] In the preferred embodiment, a function, hereinafter called MN_RunMitoMine( ), runs the selected MitoMine.TM. parser on the contents of a string handle. A parameter could also be passed to the MN_MakeParser( ) call and can thus be used to specify various debugging options.
[0199] In the preferred embodiment, a function, hereinafter called
MN_CleanupRecords( ), cleans up all memory associated with the set
of data records created by a call to MN_RunMitoMine( ).
[0200] In the preferred embodiment, a function, hereinafter called
MN_RegisterMineMuncher( ), can be used to register by name a
function to be invoked to post-process the set of records created after a successful MitoMine.TM. run. The name of the registered
Muncher function would preferably match that of the mining language
(see MN_Parse for details). A typical mine-muncher function might
appear as follows:
static ET_CollectionHdl myMuncher (     // My Mine Muncher function
    ET_MineScanRecPtr scanP,            // IO:Scanning context record
    ET_CollectionHdl  theRecords,       // I:Collection of parsed records
    char              typeDBcode,       // I:The typeDB code
    charPtr           parserType,       // I:The parser type/language name
    ET_Offset         root,             // I:Root element designator
    charPtr           customString      // I:Available to pass a custom string to the muncher
)                                       // R:The final collection
[0201] The `scanP` parameter is the same `scanP` passed to the file
filter function and can thus be used to communicate between file
filters and the muncher or alternatively to clean up any leftovers
from the file filters within the `muncher`. Custom `muncher` functions can be used to perform a wide variety of complex tasks; indeed, the MitoMine.TM. approach has been used successfully to
extract binary (non-textual) information from very complex sources,
such as encoded database files, by using this technique.
[0202] In the preferred embodiment, a function, hereinafter called
MN_DeRegisterMineMuncher( ), de-registers a previously registered
mine muncher function.
[0203] In the preferred embodiment, a function, hereinafter called
MN_InvokeMineMuncher( ), invokes the registered `muncher` function
for the records output by a run of MitoMine (see MN_RunMitoMine).
If no function is registered, the records and all associated memory
are simply disposed using MN_CleanupRecords( ).
[0204] In the preferred embodiment, a function, hereinafter called
MN_RegisterFileFilter( ), can be used to register by name a file
filter function to be invoked to process files during a
MitoMine.TM. run. If no file filter is registered, files are
treated as straight text files, otherwise the file must be loaded
and pre/post processed by the file filter. A typical file filter
function might appear as follows:
static EngErr myFileFilter (            // Scan files and mine if appropriate
    HFileInfo         *aCatalogRec,     // IO:The catalog search record
    int32Ptr          flags,            // IO:available for flag use
    ET_MineScanRecPtr scanP             // IO:Scanning context record
)                                       // R:zero for success, else error #
[0205] In the preferred embodiment, a function, hereinafter called
MN_ListFileFilters( ), obtains a string list of all known MitoMine.TM. file filter functions.
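To tie these calls together, a typical batch mining run might be driven roughly as follows. Since the exact prototypes appear only in the pseudo code of Appendix A, the argument lists, return conventions, and the "FactBook" language name shown here are assumptions for illustration only:
ET_ParseHdl      db;
ET_CollectionHdl records;

db = MN_MakeParser("FactBook", 0);                       // load the LEX/BNF specs for this language
MN_SetParseTypeDB(db, MN_GetMineLanguageTypeDB("FactBook"));
MN_SetMaxRecords(db, 0);                                 // zero => mine until the input is exhausted
if ( !MN_Parse(db, "afghanistan.txt") )                  // run the outer and embedded C** parsers
{
    records = MN_GetParseCollection(db);                 // the ontology-typed records produced
    MN_InvokeMineMuncher(db, records);                   // post-process and/or persist the records
}
MN_KillParser(db);                                       // tear down; also invokes MN_CleanupRecords( )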
[0206] In order to illustrate how MitoMine.TM. is used to extract
information from a given source and map it into its ontological
equivalent, we will use the example of the ontological definition
of the Country record pulled from the CIA World Factbook. The extract provided in Appendix B is a portion of the first record of data for the country Afghanistan, taken from the 1998 edition of this CD-ROM. The format of the information in this case appears to be a
variant of SGML, but it is clear that this approach applies equally
to almost any input format. The lexical analyzer and BNF
specification for the parser to extract this source into a sample
ontology are also provided in Appendix B. The BNF necessary to
extract country information into a sample ontology is one of the
most complex scripts thus far encountered in MitoMine.TM.
applications due to the large amount of information that is being
extracted from this source and preserved in the ontology. Because
this script is so complex, it probably best illustrates a less than
ideal data-mining scenario but also demonstrates use of a large
number of different built-in mining functions. Some of the results
of running the extraction script below can be seen in the Ontology
patent relating to auto-generated UI.
[0207] Note that in the BNF provided in Appendix B, a number of
distinct ontological items are created, not just a country. The BNF
starts out by creating a "Publication" record that identifies the
source of the data injested, it also creates a "Government" record,
which is descended from Organization. The Government record is
associated with the country and forms the top level of the
description of the government/organization of that country (of
which the military branches created later are a part). In addition,
other records could be created and associated with the country, for
example the "opt_figure" production is assigning a variety of
information to the `stringH` field of the "mapImage" field that
describes a persistent reference to the file that contains the map
image. When the data produced by this parse is written to
persistent storage, this image file is also copied to the image
server and through the link created, can be recalled and displayed
whenever the country is displayed (as is further demonstrated in
the UI examples of the Ontology Patent). In fact, as a result of
extracting a single country record, perhaps 50-100 records of
different types are created by this script and associated in some way with the country, including government personnel, international organizations, resources, population records, images, cities and ports, neighboring countries, treaties, notes, etc. Thus it is
clear that what was flat, unrelated information in the source has
been converted to richly interconnected, highly computable and
usable ontological information after the extraction completes. This
same behavior is repeated for all the diverse sources that are mined into any given system; the information from all such sources becomes cross-correlated and therefore infinitely more useful than it was in its separate, isolated form. The power of this approach
over conventional data mining technologies is clear.
[0208] The foregoing description of the preferred embodiments of
the invention has been presented for the purposes of illustration
and description. For example, although described with respect to
the C* programming language, any programming language that includes the appropriate extensions could be used to implement this
invention. Additionally, the claimed system and method should not
be limited to the particular API disclosed. The descriptions of the
header structures should also not be limited to the embodiments
described. While the sample pseudo code provides examples of the
code that may be used, the plurality of implementations that could
in fact be developed is nearly limitless. For these reasons, this
description is not intended to be exhaustive or to limit the
invention to the precise form disclosed. Many modifications and
variations are possible in light of the above teaching. It is
intended that the scope of the invention be limited not by this
detailed description, but rather by the claims appended hereto.
* * * * *