U.S. patent application number 10/265825 for a schema-oriented content management system was published by the patent office on 2003-09-25.
The invention is credited to Ernst, Matthias; Gawecki, Andreas; Kummer, Olaf; Wienberg, Axel; Wienberg, Frank.
Publication Number | 20030182308 |
Application Number | 10/265825 |
Family ID | 27771875 |
Publication Date | 2003-09-25 |
United States Patent Application 20030182308
Kind Code: A1
Ernst, Matthias; et al.
September 25, 2003
Schema-oriented content management system
Abstract
A schema-oriented content management system, in particular an
apparatus and a method for storing and accessing data in a content
management system which allows content schema evolution while
maintaining operation based on already stored content data. A data
processing apparatus for storing and accessing data in a content
management system comprises a content object generation arrangement
for generating content objects to store content data, wherein a
content object comprises a plurality of property-value bindings, a
content object storing arrangement for storing generated content
objects, a content schema generation arrangement for generating an
object-oriented content schema to model content data, a content
schema storing arrangement for storing the generated content
schema, and a content object accessing arrangement to access stored
content objects by way of the stored content schema.
Inventors: Ernst, Matthias (Hamburg, DE); Gawecki, Andreas (Hamburg, DE); Wienberg, Axel (Hamburg, DE); Wienberg, Frank (Hamburg, DE); Kummer, Olaf (Halstenbek, DE)
Correspondence Address: KENYON & KENYON, ONE BROADWAY, NEW YORK, NY 10004, US
Family ID: 27771875
Appl. No.: 10/265825
Filed: October 7, 2002
Current U.S. Class: 1/1; 707/999.103
Current CPC Class: Y10S 707/99944 20130101; G06F 16/289 20190101
Class at Publication: 707/103.00Z
International Class: G06F 017/00
Foreign Application Data
Date | Code | Application Number
Mar 21, 2002 | EP | 02 006 585.0
Claims
What is claimed is:
1. Method for storing and accessing data in a content management
system, wherein content data is stored in content objects (5, 6, 7)
comprising a plurality of property-value bindings, wherein a
content object (5, 6, 7) can bind properties (2, 3, 4, 9, 16) to
values, wherein each property (2, 3, 4, 9, 16) has a respective
property type and the values are stored according to the property
type corresponding to the respective property (2, 3, 4, 9, 16); an
object-oriented content schema (10, 11) for modelling content data
by defining content object types (1, 8) is generated and stored,
wherein the content object types (1, 8) declare required object
properties (2, 3, 4, 9, 16); and a content object (5, 6, 7) is
accessed by means of the content schema (10, 11).
2. Method according to claim 1, wherein the integrity of the stored
content objects (5, 6, 7) is examined by verifying if the stored
content objects (5, 6, 7) match the content schema (10, 11).
3. Method according to claim 2, wherein upon a detection of an
integrity issue a predetermined issue resolution procedure resolves
the mismatch by adapting the content data, a mismatch report is
generated, and/or the data mismatching the content schema is
presented to a user who is prompted to adapt the data.
4. Method according to claim 2 or 3, wherein the integrity of
content objects (5, 6, 7) is examined by comparing the types and
the properties (2, 3, 4, 9, 16) of stored objects (5, 6, 7) to the
object type definitions (1, 8) of the content schema (10, 11).
5. Method according to any one of claims 2 to 4, wherein upon a
detection of an integrity issue a predetermined issue resolution
procedure is invoked to resolve the schema mismatch by adapting
property values and/or property types of content objects (5, 6, 7)
mismatching the content schema (10, 11).
6. Method according to any one of claims 1 to 5, wherein a property
is a link property linking a content object having the link
property to a target object and/or a property is a link collection
property comprising a set or bag of links to target objects.
7. Method according to any one of claims 1 to 6, wherein the
content schema (10, 11) imposes constraints on the content
objects (5, 6, 7); and wherein the integrity of the stored content
objects (5, 6, 7) is examined by verifying the constraints imposed
on the stored content objects (5, 6, 7).
8. Method according to claim 7, wherein the constraints imposed on
content objects (5, 6, 7) relate to permitted and/or required
properties of objects (5, 6, 7) and/or to attributes of properties
(2, 3, 4, 9, 16), such as permissible ranges of property values or
types of link property targets.
9. Method according to claim 7 or 8, wherein upon a violation of a
constraint a predetermined issue resolution procedure resolves the
schema violation by adapting the content data violating the
constraint, in particular by adapting, adding, or removing property
values.
10. Method according to any one of claims 1 to 9, wherein in
response to a content schema modification the integrity of stored
objects (5, 6, 7) is examined only for those objects that may be
affected by the schema modification.
11. Method according to claim 10, wherein the integrity is examined
for objects of a modified object type or a subtype of the modified
object type, for objects comprising a modified property, and/or for
objects affected by modifications of constraints relating to
attributes of properties.
12. Method according to any one of claims 1 to 11, wherein a
property is a computed property, and the value of the computed
property is automatically calculated upon access on a content
object whose type declares the computed property.
13. Method according to claim 12, wherein the value of a computed
property of an object is calculated based on a set of properties of
that object, or the value of a computed property is a collection of
objects having link properties which are linking back to a content
object having the computed property, or is calculated based on a link collection
property of that object and property values of the objects
referenced by the link collection.
14. Method according to any one of claims 1 to 13, wherein resource
versions for modifications of content objects (5, 6, 7) or object
type definitions (1, 8) are recorded, wherein for the purpose of
versioning content objects (5, 6, 7) and object type definitions
(1, 8) are treated uniformly as resources.
15. Method according to claim 14, wherein a label referring to a
set of objects each having a specified version and/or a label
referring to a content schema version comprising an aggregation of
object type definitions (1, 8) each having a specified version are
stored.
16. Method according to claim 14 or 15, wherein versions of content
objects comprising link properties or link collection properties
are stored irrespective of the versions of target objects.
17. Method according to any one of claims 1 to 16, wherein the
content schema (10, 11) and the content objects (5, 6, 7) are
stored in workspaces (12, 13, 14, 15); and wherein modifications
of the content schema (10, 11) and the content objects (5, 6, 7)
remain local to the respective workspace (12, 13, 14, 15) in which
the modification occurred until explicitly transferred to another
workspace (12, 13, 14, 15).
18. Method according to claim 17, wherein a development workspace
(12) for the modification of the content schema (10, 11), a
production workspace (13) for the generation and modification of
content objects (5, 6, 7) and an integration workspace (15) for the
integration of content schema modification, content object
generation, and content object modification are provided.
19. Method according to claim 17 or 18, wherein different versions
of the same resource from a source and a target workspace are
unified in the target workspace by selecting the latest version for
the resource of one workspace, if the latest version of the other
workspace precedes the selected version; otherwise a version
conflict is detected and the versions from both workspaces are
marked as conflicting versions.
20. Method according to claim 19, wherein the conflicting versions
for the same resource are merged into a single version; and wherein
the merging is performed individually for every property used in
any of the versions, taking into account the property values of a
suitable determined common predecessor version of all conflicting
versions and the property values of the conflicting versions.
21. Method according to claim 19 or 20, wherein a merge procedure
for selecting one of the conflicting versions as merged version of
the resource, for selecting one of the property values of the
conflicting versions as merged property value for the resource, or
for calculating a property value based on the property values of
the conflicting versions is invoked.
22. Method according to any one of claims 1 to 21, wherein a
content object (5, 6, 7) is accessed or presented to a user by
accessing each property (2, 3, 4, 9, 16) of the content object
according to the respective property declaration and/or the object
type definition (1, 8) for the content object (5, 6, 7).
23. Method according to any one of claims 1 to 22, wherein content
objects (5, 6, 7), object types (1, 8) and/or properties (2, 3, 4,
9, 16) are identified by unique invariable identifiers.
24. Method according to claim 23, wherein content objects (5, 6, 7)
and/or content types (1, 8) are represented in a persistent store
that is updated transactionally, in particular in a relational
database or an object database.
25. Method according to any one of claims 1 to 24, wherein event
notifications are distributed upon changes to the content objects
or the schema.
26. Computer program comprising program code means for performing
the method of any one of the claims 1 to 25.
27. Computer program product comprising program code means stored
on a computer readable medium for performing the method of any one
of the claims 1 to 25 when the computer program product is executed
on a computer.
28. Computer readable medium storing a computer program for
performing the method of any one of the claims 1 to 25 when the
computer program is executed on a computer.
29. Data processing apparatus for storing and accessing data in a
content management system, comprising: content object generation
means (20) for generating content objects (5, 6, 7) to store
content data, wherein a content object (5, 6, 7) comprises a
plurality of property value bindings, wherein each property (2, 3,
4, 9, 16) has a respective property type and the values correspond
to the respective property type; content object storing means (21)
for storing generated content objects (5, 6, 7); content schema
generation means (22) for generating an object-oriented content
schema (10, 11) to model content data by defining content object
types (1, 8), wherein a content object type (1, 8) declares
required object properties; content schema storing means (23) for
storing the generated content schema (10, 11); and content object
accessing means (24) to access stored content objects (5, 6, 7) by
means of the stored content schema (10, 11).
30. Data processing apparatus according to claim 29, comprising an
integrity examination means to examine the integrity of the stored
content objects (5, 6, 7) by verifying if the stored content
objects (5, 6, 7) match the stored content schema (10, 11).
31. Data processing apparatus according to claim 30, comprising an
integrity issue resolution means to resolve a detected schema
mismatch by adapting the respective content data.
32. Data processing apparatus according to any one of claims 29 to
31, comprising a resource version recording means to record
resource versions for modified content objects (5, 6, 7) or object
type definitions (1, 8), wherein content objects (5, 6, 7) and
object type definitions (1, 8) are handled uniformly as
resources.
33. Data processing apparatus according to any one of claims 29 to
32, comprising a workspace means to store individual versions of
the content schema (10, 11) and the content objects (5, 6, 7) in
workspaces (12, 13, 14, 15), wherein modifications of the content
schema (10, 11) and the content objects (5, 6, 7) remain local to
the respective workspace (12, 13, 14, 15) in which the modification
occurred until explicitly transferred to another workspace (12, 13,
14, 15).
34. Data processing apparatus according to claim 33, comprising an
unification and merging means to integrate different versions of
the same resource from a source and a target workspace.
35. Data processing apparatus according to any one of claims 29 to
34, comprising a database means to store content objects (5, 6, 7)
and/or content type definitions (1, 8) persistently.
36. Data processing apparatus according to any one of claims 29 to
35, comprising an input means to input content data to be stored in
content objects (5, 6, 7) and/or comprising an output means to
output content data by accessing each property (2, 3, 4, 9, 16) of
a stored content object (5, 6, 7) according to the respective
property declaration and/or an object type definition (1, 8) stored
in the content schema storing means (23).
Description
FIELD OF THE INVENTION
[0001] The present invention relates to a schema-oriented content
management system, in particular to an apparatus and a method for
storing and accessing data in a content management system.
BACKGROUND INFORMATION
[0002] The pervasive use of internet technologies for the access of
all sorts of data sources and the increasing size and complexity of
internet systems constitute major challenges for the providers of
information technology infrastructure. The information to be
exchanged must be produced, validated, stored, retrieved, analysed,
formatted, and delivered while observing high availability and
performance requirements.
[0003] As the volume of data increases, it becomes insufficient to
provide automated support only for the delivery of information to
the user, which is often done via standard protocols like HTTP
utilising standard software, such as web servers and web browsers.
The data creation process must be supported in its entirety. E.g.,
for an online magazine it is necessary that the content of the
magazine, which might consist of text documents, pictures, sound
tracks, or video streams, is properly gathered and administered.
Web content management systems (WCMSs) address the desire to
produce larger and more complex web sites more quickly and with
higher quality.
[0004] Large web sites are often developed collaboratively by
several people whose access has to be coordinated and controlled.
WCMSs usually do this by offering exclusive locks on individual
documents and by verifying proper authorisation. Furthermore, it is
necessary to separate content and layout of the web site, since
different people have specialised roles and responsibilities with
respect to the web site development or operation, e.g., text
editor, designer, programmer, and administrator. A WCMS therefore
tries to structure the information so that different roles can work
as independently as possible, e.g., allowing a text editor to focus
on producing text without bothering with layout. The content is not
just meant for access by human users but is also the data on which
import, export, and personalization services operate.
[0005] The actual web site is often generated from a content
database using templates which select and combine the content. For
example, navigation bars are computed from the current position in
the navigation hierarchy, a centre pane receives text articles, and
a side bar features related content.
[0006] Because material published on a web site immediately goes
public, quality assurance is important. To exploit the web's
potential for up-to-date information, publication should be as fast
as possible. On the other hand, published material should adhere to
certain quality standards and, at a minimum, contain no spelling
mistakes, dangling links, or broken HTML.
[0007] In a content management system an explicit content schema
may be used to model the content data to be handled by the WCMS.
However, a content schema is almost impossible to get right on the
first attempt during the development of the web site. Furthermore,
the schema does not stay fixed over time: organisational or
technical considerations can suggest improvements and extensions to
the content schema. Therefore, changing application requirements
make it necessary to change the schema even when the web site is
already in production and content data has been accumulated.
[0008] Because the content data itself is a valuable asset, it is
very expensive to throw away existing data and to start the data
collection from scratch after modifying the content schema. In
response to a schema migration, portions of the data already
accumulated may be automatically converted to the new schema but
sometimes human interaction is required to adapt content data to
the new schema. Because this process is slow, inconsistent
intermediate states must be managed persistently by the
system. During these inconsistent periods some of the automated
parts of the WCMS will not be fully functional due to the mismatch
between schema and data. This may interrupt the entire web
publishing process and halt the web site delivery operation.
[0009] Data migration strategies which, after a schema
modification, convert the entire existing data to the new content
schema are not a solution for the evolving schema development process
which is typical for large web site development. Many people are
involved in this development process, and it often happens that
modifications applied to the schema cause conflicts with
the content data. Furthermore, it is likely that some changes to
the content schema or the content data are undone later on in the
process, which is only possible when the data is kept in its
original form as long as possible. Data migration operations may
cause irreparable data loss and, therefore, may prevent the
restoration of the original content data, when applied
automatically on the entire data. In addition, converting the
entire content data of large web sites to a new schema is very
expensive and requires a lot of computational effort. This becomes
even more important when the site development or operation is an
evolution-like process requiring frequent releases of content
schema and data.
[0010] Furthermore, inconsistencies within the content schema or
between content and schema might arise out of schema evolution
operations. These inconsistencies need to be detected efficiently
and reliably in order to maintain the availability and quality
requirements for online publishing.
SUMMARY OF THE INVENTION
[0011] It is an object of the present invention to provide a method
and an apparatus for handling content data in a content management
system, which allows content schema evolution while maintaining
operation based on already stored content data.
[0012] A method and an apparatus for storing and accessing data in
a content management system are provided. Content data is generated
and stored in content objects comprising a plurality of property
value bindings. A content object can bind properties to values. In
general, a content object can use all properties to store content
data. Preferably, some predetermined properties are used to store
certain types of content data in content objects. The values are
stored according to the property type corresponding to the
respective property. Each property has a declared property type
defining the type of information that may be stored in the
corresponding value. Available classes of properties may include
atomic properties such as string and integer properties, media
properties or link properties.
[0013] An object aggregates property bindings, each binding a
property to an appropriate value. Each object may have a declared
type. An object type declares properties and can inherit further
properties from a parent type.
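The type model of [0013] can be sketched in a few lines. This is a minimal illustration under assumptions, not the system's actual data structures: the class names `PropertyDecl`, `ObjectType`, and `ContentObject` are hypothetical, and Python types such as `str` stand in for the patent's property types. It shows a subtype inheriting property declarations from its parent while a content object merely aggregates bindings.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PropertyDecl:
    """Hypothetical property declaration: a name and a declared value type."""
    name: str
    value_type: type  # stands in for the patent's property types


@dataclass
class ObjectType:
    """Hypothetical object type: declares properties, may inherit from a parent."""
    name: str
    declared: list[PropertyDecl] = field(default_factory=list)
    parent: Optional[ObjectType] = None

    def all_properties(self) -> dict[str, PropertyDecl]:
        # Start with the inherited declarations, then add this type's own.
        inherited = self.parent.all_properties() if self.parent else {}
        return {**inherited, **{p.name: p for p in self.declared}}


@dataclass
class ContentObject:
    """A content object aggregates property bindings; its type is optional."""
    bindings: dict[str, object]
    obj_type: Optional[ObjectType] = None


document = ObjectType("Document", [PropertyDecl("title", str)])
article = ObjectType("Article", [PropertyDecl("text", str)], parent=document)
hello = ContentObject({"title": "Hello", "text": "First post"}, article)

print(sorted(article.all_properties()))   # ['text', 'title']
print(sorted(document.all_properties()))  # ['title']
```

Keeping the bindings as a plain dictionary, separate from the type, mirrors the point that objects may be stored independently of the schema.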
[0014] Storing content data in content objects comprising property
value bindings is a very flexible way of organizing a content
database. Content data, e.g., an article, a picture, a headline, or
an index page may be represented as distinct objects that are
associated in various ways. Automated procedures may be applied to
the content data, which would be very difficult when storing
unstructured data.
[0015] An object oriented content schema for modeling content data
by defining content object types is generated and stored. The
content object types declare required object properties. The
schema may be an aggregation of meta objects, namely types and
property declarations, interconnected by inheritance and
associations. The schema may be a mutable schema which is modified
to adapt to changing requirements. The meta objects themselves are
again modeled according to an immutable meta schema.
[0016] A content object is (read and/or write) accessed by means of
the content schema. Accessing content by means of the content
schema allows well-defined access procedures for reading and
writing the content data to be invoked whenever the data is
used, e.g., in an application. Since the data is stored in content
objects based on properties, the content schema used for accessing
the data objects can interpret and validate the stored data based
on the content object types. E.g., it is possible to fill in
absent property values of content objects. Default values may
be derived from an object or property specific recovery procedure.
This allows the content management system to operate on an
"imperfect" database having missing data and/or data type
mismatches. In addition, it is possible to save storage space in
the database when storing nullable properties. Since these
properties are not stored when not present, i.e. when no value is
available to be bound to the property, the database size is
reduced. A default value for the respective property may be used
when the content object is accessed. Contrary to many database
systems, no memory locations for nulled properties are allocated
and no "null" values are stored.
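Schema-based reading with defaults, as described in [0016], might look like the sketch below. It assumes a hypothetical `SCHEMA` dictionary in which each declared property carries a default value whose Python type doubles as the property type; the function name `read_property` is likewise illustrative. Absent properties are simply not stored, and both missing values and type mismatches fall back to the default on access.

```python
from __future__ import annotations

from typing import Any

# Hypothetical schema: type name -> {property name: default value}.
# The default's Python type doubles as the declared property type here.
SCHEMA: dict[str, dict[str, Any]] = {
    "Article": {"title": "", "text": "", "rating": 0},
}


def read_property(obj_type: str, bindings: dict[str, Any], name: str) -> Any:
    """Read one property of a stored object through the schema."""
    declared = SCHEMA.get(obj_type, {})
    if name not in declared:
        raise KeyError(f"{obj_type} declares no property {name!r}")
    default = declared[name]
    value = bindings.get(name)
    # Absent values (never stored, no null written) and values of the wrong
    # type both fall back to the default instead of failing.
    if value is None or not isinstance(value, type(default)):
        return default
    return value


stored = {"title": "Schema evolution", "rating": "five"}  # no 'text' stored
print(read_property("Article", stored, "title"))         # Schema evolution
print(repr(read_property("Article", stored, "text")))    # ''
print(read_property("Article", stored, "rating"))        # 0
```

A real system would presumably allow per-property recovery procedures instead of a single static default, as the paragraph above suggests.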
[0017] Furthermore, a content object may be presented to a user or
an application by accessing (reading) each property of the content
object according to the respective property declaration or the
object type definition for the content object. E.g., the properties
of a content object may be formatted in different ways depending on
the property or the property type. This allows predetermined data
transformations to be applied to content data when it is used.
Accessing the content objects by means of the content schema allows
many automated procedures to be applied to the content data.
[0018] On the other hand, since content data is generated and
stored independently of the content schema, the proposed method and
apparatus for storing and accessing data is much more flexible than
known object oriented databases. In an object oriented database the
content objects must strictly conform to the object type
definitions of the schema. In object oriented databases content
objects are always generated and stored according to the content
schema. Upon a content schema modification the entire database is
usually converted to match the modified schema. On the contrary, in
the present invention, content objects may be generated according
to the available content data. This is particularly important if
content data is imported from content sources and the available
data does not match the content schema. In the present invention,
the imported data may be stored in content objects comprising the
respective properties of the imported data and not the desired
properties declared in the schema. This gives considerable flexibility in
storing the data, e.g., imported from different sources, but still
maintains a structured database.
[0019] It is preferable to examine the integrity of the stored
content objects by verifying if the stored content objects match
the content schema. Content objects may mismatch the content
schema, because objects are stored independently of a content
schema and may comprise properties other than those declared in the
object types. Content objects may also mismatch the content schema
after a modification of the content schema which happens frequently
during web site development.
[0020] Typical operations to modify the content schema during
schema evolution are: add property; remove property; move property
to supertype; move property to subtype; rename property; reorder
properties; add subtype; extract common supertype; duplicate type;
rename type; and change property type. These operations may be
applied to the content schema to adapt the present content schema
to changing requirements and are supported by the content
management system of the present invention without losing its
access functionality to content data already stored. Since in the
present invention the content objects are stored independently of
the content schema, access to the accumulated content objects is
always maintained, even if the stored content and the content
schema deviate, and even if the content schema itself is
inconsistent. E.g., when a property in the content schema is moved
to a subtype, access to the respective property in content objects
is still possible as long as the relation between the moved
property in the content schema and the stored value bound to the
property in the content object is maintained.
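One way to keep the relation between a schema property and its stored values intact across renames and moves is to bind values by an invariable property identifier, in the spirit of claim 23. The sketch below assumes that design; `PropertyDecl`, `prop_id`, and `read_by_name` are hypothetical names, and it deliberately skips checking the object's type against the declaring type.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PropertyDecl:
    prop_id: int          # hypothetical invariable identifier (cf. claim 23)
    name: str             # user-visible name; may be renamed
    declaring_type: str   # may change when the property is moved


# Schema: property id -> declaration. Stored objects refer only to ids.
schema = {7: PropertyDecl(7, "headline", "Article")}
stored_object = {7: "Schema evolution in practice"}


def read_by_name(bindings: dict[int, object], name: str) -> object:
    """Resolve a property name through the current schema, then read by id."""
    for decl in schema.values():
        if decl.name == name and decl.prop_id in bindings:
            return bindings[decl.prop_id]
    raise KeyError(name)


# Two evolution steps: rename the property, then move it to a subtype.
schema[7].name = "title"
schema[7].declaring_type = "NewsArticle"

print(read_by_name(stored_object, "title"))  # Schema evolution in practice
```

Neither evolution step touches `stored_object`; only the schema entry changes.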
[0021] A property may be a link property linking a content object
having the link property to a target object. Link properties are in
particular useful to express relationships between objects. A
property may be a link collection property comprising a set or bag
of links to target objects. This is in particular useful if a link
collection is indexed using a given key property to represent a
branching relation wherein the actual branch to be followed is
determined by the key. It is preferable that the integrity of
content objects is examined by comparing the types and property
value bindings of stored objects to the object type definitions of
the content schema. By comparing the property value bindings of
stored objects to the required object properties according to the
content schema, type mismatches may be detected.
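The comparison of stored bindings against a type definition described in [0021] can be illustrated as follows. This is a sketch under assumptions: the type definition is pre-flattened into a hypothetical `ARTICLE` dictionary mapping property names to Python types, links are modeled as target-id strings, and link collections as lists of ids. The function reports both undeclared properties and type mismatches.

```python
from __future__ import annotations

from typing import Any

# Hypothetical flattened type definition: property name -> expected Python type.
# Links are modeled as target-id strings, link collections as lists of ids.
ARTICLE: dict[str, type] = {"title": str, "text": str, "related": list}


def check_bindings(type_def: dict[str, type], bindings: dict[str, Any]) -> list[str]:
    """Compare an object's property-value bindings with its type definition."""
    issues = []
    for name, value in bindings.items():
        expected = type_def.get(name)
        if expected is None:
            issues.append(f"{name}: not declared by the type")
        elif not isinstance(value, expected):
            issues.append(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
    return issues


print(check_bindings(ARTICLE, {"title": "Hi", "text": 42, "author": "X"}))
# ['text: expected str, got int', 'author: not declared by the type']
```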
[0022] The content schema may impose additional constraints on the
content objects. A constraint may be the declaration of permitted
properties. Based on this constraint only properties declared in
the object type definition may be stored with a content object.
Another constraint may relate to required properties: properties
declared as nonnullable in an object type must be nonnull and,
therefore, stored in a respective object. For example, a required
property constraint may demand that each article has a title and a text.
Furthermore, range constraints declaring minimum and maximum values
for a property value may apply. Cardinality constraints may require
that the number of elements of a link collection property must not
be outside the declared minimum and maximum cardinality. For
example, a home page must link at least three and at most ten
articles. Target type constraints may declare the target type of
elements linked to by a link collection property. In addition to
property type mismatches, stored content objects may violate
constraints imposed by the content schema. The integrity of the
stored content objects may be examined by verifying the constraints
imposed on the stored objects.
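The constraint kinds listed in [0022] (required, range, cardinality, target type) can be checked per object roughly as below. The `Constraint` record, the `violations` function, and the `type_of` lookup from object ids to type names are hypothetical. The example reuses the home page rule from the text (three to ten linked articles) and adds a hypothetical required title. The returned strings could plausibly feed a mismatch report.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Constraint:
    """Hypothetical per-property constraints; None means 'not constrained'."""
    required: bool = False              # declared nonnullable
    min_value: Optional[int] = None     # range constraint
    max_value: Optional[int] = None
    min_card: Optional[int] = None      # cardinality of a link collection
    max_card: Optional[int] = None
    target_type: Optional[str] = None   # type of link targets


def violations(constraints: dict[str, Constraint], obj: dict[str, Any],
               type_of: dict[str, str]) -> list[str]:
    """List constraint violations of one object; type_of maps object ids to types."""
    found = []
    for name, c in constraints.items():
        value = obj.get(name)
        if value is None:
            if c.required:
                found.append(f"{name}: required but missing")
            continue
        if isinstance(value, int):
            if c.min_value is not None and value < c.min_value:
                found.append(f"{name}: {value} below minimum {c.min_value}")
            if c.max_value is not None and value > c.max_value:
                found.append(f"{name}: {value} above maximum {c.max_value}")
        if isinstance(value, list):
            if c.min_card is not None and len(value) < c.min_card:
                found.append(f"{name}: fewer than {c.min_card} links")
            if c.max_card is not None and len(value) > c.max_card:
                found.append(f"{name}: more than {c.max_card} links")
            for target in value:
                if c.target_type is not None and type_of.get(target) != c.target_type:
                    found.append(f"{name}: target {target} is not of type {c.target_type}")
    return found


# Home page rule from the text (3 to 10 articles) plus a hypothetical required title.
home_page = {
    "title": Constraint(required=True),
    "articles": Constraint(min_card=3, max_card=10, target_type="Article"),
}
type_of = {"a1": "Article", "a2": "Article", "p1": "Picture"}
print(violations(home_page, {"articles": ["a1", "a2", "p1"]}, type_of))
# ['title: required but missing', 'articles: target p1 is not of type Article']
```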
[0023] Upon a detection of an integrity issue a mismatch report may
be generated. The mismatch report may list all mismatching content
objects, their mismatching properties and/or the type of mismatch.
The mismatch report may be used to log the effects that appear
after schema modifications. Furthermore, data mismatching the
content schema may be presented to a user who is prompted to adapt
the data. The user may correct mismatching property types or fill
in missing values for nonnullable properties. This manual
correction process may be supported by a suitable user interface
presenting error messages, invalid property values and some
explanation about the reason for the integrity issue to the
user.
[0024] While the integrity issues described above are acceptable
and allow the core system to function, for many embodiments of the
present invention it is preferable to resolve as many integrity
issues as possible automatically within a short time. Only the
remaining integrity issues persistently affect the access of
content data.
[0025] Therefore, upon a detection of an integrity issue a
predetermined issue resolution procedure may be employed to resolve
the mismatch by adapting the content data, especially by adapting
property values. This resolution procedure may apply one or more of
the following rules, depending on the necessity of issue resolution
and the necessity of keeping old data.
[0026] Rules to apply when a property binding violates constraints
from the property declaration: change property value to a default
value, possibly to a null value; compute new property value based
on old value; if an integer value outside of the declared minimum
and/or maximum value of a range constraint is detected, clip it to
meet the constraint; if a link list is too long, truncate the link
list.
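Three of the rules in [0026] are simple enough to state directly as functions. The names `clip`, `truncate`, and `or_default` are hypothetical, and the rule of computing a new value from the old one is left out of this sketch because it depends on application-specific logic.

```python
from __future__ import annotations

from typing import Any, Callable, Optional


def clip(value: int, lo: Optional[int], hi: Optional[int]) -> int:
    """Range rule: clip an integer into [lo, hi]; either bound may be absent."""
    if lo is not None and value < lo:
        return lo
    if hi is not None and value > hi:
        return hi
    return value


def truncate(links: list[Any], max_card: int) -> list[Any]:
    """Cardinality rule: keep only the first max_card links of a too-long list."""
    return links[:max_card]


def or_default(value: Any, is_valid: Callable[[Any], bool], default: Any = None) -> Any:
    """Fallback rule: replace an invalid value by a default (possibly None)."""
    return value if is_valid(value) else default


print(clip(15, 1, 10))                                # 10
print(truncate(["a1", "a2", "a3", "a4"], 3))          # ['a1', 'a2', 'a3']
print(or_default("", lambda s: bool(s), "Untitled"))  # Untitled
```

Keeping the first links when truncating is an arbitrary choice here; which links to drop is a policy decision the text leaves open.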
[0027] Rules to apply when a property is illegally present and
there is another nulled property of similar name: move the value of
the illegally present property to the other property; move the
value of the illegally present property to another property after
type correction, e.g., link → link list, integer → float,
string → date.
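The similar-name rule of [0027] needs some notion of "similar name". The sketch below assumes that Python's standard `difflib.get_close_matches` (with a 0.6 cutoff) is an acceptable stand-in; the text does not prescribe a similarity measure. `CONVERSIONS` and `move_to_similar` are hypothetical, links are modeled as id strings, and the string → date correction is omitted. Properties with no close match are left in place for the other rules, such as removal.

```python
from __future__ import annotations

import difflib
from typing import Any, Callable

# Hypothetical type corrections (from the examples in the text); links are
# modeled as id strings, so link -> link list wraps the id in a list.
# String -> date is omitted in this sketch.
CONVERSIONS: dict[tuple[type, type], Callable[[Any], Any]] = {
    (str, list): lambda v: [v],
    (int, float): float,
}


def move_to_similar(bindings: dict[str, Any], declared: dict[str, type]) -> dict[str, Any]:
    """Move values of undeclared properties to nulled declared ones of similar name."""
    result = dict(bindings)
    for name in [n for n in bindings if n not in declared]:
        nulled = [d for d in declared if result.get(d) is None]
        match = difflib.get_close_matches(name, nulled, n=1, cutoff=0.6)
        if not match:
            continue  # left for other rules, e.g. removing the property
        target, value = match[0], result[name]
        if not isinstance(value, declared[target]):
            convert = CONVERSIONS.get((type(value), declared[target]))
            if convert is None:
                continue
            value = convert(value)
        result[target] = value
        del result[name]
    return result


declared = {"title": str, "price": float, "related": list}
print(move_to_similar({"titel": "Hello", "prices": 3, "relatedLink": "a17"}, declared))
# {'title': 'Hello', 'price': 3.0, 'related': ['a17']}
```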
[0028] Rules to apply when a property is illegally present: look
for a subtype of the current type declaring the property and
replace the current type by the subtype; remove the property.
[0029] Rules to apply when a property is illegally nulled: look for
an undefined, but existing property of similar type and name, then
move the value for that property to the nulled property; initialise
the property with a default value; compute property from other
property using some application level procedure; in the case of a
link property, create a resource of the required type, binding
properties to values with a suitable heuristic. The last-mentioned issue
resolution procedure may, e.g., when a property is illegally
present in the original resource and that property is declared in
the link target type, move the property to the newly created
resource.
[0030] In response to a content schema modification the integrity
of stored objects may be examined only for those objects that may
be affected by the schema modification. It is preferable to examine
the integrity of objects of a modified object type or a subtype of
a modified object type. It is also preferable to examine the
integrity for objects comprising a modified property and/or for
objects affected by modifications of constraints relating to
attributes of properties. Based on the object-oriented type
hierarchy declared in the content schema it is possible to
determine the effects of schema modifications on content objects.
Therefore, only a partial examination of the content objects is
required to verify the integrity of stored objects. Schema
modifications may be verified without excessive integrity checking.
This partial object verification saves computational effort in
maintaining consistency in the entire database. This is in
particular important for large content databases, since it is not
necessary to examine the entire database upon a modification of an
object-type definition or a property-type definition.
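The partial examination of [0030] comes down to computing the modified type together with its transitive subtypes and checking only their instances. In the sketch below, the `PARENT` hierarchy and both function names are hypothetical. The scan over the small `instances` dictionary stands in for what would presumably be an index of objects by type in a real store, which is what avoids touching the whole database.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical type hierarchy: type name -> supertype name (None for the root).
PARENT = {"Document": None, "Article": "Document",
          "NewsArticle": "Article", "Picture": "Document"}


def type_and_subtypes(root: str) -> set[str]:
    """Collect root and all of its (transitive) subtypes."""
    children = defaultdict(list)
    for t, p in PARENT.items():
        if p is not None:
            children[p].append(t)
    found, stack = set(), [root]
    while stack:
        t = stack.pop()
        if t not in found:
            found.add(t)
            stack.extend(children[t])
    return found


def objects_to_check(modified_type: str, instances: dict[str, str]) -> list[str]:
    """Select only objects whose type is the modified type or one of its subtypes."""
    affected = type_and_subtypes(modified_type)
    return sorted(oid for oid, t in instances.items() if t in affected)


instances = {"o1": "Article", "o2": "NewsArticle", "o3": "Picture", "o4": "Document"}
print(objects_to_check("Article", instances))  # ['o1', 'o2']
```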
[0031] Based on the object-oriented approach for storing and
accessing data the following rules may be used in order to
determine the content objects that may be affected by a schema
modification.
[0032] Rules relating to permitted properties: when a property is
changed in a certain object, check the object; when a property
declaration is added to or removed from an object type, check all
instances of the type and all its subtypes.
[0033] Rules relating to required properties, range constraints,
cardinality constraints or target type constraints: when a property
is changed in a certain object, check the object; when a property
declaration is added, removed or changed, check all instances of
all subtypes of the type that defines the property; when a property
is moved from one type to another, only check the instances of each
type that is a subtype of exactly one (not both) of the source and
destination types.
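The property-move rule can be read as a symmetric difference of two subtype sets. The sketch below assumes a single-inheritance hierarchy stored as a hypothetical `parents` dict that maps each type to its parent, with `None` for the root. Instances of types below both the source and the destination keep the property, so they need no recheck.

```python
from typing import Dict, Optional, Set


def subtypes(parents: Dict[str, Optional[str]], t: str) -> Set[str]:
    """All transitive subtypes of t, including t itself (single inheritance)."""
    result = {t}
    changed = True
    while changed:
        changed = False
        for child, parent in parents.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def types_to_recheck_on_move(parents: Dict[str, Optional[str]], source: str, dest: str) -> Set[str]:
    """Types whose instances gain or lose the moved property declaration."""
    return subtypes(parents, source) ^ subtypes(parents, dest)
```

Moving a declaration from `Document` down to its subtype `Article` would therefore recheck instances of `Document` and of any sibling such as `Image`, but not instances of `Article`.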
[0034] Rules relating to index properties:
[0035] when a key property of an object is changed, recheck all
indexes that include the object; when an object is added to or
removed from an indexed link collection, recheck the index.
[0036] Rules relating to modifications of the type hierarchy:
[0037] when the type hierarchy is changed (i.e. when a supertype
link is changed) and the new supertype inherits a set of properties
different from the original one, then all instances of the moved
type and its current subtypes must be rechecked.
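The type-hierarchy rule compares the property sets inherited through the old and the new supertype before the change is applied. In this sketch, `declared` (type to its own property names) and `parents` are hypothetical in-memory stand-ins for the stored schema.

```python
from typing import Dict, Optional, Set


def subtypes(parents: Dict[str, Optional[str]], t: str) -> Set[str]:
    """All transitive subtypes of t, including t itself."""
    result = {t}
    changed = True
    while changed:
        changed = False
        for child, parent in parents.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def inherited(declared: Dict[str, Set[str]], parents, t: Optional[str]) -> Set[str]:
    """Properties declared on t and on all of its supertypes."""
    props: Set[str] = set()
    while t is not None:
        props |= declared.get(t, set())
        t = parents[t]
    return props


def types_to_recheck_on_reparent(declared, parents, moved: str, new_parent: str) -> Set[str]:
    """Recheck the moved type and its subtypes only if inherited properties differ."""
    old = inherited(declared, parents, parents[moved])
    new = inherited(declared, parents, new_parent)
    return subtypes(parents, moved) if old != new else set()
```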
[0038] In order to automate updates of property values it is
preferable that a property is a computed property. The value of the
computed property is automatically calculated upon access to a
content object whose type declares the computed property. Because
computed properties are updated automatically, computed properties
satisfy global integrity constraints more easily, reducing the need
for integrity checking. Furthermore, computed properties reduce the
required storage size.
[0039] The value of a computed property of an object may be
calculated based on a set of properties of that object, e.g., a
computed property may be the mean value of a specified set of
property values. The value of a computed property may also be a
collection of objects having link properties which are linking back
to a content object having the computed property. This "reverse
link collection" is a set or bag of objects that link to the
object. Further computed properties may be indices on a link
collection property. For example, if a resource stores the link
collection property LINKS and every linked resource has a property
NAME, a lookup table may be generated to automatically locate a
resource in LINKS with a given NAME. When a property holds a link to
a binary object (e.g.
picture, movie, Word document) a computed property may be deployed
whose value depends on the binary property, e.g., the height of the
stored picture, the playtime of the movie, the number of pages of
the Word document. It is also possible to calculate a computed
property based on a predetermined algorithm that derives the
property value given the object and the state of the remaining
workspace. For computed properties, care is to be taken to cache
computed properties as long as possible, but to recompute the
values automatically when needed.
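The LINKS/NAME lookup table can be sketched as a computed property that is cached and recomputed only when the stored link collection has changed. The class and method names are hypothetical. For simplicity, this sketch tracks only additions to and removals from LINKS. It does not notice later renames of a linked resource, which a real implementation would also have to handle.

```python
from typing import Dict, List, Optional


class Resource:
    """Hypothetical resource with a stored link collection and a computed index."""

    def __init__(self, name: str):
        self.name = name
        self._links: List["Resource"] = []
        self._links_version = 0  # bumped on every change to LINKS
        self._index: Dict[str, "Resource"] = {}
        self._index_version = -1  # version of LINKS the cached index was built from

    def add_link(self, target: "Resource") -> None:
        self._links.append(target)
        self._links_version += 1

    def remove_link(self, target: "Resource") -> None:
        self._links.remove(target)
        self._links_version += 1

    def lookup(self, name: str) -> Optional["Resource"]:
        """Computed property: the linked resource with the given NAME, if any."""
        if self._index_version != self._links_version:
            self._index = {r.name: r for r in self._links}  # recompute only when stale
            self._index_version = self._links_version
        return self._index.get(name)
```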
[0040] Computed properties are especially useful for
directories/folders. When the content of a folder is stored using
an ordinary link collection that references the contained documents,
every document that is added to or removed from the directory
requires a change to the folder object. When documents link to
folders so that folders can keep track of their contained documents
using a computed
property, the folder objects are changed only rarely. This removes
a possible performance bottleneck. Computed properties may also
take the place of obsolete properties after a schema migration, so
that read access still works as before. For this reason, computed
properties should be treated like stored properties whenever
possible.
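The folder example can be sketched with a reverse link collection. Each document stores a link to its folder, and the folder's contents are computed from those links, so filing a document modifies only that document. The `Folder`, `Document` and `contents` names are hypothetical. The linear scan over a workspace list stands in for whatever index a real store would use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Folder:
    name: str


@dataclass
class Document:
    title: str
    folder: Optional[Folder] = None  # stored link property on the document


def contents(folder: Folder, workspace: List[Document]) -> List[Document]:
    """Computed reverse link collection: documents whose folder link targets this folder."""
    return [d for d in workspace if d.folder is folder]
```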
[0041] Resource versions for modifications of content objects or
object-type definitions may be recorded. A resource version stores
a version of a resource, i.e., either a content object or an
object-type definition, in order to record the evolution process of
the resource and reproduce old versions. It is preferable to treat
content objects and object-type definitions uniformly as resources
for the purpose of versioning. Since the content schema is an
aggregation of object-type definitions and property type
definitions, the content schema may be versioned in the same way as
content objects.
[0042] Resource versions may be stored upon modifications of
content objects or object-type definitions, at certain points in
time, upon receiving a user command, or upon receiving a versioning
request by some system component. It may not be necessary to store
every modification of resources. It is preferable to record some
selected state of the resource, possibly based on a mandatory
check-out/check-in procedure. E.g., a developer may check out an
object-type definition of the content schema, thereby acquiring
exclusive access to the resource, apply a set of modifications to
the object type and check in a modified object type definition.
Upon the check-in of the modified object-type definition, a new
version for this resource may be stored.
[0043] A label referring to a set of objects each having a
specified version may be stored in order to refer to this set of
objects. Also a label referring to a content schema version
comprising an aggregation of object-type definitions each having a
specified version may be stored. By means of this label a version
of the entire content schema is generated. Preferably a label is
used to record a set of versions that fulfil some interobject
integrity constraints.
[0044] In order to prevent a proliferation of versions in the
referring or aggregating object, versions of content objects
comprising link properties or link collection properties are stored
irrespective of the versions of target objects. Otherwise, when a
version is directly linked from many sources, a modified version
would only be linked after changing all referring resources,
thereby generating new versions of these resources.
[0045] The content schema and the content objects may be stored in
workspaces. A workspace may be formed based on labels. The labels
select the set of objects comprising the workspace. Modifications
of resources remain local to the respective workspace in which the
modification occurred until explicitly transferred to another
workspace. A workspace is the virtual content management system in
which a user works, i.e., his view on the content management
system. Changes applied in other workspaces have no effects on the
present workspace until the changes are integrated in a controlled
way. This allows parallel development and/or content data authoring
to happen in each workspace individually.
[0046] In content management systems it is preferable that a
development workspace for the modification of the content schema, a
production workspace for the generation and modification of content
objects and an integration workspace for the integration of the
content schema modification, the content object generation, and the
content object modification are provided. In this setup the roles
of different people in web site development and operation are
reflected. Developers modifying the content schema operate in the
development workspace. Modifications of the content schema do not
influence the work of authors operating in a production workspace.
In the production workspace, content objects are generated and
modified in order to accumulate and update content data. Upon
reaching a consistent content schema and verifying the effects of
schema modifications on content data, resources from the
development workspace and the production workspace are integrated
in an integration workspace. During this integration some integrity
issues may be detected, and the content schema and/or content
objects may be adapted. It is preferable to transfer the resources
of the integration workspace after the issue resolution to an
additional live workspace. In order to prevent any impediment
caused by the integration on the actual web site operation, the
actual web page generation for the public user of the web site may
operate based on the live workspace.
[0047] Different versions of the same resource (object or object
type) from a source and a target workspace may be unified in the
target workspace. Unification may be performed by selecting the
latest version for the resource of one workspace, if the latest
version of the other workspace is preceding the selected version.
Otherwise, a version conflict is detected and the versions from
both workspaces are marked as conflicting versions. Conflicting
versions for the same resource may be merged into a single version.
Merging may be performed individually for every property used in
any of the versions, taking into account the property values of a
suitably determined common predecessor version of all conflicting
versions and the property values of the conflicting versions.
[0048] Unification means the integration of possibly branched
version histories from different workspaces into a single
workspace. Merging means the integration of possibly different
versions into a single version that combines all changes. It is
preferable to treat every resource (i.e. every version history)
separately.
[0049] The integration of different versions may preferably operate
as follows: Unless already present, all versions of the resource
required for merging are copied from the source to the target
workspace. If the current version (head version) of the resource in
the source workspace precedes the current version of the resource
in the target workspace, the current version of the target
workspace is selected and no merging is required. If the current
version of the resource in the target workspace precedes the
current version of the resource in the source workspace, the
version from the source workspace becomes the new current version
in the target workspace and no merging is required. Otherwise, the
versions are unordered and merging is required.
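The three cases of [0049] reduce to a reachability test in the version graph. The sketch assumes that the required versions have already been copied into the target workspace. It also assumes that the graph is given as a hypothetical `preds` dict mapping each version id to its direct predecessor versions.

```python
from typing import Dict, List, Optional, Set, Tuple


def ancestors(preds: Dict[str, List[str]], v: str) -> Set[str]:
    """All versions v transitively derives from, including v itself."""
    seen: Set[str] = set()
    stack = [v]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(preds.get(x, []))
    return seen


def unify(preds: Dict[str, List[str]], source_head: str, target_head: str) -> Tuple[str, Optional[str]]:
    """Return ('keep', head), ('take', head) or ('merge', None)."""
    if source_head in ancestors(preds, target_head):
        return ("keep", target_head)   # target already contains the source changes
    if target_head in ancestors(preds, source_head):
        return ("take", source_head)   # fast-forward to the source version
    return ("merge", None)             # unordered versions: merging is required
```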
[0050] For a merge of a resource a common predecessor of the
versions to be merged is determined. For merging it is suitable to
treat all properties separately. If a property has changed in one
version, but not in the other, the changed property may be used as
merged property. It may be reasonable to report a warning about a
heuristic merge. If the property has changed in both versions, a
property-type specific merge algorithm may be enacted. The merge
procedure may be dynamically configurable.
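A per-property three-way merge in the spirit of [0050] might look like the sketch below. Versions are represented as plain dicts, and `base` is the common predecessor. The optional `merge_fns` mapping plays the role of the configurable property-type specific merge algorithms. All of these names are assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

MISSING = object()  # marks a property that is unbound in a version


def merge_versions(base: dict, ours: dict, theirs: dict,
                   merge_fns: Optional[Dict[str, Callable]] = None
                   ) -> Tuple[dict, List[str], List[str]]:
    """Return (merged, warnings, conflicts), treating every property separately."""
    merge_fns = merge_fns or {}
    merged: dict = {}
    warnings: List[str] = []
    conflicts: List[str] = []
    for prop in set(base) | set(ours) | set(theirs):
        b = base.get(prop, MISSING)
        o = ours.get(prop, MISSING)
        t = theirs.get(prop, MISSING)
        if o == t:                  # same change (or no change) on both sides
            value = o
        elif o == b:                # changed only in `theirs`: heuristic merge
            value = t
            warnings.append(prop)
        elif t == b:                # changed only in `ours`: heuristic merge
            value = o
            warnings.append(prop)
        elif prop in merge_fns:     # changed on both sides: property-specific merge
            value = merge_fns[prop](b, o, t)
        else:
            conflicts.append(prop)
            continue
        if value is not MISSING:    # MISSING here means the property was removed
            merged[prop] = value
    return merged, sorted(warnings), sorted(conflicts)
```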
[0051] Content objects, content types and/or properties may be
identified by unique invariable identifiers. Content objects and/or
content types may be represented in a persistent store. The
representation of content objects and/or content types may be based
on the unique invariable identifiers.
[0052] The persistent store may be updated in isolated
transactions. This way, changes from different sources can be
applied to the objects without interfering with each other.
Transactional access is desirable when unifying many resources in a
workspace at once or when making joint updates of the schema and
configuration objects.
[0053] The persistent store may be a relational database or an
object database. A mapping of the content objects and/or the
content schema to a relational database may be provided based on
the identifiers. The relational database may be used to store the
content data in content objects independently of the content
schema. Access to the stored information in the database is
maintained, even if the stored content objects and the content
schema deviate. The mapping of the objects to the database may be
performed by a set of tables, where each table stores
property-value bindings for one property type, i.e. integer
properties, string properties, and so on. Other database mappings
are possible, e.g., one table may be used per property, or one
table may be shared by different property types.
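The one-table-per-property-type mapping can be sketched with Python's built-in `sqlite3` module. The table names and the `store`/`load` helpers are hypothetical, and only integer and string properties are handled. Objects are written without consulting any schema, so an object lacking an ID or carrying an extra Date property is stored as-is.

```python
import sqlite3

# Hypothetical mapping: one table per property type, keyed by invariant object ids.
TABLES = {int: ("int_props", "INTEGER"), str: ("string_props", "TEXT")}


def create_tables(conn: sqlite3.Connection) -> None:
    for table, sql_type in TABLES.values():
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ("
            f"object_id INTEGER, property TEXT, value {sql_type}, "
            "PRIMARY KEY (object_id, property))"
        )


def store(conn: sqlite3.Connection, object_id: int, bindings: dict) -> None:
    """Store property-value bindings in one transaction; no schema is consulted."""
    with conn:
        for prop, value in bindings.items():
            for table, _ in TABLES.values():  # drop a binding stored under another type
                conn.execute(f"DELETE FROM {table} WHERE object_id = ? AND property = ?",
                             (object_id, prop))
            table, _ = TABLES[type(value)]
            conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", (object_id, prop, value))


def load(conn: sqlite3.Connection, object_id: int) -> dict:
    result: dict = {}
    for table, _ in TABLES.values():
        rows = conn.execute(f"SELECT property, value FROM {table} WHERE object_id = ?",
                            (object_id,))
        result.update(rows)
    return result
```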
[0054] It is preferable to distribute event notifications upon
changes to the content objects or the schema. Based on received
events other parts of the content management system, e.g., the web
site generator, may take actions. Events may be stored persistently
so that the history of events may be retrieved. A persistent
storage is important for those content applications that must
analyse the entire stream of changes to the repository, e.g., to
maintain a searchable index or to collect statistical data. The
persistent storage will allow such applications to resume work
after a system malfunction.
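A persistent event stream with resumable consumers can be sketched as an append-only table plus one stored read position per consumer. A consumer such as a search indexer that crashes can resume after its last acknowledged event. The `EventLog` class and its table layout are hypothetical.

```python
import sqlite3
from typing import List, Tuple


class EventLog:
    """Hypothetical persistent event log with per-consumer read positions."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        with conn:
            conn.execute("CREATE TABLE IF NOT EXISTS events ("
                         "seq INTEGER PRIMARY KEY AUTOINCREMENT, kind TEXT, resource TEXT)")
            conn.execute("CREATE TABLE IF NOT EXISTS cursors ("
                         "consumer TEXT PRIMARY KEY, seq INTEGER)")

    def publish(self, kind: str, resource: str) -> None:
        with self.conn:
            self.conn.execute("INSERT INTO events (kind, resource) VALUES (?, ?)",
                              (kind, resource))

    def pending(self, consumer: str) -> List[Tuple[int, str, str]]:
        """Events after the consumer's last acknowledged position, in order."""
        row = self.conn.execute("SELECT seq FROM cursors WHERE consumer = ?",
                                (consumer,)).fetchone()
        last = row[0] if row else 0
        return self.conn.execute(
            "SELECT seq, kind, resource FROM events WHERE seq > ? ORDER BY seq",
            (last,)).fetchall()

    def acknowledge(self, consumer: str, seq: int) -> None:
        with self.conn:
            self.conn.execute("INSERT OR REPLACE INTO cursors VALUES (?, ?)", (consumer, seq))
```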
[0055] The method according to the present invention is preferably
executed by an electronic computer system comprising suitable
hardware and software. The inventive apparatus is preferably
embodied as an electronic computer system comprising suitable
hardware and software.
[0056] The apparatus according to the present invention may
preferably comprise an input means for inputting content data to be
stored. Data may be inputted from a manual input terminal for data
acquisition and/or by inputting data from other systems, in
particular databases or online data distributors. Inputted data may
be converted or transformed in order to match the format of the
content management system. The apparatus may comprise an output
means to output content data, e.g. an output terminal to display
the data for processing. Outputted content data may also be
processed by further data processing systems, such as web page
generators or web servers. A web page generator may request content
data from the content management system in order to generate web
pages to be displayed.
[0057] The presented method and apparatus for storing and accessing
data in a content management system stores content data in content
objects, i.e., in a structured way. This allows automated
procedures to be applied to the content data, which would be very
difficult when storing unstructured data. Since the content objects
are generated independently of the content schema, a flexible
object-oriented content database is provided enabling content
schema evolution based on the changing requirements for a web site.
The content schema in the present invention may be modified
resulting in deviations between content schema and stored content
objects.
[0058] The presented method for operating a content management
system allows access to the stored content data even if the content
schema and the content objects mismatch. The operation of the web
site is always maintained and data to be presented may be retrieved
from the database. The operation of the database is independent of
a schema mismatch and inconsistent objects may be accessed and
corrected.
[0059] The proposed method and apparatus allow efficient data
import based on the data format of the imported data. Since no data
conversion is required, the computational effort for converting an
entire database is avoided. Furthermore, it is not necessary to
store different versions of content objects caused by converting
content objects to a new data format directed by a modified content
schema. This saves significant amounts of storage space required
to store different versions of the entire database in various
formats and allows large web sites to be developed without
excessive storage requirements. The access, the presentation and
the modification of the content data is governed by an explicitly
stored meta schema.
[0060] According to the present invention, inconsistencies between
content and schema that might arise out of schema evolution
operations may be detected. Based on the object-oriented structure
of the database this issue detection may be performed without
examining the entire database, i.e., all content objects, which
reduces the computational load on the content management system
caused by frequent schema modifications. Furthermore, several issue
correction procedures are provided to automatically adapt the
content data to the modified schema.
[0061] In the present invention, workflow support and multiuser
access in web site development and operation are provided by using
workspaces that isolate users or groups of users from each other
while enabling controlled information exchange. By means of
automated unify and merge procedures different versions of possibly
nonlinear branching version histories of resources are integrated.
Since object-type definitions and content objects are treated in
the same way, a unified approach to handle content schema evolution
and content generation is provided.
BRIEF DESCRIPTION OF THE DRAWINGS
[0062] FIG. 1 shows a simple content schema for content modelling,
represented as UML.
[0063] FIG. 2 shows a simplified UML class diagram for a meta
model.
[0064] FIG. 3 is a first illustration of an example to explain the
handling of content data according to the present invention.
[0065] FIG. 4 is a second illustration of an example to explain the
handling of content data according to the present invention.
[0066] FIG. 5 is a third illustration of an example to explain the
handling of content data according to the present invention.
[0067] FIG. 6 shows an example to explain the concurrent evolution
of content and layout.
[0068] FIG. 7 shows another example to explain the evolution of
shared content.
[0069] FIG. 8 shows an example to explain the correction of
content.
[0070] FIG. 9 shows a preferred system architecture for
implementing the invention.
[0071] FIG. 10 shows a flow diagram to explain an evolutive web
site development process.
[0072] FIG. 11 shows a flow diagram to explain the unification of a
set of versions across workspaces.
[0073] FIG. 12 shows a flow diagram to explain the resolution of
conflicts.
[0074] FIG. 13 illustrates how multiple merges across workspaces
and issue detection may minimise impact on production work.
[0075] FIG. 14 shows an example to explain a possible mapping of a
content base to a relational database.
[0076] FIG. 15 shows a flow diagram to explain how the set of
properties for a resource is determined.
DETAILED DESCRIPTION
[0077] In the CoreMedia Content Application Platform (CAP) an
object-oriented schema is used to model content data. The schema is
explicitly defined, stored in and understood by the content
management system. Content objects are classified by their
structure, and their properties and associations are defined. In
order to allow a content schema evolution, the schema needs to be
variable, with a concrete schema developed and adapted on a per
project basis. The schema and structure for a web site is usually
created by starting from a "best practice" sample solution and
modifying it until it fits the site's requirements. Based on a
flexible schema it is possible to provide an application platform,
providing the basic services and frameworks upon which various
applications can be built.
[0078] Deploying an explicit schema in content management provides
a communication vocabulary for all the people involved in
developing and maintaining the web site. It eases the understanding
and the handling of instances of the schema. When used to express
assumptions made by algorithms it improves correctness and
efficiency of automated services by offering formalised consistency
guarantees. FIG. 1 shows a simple content schema for content
modelling, represented as UML.
[0079] However, due to an initial lack of understanding and to
changing requirements, the explicit content schema needs to be
changed during operation of the web site without unduly disrupting
operations. Since collecting and processing content data is very
expensive, content schema evolution must be possible, even if a
considerable amount of content has already been accumulated.
[0080] In the present invention an object-oriented schema is
deployed. This allows automated procedures to be applied to the
content data, which would be very difficult when storing
unstructured data. A meta-model is used for expressing the schema
and for expressing the content. The model is object-oriented,
implementing a subset of UML. Objects are the semantic unit for
reference, locking and versioning.
[0081] FIG. 2 shows a simplified UML class diagram for a
meta-model. An object aggregates property bindings, each binding a
property to an appropriate value. An object can bind a property
only once. Available classes of properties include atomic
properties such as string and integer properties, media properties
bound to XML or binary data, and link properties bound to
collections of links to other objects. Each object has a declared
type. A type declares properties and can inherit further properties
from a single parent type. The subtype/supertype relationship is
acyclic. The predefined type "Resource" is the single root of the
inheritance hierarchy. A declaration of a link property includes
the expected type for targets of the link or link collection and
other application level constraints such as cardinality
restrictions.
[0082] The schema is an aggregation of several meta objects, namely
types and property declarations, interconnected by inheritance and
association. A number of constraints may relate the schema to the
instances. Some natural language examples of constraints are:
[0083] an object only binds properties declared in its declared
type or a supertype thereof,
[0084] if a property p is declared as nonnullable, then it must be
bound in all instances of p's declared type, and all instances of
transitive subtypes thereof,
[0085] for all targets t in a binding of a link property p, the
declared type of t is a transitive subtype of or equal to the
target type of p;
[0086] the number of targets in a binding of a link property p is
not less than p's minimum cardinality and is not larger than p's
maximum cardinality.
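The four natural-language constraints above can be checked mechanically. The sketch below uses hypothetical `PropDecl` and `Obj` classes and a `parents` dict for the single-inheritance hierarchy rooted at Resource. It reports each violation as a (property, issue) pair instead of rejecting the object, which matches the idea that stored content may deviate from the schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PropDecl:
    name: str
    nonnullable: bool = False
    target_type: Optional[str] = None  # set only for link properties
    min_card: int = 0
    max_card: Optional[int] = None     # None means unbounded


@dataclass
class Obj:
    type_name: str
    bindings: Dict[str, object] = field(default_factory=dict)


def supertypes(parents: Dict[str, Optional[str]], t: Optional[str]) -> List[str]:
    """t followed by its transitive supertypes up to the root."""
    chain: List[str] = []
    while t is not None:
        chain.append(t)
        t = parents[t]
    return chain


def issues(obj: Obj, parents, decls: Dict[str, List[PropDecl]]) -> List[Tuple[str, str]]:
    declared = {d.name: d for t in supertypes(parents, obj.type_name) for d in decls.get(t, [])}
    found = [(p, "undeclared") for p in obj.bindings if p not in declared]
    for name, decl in declared.items():
        if name not in obj.bindings:
            if decl.nonnullable:
                found.append((name, "missing"))
            continue
        if decl.target_type is not None:  # link property: check targets and cardinality
            targets = obj.bindings[name]
            for target in targets:
                if decl.target_type not in supertypes(parents, target.type_name):
                    found.append((name, "target type"))
            too_many = decl.max_card is not None and len(targets) > decl.max_card
            if len(targets) < decl.min_card or too_many:
                found.append((name, "cardinality"))
    return found
```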
[0087] FIG. 3 shows a schematic diagram of an example to explain
the handling of content data according to the present invention. In
FIG. 3A a content schema 10, consisting for simplicity of a single
object type Article 1, is presented. In this first version of the
object-type definition Article, three properties 2, 3, 4 are
defined. The property ID 2 is an atomic property of the integer
type. The properties Text 3 and Author 4 are atomic properties of
the string type. In addition, two content objects 5, 6 are
presented. In the content management system according to the
present invention, content data is stored in content objects. The
content objects are used by a page generator to generate a web page
using templates.
[0088] The objects 5, 6 are generated to store article data. In
content object 5, property values for the properties ID 2, Text 3
and Author 4 are stored by binding these properties to the
respective values "1", "This is text" and "Joe Average". In content
object 6, the properties Text 3, Author 4 and Date are bound to
their respective values. Since during the generation of content
object 6, no identification number for this article was available,
no value is bound to the property ID 2 of this object 6. An
additional property Date was bound for object 6 to store the date
when the object was generated. Since the generation and storing of
content objects is independent of the content schema 10, the
proposed method to store and access content data is very flexible.
Content objects may be generated based on the available information
to store in the object.
[0089] FIG. 4A shows a simple example of a template to illustrate
the generation of web pages based on content objects. In this
example the Text properties and the Author properties of three
articles selected by the system are displayed. In FIG. 4B the
displayed output of the web page is shown. The fields of the
template are substituted by the content data, more precisely by the
value of the respective properties of the content objects. Since in
this example in the beginning only two content objects 5, 6 are
available, empty fields are displayed at the bottom of the article
list.
[0090] In FIG. 3B a modified content type definition 8 for the
Article type is presented. In this second version of the content
type Article 8 the property Author 4 was dropped and has been
replaced by an integer property AuthorNo 16 in order to store an
author number as opposed to an author name, in order to obtain
unambiguous data. In addition, a new property Headline 9 was added.
This property is a string property which is nullable. Nullable
properties need not be present in an object for the object to match
the object type.
[0091] During operation of the content management system, a new
content object 7 was generated and stored the next day. In this
content object 7 an ID property, a Text property, an AuthorNo
property, a Headline property and a Date property are stored.
[0092] According to the present invention, stored objects are
accessed by means of a content schema. In FIG. 4C the
displayed output of the web page generated by the template shown in
FIG. 4A is presented. Because the template has not yet been adapted
to the new content schema, it still accesses the old Author 4
property, which is undefined for the third article. The Text
property of all three content objects is correctly displayed. The
Headline property 9 of the modified Article type 8 is not used in
the template shown in FIG. 4A. Additional information stored in
properties of content objects not required in the content schema is
ignored when accessing the content data by means of the content
schema. Although the stored content objects 5, 6 do not completely
match the modified content schema 11, the content management system
may still access the available data in the content objects without
converting the entire database to a format complying to the
modified content schema.
[0093] FIG. 5A shows the web page template after being adapted to
the object type declarations of the modified content schema 11.
This template makes use of the properties of the second version of
the Article object type 8 by using the AuthorNo property 16 and the
Text property 3 of the content objects. Because an author number is
not easily readable on a web page, a conversion function toName
takes care of outputting the author's name.
[0094] FIG. 5B shows a web site generated by the template of FIG.
5A. During the access on the content objects 5, 6 the absence of
the AuthorNo 16 property is detected, and an automated substitution
procedure is invoked. The automated substitution procedure replaces
the missing property with a default value. Afterwards, the
conversion function toName can convert the default value into the
output "Unknown".
While the output is still not perfect, it is now possible to adapt
the stored data to the new schema.
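The behaviour of FIGS. 4C and 5B can be sketched as reading an object through a type definition. Properties the type does not declare are ignored. Declared but unbound properties yield a configured default, or `None` standing in for "undefined". The `view` helper, the `AUTHORS` directory and this `to_name` are hypothetical stand-ins for the template machinery and the conversion function named in the text.

```python
from typing import Optional

UNDEFINED = None  # stands for "no value bound" in this sketch


def view(obj: dict, declared: set, defaults: Optional[dict] = None) -> dict:
    """Read an object through a type definition without converting stored data."""
    defaults = defaults or {}
    return {p: obj.get(p, defaults.get(p, UNDEFINED)) for p in declared}


AUTHORS = {17: "Joe Average"}  # hypothetical author directory


def to_name(author_no) -> str:
    """Hypothetical conversion of an author number into a readable name."""
    return AUTHORS.get(author_no, "Unknown")
```

With the second Article definition and a default of 0 for `AuthorNo`, content object 5 still renders. Its `Text` is shown and its author becomes "Unknown", while its obsolete `Author` binding is simply not visible through the new type.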
[0095] In order to obtain and maintain consistency between
object-type definitions stored in the content schema and the
content objects, the integrity of the stored content objects may be
examined by verifying if the stored content objects match type
definitions and constraints of the content schema. Upon a detection
of an integrity issue a predetermined issue resolution procedure is
invoked to resolve the mismatch by adapting the content data
accordingly. Alternatively, data mismatching the content schema may
be presented to a user who is prompted to adapt the data. A user
interface may assist the user by presenting schema and content and
listing integrity issues to be resolved by the user. Schema
violations may be reported and the user prompted to correct the
content data or to fix the violation of metalevel integrity
constraints if the schema itself is inconsistent. This may happen,
e.g., when the inheritance hierarchy of the object-oriented schema
is changed.
[0096] In the example shown in FIGS. 3, 4 and 5 the content objects
5, 6 and 7 are examined by comparing their types and properties to
the object-type definitions of the modified content schema 11.
Because the AuthorNo property 16 was not marked as nullable, the
objects 5 and 6 will be reported as having issues, namely a missing
property. After an automated correction that takes the property
Author 4 into account, or after manual editing of the missing
AuthorNo property 16, the web site is correctly displayed as shown
in FIG. 5C.
[0097] Other typical constraints imposed by a schema are range
constraints for property values, cardinality constraints for the
number of elements of a link collection property, or target type
constraints restricting the type for a target object of a link
property or a link collection property. Further constraints may
relate to index properties. If a link collection is indexed using a
given key property, then one such constraint may be that this key
property must be stored in all elements of the link collection.
Another constraint declares that, if a link collection is indexed
using a given key property, then no two elements of the link
collection may have the same key value. Furthermore, if the
declaration of a link collection property (reverse link set, link
list or link map) disallows duplicates, then the property must have
no duplicate elements. Another constraint declares that if the
declaration of a link collection property defines an inverse link
collection property, then whenever A links to B using the forward
property, B must link to A using the inverse property. Another
possible constraint imposed by a schema is acyclicity: If a
property is declared to be acyclic, then there must not exist a
resource loop when following the given property. Every possible
constraint gives rise to an issue. Issues may be efficiently
checked incrementally, i.e., by only rechecking the resources that
may be affected by a change of the content schema.
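The acyclicity constraint can be checked with a standard depth-first search that colours resources as unvisited, on the current path, or finished. Reaching a resource that is still on the current path means a loop exists. In this sketch, `links` is a hypothetical map from each resource id to the targets of the property declared acyclic.

```python
from typing import Dict, List


def violates_acyclicity(links: Dict[str, List[str]]) -> bool:
    """True if following the acyclic-declared property ever leads back to a resource."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color: Dict[str, int] = {}

    def visit(node: str) -> bool:
        color[node] = GREY
        for target in links.get(node, []):
            state = color.get(target, WHITE)
            if state == GREY or (state == WHITE and visit(target)):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(links))
```

An incremental variant could start the search only from resources whose property value changed. That matches the incremental rechecking described above, but it is not shown here.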
[0098] Because of the requirements for quality control and
multiuser access, configuration management for the content schema
and the content objects is required. Configuration management
enables the reproducible construction of versions of resources and
their controlled evolution into new versions. Controlled
evolution requires change management, status accounting, a review
process, some form of workflow support and some form of teamwork
support. Reproducibility is achieved by immutably recording all
data and procedures used during development of the web site (either
by name or by value) and by assigning a stable name to the set. All
relevant resources are digitally represented and manipulated using
computer implemented tools which allows the recording as well as
the construction process to be automated. In the content management
system according to the invention, the content schema and the
content objects are stored in workspaces. A workspace is the place
where a user or a group of users work together on advancing the web
site in some direction. Changes can be applied to all resources in
the content management system, but remain local to the workspace
until explicitly transferred to another workspace. This allows
changes to be reviewed in context, which is necessary since
different resources can influence each other in subtle ways.
Conversely, it allows changes applied in other workspaces to be
temporarily ignored.
Since development happens in parallel, there is a potential for
incompatible changes. These conflicts have to be resolved,
sometimes involving user interaction, but also involving tool
support for visualising the changes and proposing merges.
Workspaces are also known as long-term transactions.
[0099] Workspaces may be based on labels. A label refers to a set
of resources, and selects exactly one version of each resource from
the set. It can be considered a version of the aggregate of the
referenced resources, and provides a single name for referring to
the entire configuration. A label can be used to record a set of
versions that "go together", i.e., fulfil some interresource
integrity constraints or to record the versions of resources
delivered to some customer, or to record the state of the entire
web site at some significant point in time.
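A label can be sketched as a named map that selects exactly one version number per resource from immutable, append-only version histories. Later check-ins therefore do not change what the label refers to. The `Repository` class and its methods are hypothetical, and versions are plain strings for brevity.

```python
from typing import Dict, List


class Repository:
    """Hypothetical store: append-only version histories plus named labels."""

    def __init__(self) -> None:
        self.histories: Dict[str, List[str]] = {}    # resource id -> versions
        self.labels: Dict[str, Dict[str, int]] = {}  # label -> {resource id: version no}

    def check_in(self, resource: str, content: str) -> int:
        history = self.histories.setdefault(resource, [])
        history.append(content)
        return len(history) - 1  # number of the new version

    def set_label(self, name: str, selection: Dict[str, int]) -> None:
        """Record exactly one existing version per resource under a single name."""
        for resource, version in selection.items():
            if not 0 <= version < len(self.histories.get(resource, [])):
                raise ValueError(f"no version {version} of {resource}")
        self.labels[name] = dict(selection)

    def configuration(self, name: str) -> Dict[str, str]:
        """The labelled configuration, e.g. a schema version plus matching content."""
        return {r: self.histories[r][v] for r, v in self.labels[name].items()}
```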
[0100] The content of a web site is malleable and is released very
frequently. In content management, configuration management has to
deal with explicit links between resources. Links are relevant,
since a link between two resources manifests some kind of
dependency or influence, and therefore gives hints about the
possible implications of a change. After changes to the
participating resources, the validity of a link may have to be
rechecked. Storing links to fixed versions can lead to a
proliferation of versions in the referring or aggregating objects,
because for every new version of the link target, the link source
will have to be updated as well. As a more feasible solution, links
to objects are stored irrespective of versions, and on access, the
version of the target object to use is resolved depending on the
context. This context may be given by the notion of the
workspace.
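The idea of version-independent links can be illustrated with a small sketch, assuming a workspace is modelled as a mapping from resource id to the version visible in it. `Link`, `Workspace` and `resolve` are illustrative names only. The same stored link yields different target versions depending on the accessing workspace, and a new version of the target requires no new version of the link source.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    # A stored link names the target object only, never a target version.
    source: str
    target: str


@dataclass
class Workspace:
    name: str
    # Hypothetical context: resource id -> version currently visible in this workspace.
    current: dict = field(default_factory=dict)


def resolve(link: Link, workspace: Workspace) -> tuple:
    """Pick the version of the link target from the accessing workspace."""
    return link.target, workspace.current[link.target]


teaser_link = Link(source="home_page", target="sports_image")
production = Workspace("production", {"home_page": 4, "sports_image": 3})
live = Workspace("live", {"home_page": 4, "sports_image": 2})
```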
[0101] Another important notion of WCMS is staging, where content
is duplicated into a private area and a publicly viewable area. The
private copy of the web site becomes a production workspace, and a
live copy is another workspace that is only accessed by merging
updates from the production workspace. As soon as developments
begin to interfere, the creation of additional workspaces becomes
worthwhile. If the site layout is represented as content (content
objects and object-type definitions are treated uniformly as
resources), it is important to prevent broken intermediate layout
versions from appearing on the web site, and to prevent a broken
layout from blocking the authoring of content.
[0102] As shown in FIG. 6, this conflict is resolved by introducing
a layout development workspace 12, with only stable layout states
being merged into the content production workspace 13, and from
there to the live site 14. In the example of FIG. 6, the layout of
the web site is modified in the development workspace 12, while the
content is changed in the production workspace 13, and user
accesses are handled in the live workspace 14. On the right hand
side of FIG. 6, it is presented how these independent changes are
integrated. First, the modified layout is merged from the
development workspace 12 to the production workspace 13. In the
production workspace 13 the modified layout and the changed content
are integrated. After integration, the changes are merged from the
production workspace 13 to the live workspace 14 where they will
appear to the general public.
[0103] Additional refinements may include the introduction of an
additional quality assurance staging level, and the introduction of
an additional level of layout or editorial workspaces which are
responsible for specific independent areas of the web site.
Developing the content schema and operating with content in
workspaces increases parallelism and can protect against
unwanted interference. Based on the concept of workspaces, an
application service provider scenario can be implemented: several
customers use a common set of document types, templates and
structural objects, but develop their own site content and layout
in isolation. The common framework is initially copied from a base
workspace to a client workspace. However, the customer will want to
stay up to date with further development of the framework.
This can be achieved by developing the framework in separate
workspaces until a stable state is reached, and then reintegrating
the updates into the common base workspace. From there, a customer
can merge the update into his workspace on demand. Since customers
refer to a common framework, they may even exchange content
directly from customer to customer. In the example shown in FIG. 7,
two content applications are developed. After the content base is
merged into content application 1, content is added in this content
application. A subset of the content base is merged into content
application 2, where some of the content is edited (black
triangle). After editing, the content is merged back from content
application 2 to the content base. From there, the modified content
is merged into content application 1. After these merges, the
content data in content application 1 comprises all modifications
of content.
[0104] Schema evolution during web site development and operation
requires the recording of multiple versions of the schema and/or
the recording of changes to the schema. The schema is an
aggregation of several meta objects, namely types and property
declarations interconnected by inheritance and association.
Versioning only the complete schema is insufficient when
independent modifications to the schema are to be merged, e.g., when
two new types are independently added to the schema. Merging schema
versions must recognise the modifications as affecting disjoint
meta objects, and therefore as potentially compatible.
[0105] It is therefore preferable to store versions of each meta
object separately and to consider a schema version to be a label,
i.e., an aggregation of individual versions of resources.
Associations between meta objects are treated just like links
between content objects, in the sense that the version of the
target object to use is determined by the workspace, not by the
link. This, however, can lead to the violation of meta level
integrity constraints when seemingly independent changes are
applied. E.g., when combining changes to the inheritance hierarchy,
a merge may easily produce cycles. In order to allow manual
resolution of integrity violations of this type, the system must be
able to cope with objects that do not match the schema, which is one
of the advantages of the present invention.
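The following Python sketch illustrates why per-meta-object versioning lets disjoint schema changes merge cleanly, and how such a merge can still produce an inheritance cycle. It assumes, for illustration only, that each type is a meta object whose version content is just the name of its supertype; `merge_meta_objects` and `inheritance_cycles` are hypothetical helpers, and the three-way merge rule is a common textbook choice, not necessarily the one used by the described system.

```python
def merge_meta_objects(base: dict, ours: dict, theirs: dict):
    """Three-way merge performed per meta object rather than per whole schema.

    Each dict maps a meta object id to its version content (here: a type's supertype).
    A missing key means the meta object does not exist in that schema version.
    """
    merged, conflicts = {}, []
    for key in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t or t == b:
            value = o          # same change on both sides, or only ours changed
        elif o == b:
            value = t          # only theirs changed
        else:
            conflicts.append(key)
            value = o          # keep ours provisionally; needs resolution
        if value is not None:
            merged[key] = value
    return merged, conflicts


def inheritance_cycles(supertype_of: dict) -> set:
    """Return all types that lie on a cycle of the supertype relation."""
    cyclic = set()
    for start in supertype_of:
        path, node = [], start
        while node in supertype_of and node not in path:
            path.append(node)
            node = supertype_of[node]
        if node in path:
            cyclic.update(path[path.index(node):])
    return cyclic
```

For example, if one workspace makes `Document` a subtype of `Teaser` while another makes `Teaser` a subtype of `Article` (itself a subtype of `Document`), the two edits touch different meta objects and merge without conflict, yet the merged hierarchy contains the cycle Document, Teaser, Article. Such a state has to be representable until it is resolved manually.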
[0106] It is preferable that links to explicitly given versions are
not supported. Instead, only links to objects disregarding the
versions are permitted. The same holds for schema data: links
within the schema target schema objects, not versions. The single
link from content objects to schema objects, which denotes an
object's type, also refers to the object type, not to a specific
version of the type. This ensures a consistent handling of schema
objects. In other words, the schema (the set of object type
definitions) is treated in the same way as content objects. For the
purpose of versioning, the schema is content.
[0109] The set of property values bound by an object has to be
stored individually for each object. The store depends only on the
constant property type of the property declarations, which
determines the format of the values bound by that property.
[0110] Independence from the mutable schema results in increased
flexibility when storing objects and properties. As mentioned
before, the user defined schema is stored as objects, instances of
predefined, immutable meta-types. For the purpose of storing, the
schema is treated as content, meaning that object-type definitions
and content objects are handled in the same way and are referred to
as resources. The schema is subject to check-in/check-out access
control and versioning just like regular content objects. All
changes are recorded, at the latest when merging them between
workspaces. Recorded states of individual objects and aggregates
can be retrieved and compared later on the application level. If
possible, the store allows the interpretation of previously stored
data according to an evolved schema, e.g., by delivering default
values for absent properties. In this way, the creation of
additional versions is avoided, which simplifies or avoids merging.
This is particularly necessary when a schema update is merged into a
busy production workspace. In this case, editorial work should be
disturbed as little as possible.
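Interpreting old data under an evolved schema can be sketched as a read-only view. This assumes, hypothetically, that the evolved schema supplies a default value for each declared property. Nothing is written back, so no new content version arises that would later have to be merged.

```python
def interpret(stored: dict, declared_defaults: dict) -> dict:
    """View stored bindings through an evolved schema without writing anything.

    stored: property -> value as persisted under an older schema version.
    declared_defaults: property -> default value, as declared by the current schema.
    Properties declared by the schema but absent from the store get their default;
    stored properties the schema no longer declares stay in the store, unshown.
    """
    return {prop: stored.get(prop, default)
            for prop, default in declared_defaults.items()}


# An article stored before the (hypothetical) "summary" property was added.
old_article = {"title": "Cup final", "text": "Report"}
evolved_schema = {"title": "", "text": "", "summary": "(no summary)"}
view = interpret(old_article, evolved_schema)
```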
[0111] An example of a complex schema evolution which requires no
changes to the instances (the content objects) appears in
refactoring, when a type T is split into two types in such a way that
a type T' is inserted which becomes the new supertype of T and
receives the old supertype of T as its own supertype. Since the
total set of inherited properties of T has not changed, its
instances are not affected even if the properties previously
declared in T are now distributed across T and T'.
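The claim that this refactoring leaves the instances of T unaffected can be checked by comparing the set of inherited properties before and after inserting T'. A minimal sketch, assuming single inheritance and hypothetical property names:

```python
def inherited_properties(type_name: str, supertype_of: dict, declared: dict) -> set:
    """Union of the properties declared by a type and all of its supertypes."""
    props, node = set(), type_name
    while node is not None:
        props |= declared.get(node, set())
        node = supertype_of.get(node)
    return props


# Before refactoring: T declares everything itself and inherits from S.
before_super = {"T": "S"}
before_decl = {"S": {"id"}, "T": {"title", "text", "image"}}

# After refactoring: T' is inserted between T and S and takes over some declarations.
after_super = {"T": "T'", "T'": "S"}
after_decl = {"S": {"id"}, "T'": {"title", "text"}, "T": {"image"}}
```

Since both calls for "T" yield the same set, the stored property-value bindings of T's instances remain valid without migration.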
[0112] The part of the system level closest to the application
level is the integrity issue detection. In order to detect
integrity issues, the system interprets the rules expressed in the
user-defined schema (if possible), applies them to the content base
and reports any violations. Checks may be performed incrementally
for efficiency reasons.
[0113] Since all intraschema integrity constraints can be expressed
using an immutable meta schema, integrity issues in the
user-defined schema are reported like any other integrity issue. In
order to avoid reporting after-effects, integrity issues in
instances of a type may be suppressed, if the type itself has
issues.
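Issue detection with suppression of after-effects might look roughly as follows. This is a sketch under simplifying assumptions: types are plain dictionaries with flattened (non-inherited) property declarations marked as required or optional, and the meta-level checks are limited to missing names and unknown supertypes. All function names are hypothetical.

```python
def type_issues(types: dict) -> dict:
    """Meta-level checks on the user-defined schema, driven by a fixed meta schema."""
    issues = {}
    for type_id, t in types.items():
        problems = []
        if not t.get("name"):
            problems.append("type without name")
        sup = t.get("supertype")
        if sup is not None and sup not in types:
            problems.append(f"unknown supertype {sup}")
        if problems:
            issues[type_id] = problems
    return issues


def instance_issues(objects: dict, types: dict, broken_types: dict) -> dict:
    """Check content objects against their (flattened) type declarations."""
    issues = {}
    for obj_id, obj in objects.items():
        type_id = obj["type"]
        if type_id in broken_types:
            continue  # suppress after-effects: the type itself is already reported
        if type_id not in types:
            issues[obj_id] = [f"unknown type {type_id}"]
            continue
        declared = types[type_id]["properties"]  # property -> required?
        problems = [f"undeclared property {p}" for p in obj["props"] if p not in declared]
        problems += [f"missing required property {p}"
                     for p, required in declared.items()
                     if required and obj["props"].get(p) is None]
        if problems:
            issues[obj_id] = problems
    return issues


def detect_issues(objects: dict, types: dict):
    """Report schema issues first, then instance issues of the healthy types."""
    broken = type_issues(types)
    return broken, instance_issues(objects, types, broken)
```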
[0114] It is possible to preclude integrity issues in certain
workspaces, most importantly in live workspaces 14 which are used
to generate the web pages of the public web server. FIG. 8 shows an
example for a situation where malformed content arises due to a
changed content type (black circle) that was merged into the
production workspace 13. The problem is reported to an operator who
must resolve the issue by adapting a content object (black
triangle) before merging the schema and the content into the live
workspace 14, thereby keeping the live workspace 14 permanently
operational. The application layer of the content management system
is responsible for reacting to schema problems by analysing the
change history and evaluating problems in context; determining a
course of actions to resolve issues, either by calling an issue
resolution procedure, or by proposing different solutions to the
operator; and implementing the resolution of schema issues by
migrating content or merging in compatible schema changes.
[0115] Automated support for analysing schema changes is feasible,
even for changes involving multiple types and their relation to
each other. In some cases, however, it is preferable for a human
operator to verify the proposed transformer before it is applied to
the contents.
[0116] FIG. 9 shows a preferred system architecture for
implementing a data processing apparatus according to the
invention. In this apparatus, content objects to store content data
are generated by a content object generation means 20. The
generated content objects are stored in a content object storing
means 21 which is structured in this preferred embodiment as a
relational database. The relational database is used by an
object/relational mapping that does not yet incorporate information
about the schema. This mapping is only concerned with non-null
stored properties, although it provides basic computation methods
for reverse navigation, computation of indices and the like. An
object-oriented content schema is generated by a content schema
generation means 22 and stored in a content schema storing means
23.
[0117] Two other components, the bulk operation handler and the
issue detection, execute directly on the relational database in
order to gain a performance advantage. On top of these components a
content object accessing means 24 structured as an object-oriented
presentation layer is created. The object-oriented presentation
layer is aware of the schema and presents computed properties and
null-valued properties correctly. The presentation layer performs a
schema-aware object/relational mapping.
[0118] FIG. 10 gives an overview of an example of the lifecycle of
a content application that employs schema migration and
configuration management to respond to changing requirements and
inevitable shortcomings of the initially conceived setup. The
presented lifecycle assumes that there are essentially two phases
when setting up a content application: the initial creation and the
subsequent maintenance of the operation. Providing the possibility
of schema evolution according to the invention simplifies
maintenance, but it also takes some of the burden off the setup
phase, because suboptimal solutions can be improved later on.
[0119] The most important versioning operations in integrating
different versions of different workspaces are unifying and
merging. Unification ensures that all versions present in one
workspace are also accessible in the other workspace. Merging
ensures that the information contained in conflicting unified
versions is combined into a new merged version of the resource.
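Unification can be sketched as copying all known versions from one workspace into another, advancing the target where it is merely behind, and merely recording divergent versions for a later merge. The sketch assumes a hypothetical representation: a shared version graph given as `parents`, and per workspace the set of known versions and the current version of each resource. It is an illustration of the concept, not the procedure of FIG. 11.

```python
def is_ancestor(a: str, b: str, parents: dict) -> bool:
    """True if version a equals b or is reachable from b via parent links."""
    stack, seen = [b], set()
    while stack:
        v = stack.pop()
        if v == a:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return False


def unify(source: dict, target: dict, parents: dict) -> list:
    """Make every version known in source known in target as well.

    Returns (resource, target_version, source_version) triples whose versions
    diverged; these are left for a later merge rather than resolved here.
    """
    pending = []
    for res, versions in source["known"].items():
        target["known"].setdefault(res, set()).update(versions)
        s_cur, t_cur = source["current"][res], target["current"].get(res)
        if t_cur is None or is_ancestor(t_cur, s_cur, parents):
            target["current"][res] = s_cur          # target is behind: fast-forward
        elif not is_ancestor(s_cur, t_cur, parents):
            pending.append((res, t_cur, s_cur))     # genuinely divergent versions
    return pending
```

No step requires a decision by the user, which matches the statement that unification can always be performed automatically; only the returned divergent pairs call for merging.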
[0120] Unification can always be done fully automatically without
any user intervention. Sometimes merging might require further
actions of the user whenever version conflicts, i.e., inconsistent
changes of the same resource in different workspaces, are detected.
An automatic merge procedure may be dynamically configurable (to be
exchanged during run time of the system). In a property type
specific merge procedure, a merged property value may be calculated
(partially or completely). If the value can only be merged
partially, a conflict is reported, where the report includes both
changed property values and the proposed partial merge. If the
property merge can be completed automatically, a warning about a
heuristic merge may be reported. Another preferred solution to
resolve merge conflicts is to query the content schema and to give
preference to the value that satisfies the schema and avoids
issues. On the other hand, it might be sensible not to resolve
conflicts in the schema automatically at all, so that the user is
requested to resolve the conflict even in those cases that lend
themselves to an algorithm-based solution. For example, quality or
security requirements might lead to such a decision. After an
automatic merge of versions, issue detection may be performed, and
detected issues may be included in a generated report.
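A property-wise merge with a type-specific procedure for link collections and a schema-based preference could be sketched as below. The three-way rule, the set-merge formula and the `satisfies_schema` callback are assumptions made for illustration; the description leaves the concrete merge procedures configurable.

```python
def merge_properties(base: dict, ours: dict, theirs: dict, satisfies_schema=None):
    """Three-way merge of one resource's property-value bindings.

    Returns (merged, conflicts, warnings). None stands for an absent binding.
    satisfies_schema(prop, value) -> bool is an optional, hypothetical schema check.
    """
    merged, conflicts, warnings = {}, [], []
    for prop in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(prop), ours.get(prop), theirs.get(prop)
        if o == t or t == b:
            value = o
        elif o == b:
            value = t
        elif isinstance(o, frozenset) and isinstance(t, frozenset):
            # Type-specific merge for link collections: keep additions and removals of both sides.
            old = b if isinstance(b, frozenset) else frozenset()
            value = (o & t) | (o - old) | (t - old)
            warnings.append(f"{prop}: heuristic merge of link collection")
        else:
            valid = [v for v in (o, t) if satisfies_schema and satisfies_schema(prop, v)]
            if len(valid) == 1:
                value = valid[0]   # prefer the value that avoids an issue
                warnings.append(f"{prop}: kept the schema-conforming value")
            else:
                value = o          # provisional; the user must decide
                conflicts.append((prop, o, t))
        if value is not None:
            merged[prop] = value
    return merged, conflicts, warnings
```

Passing no `satisfies_schema` callback corresponds to the stricter policy mentioned above, in which every genuine conflict is left to the user.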
[0121] Merging and unification of resource versions may be applied
to object type definitions as part of the schema as well as to
content objects, since both are treated in the same way as general
resources of the content management system.
[0122] FIG. 11 shows a flow diagram to explain the unification of a
set of versions across workspaces.
[0123] FIG. 12 shows a flow diagram explaining the resolution of
conflicts between different versions of a resource (merging).
Unification and merging need not be performed in immediate
succession. In fact, multiple unifications are possible
before resolving a merge conflict.
[0124] Issues are violations of either system-defined constraints
or mismatches of the type system and the stored resource state.
Issue detection is completely separate from merge conflict
detection. In fact, a merge without conflicts might result in
issues. Conversely, no issues might turn up even though many merge
conflicts had to be resolved.
[0125] Which issues are defined and detected depends on the actual
type system used. E.g., an implementation that supports multiple
inheritance might, or might not, report an issue if one supertype
is inherited twice via different paths. An implementation that does
not use inheritance at all might still report types without name.
Any implementation that allows target types to be specified for link
properties might complain about illegal values, whereas simpler
systems might only complain if a link property is marked neither as
nullable nor as non-nullable.
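For an implementation with multiple inheritance, the check for a supertype inherited twice via different paths can be sketched by counting inheritance paths. The sketch assumes an acyclic hierarchy given as a mapping from each type to its list of direct supertypes (cycles would be reported as a separate issue); the type names are hypothetical.

```python
from collections import Counter


def repeated_supertypes(type_name: str, supertypes_of: dict) -> set:
    """Supertypes reachable from type_name along more than one inheritance path.

    Assumes an acyclic hierarchy (cycles are reported as a separate issue).
    """
    paths = Counter()

    def walk(node: str) -> None:
        for sup in supertypes_of.get(node, ()):
            paths[sup] += 1     # one more path reaches sup
            walk(sup)

    walk(type_name)
    return {t for t, count in paths.items() if count > 1}


hierarchy = {"VideoArticle": ["Article", "Media"], "Article": ["Resource"], "Media": ["Resource"]}
```

Whether a non-empty result is reported as an issue is a policy decision of the particular type system, as noted above.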
[0126] FIG. 13 illustrates how multiple mergers across workspaces
and issue detection can lead to a clean production workspace 13
with minimum interruption of the normal work procedure. First, the
schema is modified and tested in a separate development workspace
12, so that preliminary experimental schema versions are not
visible outside. Secondly, the content data and the new schema are
merged into a dedicated integration workspace 15. Issues are
immediately reported. Thirdly, issues can be resolved by automated
procedures that were developed beforehand in the development
workspace 12. This stage might not be able to resolve all issues in
all cases. Therefore a manual correction of issues follows as a
fourth stage. While the issues were resolved in the integration
workspace 15, a few resources will have been modified in the
production workspace 13. These changes can now be merged into the
integration workspace 15. Since only a few resources were modified,
only a few issues need to be resolved, and only a few further changes
are introduced into the production workspace 13. When the
integration workspace 15 is finally merged into the production
workspace 13, only very few issues will remain, which can be
resolved in the production workspace 13 without fatal effects on
the remaining work.
[0127] FIG. 14 shows an example of a content base. A single page
object, which represents the home page of the managed site,
contains two articles, one sports article and one politics article.
The sports article is enriched by an image. Note that this example
is considerably simplified in order to clarify the following
presentation. In real applications, articles might also contain a
separate headline, a summary, the name of the writer, links to
other articles, an expiration date, and so on.
[0128] In the following it will be shown how this content base can
be stored in a relational database. In FIG. 14, properties were
represented by their names, which is inappropriate for permanent
storage for several reasons: it consumes a lot of storage space, it
makes it impossible to change the names of the properties later on,
which may very well be desirable, and it makes it more difficult to
introduce different properties with the same name for different
object types. Therefore, the properties are internally represented
by identifiers, e.g., numbers. FIG. 14B shows an arbitrarily chosen
mapping.
[0129] Similarly, IDs are chosen for the objects. In FIG. 14A such
identifiers are already provided in the form of numbers for each
object. E.g., the page object has the ID 1001.
[0130] One possible representation of the content data can be
obtained by creating one database table per property type, i.e.,
one table for string properties, one table for link properties, and
so on. FIG. 14C shows the string property table.
[0131] The other table shown is the one for the link collection
property that associates articles with pages. FIG. 14D shows that the
database may contain more than one row for each object/property
pair, because a link collection might contain more than one
element. Still, it holds that each object binds each property to at
most one value, in this case to one set value.
[0132] When a content object is accessed, the property-value
bindings of the content object can be retrieved by selecting from
the property tables all rows with the object ID associated with the
content object. The precise representation of the object can
afterwards be determined by taking the schema into account.
[0133] The selected table structure makes the storage entirely
independent of the schema defining the content types. Whether the
Article type actually defines the property imglink or not is
irrelevant. If an article contains a binding of the imglink
property, this property is stored; otherwise it is not.
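The one-table-per-property-type layout of paragraphs [0130] to [0133] can be tried out with SQLite from the Python standard library. The sketch below uses hypothetical property and object IDs (not the values of FIG. 14) and, as a simplification, stores single links and link collections in the same link table, returning every link value as a set. The tables contain no schema information, so an article without an image simply has no `imglink` row.

```python
import sqlite3

# Hypothetical property ids (an arbitrary mapping in the spirit of FIG. 14B).
TITLE, TEXT, ARTICLES, IMGLINK = 1, 2, 3, 4

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE string_props (object_id INTEGER, property_id INTEGER, value TEXT,
                               PRIMARY KEY (object_id, property_id));
    CREATE TABLE link_props   (object_id INTEGER, property_id INTEGER, target_id INTEGER,
                               PRIMARY KEY (object_id, property_id, target_id));
""")

# Hypothetical object ids: 10 = page, 20 = sports article, 30 = politics article, 40 = image.
conn.executemany("INSERT INTO string_props VALUES (?, ?, ?)", [
    (20, TITLE, "Cup final"), (20, TEXT, "..."),
    (30, TITLE, "Election"), (30, TEXT, "..."),
])
conn.executemany("INSERT INTO link_props VALUES (?, ?, ?)", [
    (10, ARTICLES, 20), (10, ARTICLES, 30),  # one row per collection element
    (20, IMGLINK, 40),                        # only the sports article binds imglink
])


def load_bindings(object_id: int) -> dict:
    """Collect the non-null property-value bindings of one object from all property tables."""
    bindings = {}
    for prop, value in conn.execute(
            "SELECT property_id, value FROM string_props WHERE object_id = ?", (object_id,)):
        bindings[prop] = value
    for prop, target in conn.execute(
            "SELECT property_id, target_id FROM link_props WHERE object_id = ?", (object_id,)):
        bindings.setdefault(prop, set()).add(target)
    return bindings
```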
[0134] In FIG. 14D the association of property IDs to property
names was treated as a special case, but it is of course possible
to consider the properties themselves as specialised objects. This
would lead to a uniform space of IDs and would allow the name of the
property to be treated simply as a special string property that is
only provided for objects that represent properties. Similarly,
content object types can be represented as objects. It is
convenient to define resources as the union of content objects and
schema objects.
[0135] While the IDs were shown as simple integers, more complex ID
structures are possible, which might make it easier to keep IDs
unique.
[0136] As mentioned before, null data values are not stored in
order to ease the removal of unnecessary properties from the store
and to save storage size. However, when the properties of a
resource are accessed, the null properties are re-added whenever
they are mentioned by the schema. FIG. 15 shows a flow diagram
explaining how the set of all properties for a resource is
determined.
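The re-adding of null properties can be sketched as combining the stored non-null bindings with every property the schema mentions for the resource's type, including inherited declarations. This is an illustration assuming single inheritance and dictionary-based schema data, not the exact procedure of FIG. 15; the function names are hypothetical.

```python
def declared_properties(type_name: str, supertype_of: dict, declared: dict) -> list:
    """Properties the schema mentions for a type, including inherited ones."""
    props, node = [], type_name
    while node is not None:
        props.extend(declared.get(node, []))
        node = supertype_of.get(node)
    return props


def all_properties(stored: dict, type_name: str, supertype_of: dict, declared: dict) -> dict:
    """Stored (non-null) bindings plus an explicit None for every declared but unstored property."""
    result = dict(stored)  # stored bindings are kept even if the schema no longer declares them
    for prop in declared_properties(type_name, supertype_of, declared):
        result.setdefault(prop, None)
    return result


supertype_of = {"Article": "Document"}
declared = {"Document": ["title"], "Article": ["text", "imglink"]}
```

Keeping undeclared stored bindings in the result, rather than dropping them, leaves them visible to issue detection.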