U.S. patent application number 14/480334 for an enterprise evidence repository was filed with the patent office on 2014-09-08 and published on 2014-12-25.
The applicant listed for this patent is International Business Machines Corporation. Invention is credited to Roman Kisin, Pierre Raynaud-Richard.
Publication Number | 20140379764 |
Application Number | 14/480334 |
Family ID | 45353524 |
Filed Date | 2014-09-08 |
United States Patent Application | 20140379764 |
Kind Code | A1 |
Kisin; Roman; et al. | December 25, 2014 |
ENTERPRISE EVIDENCE REPOSITORY
Abstract
A controller is configured to generate and propagate
instructions to an execution agent which, in turn, is configured to
collect and deposit collected artifacts into a repository. Write
access to a location in the repository for collected artifacts that
are to be deposited into a specified location is granted to the
execution agent. Once the execution agent deposits the collected
artifacts in the specified location in the repository, a summary of
collected artifacts is propagated to the controller. The controller
manages appropriate levels of access to the collected artifacts,
while the repository enforces the level of access. The controller
can grant read only access to the collected artifacts or it can
allow for controlled changes to be made to the metadata associated
with the collected artifact. An agent processes the data and
generates additional metadata that can be associated with the
collected artifacts and then saved in the repository. A system can
have more than one repository, where the controller allocates
storage in an appropriate repository and issues instructions to the
execution agent with the location in an appropriate repository. The
summary of the actual collections is then propagated to the
controller from the repositories.
Inventors: | Kisin; Roman (San Jose, CA); Raynaud-Richard; Pierre (Redwood City, CA) |
Applicant: |
Name | City | State | Country | Type |
International Business Machines Corporation | Armonk | NY | US | |
Family ID: | 45353524 |
Appl. No.: | 14/480334 |
Filed: | September 8, 2014 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
12826471 | Jun 29, 2010 | 8832148 |
14480334 | | |
Current U.S. Class: | 707/812 |
Current CPC Class: | G06F 16/22 20190101; G06F 16/10 20190101; G06Q 50/18 20130101; G06F 17/00 20130101 |
Class at Publication: | 707/812 |
International Class: | G06F 17/30 20060101 G06F017/30; G06Q 50/18 20060101 G06Q050/18 |
Claims
1. A computer implemented method for storing and accessing
collected artifacts in an electronic discovery system (EDMS),
comprising: managing, by an EDMS, electronic discovery workflow in
an enterprise, including issuing and propagating instructions to a
collection agent, and generating one or more collection plans that
specify one or more custodians that are responsible for data in the
enterprise; managing, by one or more evidence repositories,
collected artifacts along with contextual data and metadata,
wherein the one or more evidence repositories include a transient
storage area to which collected artifacts are deposited, wherein
the transient storage area includes a directory structure created
based on the one or more collection plans, and wherein the
directory structure includes one or more automatically provisioned
locations for depositing the collected artifacts for a given
custodian in the one or more custodians; and performing, by the
collection agent, artifact collection based at least in part on the
one or more collection plans, including depositing collected
artifacts to locations in the directory structure of the transient
storage area based at least in part on the one or more collection
plans.
2. The method of claim 1, further comprising: controlling, by the
EDMS, space allocation in the transient storage area for upcoming
collections.
3. The method of claim 1, further comprising: providing, by the one
or more evidence repositories, a content management module
configured to provide advanced collaboration capabilities,
extensible metadata, and access control.
4. The method of claim 1, wherein eDiscovery process metadata
associated with the collected artifacts comprises one or more of
collection target, reasons for collection, collected by, collected
on, collection plan, and legal case.
5. The method of claim 4, wherein the eDiscovery process metadata
further comprises external properties.
6. The method of claim 3, further comprising managing, by the EDMS,
access control.
7. The method of claim 6, further comprising: propagating access
control rules from the EDMS to the one or more evidence
repositories.
8. The method of claim 7, further comprising: granting a specific
data processing tool read access to a selected subset of the
collected artifacts in the one or more evidence repositories, as
defined by said EDMS.
9. The method of claim 8, wherein the data processing tool
comprises an early case assessment (ECA) tool.
10. The method of claim 9, further comprising: writing, by the data
processing tool, application specific metadata associated with the
collected artifacts into the one or more evidence repositories.
11. The method of claim 1, further comprising: granting, to a data
exporting tool, read access to the collected artifacts in the one
or more evidence repositories to extract, by the data exporting
tool, collected artifacts and metadata and to package, by the data
exporting tool, the extracted collected artifacts and metadata for
outside review.
12. The method of claim 11, further comprising: extracting a
summary of said extracted collected artifacts and metadata, the
summary comprising one or more of date, purpose, description,
volume in MB, estimated number of pages, number of documents
overall and broken down by document type and by person or data
source.
13. The method of claim 12, further comprising the step of:
propagating the export metadata from the one or more evidence
repositories to the EDMS.
14. The method of claim 12, further comprising: propagating the
export metadata from the one or more evidence repositories to a
discovery cost forecasting system (DCF).
15. The method of claim 1, further comprising: automatically
issuing collection instructions for information technology (IT) and
integrating the collection instructions with an overall discovery
workflow.
16. The method of claim 15, wherein the instructions comprise one
or more of a unique location of a collection staging area for a
given legal case, a collection plan, a collection log, a data
source, and a custodian.
17. The method of claim 1, further comprising: issuing automated
preservation and collection instructions and propagating the
instructions to an automated or semi-automated collection tool.
18. The method of claim 15, wherein the collection instructions
comprise a secure token for identifying collection parameters, the
method further comprising: automatically validating, by the secure
token, integrity of a collection, including chain of custody.
19. A non-transitory computer readable storage medium for storing
program instructions that, when executed by a processor, cause the
processor to perform operations for storing and accessing collected
artifacts in an electronic discovery system (EDMS), comprising:
managing, by an EDMS, electronic discovery workflow in an
enterprise, including issuing and propagating instructions to a
collection agent, and generating one or more collection plans,
wherein the one or more collection plans specify one or more
custodians that are responsible for data in the enterprise;
managing, by one or more evidence repositories, collected artifacts
along with contextual data and metadata, wherein the one or more
evidence repositories include a transient storage area to which
collected artifacts are deposited, wherein the transient storage
area includes a directory structure created based on the one or
more collection plans, and wherein the directory structure includes
one or more automatically provisioned locations for depositing the
collected artifacts for a given custodian in the one or more
custodians; perform, by the collection agent, artifact collection
based on the one or more collection plans, including depositing
collected artifacts to locations in the directory structure of the
transient storage area based on the one or more collection
plans.
20. An apparatus for storing and accessing collected artifacts in
an electronic discovery system (EDMS), comprising: at least one
processor; an EDMS operable by the at least one processor and
configured to: manage electronic discovery workflow in an
enterprise, issue and propagate instructions to a collection agent,
and generate one or more collection plans, wherein the one or more
collection plans specify one or more custodians that are
responsible for data in the enterprise; one or more evidence
repositories configured to manage collected artifacts along with
contextual data and metadata, wherein the one or more evidence
repositories include a transient storage area to which collected
artifacts are deposited, wherein the transient storage area
includes a directory structure created based on the one or more
collection plans, and wherein the directory structure includes one
or more automatically provisioned locations for depositing the
collected artifacts for a given custodian in the one or more
custodians; and the collection agent operable by the at least one
processor and configured to: perform artifact collection based on
the one or more collection plans and deposit collected artifacts to
locations in the directory structure of the transient storage area
based on the one or more collection plans.
Description
PRIORITY CLAIM
[0001] This application is a continuation of Ser. No. 12/826,471,
filed on Jun. 29, 2010, entitled ENTERPRISE EVIDENCE REPOSITORY,
the entire content of which is incorporated herein by
reference.
TECHNICAL FIELD
[0002] The invention relates to electronic discovery (eDiscovery).
More particularly, the invention relates to an enterprise evidence
repository.
BACKGROUND OF THE INVENTION
[0003] Electronic discovery, also referred to as e-discovery or
eDiscovery, concerns discovery in civil litigation, as well as tax,
government investigation, and criminal proceedings, which deals
with information in electronic form. In this context, the
electronic form is the representation of information as binary
numbers. Electronic information is different from paper information
because of its intangible form, volume, transience, and
persistence. Also, electronic information is usually accompanied by
metadata, which is rarely present in paper information. Electronic
discovery poses new challenges and opportunities for attorneys,
their clients, technical advisors, and the courts, as electronic
information is collected, reviewed, and produced. Electronic
discovery is the subject of amendments to the Federal Rules of
Civil Procedure which are effective Dec. 1, 2006. In particular,
for example, but not by way of limitation, Rules 16 and 26 are of
interest to electronic discovery.
[0004] Examples of the types of data included in e-discovery
include e-mail, instant messaging chats, Microsoft Office files,
accounting databases, CAD/CAM files, Web sites, and any other
electronically-stored information which could be relevant evidence
in a lawsuit. Also included in e-discovery is raw data which
forensic investigators can review for hidden evidence. The original
file format is known as the native format. Litigators may review
material from e-discovery in any one or more of several formats,
for example, printed paper, native file, or as TIFF images.
[0005] The revisions to the Federal Rules formally address
e-discovery and in the process, have made it a nearly certain
element of litigation. For corporations, the rules place a very
early focus on existing retention practices and the preservation
and discovery of information.
[0006] In response to the changed climate in the e-discovery arena,
corporations are:
1) enhancing their processes for issuing legal holds and tracking
collections; 2) looking for ways to reduce the costs of collecting,
processing and reviewing electronic data; and 3) looking upstream
to reduce the volume of unneeded data through better retention
policies that are routinely enforced.
[0007] The new field of e-discovery management has emerged to
assist companies that are overwhelmed by the requirements imposed
by the new rules and the spate of legal and regulatory activity
regarding e-discovery.
[0008] Currently, e-discovery management applications (EMA) rely on
a variety of approaches to store electronic data for e-discovery.
For example:
[0009] EMAs store content as binary objects in a database.
Transaction information as well as file collections are typically
stored in the same relational database located on a database
server;
[0010] EMAs also store content as content objects in a content
management system. EMAs can use a content management system, such
as EMC DOCUMENTUM, EMC CORPORATION, Hopkinton, Mass., to store
unstructured content; and
[0011] EMAs can use a local or networked file system to store
content as files in a file system and a database to store file
metadata.
[0012] Such conventional methods provide convenience and
functionality, such as allowing the data to be updated, allowing it
to be checked in and checked out, and so on. However, data stored
for the purpose of e-discovery typically has the character of being
immutable and unstructured, i.e. the data is to be permanently
stored, or at least stored for a very long time; the data is not to
be changed or updated or checked-in or -out very often; and it is
typically unnecessary to organize or structure the data in a
database or content base. In view of the immutable, unstructured
nature of e-discovery data, such conventional storage approaches,
in spite of their convenience and functionality, involve a number
of disadvantages:
[0013] High hardware cost: Databases, content management systems,
and local file systems are usually stored on arrays of hard disks.
The high hardware expense may be justified for transactional data,
but it is exorbitant in the case of the immutable, unstructured
content typically used in e-discovery;
[0014] High maintenance cost: In all of the above scenarios,
maintenance requires a skilled administrator. In the case of a
database, the administrator must be trained in database technology;
in the case of a content management system (which usually resides on
top of a database), the administrator must also be skilled in
content management systems. These maintenance costs may amount to
hundreds of thousands of dollars in salary and thousands in
training costs. As above, such expense may be justified for
transactional data but is needless in the case of immutable,
unstructured content;
[0015] Extra information technology (IT) planning and coordination:
Necessary disk space must be projected and purchased upfront,
requiring close involvement of IT personnel, e.g. coordination
between parties such as the Chief Legal Officer and the Chief
Information Officer;
[0016] High capital investment: To ensure available disk space, the
company has to buy more disk space than it needs at any particular
time; and
[0017] Inefficiencies in cost accounting: It would be beneficial to
treat storage as a cost related to a particular litigation matter as
opposed to a capital expense.
[0018] Thus, there exists a need to provide a way of storing
collected content in e-discovery applications that eliminates
unnecessary expense and managerial and administrative overhead,
thus achieving cost savings and simplifying operations.
SUMMARY
[0019] An embodiment of the invention comprises a system that
includes a controller that is configured to generate and propagate
instructions to an execution agent. The execution agent is
configured to collect and deposit collected artifacts into a
repository. The controller coordinates allocation of the storage in
the repository. The controller propagates the collection
instructions to the execution agent: the instructions contain a
location for depositing collected artifacts. The execution agent
must be granted write access to the specified location in the
repository into which the collected artifacts are to be deposited.
Once the execution agent
deposits the collected artifacts in the specified location in the
repository, a summary of collected artifacts is propagated to the
controller, thus providing transparency into the overall collection
process.
[0020] Collected artifacts can be made available to a processing
agent that is configured to perform various processing functions on
them. The controller manages appropriate levels of access to the
collected artifacts, while the repository enforces the level of
access. The controller can grant read only access to the collected
artifacts or it can allow for controlled changes to be made to the
metadata associated with the collected artifact. An agent can
process the data and generate additional metadata that can be
associated with the collected artifacts and then saved in the
repository.
[0021] Collected artifacts, along with the contextual data and
additional metadata, reside in the repository. The controller can
grant read only access to an agent that is capable of extracting
all of the data from the repository and exporting it out.
[0022] A system can have more than one repository to store
collected artifacts and metadata. In such a configuration, the
controller allocates storage in an appropriate repository. The
controller issues instructions to the execution agent with the
location in an appropriate repository. The summary of the actual
collections is then propagated to the controller from the
repositories.
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] FIG. 1 is a block diagram that illustrates a system of a
controller, an execution agent, and a repository for storing
collected artifacts according to the invention;
[0024] FIG. 2 is a block diagram that illustrates an eDiscovery
management system (EDMS) according to the invention;
[0025] FIG. 3 is a block diagram that illustrates automatic
provisioning of a collection staging area controlled by an EDMS
according to the invention;
[0026] FIG. 4 is a diagram that illustrates an example of an EER
data model in an ECM system according to the invention;
[0027] FIG. 5 is a block diagram that illustrates a flow of DCF
metadata according to the invention;
[0028] FIG. 6 is a block diagram that illustrates a flow of
metadata from an evidence repository to an EDMS according to the
invention;
[0029] FIG. 7 is a block diagram that illustrates how content and
metadata from a staging area is ingested into an evidence
repository according to the invention;
[0030] FIG. 8 is a diagram that illustrates collection content and
metadata re-use according to the invention;
[0031] FIG. 9 is a block diagram that illustrates different
components of the overall collection process monitoring system
according to the invention;
[0032] FIG. 10 is a block diagram that illustrates support for
multiple repositories according to the invention;
[0033] FIG. 11 is a screen shot showing a `My Tasks` tab according
to the invention;
[0034] FIG. 12 is a first screen shot showing an initial collection
from key players according to the invention;
[0035] FIG. 13 is a second screen shot showing an initial
collection from key players according to the invention;
[0036] FIG. 14 is a third screen shot showing an initial collection
from key players according to the invention; and
[0037] FIG. 15 is a block schematic diagram of a machine in the
exemplary form of a computer system within which a set of
instructions may be executed to cause the machine to perform any of
the herein disclosed methodologies.
DETAILED DESCRIPTION
Related Documents
[0038] The following documents are cited herein to provide
background information in connection with various embodiments of
the herein disclosed invention. These documents are incorporated
herein in their entirety based upon this reference thereto:
[0039] Discovery cost forecasting patent applications:
[0040] Forecasting Discovery Costs Using Historic Data; Ser. No.
12/165,018; filed 30 Jun. 2008; attorney docket no. PSYS0007;
[0041] Forecasting Discovery Costs Based On Interpolation Of
Historic Event Patterns; Ser. No. 12/242,478; filed 30 Sep. 2008;
attorney docket no. PSYS0012;
[0042] Forecasting Discovery Costs Based on Complex and Incomplete
Facts; Ser. No. 12/553,055; filed 2 Sep. 2009; attorney docket no.
PSYS0013; and
[0043] Forecasting Discovery Costs Based On Complex And Incomplete
Facts; Ser. No. 12/553,068; filed 2 Sep. 2009; attorney docket no.
PSYS0015;
[0044] Automation patent application:
[0045] Method And Apparatus For Electronic Data Discovery; Ser. No.
11/963,383; filed 21 Dec. 2007; attorney docket no. PSYS0001;
and
[0046] Collection transparency patent application:
[0047] Providing Collection Transparency Information To An End User
To Achieve A Guaranteed Quality Document Search And Production In
Electronic Data Discovery; Ser. No. 12/017,236; filed 21 Jan. 2008;
attorney docket no. PSYS0003.
[0048] Terminology
[0049] The following terms have the meaning associated with them
below for purposes of the discussion herein:
[0050] Enterprise Discovery Management System (EDMS): technology to
manage eDiscovery workflow in an enterprise such as the Atlas
Enterprise Discovery Management system offered by PSS Systems of
Mountain View, Calif.;
[0051] Enterprise Content Management (ECM) tools: a set of
technologies to capture, manage, retain, search, and produce
enterprise content, such as IBM's FileNet;
[0052] Early Case Assessment (ECA) tools: technology to evaluate
risks associated with eDiscovery by identifying and analyzing
relevant evidence;
[0053] Discovery Cost Forecasting (DCF): technology to model and
forecast costs associated with eDiscovery, such as the Atlas
DCF;
[0054] Evidence Repository (EvR): a system and processes for
securely collecting, preserving, and providing access to documents
and related metadata collected as part of eDiscovery;
[0055] Collection Manifest: a file describing various attributes of
the contents of a collection including, but not limited to, the
following type of metadata: chain of custody, file types, sizes,
MAC (modified, accessed, created) dates, original locations, etc.; and
[0056] Self-collections: a process of collection in which a legal
function sends collection instructions directly to custodians and
the custodians perform collection from local PCs, email, PDAs, file
share, etc.
[0057] Abstract System
[0058] FIG. 1 is a block diagram that illustrates a system 100
comprising a controller 110 that is configured to generate and
propagate instructions to an execution agent 120. Instructions can
be structured, i.e. expressed as well-defined parameters such as a
date range and other filtering criteria applicable to a particular
data source; or unstructured, i.e. free-text instructions that state
the data location, the filtering criteria, and where to deposit the
collected artifacts. The
execution agent is configured to collect and deposit collected
artifacts into a repository 130. The controller coordinates
allocation of the storage in the repository. The controller
propagates the collection instructions to the execution agent: the
instructions contain a location for depositing collected artifacts.
The location can be a physical location of the repository, network
file path, etc. The execution agent must be granted write access to
the specified location in the repository into which the collected
artifacts are to be deposited. Once the execution agent deposits the
collected artifacts
in the specified location in the repository, a summary of collected
artifacts is propagated to the controller, thus providing
transparency into the overall collection process. The summary can
be an unstructured description of the data collected or a
structured collection manifest with additional metadata.
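The interaction described in paragraph [0058] can be illustrated with
a minimal in-memory sketch. The class names, method names, and the
shape of the instruction and summary dictionaries are assumptions made
for illustration only; the specification does not prescribe them.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """In-memory stand-in for the repository 130 (hypothetical)."""
    locations: dict = field(default_factory=dict)  # location -> [(name, bytes)]
    writers: dict = field(default_factory=dict)    # location -> {agent names}

    def allocate(self, location, agent):
        # Reserve a location and grant the agent write access to it.
        self.locations.setdefault(location, [])
        self.writers.setdefault(location, set()).add(agent)

    def deposit(self, agent, location, name, content):
        if agent not in self.writers.get(location, set()):
            raise PermissionError(f"{agent} has no write access to {location}")
        self.locations[location].append((name, content))

    def summarize(self, location):
        items = self.locations[location]
        return {
            "location": location,
            "artifact_count": len(items),
            "total_bytes": sum(len(content) for _, content in items),
        }


@dataclass
class Controller:
    """Stand-in for the controller 110 (hypothetical)."""
    repository: Repository
    summaries: list = field(default_factory=list)

    def issue_instructions(self, agent, location, criteria):
        # Storage is allocated before the instructions are propagated.
        self.repository.allocate(location, agent)
        return {"agent": agent, "deposit_location": location, "criteria": criteria}

    def receive_summary(self, summary):
        self.summaries.append(summary)


def run_collection(controller, instructions, artifacts):
    """Execution agent: deposit artifacts, then report a summary."""
    repo = controller.repository
    location = instructions["deposit_location"]
    for name, content in artifacts.items():
        repo.deposit(instructions["agent"], location, name, content)
    controller.receive_summary(repo.summarize(location))


repo = Repository()
controller = Controller(repo)
instr = controller.issue_instructions(
    "agent-1", "matter-42/plan-a", {"date_range": ("2009-01-01", "2009-12-31")}
)
run_collection(controller, instr, {"a.txt": b"hello", "b.txt": b"evidence"})
```

In this sketch the summary is a small structured record; a real
collection manifest would carry additional metadata.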
[0059] Collected artifacts can be made available to a processing
agent (see 510 on FIG. 5) that is configured to perform various
processing functions on them, such as review, culling, tagging,
etc. The controller manages appropriate levels of access to the
collected artifacts, while the repository enforces the level of
access. The controller can grant read only access to the collected
artifacts or it can allow for controlled changes to be made to the
metadata associated with the collected artifact. An agent can
process the data and generate additional metadata, such as tags and
notes, that can be associated with the collected artifacts and then
saved in the repository.
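The division of labor in paragraph [0059], where the controller
decides the level of access and the repository enforces it, might be
sketched as follows. The `Access` levels and the method names are
hypothetical; the only behavior taken from the text is that content is
read-only while metadata may be changed under a controlled grant.

```python
from enum import Enum


class Access(Enum):
    NONE = 0
    READ_ONLY = 1
    METADATA_WRITE = 2  # read access plus controlled metadata changes


class Repository:
    def __init__(self):
        self._content = {}   # artifact id -> bytes, not modified after deposit
        self._metadata = {}  # artifact id -> dict of tags, notes, etc.
        self._grants = {}    # (agent, artifact id) -> Access

    def store(self, artifact_id, content):
        self._content[artifact_id] = content
        self._metadata[artifact_id] = {}

    def set_grant(self, agent, artifact_id, level):
        # Called on behalf of the controller, which decides the level.
        self._grants[(agent, artifact_id)] = level

    def _level(self, agent, artifact_id):
        return self._grants.get((agent, artifact_id), Access.NONE)

    def read(self, agent, artifact_id):
        if self._level(agent, artifact_id) is Access.NONE:
            raise PermissionError("read denied")
        # Return a copy of the metadata so callers cannot bypass annotate().
        return self._content[artifact_id], dict(self._metadata[artifact_id])

    def annotate(self, agent, artifact_id, key, value):
        if self._level(agent, artifact_id) is not Access.METADATA_WRITE:
            raise PermissionError("metadata write denied")
        self._metadata[artifact_id][key] = value


repo = Repository()
repo.store("doc-1", b"memo")
repo.set_grant("processing-agent", "doc-1", Access.METADATA_WRITE)
repo.set_grant("reviewer", "doc-1", Access.READ_ONLY)
repo.annotate("processing-agent", "doc-1", "tag", "privileged")
content, metadata = repo.read("reviewer", "doc-1")
```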
[0060] Collected artifacts, along with the contextual data and
additional metadata, reside in the repository. The controller can
grant read only access to an agent that is capable of extracting
all of the data from the repository and exporting it out.
[0061] A system can have more than one repository to store
collected artifacts and metadata (see FIG. 10). In such a
configuration, the controller allocates storage in an appropriate
repository. The controller issues instructions to the execution
agent with the location in an appropriate repository. The summary
of the actual collections is then propagated to the controller from
the repositories.
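The text does not say how an "appropriate" repository is chosen. As
one possible reading, the sketch below assumes that the controller
tracks free space per repository and picks the first repository with
enough room; the function names and repository names are
hypothetical.

```python
def allocate(repositories, required_bytes):
    """Pick a repository and reserve space in it.

    `repositories` maps a repository name to its free space in bytes.
    "Appropriate" is assumed to mean "first one with enough free space".
    """
    for name, free_bytes in repositories.items():
        if free_bytes >= required_bytes:
            repositories[name] = free_bytes - required_bytes
            return name
    raise RuntimeError("no repository has enough free space")


def issue_instructions(repositories, matter, required_bytes):
    """Build instructions that carry the location in the chosen repository."""
    repo_name = allocate(repositories, required_bytes)
    return {"repository": repo_name, "deposit_location": f"{repo_name}/{matter}"}


free_space = {"repo-east": 500, "repo-west": 10_000}
first = issue_instructions(free_space, "matter-42", 2_000)
second = issue_instructions(free_space, "matter-43", 400)
```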
[0062] eDiscovery System
[0063] FIG. 2 is a diagram that illustrates an integrated
electronic discovery (eDiscovery) system 200 in which individual
systems and methods for collecting and managing evidence are
employed according to one embodiment of the invention. In FIG. 2,
an eDiscovery management system (EDMS) 210 is configured to
propagate instructions to an IT person 220 and to allocate space in
an evidence repository 250. The IT person collects data from a data source
230. Data is deposited to the evidence repository. Monitoring data
is propagated from the evidence repository to the EDMS. The system
manages the overall eDiscovery workflow and provides visibility
into the overall process by monitoring how many documents are
deposited, ingested, indexed, etc. To do this, the number of files
and their states are monitored at each stage of the process and
that information is propagated to the EDMS. It is not necessary to
propagate collected data back to the EDMS. In one embodiment, the
EDMS 210 allocates storage and provisions directories in the
transient storage, also referred to herein as the staging area
260.
[0064] The EDMS propagates the legal case and other process data
and metadata to the evidence repository, including (see FIG. 3)
legal matter 331, collection plan 340, and collection logs 350,
351, based on data source or custodian, etc.
[0065] The EDMS 210 also generates a structured collection plan
with detailed collection instructions. The IT person 220 receives
the instructions and performs collections from the data source 230,
depositing the collected documents to the location of the directory
in the staging area specified in the collection instructions
received from the EDMS.
[0066] Content source metadata is propagated along with the content
of the collected documents. This type of metadata is derived from
the content of a collected file, for example its size in bytes, page
count, MIME type, and a checksum or hash code calculated from the
content of the file.
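As a minimal sketch of how content source metadata could be derived,
the following uses the Python standard library. The choice of SHA-256
as the hash code is an assumption, as are the dictionary keys. Note
that `mimetypes.guess_type` guesses from the file name rather than
from the content, which is a simplification; page count is omitted
because it depends on the file format.

```python
import hashlib
import mimetypes
import tempfile
from pathlib import Path


def content_metadata(path):
    """Derive size, hash code, and MIME type for one collected file."""
    data = Path(path).read_bytes()
    # Simplification: the MIME type is guessed from the file extension.
    mime_type, _ = mimetypes.guess_type(str(path))
    return {
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "mime_type": mime_type or "application/octet-stream",
    }


# Demonstration on a throwaway file.
demo_file = Path(tempfile.mkdtemp()) / "memo.txt"
demo_file.write_bytes(b"hello")
meta = content_metadata(demo_file)
```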
[0067] Location metadata is propagated along with the contents of
collected files. This type of metadata represents the location from
where the files were originally collected. Examples of this metadata
include the name or address of a PC or server, the file path, the
file name, and the modified, accessed, and created dates of the file.
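Location metadata of this kind could be captured at collection time
roughly as shown below. The dictionary keys are hypothetical. One
caveat is labeled in the code: `st_ctime` is the creation time on
Windows but the time of the last metadata change on most Unix
systems, so it is only an approximation of the created date.

```python
import os
import socket
import tempfile
from datetime import datetime, timezone


def _iso(timestamp):
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


def location_metadata(path):
    """Record where a file was collected from, including its MAC dates."""
    st = os.stat(path)
    return {
        "host": socket.gethostname(),
        "file_path": os.path.abspath(path),
        "file_name": os.path.basename(path),
        "modified": _iso(st.st_mtime),
        "accessed": _iso(st.st_atime),
        # Creation time on Windows; metadata-change time on most Unix systems.
        "created_or_changed": _iso(st.st_ctime),
    }


# Demonstration on a throwaway file.
demo_path = os.path.join(tempfile.mkdtemp(), "memo.txt")
with open(demo_path, "wb") as handle:
    handle.write(b"hello")
loc = location_metadata(demo_path)
```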
[0068] Collected documents and the metadata are ingested from the
staging area to the evidence repository. Collected documents are
grouped and linked to appropriate metadata that has been previously
propagated to the evidence repository.
[0069] Collection Staging Area
[0070] FIG. 3 is a diagram that illustrates how a collection
staging area 300 is provisioned and managed. A collection plan is
typically used by the legal department to manage the process of
collecting potential evidence. For example, the legal department
can initiate a new collection plan for collecting evidence from key
players identified in a case. The collection plan also specifies
additional collection instructions as parameters for
collections, e.g. list of keywords, effective date range, etc.
[0071] When a new collection plan is created and published, the
EDMS automatically propagates the collection plan, custodians, and
data source information, and creates a directory structure in the
collection staging area, which in FIG. 3, for example, includes a
root 320, matter 330, 331, plan 340, and logs 350, 351. The
structure of the directories is optimized to simplify manual
processing, with the directories named in a human-readable way that
refers to legal matters, collection plans, data sources, collection
logs, and custodians. The structure shown in FIG. 3 is provided for
purposes of example only; those skilled in the art will appreciate
that other structures may be used in connection with the invention
herein disclosed.
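A directory structure of the kind described in paragraph [0071] could
be provisioned as in the sketch below. The root/matter/plan/target
layout, the naming rule in `slug`, and the sample matter and
custodian names are assumptions for illustration; as noted above,
other structures may be used.

```python
import re
import tempfile
from pathlib import Path


def slug(text):
    """Make a human-readable, filesystem-safe directory name."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", text.strip()).strip("_")


def provision_staging_area(root, matter, plan, targets):
    """Create root/matter/plan/<target> directories.

    `targets` are custodians or data sources; the return value maps
    each target to its deposit directory.
    """
    plan_dir = Path(root) / slug(matter) / slug(plan)
    deposit_dirs = {}
    for target in targets:
        deposit_dir = plan_dir / slug(target)
        deposit_dir.mkdir(parents=True, exist_ok=True)
        deposit_dirs[target] = deposit_dir
    return deposit_dirs


staging_root = tempfile.mkdtemp()
dirs = provision_staging_area(
    staging_root, "Smith v. Acme", "Initial collection", ["Jane Doe", "Mail server"]
)
```

Because `mkdir` is called with `exist_ok=True`, publishing the same
plan twice leaves the existing directories in place.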
[0072] The EDMS also propagates the access control rules to the
staging area by granting an appropriate level of access on a target
collection deposit directory to an appropriate user or a group of
users, based on the work assignment as defined in the EDMS.
[0073] Having an automatically managed staging area for collections
enables a simple and reliable collection process. The EDMS contains
all of the data necessary to execute a collection based on the
collection parameters specified by the legal department as part of
the collection plan. Folders in the staging area are automatically
provisioned for collections, data sources, and custodians. IT does
not need to create folders manually. Collection instructions are
automatically issued by the EDMS when the collection plan is
published. The drop-off location parameters are automatically
generated based on the network file share location path of an
auto-provisioned directory in the collection staging area.
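The automatic generation of drop-off location parameters might look
like the following sketch, which builds one instruction per custodian
from a network file share path. The share name, the identifiers, and
the instruction fields are all hypothetical; `PureWindowsPath` is used
only to compose a UNC-style path without touching the file system.

```python
from pathlib import PureWindowsPath


def build_instructions(share, matter, plan, custodians, parameters):
    """Return one collection instruction per custodian.

    The drop-off location is derived from the auto-provisioned
    directory layout share/matter/plan/custodian (an assumed layout).
    """
    base = PureWindowsPath(share) / matter / plan
    return [
        {
            "custodian": custodian,
            "parameters": parameters,
            "drop_off": str(base / custodian),
        }
        for custodian in custodians
    ]


instructions = build_instructions(
    r"\\fileserver\staging",
    "M-001",
    "Initial",
    ["jdoe", "asmith"],
    {"keywords": ["contract"], "date_range": ("2009-01-01", "2009-12-31")},
)
```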
[0074] Evidence Repository
[0075] The evidence repository manages large volumes of collected
documents and metadata and can be built on top of an existing
content management system, such as an ECM.
[0076] FIG. 4 is a diagram that illustrates an example of the
evidence repository data model 400 for an ECM system. The evidence
repository root 410 is a top-level container for all the legal
cases and related data. Entities representing matters 420 and 421
contain the process metadata propagated from EDMS, including, for
example, any of the following: legal case name, legal case unique
identifier, description, matter security group, matter type,
attorney, legal assistant, outside counsel, effective date, status,
etc.
[0077] The legal case entity is a container for the collection or
interview plans 440, 441, 442, 443 which can be further categorized
into structured collection plans, such as 440, 442, 443, and self
collection plans 441. Collection plans have process metadata
propagated from the EDMS that includes, for example, the following
properties: name, status, date, collection parameters, etc.
[0078] Collection plans contain collection logs 460, 461, 462, 463,
464, 465. Collection logs have process metadata that includes, for
example custodian, data source, log entry, conducted by, date
conducted, status, etc. The collection logs contain evidence items
that include the content and metadata of the collected documents.
The metadata for the collection log is comprised of the process,
source, and location metadata, as defined above.
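The containment hierarchy of the data model (root, matters, plans,
logs, evidence items) can be sketched with plain data classes. Only a
few of the listed properties are shown, and the class and field names
are assumptions rather than the schema of any particular ECM system.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceItem:
    content: bytes
    source_metadata: dict = field(default_factory=dict)
    location_metadata: dict = field(default_factory=dict)


@dataclass
class CollectionLog:
    custodian: str
    data_source: str
    status: str = "pending"
    items: list = field(default_factory=list)


@dataclass
class CollectionPlan:
    name: str
    kind: str = "structured"  # "structured" or "self"
    logs: list = field(default_factory=list)


@dataclass
class Matter:
    case_id: str
    name: str
    plans: list = field(default_factory=list)


@dataclass
class EvidenceRepositoryRoot:
    matters: dict = field(default_factory=dict)  # case id -> Matter

    def count_items(self, case_id):
        """Count evidence items across all plans and logs of one matter."""
        matter = self.matters[case_id]
        return sum(len(log.items) for plan in matter.plans for log in plan.logs)


log = CollectionLog("Jane Doe", "laptop", items=[EvidenceItem(b"memo")])
plan = CollectionPlan("Initial collection", logs=[log])
self_plan = CollectionPlan("Self collection", kind="self")
root = EvidenceRepositoryRoot({"M-001": Matter("M-001", "Smith v. Acme", [plan, self_plan])})
```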
[0079] Self-Collections
[0080] Advanced EDMS systems, such as the Atlas LCC, allow for
custodian self-collections. This is a type of collection process in
which individual custodians receive collection instructions from the
legal department and collect evidence, such as emails, documents,
and other data, with easy-to-use tools provided to them. When that
mechanism is used, the content and metadata may be collected to a
dedicated EDMS storage.
[0081] The EDMS is responsible for propagating the data collected
as part of a self-collection to the evidence repository.
[0082] Existing collections stored in EDMS are automatically
migrated by moving the content and related case metadata to the
evidence repository. This allows for centralized evidence
management regardless of the type of a collection and its
origins.
[0083] Data Processing
[0084] Data processing is an important part of the overall
eDiscovery process. The EDMS can grant an appropriate level of
access to users authorized to use processing tools against the
collected data stored in the evidence repository to enable the data
processing. Examples of such access include read-only access to the
case data and metadata or a subset of this data, and write access
to a subset of metadata. Some data processing tools, such as Early
Case Assessment (ECA) tools, can generate additional metadata, such
as tags, notes, etc. The metadata generated by such a tool can be
stored in the evidence repository if the EDMS grants write access
on the subset of metadata associated with documents in the context
of a specified legal case, plan, etc.
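The split between read-only case data and a writable subset of metadata can be illustrated as follows. This is a sketch under stated assumptions: the `Grant` and `Document` types and the `write_metadata` function are hypothetical names, and the grant is assumed to have been issued by the EDMS and handed to the repository for enforcement.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Grant:
    """Access level issued by the EDMS for one legal case (hypothetical)."""
    case_id: str
    can_read: bool = True
    writable_fields: FrozenSet[str] = frozenset()


@dataclass
class Document:
    case_id: str
    content: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


def write_metadata(doc: Document, grant: Grant, key: str, value: str) -> None:
    """Repository-side enforcement: only granted metadata fields may change."""
    if grant.case_id != doc.case_id:
        raise PermissionError("grant does not cover this legal case")
    if key not in grant.writable_fields:
        raise PermissionError(f"metadata field {key!r} is read-only")
    doc.metadata[key] = value


# An ECA tool is allowed to add tags and notes, but nothing else.
eca_grant = Grant(case_id="M-420", writable_fields=frozenset({"tags", "notes"}))
doc = Document(case_id="M-420", content=b"...", metadata={"custodian": "Amy B."})
write_metadata(doc, eca_grant, "tags", "privileged")
```

An attempt to change the `custodian` field with the same grant raises `PermissionError`, so the collected content and process metadata stay unchanged.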
[0085] Data Export
[0086] Export tools 520 (see FIG. 5), such as the export module of
the Atlas EDM suite, are used to extract the content and metadata of
documents collected in the evidence repository and to package and
ship the data for an outside review or other use.
[0087] Export metadata is metadata associated with the event of
exporting a set of documents for an outside review. The metadata
contains, for example, the date of export, volume of export in
bytes, estimated number of pages exported, number of documents
exported, etc.
[0088] DCF Metadata
[0089] The evidence repository is expected to track the
overwhelming majority of the collected data. Facts created as a
result of the collection, processing, and exporting of the
collected data are automatically propagated from the evidence
repository to a DCF system. Having the most accurate and up-to-date
facts is critical for reliable and precise eDiscovery cost modeling
and forecasting.
[0090] FIG. 5 is a diagram that illustrates a flow 500 of the DCF
metadata; and FIG. 6 is a block diagram that illustrates a flow of
metadata from an evidence repository to an EDMS. As the collection
content and metadata are being ingested into the evidence
repository, the summary data on the volume of collections in MB,
estimated page count, time, etc. is being continuously aggregated
on a per-matter basis and propagated from the evidence repository
to the DCF 540.
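The continuous per-matter aggregation described above can be sketched as a running summary that is updated on every ingested document. The `CollectionAggregator` class and the `PAGES_PER_MB` ratio are assumptions made for illustration; the source does not specify how the page count is estimated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MatterSummary:
    documents: int = 0
    volume_bytes: int = 0
    estimated_pages: int = 0


class CollectionAggregator:
    # Hypothetical ratio used only to derive an estimated page count.
    PAGES_PER_MB = 50

    def __init__(self) -> None:
        self._by_matter = defaultdict(MatterSummary)

    def record_ingested(self, matter_id: str, size_bytes: int) -> None:
        """Update the running totals for one matter as a document is ingested."""
        summary = self._by_matter[matter_id]
        summary.documents += 1
        summary.volume_bytes += size_bytes
        summary.estimated_pages = (
            summary.volume_bytes * self.PAGES_PER_MB // (1024 * 1024)
        )

    def summary(self, matter_id: str) -> MatterSummary:
        """Return the snapshot that would be propagated to the DCF."""
        return self._by_matter[matter_id]


agg = CollectionAggregator()
agg.record_ingested("M-420", 1024 * 1024)  # 1 MB document
agg.record_ingested("M-420", 512 * 1024)   # 0.5 MB document
agg.record_ingested("M-421", 2048)
```

Keeping the totals incremental means a summary is available at any point during ingestion, rather than only after a collection completes.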
[0091] The collected content is processed and analyzed by using an
ECA 510 or similar set of tools. The collected content is tagged
with additional ECA metadata and the metadata is propagated to the
evidence repository. The metadata can be further aggregated and
propagated to the DCF system and used to improve the accuracy of
the discovery cost modeling and forecasting further.
[0092] Export tools 520 are used to extract the content and
metadata of documents collected in the evidence repository and
package and ship the data for an outside review or other use. The
volume and timing metrics, such as volume collected in pages and
GB, timing of collections, and number of custodians collected from
or associated with an export event, are critical for accurate
discovery cost modeling and forecasting. The evidence repository
enables a highly reliable and repeatable automated process of
propagating the export metadata to the DCF when it becomes
available.
[0093] The export data propagated to the DCF includes, for example,
volume of export in bytes, estimated page count, date of export,
number of documents, etc.
[0094] Ingestion Process
[0095] The ingestion process is responsible for ingesting the
documents and metadata deposited into the collection drop-off
locations within the staging area to the evidence repository.
[0096] The ingestion process relies on relationships between a
folder in the staging area and collection log entity in the ECM
that were previously established by the EDMS. Based on the location
of documents in the staging area, the ingestion process finds
previously created corresponding collection log entities in the
evidence repository and links documents ingested from a collection
log folder to the collection log entity in the evidence
repository.
[0097] FIG. 7 is a diagram that illustrates an example 700 of how
documents and folders in the staging area can be ingested and
mapped to the entities in the evidence repository. The ingestion
process detects a new document in the staging area 210. New
documents 710, 711, 712 were deposited as a response to the
collection request for a given collection log from 1-12-10 which is
associated with a collection target, collection plan, legal
request, and legal matter. The process metadata was automatically
propagated to the EER earlier. The ingestion process looks up the
collection log entity 730 in the repository 250 that corresponds to
the location of the parent folder in the staging area, Collection
Log 1-12-10, and it then creates evidence entities 720, 721, 722.
Documents in the staging area can now be removed or archived.
Documents in the evidence repository are now associated with all
the process metadata propagated from the EDMS.
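The folder-to-entity lookup in the FIG. 7 example can be sketched as follows. This is an illustration under assumptions: the `ingest` function, the folder name, and the identifier "log-730" are hypothetical, and the folder-to-log mapping is assumed to have been established earlier by the EDMS. The sketch only links file names; storing the content and removing or archiving the staging copies are left out.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def ingest(staging_root: Path, folder_to_log: Dict[str, str]) -> Dict[str, List[str]]:
    """Link each deposited file to the collection log mapped to its folder.

    folder_to_log maps a folder path relative to staging_root to a
    collection log identifier.
    """
    linked: Dict[str, List[str]] = {log_id: [] for log_id in folder_to_log.values()}
    for rel_folder, log_id in folder_to_log.items():
        folder = staging_root / rel_folder
        for path in sorted(folder.iterdir()):
            if path.is_file():
                # The file's location alone determines its collection log.
                linked[log_id].append(path.name)
    return linked


with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp)
    drop_off = staging / "collection_log_1-12-10"
    drop_off.mkdir()
    for name in ("doc_710.txt", "doc_711.txt", "doc_712.txt"):
        (drop_off / name).write_text("collected content")
    result = ingest(staging, {"collection_log_1-12-10": "log-730"})
```

Since the collection log already carries the custodian, plan, and matter metadata, nothing beyond the drop-off folder is needed to place a document in its full legal context.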
[0098] In some cases, collections might also include additional
metadata in the form of a collection manifest, which can be in a
proprietary format or in an XML-based format, such as EDRM XML.
Collection manifest metadata is ingested along with the collected
content. A collection manifest contains additional metadata
including, for example, chain of custody, original location, etc.
That metadata is associated with the document evidence entity as
part of the ingestion process.
[0099] Improve Reliability
[0100] The reliability and accuracy of the collection process can
be further improved by adding a secure token to the collection
instructions for IT. The secure token is a file containing
information that uniquely identifies an individual collection
target in the context of a collection plan.
[0101] IT is instructed to deposit the token along with the
collected files into the drop-off location specified in the
instructions. As part of the ingestion process, the system
automatically validates the integrity of the collection, including
chain of custody, and detects inconsistencies by comparing the
information in the secure token against the expected collection
target, collection plan, and other attributes based on the location
from which the collected data is being ingested. Depending on the
ingestion policies, such a collection can be rejected. Exceptions
are escalated to an appropriate authority for handling. If, upon
the ingestion validation, the system detects that IT has mistakenly
deposited data collected for a target into an incorrect location
along with a secure token for a given target, the system rejects
the collection and alerts appropriate IT users and, optionally, the
legal department, with all of the details necessary to correct the
situation by placing the collected data in an appropriate location.
This affects the overall status of the collection process
propagated to the EDMS, making it transparent to all of the parties
involved until the issue is resolved.
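One way such a token check could work is sketched below. The source does not say how the token is protected; the use of an HMAC signature, the JSON layout, and the names `issue_token`, `validate_deposit`, and `SECRET` are all assumptions made for illustration.

```python
import hashlib
import hmac
import json

# Hypothetical key shared by the token issuer and the ingestion process.
SECRET = b"demo-key"


def issue_token(plan_id: str, target_id: str) -> str:
    """Create a token file body that identifies a target within a plan."""
    payload = json.dumps({"plan": plan_id, "target": target_id}, sort_keys=True)
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return json.dumps({"payload": payload, "sig": sig})


def validate_deposit(token_text: str, expected_plan: str, expected_target: str) -> str:
    """Compare the token against what the drop-off location implies."""
    token = json.loads(token_text)
    expected_sig = hmac.new(
        SECRET, token["payload"].encode(), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(token["sig"], expected_sig):
        return "rejected: token integrity check failed"
    claims = json.loads(token["payload"])
    if (claims["plan"], claims["target"]) != (expected_plan, expected_target):
        return "rejected: token does not match drop-off location"
    return "accepted"


token = issue_token("plan-440", "john.s")
ok = validate_deposit(token, "plan-440", "john.s")
misplaced = validate_deposit(token, "plan-440", "amy.b")
```

Here `expected_plan` and `expected_target` stand for the attributes the system derives from the location being ingested, so a token deposited in another custodian's directory is reported as a mismatch rather than silently ingested.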
[0102] Collection and Metadata Re-Use
[0103] The evidence repository holds large volumes of collected
data including, for example, content, source, location, process,
export, DCF metadata and the metadata generated by ECA and other
data processing tools. Collection with subsequent analysis and
culling can be very costly, especially if done repeatedly.
Redundant collection can be reduced or eliminated through
collection re-use.
[0104] FIG. 8 is a diagram that illustrates an example 800 of
collection reuse. Based on a similar legal case from the past, a
member of the legal staff identifies a collection plan 440 and,
optionally, collection targets within the plan to be reused.
[0105] The EDMS instructs the evidence repository to establish
reuse relationships in the collection repository in such a way that
a new matter 810 contains a reused collection plan from an existing
matter 420. The relationships can be established by copying the
evidence entities or by referencing the existing collection plan
container.
[0106] The entire set of evidence metadata, or a subset of it, can
also be reused, taking full advantage of the analysis, culling, and
export that occurred in the legal case and collection plan being
reused.
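The two ways of establishing a reuse relationship, copying versus referencing, behave differently when the original plan later changes. The sketch below illustrates that difference; the `Plan`, `MatterContainer`, and `reuse_plan` names and the identifier "matter-811" are hypothetical.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Plan:
    name: str
    evidence: List[str] = field(default_factory=list)


@dataclass
class MatterContainer:
    matter_id: str
    plans: Dict[str, Plan] = field(default_factory=dict)


def reuse_plan(source: MatterContainer, target: MatterContainer,
               plan_name: str, by_reference: bool = True) -> Plan:
    """Attach an existing plan to a new matter by reference or by copy."""
    plan = source.plans[plan_name]
    target.plans[plan_name] = plan if by_reference else copy.deepcopy(plan)
    return target.plans[plan_name]


existing = MatterContainer(
    "matter-420", {"plan-440": Plan("plan-440", ["doc-a", "doc-b"])}
)
new_ref = MatterContainer("matter-810")
new_copy = MatterContainer("matter-811")
reuse_plan(existing, new_ref, "plan-440", by_reference=True)
reuse_plan(existing, new_copy, "plan-440", by_reference=False)

# A later addition to the original plan is visible only through the reference.
existing.plans["plan-440"].evidence.append("doc-c")
```

Referencing avoids duplicating content and keeps the new matter in step with the original, while copying gives the new matter a snapshot that is unaffected by later changes.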
[0107] Monitoring
[0108] FIG. 9 is a diagram that illustrates the different
components of overall collection process monitoring 900.
[0109] The EDMS 210 is responsible for the overall collection
process. All the stages of the overall process report exceptions, a
summary, and important statistics back to the EDMS. The EDMS
aggregates the monitoring data from all the stages of the
collection process, thus providing additional analytics. The EDMS
thus enables visibility into the overall collection process.
[0110] The staging area 260 is monitored by analyzing the contents
of the drop-off collection locations. The following exceptions and
statistics, for example, are reported back to the EDMS: number of
files deposited, pending ingestion, failed to delete, failed to
ingest within the time limit, etc. These statistics are grouped by
collection log, collection plan, legal case, and repository.
[0111] The evidence repository is monitored using platform-specific
mechanisms to detect new documents matching appropriate criteria.
The following exceptions and statistics, for example, are reported
back to the EDMS: number of files ingested, failed to link to an
appropriate collection log, various timeouts, etc. These statistics
are grouped by collection log, collection plan, legal case, and
repository and are propagated to the EDMS.
[0112] The data processing tools 910 may require additional
content indexing or linking steps for the collected data to become
available for processing. For example, many ECA tools employ a more
sophisticated content and metadata indexing mechanism than the
evidence repository may provide. This requires additional processing
as part of making the collected data available for the analysis. The
following exceptions and statistics, for example, are reported back
to the EDMS from the data processing step: number of files
available for analysis, number of files pending, number of files
failed, various timeouts, etc. These statistics are grouped by
collection log, collection plan, legal case, and repository and are
propagated to the EDMS.
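The grouping of monitoring statistics by collection log, collection plan, legal case, and repository can be sketched as a simple counter over reported events. The `Event` record, its field values, and the `aggregate` function are hypothetical and serve only to illustrate the grouping.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Event(NamedTuple):
    stage: str       # "staging", "repository", or "processing"
    repository: str
    legal_case: str
    plan: str
    log: str
    outcome: str     # e.g. "deposited", "ingested", "failed_to_link"


def aggregate(
    events: Iterable[Event],
    group_by: Tuple[str, ...] = ("repository", "legal_case", "plan", "log"),
) -> Counter:
    """Count outcomes per stage within each requested grouping."""
    counts: Counter = Counter()
    for event in events:
        key = tuple(getattr(event, name) for name in group_by)
        counts[key + (event.stage, event.outcome)] += 1
    return counts


events = [
    Event("staging", "default", "M-420", "plan-440", "log-460", "deposited"),
    Event("staging", "default", "M-420", "plan-440", "log-460", "deposited"),
    Event("repository", "default", "M-420", "plan-440", "log-460", "ingested"),
    Event("repository", "default", "M-420", "plan-440", "log-461", "failed_to_link"),
]
stats = aggregate(events)
by_case = aggregate(events, group_by=("legal_case",))
```

Because every stage reports the same identifying fields, the EDMS can roll the same events up to a coarser level, such as per legal case, without further input from the stages.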
[0113] Multiple Repositories
[0114] The system supports a configuration with multiple
repositories. Each repository has a dedicated staging area from
which the collected data is ingested into that repository. The EDMS
maintains a catalog of evidence repositories
which contains the names, access control rules, and path to the
root of the staging area for each repository.
[0115] The evidence repository can be selected for a matter type,
legal case, and collection plan. When a collection plan is
published, the EDMS allocates storage and provisions directories in
the staging area of a selected evidence repository. The EDMS
propagates the legal case and other process data and metadata to
the appropriate evidence repository.
[0116] The EDMS generates and propagates collection instructions to
IT or to an automated collection tool, such as Atlas ACA; the
instructions contain the location of the staging area for the
selected repository.
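A repository catalog and the derivation of a drop-off location from it can be sketched as follows. The catalog entries, server names, and the matter/plan/custodian directory layout are assumptions for illustration; the source states only that the catalog holds names, access control rules, and the staging root path. The sketch computes the path and does not create directories.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath
from typing import Dict


@dataclass(frozen=True)
class RepositoryEntry:
    name: str
    staging_root: PureWindowsPath
    access_group: str  # stands in for the access control rules


# Hypothetical catalog maintained by the EDMS.
CATALOG: Dict[str, RepositoryEntry] = {
    "default": RepositoryEntry("default", PureWindowsPath(r"\\us-fs\staging"), "it-us"),
    "eu": RepositoryEntry("eu", PureWindowsPath(r"\\eu-fs\staging"), "it-eu"),
}


def drop_off_location(repo_name: str, matter_id: str,
                      plan_id: str, custodian_id: str) -> PureWindowsPath:
    """Build the network share path included in collection instructions."""
    entry = CATALOG[repo_name]
    return entry.staging_root / matter_id / plan_id / custodian_id


location = drop_off_location("eu", "M-420", "plan-440", "john.s")
```

Deriving the path from the catalog entry ties each set of instructions to exactly one repository's staging area.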
[0117] Many countries have data protection laws designed to protect
information considered to be personally identifiable. For example,
EU directives establish a level of protection that effectively
makes data transfer from an EU member to the US illegal.
[0118] Multiple local evidence repositories can be set up to
eliminate the need to transfer the data across jurisdictions. The
instructions are generated such that collected content and metadata
are deposited in a location within the jurisdiction-specific
staging area. The collection is ingested into a local ECM within the
local evidence repository.
[0119] FIG. 10 is a diagram that illustrates an example of a system
1000 with multiple evidence repositories located in the US and in the
EU. A collection plan involving custodians and data sources located
in the EU 122 is propagated to an EU repository 1011.
[0120] Multiple repositories with various levels of security can be
used depending on a legal case security group, individual legal
case, and collection plan. Thus, a collection plan involving
custodians and data sources located in an IT department 121 is
propagated to a default repository 1010. For a case with an
increased level of security, the collection instructions are
generated in such a way that collected content and metadata are
deposited in and managed by a secure repository. FIG. 10 also
illustrates an example of the integrated system with multiple
evidence repositories, including a High Security Evidence Repository
123. For a legal case with an elevated level of security
classification, the collection instructions are generated for IT
personnel with higher security clearance and are directed towards a
secure repository 1012. As a result, collected documents end up in a
secure repository that maintains an appropriate level of security
throughout the life cycle of a case.
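Selecting a repository by jurisdiction and security level can be sketched as a filter over the catalog. The numeric security levels, the repository names, and the rule of choosing the lowest level that still satisfies the requirement are assumptions; the source does not define a selection algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Repository:
    name: str
    jurisdiction: str
    security_level: int  # higher means more restricted (hypothetical scale)


def select_repository(catalog: List[Repository], jurisdiction: str,
                      required_security: int) -> Repository:
    """Pick a repository in the same jurisdiction that is secure enough."""
    candidates = [
        repo for repo in catalog
        if repo.jurisdiction == jurisdiction
        and repo.security_level >= required_security
    ]
    if not candidates:
        # Data is never routed across jurisdictions as a fallback.
        raise LookupError("no suitable repository in this jurisdiction")
    return min(candidates, key=lambda repo: repo.security_level)


catalog = [
    Repository("default-1010", "US", 1),
    Repository("eu-1011", "EU", 1),
    Repository("secure-1012", "US", 3),
]
eu_choice = select_repository(catalog, "EU", 1)
us_choice = select_repository(catalog, "US", 1)
secure_choice = select_repository(catalog, "US", 2)
```

Raising an error when no local repository qualifies, instead of falling back to another region, reflects the goal of keeping collected data within its jurisdiction.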
[0121] Structured Collection Indexed by Custodian
[0122] Paralegal Creates Manual Structured Collection Plan:
[0123] Select DS and associated collection template
[0124] Identify custodians
[0125] Provide collection parameters
[0126] Click on the `ownership` tab and assign the owners (IT
personnel)
[0127] Publish collection plan, e.g. initial email collection from
key players for a specified date range with a specified list of
keywords
[0128] Staging Area Sync
[0129] Provision directories
[0130] Provision access control, e.g. grant write-only or read-write
access to the provisioned directories for appropriate IT users
[0131] Evidence Repository Sync
[0132] Collection metadata is automatically propagated to an EvR
[0133] Access Control data is propagated to the repository
[0134] Data Processing/ECA Sync
[0135] Create a case container
[0136] Propagate collection metadata from repository to the data
processing tool
[0137] Propagate access control data from EDMS to the data
processing tool
[0138] Set up search and indexing tasks
[0139] IT Person Gets Collection Instructions
[0140] IT person or a group identified as owner of the collection
plan on the IT side finds a new collection request/plan in `My
Tasks` tab (see FIG. 11)
[0141] Instructions contain a list of parameters and a list of
custodians. Collection instructions might define a number of
custodians, e.g. John S., Amy B., etc., and a list of parameters of
various types, e.g. date range, list of keywords, etc. (see FIG.
12)
[0142] IT person clicks on a specific custodian; this opens up the
custodian view (see FIG. 13)
[0143] New auto-calculated parameter `Evidence Repository Collection
Location` shows the location where collected files should be
deposited. Instructions clearly state that all collected documents
are to be placed in the specified directory, e.g.
\\server_name\path_element1\path_element2\deposit_directory
[0144] Clicking on the new parameter opens Windows Explorer, pointing
to an automatically provisioned location to allow IT personnel to
deposit collected files at that location for a given custodian,
collection plan, etc.
[0145] Secure token is optionally provided as part of the collection
instructions
[0146] IT Personnel Perform Collections
[0147] Use existing collection process and tools
[0148] Optional secure token is deposited into the specified
location
[0149] Collection for a given custodian is deposited in the
specified location
[0150] Upon finishing collection, IT personnel set status for a
custodian as completed (see FIG. 14)
[0151] Collection Ingestion
[0152] Files are processed as soon as they are deposited into the
designated directory
[0153] Additional validation is performed to ensure that the files
were deposited into the correct directory using secure token
validation
[0154] All the documents collected for a given matter are
automatically propagated to a data processing tool via automated
search and import functionality of the processing tools or an
external timer task
[0155] Data on the collection summary is propagated to the DCF
[0156] EDM
[0157] Legal user selects a legal case
[0158] Select an evidence repository within the case
[0159] User is taken to a data processing or ECA tool, such as the
IBM eDiscovery analyzer
[0160] Data Processing Tools/ECA
[0161] Access control is propagated from EDMS so only authorized
users get access to the case data
[0162] Custom search template exposes EDMS specific metadata, such
as matterId, requested, collection log, etc.
[0163] User or a group of users performs analysis, culling etc.
using eDA
[0164] Data on the data analysis summary is propagated to the DCF
[0165] Data Export
[0166] Authorized legal user creates a set of data to be exported
for outside review
[0167] Data on the export summary is propagated to the DCF
[0168] Collection Process Monitoring
[0169] Collection deposit process: How many files deposited,
processed
[0170] Collection ingestion process: How many files ingested,
pre-processed (archives expanded, prepared for indexing), metadata
associated, errors, etc.
[0171] Analysis preparation: How many files were prepared for
analysis, indexed, added to a case, errors
[0172] Computer Implementation
[0173] FIG. 15 is a block schematic diagram of a machine in the
exemplary form of a computer system 1600 within which a set of
instructions may be executed to cause the machine to perform any
one of the herein disclosed methodologies. In alternative
embodiments, the machine may comprise or include a network router,
a network switch, a network bridge, personal digital assistant
(PDA), a cellular telephone, a Web appliance or any machine capable
of executing or transmitting a sequence of instructions that
specify actions to be taken.
[0174] The computer system 1600 includes a processor 1602, a main
memory 1604 and a static memory 1606, which communicate with each
other via a bus 1608. The computer system 1600 may further include
a display unit 1610, for example, a liquid crystal display (LCD) or
a cathode ray tube (CRT). The computer system 1600 also includes an
alphanumeric input device 1612, for example, a keyboard; a cursor
control device 1614, for example, a mouse; a disk drive unit 1616,
a signal generation device 1618, for example, a speaker, and a
network interface device 1628.
[0175] The disk drive unit 1616 includes a machine-readable medium
1624 on which is stored a set of executable instructions, i.e.
software, 1626 embodying any one, or all, of the methodologies
described herein below. The software 1626 is also shown to reside,
completely or at least partially, within the main memory 1604
and/or within the processor 1602. The software 1626 may further be
transmitted or received over a network 1630 by means of a network
interface device 1628.
[0176] In contrast to the system 1600 discussed above, a different
embodiment uses logic circuitry instead of computer-executed
instructions to implement processing entities. Depending upon the
particular requirements of the application in the areas of speed,
expense, tooling costs, and the like, this logic may be implemented
by constructing an application-specific integrated circuit (ASIC)
having thousands of tiny integrated transistors. Such an ASIC may
be implemented with complementary metal oxide semiconductor (CMOS),
transistor-transistor logic (TTL), very large scale integration
(VLSI), or another suitable construction. Other alternatives
include a digital signal processing chip (DSP), discrete circuitry
(such as resistors, capacitors, diodes, inductors, and
transistors), field programmable gate array (FPGA), programmable
logic array (PLA), programmable logic device (PLD), and the
like.
[0177] It is to be understood that embodiments may be used as or to
support software programs or software modules executed upon some
form of processing core (such as the CPU of a computer) or
otherwise implemented or realized upon or within a machine or
computer readable medium. A machine-readable medium includes any
mechanism for storing or transmitting information in a form
readable by a machine, e.g. a computer. For example, a machine
readable medium includes read-only memory (ROM); random access
memory (RAM); magnetic disk storage media; optical storage media;
flash memory devices; electrical, optical, acoustical or other form
of propagated signals, for example, carrier waves, infrared
signals, digital signals, etc.; or any other type of media suitable
for storing or transmitting information.
[0178] Although the invention is described herein with reference to
the preferred embodiment, one skilled in the art will readily
appreciate that other applications may be substituted for those set
forth herein without departing from the spirit and scope of the
present invention. Accordingly, the invention should only be
limited by the Claims included below.
* * * * *