Predictive Coding


A leader in the research, development and deployment of textual analytics tools to support audits, investigations and litigation, OrcaTec provides legal professionals with technologies packaged as services to support the core requirements of electronic discovery. Key technologies provided by OrcaTec to clients include Predictive Coding.

Predictive Coding

Not surprisingly, costs of predictive coding, even with the use of relatively experienced counsel for machine-learning tasks, are likely to be substantially lower than the costs of human review.

M. R. Grossman & G. V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, XVII Rich. J.L. & Tech. 11 (2011).

OrcaTec’s predictive coding technology provides users with the capability to substantially reduce the overall time and cost required for effective ESI review. Key predictive coding attributes provided by OrcaTec’s Document Decisioning Suite™ include:

  • Accuracy:  Computer-assisted review can (and does) yield more accurate results than exhaustive manual review, with much lower effort.
  • Consistency:  Consistent results that combine the best attributes of technology (automation) and manual review (expert reviewer classification).
  • Efficiency:  The increasing volume of digital records makes predictive coding not only a cost-effective option, but the only reasonable way to handle a large-scale production.
  • Defensibility:  Global Aerospace Inc. v. Landow Aviation, L.P. d/b/a Dulles Jet Center, No. CL 61040 (Va. Cir. Ct. Apr. 23, 2012), is the first known court order approving the use of predictive coding for electronic discovery (Virginia Circuit Court Judge Chamblin).

Computer-Assisted Quality Control Analysis (Predictive Coding)

Although computer-assisted review is increasingly used for high volume reviews, certain cases or clients may require a traditional, “eyes on” review despite the significant volume of documents at issue. In those voluminous reviews, automated QC techniques can be employed to streamline the traditional QC process.

Tonya Deem & Dustin Green. Tips for Quality Control In Technology-Assisted Review. Law360 (2012).

OrcaTec’s predictive coding technology provides users with the capability to significantly enhance ESI review quality control (QC) and verification procedures in the electronic discovery process.  Key predictive coding attributes provided by OrcaTec’s Document Decisioning Suite that enhance review QC and validation include:

  • Reliability:  Research increasingly suggests that traditional review methods should not be presumed to be reliable. Computer-Assisted QC helps augment traditional review methods and can increase reliability.
  • Efficiency:  With the advent of computer-assisted review technology, ongoing QC becomes efficient and viable from both a time and cost perspective.
  • Quality:  Testing individual document batches throughout the review process allows adjustments to improve quality during review.
  • Defensibility:  Multiple rounds of QC (ongoing QC) may lead to more defensible results.

Suggested Predictive Coding Protocol

The following paragraphs outline a basic, effective predictive coding protocol that can be used with the OrcaTec Document Decisioning Suite.  They address the technological issues involved in using predictive coding, while recognizing that there may also be legal or strategic issues that must be considered.  This protocol is only one of many that may be appropriate to a particular situation.

  • Meet and Confer. The parties meet to determine the parameters of eDiscovery, including preservation, collection, selected custodians, time ranges, topics, concepts, and other pertinent issues. Repeat as necessary as the case evolves.  OrcaPredict does not use key words or seed sets, so there is no need for those items to be considered in the meet and confer.
  • Exploratory Analysis. The producing party, recognizing its obligation to produce responsive documents, begins document analysis.
  • OrcaPredict Training. The producing party, using one authoritative reviewer (the “expert”), begins predictive training. OrcaPredict generates a series of random sample sets for the expert, who decides only whether each document presented is responsive or non-responsive.  When all of the documents in a sample set have been coded, the system generates the next random sample set. As the expert codes, using his or her legal judgment about the status of each document, the training set for predictive coding is created.  OrcaPredict learns to match the expert’s decisions as precisely as possible.  If the expert is stingy in marking documents responsive, then the computer learns to be stingy.  If the expert is generous, then the computer learns to be generous.
  • Continuous Assessment.  Because each sample is random, each one is representative of the collection as a whole.  The system’s performance on each sample set is indicative (within the limits of the sample size) of the performance that the system would have if training were stopped at that point.  OrcaPredict also computes a running average of the last four sample sets to give a more stable prediction of system efficacy.
  • Stopping Rule. The parties may negotiate a stopping rule – a point at which they agree predictive training is complete.  This rule could entail a certain level of efficacy (e.g., Recall significantly greater than what would be achieved with a team of human reviewers, Recall greater than 50% or 75%).  It could entail a certain number of training documents, or a certain leveling of any increase in accuracy over blocks of, say, 400 documents.  The stopping rule that is appropriate will depend on the stakes of the case, on the subtlety of the distinction between responsive and non-responsive documents, on the consistency of training, and other factors.
  • Predictive Coding. When the expert’s predictive training is complete, the remaining documents in the collection are automatically coded by the computer.
  • Evaluation. Because OrcaPredict trains on a series of random samples, each of these samples is representative of the population of all of the documents.  Therefore, the efficacy of the system’s performance on each sample is a statistical estimate of the system’s efficacy on the whole collection.  Strictly speaking, no further assessment is necessary, but depending on the situation, another assessment may be desirable.
  • Optional Post-Assessment.  There are several ways that an evaluation can be conducted following predictive coding.
    • After the documents have been categorized by the system, review can be continued on newly generated samples of documents.  That is, the same expert continues to evaluate random samples of documents as generated by OrcaPredict until a sample size the parties agree is adequate has been obtained.  The system’s efficacy on this sample is taken as a measure of its performance.
    • A separate random sample of documents designated by OrcaPredict as non-responsive can be evaluated to compute the Elusion measure.  Elusion is the proportion of documents classified as putatively non-responsive that should have been classified as responsive.  Ideally, only a small proportion of the documents in the putatively non-responsive set will be found to be responsive.  In practice, the proportion of responsive documents in the putatively non-responsive set should be only a small fraction of the prevalence of responsive documents. The size of this sample will depend on the required confidence level and confidence interval.
    • A set of putatively responsive and a set of putatively non-responsive documents could be evaluated.  Ideally, all of the putatively responsive documents will, in fact, be found to be responsive and none of the putatively non-responsive documents will, in fact, be found to be responsive.  In practice, most of the putatively responsive documents should be found to be responsive and few of the putatively non-responsive documents should be found to be responsive.  This information can be combined to give an estimate of Precision and Recall.
  • Production. The documents designated responsive by OrcaPredict are reviewed by the producing party for privilege.  The remaining non-privileged documents then may be turned over to the receiving party.  Sharing of putatively non-responsive documents is not required to evaluate the technology, but it may sometimes be desirable to evaluate the criteria used by the authoritative expert reviewer.
  • Resolution. If there are disagreements about the produced documents that cannot be resolved by conferring, then a special master may be appointed to examine a sample of the documents and their computer-generated coding.  
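
The measures named in the protocol above (the running average over the last four sample sets, a Recall-based stopping rule, Elusion, and a Recall estimate from post-assessment samples) can be illustrated with a minimal Python sketch. This is not OrcaPredict's implementation: the class and function names are hypothetical, Recall is assumed to be the efficacy measure being averaged, the running average is assumed to be a simple mean of per-sample Recall values, and the 75% target is only one of the example thresholds mentioned above.

```python
from collections import deque


def recall(true_pos, false_neg):
    """Share of truly responsive documents that the system marked responsive."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


class TrainingMonitor:
    """Hypothetical tracker for per-sample Recall during training.

    Assumption: "efficacy" is measured as Recall, and the running average
    is the plain mean of the most recent `window` sample sets.
    """

    def __init__(self, window=4, target_recall=0.75):
        self.recent = deque(maxlen=window)  # keeps only the last `window` values
        self.target_recall = target_recall

    def add_sample(self, true_pos, false_neg):
        """Record one coded sample set and return the updated running average."""
        self.recent.append(recall(true_pos, false_neg))
        return self.running_average()

    def running_average(self):
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def should_stop(self):
        """Example stopping rule: full window and average Recall at or above target."""
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and self.running_average() >= self.target_recall


def elusion(responsive_found, sample_size):
    """Proportion of a sample of putatively non-responsive documents
    that the reviewer finds to be responsive."""
    return responsive_found / sample_size


def estimate_recall(n_pred_responsive, n_pred_nonresponsive,
                    precision_estimate, elusion_estimate):
    """Combine a Precision estimate (from the putatively responsive sample)
    and an Elusion estimate (from the putatively non-responsive sample)
    into a Recall estimate for the whole collection."""
    est_found = n_pred_responsive * precision_estimate    # responsive and produced
    est_missed = n_pred_nonresponsive * elusion_estimate  # responsive but left behind
    return est_found / (est_found + est_missed)


if __name__ == "__main__":
    monitor = TrainingMonitor(window=4, target_recall=0.75)
    # Each tuple: (responsive docs the system caught, responsive docs it missed)
    for tp, fn in [(1, 3), (2, 2), (3, 1), (3, 1), (4, 0)]:
        avg = monitor.add_sample(tp, fn)
        print(f"running average recall: {avg:.4f}  stop: {monitor.should_stop()}")

    e = elusion(responsive_found=5, sample_size=500)
    r = estimate_recall(2000, 8000, precision_estimate=0.9, elusion_estimate=e)
    print(f"elusion: {e:.3f}  estimated recall: {r:.3f}")
```

In this sketch the stopping rule fires only once four sample sets are in the window, so a single unusually good sample cannot end training early. A negotiated rule could equally be based on the number of training documents or on a leveling-off of accuracy, as described above, and the sample sizes used for Elusion would still need to be chosen to meet the agreed confidence level and confidence interval.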

About The OrcaTec Document Decisioning Suite

The OrcaTec Document Decisioning Suite™ is an all-in-one technology tool kit for categorizing, analyzing and retrieving information from large sets of electronic documents – and reducing the time, cost and effort of doing so by up to 93%.  The suite is delivered securely via SaaS, and has been used successfully in many languages for eDiscovery, risk management, information governance, compliance and other Big-Data issues.

The OrcaTec Document Decisioning Suite’s four fully integrated modules include:

  • OrcaCluster: Visual Based Clustering, Times, and Social Network Sonar.
  • OrcaSearch: Visual Based Concept Search Augmented by more than 25 Additional Search Types.
  • OrcaPredict: Court Approved Predictive Coding Technology.
  • OrcaReview: Issue and Privilege Tagging, Review and Redaction of Results, Review Analytics and Privilege Logs.

Available as a complete suite or on a per module basis for ease of integration with other industry platforms, the OrcaTec Document Decisioning Suite increases the effectiveness of compliance, investigation and litigation efforts with no incremental investment in hardware or personnel.

About OrcaTec

OrcaTec helps clients address and manage the business and legal challenges associated with the discovery and management of unstructured data. Its advanced analytics and predictive coding technologies are delivered as products and services to law firms, corporations and governments. OrcaTec offers a complete suite of textual analytics tools, including concept search, visual clustering and predictive coding, as part of the OrcaTec Document Decisioning Suite™.  The suite provides legal professionals with an all-in-one offering for the analysis and review phases of the electronic discovery process. It includes OrcaPredict for predictive coding, early case assessment and first-pass review, OrcaSearch for concept searching, OrcaCluster for visual clustering and OrcaReview for second-pass document review.

To learn more about OrcaTec and the OrcaTec Document Decisioning Suite, visit

