By Herbert L. Roitblat, Ph.D.
This is the first part of a two-part series discussing Tracy Greer’s article on “Technology-Assisted Review and Other Discovery Initiatives at the Antitrust Division.” Tracy Greer is the US Department of Justice, Antitrust Division’s Senior Litigation Counsel for Electronic Discovery. This part focuses on some of her recommendations for responding to Second Requests. The second part concerns a response to issues raised by some critics concerning statistical analysis, confidence levels, and confidence intervals.
Anyone planning on having any matter before the Division would be well served to read her article. She presents a thoughtful analysis of some of the differences between Second Requests and litigation and raises some questions about the effective use of technology in providing data to the Antitrust Division.
While acknowledging some of the limitations of predictive coding (also called TAR, Technology Assisted Review, or, my preference, CAR, Computer Assisted Review), she also highlights some of the main benefits of its use. In her experience, TAR/CAR “produced smaller, more responsive document productions.” She correctly notes that predictive coding is effective only with text documents, not images, videos, or drawings. Although her experience is that websites, intranets, and other information sources not directly associated with a custodian can be challenging, there is nothing inherent in these types of files that would prevent predictive coding from working well. At OrcaTec, we have had good experience with these file types and with spreadsheets. For example, with spreadsheets, the system focuses on the row and column headings because the numbers are usually too highly variable both within and between sheets.
Although some parties prefer to use more traditional review methods, the Division does not encourage them, she notes. Predictive coding is preferred because the “judgments about responsiveness during manual review are less accurate and almost certainly are not consistent among reviewers.”
She also recommends against the use of keywords to cull the documents prior to predictive coding, because their use “has the potential to exclude many responsive documents from the collection and, thus, to render ineffective the TAR platform.” Instead, she recommends a more objective approach to culling the collection, for example, limiting the set to certain date ranges, or eliminating emails from or to certain domains (e.g., to/from Amazon or Travelocity).
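As a rough illustration of what objective culling of this kind might look like, here is a minimal Python sketch. It is not taken from Ms. Greer's article or from any particular review platform. The document fields (`sent`, `sender`, `recipients`), the function names, and the domain list are all hypothetical, and the rule that any address at an excluded domain removes the email is an assumption made for the example.

```python
from datetime import date

# Hypothetical exclusion list, following the article's examples.
EXCLUDED_DOMAINS = {"amazon.com", "travelocity.com"}


def domain_of(address):
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def keep_document(doc, start, end, excluded_domains=EXCLUDED_DOMAINS):
    """Keep an email only if it falls inside the date range and no
    sender or recipient address belongs to an excluded domain.

    `doc` is assumed to be a dict with 'sent' (a date), 'sender' (a
    string), and 'recipients' (a list of strings).
    """
    if not (start <= doc["sent"] <= end):
        return False
    addresses = [doc["sender"], *doc["recipients"]]
    return not any(domain_of(a) in excluded_domains for a in addresses)


def cull(docs, start, end):
    """Apply the objective filters to a collection of emails."""
    return [d for d in docs if keep_document(d, start, end)]
```

Unlike a keyword filter, neither test looks at what the document says, so the culling cannot silently drop responsive documents because of the vocabulary they happen to use.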
She recognizes that the process of conducting predictive coding is important beyond the technology that is used. How the training or seed set of documents is derived, and by whom, can be critical to its success.
Using the predictive coding platform’s measurement tools is also important. In my mind, this reflects the need for transparency in the process as well as for an ongoing measure of progress. Without progress monitoring, it is impossible to know if time is being wasted on ineffective training or to judge whether training is sufficiently complete. Progress can be measured using an “overturn rate,” Precision and Recall, or with other measures.
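To make these measures concrete, here is a minimal Python sketch of how they could be computed from a sample of documents that has both a system prediction and a reviewer's judgment. It is an illustration only, not a description of any platform's implementation. The function name is hypothetical, and the "overturn rate" is assumed here to mean the share of sampled predictions that the reviewer reverses; other definitions are in use.

```python
def review_metrics(predicted, actual):
    """Compare system predictions with reviewer judgments.

    `predicted` and `actual` are equal-length lists of booleans,
    where True means "responsive". Returns Precision, Recall, and
    the overturn rate (assumed here to be the fraction of
    predictions that the reviewer reversed).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if a and not p)
    overturned = sum(1 for p, a in pairs if p != a)

    # Guard against empty denominators rather than raising.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    overturn_rate = overturned / len(pairs) if pairs else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "overturn_rate": overturn_rate,
    }
```

Tracking these numbers after each round of training shows whether additional training is still improving the system or whether the results have leveled off.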
Predictive coding aside, the scope of merger investigations can be extraordinarily broad. The Division staff need to explore complex theories and facts, and examine and evaluate hypotheses. Such an examination would appear to benefit from sophisticated analytic tools that do more than predict whether a document is responsive or not. The overarching goal of a Second Request is to evaluate whether a merger will result in antitrust issues, which is a more general question than the typical investigation in litigation addresses. Systems like OrcaTec’s Document Decisioning Suite™ provide detailed analytics including predictive coding.
The second part of this series will consider more detailed analysis of selecting a confidence interval and its implications for eDiscovery and for Second Requests.
The opinions expressed in this paper are my own or my interpretations of Ms. Greer’s. They do not represent directly her opinion or any opinion of the US Department of Justice.
Herbert L. Roitblat, Ph.D., is Chief Scientist, Chief Technology Officer, and co-founder of OrcaTec LLC (CA). Before starting OrcaTec, Dr. Roitblat was Chief Scientist and a co-founder of DolphinSearch, as well as an award-winning Professor of psychology at the University of Hawaii. He has been awarded four patents on conceptual search technologies. Dr. Roitblat is widely recognized as an expert in search and retrieval technology, particularly in the area of eDiscovery. Dr. Roitblat was the technology expert in the recent Global Aerospace case, in which the court approved the use of predictive coding over the objections of opposing counsel.
OrcaTec helps clients address and manage business and legal challenges associated with the discovery and management of unstructured data with advanced analytics and predictive coding technologies delivered in the form of products and services to law firms, corporations and governments. OrcaTec offers a complete suite of textual analytics tools including concept search, visual clustering and predictive coding as part of the OrcaTec Document Decisioning Suite™. The suite provides legal professionals with an all-in-one offering for the analysis and review phases of the electronic discovery process and includes OrcaPredict for predictive coding, early case assessment and first pass document review, OrcaSearch for concept searching, OrcaCluster for visual clustering and OrcaReview for second pass document review.
To learn more about OrcaTec and the OrcaTec Document Decisioning Suite, visit OrcaTec.com.