All posts tagged Predictive Coding

Ediscovery trends in 2017: from artificial intelligence to mobile data centres


2017 is set to be a year of change as organisations prepare for the new General Data Protection Regulation (GDPR) and the accelerated adoption of artificial intelligence. Faced with the need to manage greater volumes of data as well as multiplying communications channels, organisations and their legal representatives will rely increasingly on ediscovery technologies and processes to reduce the time needed to identify and manage the information required to address regulatory and legal issues.

Against this backdrop, we make the following predictions for 2017:

  1. Technology will play a vital role in helping organisations prepare for GDPR

The tough new General Data Protection Regulation currently being implemented in Europe will have a global impact. In cross-border litigation and investigations, where data needs to cross borders to comply with discovery requests, mobile discovery will become essential. These solutions capture, process, filter and examine data on-site, avoiding the need to transfer data across borders. GDPR has strict rules protecting individuals’ right to be forgotten, and organisations will need the relevant tools to find and erase personal data. Breaches of the provisions that lawmakers have deemed most important for data protection could lead to fines of up to €20 million or 4% of global annual turnover for the preceding financial year, whichever is greater, levied by data watchdogs.

  2. Ediscovery will find new homes beyond regulation and legislation

While ediscovery is widely used by professionals working on legal cases in litigation, regulation, competition law and merger control, employment law and arbitration, it will increasingly be used this year in an anticipatory manner by organisations to identify, isolate and address compliance concerns that could expose them to the risk of intervention or sanction. This trend will be reinforced by an increasingly complex and aggressive regulatory environment, exemplified by the French anti-corruption laws adopted in November 2016.

  3. New sources of evidence will move into the spotlight

Enterprises are creating more data than ever before. Data can be found anywhere that there are storage devices to hold it, whether that is a data centre, a laptop, a mobile phone, a wearable device or the cloud. Channels to move data from one place to another are also proliferating. As a result, we are seeing a diversification of the evidence sources used to build up a picture of what has happened in a legal matter. While email and structured data remain the most common sources of evidence, other sources such as social media and satellite navigation systems are gaining in importance and providing key insights into many cases. Clients are increasingly choosing ediscovery providers who can integrate a wider variety of data sources into one platform for analysis.

  4. The robots are coming

Savvy law firms and corporate counsel will benefit from bringing the latest technologies, including artificial intelligence (AI), to the attention of their clients. A long line of court decisions in the US, and now also in the UK and Ireland, has already driven greater interest in and adoption of predictive coding.

  5. The ediscovery industry will continue to evolve

The past few years have seen huge changes in the ediscovery industry itself as it seeks to provide the technologies that organisations need to keep up with more stringent regulation in data governance. Only larger, international partners now have the resources and capabilities required to provide local services and data processing centres where organisations need them, together with cutting-edge tools and technologies to manage huge volumes of data and channels going forward.

  6. Big data will take centre stage in competition and data privacy matters

Regulators are becoming increasingly aware of the competition and data privacy implications of big data. From a competition point of view, big data held by companies can trigger both Article 101 TFEU (relating to antitrust cases) and Article 102 TFEU (abuse of dominance cases). This is highlighted by the joint report of May 2016 from the French and German competition authorities, entitled Competition Law and Data, which explains that data-related practices can fall within Article 101 TFEU and thus be treated as cartel conduct. Companies that handle substantial data volumes on a day-to-day basis will need to factor big data into their compliance strategies and embrace technological solutions to aid in investigations and redactions.

  7. There will be a greater need for electronic documents

Despite evidence becoming mostly electronic, until recently regulatory authorities still required the submission of hard copies of RFI forms, merger filings and other investigatory materials. However, the introduction of the European Commission’s eQuestionnaire for merger control and antitrust cases means parties must now submit all information electronically.

In December 2016, the EC also published guidelines entitled “Recommendations for the Use of Electronic Document Submissions in Antitrust and Cartel Case Proceedings”. It is important to note that the EC strongly encourages the use of electronic formats even for paper documents, which means they must be scanned and made machine-readable.

Tim Philips, Managing Director at Kroll Ontrack, said: “Ediscovery continues to provide essential tools and technologies for all manner of legal matters and allows companies to efficiently navigate through this era of big data, regulatory scrutiny and more stringent data protection requirements. 2017 is set to be another landmark year in terms of the adoption of ediscovery technology and the evolution of ediscovery technology itself.”

Predictive coding: taking the world by storm!

First the United States, then Ireland and England, and now Australia.

On a day-to-day basis, many of Kroll Ontrack’s clients use predictive coding to speed up their review and find key documents quickly in investigations and disputes. Predictive coding is a machine learning technology used in document review exercises: it learns from the decisions made about documents and applies that learning to documents which have not yet been reviewed, to suggest (or “predict”) which ones are most likely to be relevant.

This technology has been used in US litigation for a number of years and has been approved by the US courts for almost as long. Other common law countries are now following suit. Although European companies have been using predictive coding for just as long as Americans, until recently we didn’t have court approval to confirm that litigants can cut their review sets for discovery or disclosure purposes by relying on predictive coding to say which documents are unlikely to yield anything of interest. I was excited when this changed in March 2015, when the High Court of Ireland provided the first European approval of predictive coding in Irish Bank Resolution Corporation Ltd v Quinn. I was sure that the UK would be quick to follow and approve the use of such technology, which it did in February this year, when the England and Wales High Court approved it in Pyrrho Investments Ltd v MWB Property Ltd.

When I wrote about the Australian jurisdiction in Kroll Ontrack’s recent New Frontiers in Ediscovery report, released in September this year, I said:

“Popular newcomers, such as predictive coding, are not nearly as common in Australia as they are in other jurisdictions. However, since predictive coding has come to be more accepted and judicially approved in several jurisdictions over the last few years, including Ireland and the United Kingdom, Australia is likely to follow.”

It followed more quickly than we could have expected. Earlier this month, the Supreme Court of the Australian state of Victoria, in the case of McConnell Dowell Constructors v Santam, used the above cases from other jurisdictions as persuasive authorities to approve the use of predictive coding to reduce the number of documents to be reviewed for discovery in a dispute. Notably, this was based on a recommendation from the appointed Special Referee rather than a motion by one of the parties.

We should expect to see further use of predictive coding in Australia, as an operating procedure for predictive coding has been integrated into a new Technology in Civil Litigation Practice Note SC Gen 5, which will come into effect in January 2017.

I’m proud that the Supreme Court of Victoria (my own home jurisdiction) is leading the way in Australia and has delivered an early Christmas present to us legal technologists out there!

For a more detailed look at how predictive coding is used in practice, check out our recent video.

About Daniel Kavan

Daniel Kavan leads Kroll Ontrack’s Electronic Evidence Consultancy team in Europe. He and his team of experts advise lawyers and their clients on how to manage and analyse evidence from emails and other electronically-stored documents in legal matters including litigation, arbitration, internal audits and regulatory investigations.

A practical guide to predictive coding

Did you miss out on our practical predictive coding event? Not to worry! We’ve created a twenty-minute tutorial video that will guide you through the basics of using predictive coding technology.

Presented by Kroll Ontrack’s predictive coding gurus and using real-life case studies as examples, you will learn how predictive coding technology works and how you can use it in your own cases.

We hope you enjoy the video and find it illuminating, but if you have any further questions please get in touch in the comments or by emailing

Practical Predictive Coding



Predictive coding: a little less conversation, a little more action                 

Predictive coding has been the hot topic of conversation for a while now. Both legal technology providers and industry thought leaders have waxed lyrical about its efficacy, and this year marked the first time a UK court had approved the technology for use in a case. Yet despite this, one topic of conversation has remained untouched: how do you use the technology?

We decided to rectify this situation by hosting a unique seminar: Predictive Coding: Getting it Done. Held in the Museum of the Order of St John’s Chapter Hall, the seminar was led by Kroll Ontrack’s predictive coding experts Jim Sullivan and Leon Major. We were also delighted to welcome guest speakers Emily Maxwell of DLA Piper and Ilaria de Lisa of Gleiss Lutz. As Kroll Ontrack clients, Emily and Ilaria were able to provide their unique insights into using predictive coding.

The seminar’s jam-packed agenda covered all the practical predictive coding basics, including a breakdown of common terminology, an overview of the scenarios in which predictive coding can be used and a step-by-step guide to using predictive coding, with real-life case studies as examples. Guests also had the opportunity to have their questions answered by our experts.

Following the presentation, guests gathered in the Museum’s medieval cloister gardens to enjoy a champagne reception and to make the most of the unusually pleasant summer weather! Originally used by the Order of St John for growing medicinal herbs, the cloister gardens are one of London’s hidden gems: a rose- and lavender-scented oasis which proved to be the perfect location for relaxing after a very informative workshop.


UK High Court approves use of Predictive Coding in litigation

Last week legal technology providers in the UK had a lot to celebrate as the English High Court approved the use of predictive coding for disclosure in litigation.

The judgement, handed down by Master Matthews, gave official judicial authorisation for the use of predictive coding in High Court proceedings. Summing up his decision, Master Matthews stated that predictive coding is just as accurate as, if not more accurate than, a manual review using keyword searches. He also estimated that predictive coding would offer significant cost savings in this particular case, and that the possible disclosure of over 3 million documents via traditional manual review would be disproportionate and ‘unreasonable’.

To read the judgement in full, please click here.

How does predictive coding work?

Predictive coding is an advanced machine-learning technology which allows computers to predict how documents should be coded (e.g. should a document be tagged ‘responsive’ or ‘privileged’) based on decisions made by human subject matter experts. Put simply, an experienced lawyer trains the computer by coding a sample set of documents, and the computer then learns what to look for based on this training. In the context of edisclosure and other investigative exercises involving electronic evidence, this technology can find key documents faster and with fewer human reviewers, thereby saving on cost and review time.
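To make the train-then-predict loop concrete, here is a minimal sketch in Python using a toy naive Bayes text classifier over word counts. It illustrates the general machine-learning idea only, not Kroll Ontrack's actual algorithm; the document snippets and labels are invented for the example.

```python
from collections import Counter
import math

def train(labelled_docs):
    """Learn per-label word frequencies from human-coded sample documents.
    labelled_docs: list of (text, label) pairs coded by a reviewer."""
    word_counts = {}        # label -> Counter of words
    doc_counts = Counter()  # label -> number of training documents
    for text, label in labelled_docs:
        word_counts.setdefault(label, Counter()).update(text.lower().split())
        doc_counts[label] += 1
    return word_counts, doc_counts

def predict(model, text):
    """Score an unreviewed document against each label using naive Bayes
    with add-one smoothing, and return the most likely label."""
    word_counts, doc_counts = model
    total_docs = sum(doc_counts.values())
    vocab = {w for counts in word_counts.values() for w in counts}
    best_label, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)  # class prior
        total = sum(counts.values())
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# A reviewer codes a small training sample...
sample = [
    ("price fixing agreement with competitor", "responsive"),
    ("meeting to discuss market share allocation", "responsive"),
    ("lunch menu for the office party", "not_responsive"),
    ("holiday rota and parking arrangements", "not_responsive"),
]
model = train(sample)

# ...and the model suggests coding for the unreviewed remainder.
print(predict(model, "agreement to fix price levels"))  # prints "responsive"
```

In a real matter the training sample would be selected and coded by a senior reviewer, and the model's suggestions would be validated against further human-reviewed samples before anyone relied on them.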

Who uses predictive coding?

Other jurisdictions, such as the USA and Ireland, have led the way in giving judicial approval to predictive coding, and the UK judgement references these cases in detail. Despite these cases, as well as the ever-increasing sophistication of the technology itself, the UK legal community has been somewhat reluctant to make use of the technology, as explored in this study by Kroll Ontrack Legal Consultant and former litigation lawyer, Hitesh Chowdhry.

In Chowdhry’s white paper, ‘Rage Against the Machine: Attitudes to Predictive Coding Amongst UK Lawyers’, he notes that his study revealed that the main barriers to adopting predictive coding technology were:

  • Risk aversion and mistrust of the technology’s accuracy
  • Belief that predictive coding would have a negative effect on revenue
  • Satisfaction with existing methods and a belief that existing practices offered more accuracy than studies have suggested
  • Insufficient understanding and knowledge of the complex predictive coding process
  • Diffusion amongst professionals

The UK judgement counters many of the fears uncovered in Chowdhry’s study by stating that the technology is accurate and offers cost savings.

Predictive coding and the Civil Procedure Rules

As data volumes continue to grow and traditional manual reviews using keyword searches become less feasible, predictive coding may be the best path toward complying with the Civil Procedure Rules.

Jeff Shapiro, a lawyer who has written frequently on costs in edisclosure, offered this comment: “The judgement approving predictive coding for the disclosure of documents highlights the judiciary’s continued march to proportionate costs in litigation via application of the overriding objective. Review amounts to approximately 70% of total disclosure costs. With the ubiquity of electronic document creation and storage, litigators have an ever-increasing costs burden in order to fulfil their CPR disclosure obligations. The judiciary, recognising the realities of modern disclosure where millions upon millions of documents may need ‘to be considered for relevance and possible disclosure’, has proclaimed that predictive coding may be used as a substitute for manual review.”

The cost savings offered by predictive coding will undoubtedly be popular with clients and potentially will give a competitive edge in winning work.

We hope that this judgement will encourage more UK firms to take advantage of the benefits offered by predictive coding.

For more information about this technology, please click here.

Why Predictive Coding Technology should be used

I am celebrating the decision in Irish Bank Resolution Corporation Ltd & ors v Quinn & ors [2015] IEHC 175. The use of this machine learning technology in discovery has been sanctioned in the US for some time, but this is the first time a court closer to home has accepted the validity of using the technology and the benefits to be reaped from it. The ruling addresses the major concerns expressed about predictive coding and seeks to sway the sceptics: it unequivocally states that predictive coding will save time and money, and it declares the methodology underpinning the technology to be sound. This landmark decision in Europe tackles with ease the concerns often articulated about predictive coding and provides a solid foundation for a protocol on the use of this kind of technology in the disclosure process.

For a full analysis of the judgement and the implications please read the article I have written for the Litigation Futures website here:

Mergers & Acquisitions: Ediscovery takes centre stage

Ediscovery technology has a long association with litigation, so you may be forgiven for wondering about the link to mergers & acquisitions, traditionally the domain of corporate deal-makers.

However, as regulatory scrutiny has increased on a national and international level, more law firms and in-house counsel are using ediscovery technology to swiftly dispense with formal Requests for Information (RFIs).

At the same time, anything that threatens the successful closure of a deal, or the integration of merging businesses, is something that is generally investigated using ediscovery and forensic procedures.

Our clients come to us for assistance with matters that stem from the M&A process. Pre- and post-merger audits and merger control RFIs from regulatory bodies such as the European Commission, the UK Competition and Markets Authority, the French Autorité de la concurrence, the German Bundeskartellamt as well as the US Department of Justice are at the top of the menu.

Using ediscovery to enhance due diligence

The time prior to a merger or acquisition deal being finalised is critical, and data from entities being merged or acquired must be assessed as part of due diligence duties. In the past, these reviews typically focused on data in the form of financial reports and accounts; legal documents such as contracts and intellectual property; asset valuations and company policies.

However, in this digital age, examining surface-level information may not be enough to be confident that a deal is sound or that combining with another company will not create new risks.

If the company being acquired operates within markets that have seen anti-competitive behaviour or in countries with a greater incidence of corruption and bribery, it may be prudent to conduct a broader investigation into the company’s activities by examining a selection of unstructured data in audits.

What is unstructured data and why is it important for mergers and acquisitions?

Unstructured data largely consists of personal correspondence in the form of emails, text messages, voice mails and web-based messaging systems such as WhatsApp. Within even a medium-sized organisation the amount of data generated by these applications is enormous. For a global firm, the volume of data is almost unimaginably large. Yet just a handful of incriminating emails containing evidence of cartel activities can have serious repercussions at a later date, should a regulatory body decide to investigate concerns relating to dominance.

By the same token, structured data (which is normally transactional data, stored in tables to record things like customers, products, orders and payments) may also be examined to look for anomalies that might signal a compliance risk using specialised data analysis tools and visualisation software.
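As a simple illustration of the kind of outlier screening such tools automate, the sketch below flags payments whose amounts deviate sharply from the norm using a basic standard-deviation test. The record fields and threshold are invented for the example; real compliance tools apply far more sophisticated statistics and visualisation.

```python
import statistics

def flag_anomalies(payments, threshold=2.5):
    """Flag payments whose amount deviates from the mean by more than
    `threshold` sample standard deviations, a crude stand-in for the
    outlier checks that specialised analysis tools perform."""
    amounts = [p["amount"] for p in payments]
    mean = statistics.fmean(amounts)
    stdev = statistics.stdev(amounts)
    return [p for p in payments if abs(p["amount"] - mean) > threshold * stdev]

# Eleven routine payments and one suspicious outlier (invented data).
normal = [100, 105, 98, 102, 110, 95, 101, 99, 103, 97, 104]
payments = [{"id": i + 1, "amount": float(a)} for i, a in enumerate(normal)]
payments.append({"id": 12, "amount": 25000.0})

suspicious = flag_anomalies(payments)
print([p["id"] for p in suspicious])  # prints [12]
```

In practice a reviewer would look at the flagged transactions in context, since a statistical anomaly is only a signal of possible compliance risk, not proof of one.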

Intelligent review technology is aiding strategic decision-making

Ediscovery technology can make short work of huge data sets, collecting, filtering and analysing data to get to the key information as quickly as possible. Armed with a list of potential risks, or given a clean bill of health, informed decisions can be made surrounding the deal, which can then proceed in a compliant and timely manner.

If this kind of investigation has not been possible prior to the merger or a company has doubts about an entity it has acquired or merged with, clients also come to us for post-merger compliance investigations which vary in scope from the very focused to the very broad. Ediscovery technology can also assist on an operational level by harmonising data estates of the merged companies.

Taking the pain out of Phase Two Requests for Information

If the European Commission is worried about the possible effects of a merger on competition, it may conduct an in-depth analysis of the merger in the form of a Phase II Investigation.

This involves a more extensive information-gathering exercise, working to a strict timetable, similar to ediscovery in the US or edisclosure in the UK. Looking at the deal from a variety of angles (e.g. whether the proposed merger would create a monopoly, whether it will impact the supply chain or increase the likelihood of price-fixing cartels forming between competitors), Phase II Investigations can be data-intensive exercises, needing ediscovery expertise to ease the deal through.

Ediscovery services can help ensure this process runs more efficiently for the parties involved by:

  • Assessing the likely complexity and cost of the data retrieval exercise, to support efforts to reduce the scope of an RFI
  • Assisting internal IT teams in the collation and collection of the data requested
  • Ensuring this data is stored securely and processed quickly
  • Providing analytical tools to check documents are relevant to the request and do not fall under privilege
  • Working in a timely fashion to ensure the request for information deadline is met

Phase II Investigations are often time pressured and delays can threaten the completion of a deal, so it is important to ensure that all teams are focused on the overall goal of the proposed merger.

Working with an ediscovery provider can expedite the submission of requested information, potentially speed up any decisions or remedies and get the deal through.

If you would like to find out more about how Kroll Ontrack can assist with mergers and acquisitions, please contact Rob Jones.

About Rob Jones

Robert Jones is the manager of Kroll Ontrack’s team of Legal Consultants in Continental Europe, the Middle East and Africa.

Predictive coding and Benedict Cumberbatch


Artificial intelligence has clearly become a provocative topic in popular culture once again; you only have to watch ‘Her’ and ‘Transcendence’ to see that. However, the most recent film to catch my eye is ‘The Imitation Game’, and not just because the lovely Benedict Cumberbatch has the starring role, but because Alan Turing is the central character.

Mr Turing is known not only as the grandfather of computers, but as the grandfather of artificial intelligence. In his seminal paper he asked: “Can machines think?” But what is “thinking”? What would it mean for a machine to “think”? He refined the question and looked instead to the Imitation Game (hence the title of the film): a machine can be deemed to “think” if it is indistinguishable from a human in its answers to questions. This became fundamental in the philosophy of artificial intelligence.

This question was so overwhelming that it became obvious there would be no one way to make this happen. And so several sub-fields began to develop within artificial intelligence: data mining, image processing, natural language processing, speech recognition, machine learning and so on. With recent developments on the web, a culmination of all these techniques means we are closer than ever to the holy grail of imitation; you just need to watch IBM’s Watson on Jeopardy to see that.

But it’s one particular sub-field that interests me: machine learning. This is the underlying technology used in predictive coding, and it is for machine learning that the question of machine thinking is incredibly pertinent. To start at the beginning, not all predictive coding technologies were created equal. The technologies use different algorithms, meaning that the approach to machine thinking differs and ultimately produces different results, with varying degrees of success.

When reduced to its core parts, predictive coding is a two-step process. First, the machine learns through human intervention: the human provides the machine with the criteria a document needs to meet in order to be considered X. For this process, a small subset of the data corpus is used. Secondly, the machine thinks by applying that learning and predicting whether unreviewed documents (the remaining data corpus, not used in the first step) meet those criteria.

The learning element is similar for all predictive coding technologies – a human must review documents to input the relevant criteria and teach the machine. Although there are differing schools of thought as to the best approach to this – automatic versus manual training, for example – the fundamentals are the same. A human must input good information for the machine to learn.

It is the thinking element that is the defining factor. How well can the machine think, and how well does it play its ‘Imitation Game’? For effective thinking, we really want the machine to be able to ‘actively learn’. Active learning allows the machine to interactively query the user to obtain further information: it allows the machine to say, “I’m confused about this shade of grey.”

Why are the shades of grey important? An algorithm powered by, say, an analytics engine is something more akin to passive, rather than active, learning. When documents are processed, an analytics engine will cluster them together based on items like topics. The human will teach the system and the machine will learn, but the thinking element is lacking. The machine will predict that if a document belonging to a particular cluster is X, then all documents in this cluster are X. This would be fine if everything were black and white, but there are always shades of grey.

Take a case involving baking: sponge cakes belong to one cluster, and the human can state that sponge cakes are X. The human can also teach the machine that chocolate biscuits are Y. Then what of chocolate cakes? This is a shade of grey that results in a crossover between the criteria for X and Y. The machine cannot think past black and white to consider the crossover.

On the other hand, active learning can push documents that relate to chocolate cakes forward to the human for clarification. It is smart enough to recognise the crossover in criteria that requires clarity. By being able to expressly ask for clarity, the machine is far better at joining the dots of the criteria and understanding the subtleties, and is ultimately able to make better predictions. For that reason, it is far closer to being able to imitate the thought processes of humans and is a step towards being able to think for itself.
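A minimal sketch of this idea, known as pool-based uncertainty sampling, is shown below in Python. The scoring function, example documents and baking theme are invented for illustration and do not reflect any particular vendor's implementation; the point is simply that the document whose class scores are closest (the "chocolate cake") is the one pushed forward to the human reviewer.

```python
def score(doc, examples):
    """Crude class-affinity score: the fraction of the document's words
    that also appear in the human-labelled examples for one class."""
    vocab = {w for ex in examples for w in ex.split()}
    words = doc.split()
    return sum(w in vocab for w in words) / len(words)

def most_uncertain(unreviewed, x_examples, y_examples):
    """Pool-based uncertainty sampling: return the unreviewed document
    whose scores for the two classes are closest, i.e. the 'shade of
    grey' the machine should ask the human about next."""
    return min(
        unreviewed,
        key=lambda d: abs(score(d, x_examples) - score(d, y_examples)),
    )

x_examples = ["sponge cake with vanilla sponge layers"]       # human-coded X
y_examples = ["chocolate biscuits with dark chocolate chips"]  # human-coded Y
pool = [
    "sponge cake recipe",        # clearly leans X
    "chocolate biscuits recipe", # clearly leans Y
    "chocolate cake recipe",     # the grey area between X and Y
]

print(most_uncertain(pool, x_examples, y_examples))  # prints "chocolate cake recipe"
```

Production systems replace the toy word-overlap score with a real classifier's probability estimates, but the loop is the same: label, rank the remainder by uncertainty, and send the most ambiguous documents back to the reviewer.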

So, when choosing predictive coding providers to support your legal document review, consider Alan Turing and ask the question: Can machines think?

Want to learn more? Katie Fitzgerald has a Predictive Coding webinar on Wednesday 3rd December – register here…

The future of ediscovery


Last week saw the 9th Annual Information Governance & eDiscovery Summit take place at London’s Marriott Hotel, Grosvenor Square. The ever important issues surrounding information governance and ediscovery were thrashed out by industry leaders over two days.

A wide-ranging panel of speakers led the summit, from judges to authors, directors to barristers, and lawyers to company presidents, drawn from a wide range of leading corporate and legal organisations. Day one’s keynote panel included senior representatives from three regulatory bodies discussing current enforcement priorities and expectations: Allison C. Stanton from the US Department of Justice, John David Kuchta from the Federal Bureau of Investigation, and Keith Foggon of the Financial Conduct Authority. The increasingly important topic of information governance was discussed, argued and dissected throughout the majority of day one, broken up by a series of afternoon breakout sessions.

Day two saw the attention shift to the world of ediscovery and the current issues engulfing the industry. Technology visionary Richard Susskind kicked off the morning with an insight into how technology has redefined the in-house legal process, which led perfectly into Kroll Ontrack’s main stage appearance. Aiming to do things differently, Kroll Ontrack opted for a 15-minute power-play exploring the past, present and future of ediscovery, specifically discussing the hot topic of predictive coding technology. Leon Major offered an insight into how ediscovery in Europe has changed over the last five years, Katie Fitzgerald spoke about how predictive coding is making a massive impact in European legal practice, and Daniel Kavan predicted how ediscovery may be done very differently within the next few years, including ediscovery in the cloud, voice control and app access.

Throughout the event we had lots of interesting conversations, met many new people and reconnected with a few old friends. Kroll Ontrack’s return as one of the event’s sponsors aligned with the recent launch of our new website and a new suite of products, including Review, our new document review platform. Our (rather bright) electronic exhibition stand gave delegates an insight into our recent developments and the chance to meet some of our experts.


You can find out more about predictive coding, our new document review platform or anything else we’ve been up to at Kroll Ontrack by visiting or following our Twitter feed.

About Anthony Roberts

As a Legal Consultant at Kroll Ontrack, I work with a number of leading law firms and corporate clients advising how best they can achieve their electronic discovery and computer forensic goals, within set time frames and to budget, by evaluating the best available technologies and ensuring the right solution is found for each potential matter.

Kroll Ontrack leads discussion of UK litigators

On Monday Kroll Ontrack sponsored the annual conference of the Commercial Litigation Association (CLAN). The day saw commercial litigators from around England come together in the auditorium of Hogan Lovells’ London office to discuss the latest issues in commercial litigation, share knowledge and hear from subject matter experts in various fields.

The day kicked off with the Honourable Sir Terence Etherton, Chancellor of the High Court updating litigators on developments in the Chancery Division of the High Court. His keynote speech is summarised by John Hyde on the Law Society Gazette website here.

Discussions progressed to cover the latest updates, and topics such as cost budgeting and litigation funding were, as expected, at the forefront of litigators’ minds.

Kroll Ontrack’s Daniel Kavan presented an interactive session with Hogan Lovells partner Neil Mirchandani, bringing litigators up to date on the latest best practices and technologies to manage their litigation cases using principles of legal project management.

Using live feedback technology, the audience was able to vote anonymously on various issues in this session. When Daniel asked litigators about the use of predictive coding in the UK, the results showed that lawyers were still finding their way:


Daniel shared information about how the technology works and how it can be applied, and most litigators fed back that they would consider using the technology in their next case. Further information about Kroll Ontrack’s predictive coding technology can be found here.

The day ended with a session giving litigators an in-house counsel’s perspective on using social media in edisclosure, followed by drinks at which lawyers were able to ask Kroll Ontrack representatives more about how predictive coding is changing the face of litigation and investigations in the UK and across Europe.

You can follow further information from Kroll Ontrack about recent trends, and see reports from this week’s industry conferences, including the Information Governance & Ediscovery Summit, on our Twitter feed.

About Graham Jackson

As a Legal Consultant at Kroll Ontrack, I promote our computer forensic and ediscovery services to both corporate companies and law firms. This covers the full range of electronic evidence needs, whether that is helping clients prepare in advance of an electronic incident, responding to a live incident such as data theft, or advising on the best course of action after an incident to better protect against future occurrences.