All posts tagged TAR

Predictive Coding – How it’s Bringing Innovation to Legal Practice

Kroll Ontrack was proud to host a breakfast seminar last week on Predictive Coding and how it’s bringing innovation to legal practice.

Over 90 legal professionals from law firms and corporations across the UK gathered at the early hour of 08:00 for a light (and far too healthy!) breakfast and to hear our very special guest speakers: Ralph Losey, a partner in the Orlando office of Jackson Lewis LLP and the firm’s National e-Discovery Counsel, who had flown in especially from Florida for the event, and Neil Mirchandani, a partner at Hogan Lovells in London specialising in financial services disputes.

Daniel Kavan of Kroll Ontrack helped moderate the session for what turned out to be a very interactive debate with members of the audience.

We found that many terms are used to describe predictive coding technology, such as “technology assisted review,” “computer-assisted review,” “computer-aided review,” and “content based advanced analytics.” Ralph helpfully pointed us to a very useful definition of predictive coding in The Grossman-Cormack Glossary of Technology Assisted Review, written by Maura R. Grossman and Gordon V. Cormack:

An industry-specific term generally used to describe a Technology-Assisted Review process involving the use of a Machine Learning Algorithm to distinguish Relevant from Non-Relevant Documents, based on Subject Matter Expert(s)’ Coding of a Training Set of Documents. See Supervised Learning and Active Learning.
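To make the definition concrete, here is a minimal sketch of supervised learning on a coded training set. It uses a tiny naive Bayes word model written in plain Python; this is a technique chosen purely for illustration and is not a description of how any particular predictive coding product works. The documents, the function names and the four-document training set are all invented for the example.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(coded_docs):
    """coded_docs: (text, is_relevant) pairs coded by a subject matter expert.
    Assumes at least one relevant and one non-relevant example."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in coded_docs:
        doc_counts[is_relevant] += 1
        word_counts[is_relevant].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def relevance_score(model, text):
    """Log-odds that a document is relevant (higher means more likely relevant)."""
    word_counts, doc_counts, vocab = model
    score = math.log(doc_counts[True] / doc_counts[False])
    for label, sign in ((True, 1), (False, -1)):
        total = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # add-one smoothing so unseen word/label pairs do not zero out
                p = (word_counts[label][word] + 1) / (total + len(vocab))
                score += sign * math.log(p)
    return score

# Hypothetical training set: four documents coded by a reviewer.
training_set = [
    ("invoice payment overdue contract breach", True),
    ("contract termination notice breach", True),
    ("lunch menu friday canteen", False),
    ("office party friday invitation", False),
]
model = train(training_set)

# Rank unreviewed documents so the likeliest relevant ones come first.
unreviewed = ["breach of contract claim", "friday lunch invitation"]
ranked = sorted(unreviewed, key=lambda d: relevance_score(model, d), reverse=True)
print(ranked)
```

In practice the training set would be far larger, and the coding and re-scoring would be repeated as reviewers correct the system’s suggestions.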

Ralph Losey, who is seen as a leading global expert on predictive coding, opened the session by providing a very helpful summary of what predictive coding is and how he has seen it applied. Neil Mirchandani was then on hand to provide a UK perspective and outline his experiences of the use of the technology.

I have highlighted some of the key points that were raised:

  • We heard some interesting and surprising stats on the consistency of human review. For instance, studies have shown that one reviewer has a consistency of 77% when reviewing documents; this figure drops to 45% when there are two reviewers and plummets to 30% when there are three or more. So perhaps human review should not be seen as the gold standard for completing review exercises.
  • Predictive coding is not a substitute for human review and should be seen as a supplement. It relies heavily on the input of subject matter experts (SMEs), who review a sample set of documents to “train” the system, and this should be an ongoing and iterative process as the system evolves.
  • There were some lively debates as to whether the initial training should be completed by one SME or a few.
  • Predictive coding can be used as an invaluable quality assurance mechanism for a human review, and even if predictive coding is used for tagging, any documents deemed relevant can still be reviewed by a human team.
  • A number of audience members asked whether it has been challenging to reach agreement with the other side when this type of technology is to be used. The consensus from the panel was that it would be difficult to “force” another party in litigation to deploy this technology, but that it would be very unlikely (and difficult) for an opposing party to object to the technology being used.
  • The technology should not be viewed as exclusive to large and litigious cases – there were some great examples of the technology being deployed successfully in internal investigations and regulatory exercises, and in cases consisting of, say, 40,000 documents.
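The consistency figures in the first point were quoted without a formula. One common way to measure agreement between reviewers is overlap: the number of documents every reviewer marked relevant, divided by the number any reviewer marked relevant. The sketch below assumes that measure and uses invented document IDs; it shows why the figure can only stay the same or fall as reviewers are added.

```python
def overlap(*relevant_sets):
    """Documents marked relevant by ALL reviewers as a share of those
    marked relevant by ANY reviewer (1.0 means perfect agreement)."""
    union = set().union(*relevant_sets)
    if not union:
        return 1.0  # nobody marked anything relevant, so nothing to disagree on
    common = set.intersection(*map(set, relevant_sets))
    return len(common) / len(union)

# Hypothetical document IDs that each reviewer marked as relevant.
reviewer_a = {1, 2, 3, 4, 5, 6}
reviewer_b = {4, 5, 6, 7, 8}
reviewer_c = {5, 6, 8, 9}

two_way = overlap(reviewer_a, reviewer_b)
three_way = overlap(reviewer_a, reviewer_b, reviewer_c)
print(f"two reviewers: {two_way:.0%}, three reviewers: {three_way:.0%}")
```

Each extra reviewer can only shrink the set of documents everyone agrees on and grow the set that someone flagged, so the ratio never rises.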

There were many other very useful insights to come out of the workshop, but unfortunately there isn’t space to cover them all fully on this blog. If this topic is of interest, you will certainly find Ralph Losey’s blog helpful, as it goes into full detail about his various studies.

Feel free to get in touch if you would like to have a chat about the application of this technology in more depth.

About Costa Kypre

Having worked in the litigation support industry for over nine years, Costa has a vast amount of experience managing and consulting on a range of electronic disclosure projects, including high-profile and complex multi-jurisdictional matters, often involving a large number of parties. Costa is legally trained, has a BA (Hons) Double Major in Law and Economics, and completed the Legal Practice Course at the College of Law.

Back to Basics – Proper Planning

A trawl of the various blogs and articles on eDisclosure finds plenty of articles on predictive coding, Technology Assisted Review (TAR), big data, analytics, the Jackson Reforms and cost budgeting.  Indeed, even our own blog to date has focused a great deal on these issues, as the tags on the left show.  All of these topics are essential reading for anyone involved in eDisclosure, but they all assume one thing – everyone knows the basics.  No doubt all of our readers are fully aware of the new rules regarding the submission of budgets.  Anyone who is following the Plebgate saga cannot fail to be aware of Andrew Mitchell’s predicament due to his budget not being submitted at least seven days ahead of the CMC.  As a consequence, the court said Mr Mitchell “would be limited to a budget consisting of the applicable court fees for his claim”.  The judge also went on to say:

“Budgeting is something which all solicitors by now ought to know is intended to be integral to the process from the start, and it ought not to be especially onerous to prepare a final budget for a CMC even at relatively short notice if proper planning has been done.”

From our perspective, the key words here are “proper planning”.  One of the most costly aspects of litigation is the actual review of the documents due to the hours that this can potentially take.  But if you are inexperienced at eDisclosure, or don’t know your megabytes from your gigabytes, or both, where do you start?  Hopefully here.

The first thing to think about when your client rings is where to find the information relevant to the case.  The answer to that question will lie with your clients, or if you work for a corporation, with key personnel in IT and management.  The Electronic Documents Questionnaire contained within the Schedule of Practice Direction 31B is a useful template, but here are the key questions that will help us to help you:

  • How many individuals are potentially involved?
    • Individuals are referred to as custodians.
  • Where is the relevant data for these custodians stored?
    • Their data may be on multiple sources, e.g.:
      • Desktop computer
      • Laptop computer
      • External device
      • Smart phone
      • Server
      • Backup tapes
  • Is it necessary to collect all the data from all the sources to avoid the possibility of having to return, thus incurring additional costs?
  • How much data might there be?
    • This is very important as it will eventually help determine the number of potential documents for review.
    • The unit used for data in these circumstances is the gigabyte.
  • What type of data is there?
    • What type of email does your client use, e.g. Microsoft Outlook, Lotus Notes?
    • Any databases or proprietary software?
    • Any messaging data, e.g. Bloomberg Messaging?
    • Any audio data?
  • What languages are contained within the data?
    • Do you have reviewers with the necessary language skills?
    • Is machine translation, whereby your review platform carries out a basic translation, appropriate for your initial review?
  • Who should collect the data and how should it be collected?
    • Where is the data geographically?
    • Do you require an independent third party to collect the data in a defensibly sound manner?
  • What are the data privacy implications, if any?
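The link between data volume, document count and review time can be sketched as simple arithmetic. Every rate in the example below (documents per gigabyte, the share removed by filtering, documents reviewed per hour) is a hypothetical placeholder rather than a benchmark; real figures vary widely with the type of data and should come from your own matters or your provider.

```python
def estimate_review(gigabytes, docs_per_gb, cull_rate, docs_per_hour):
    """Rough planning estimate: documents collected, documents left for
    review after filtering, and reviewer hours needed."""
    collected = gigabytes * docs_per_gb
    for_review = collected * (1 - cull_rate)
    hours = for_review / docs_per_hour
    return round(collected), round(for_review), round(hours)

# All inputs are illustrative assumptions, not industry standards.
collected, for_review, hours = estimate_review(
    gigabytes=20,       # data collected from all custodians
    docs_per_gb=5000,   # assumed average; email-heavy data may differ greatly
    cull_rate=0.5,      # assumed share removed by de-duplication and filtering
    docs_per_hour=50,   # assumed review speed per reviewer
)
print(collected, for_review, hours)
```

Even a rough estimate like this makes it easier to see which question in the list has the biggest effect on the budget.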

Whilst these questions are not exhaustive, if you have thought about them, you will be in a position to start your conversation with your eDisclosure providers.  Ideally, relationships ought already to have been built up with technology experts as in most cases there will be little time to conduct a “beauty parade”.

We can help you collect the information you need.  Together we can then begin to plan how you are going to retrieve the data, how long that may take, and what the costs may be.  You will also need to start thinking about the actual data: what happens when it is processed before review, how you can reduce the volume of data to review, and what technology you want to use to help you, as it is likely that some sort of data filtering technology and review platform will be required.

These topics will be covered in the next Back to Basics post.

Next week, Rob Jones will be writing a blog post on what you need to know about Technology Assisted Review (TAR). You can see a preview below.

Reporting on Change

ReInvent Law London

My colleague Rob Jones delivered a six-minute talk at ReInvent Law London, a novel crowd-sourced conference which took place in London on Friday 14 June. His presentation “Wax Up, Not Wipe Out!” was about seismic changes taking place in the legal profession. According to Rob, “Change brought by technology is a wave and lawyers are like surfers out in the open waters. ‘Wiping Out’ (to fail) is an ever present risk that can lead to embarrassment or worse. It is better to ‘wax up’ the board and tackle the waves with enthusiasm and a little intelligence, to make sure that you stay on top of them”.

Rob’s talk was videoed and we will post a link to it soon. For now, suffice it to say that he looked into the crystal ball at a world driven by technology where justice will perhaps be obtained from the cloud through an app available on your tablet. He also looked at the tsunami of information surrounding us and how to extract meaning from it in legal disputes using new technologies like Technology Assisted Review. Recognising that justice comes at a cost and the legal system is creaking and groaning under its own weight, Rob spoke about smart computers rescuing the situation by allowing leanness, efficiency and case-winning power to enter the legal process. Referring to our experience using TAR on over 250 projects, Rob said that the computer is already looking over the shoulders of humans to build intelligence and suggested that it may not be long before similar algorithms are used to create a legal super brain that can predict outcomes, forecast fees and aid strategic decision making, which could turn human lawyers into formidable competitors and opponents.

If you would like to read more about the event you can use the following hashtag on Twitter #ReInventLaw.


We found it to be a very refreshing conference with stimulating content and a very high calibre of speakers from the law and technology disciplines.

The Not So New Rules of Court

In the two months since the changes to the rules of court governing disclosure and cost management in litigation, there have been no reported cases and very few anecdotal reports about how the new rules are affecting cases.  At this stage it seems that there are still more questions than answers about how cost management, proportionality and tailor-made disclosure will play out in practice.  We have been tracking the changes closely and have hosted two seminars on the impact of the new rules for the legal community in London and Manchester.  These enlightening panel discussions have involved members of the judiciary, experts from legal practice, and providers of disclosure-related services.  We have prepared a detailed note, The Jackson Reforms on Disclosure and Costs Management: FAQ, on some of the key questions lawyers are asking about the new rules, along with insights we have gained about them.

As Mark Surguy, a partner at Eversheds and a respected voice on edisclosure, recently pointed out, the weighing up of options, solutions and costs is a best practice approach to any dispute.  With that approach in mind, the new reforms should not present any client, lawyer or technology service provider with any difficulty.  Judge Waksman echoed this sentiment when he said in Manchester that there is no need for litigators to be afraid of the changes.  By far the best tactic, according to Mark, will be to get to the heart of a case quickly, using technology so that the client can understand the prospects of success and make the right decisions about settlement or further investment in the litigation.

On a Company Note

On the topic of change, we have seen some ourselves recently at Kroll Ontrack. Tim Phillips has been appointed as the new Managing Director for our Legal Technologies business. Tim has been with us at Kroll Ontrack since 2007, serving as Sales Director for the European Region.  As MD, he will have responsibility for operations and business development throughout EMEA, reporting to Dean Hager, president and CEO of Kroll Ontrack.  Tim says, “I’m delighted to take on this new role. In the EMEA region, we are focused on steady growth and development geographically and in terms of new products and services to specifically address data privacy requirements.  Our vision is to leverage the extensive European footprint we have through our existing facilities to provide a full suite of electronic evidence handling software and services to our clients across the region. Our focus in EMEA integrates well with Kroll Ontrack’s broader strategy to help companies manage edisclosure strategically by making it a repeatable process that is managed at a portfolio level, not just at the one-off project level.”

About Tracey Stretton

Tracey Stretton is a legal Consultant at Kroll Ontrack in the UK. Her role is to advise lawyers and their clients on the use of technology in legal practice. Her experience in legal technologies has evolved from exposure to its use as a lawyer and consultant on a large number of cases in a variety of international jurisdictions.

Limber up for the Big Data Marathon

The Data Craze for Sports Fanatics and Lawyers

One of my colleagues has just run the Reading Half Marathon and I am expecting any minute to see his race stats published on Facebook. Well done Rob Jones, a GPS time of 2:21:19. Budding athletes and intrepid cyclists are downloading various apps to their phones (like Endomondo Sports Tracker) and relying on the information they gather to track distance travelled, time taken and energy expended, using this not only to subtly show off on social networking sites but also to plot and plan their race strategies. Of course, a positive spin-off is that the rest of us, having shared their pain and gain, feel inspired to do something similar, and before you know it the data craze has turned into a sports craze and a new way of doing things. This phenomenon highlights how data can be transformed into intelligence, can inform decision making and strategy, and can possibly even have an unintended impact. It got me thinking again about the influence that big data and predictive analytics are having on business and on the legal profession, and how edisclosure fits into the picture.

Big data in business

Initially it was only big companies like telecommunications companies, banks and government agencies that could afford to store and analyse big data.  Thanks to advancements in hardware and databases you no longer need supercomputers to carry out complex analytics across large data sets.  Many businesses are finding that for a reasonable investment they can collect data and make it relevant to their business; by measuring consumer behaviour and using pattern detection they can respond to customer needs and market conditions and make data-driven decisions.   Supermarkets, healthcare providers, gaming companies, insurance companies and even florists are jumping on the bandwagon and tapping into the intelligence running through the big data stream and finding ways to monetise the data they hold.

But (and it’s a big but) what about law firms? 

Can lawyers, who have tended to shy away from technological innovation, really harness big data to predict case outcomes and legal costs? We know that big data can be exploited to predict the outbreak of diseases, but can it be used to predict the outcome of a litigation case? According to an interesting article by Mike Wheatley on SiliconANGLE, databases of legal history are being built up and algorithms are being developed to help predict case outcomes. Apparently, companies are also developing mobile apps that predict the average legal cost of different types of cases in the US.

As we enter a new era of cost management in the UK and the need to stick to case budgets becomes more important, we will need all the help we can get to estimate costs and guess what impact variables like the number of witnesses or extent of disclosure might have, not only on costs, but also on the outcome of a case.  Of course the data that needs to be collected, analysed and correlated to make sensible predictions includes not just the key features and facts of the case itself but also the results recorded in subsequent court decisions.   When it comes to costs, law firms and e-disclosure providers are all holding a lot of valuable billing data that could be analysed to assist with cost estimating.   This might all be feasible but has not yet been done.

On the edisclosure front, data analytics has been used for some time. We have had email analytic tools that can be used to visualise who has been communicating with whom, when and about what. Similarly, Technology Assisted Review (TAR) (also known as Computer Assisted Review or Predictive Coding) analyses decisions made by humans on a sub-set of documents, and then looks for similar patterns in a much larger document universe to predict which documents are relevant to a case and top priority. At this stage most of us know about TAR and some are testing the water. Here are some tips on analytics from the sports scene:

Sports analytics and the CIO: Five lessons from the sports data craze

Collect the right data to start with, both qualitatively and quantitatively.  In edisclosure this means targeting the right sources of data and is an area where experts can help.  Is it better to present a raw unfiltered set of data (to teach the system in a balanced way) or a set of results based on a carefully crafted search, or is that somewhat prejudicial?  Until there are better statistics and more guidelines from real cases, the ultimate decision is likely to be a strategic one.

Start with statistically significant data.  This refers to the selection of your seed set of documents that will be reviewed by humans and used to train the prediction software.   You cannot expect the software to achieve peak performance on 1,000 documents.

Remember that the ability to contextualise data is important.  There are incalculable factors that come into play with prediction and this is where human quality control is vital.
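To put a rough number on “statistically significant”, the sketch below applies the standard sample-size formula for estimating a proportion from a simple random sample, with a finite population correction. It is an illustration only: it tells you how many randomly drawn documents are needed to estimate, say, the share of relevant documents within a chosen margin of error, and it does not say how many training documents a particular prediction tool needs. The default confidence level and margin are assumptions picked for the example.

```python
import math
from statistics import NormalDist

def sample_size(population, confidence=0.95, margin=0.02, proportion=0.5):
    """Random sample size needed to estimate a proportion within +/- margin.
    proportion=0.5 is the most conservative (largest sample) assumption."""
    # z-score for a two-sided interval at the chosen confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z * z * proportion * (1 - proportion) / (margin * margin)
    # finite population correction: smaller collections need slightly fewer
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# Hypothetical collection sizes.
for docs in (40_000, 1_000_000):
    print(docs, sample_size(docs))
```

Note how little the required sample grows between a 40,000-document matter and a million-document one, while halving the margin of error roughly quadruples it.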

Perhaps, as we use these predictive tools more in legal cases and share our practical experiences and results, their use will become widespread and a status symbol, just as Nike+ is.

About Tracey Stretton

Tracey Stretton is a legal Consultant at Kroll Ontrack in the UK. Her role is to advise lawyers and their clients on the use of technology in legal practice. Her experience in legal technologies has evolved from exposure to its use as a lawyer and consultant on a large number of cases in a variety of international jurisdictions.