GPT-4 and the Future of Legal Drafting

Setting the scene

Let’s face it. Artificial Intelligence (AI) is not that great.

AI has been around for years, and it has consistently failed to live up to expectations. Despite all the hype, and despite the trillions of dollars/euros/pounds/… that have been poured into making the dream a reality, self-driving cars still don’t work as intended, we still can’t explain how it works (the “black box” problem), and it still struggles with anything it hasn’t seen a million times over.

The prophets of doom like to proclaim the end of the legal profession as we know it, and the replacement of lawyers by robots. In reality, AI is simply not yet good enough to replace lawyers, however long they may take to deliver legal services.

Or at least, that is what we used to say about AI.

Then, in November of 2022, the Large Language Model (LLM) ChatGPT burst onto the scene. The traditional sentiment around AI – “impressive but not revolutionary” – fractured then but didn’t break entirely.

A few months later, GPT-4 came out.

Now, lawyers are starting to worry.

So what changed?

One of the main reasons why, up to now, most legal professionals have not been particularly impressed by AI-driven legal tech products, has been the poor language understanding of those products.

The law is a profession of words, and pre-GPT AI products had a very hard time dealing with that. In fact, legal tech in many cases actively avoided language processing. The results simply weren’t there, or in more technical terms: the "signal-to-noise" ratio was not good enough.

Then GPT-3 came along, which was already promising, but still had a signal-to-noise ratio of about 6/10 – good enough to be impressive, but bad enough for critical lawyers to comfortably dismiss it.

GPT-4, however, blows GPT-3 out of the water with a signal-to-noise ratio of about 8/10 – still not perfect, but its language capabilities have begun to outclass the average lawyer.

How do LLMs work?

LLMs are similar to human brains. Like us, they are trained to learn associations between billions of words and are able to retain and recall them, in the same way our neurons fire when we think about something. Every time an LLM is provided with a prompt, it will search through all those associations and generate an outcome from scratch.

To get to this point, LLMs are trained for months, during which they essentially play billions of guessing games to predict the right word, in effect by splitting themselves in half. One part of the LLM provides the task and the other part tries to predict the next word based on the associations it has learned.

It is a months-long training process of continuous trial and error, where if the LLM predicts correctly, it learns. If it is incorrect, it will continue the process of trial and error. As seen in the image above, the LLM will train until it is able to correctly piece together the words to form the following sentence: “Supplier will not be liable for any damages caused by errors during the term”.
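The guessing game described above can be illustrated with a deliberately tiny sketch. It is not how GPT-4 is built: real LLMs use neural networks trained on billions of words, whereas this toy merely counts which word follows which. The corpus (the article's example sentence plus two invented variations) and all function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Toy training text: the article's example sentence plus two invented
# variations, so that some words have more than one possible continuation.
CORPUS = [
    "supplier will not be liable for any damages caused by errors during the term",
    "supplier will not be liable for any losses caused by errors during the term",
    "customer will be liable for any damages caused by misuse",
]

def train(sentences):
    """Count, for every word, which words were seen directly after it."""
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            follow_counts[current][following] += 1
    return follow_counts

def predict_next(follow_counts, word):
    """Return the most frequently seen follower of `word`, or None if unseen."""
    followers = follow_counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def complete(follow_counts, start, max_words=20):
    """Build a sentence by repeatedly appending the predicted next word."""
    words = [start]
    while len(words) < max_words:
        next_word = predict_next(follow_counts, words[-1])
        if next_word is None:
            break
        words.append(next_word)
    return " ".join(words)

model = train(CORPUS)
print(predict_next(model, "liable"))
print(complete(model, "supplier"))
```

Even this crude counter reconstructs the example sentence from the single word "supplier", purely from learned associations and without storing the sentence as a fact.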

This process requires thousands of computers being linked together and costs millions of dollars/euros/pounds/…, so only the biggest companies such as Microsoft and Google are capable of training LLMs.

Overall, in comparison to traditional databases, LLMs more closely resemble human brains. They do not store facts; they store associations and reconstruct facts from nearby associations.

It is similar to being asked “What did you eat for breakfast five days ago?” You wouldn’t be able to perfectly recall what you ate. Your brain will try to do so by asking questions such as: “When was five days ago? What was I doing that day?” It will try to reconstruct that day, but the answer will differ based on external factors when the question was asked. If it smells like pancakes, your brain might make a nearby association that you had eaten something sweet for breakfast five days ago.

Fun fact: another similarity between LLMs and most human brains is that LLMs have a very small short-term memory. They have this huge long-term memory where they store all those associations, but their short-term memory lasts about one minute on average. As a result, they do not remember previous outcomes they may have produced. Even ChatGPT, which gives you the impression of actually chatting with you over the span of perhaps several hours, does not remember what it has said to you. Once a prompt is given, it will have to review its previous prompts and answers in order to make an outcome based on a previous conversation you had with it. So, in a way, ChatGPT is more like a person with amnesia who has to write down everything that happens in a small notebook (the chat log) because he will almost immediately forget it, even while you are still actively communicating with him.
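The notebook analogy can be sketched in a few lines. The function `stateless_model` is a hypothetical stand-in for an LLM, not a real API: it can only react to the text it is handed in a single call. The `ChatSession` class, also an assumption for illustration, plays the role of the notebook by replaying the whole chat log on every turn.

```python
def stateless_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: it only sees the text passed in."""
    # It "knows" the client's name only if the name appears in the prompt.
    if "acme" in prompt.lower():
        return "The client is Acme."
    return "I do not know who the client is."

class ChatSession:
    """Keeps the notebook (chat log) and replays it on every turn."""

    def __init__(self, model):
        self.model = model
        self.log = []  # every user message and every answer, in order

    def ask(self, user_message: str) -> str:
        self.log.append(f"User: {user_message}")
        # The model receives the entire log, not just the latest message.
        prompt = "\n".join(self.log)
        answer = self.model(prompt)
        self.log.append(f"Assistant: {answer}")
        return answer

# Without the log, the second question arrives with no context at all.
stateless_model("Our client is Acme.")
print(stateless_model("Who is the client?"))

# With the log, the earlier message is resent along with the question.
session = ChatSession(stateless_model)
session.ask("Our client is Acme.")
print(session.ask("Who is the client?"))
```

The model itself is identical in both cases; only the replayed log makes the conversation appear continuous.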

Applying LLMs in legal drafting

Legal drafting, unsurprisingly, revolves in large part around tweaking and manipulating language – the thing that LLMs are particularly good at. The possibilities to improve this process are therefore huge, but LLMs are not yet flawless. The trick is to allow them to work their magic on the language manipulation part where they are strongest, but leave the parts where they are weakest up to the lawyers.

One of those weaknesses is logical reasoning. Below is a thought experiment that clearly exemplifies this.

When given a fairly straightforward question, it applies convoluted logic to arrive at a clearly wrong answer. This is to be expected. Again – LLMs are simply playing a guessing game trying to put the right word in front of the other. They don’t understand the output they produce in the way humans do.

Another, closely-related, weakness is factual incorrectness or the tendency of LLMs to be error-prone. As GPT is constantly making associations, it does not understand on a fundamental level what it is producing, which may lead to errors.

Take the query below, where we leverage ClauseBuddy’s GPT-4 integration to produce a dispute resolution clause under Belgian law.

It does a lot of things right – it inserts a “cooling-off” period before the parties are able to litigate a dispute (unprompted, but useful), and it correctly assigns governing law and competent court.

It also messes up on a few important points – it identifies the language of the proceedings as English (which will almost never be the case under Belgian law due to the specific legislation Belgium has on that topic), and it includes an entire agreement clause for a dispute resolution clause. Especially the latter doesn’t make sense to a trained lawyer. An entire agreement clause is – quite literally – used for entire agreements, not for individual clauses.

The point of highlighting these flaws is not to dismiss AI as being useless, but to emphasize the necessity of being aware of its shortcomings so that we can focus on applying its strengths.

Here are a few rules of thumb to keep in mind:

Results depend on the requested task, the legal domain and the language.
Emphasis should be on language manipulation and less on factual or strategic knowledge.
English works better than any other language.
“Prompt drafting” is a very easy interface but not a particularly good user interface, because you have to constantly type in what needs to be done, instead of simply clicking a button. So LLMs work best when limited interactions are necessary.

Translating that to the legal drafting process, we can ascertain that GPT-4 can be used in the following ways (in descending order of effectiveness):

Pitch drafting
Email drafting, particularly when there is limited legal advice in said emails.
Guided clause drafting. Check out “FlexiClause” below, a way to provide the “ingredients” of a clause to GPT and let it cook.

Guided ad hoc contract drafting. Check out “Hybrid Drafting”, a new addition to ClauseBuddy that allows lawyers to draft entire contracts ad hoc by leveraging the best of their own clauses and machine-generated clauses.
Legal memo drafting. Here we bump into the most pervasive problem of LLMs: hallucinations, or the tendency to make up facts, as well as limited awareness of different legislation and jurisdictions.
Creating predictable documents. By default, GPT-4’s output will always be unpredictable. So, for example, if you are an in-house counsel drafting the same type of document for your organisation on a daily basis, you will want to make sure there is some consistency in these documents; GPT-4 cannot give you that.
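The "ingredients" idea from the guided clause drafting item above can be sketched as follows. This is not how FlexiClause or ClauseBuddy work internally; the class `ClauseIngredients`, the function `build_prompt` and every field name are hypothetical. The sketch only shows the general principle: the lawyer supplies the facts and constraints in a structured form, and the LLM is left with the language work.

```python
from dataclasses import dataclass, field

@dataclass
class ClauseIngredients:
    """Hypothetical structured input supplied by the lawyer."""
    clause_type: str
    governing_law: str
    language: str = "English"
    must_include: list = field(default_factory=list)
    must_avoid: list = field(default_factory=list)

def build_prompt(ing: ClauseIngredients) -> str:
    """Turn the ingredients into a prompt; the same input gives the same prompt."""
    lines = [
        f"Draft a {ing.clause_type} clause.",
        f"Governing law: {ing.governing_law}.",
        f"Drafting language: {ing.language}.",
    ]
    if ing.must_include:
        lines.append("The clause must include:")
        lines += [f"- {item}" for item in ing.must_include]
    if ing.must_avoid:
        lines.append("The clause must not include:")
        lines += [f"- {item}" for item in ing.must_avoid]
    lines.append("Return only the clause text.")
    return "\n".join(lines)

# Example based on the dispute resolution clause discussed earlier.
ingredients = ClauseIngredients(
    clause_type="dispute resolution",
    governing_law="Belgian law",
    must_include=["a cooling-off period before litigation"],
    must_avoid=["an entire agreement provision"],
)
prompt = build_prompt(ingredients)
print(prompt)
```

Only the prompt is made predictable here. The text the LLM returns for that prompt can still vary from one request to the next, which is why the lawyer's review remains part of the process.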

Impact on the legal sector

If there is anything you take away from the above, let it be this: an LLM like GPT-4, applied in the right way, provides results that are about an 8/10. Not perfect, but with a production capability that is a hundred times faster and a thousand times cheaper than any lawyer.

Still, it will take time for this technology to trickle through to all the places where it could improve the status quo. Lawyers are risk-averse by nature, so you can assume there will be a lot of focus on what doesn’t work. Expect several national bar associations to object. Expect to see embarrassing errors made by prestigious law firms, caused by LLMs. Expect a period of “disillusionment”, as predicted by the Gartner hype cycle for new technologies.

And give it time. Change management is hard. IT departments need time to adapt. But the clock is ticking, and you probably have 2 years before the transformation of the legal industry is complete.

Below, we set out our expectations for the impact more concretely.

The baseline will move

Instead of asking yourself “what can LLMs do for lawyers?”, consider what they are already doing for clients.

LLMs may not be able to draft a perfect contract, but they are quickly approaching “good enough” territory and are able to do so in seconds, basically for free. Ask yourself: if your client believes they can get a 6/10 contract for free in a few seconds or they can go to you for a 9/10 contract (no lawyer is perfect) that will set them back a few thousand dollars/euros/pounds and takes a week to produce, which option do you think they will go for?

Ultimately, as these models get better and better, the main advantages lawyers will have are: (1) knowledge gained from prior experience in their domain of choice (typically wrapped up in an unorganised set of previously drafted contracts and redlines) and (2) the strategic expertise and knowledge of industry standard practice, as locked inside the heads of the individual lawyers.

Creation vs Production

Today, lawyers are paid by the hour and whenever legal drafting is involved, the majority of those hours go to the low-value work of wrangling an MS Word document into shape. The real value is the knowledge that is already in the lawyer’s head. Someone (typically a junior lawyer) just needs to get that knowledge into a document.

In 2019, legal visionary Jaap Bosman already laid out this dichotomy as the Creation vs Production divide in his book “Data & Dialogue”.

Creation is the exercise of legal creativity and expertise – coming up with strategic and commercially savvy solutions to legal problems. It is typically the domain of the experienced lawyer like a partner or a senior associate and it is what a client really pays for.

Production on the other hand, is everything that needs to happen around that. The kind of work that partners cannot do because their rates are too high; the kind of work that law firms retain armies of junior associates for.

Our prediction is that this dichotomy is only going to get more pronounced. Yes, the death of the billable hour has been predicted many times. But now that there’s a technology that’s really good at Production work, there’s a good chance everything will suddenly change. Charging for that kind of work will become difficult if it can be automated in such an easy way.

Impact on legal service delivery

Concretely, we see the impact on legal service delivery play out in three major areas:

Speed – clients will expect ever faster turnarounds. Law firms are competing with a tool that can produce what they produce at a lower quality, but in the space of minutes where it would take them hours or days. Yes, hopefully clients will realize the difference in quality. But they will also expect that you are using it yourself and will no longer accept the same turnaround time as in the pre-GPT era.
Everything at the click of a button – From experience in talking to hundreds of law firms and in-house legal teams, we know that most legal teams simply don’t have good, readily useable templates (anonymised precedents don’t count). Due to the emerging competition with GPT-4, templates will suddenly be necessary if law firms want to be able to offer consistent, high-quality documents quickly. As a result, we also expect self-service platforms to take off in ways they haven’t in the past.
Cost – with the billable hour still being the dominant model for legal services pricing, expect the cost of document production to go down. Traditional “bet-the-farm” work will likely be priced at a premium for a while still. But for non-critical work, the question is whether Production work will even be chargeable at all in the medium to long term.

Should I be concerned as a lawyer?

Now we arrive at the crucial lawyerly concern – should I be concerned?

To give a lawyerly answer – it depends.

If a significant part of your legal practice consists of drafting in a way that adds little knowledge gained through personal experience or interpersonal learning (such as strategy, commercial awareness, industry insights, best practices, etc.) then yes, you should be worried.

Lawyers whose bread and butter is Production work that isn’t overly complicated never had to face a disruptor before and therefore weren’t concerned with their work potentially being automated.

That time has passed. The disruptor is here, and for these lawyers, it is time to consider how they will incorporate those uniquely human strengths into their service delivery.

Consumerisation

By now, you're probably thinking that times have already changed, so that the amount of Production work has already gone down significantly compared to fifteen years ago — clients don't want long memos anymore. Text drafting is only a part of the package; your legal service is so much more than that.

Or maybe you are thinking that the type of work you do is not well represented in the public texts available on the Internet (e.g., because you are located in a small jurisdiction or active in a very specialised legal subject matter), so that GPT-4 is simply not trained to compete with you.

Or maybe you are thinking that cool new technologies are periodically hyped, but have never had a real measurable impact on legal work. This was definitely the case with blockchain, so-called "smart contracts" and the Web 3.0 fad, and to a significant extent also with past applications for artificial intelligence.

Fair enough. But probably the even bigger threat from GPT-4 is not so much the direct impact on legal drafting, but instead the indirect impact that it will have on young lawyers, clients and the value of texts in general.

Lawyers, particularly the younger generation, have been demanding more legal technology for years.

Now, with GPT-4, they will have the option to use a highly powerful drafting tool that they will also be using in their private time (to organise parties, create grocery lists, draft love letters or find birthday gifts), or that they even used to draft their thesis/dissertation in law school. It is only natural that they will continue applying that technology to their professional legal work.

The knock-on effect will likely be a “consumerisation of legal work”: the way that lawyers deal with technology as consumers will seep into the way they work. Law firms may ban tools like GPT today (for very valid reasons), but the genie is out of the bottle, and their lawyers will find ways to work with this new technology, in much the same way as lawyers used to find ways to work with their Blackberries back when that technology was first banned in law firms.

Your clients will also participate in this consumerisation trend. They will probably still see value in texts drafted by lawyers, but the value they attach to those texts will go down significantly. When the cost of producing texts suddenly approaches zero, everyone will drown in texts, in all aspects of life. The value that an average citizen attaches to texts will drop within just two or three years’ time – similar to how the value attached to songs, TV series and movies has gone down with the abundance offered by Spotify and Netflix. Sure, clients will still want to pay for truly great texts that have to be produced in highly important cases, but the average value of good legal texts will go down immensely, and the number of situations where clients are still willing to pay thousands of euros/dollars will shrink.

Impact on staffing

Another major impact is going to be the training process for young lawyers. Skills like contract drafting take years of trial and error to properly learn. How will the new generation acquire their skills? Quite frankly, we have no clue how this will evolve as we see law firms taking all kinds of different approaches. Some lean heavily into the technology, some ban it until an appropriate level of expertise is built up the old-fashioned way.

Finally, in just a few weeks, we’ve heard several law firm partners explain that, due to GPT-4, they are reducing the number of new lawyers they hire this year. Yes, disruptive technology like this typically cuts jobs but creates more in return. The question remains, however, whether that will happen for law firms as well and what those new jobs would look like.

Recommendations

To conclude, we set out the following recommendations to navigate the uncertainty of the coming months and years.

Of these recommendations, we consider the last one to be the most important, for the reasons given under it:

Act now – change management is hard and non-believers are everywhere, especially in legal teams. Expect the work that is done today to only bear fruit within a year from now at the earliest. Implementing technology is one thing, but changing people and processes are entirely different challenges.
Work on your knowledge management – legal teams across the world are looking at how to leverage their own existing knowledge database and combine the factual knowledge there with the language capabilities of LLMs. We’ve written a deep dive on this topic but the summary can be seen below.
Start using LLMs today — Even though a few technological hurdles still exist today (see below) to combine your own legal knowledge with LLMs, don't wait until those hurdles are solved. LLMs are very capable of various types of legal work today, and even if you don't use them for explicit tasks today, get experience with them asap. New LLM-related technology is — literally — released every day, and the hurdles that still exist will be solved much sooner than the average legal team will have done its knowledge management cleanup.

Integrating your own knowledge into LLMs

It will take time for these kinds of applications of LLM-technology to become widespread (or even feasible for smaller organisations). So what can law firms do now, while they are still in the transitionary period? We see three key steps:

Start cleaning your data – collect your case law, doctrine, etc. and create your templates and clause libraries. Don’t rely on your unorganised digital drawer of previously drafted documents. Not only will the process of traditional drafting with this material remain too slow, combining it with LLMs will cause the typical “garbage in, garbage out” problem. It may seem like a paradox, but remember that LLMs are good with language, but bad with facts, so they are particularly affected by poor quality content.
Organise your cleaned content – every legal department should structure its legal content:
      - split memos into sections
      - label/tag documents & folders
      - add comments and hints
      - add proper layout to templates or automate them

Begin experimenting with LLM-integration methods – here, we are still looking at a timeframe of 6-12 months until the technology spectrum has stabilised. After that, we expect knowledge database integration with LLMs to become more and more prevalent, likely through semantic search, finetuning or via plugin.
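To make the last two steps more tangible, here is a minimal sketch of a tagged clause library and a lookup that feeds the best match into a prompt. All names (`LibraryEntry`, `search`, `build_prompt`) and the two sample entries are assumptions for illustration. The lookup uses plain word overlap as a crude stand-in for semantic search; a real integration would typically compare meaning rather than literal words.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

@dataclass
class LibraryEntry:
    """One cleaned, labelled item in a (hypothetical) clause library."""
    title: str
    text: str
    tags: tuple = ()
    hint: str = ""  # the "comments and hints" from the cleanup step

def word_counts(text):
    """Lowercase the text and count its words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def search(library, query, top_k=1):
    """Return up to top_k entries that share words with the query."""
    q = word_counts(query)
    scored = [
        (cosine(q, word_counts(f"{e.title} {e.text} {' '.join(e.tags)}")), e)
        for e in library
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for score, entry in scored[:top_k] if score > 0]

def build_prompt(library, request):
    """Combine the firm's own content (facts) with the drafting task (language)."""
    hits = search(library, request)
    context = "\n".join(f"[{e.title}] {e.text} (hint: {e.hint})" for e in hits)
    return f"Use only the clauses below as source material.\n{context}\nTask: {request}"

LIBRARY = [
    LibraryEntry(
        "Limitation of liability",
        "Supplier will not be liable for any damages caused by errors during the term.",
        ("liability", "supplier-friendly"),
        "Use for low-value services.",
    ),
    LibraryEntry(
        "Dispute resolution",
        "The parties will first try to settle any dispute amicably before going to court.",
        ("disputes", "cooling-off"),
        "Check the language of proceedings.",
    ),
]

print(build_prompt(LIBRARY, "Rewrite our liability clause for damages"))
```

The tags and hints only help if they were added during cleanup, which is the "garbage in, garbage out" point in practice: an unlabelled drawer of old documents gives the lookup nothing to work with.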