Keeping pace with generative AI
Since the initial launch of ChatGPT in December 2022, there have been exponential developments in the generative AI market. Depending on who you ask, the generative AI market is projected to grow to US$20.6 billion by 2032, US$51.8 billion by 2028, US$109.37 billion by 2030 or US$200.73 billion by 2032. In recent months, companies have rushed to publicise their generative AI credentials, prominent figures have called for a pause in further development until the full extent of the impact can be understood and mitigated, and AI-specific regulation is now being hurriedly developed around the globe.
With these rapid developments, it is proving increasingly difficult to stay across the full range of implications for the use and regulation of generative AI tools. There are many important questions. What is generative AI, and how is it developing? What is it capable of? What are the emerging legal challenges? How is it currently regulated and what new regulation is around the corner? What legal and governance considerations are required for a developer or user of these technologies?
We hope to help answer these questions in a series of Insights. This first Insight is intended to provide an update on the developments in generative AI, explaining:
- what generative AI is, relative to other artificial intelligence technologies;
- recent developments in generative AI, including regulatory developments and legal actions against operators of large language models (LLMs); and
- key and emerging legal considerations of which both developers and customers of generative AI solutions should be aware.
Our next Insights will cover best practice corporate governance for AI solutions and the new legislative and regulatory developments around the globe.
Key takeaways
- The term 'generative AI' refers to AI systems that can generate new content when prompted by an external source. This includes new text, images, software code, music and sounds.
- Following in the footsteps of OpenAI's ChatGPT and the GPT-4 LLM, a number of tech firms are bringing their own LLM-based generative AI solutions to market. Some of these solutions have the capacity to operate in an increasingly autonomous way. At the same time, there has been growing debate regarding the safety and ethics of generative AI, and the need for urgent regulation and governance of its use.
- Various privacy, intellectual property, defamation and ESG risks have emerged as key legal considerations for both developers and customers of generative AI tools.
- A number of privacy investigations have been launched by data protection authorities in the EU, following action taken by Garante, the Italian data protection authority.
- The EU is readying a new 'AI Act' to regulate AI solutions.
- In addition, there are increasing calls for the operators of LLMs to make royalty payments to copyright owners for the use of material in training their models. This could include publishers of news content.
- However, the benefits of generative AI systems are likely to be transformative across multiple sectors, and users should be able to manage the risks appropriately through a combination of targeted due diligence on vendors of solutions, contractual protections, software safeguards and well-considered internal governance controls.
Who in your organisation needs to know about this?
Legal teams, IT personnel, innovation and procurement teams.
What is generative AI?
Generative AI is a category of artificial intelligence systems that can generate new data and content when prompted by an external source. This distinguishes it from traditional AI systems that are designed to recognise patterns in, classify and/or make predictions based on existing data. The types of content that generative AI can create include text (eg ChatGPT and Bard), images (eg Dall-E and Midjourney), software code (eg GitHub Copilot), music and sounds (eg Jukebox, Music LM, Aimi and VALL-E), and much more.
These generative AI tools are powered by LLMs. These LLMs are trained on enormous amounts of data scraped from sources such as books, articles, images, videos, web pages and code repositories, using neural networks and deep learning techniques to generate the desired content type. To give an idea of the size of these LLMs, GPT-3, the LLM that originally powered ChatGPT, has around 175 billion parameters and was trained on around 300 billion words. GPT-4, the latest LLM from OpenAI (the developer of ChatGPT), is rumoured to have up to 1 trillion parameters.
Recent developments in generative AI
Since our last Insight, there has been a whirlwind of activity in the generative AI space. To assist in keeping on top of it all, here are some of the key developments.
Market developments
Since the release of ChatGPT in December 2022, a number of large tech firms have hurried to market their own competing generative AI tools and LLMs. Some prominent examples are:
- On 6 February 2023, Google unveiled Bard – a generative AI chat service powered by its Language Model for Dialogue Applications (LaMDA). The announcement of Bard did not go smoothly. A public demo designed to showcase its capability contained a factual error, or 'hallucination', which subsequently led to a significant drop in Google's share price the very same day. On 10 May 2023, Google released its new LLM, PaLM 2, which purportedly has improved multilingual, reasoning and coding capabilities.
- On 24 February 2023, Meta, Facebook's parent company, released LLaMA (Large Language Model Meta AI), an open-source LLM designed to 'further democratise access' to generative AI and spur research into its problems. Although originally released purely for researchers, the LLM was subsequently leaked to the public via the forum site 4chan.
- On 28 March 2023, NVIDIA released its AI Foundations generative AI service, designed to allow users to customise generative AI to meet their specific business, research or artistic needs.
A number of developers have also adopted GPT-4 to enhance their customer offerings:
- Microsoft has incorporated GPT-4 into its Bing Chat AI and its Azure OpenAI service.
- Duolingo, the language learning app, has integrated GPT-4 into its new premium subscription, 'Duolingo Max'. Users are able to converse in their chosen language with an AI chatbot, which can provide targeted suggestions on how a specific user can improve their learning.
- Stripe, the payments fintech startup, is now using GPT-4 to assist clients to analyse and explain data generated from use of Stripe's platform, as well as help answer user questions related to Stripe's technical documentation.
Autonomous self-development using AI 'agents' and steps towards artificial general intelligence
In recent weeks, developers have been experimenting with autonomous AI 'agents' that generate a systematic series of tasks to be worked on by one or more LLMs until a pre-instructed 'goal' or end-point has been achieved. This means that, instead of a human providing a series of connected or cascading prompts to a generative AI tool in order to reach an end-point, an AI agent can operate autonomously to achieve the same result. This is achieved by using software loops and functions to guide the LLM until a full answer is produced.
These agents, also referred to as 'recursive' agents, have the potential to further reduce the degree of human interaction with generative AI tools before an end outcome is achieved. They could potentially create new sub-tasks, and even additional agents, by automatically generating and executing code. Examples of these sorts of agents are BabyAGI and Auto-GPT.
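At a conceptual level, these agents wrap an LLM in a simple control loop. The sketch below is a highly simplified, hypothetical illustration of that pattern only; it is not the actual BabyAGI or Auto-GPT code, and the call_llm() helper is a placeholder for a request to an underlying LLM API.

```python
# A minimal, illustrative agent loop (not the actual BabyAGI or Auto-GPT code).
# call_llm() is a hypothetical placeholder for a request to an underlying LLM API.
from collections import deque


def call_llm(prompt: str) -> str:
    raise NotImplementedError("placeholder for a call to an LLM API")


def run_agent(goal: str, max_steps: int = 25) -> list:
    """Work towards a pre-instructed goal by repeatedly tasking an LLM."""
    tasks = deque([f"Devise a first step towards the goal: {goal}"])
    results = []
    for _ in range(max_steps):  # cap the loop so the agent cannot run indefinitely
        if not tasks:
            break
        task = tasks.popleft()
        # Ask the LLM to complete the current task and record the result.
        results.append(call_llm(f"Goal: {goal}\nTask: {task}\nComplete the task."))
        # Ask the LLM whether the goal has now been achieved; stop if so.
        verdict = call_llm(
            f"Goal: {goal}\nResults so far: {results}\nIs the goal achieved? Answer yes or no."
        )
        if verdict.strip().lower().startswith("yes"):
            break
        # Otherwise ask the LLM to propose new sub-tasks and add them to the queue.
        new_tasks = call_llm(
            f"Goal: {goal}\nResults so far: {results}\nList the next sub-tasks, one per line."
        )
        tasks.extend(t.strip() for t in new_tasks.splitlines() if t.strip())
    return results
```

The key point for present purposes is that, once the goal is set, the loop (rather than a human) decides what to ask the LLM next and when to stop.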
Increasing concerns about the safety of AI
With the rise in prominence of publicly accessible generative AI tools, and the potential for greater autonomy in their operation, discourse surrounding the safety of AI technology has come to the fore, particularly among tech industry figures. For example:
- In March of this year, the Future of Life Institute, a non-profit organisation concerned with the reduction and mitigation of catastrophic and existential risks to humanity, published an open letter calling for a six-month moratorium on the training of AI systems more powerful than GPT-4. This letter was signed by prominent tech personalities including Apple co-founder Steve Wozniak, Skype co-founder Jaan Tallinn, and Tesla and Twitter owner Elon Musk (who was also a co-founder of OpenAI). Mr Musk has subsequently announced his own generative AI initiative, TruthGPT.
- On 2 May 2023, Geoffrey Hinton, the so-called 'Godfather of AI', left his position at Google, in part to enable him to speak more freely about the perils of AI technology. In an interview with the BBC, Mr Hinton said that the danger of AI chatbots was 'quite scary' and that the bots '(are) not more intelligent than us, as far as I can tell. But I think they soon may be'. Mr Hinton's comments are perhaps linked to his experience of the 'emergent properties' of AI, such as how AI programs inexplicably teach themselves skills for which they were not designed. In April 2023, Google disclosed in a US 60 Minutes report that one of its AI programs could converse in fluent Bengali despite having had no training in that language. In that report, Sundar Pichai, Google's CEO, admitted that even Google did not understand how the AI had managed to teach itself these new skills.
Increasing regulatory scrutiny
Regulators and governments have also increasingly turned their attention to AI, partly in response to the wave of public concern about the potentially detrimental impacts.
- In April 2023, Italy's privacy regulator, Garante, imposed a temporary block on ChatGPT being accessible within Italy, and a temporary ban on OpenAI being able to process the personal information of Italian residents. Garante accused OpenAI of, among other things:
- lacking a legal basis to justify its massive collection and storage of personal data to train its LLMs; and
- failing to check the age of ChatGPT users.
Access to ChatGPT for Italian residents was restored on 28 April, after OpenAI implemented improved privacy controls. However, ChatGPT is also currently the subject of probes by the privacy regulators of Spain, France and Germany, as well as a taskforce launched by Europe's central privacy watchdog, the European Data Protection Board.
- On 28 April 2023, the EU Parliament reached a preliminary agreement on the terms of the 'AI Act'. The Act would require generative AI developers, among other things, to:
- subject their models to extensive safety and compliance assessments;
- be highly transparent in their operations; and
- disclose whether any copyrighted material was used to train their systems.
- On 23 March 2023, Australia's Federal Minister for Industry and Science, Ed Husic, announced that he had requested the National Science and Technology Council to provide a report that would inform the Government's approach to the regulation of generative AI.
We will address in greater detail Australian and international regulation of generative AI in a subsequent Insight in this series.
Key legal considerations for developers and their customers
Privacy
Developers of generative AI tools need to be mindful of privacy compliance risks, not only at the development stage of their models but also once these models are being used by their customers. The manner in which generative AI models use data for training purposes, and can also then generate new information as part of their outputs, poses novel questions for privacy regulation around the world.
Development stage
Developers will need to ensure that their collection and use of the massive amounts of data necessary to train their LLMs conform with the requirements under state and federal privacy legislation, particularly the Privacy Act 1988 (Cth). The data scraped to train LLMs are likely to be drawn from a number of sources that contain personal information. This scraping of personal information generally occurs without the knowledge or consent of individuals, and the fact that the information is already publicly available does not generally alter the way in which it is treated under Australian privacy legislation.
In undertaking these activities, developers will need to have the requirements of the Australian Privacy Principles (APPs) (particularly current APPs 3 and 6) built into the design of their data gathering and training processes, and be aware of the ramifications of any potential breaches of those APPs. This is particularly important given that the changes to the Privacy Act enacted late last year mean that the potential penalty for bodies corporate can now extend to the greater of $50 million; three times the value of the benefit obtained attributable to the breach; or, if the court cannot determine the value of the benefit, 30% of the adjusted turnover of the body corporate during the breach turnover period for the contravention.
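To illustrate how the 'greater of' formulation operates, the following is a minimal sketch using hypothetical figures. It simplifies the statutory test and is not legal advice; the actual maximum penalty depends on the legislation and the facts of the contravention.

```python
# Illustrative only: a simplified reading of the maximum civil penalty for bodies
# corporate described above, applied to hypothetical figures.

def max_civil_penalty(benefit_value, adjusted_turnover):
    base = 50_000_000  # $50 million
    if benefit_value is not None:
        # The court can determine the value of the benefit attributable to the breach.
        return max(base, 3 * benefit_value)
    # The court cannot determine the value of the benefit, so the turnover limb applies.
    return max(base, 0.30 * adjusted_turnover)


# Hypothetical example: the benefit cannot be valued, and adjusted turnover during the
# breach turnover period is $400 million, giving a maximum penalty of $120 million.
print(max_civil_penalty(None, 400_000_000))  # 120000000.0
```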
Australia's privacy regulator, the Office of the Australian Information Commissioner (the OAIC), has previously penalised AI companies for this data scraping practice. In 2021, the OAIC issued a determination finding that Clearview AI had committed multiple breaches of the Privacy Act, in relation to practices including the scraping of more than 3 billion images from various social media platforms and websites. Clearview AI was therefore ordered to cease the scraping of images and to delete all images it held relating to individuals in Australia.
The OAIC's determination that Clearview AI has an 'Australian link' has this week been upheld by the Administrative Appeals Tribunal, on the basis that, by obtaining data from servers located in Australia, Clearview AI carries on business in Australia.
Once accessible to customers
Developers will need to ensure that they have adequate controls in place to:
- manage the ingestion of personal information submitted by customers to their generative AI tools. This should include providing clear privacy policies and collection notices that inform users how their personal information may be used, such as to train the LLM, and how long their personal information may be retained. Developers will also need to consider what protocols can be implemented to ensure personal information that developers collect from customer inputs is destroyed or de-identified when it is no longer required. In this vein, OpenAI last month introduced a number of data privacy controls, including allowing users to turn off their chat history. Chats that are marked by users as 'history disabled' will not be used to train OpenAI's models; they will still be stored on the company’s servers, but will only be reviewed on an as-needed basis for abuse and will be deleted after 30 days. OpenAI's API data usage policy also provides that user data will not, by default, be used for model training; and
- protect against improper disclosure of personal information via outputs from models. Developers should consider implementing guardrails that prohibit their AIs from generating outputs that include personal information. For example, where ChatGPT is prompted to provide information concerning an individual (other than individuals with a public profile, such as historical figures and celebrities), it will provide a stock response to the effect of: 'I apologize, but as an AI language model, I cannot provide real-time or up-to-date information on specific individuals. Additionally, due to privacy concerns, it's not appropriate for me to provide information on non-public figures.' A simplified, illustrative sketch of this kind of output guardrail is set out below.
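As a simple illustration of the kind of output guardrail described in the second bullet above, the sketch below screens a model's draft response for obvious personal information (email addresses and phone numbers) before it is returned to the user. The patterns and stock response are hypothetical and deliberately simplistic; production systems would typically rely on far more sophisticated PII and named-entity detection.

```python
import re

# Hypothetical, simplified output guardrail: refuse to return a generated response
# that appears to contain obvious personal information.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s-]{7,}\d")

STOCK_RESPONSE = (
    "I'm sorry, but I can't share personal information about private individuals."
)


def apply_output_guardrail(draft_response: str) -> str:
    """Return the draft response only if it passes the personal-information check."""
    if EMAIL_PATTERN.search(draft_response) or PHONE_PATTERN.search(draft_response):
        return STOCK_RESPONSE  # refuse, rather than return the draft output
    return draft_response
```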
As noted in our last Insight, generative AI tools operated and hosted by a technology vendor, like any software solution hosted by a third party, create privacy compliance requirements for organisations. If employees or end-user customers submit personal information to an AI tool, that may – depending on the technical solution – constitute a 'disclosure' to the third-party provider of the system that must comply with applicable privacy laws, such as APP 6.
Customers of generative AI tools should:
- ensure they are aware of how their chosen AI solutions deal with personal information submitted to them, particularly by their own end-user customers – including whether it is collected by the third-party provider, and used for further training and enhancement of the underlying model;
- verify whether they are permitted under applicable privacy laws and the terms of their existing privacy policies to disclose personal information to the operators of the third-party generative AI tools; and
- take steps either to uplift their privacy disclosures and options for their personnel and end-users, or implement controls that bar personnel or end-user customers from including such personal information in their prompts to generative AI tools (an illustrative sketch of one such control follows this list).
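By way of illustration only, the following is a minimal sketch of the kind of customer-side control referred to in the final bullet above: scrubbing obvious personal information from prompts before they are disclosed to a third-party tool. The patterns are illustrative assumptions, and send_to_ai_tool() is a hypothetical placeholder for the vendor's API client.

```python
import re

# Hypothetical customer-side control: redact obvious personal information from a
# prompt before it is sent to a third-party generative AI tool.
REDACTIONS = {
    "TFN": re.compile(r"\b\d{3}\s?\d{3}\s?\d{3}\b"),  # Australian tax file number format
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def redact_prompt(prompt: str) -> str:
    """Replace anything matching a known personal-information pattern with a label."""
    for label, pattern in REDACTIONS.items():
        prompt = pattern.sub(f"[{label} REDACTED]", prompt)
    return prompt


# Usage (send_to_ai_tool is a hypothetical placeholder for the vendor's API client):
# response = send_to_ai_tool(redact_prompt(user_prompt))
```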
In addition, if Australian companies are subject to the General Data Protection Regulation (the GDPR), they will need to consider how to ensure that their use of generative AI tools (and other AI systems) complies with the requirements of Article 22 of the GDPR, which gives individuals the right not to be subject to a decision based solely on automated processing that produces legal effects concerning them or similarly significantly affects them.
Intellectual property
As LLMs are generally trained on data scraped from the internet, there is inevitably a risk that both the LLMs and their training datasets may infringe copyright, by reproducing significant amounts of third-party images, audio-visual materials or texts in which copyright may subsist. For example, image-producing generative AI companies Stability AI (the developer of Stable Diffusion) and Midjourney have created their models based on the LAION-5B dataset, which contains almost 6 billion tagged images indiscriminately scraped from the web, irrespective of whether they are the subject of copyright protection. Other LLMs have extensively used news publication content in their training sets.
The consequences of failing to appropriately address the potential copyright infringement risks of training an AI tool on third-party materials can be significant: developers may have to stop using the materials on short notice, strip them out of the AI system (which may not be practicable), and/or be liable to pay substantial damages (including, potentially, aggravated damages for flagrant and deliberate infringement).
As reported in our previous Insight, Getty Images, a stock photo company, has brought a copyright infringement action in the UK against Stability AI, the developer of AI image generator Stable Diffusion, claiming that the processing, for the purpose of training Stable Diffusion, of images in which Getty Images owns the copyright, infringed the copyright in those works. In the US, a class action has now also been launched by software developers against Microsoft, GitHub and OpenAI, claiming that the creation of AI-powered coding assistant GitHub Copilot constitutes 'software piracy on an unprecedented scale'.
Operators of generative AI systems are so far vigorously defending these cases and have adopted the position that their use of such materials does not amount to copyright infringement. However, in our view, it is possible that the continued use by LLMs of third-party materials may ultimately give rise to a new form of royalty payment to third-party copyright owners, whether that is established by case law or by new legislation. There have also been specific calls for such use to give rise to a new form of payment to news publishers, similar to the payments now made for news content by search engine operators.
If developers wish to avoid or mitigate the risk of third-party copyright infringement cases, they will need to:
- invest in ways to have their tools identify copyright materials included within their training data and the owners of such copyright materials; and
- develop processes either to seek clearance to use those copyrighted works for the purpose of training their models (to the extent that is practical), or to be able to subsequently make royalty payments to those copyright owners.
An example of how this can be achieved is demonstrated by OpenAI's partnership with Shutterstock, another stock photo company. The partnership provides that OpenAI licenses images from Shutterstock's libraries to train 'Dall-E 2', OpenAI’s artificial intelligence image-generating platform. Dall-E 2 will then be integrated into Shutterstock's Creative Flow online design platform, allowing users to create images based on text prompts. Shutterstock will then compensate contributors of the images in those libraries used to train Dall-E 2.
Where an LLM was trained on third-party copyrighted materials and the generative AI produces an output that copies a substantial part of any such material, a customer who reproduces or distributes that output without permission from the copyright owner may also be infringing copyright. To the extent that the output of a generative AI system does not reproduce existing material, but instead creates entirely new material, this risk may be removed or reduced.
Additionally, customers should be aware that copyright will likely not protect the output of generative AI tools. In Australia, copyright protects certain subject matters that are expressions of ideas, including literary works and artistic works, provided they are 'original'. That is, the work must have originated from a human 'author' who has applied some 'creative spark', 'independent intellectual effort', or 'skill and judgement'.
Therefore, customers should be aware of the risk that they may not be able to prevent third parties from using or copying content that the user has created using generative AI systems, potentially undermining the commercial value of that content. While this is likely to be the case where the content simply comprises the system's responses, copyright could still protect a new work created by a user from the generative AI output: eg by adding to or editing that output, to the extent such additions or changes are 'original'.
Customers of generative AI services should therefore consider:
- including contractual protections in their agreements with generative AI developers that confirm that proper licences to use third-party copyright materials in the training datasets have been obtained, and/or provide indemnities protecting the user from third-party copyright infringement claims resulting from the user's use of the generative AI tool; and
- applying their own skill and judgement to create a new work using the output of a generative AI tool, and ensuring that the creation of the new work is well documented, so as to maximise the prospects of obtaining copyright protection in any work product that has been produced using the tool.
Defamation
Generative AI tools create novel issues for Australian defamation law. Although Australia is notoriously a defamation plaintiff-friendly jurisdiction, it is unclear whether developers could be held liable for defamation for content produced by their tools, for the following reasons:
- Publication standard – Only 'publishers' of defamatory content can be held liable for defamation under Australian law. Given the limited degree of human intervention developers have in the specific content produced by their tools in response to user prompts, it is questionable whether developers can be considered 'publishers' of content produced by their generative AI tools, for the purposes of Australian defamation law.
- Responsibility of users to verify accuracy of AI output – It is well understood that generative AI tools are vulnerable to 'hallucinations' (ie they may fabricate facts and sources where they do not have sufficient relevant data), and therefore that the accuracy of the content produced by these tools should not be accepted at face value. The terms of use of these tools also generally place the responsibility for evaluating the accuracy of any output onto the user. Hence, reasonable users of these tools should be expected not to accept at face value the veracity of potentially defamatory imputations in AI-generated content. This differs from the position where those imputations are conveyed by a human author in a more authoritative context (eg in a book or a journal article).
- Serious harm threshold – A 'serious harm' threshold for defamation claims was introduced in all states and territories, except for Western Australia and the Northern Territory, on 1 July 2021. The threshold requires plaintiffs in those jurisdictions to establish that any defamatory publication has caused, or is likely to cause, serious harm to their reputation. Outputs of generative AI tools are generally unique to the individual users and the specific prompts the user selects. It is difficult to envision a scenario where defamatory imputations produced to an audience of a single individual could be considered likely to cause the subject of those imputations 'serious harm', particularly in light of the responsibility of users to verify the accuracy of AI-generated content, discussed above.
Nevertheless, until such time as the liability of developers for defamation is clarified by Australian courts or legislation, it may be prudent for developers to consider:
- monitoring outputs for instances of 'hallucinations' being produced by their AI product that may be defamatory of individuals;
- providing disclaimers to users, and including provisions in their terms of use, to the effect that users are responsible for verifying the accuracy of outputs and that users rely on generative AI output at their own risk; and
- implementing guardrails on their models that restrict users' capacity to prompt their AIs to produce potentially defamatory content.
Customers should be wary that there is some risk they may be liable if they publish defamatory content generated by a generative AI tool, even if they are not aware of its defamatory nature. This scenario may arise where a generative AI 'hallucinates' defamatory 'facts' about an individual in response to a user prompt, and a customer subsequently publishes the output to a public forum without verifying its veracity.
Users should therefore be careful to vet content from generative AI that could damage individuals' reputations, before publishing the output elsewhere.
ESG
Generative AI has already been shown to present significant ESG risks. Some examples are:
- Embedded or inserted bias – As generative AI outputs are inevitably a product of their data set, bias produced from incomplete or unbalanced data sets, or data sets that have been deliberately manipulated, can be reproduced in the AI's responses. This may result in their outputs perpetuating gendered or racist viewpoints, or discriminatory outcomes. It can also create false or inaccurate outcomes. OpenAI has stated that it is committed to addressing these issues and, to that end, has published a portion of its guidelines on how reviewers training its models are instructed to deal with potentially biased outputs.
- Hallucinations – As noted above, generative AIs are vulnerable to 'hallucinations': ie they fabricate facts and sources where they do not have sufficient relevant data. This tendency can be particularly problematic, as generative AI tools may not indicate where they have fabricated information and will often display the same level of confidence as when they are providing a factually correct answer. Depending on the circumstances where these hallucinations occur, they may lead to adverse reputational impacts for developers and adverse outcomes for their customers (eg where generative AI use leads to misleading advertisements being published or market disclosures being made, or incorrectly denies worthy applicants a good or service). It is hoped that as LLMs become more sophisticated, the risks of these hallucinations occurring may be reduced.
- Inability to explain an outcome – An ongoing concern with generative AI and supporting technologies, such as neural networks, is that they are often 'black boxes'. This means that it is not always discernible to humans, including those who have developed the technology, how these tools have arrived at a particular answer. This is problematic where humans need to understand why a particular output or decision has been made by AI: eg when providing legal or medical advice in order to comply with ethical and regulatory obligations, or when a government agency is using AI to make a decision on a particular application in order to comply with law. To overcome this, developers will need to make their tools more transparent; tools designed in this way are sometimes referred to as 'interpretable' or 'white-box' AIs. An example is the way in which Bing AI chat provides users with links to the sources used to generate a particular output. However, with the advent of the autonomous 'agents' referred to above, and even chains of autonomous 'agents' operating together, the task of ensuring explicability may become harder.
- The role of human judgment – More broadly, the quick emergence of generative AI solutions is posing a fundamental question about the value, importance and necessity of the application of human judgment. Is it necessary? To what extent? What does it mean if human judgment is no longer required in the undertaking of particular tasks?
- Supply chain / modern slavery risks – It has been documented that underpaid workers in regions such as Africa and Southeast Asia have played a crucial part in the supply chain for developing certain LLMs. These workers are often engaged to help train LLMs and to carry out content moderation.
- Carbon emissions – Due to the tremendous amount of computing power required to develop, train and use these tools, AI tools have a large carbon footprint. A 2019 study, widely reported by MIT Technology Review, concluded that training a single large AI model may produce around 626,000 pounds of CO2 equivalent. It should be noted, however, that this figure is dwarfed by the greenhouse gas emissions associated with cryptocurrency mining: a tracker run by the University of Cambridge estimates that the mining of new bitcoin will release about 62 megatons of CO2 equivalent per annum, roughly equal to the amount emitted by the entire country of Serbia in 2019.
As an industry, generative AI developers must – and, as cited above, in some ways already do – consider ways to tackle these risks. These steps include:
- vetting training data to ensure it does not include biased viewpoints;
- implementing guardrails and review guidelines to limit the capacity of their generative AIs for producing biased viewpoints and hallucinations;
- implementing mechanisms to make generative AI tools more transparent about how outputs are generated;
- applying greater due diligence to their development supply chain; and
- placing greater reliance on renewable energy in the development process of LLMs, to reduce their carbon footprint.
Given these potentially significant ESG risks of generative AI, and the fact that they could create detrimental impacts both for end-users and for organisations using the tools, we think it is imperative that customers conduct due diligence on the ethical and governance frameworks of prospective generative AI suppliers before engaging their services. Existing sustainable supply chain management policies may therefore need to be adapted to cover specific risk areas relating to AI.
We recommend that customers also establish internal governance processes in order to appropriately and holistically assess, manage and mitigate the risks of acquiring and/or deploying generative AI tools. A robust governance process will be necessary to identify and assess the risks that arise at the crucial point of interaction between end-users (be they personnel or members of the public) and an AI tool. We will address what these governance processes could look like in the next article in this series, to be issued shortly.
What's next?
We will be delving deeper into the transformative potential of AI for clients, law firms and the economy more broadly, and will provide further updates on the market through a series of Insights and toolkits as legal issues develop. Our upcoming publications will cover the following:
- AI Governance Toolkit for Boards and General Counsel;
- a guide to current and future international and Australian regulation of AI; and
- a guide to the procurement of generative AI solutions.