
Copyright and AI: Big Tech are the Little Guys

Generative AI is often cast as big tech plundering the work of creators, but in this fight the so-called giants are the challengers.
Photo by Jessica Ruscello / Unsplash

A few weeks ago, the Productivity Commission released five interim reports under its Five Pillars of Productivity review. Productivity growth in Australia has been sluggish for years, but concrete proposals to fix it are thin on the ground.

One idea in the reports is a potential carve‑out in Australian copyright law to allow the text‑and‑data mining needed to develop large language models (LLMs). Without such an exemption, the legal uncertainty could discourage investment and deal us out of opportunities to build AI technology or host data centres.

The PC’s interim report into Harnessing Data and Digital Technology quickly sparked broader debate about how these LLMs were developed. That reaction makes sense once you consider the prevailing narrative: that tech giants like Microsoft and Google are exploiting the works of Australian writers and authors.

Or is it really that straightforward?

There’s no question that companies like Microsoft, Google, and Amazon are giants in their respective sectors, but in the field of AI, they're the upstarts. The technology only exists because of the massive investments these companies have put up, and every time you use it, at least one of them is losing money.

The copyright lobby is out in force across newspapers and websites, presenting Australian authors warning that Australian stories by Australian writers are being appropriated by foreign tech giants and recycled into AI slop. You can read plenty of these stories in august publications like The Australian, owned by American media giant News Corp, which also owns HarperCollins, and the ABC, which charges creators $88 per second for archival footage which the taxpayer already paid to produce.

The reality of this campaign against generative AI is that the loudest voices are the major copyright incumbents, seeking new royalties from a technology they didn’t create. They can’t point to lost revenue – unlike piracy, no paying customer has been displaced – but they want a cut all the same. True, the technology depends on training material, but much of it comes from sources that will never generate royalties: open-licence content like that released under Creative Commons, or hobbyist material such as an enthusiast’s blog. Incumbents want a cut simply because they are incumbents, while the hobbyist bloggers and the posters on Reddit or Stack Overflow have neither the ability, the time nor the inclination to claim theirs.

Copyright law is intended to encourage creativity, but in practice – especially given its repeated extensions – it often serves to let incumbents extract rent from the same work. It is well known, for example, that copyright length was repeatedly extended with retroactive effect to prevent the original "Steamboat Willie" cartoons from entering the public domain. There are other examples that illustrate the same pattern.

Google Books faced fierce opposition from publishers and the Authors Guild for digitising and making searchable millions of books, many of them out of print but still under copyright. Google's opponents wanted it to pay licensing fees even on books that the publishers and authors were not commercialising themselves. The courts ultimately ruled the project was Fair Use, but it would be unlikely to survive Australia’s more restrictive Fair Dealing laws.

Another example is the affair concerning the Dallas Buyers Club, whose distributor tried to bring speculative invoicing into Australia. This is a practice where, if you pirate a film (which you shouldn’t do, by the way), rather than simply seeking the cost of the movie – the actual loss – the copyright holder pushes an inflated invoice designed to scare you into paying. In other words, a shakedown. Can this sort of practice be justified as supporting the creative industries, or is it just a way for the copyright holder to extract as much leverage and profit as possible? This was, I might add, a film that was deliberately held back from release in Australia and hence wasn't even available for purchase for the first four months.

I don’t raise those two examples to dismiss copyright or paint enforcement as malevolent, but to underline that copyright holders are ultimately businesses aiming to make money, and we shouldn’t pretend tech companies are the only ones out to turn a buck. It's also worth remembering the contrast between copyright and other forms of intellectual property: a drug company could spend billions bringing a life‑saving treatment to market and yet the patent will only last 20 years – far less than the protection this article will enjoy.

I’m not saying that tech companies should be allowed to just do whatever they want until the law is changed – but, I mean, could we just let them cook for a bit?

Take Uber. Yes, it is dodgy and underhanded that they came here and essentially operated an illegal and unlicensed taxi company in Australia for years, undercutting the people who played by the rules. But ask any Sydneysider this: when was the last time you hopped in a taxi and the driver didn’t try to scam you in some way? Maybe they refused a short fare or didn’t use the meter, maybe they double charged you on tolls or just blatantly put a higher number on the credit card terminal. Uber’s conduct was wrong, but I’m happier living in the world where Uber exists than the one where they couldn’t even have a crack at it, particularly given how taxis were at the time and still are.

Imagine if, before OpenAI could begin training its LLM, it had to license every copyrighted work it wanted to use, or even if it could rely only on deals with the big book and news publishers rather than scraping the web. If that had been the requirement, the outcome wouldn’t be a renaissance of royalties and creativity – more likely, we would have missed out on the technology altogether, or been left with something trained only on Project Gutenberg, a collection limited to older public domain works that would leave the AI unable to handle modern language, culture, or technical topics.

I don’t really know what the solution is when it comes to the training material for these LLMs. What I do know is that the technology is sick; I was blown away when it first came out, and it still finds ways to surprise me. The technology is going to affect creatives, but the impact is going to be felt not by the people writing bestsellers, but by the people who write corporate copy or routine news articles. Unplugging the machine, or funnelling money to the large publishers, isn’t going to solve those problems. Everyone wants to turn a buck, and that’s admirable – but copyright law needs to strike a balance between encouraging people to make new works and enabling rent-seeking off the old.
