In its consequential antitrust trial, Google has pointed to an emerging technology that could equalize competition in the search engine market: artificial intelligence.
Google has spent substantial time during its cross-examinations of Justice Department witnesses drawing upon reports that the AI revolution represents a “paradigm shift” for search, and could pose “one of the biggest threats to [Google’s] long-established position as the internet’s main gateway.”
But Google’s claims about AI as a great leveler have been contradicted by what would be an unlikely star witness: Google’s own AI chatbot, Bard.
In chats obtained and then replicated by the Prospect, Bard was queried about the questions at issue. The chatbot itself acknowledged that its advantage over other AI chatbots comes from the massive set of information available to it, derived from other Google products and from its web crawler, the same one Google Search uses to index the internet. “This advantage is likely to continue to grow as Google continues to invest in its crawling and indexing technologies,” Bard stated.
When asked if competing search engines can gain access to this dataset, Bard initially said yes but then admitted it was wrong. “Google has not released the training dataset for Bard because it wants to protect its intellectual property,” the AI stated in one chat. In another, Bard simply stated, “The dataset is a valuable asset for Google, and it gives Google a competitive advantage in the AI market.”
More from David Dayen | Luke Goldstein
Furthermore, Bard explained, third-party websites that want to control their own data and opt to shield Bard from scraping their content would face consequences. “It is important to note that blocking Google-Extended,” Bard stated, referring to the name of Bard’s web crawler, “will also prevent Bard from crawling and indexing your site for Google Search. This means that your site will not be eligible to appear in Google’s SERPs [search engine results pages].”
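For readers unfamiliar with the mechanism, a site blocks a crawler by naming it in a robots.txt file. The sketch below uses Python's standard-library parser to show how such a rule is read under the ordinary robots.txt format. The sample file, the URL, and the use of "Googlebot" as the name of a separate search crawler are assumptions for illustration. The sketch models only the file format; it says nothing about how Google's own systems treat the Google-Extended token.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only the Google-Extended token.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""

# Hypothetical page on a third-party site.
URL = "https://example.com/article"


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this parser would let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The rule group names Google-Extended, so that agent is refused.
extended_allowed = allowed(ROBOTS_TXT, "Google-Extended", URL)

# No rule group names this agent, so the parser falls back to "allowed".
# "Googlebot" is an assumed stand-in for a separate search crawler.
other_agent_allowed = allowed(ROBOTS_TXT, "Googlebot", URL)

print("Google-Extended allowed:", extended_allowed)
print("Other agent allowed:", other_agent_allowed)
```

In this parser, rules apply per named agent. Whether blocking one token has any effect on Search indexing is a question about Google's behavior that a file-format sketch cannot settle.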
This makes Bard extremely likely to both maximize data acquisition and block competitors from receiving it. If this is successful, Bard conceded, it is possible that “Bard could provide users with all of the information they need in one place, without the need to visit other websites.” Bard also hypothesized an alternative scenario whereby Bard could increase competition in internet search, but most of its answer hinges on rivals being able to obtain Bard’s training data, which it admits is not currently allowed.
Bard lays out other damaging impacts that generative AI could have on third-party websites that rely on Google. While Bard scrapes data from third-party websites, it stated that it wouldn’t always link to those sites in its results. In one chat, Bard explained that the decision to link to or cite a source is a matter of “personal preference,” and “up to me.”
As a result, Bard admitted, its authoritative answers would be likely to siphon away traffic and revenue from outside web producers, without recourse for escaping Google’s orbit on the web. “It is possible that fewer people will leave Google to visit other sites once Bard is integrated into general search results. This could lead to a decrease in traffic to those sites and make it harder for them to create sustainable business models,” Bard stated.
Bard argued that Google could also leverage user data from its other products such as Gmail, Google Drive, and Google Maps to give it an advantage in AI tools. Those claims have already been confirmed by reporting from The New York Times on Google’s recent authorization for Bard to draw upon these separate lines of business.
An initial chat conversation was conducted by an outside individual who is critical of Google’s market power. The Prospect then asked the same prompts and received broadly similar, though not exactly the same, results.
THERE ARE ONLY TWO POSSIBLE READINGS of Bard’s answers in these chats. Either they are correct, offering a window into Google’s intention to maximize the power of its data to create and integrate generative AI at the expense of competitors, or Bard is synthesizing information from news reports, message boards, and other data sources, and isn’t reliably reporting the truth of what Google is doing with Bard.
A Google spokesperson went with option two. “All LLMs hallucinate, including Bard,” the spokesperson told the Prospect in an emailed statement. “As we’ve always said, Bard is an experiment designed for creativity and productivity and is better at recommending must-see sights in NYC or suggesting Thanksgiving decorations than opining on complex antitrust lawsuits.”
But this response means that Bard is poor not only at lawsuit analysis, but also at explaining what Bard does and how it is used. And if we can’t trust an AI to talk about itself reliably, that calls into question what value we can derive from it on a host of other subjects.
Google has repeatedly called Bard an experiment, and has been fairly transparent about it. At the Bard demonstration website, a disclaimer says, “Bard will not always get it right: Bard may give inaccurate or offensive responses. When in doubt, use the Google button to double-check Bard’s responses.”
The Prospect did so on several occasions. Google Search confirmed that Google-Extended is Bard’s web crawler, that Bard scrapes web data, and that users can already integrate Bard into other Google products, including Search. It was less conclusive on how Bard decides to link to data, or whether third-party search engines can integrate Bard into their products.
Bard’s experimental nature did show through at times in the chats. Bard told the outside individual that it can scrape third-party websites, while telling the Prospect that it does not. Google has been cagey about this in the past, mindful of privacy concerns, but in July it confirmed that it was using third-party data for Bard.
Bard also gave a lawyerly answer to the question of how third-party websites can avoid its data scraping. It stated that websites could use meta tags that would prevent Bard from showing all or some of the content from that site in its answers. For example, a website could hide a paragraph of text by putting a “max-snippet” or “data-nosnippet” tag around it, or add a “nosnippet” tag to eliminate any use of content from the site in answers. But this seems to say that the data will not be quoted or cited, not that it will not be scraped. In other words, the data could still be accessed to inform Bard, even if it’s not showing the content to users.
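The distinction between reading content and displaying it can be made concrete. The sketch below is a hypothetical scanner, not Google's crawler: it parses a made-up page that uses the directives Google documents as `max-snippet` and `data-nosnippet`, and sorts the text into what may be shown and what is marked as hidden. The class name, sample page, and field names are all assumptions for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"meta", "br", "img", "input", "link", "hr"}

# Hypothetical page: a robots meta directive plus one data-nosnippet span.
SAMPLE_HTML = """\
<html><head><meta name="robots" content="max-snippet:50"></head>
<body><p>Public recipe intro.</p>
<p><span data-nosnippet>Subscriber-only baking tip.</span></p></body></html>
"""


class SnippetDirectiveScanner(HTMLParser):
    """Collects robots meta directives and separates data-nosnippet text."""

    def __init__(self):
        super().__init__()
        self.robots_directives = []  # e.g. ["max-snippet:50"]
        self.hidden_text = []        # text inside data-nosnippet elements
        self.visible_text = []       # all other text
        self._hidden_depth = 0       # > 0 while inside a data-nosnippet element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.robots_directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]
        if tag in VOID_TAGS:
            return
        if self._hidden_depth or "data-nosnippet" in attrs:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:
            bucket = self.hidden_text if self._hidden_depth else self.visible_text
            bucket.append(text)


def scan(html: str) -> SnippetDirectiveScanner:
    scanner = SnippetDirectiveScanner()
    scanner.feed(html)
    scanner.close()
    return scanner


result = scan(SAMPLE_HTML)
print("Directives:", result.robots_directives)
print("May be shown:", result.visible_text)
print("Read but marked hidden:", result.hidden_text)
```

The scanner has to read the hidden text in order to classify it as hidden. In this sketch, at least, the markup governs only what is displayed, which is the gap the paragraph above describes.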
While “personal preference” was cited in one chat as a potential reason for not linking to a source in an answer, that phrase does not appear in another chat. In both, Bard stated that if a website is not “relevant, authoritative, high-quality, or accessible,” it will decline to link to it. But then it added: “My goal is to provide users with the most helpful and informative answer possible. If a link to a site would help to achieve that goal, then I will include it. However, if a link would not be helpful or informative, then I will not include it.” That seems to point back to personal preference, and at any rate, Bard alone would make the determination of whether a website is “relevant, authoritative, high-quality, or accessible.”
At other times, Bard just seems confused. When explaining whether Bard integration will benefit Google at the expense of other websites, it gave lots of examples for why that would be the case. “If a user searches for ‘how to bake a cake,’ Bard could generate a detailed answer that includes instructions, ingredients, and tips,” it wrote. “This answer could be so comprehensive and informative that the user does not need to click on any links to learn more.”
On the other hand, Bard claimed, its integration could make it more likely for users to leave Google and visit other websites. But the answer is a non sequitur: “This is because Bard could generate search results that are so informative and comprehensive that users no longer need to visit other websites to learn more.”
In another chat, Bard added: “It is also likely that Bard’s integration into Google Search will increase Google’s dominance of the web. Google is already the dominant search engine in the world, and Bard’s integration into Search will only make Google’s search engine more powerful and useful.”
Finally, Bard’s answers on how other websites could benefit from Bard without being able to access training data border on wishful thinking. It stated that a developer could simply build its own version of Bard, while conceding that “this would be a very challenging and expensive undertaking.” It suggests that the developer could crawl Google’s search pages and take all the data, even though “it would be likely to violate Google’s terms of service.” Or Bard could be used by other developers as “a source of inspiration.”
“Ultimately, the decision of how to make Bard accessible to non-Google sites is up to Google,” Bard concludes.
AI HAS LOOMED IN THE BACKGROUND of the ongoing antitrust trial between the Department of Justice and Google. From the outset of the case, antitrust proponents such as President Biden’s former competition policy czar Tim Wu have argued that Judge Amit Mehta’s decision could strongly influence the rules for competition in the AI arms race between tech companies.
Google, for its part, has also cited AI competition to its own advantage in the trial to rebuff several of the government’s claims. The defense team points to the AI arms race for generative search to argue that the search market is in a transitional period, and that Google faces robust competition on the horizon.
Google’s lawyers have suggested that AI is an open market because new entrants can easily get around barriers to entry that Google might hold, such as scale and troves of user data. That’s because new AI products can at least theoretically draw on existing publicly available datasets to develop machine learning.
Google established these points early on in the trial during its examination of its chief economist Hal Varian, who contested the importance of “network effects” to Google. Google’s lawyers also pressed Microsoft CEO Satya Nadella on competition in AI during his testimony. Nadella challenged Google’s portrayal of the AI market by explaining that Google’s “dynamic datasets” would provide a major upside for developing AI.
“This is going to become even harder to compete in the AI age with someone who has that core … advantage,” said Nadella. Bard backed up this contention in our chats.
“The way Google has deployed these arguments about AI is a kind of a hocus-pocus shell game to dazzle the judge … but it’s not really that relevant to the heart of the case,” said Megan Gray, an independent lawyer and former legal counsel for DuckDuckGo who’s been attending the trial.
By casting doubt on its own technological prowess in AI, Google’s trial team has also undercut its own CEO’s statements in earnings calls. Sundar Pichai has consistently drummed up enthusiasm to investors about Google’s confidence that it will be able to win the AI market. At this year’s Google I/O conference, Pichai called Google “an AI-first company,” citing several ways that “generative AI is helping to evolve our products.”
Even defenders of Google in its antitrust case have let slip their thoughts about AI. Adam Kovacevich, CEO of the pro-tech trade group Chamber of Progress, in talking about the narrowing of the Justice Department’s antitrust case at the summary judgment stage, celebrated the “AI angle to the ruling: ChatGPT & Bart [sic] show that search is evolving even more from links to answers/chatbots. Websites may not like the decline in traffic, but consumers benefit from faster answers.”
That is as candid as Bard, acknowledging that the next iteration of AI-integrated search will seek to squeeze away any need for users to visit other websites.
Bard is candid in one other respect: On its demonstration site, it informs users that “Bard may share parts of your conversations and other relevant info, like your location, with other services.”