AI has exploded in the last two years, both in public adoption and in company valuations. So far in 2024, the AI sector has the highest valuations of any sector, even beating fintech, one of the most innovative and investment-heavy sectors around.
The first six months of 2024 saw 13 new unicorns (companies valued at $1 billion or more) in the AI sector.
New generative AI (gen AI) tools have raised the mainstream visibility of AI. Tools such as ChatGPT and Claude allow users to easily crunch numbers and get answers to complex questions (even though the answers aren’t always accurate). Image generation tools such as Midjourney and Adobe’s Firefly are also driving user adoption. B2B use cases are likewise increasing rapidly.
Similar to cloud adoption, AI adoption has moved so rapidly that cybersecurity has been treated as an afterthought, leaving organizations open to risks they might not be aware of. For example, Microsoft’s Copilot+ Recall feature, which takes screenshots of everything users do on their computers, is a cybersecurity nightmare. Generative AI chatbots are also widely known to be susceptible to prompt injections, which can leak internal organizational documents.
Making matters worse, AI companies are openly flouting the rules to get a big chunk of the venture capital pie before regulations set in, further opening clients to risk.
In the face of all this, here’s what you must know about AI vendors before entering into a contract with one.
It’s an open secret that the training data for many AI companies comes from public information, much of it likely copyrighted. OpenAI has been sued by The New York Times and multiple international authors for allegedly stealing their copyrighted work. When The Wall Street Journal asked OpenAI’s longtime CTO (Chief Technology Officer) what data ChatGPT was trained on, she claimed ignorance, then became defensive and refused to answer. The video went viral.
This need for training data to feed AI models has made data the new gold. As AI chips become faster, new data is needed to make AI useful, which puts a premium on that data.
One example that demonstrates just how valuable data has become is when Reddit went public. Before its IPO, Reddit revealed a $200-million deal to give an undisclosed AI company access to its treasure trove of user-generated data. Reddit’s IPO closed at 48% higher than expected, valuing the company at $9.5 billion, which experts believe was largely due to the AI data deal it struck.
The plot thickens when you consider that Reddit is now actively blocking AI crawlers from companies that haven’t paid for access to its data. It’s doing this using an age-old internet standard called robots.txt, a file that tells crawlers which parts of a site they’re allowed to crawl and which they aren’t.
Robots.txt files aren’t enforceable, and unethical crawlers typically ignore them. The AI search engine Perplexity was recently accused of doing just that: accessing swathes of data from Wired and other websites that have specifically blocked it from crawling them. The evidence against Perplexity is fairly damning, as is the evidence against OpenAI and other AI companies for how they transcribed and crawled millions of potentially copyrighted YouTube videos.
However, no precedent exists in these cases, and few have been tried in court. For those that have been, the decisions have largely swayed in AI’s favor, leaving copyright holders stunned.
The takeaway is this: If you have data you don’t want stored in an AI database, a robots.txt file is insufficient to block it.
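To see why, it helps to look at how a crawler actually uses robots.txt. The following is a minimal sketch using Python’s standard `urllib.robotparser` module. The robots.txt contents, the `ExampleAIBot` user agent, the `example.com` URL, and the helper names `is_allowed` and `should_fetch` are all hypothetical and chosen only for illustration. The point is that the check runs inside the crawler’s own code, so a crawler that chooses to skip it is never stopped.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one AI crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def should_fetch(user_agent: str, url: str, respect_robots: bool = True) -> bool:
    """Model a crawler's decision (hypothetical helper).

    A well-behaved crawler consults robots.txt first. A crawler that sets
    respect_robots=False never runs the check, and nothing on the server
    side prevents the request.
    """
    if not respect_robots:
        return True
    return is_allowed(ROBOTS_TXT, user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/r/security"
    print(should_fetch("ExampleAIBot", page))                        # compliant crawler
    print(should_fetch("ExampleAIBot", page, respect_robots=False))  # crawler that ignores the file
```

Actually keeping a crawler out requires server-side controls such as authentication, rate limiting, or blocking at the network edge, because robots.txt is only a request.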
Any misconfiguration that leads to data exposure means that data will very likely end up in an AI model. For example, misconfigured AWS S3 buckets have accounted for over 43 terabytes of leaked data to date. This brings us to the following issue:
The sheer quantity of data sitting in AI servers makes them a juicy target for hackers.
A supply chain is only as strong as its weakest link, and that is exactly what a supply chain hack exploits. Attacking the vulnerable points of a supply chain is a well-known and highly successful tactic used by hackers.
In addition to the data itself, bad actors are also interested in penetrating the infrastructure that makes AI work because the infrastructure uses extremely powerful computers and servers that hackers can leverage for their nefarious tasks. In one recent attack, hackers breached hundreds of AI servers and installed cryptocurrency mining software on them. Crypto mining requires immense amounts of power, which state-of-the-art AI infrastructure servers can provide.
Cyber criminals’ extreme interest in AI companies means any customer of that AI company is at risk of a data breach. This is especially troubling for companies that are using an AI vendor for an internal AI application that is trained on confidential company data or HR information.
Another risk is the potential for hackers to poison the dataset, which would then affect the output of that data.
Even top AI vendors can be hit by a hack. OpenAI was recently compromised but didn’t report it to the FBI or the public. The information was instead leaked by two AI employees. OpenAI said it didn’t report the breach because it didn’t involve customer data. Considering the financial impact of reporting data breaches, we can only guess that OpenAI chose to withhold this information to protect the immense financial interest the AI sector is currently enjoying. This brings us to the final cybersecurity risk regarding AI:
AI is moving at a lightning pace. In Q3 2023, one in five unicorns was an AI startup, totaling $21 billion in value. In such an explosive ecosystem, companies feel compelled to favor growth above all else, even if it means cutting ethical and cybersecurity corners.
Microsoft rushed to release a ChatGPT-enabled Bing just months after ChatGPT’s release, which had catastrophic consequences as the chatbot became aggressive and downright threatening. Similarly, Google launched a not-quite-ready AI search product that, among other things, told users to use glue to stick their cheese to pizza.
Still, the products remain out there. AI companies’ hunger to get and keep users sits above everything else, including cybersecurity, even if that means releasing unpolished products.
Both startups and established companies are clearly cutting corners in the AI race, as we’ve seen in all the above examples. It’s unlikely that this will change without regulations coming into place, but regulations are always slower than tech advances.
When companies are less concerned about risks and threats than they should be, the risk passes down to their customers. Not only are companies facing the threats we covered above, but they may also be facing implementation and data security hazards due to sloppy or negligent processes, resulting in a risky third party.
The only way to approach AI safely is to adopt a framework that minimizes your potential cybersecurity risk. We suggest taking the following four steps when approaching AI vendors:
It’s easy to get caught up in the hype. We recommend looking at security first and performing thorough due diligence on any AI vendor before committing to a long-term relationship.
Even if a less secure AI company looks shinier, you’re more likely to achieve sustainable, long-term success by going with the more security-conscious firm.
The AI sector is all about go-to-market and speed at the moment. As you approach vendors, keep your own mindset independent of that rush. Don’t make the same mistake they’re making and sacrifice security for speed.
Talk to your department leaders internally about potential risks and see what prospective AI companies are doing to mitigate risks. By getting feedback from different departments in your organization, you’ll be able to get a broader perspective on what the issues really are.
Knowing all the cybersecurity pitfalls your company can run into is challenging even in normal times. Adding a new technology such as AI increases the risk exponentially.
To improve your supply chain security, you can work with a managed partner who’s familiar with the scene and can help guide these conversations. The right managed partner can ensure you’re working with a safe vendor and can also assess your overall third-party risk.
At SolCyber, we have security specialists of all levels who are ready to assist you with evaluating your AI vendor risk. We also have programs to help you build a strong cybersecurity posture at a fraction of the cost of creating one in-house.
Using AI vendors securely is possible, provided you do all the necessary due diligence, which is far easier if you work with a competent managed security partner.
To learn more, reach out to us today.