Google adds generative AI threats to its bug bounty program

Key Takeaways:

– Google has expanded its vulnerability rewards program (VRP) to include attack scenarios specific to generative AI.
– The VRP incentivizes research around AI safety and security to make AI safer for everyone.
– The program pays ethical hackers for finding and responsibly disclosing security flaws.
– Google is using findings from its AI Red Team to categorize and report bugs in generative AI technology.
– The team found that large language models are vulnerable to prompt injection attacks and training-data extraction.
– Prompt injection attacks can influence the behavior of the model, while training-data extraction allows hackers to extract sensitive information.
– Model manipulation and model theft attacks are also covered in the scope of Google’s expanded VRP.
– The rewards for discovering vulnerabilities vary based on the severity, with a maximum reward of $31,337 for highly sensitive applications.
– Google paid out over $12 million in rewards to security researchers in 2022.

TechCrunch:

Google has expanded its vulnerability rewards program (VRP) to include attack scenarios specific to generative AI.

In an announcement shared with TechCrunch ahead of publication, Google said: “We believe expanding the VRP will incentivize research around AI safety and security and bring potential issues to light that will ultimately make AI safer for everyone,” 

Google’s vulnerability rewards program (or bug bounty) pays ethical hackers for finding and responsibly disclosing security flaws. 

Given that generative AI brings to light new security issues, such as the potential for unfair bias or model manipulation, Google said it sought to rethink how bugs it receives should be categorized and reported. 

The tech giant says it’s doing this by using findings from its newly formed AI Red Team, a group of hackers that simulate a variety of adversaries, ranging from nation-states and government-backed groups to hacktivists and malicious insiders to hunt down security weaknesses in technology. The team recently conducted an exercise to determine the biggest threats to the technology behind generative AI products like ChatGPT and Google Bard.

The team found that large language models (or LLMs) are vulnerable to prompt injection attacks, for example, whereby a hacker crafts adversarial prompts that can influence the behavior of the model. An attacker could use this type of attack to generate text that is harmful or offensive or to leak sensitive information. They also warned of another type of attack called training-data extraction, which allows hackers to reconstruct verbatim training examples to extract personally identifiable information or passwords from the data. 

Both of these types of attacks are covered in the scope of Google’s expanded VRP, along with model manipulation and model theft attacks, but Google says it will not offer rewards to researchers who uncover bugs related to copyright issues or data extraction that reconstructs non-sensitive or public information.

The monetary rewards will vary on the severity of the vulnerability discovered. Researchers can currently earn $31,337 if they find command injection attacks and deserialization bugs in highly sensitive applications, such as Google Search or Google Play. If the flaws affect apps that have a lower priority, the maximum reward is $5,000.

Google says that it paid out more than $12 million in rewards to security researchers in 2022. 

Source link

AI Eclipse TLDR:

Google has expanded its vulnerability rewards program (VRP) to include attack scenarios specific to generative AI. The VRP pays ethical hackers for finding and responsibly disclosing security flaws. With generative AI introducing new security issues such as unfair bias and model manipulation, Google has rethought how bugs should be categorized and reported. To do this, Google’s AI Red Team, a group of hackers simulating different adversaries, conducted an exercise to identify threats to generative AI technology. They found that large language models are vulnerable to prompt injection attacks and training-data extraction. These types of attacks, along with model manipulation and model theft, are now covered under the expanded VRP. However, Google will not reward researchers for bugs related to copyright issues or non-sensitive data extraction. The monetary rewards vary based on the severity of the vulnerability, with researchers able to earn up to $31,337 for highly sensitive applications. In 2022, Google paid out over $12 million in rewards to security researchers.