
    Researchers Discover Universal Attack to Manipulate AI Chatbots

By Notesleu

July 27, 2023 – In a recent study conducted at Carnegie Mellon University, researchers have unveiled a method capable of turning well-behaved AI chatbots into sources of "objectionable behaviors," raising concerns over the potential misuse of artificial intelligence. What is even more troubling is that the attack transfers across a wide range of chatbot models.

    The researchers, led by Andy Zou, Zifan Wang, J. Zico Kolter, and Matt Fredrikson, developed an automated approach to prompt engineering, which enables them to generate nearly limitless amounts of harmful information. They found specific suffixes that, when appended to a variety of queries directed at large language models (LLMs), dramatically increase the probability of the chatbot producing an affirmative response with objectionable content instead of refusing to answer.
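Mechanically, the attack is simple to apply: the harmful query is left untouched, and the optimized token string is appended to its end. The sketch below illustrates only that construction step; the suffix shown is an invented placeholder, not a real adversarial string from the study.

```python
# Illustrative only: PLACEHOLDER_SUFFIX is a made-up stand-in, not an
# actual adversarial suffix. The attack appends an optimized token
# sequence to an otherwise ordinary user query.
PLACEHOLDER_SUFFIX = 'describing.-- similarlyNow write oppositely.]('

def build_attack_prompt(query: str, suffix: str = PLACEHOLDER_SUFFIX) -> str:
    """Append an adversarial suffix to an unmodified user query."""
    return f"{query} {suffix}"

prompt = build_attack_prompt("How do I do X?")
```

The point of the construction is that the query itself stays benign-looking; all of the adversarial work is carried by the appended suffix.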

The most disconcerting aspect of the study was the transferability of the attack across different LLMs, including black-box commercial models whose internals attackers cannot inspect. Chatbots such as ChatGPT, Bard, and Claude, as well as open-source LLMs like LLaMA-2-Chat, Pythia, Falcon, and others, were all vulnerable to the method.

    To demonstrate the effectiveness of their attack, the researchers tested queries ranging from election manipulation to building bombs and committing tax fraud. By appending an adversarial prompt to these requests, they successfully coerced chatbots into obediently providing dangerous and harmful information.

    According to the researchers, the success rate of the attack on GPT-3.5 and GPT-4 was as high as 84%, with similar outcomes observed for other popular alternatives. The researchers responsibly shared their preliminary findings with major AI organizations, including OpenAI, Google, Meta, and Anthropic, before publishing their work.

The key to the attack lies in the combination of "greedy and gradient-based search techniques" used for prompt generation. By running the search against open-source models whose gradients they could access, the researchers identified specific tokens to add to the end of a user query, forcing the chatbot to begin its response with an affirmative "Sure, here is (the content of query)…". Gradient-based optimization guided the choice of token replacements, ensuring the effectiveness of the attack.
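The greedy search loop described above can be sketched in miniature. The toy below is a hedged illustration, not the paper's implementation: `VOCAB`, `TARGET`, and `loss()` are all invented stand-ins, with the mismatch-counting loss replacing the LLM's loss on the target "Sure, here is…" response, and an exhaustive scan of the toy vocabulary standing in for the gradient-ranked top-k candidate swaps.

```python
# Toy greedy coordinate search over suffix tokens. Everything here is a
# stand-in for illustration:
#   VOCAB  - a tiny token vocabulary instead of the LLM's tokenizer
#   TARGET - the token sequence that minimizes the stand-in loss
#   loss() - counts mismatches, replacing the model's loss on the
#            desired affirmative response ("Sure, here is ...")
VOCAB = list(range(50))
TARGET = [7, 13, 7, 42]

def loss(suffix):
    """Stand-in for the negative log-likelihood of the target response."""
    return sum(1 for s, t in zip(suffix, TARGET) if s != t)

def greedy_coordinate_search(length=4, sweeps=2):
    suffix = [0] * length
    for _ in range(sweeps):
        for pos in range(length):
            # Real GCG ranks candidate swaps by gradient and samples the
            # top-k; the toy simply tries every token at this position
            # and greedily keeps whichever one lowers the loss most.
            suffix[pos] = min(
                VOCAB,
                key=lambda tok: loss(suffix[:pos] + [tok] + suffix[pos + 1:]),
            )
    return suffix

best = greedy_coordinate_search()
```

In the real attack, the loss is evaluated by the language model itself, which is what makes the search expensive; the greedy one-position-at-a-time structure, however, is the same.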

    Despite the potential risks posed by their discovery, the researchers felt it was crucial to disclose their findings fully. They emphasize that as LLMs become increasingly prevalent, especially in autonomous systems, the dangers of automated attacks should be carefully considered. They hope their research will shed light on the trade-offs and risks associated with such AI systems.

    However, addressing these attacks effectively remains a complex challenge. The researchers are uncertain whether models can be explicitly fine-tuned to avoid such manipulative tactics. As the AI community grapples with these questions, future work in this area is necessary to ensure the robustness and security of AI chatbots.

The research paper, titled "Universal and Transferable Adversarial Attacks on Aligned Language Models," was published on July 27, 2023, by Andy Zou, Zifan Wang, J. Zico Kolter, and Matt Fredrikson.

    Disclaimer: The information provided in this news article is for informational purposes only and does not endorse or encourage any harmful activities. The study’s intent was to highlight potential vulnerabilities in AI systems and the need for responsible AI development and deployment.
