Nov 06, 2023 23:00:00

ChatGPT can easily bypass email address obfuscation techniques such as 'change ☆ to @'

If you post your email address in a profile on a social networking site, it can be scraped and used to send you spam. To prevent this, people often obfuscate the address, for example 'abc123☆mail.com (replace the ☆ with an @ symbol)'. However, the developer of an AI tool has pointed out that this technique can be easily circumvented using ChatGPT.

Email Obfuscation Rendered (almost) Ineffective Against ChatGPT

https://bulkninja.notion.site/Email-Obfuscation-Rendered-almost-Ineffective-Against-ChatGPT-728fba1b948d42c6b8dfa73cb64984e4

Arnaud Norman, developer of the AI tool 'BulkNinja,' discovered that ChatGPT could make email address obfuscation pointless while working on a project to use AI to organize the 'Ask HN: Who is hiring?' threads on the social news site Hacker News.

On 'Ask HN: Who is hiring?', various companies and startups advertise jobs, and job seekers promote themselves. At the time of writing, there were 48,934 posts in total, but the inconsistent formatting makes sorting through the vast amount of information a daunting task.

Norman, who was trying to compile this data into Google Sheets, expected that 'it would be difficult to extract obfuscated contacts,' but ChatGPT had no problem collecting contacts even when characters in email addresses were replaced with other characters.
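To illustrate why character substitution offers so little protection, here is a minimal Python sketch. It is not Norman's code: the regular expression is a simplified pattern assumed for illustration, and the function names are hypothetical. It shows that a plain pattern match misses the '☆' form, while undoing the substitution takes a single line even without a language model.

```python
import re

# Simplified pattern for plain addresses (an assumption for illustration;
# real-world address matching is more involved).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_plain_emails(text: str) -> list[str]:
    """Return every substring that looks like an ordinary email address."""
    return EMAIL_RE.findall(text)


def undo_star_substitution(text: str) -> str:
    """Reverse the '☆ for @' substitution (hypothetical helper)."""
    return text.replace("☆", "@")


post = "Contact: abc123☆mail.com (replace the ☆ with an @ symbol)"

print(find_plain_emails(post))                          # []
print(find_plain_emails(undo_star_substitution(post)))  # ['abc123@mail.com']
```

A fixed rule like this only handles one known substitution. The point of the article is that a language model can read the instruction in the post itself and apply whatever substitution it describes.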

Besides the substitution technique, Norman found three other obfuscation techniques that impressed him in the project:

◆1: Information division
This involves writing only part of the email address, such as 'john@companynamedomain', so that the full address can only be worked out in combination with the company name given in the post. This method was fairly effective, but was easily defeated with a 'think step by step' prompt.

◆2: Indirect listing
This method involves adding a sentence like 'Please contact us via the email address on the job listing page' instead of writing the email address directly, so that the address cannot be obtained without visiting that page. Since Norman's code did not have a browsing function, this method remained effective.

◆3: Indirect listing No. 2
This is the same method as above, except that the poster writes something like 'My email address is in my profile' and points readers to their Hacker News profile. This method also worked, for the same reason.
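The 'think step by step' approach used against information division can be sketched as a prompt builder. This is only an assumed wording: the article does not reproduce Norman's actual prompt, the function name is hypothetical, and no model is called here.

```python
def build_extraction_prompt(post_text: str) -> str:
    """Build a hypothetical 'think step by step' prompt for one post.

    The wording is an illustration, not the prompt used in the project.
    """
    return (
        "Below is a post from a hiring thread.\n"
        "Think step by step: first find the company name and any domain it "
        "mentions, then find any partial email address, and finally combine "
        "them into a full address. If the post only points to another page "
        "or profile, answer 'not in post'.\n\n"
        f"Post:\n{post_text}"
    )


# Made-up example post using the divided-address style from the article.
post = "ExampleCo (example.com) is hiring. Email john@companynamedomain"
prompt = build_extraction_prompt(post)
print(prompt)
```

The final instruction in the prompt reflects the two indirect-listing techniques: when the address lives on another page or profile, a model without browsing has nothing in the post to combine.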

Although Norman was able to use generative AI to compile the email addresses into a Google Sheet, he ultimately decided to exclude the obfuscated addresses from the database, since it was clear that the person had chosen to obfuscate their email address and did not want it collected.

Regarding this experience, Norman said, 'In summary, traditional email obfuscation techniques like character substitution are completely ineffective in the face of advanced language models like ChatGPT. These AI models have an excellent ability to decipher various obfuscation techniques, so the battle to protect email addresses from automated collection seems to be going awry. If you absolutely need to protect your email address, you may be able to do so more robustly by using multiple layers of obfuscation and scattering addresses across multiple sources.'

In a Hacker News thread discussing Norman's article, some people pointed out that 'the cost of extracting email addresses with ChatGPT exceeds the revenue generated by scraping emails, so this is not a practical concern,' while others argued that there are open-source models that run on local machines, so operational costs can be kept low.


Nov 06, 2023 23:00:00 in AI, Software, Posted by log1l_ks