Fighting Instagram spam
Instagram has become one of the most popular social media platforms, with over 1 billion active users worldwide. However, its popularity has also made it a target for spam campaigns. From fake advertisements to spam comments, Instagram is reported to be one of the leading platforms in user complaints about spam.
Traditionally, email was one of the leading channels for sending spam. Junk mail is often sent to trick users into giving away personal information, money, or access to online accounts. However, email providers now include spam filtering algorithms that automatically analyze incoming emails and flag suspicious ones.
With the rise of social media and messaging apps, spam has followed the users and spread to these platforms. This spread has made spam even more difficult to detect, as it can now come in different forms such as direct messages, post comments, or stories. Additionally, spammers have also become more sophisticated in their tactics, making it harder to track spam campaigns. On the other hand, artificial intelligence has also become more powerful, boasting human-like performance for different tasks such as object detection, time series forecasting, or question answering.
Recent advances in AI make it possible to automatically analyze post comments and flag them before users can see them. Transformer models such as the one behind ChatGPT can learn from large amounts of text and identify the patterns behind spam comments. These models can capture the meaning, the context, and the intent behind each comment, which allows them to identify even sophisticated spam comments. Because of this, community managers can take advantage of text processing AI and moderate unwanted comments without spending hours reviewing the comment section of each post.
In this case study, we describe a web service for automated moderation of Instagram spam comments. We will explore the different strategies that spammers use to evade detection algorithms, show how AI algorithms can successfully learn these tricks, and explain how this information can be easily handled using a web dashboard.
Comment spam
One of the first steps of every machine learning journey is gathering data to train and evaluate the analysis algorithms. When trying to programmatically manage Instagram comments, we are faced with three main alternatives: consuming the official Graph API, scraping the web application, or sending requests to the Private API.
Instagram, like other Meta products, provides an official API called the Graph API. This API enables managing an account for which you have obtained an access token; however, this approach is too restrictive for fighting spam effectively. The main shortcoming of the Graph API is that developers need permission from the creator of a post to download its comments. This permission comes in the form of “An access token from a User who created the IG Media object, with the following permissions: instagram_basic, pages_show_list, pages_read_engagement”. This restriction shouldn't be a problem when moderating the account of a client that wants to use our service, but it becomes a real issue for tracking spam campaigns that target other users. Since spam is continuously evolving, spam filters are usually built from spam messages targeting any user on a platform. With the Graph API, we would have to ask permission from each user whose comments we want to read, so it is not a good candidate for publicly monitoring spam.
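For a client account that has granted the permissions quoted above, reading comments comes down to a request against the comments edge of a media object. The following is a minimal sketch that only builds the request URL. The helper name `comments_url`, the example media ID, the token, and the `v19.0` version string are illustrative assumptions, and the list of fields should be checked against the current Graph API reference before use.

```python
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com"  # Graph API host


def comments_url(media_id: str, access_token: str, api_version: str = "v19.0") -> str:
    """Build the URL for the comments edge of an IG Media object.

    api_version is an assumed default; use whichever version your app targets.
    """
    query = urlencode({
        "fields": "id,text,username,timestamp",  # fields requested for each comment
        "access_token": access_token,            # token of the user who owns the media
    })
    return f"{GRAPH_BASE}/{api_version}/{media_id}/comments?{query}"


# Placeholder values, not real identifiers.
print(comments_url("17900000000000000", "EXAMPLE_TOKEN"))
```

The resulting URL can be fetched with any HTTP client. The request only succeeds when the token belongs to the user who created the media object, which is the restriction discussed above.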
Another common option for retrieving social media data is web scraping. Browser automation tools such as Selenium can be used to load websites and interact with them, which enables crawling social networks such as Instagram by programmatically handling web forms. However, web scraping is a less reliable method for retrieving social media data, as Instagram can make small tweaks to the page layout and render the scraping bot useless. One clear example of this is the set of anti-adblock measures that Facebook has implemented in recent years: these systems generate varying HTML tags to fool the filtering rules of popular ad blockers. Changes like these make web scrapers incredibly difficult to maintain, as developers must continuously update the bot to keep up with the platform’s tweaks.
A third option is using the API that the website and mobile applications consume. Instagram's Private API is a set of endpoints that are not publicly documented, but that developers can use to retrieve data from the social network. This Private API is the same API that the web and mobile apps consume to get stories, posts, and comments, so it seems like a good candidate. There's one issue though: using this API directly goes against Instagram’s terms of service, and Instagram may block developers who try to use it.
With this information, we are left with no clear solution since the official method for accessing comment information is too restrictive, and the ones that can be used to mass download comments may get us banned. A good approach may be to use the Private API to build an initial dataset of spam comments and monitor spam campaigns, and manage individual client accounts using the official Graph API.
For this project we managed to download over 20,000 comments, of which around 10% were spam. After taking a closer look at the downloaded data, we can group spam comments into four main categories: 1) Sex spam, 2) Financial spam, 3) Spiritual spam, 4) Health spam.
Sex spam, also known as adult spam, is a type of spam related to sexual content and adult services. Spammers often use fake profiles with photos of attractive individuals to lure users into visiting their sites or sharing personal information. In order to gain the attention of their victims, spammers often post suggestive comments on popular posts, which may describe what they are supposedly wearing, invite users to engage with their services, or include playful emojis such as these: 🍆 💦 🍑 👅. This type of spam is one of the most common on Instagram, since profile pictures are easy to steal and reuse to create new fake accounts that lure new victims.
Financial spam is another common type of spam that involves luring users into engaging with supposed financial growth opportunities. These opportunities usually come in the form of investment strategies, cash giveaways, or crypto scams. In order to attract victims, scammers often post fake success stories in the comment section of popular posts and then, when the user visits their profile, they are presented with a link to a fraudulent website that asks for money. The accounts that share the financial opportunity often have “crypto”, “fx”, or a similar term in their usernames, which can be used as an indicator of fraudulent activity.
Spiritual spam
Health spam (fat loss viagra...)
Each of these main spam types
We manually labeled the spam comments using a low threshold for the spam label. Comments that looked like they could have malicious intent were also labeled as spam, even when we couldn’t be 100% sure without engaging with the other account.
Artificial intelligence to fight comment spam
Traditionally, there have been three anti-spam strategies to mitigate unwanted content: prevention, demotion, and detection. Prevention systems make it hard for spammers to post new content by using CAPTCHA, imposing rate limits, or blocking suspicious behavior. Demotion strategies, on the other hand, promote content based on certain quality metrics and reduce the visibility of lower quality content; this way spam is less likely to reach users. Finally, detection-based approaches identify spam, either by automated identification or by manual reporting, and remove it. Since we cannot change the order of the comments in a post, and cannot detect suspicious behavior outside our posts, the detection-based approach seems like the logical path to follow.
Spam detection has long been tackled using machine learning techniques. Most approaches, however, rely on traditional techniques such as computing feature vectors with the bag-of-words algorithm and feeding them to simple classifiers such as Naive Bayes. The Naive Bayes approach is quite “naive” since it only considers word occurrence, ignoring semantics, context, and other important information. It works as follows: the initial spam detection dataset is tokenized, and the frequency of each word is computed separately for spam and non-spam comments. Knowing the relative frequency (probability) of each word within each class and the overall frequency of spam comments, the technique assumes that words are independent of each other and computes the spam likelihood of a comment as the product of the individual probabilities of observing each of its words in a spam comment, weighted by the prior probability of spam.
Naive bayes equation image
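To make the baseline concrete, the following is a minimal bag-of-words Naive Bayes sketch in plain Python. It is not the implementation evaluated in this case study: the toy comments are invented for illustration, tokenization is a simple whitespace split, and add-one (Laplace) smoothing and log-probabilities are added as assumptions to keep unseen words and long products from breaking the computation.

```python
import math
from collections import Counter


def train(comments):
    """comments: list of (text, is_spam) pairs. Returns the counts the classifier needs."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_spam in comments:
        doc_counts[is_spam] += 1
        word_counts[is_spam].update(text.lower().split())
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def spam_probability(text, model):
    """P(spam | text) under the word-independence assumption."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in (True, False):
        score = math.log(doc_counts[label] / total_docs)  # log prior of the class
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # log P(word | class) with add-one smoothing
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Convert the two log scores into a normalized probability.
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    return exp[True] / (exp[True] + exp[False])


# Invented toy data, for illustration only.
toy = [
    ("earn money fast with crypto", True),
    ("dm me for free crypto signals", True),
    ("love this photo", False),
    ("great shot where was this taken", False),
    ("this looks so fun", False),
]
model = train(toy)
print(spam_probability("free crypto money", model))
print(spam_probability("love this shot", model))
```

Because every word contributes independently, word order and context play no role in the score.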
Traditional approaches tend to be overly simplistic and can be easily evaded by spammers. Common tricks include using contractions, slang, or misspelled words that are underrepresented in, or entirely absent from, the training dataset of spam filters. Spammers can also rely on emojis and text-like Unicode symbols to evade filters. Bag-of-words approaches also lack intent understanding, and often cannot tell whether a user is tagged because the comment is offering a scam service or for a legitimate reason.
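The letterlike trick is easy to reproduce. In the sketch below, the word "free" written with mathematical bold letters or with regional-indicator emoji does not match its plain ASCII spelling, so a word-based filter never sees the token it was trained on. One possible way to fold such characters back, shown here as an assumption rather than as the preprocessing used in this project, is Unicode NFKC normalization plus a small hand-written mapping for the regional indicators. The function name `fold_to_ascii_letters` is hypothetical.

```python
import unicodedata

# Regional indicator symbols (the "flag letter" emoji) occupy U+1F1E6..U+1F1FF.
REGIONAL_INDICATOR_A = 0x1F1E6


def fold_to_ascii_letters(text: str) -> str:
    """Fold styled letters to plain ones (hypothetical helper)."""
    # NFKC replaces compatibility characters, such as mathematical bold or
    # italic letters, with their plain equivalents.
    text = unicodedata.normalize("NFKC", text)
    folded = []
    for ch in text:
        offset = ord(ch) - REGIONAL_INDICATOR_A
        if 0 <= offset < 26:
            # Regional indicators have no compatibility mapping, so map them by hand.
            folded.append(chr(ord("A") + offset))
        else:
            folded.append(ch)
    return "".join(folded)


bold = "\U0001D41F\U0001D42B\U0001D41E\U0001D41E"   # "free" in mathematical bold
flags = "\U0001F1EB\U0001F1F7\U0001F1EA\U0001F1EA"  # "FREE" in regional indicators

print(bold == "free")  # the styled spelling is a different string
print(fold_to_ascii_letters(bold), fold_to_ascii_letters(flags).lower())
```

A mapping like this only covers the character ranges it lists, so new letterlike alphabets would still need to be added by hand.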
In order to avoid these issues, we proposed using a character-level transformer for spam classification. Transformer networks were proposed in the paper “Attention is all you need” in 2017 and have been achieving state-of-the-art results in text, image, and audio processing benchmarks ever since. This model is built around attention, a way of creating word embeddings that convey contextual meaning by using a mathematical implementation of the key-value paradigm of information retrieval systems. Specifically, attention can be described as a function that maps a query and a set of key-value pairs to an output. Here, queries, keys, and values are vectors resulting from passing simple word embeddings through fully connected (linear) layers. The query vectors are then multiplied by the key vectors and, after scaling and a softmax, the result is a square matrix that represents the relative attention between each pair of words in a sentence. Finally, this square matrix is multiplied by the value vectors, resulting in a new set of embeddings that convey contextual information.
Additionally, instead of computing embeddings from word tokens, we decided to use Unicode characters in order to better mitigate emoji and letterlike attacks. To achieve this we used the CANINE model, an architecture that uses strided convolutions to downsample the input sequence so that it has a manageable length. Finally, in order to reduce the variability of the comments, we replaced bold, italic, and emoji letters with their ASCII counterparts.
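The attention computation described above can be sketched in a few lines of plain Python. This is a single-head toy version under simplifying assumptions: the learned projections that produce queries, keys, and values are omitted, so the same invented embeddings are reused for all three, and the helper names are illustrative rather than taken from any library.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    scores = matmul(queries, transpose(keys))  # one score per pair of tokens
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, values), weights


# Three toy token embeddings of dimension 2 (invented values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual, weights = attention(x, x, x)

for row in weights:
    print([round(w, 3) for w in row])  # each row sums to 1
```

Each output embedding is a weighted average of the value vectors, with weights given by how strongly its query matches every key.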
During training, we had to take a few extra steps to ensure successful results. Luckily for Instagram users, spam comments are significantly less frequent than regular comments; however, this means that the dataset is quite imbalanced. In order to handle that imbalance, spam comments in the training set were oversampled to ensure a 50/50 balance. Additionally, we used imbalance-robust metrics such as precision, recall, and the F1 score to track validation results.
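Random oversampling of the minority class can be sketched as follows. This is one common way to reach a 50/50 balance, assuming duplication with replacement. The function name, the fixed seed, and the toy data are illustrative and not taken from the project.

```python
import random


def oversample_minority(samples, seed=0):
    """samples: list of (text, is_spam) pairs.

    Duplicates randomly chosen minority-class items until both classes
    have the same number of samples.
    """
    rng = random.Random(seed)
    spam = [s for s in samples if s[1]]
    ham = [s for s in samples if not s[1]]
    minority, majority = (spam, ham) if len(spam) < len(ham) else (ham, spam)
    if not minority:
        raise ValueError("cannot oversample an empty class")
    extra = rng.choices(minority, k=len(majority) - len(minority))
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced


# Invented toy training split: 9 regular comments and 1 spam comment.
train_split = [(f"regular comment {i}", False) for i in range(9)]
train_split.append(("free crypto giveaway", True))

balanced = oversample_minority(train_split)
print(sum(1 for _, is_spam in balanced if is_spam), "spam of", len(balanced))
```

Only the training split should be resampled. The validation split keeps its original class ratio so that the reported metrics reflect real traffic.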
MLFlow results screen capture
With this, we managed to build a model that achieves an F1 score of 91%, with a precision of 92% and a recall of 89%. These metrics were computed using the spam class as the target, meaning that 89% of the spam comments were detected.
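For reference, these three metrics can be computed from the predictions on a validation set as in the sketch below, with spam as the positive class. The labels are a tiny invented example and do not reproduce the figures above.

```python
def precision_recall_f1(y_true, y_pred, positive=True):
    """Precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # spam caught
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # spam missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented labels: True means spam.
y_true = [True, True, True, False, False, False, False, False]
y_pred = [True, True, False, True, False, False, False, False]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision answers how many flagged comments were actually spam, while recall answers how many spam comments were flagged. F1 is their harmonic mean.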
Moderation dashboard
In order to visualize and manage the detected spam, we built a web dashboard. This dashboard shows the spam trend over the last 30 days, the percentage of comments labeled as spam, and one card for each post. When clicking on a post, the moderator will see the post media and the comments sorted by the spam score; here, the moderator can delete individual comments, or delete all comments above a certain spam score.
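The two moderator actions on the post view, ordering by score and bulk deletion above a threshold, reduce to a sort and a filter. The sketch below is a simplified illustration. The `ScoredComment` type, its field names, and the example scores are hypothetical and not the dashboard's actual data model.

```python
from dataclasses import dataclass


@dataclass
class ScoredComment:
    comment_id: str
    text: str
    spam_score: float  # classifier output, assumed to lie in [0, 1]


def sort_for_review(comments):
    """Most suspicious comments first, as shown in the post view."""
    return sorted(comments, key=lambda c: c.spam_score, reverse=True)


def select_for_bulk_delete(comments, threshold):
    """IDs of all comments whose spam score is above the chosen threshold."""
    return [c.comment_id for c in comments if c.spam_score > threshold]


# Invented example comments and scores.
comments = [
    ScoredComment("c1", "love this photo", 0.02),
    ScoredComment("c2", "earn money fast, check my profile", 0.97),
    ScoredComment("c3", "where was this taken?", 0.10),
    ScoredComment("c4", "dm me for free signals", 0.88),
]

print([c.comment_id for c in sort_for_review(comments)])
print(select_for_bulk_delete(comments, threshold=0.9))
```

Keeping the threshold as a moderator-chosen parameter lets each team decide how aggressive bulk deletion should be.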
The moderation dashboard frontend was created using Next.js, Tailwind CSS, and ECharts. Next.js offers some interesting features out of the box, such as routing, server-side rendering, and image optimization. While some developers find Next.js to be too opinionated, we usually prefer it over plain React as it reduces boilerplate code and helps us ship web apps faster. Another technology that we found very useful for saving development time is Tailwind CSS; this framework enables developers to create unique user interfaces without the hassle of writing individual CSS rules. Finally, the spam trend and pie charts were created using ECharts; this charting library strikes a nice balance between flexibility and ease of development, while offering a wide range of chart types to choose from.
Dashboard image
The backend, on the other hand, was developed using FastAPI, SQLAlchemy, and PostgreSQL. Since the comment processing is performed in Python, we decided to use this language as the main one for building the backend. There have been many popular Python web frameworks over the last decade, such as Django, Flask, or FastAPI; however, we found that FastAPI offers a great developer experience when focusing on building REST APIs. Flask, and especially Django, offer a more “batteries included” approach aimed at building complete web apps, whereas we are focused on just creating an API for accessing the comment data. Additionally, we decided to use the SQLAlchemy ORM with PostgreSQL since we found that relational models are better supported with Pydantic, and we had a clear relational design from the start.
Architecture / tech stack image
Conclusion
This case study featured a web application for automated moderation of Instagram comments using AI. Transformers have proven to be an effective solution for identifying spam comments at scale, without relying on cumbersome manual reporting. By leveraging advanced AI algorithms and providing smart analytics in an easy-to-use web application, the moderation team can save valuable time while safeguarding their community. Overall, the success of this web app for automated spam moderation showcases the potential of AI for solving real-world problems; if you believe you could benefit from a similar application, don’t hesitate to contact us.