Twitter Bot Investigation
by Daryan Hanshew | Jun 2022
I started to dive into the shit show surrounding claims that Twitter misled Texans about the number of bots on the platform. If you want to read all the details, here’s the report. However, after reading through Exhibit C, I’m uncertain how successful this attempt will be, depending on which details can actually be released.
Note: If you look at the URL of that document, you’ll notice the PDF is placed under an images directory, which I thought was funny.
Twitter reported that fewer than 5% of users on the platform are fake accounts. This investigation was prompted by Elon Musk making the removal of spam bots his top priority. However, the real driver is advertisers’ uncertainty about their ad spending. Understandably, bots don’t represent potential customers, which makes advertisers hesitant about future spending.
Side Note: It’s incredibly easy to build a bot on Twitter; I created a page that runs on a cron job within a day.
Twitter’s figure prompted an immediate investigation by Texas Attorney General Ken Paxton, who is challenging the validity of the reported findings. Under the Texas Deceptive Trade Practices-Consumer Protection Act, Twitter will be required to disclose all information listed in Exhibit C of the report by June 27th, 2022. Twitter came under massive scrutiny after releasing the 5% figure, when the real number could be as high as 20%. This number is extremely important to businesses that use advertising to bring in new customers, since it directly influences the cost of advertising. Paxton’s statement summarizes why this is taking place: “if Twitter is misrepresenting how many accounts are fake to drive up their revenue, I have a duty to protect Texans”.
Side Note: Paxton has a controversial past; however, this article won’t be a classic Hanshew rant on that, but will instead review the information provided in the report.
Reviewing Exhibit C
Now I can get into my opinion of the information Paxton is requesting. Exhibit C requests documents for 23 different categories of information, most of which involve account counts for particular types of users.
Some of these are pretty straightforward numbers that can be provided, but let’s get into the nuanced ones that will be difficult to prove anything with.
The first piece is users connecting from Texas, a number whose accuracy is difficult to prove.
5. Documents sufficient to show the number of monthly active users of Twitter in Texas for each month from 2017 to the present.
6. Documents sufficient to show the highest number of daily active users of Twitter in Texas for each month from 2017 to the present.
7. Documents sufficient to show Twitter’s “monetizable daily active users” in Texas for each month from 2017 to the present.
While it’s very feasible to collect the IP addresses of all connections and geolocate them down to a single state, I doubt this number will mean much. The popularity of VPNs makes it extremely difficult to get an accurate count. I can’t imagine it’s a huge swing, but it may be significant enough to make the number invalid. The best you could do is identify IP addresses belonging to specific VPN providers and drop them from the list to produce a more accurate number, but that reduces the data (possibly significantly).
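To make the VPN-filtering idea concrete, here’s a minimal Python sketch, assuming you already have (user, IP) connection pairs and some kind of geolocation lookup. The VPN ranges below are RFC 5737 documentation blocks standing in for a real provider list, and `count_by_state` and the `fake_geo` table are hypothetical names for illustration, not anything Twitter actually uses. The `dropped` counter is there to show the trade-off: every connection thrown out for coming from a VPN shrinks the data set.

```python
import ipaddress
from collections import Counter

# Hypothetical VPN provider ranges. A real list would come from a maintained
# source; these are RFC 5737 documentation blocks used purely for illustration.
VPN_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_vpn(ip: str) -> bool:
    """Return True if the address falls inside any known VPN range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in VPN_RANGES)


def count_by_state(connections, geolocate):
    """Count unique users per state, skipping connections from VPN ranges.

    connections: iterable of (user_id, ip) pairs
    geolocate: hypothetical callable mapping an IP string to a state code or None
    Returns (Counter of unique users per state, number of dropped connections).
    """
    users_by_state = {}
    dropped = 0
    for user_id, ip in connections:
        if is_vpn(ip):
            dropped += 1  # data we lose by filtering VPN traffic
            continue
        state = geolocate(ip)
        if state is not None:
            users_by_state.setdefault(state, set()).add(user_id)
    counts = Counter({state: len(users) for state, users in users_by_state.items()})
    return counts, dropped


# Hypothetical lookup table standing in for a real geolocation database.
fake_geo = {"192.0.2.10": "TX", "192.0.2.20": "TX", "192.0.2.30": "CA"}
conns = [
    ("alice", "192.0.2.10"),
    ("alice", "192.0.2.10"),    # repeat visit, counted once
    ("bob", "192.0.2.20"),
    ("carol", "198.51.100.7"),  # inside a VPN range, dropped
    ("dave", "192.0.2.30"),
]
counts, dropped = count_by_state(conns, fake_geo.get)
print(counts["TX"], dropped)  # 2 1
```

Carol might well be a Texan on a VPN, which is exactly the accuracy problem: filtering her out makes the remaining number cleaner but smaller.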
The second piece (I don’t want to copy half of the document here) is mostly around proving that users are indeed bots.
13. Documents that contradict Your public statements that fewer than 5% of “false or spam accounts” are included in Your “monetizable daily active users” metric.
14. Documents evidencing, showing, or tending to show that more than 5% of Your users are Inauthentic Twitter Accounts.
While the 14th item seems a bit like bait, proving that a user is a bot is actually incredibly difficult at high volume. Having worked with threat intelligence professionals, I can say that programmatically identifying threat actors precisely enough to verify a 5% figure is insanely difficult. Sure, you could find common patterns that eliminate obvious threat actors, but only so much can be known. On top of that, methods change, which can make bots more sophisticated with each iteration. I don’t know whether this difficulty is good for Twitter, since they’ve put in the most effort, or bad, because I’m certain you could find discrepancies between the bot accounts Twitter identifies and what others can find.
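Here’s a rough illustration of the “common patterns” approach and why it only catches the obvious cases. It’s a minimal Python sketch with made-up features and thresholds: `age_days`, `tweets_per_day`, the follower ratio, the default avatar flag, and the cutoff of 3 are all my assumptions, since Twitter’s real signals aren’t public. The second account is the same hypothetical operator’s next iteration, and it slips under the threshold just by aging the account, throttling its posting, and uploading an avatar.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical features; Twitter's real detection signals are not public.
    handle: str
    age_days: int
    tweets_per_day: float
    followers: int
    following: int
    default_avatar: bool


def bot_score(acct: Account) -> int:
    """Count how many simple red-flag heuristics an account trips (0 to 4)."""
    score = 0
    if acct.age_days < 30:            # brand-new account
        score += 1
    if acct.tweets_per_day > 100:     # inhuman posting rate
        score += 1
    if acct.following > 0 and acct.followers / acct.following < 0.01:
        score += 1                    # follows thousands, followed by almost nobody
    if acct.default_avatar:           # never customized the profile
        score += 1
    return score


def flag_obvious_bots(accounts, threshold=3):
    """Return handles whose score meets the (hypothetical) threshold."""
    return [a.handle for a in accounts if bot_score(a) >= threshold]


accounts = [
    Account("crude_bot", 5, 400.0, 2, 4000, True),
    # Same operator, next iteration: aged account, throttled posting, custom avatar.
    Account("tuned_bot", 400, 40.0, 2, 4000, False),
    Account("regular_user", 2000, 3.0, 150, 300, False),
]
print(flag_obvious_bots(accounts))  # ['crude_bot']
```

Lowering the threshold to 1 would catch `tuned_bot` too, but on real data it would also start flagging plenty of legitimate quiet accounts, which is the trade-off that makes any single percentage hard to defend.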
One thing I’d be interested to see is the number of click-farm users Twitter could identify as well. Since these are technically real users who are paid to do specific tasks, I wonder where they fall in terms of business advertising.
The third and final piece I’ll look at concerns the placement of advertisements.
20. Documents sufficient to show the amount of revenue generated by Twitter (by year) for the placement of advertisements targeting Persons in Texas on Twitter using Twitter Ads since 2017.
21. Documents sufficient to show the number of Texas advertisers in 2021 and 2022 to date (by month) that placed advertisements on Twitter’s social network platform using Twitter Ads.
22. Documents sufficient to show the number of advertisements (by month) placed by Texas Advertisers on Twitter’s social network platform using Twitter Ads.
This could be useful information, but I don’t see how comparing the total number of advertisements placed on a site is very helpful for an individual business. I would have requested the number of times each business’s advertisements are placed on the site. That way you could see, based on how much a business pays, what its advertisement numbers look like.
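To show the kind of per-business breakdown I mean, here’s a minimal Python sketch, assuming hypothetical records of (advertiser, amount paid, number of placements); the advertiser names and figures are invented. It totals spend and placements per advertiser and derives a cost per 1,000 placements, similar in spirit to the standard CPM (cost per mille) metric used in advertising.

```python
from collections import defaultdict

# Hypothetical ad-placement records: (advertiser, amount_paid_usd, placements).
placements = [
    ("Austin Coffee Co", 500.0, 20000),
    ("Austin Coffee Co", 250.0, 8000),
    ("Dallas Auto Repair", 1000.0, 25000),
]


def per_advertiser(records):
    """Aggregate spend and placements per advertiser, plus cost per 1,000 placements."""
    spend = defaultdict(float)
    shown = defaultdict(int)
    for advertiser, paid, count in records:
        spend[advertiser] += paid
        shown[advertiser] += count
    return {
        adv: {
            "spend": spend[adv],
            "placements": shown[adv],
            # Guard against advertisers who paid but got zero placements.
            "cost_per_1000": 1000 * spend[adv] / shown[adv] if shown[adv] else None,
        }
        for adv in spend
    }


for adv, stats in per_advertiser(placements).items():
    print(adv, stats["placements"], round(stats["cost_per_1000"], 2))
# Austin Coffee Co 28000 26.79
# Dallas Auto Repair 25000 40.0
```

If a meaningful share of those placements were shown to bots, each business’s effective cost per real viewer would be higher than this number suggests, which is the point of breaking it down per advertiser.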
Overall, I’d honestly like to see more from this document. Unfortunately, I don’t know what is legally viable, but going after that magical 5% of bots is an incredibly difficult task, as nobody outside Twitter has access to that much data about fake users. The nice piece is that they are requesting the documents that define what counts as a bot, so if a separate investigation found even one bot that falls outside those definitions, you could force Twitter to recalculate based on the newly discovered information.
Personally, I lean toward the view that botting is still significant on several platforms, but nowhere near as easy as before. Twitter certainly has checks and balances in place to make creating bots difficult, which makes me believe the number isn’t 5%, but also isn’t close to 20%. My guess is that, more than anything, click farms are being used.
I’d urge you to read the report yourself and decide whether enough information was requested to make anything useful of this. My guess is that this is a play to piss off social media companies, as conservatives have recently disputed censorship on several platforms. I doubt this is a greater-good effort to help save small businesses, as online advertising has been effective so far (otherwise Facebook would never have made any money). It will be interesting to see Twitter’s response, and whether any insight can be gained into how they use user data for marketing.