The terrifying reality of GitHub's fake star economy



On GitHub, users can bookmark repositories by 'starring' them. The number of stars is seen as an indicator of a repository's popularity, and investors sometimes use this as a criterion for selecting investment targets. However, the industry is concerned about the proliferation of businesses that artificially inflate these star counts.

Inside GitHub's Fake Star Economy | Awesome Agents

https://awesomeagents.ai/news/github-fake-stars-investigation/

GitHub describes stars as 'making it easier to find repositories and topics later,' and adds that 'starring a repository is also a way of showing appreciation to the maintainer for their work.' Repositories that receive many stars are also displayed in popularity rankings, so users sometimes decide whether or not to use a repository based on the number of stars.

Some investors also use the number of stars as a criterion for selecting investment targets. According to investor and analyst Jordan Segal, projects that have reached the 'seed round,' the early stage of fundraising, have a median of 2,850 stars.

AI journalist Elena Marchetti pointed out that there are businesses that artificially inflate the number of stars, stating, 'In seed rounds, you can typically raise over $1 million (approximately 160 million yen). Considering that 2,850 stars can be purchased for a maximum of $300 (approximately 48,000 yen), this generates an incredible return on investment.'
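The arithmetic behind that claim can be sketched in a few lines. This is only an illustration: the three figures come from the article, the function and variable names are made up for this example, and treating the entire amount raised as the "return" is a deliberate simplification.

```python
# Figures quoted in the article. All names here are illustrative assumptions.
SEED_RAISE_USD = 1_000_000   # "over $1 million" for a typical seed round
MEDIAN_SEED_STARS = 2_850    # median star count at the seed stage
MAX_COST_USD = 300           # quoted maximum price for that many stars


def cost_per_star(total_cost: float, stars: int) -> float:
    """Average price paid per purchased star."""
    return total_cost / stars


def return_multiple(raised: float, spent: float) -> float:
    """How many dollars are raised per dollar spent on stars (simplified)."""
    return raised / spent


print(round(cost_per_star(MAX_COST_USD, MEDIAN_SEED_STARS), 3))  # 0.105
print(round(return_multiple(SEED_RAISE_USD, MAX_COST_USD)))      # 3333
```

In other words, at the quoted prices each star costs roughly 10 cents, and the amount raised is on the order of 3,000 times the amount spent.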

The conclusive evidence of star inflation came from a study by Carnegie Mellon University and North Carolina State University. The research team analyzed data from 2019 to 2024 and found that 15,835 repositories had artificially inflated star counts, amounting to a total of 4.5 million fake stars.

The practice of artificially inflating stars on GitHub is rampant, and a study suggests that approximately 70% of the inflated repositories are linked to malware - GIGAZINE



Marchetti also independently analyzed 20 repositories and shared how to spot signs of inflated star counts.

According to journalist Marchetti, repositories unaffected by the star inflation scheme tend to be starred by developers who have been using GitHub for many years, have their own projects, and follow other users. In these repositories, 'ghost accounts'—those with zero projects, zero followers, and no self-introduction—account for only about 1% of the total stars.

On the other hand, repositories suspected of inflating their star counts tend to have a higher proportion of ghost accounts. Among the repositories Marchetti investigated, one particularly typical example had a ghost account ratio of 28.7%.

These accounts were not newly created: their median age was over 1,000 days. Their profiles, however, were hollow. One-third had zero original repositories, one-half to four-fifths had zero followers, and one-quarter were complete ghost accounts. Marchetti reports, 'These are old accounts that were bought or cultivated to inflate star numbers.'
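A ghost-account check along these lines might look like the sketch below. It is not Marchetti's actual method: it assumes each stargazer is represented as a dictionary using the `public_repos`, `followers`, and `bio` fields that GitHub's REST API returns for a user, and it maps the article's "zero projects" and "no self-introduction" onto those fields. The sample data is invented for illustration.

```python
def is_ghost(account: dict) -> bool:
    """True when the account has no repositories, no followers, and no bio.

    Assumes field names matching GitHub's user object; the mapping to the
    article's definition of a ghost account is an assumption.
    """
    return (
        account.get("public_repos", 0) == 0
        and account.get("followers", 0) == 0
        and not (account.get("bio") or "").strip()
    )


def ghost_ratio(stargazers: list) -> float:
    """Fraction of stargazers that look like ghost accounts."""
    if not stargazers:
        return 0.0
    return sum(is_ghost(a) for a in stargazers) / len(stargazers)


# Invented sample data, not real accounts.
sample = [
    {"login": "a", "public_repos": 12, "followers": 40, "bio": "Backend dev"},
    {"login": "b", "public_repos": 0, "followers": 0, "bio": None},
    {"login": "c", "public_repos": 0, "followers": 3, "bio": ""},
    {"login": "d", "public_repos": 0, "followers": 0, "bio": "  "},
]

print(f"{ghost_ratio(sample):.1%}")  # 50.0%
```

By the figures reported above, a ratio near 1% would be in line with organically starred repositories, while a ratio approaching 28.7% would resemble the suspicious example.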



Another strong signal is the ratio of forks to stars. For example, the 'Flask' repository, which had no suspected inflated star counts, had 235 forks per 1,000 stars, while 'FreeDomain,' which had 28% ghost accounts, only had 17 forks per 1,000 stars. Marchetti pointed out, 'It's strange that a repository with 157,000 stars has hardly been forked.'

Similarly, the ratio of watchers to stars is also a useful indicator. FreeDomain has only one watcher per 1,000 stars, whereas Flask has 29.
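Both signals reduce to the same normalization: a count divided by stars, scaled to 1,000. The sketch below shows that calculation and compares FreeDomain with Flask using only the per-1,000-star rates quoted in the article. The function names are hypothetical, and using Flask as the baseline is an assumption made for illustration, not a threshold proposed by Marchetti.

```python
def per_thousand_stars(count: int, stars: int) -> float:
    """Normalize a fork or watcher count to a rate per 1,000 stars."""
    if stars <= 0:
        raise ValueError("stars must be positive")
    return 1000 * count / stars


def share_of_baseline(repo_rate: float, baseline_rate: float) -> float:
    """Express a repository's rate as a fraction of a trusted baseline's rate."""
    return repo_rate / baseline_rate


# Per-1,000-star rates as reported in the article.
FLASK = {"forks": 235, "watchers": 29}
FREEDOMAIN = {"forks": 17, "watchers": 1}

for metric in ("forks", "watchers"):
    share = share_of_baseline(FREEDOMAIN[metric], FLASK[metric])
    print(f"{metric}: {share:.0%} of the Flask rate")
# forks: 7% of the Flask rate
# watchers: 3% of the Flask rate

# Hypothetical raw counts, to show the normalization itself.
print(per_thousand_stars(50, 2000))  # 25.0
```

A repository whose fork and watcher rates are a small fraction of those of a comparable, organically popular project is worth a closer look, though the appropriate baseline will vary by project type.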

Marchetti reports that at least 12 websites are selling GitHub stars. These sites offer a variety of products, including 'newly created accounts' and 'accounts that have been created for several years and have a contribution history,' with some selling for as much as $0.90 (approximately 140 yen) per star.

While download numbers are sometimes seen as an indicator of popularity in addition to the number of stars, developer Andy Richardson demonstrated that he could boost his repository's download count to nearly 1 million in a week for free, suggesting that downloads are not a reliable indicator either.



In the United States, the buying and selling of influence generated by bots and fake accounts is prohibited by law, and Marchetti points out that 'GitHub stars also fall under this framework.' Marchetti argues that if a startup buys fake GitHub stars while raising funds, and investors rely on those metrics when investing their money, wire fraud statutes should apply.

While GitHub's terms of service prohibit fake stars, Marchetti criticized the fact that enforcement is reactive and that GitHub publishes no transparency reports on star manipulation. Marchetti urged people not to trust star counts blindly and to verify their reliability.

in Note, Software, Posted by log1p_kr