An Empirical Analysis of Typosquatted Models and Datasets in Open ML Repositories

Open machine learning repositories such as the Hugging Face ecosystem play a central role in enabling the reuse and rapid deployment of pre-trained models and datasets. However, their openness also introduces software supply-chain security risks. Recent research, including the paper “Exploring Typosquatting Threats in the Hugging Face Ecosystem”, has demonstrated that typosquatting, i.e., the publication of deceptively named artifacts that resemble popular resources, already occurs in this ecosystem. While this prior work highlights the prevalence of suspicious models and datasets, it remains unclear what these typosquatted artifacts actually do in practice and whether they pose concrete functional, security, or privacy risks to users.

The goal of this thesis is to address this gap through an empirical, comparative analysis of typosquatted artifacts and their original counterparts within the Hugging Face ecosystem.
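
To make the notion of a "deceptively named" artifact concrete, the following minimal sketch flags repository names that lie within a small edit distance of a popular name. It is an illustration only and not the detection method of the cited paper: the use of Levenshtein distance, the threshold `max_distance`, the function names, and the repository names in the usage example are all assumptions introduced here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def typosquat_candidates(popular, observed, max_distance=2):
    """Return (observed_name, popular_name, distance) for every observed
    name that is close to, but not identical to, a popular name.
    The threshold is a hypothetical choice, not an established value."""
    hits = []
    for name in observed:
        for target in popular:
            d = edit_distance(name.lower(), target.lower())
            if 0 < d <= max_distance:
                hits.append((name, target, d))
    return hits


# Hypothetical repository names, used purely for illustration.
popular = ["example-org/sentiment-model"]
observed = [
    "example-org/sentiment-model",   # the original itself, ignored
    "exarnple-org/sentiment-model",  # "rn" imitating "m"
    "other-org/unrelated-dataset",
]
candidates = typosquat_candidates(popular, observed)
```

A name-similarity filter of this kind can only produce candidates. Whether a flagged artifact is harmful, a benign fork, or a coincidence is exactly the question that the comparative analysis of contents and behavior is meant to answer.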