Paper accepted at MSR 2024 conference

Feb 16, 2024

We are thrilled to announce that our latest research paper, "Analyzing the Evolution and Maintenance of ML Models on Hugging Face," has been accepted for presentation at the International Conference on Mining Software Repositories (MSR 2024). The paper examines how ML models on the Hugging Face (HF) platform evolve and how they are maintained, offering insights for the ML community's future.

Authors: Joel Castaño Fernández, Silverio Martínez Fernández, Xavier Franch, and Justus Bogner.

Our research centered around two pivotal questions:

1. What is the current status and evolution of the HF community?
2. How can we evaluate and categorize the maintenance status of ML models on HF through their commit information?

Our findings revealed significant trends:

1. 'Transformers' & 'PyTorch' stand out as the most used frameworks, with 'PyTorch' retaining its dominance against other frameworks such as 'TensorFlow' or 'JAX'. A keen focus on Generative AI & NLP underscores the community's evolving interests.
2. A notable concentration of model popularity within a select group of authors emphasizes the power of collaboration in propelling ML innovation.
3. Investigation into model maintenance revealed diverse, right-skewed commit patterns, predominantly shaped by automated processes, and a prevailing focus on perfective maintenance.
4. Analysis of editing stages showed that a decrease in model file edits marks a transition to stability, while edits in README.md or config.json indicate final tuning. Synchronized edits across files underscore the interconnectedness of model development.
5. By categorizing models into 'High' and 'Low' maintenance, we provide a framework for discerning model upkeep, aiding users in selecting reliable and actively supported models.
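
To make the idea of deriving a maintenance status from commit information more concrete, here is a minimal sketch. It is not the method used in the paper: the features (commit count, days since the last commit, active span) and the thresholds `min_commits` and `max_idle_days` are assumptions chosen purely for illustration, and all names are hypothetical. The commit timestamps are assumed to be already available as a list of datetimes, for example collected beforehand with the `huggingface_hub` client.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Sequence

SECONDS_PER_DAY = 86400


@dataclass
class CommitSummary:
    """Simple per-repository features derived from commit timestamps."""
    n_commits: int
    days_since_last_commit: float
    active_span_days: float


def summarize_commits(commit_dates: Sequence[datetime], now: datetime) -> CommitSummary:
    """Reduce a list of commit timestamps to a few summary features."""
    if not commit_dates:
        raise ValueError("a repository needs at least one commit")
    first, last = min(commit_dates), max(commit_dates)
    return CommitSummary(
        n_commits=len(commit_dates),
        days_since_last_commit=(now - last).total_seconds() / SECONDS_PER_DAY,
        active_span_days=(last - first).total_seconds() / SECONDS_PER_DAY,
    )


def maintenance_label(
    summary: CommitSummary,
    min_commits: int = 10,      # assumed threshold, not taken from the paper
    max_idle_days: float = 180,  # assumed threshold, not taken from the paper
) -> str:
    """Label a repository 'High' or 'Low' maintenance with a toy rule."""
    is_high = (
        summary.n_commits >= min_commits
        and summary.days_since_last_commit <= max_idle_days
    )
    return "High" if is_high else "Low"


if __name__ == "__main__":
    now = datetime(2024, 2, 16, tzinfo=timezone.utc)
    # Hypothetical repository with weekly commits up to yesterday.
    weekly = [now - timedelta(days=1 + 7 * i) for i in range(12)]
    print(maintenance_label(summarize_commits(weekly, now)))
```

A fixed threshold rule like this is only a starting point; the paper's own categorization is described in the linked preprint and replication package.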

This study not only sheds light on HF but also offers valuable lessons for the broader ML community on maintenance frameworks and the importance of collaborative development. We advocate for structured, transparent practices to enhance the field.

We're eager to discuss the evolution and maintenance of ML models and their impact on the AI community. Let's drive the conversation on effective ML development practices forward.

Paper: https://arxiv.org/abs/2311.13380
Data & Code: https://zenodo.org/records/10652101