Co-founder & Chief Scientist at @HuggingFace – I lead the Open-Source & Science teams – 🤗Transformers & 🤗Datasets libraries – @BigScienceW research workshop


Thomas Wolf    @Thom_Wolf    11/25/2021      

I see people commenting that this is either an unfortunate reality or better than releasing nothing. I disagree. I think we can find ways to share code/datasets/models. I also tend to think industry papers without any of these are closer to press releases than to science reports.

Thomas Wolf    @Thom_Wolf    11/25/2021      

Not singling out this paper, but I’m worried the field of large-scale multimodal training has recently become a Wild Wild West of unshared training code running on large unreleased datasets to produce private models. It’s far from ideal for reproducibility and good science practices.

Thomas Wolf    @Thom_Wolf    11/24/2021      

A strange form of tunnel vision I often see in AI consists in beginning to think that humans work just like ML models, that we are in the end just RL agents or language models (depending on what one works on). The human experience is much more diverse & complex than any of these.

Thomas Wolf    @Thom_Wolf    11/18/2021      

The public supercomputer used by @BigscienceW is doubling its size 🤩 BigScience helped a lot to make this a reality, and I’m very excited about this outcome. Public compute clusters are critical to reduce the divide between industry & academic AI research.

Thomas Wolf    @Thom_Wolf    8/19/2021      

A few years ago I was mostly interested in models: creating 🤗transformers, adding BERT, GPT, T5… Over time I’ve seen my interests shift to data (sharing, evaluation, processing), leading to 🤗datasets. And I see many people around me follow a similar path. We are slowly maturing.
