Thomas Wolf    @Thom_Wolf    11/25/2021      

I see people commenting that this is either an unfortunate reality or better than releasing nothing. I disagree: I think we can find ways to share code, datasets, and models. I also tend to think industry papers without any of these are closer to press releases than science reports.
  
  Related  

Thomas Wolf    @Thom_Wolf    11/25/2021      

Not singling out this paper, but I’m worried the field of large-scale multimodal training has recently become a Wild Wild West of unshared training code running on large unreleased datasets to produce private models. It’s far from ideal for reproducibility and good science practices.
  



Hugging Face    @huggingface    10/4/2021      

The @PyTorch-based pipelines in 🤗 Transformers now support native torch datasets. GPUs were often underutilized (30-50%). They now automatically use torch's `DataLoader` when possible, leading to much better GPU utilization (90%+ on most models)! 🤯
  
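For illustration, a minimal sketch of the pattern this enables (the task, model choice, texts, and batch size here are made up, not from the tweet): passing a native torch Dataset to a pipeline, instead of a plain list, lets it stream batches through torch's DataLoader rather than processing items one by one.

    # Minimal sketch, assuming transformers and torch are installed.
    from torch.utils.data import Dataset
    from transformers import pipeline

    class TextDataset(Dataset):
        """Toy dataset; in practice this would wrap your real corpus."""
        def __init__(self, texts):
            self.texts = texts

        def __len__(self):
            return len(self.texts)

        def __getitem__(self, i):
            return self.texts[i]

    # Any GPU-capable task works; text-classification is just an example.
    pipe = pipeline("text-classification", device=0)
    dataset = TextDataset(["I love this!", "Not great."] * 100)

    # Because the input is a Dataset, the pipeline can batch via DataLoader
    # and keep the GPU busy; outputs are yielded as a generator.
    for output in pipe(dataset, batch_size=32):
        print(output)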



Thomas Wolf    @Thom_Wolf    8/19/2021      

A few years ago I was mostly interested in models, creating 🤗transformers, adding BERT, GPT, T5… Over time I’ve seen my interests shift to data (sharing, evaluation, processing), leading to 🤗datasets. And I see many people around me follow a similar path. We are slowly maturing.
  



François Chollet    @fchollet    12/4/2021      

Adoption as a research validation mechanism is in fact far preferable, because it de-incentivizes bullshitting. Objective reality is a better grounding function than what people think of your method upon reading your paper. https://t.co/nZoaGpnvrQ
  