Anthony Goldbloom    @antgoldbloom   ·   9/14/2021
Kaggle's dataset platform passed 100K datasets last month. If you're looking for datasets for your ML and data science projects, this is a great place to look.
Yee Whye Teh    @yeewhye   ·   9/6/2021
The Turing-JBC-RSS Lab led by @cholmesuk, Peter Diggle & Sylvia Richardson is looking for researchers to be seconded to join the various interesting projects on using statistics, data science and ML to help with understanding the pandemic!
Weights & Biases    @weights_biases   ·   9/8/2021
See how quickly you can view and interactively explore language datasets with W&B Tables! In this video, @metaphdor demos some queries on the GoEmotions dataset 🎭
Thomas Wolf    @Thom_Wolf   ·   9/15/2021
Starting a big project takes a lot more time & energy than people expect. I've been pushing mostly one project per year: 2019 🤗Transformers, 2020 🤗Datasets, 2021 @BigscienceW. I used to find it frustratingly slow; now I accept it. Give your projects the time they need to grow.
Michal Wolski    @michalwols   ·   8/27/2021
With all of this #FoundationModels talk, is anyone working on foundation datasets? A shared, continuously growing public dataset to rival Google's JFT would be much more valuable to the field than gigantic transformers trained on random text from Reddit.
Sasha Rush    @srush_nlp   ·   9/8/2021
New preprint: Datasets documents the Hugging Face Datasets project, now containing more than 700 NLP datasets from over 300 contributors. NLP models haven't changed much recently, but datasets, and how we use and document them, have changed a lot ...
Hugging Face    @huggingface   ·   8/27/2021
The 🤗Hugging Face Hub Python wrapper makes it super easy to work with 🤗 Hub model & dataset repos. Search for, upload, download models & datasets without leaving your python runtime! We just released v0.0.16 w/ huge QOL upgrades 👏🎉 Come contribute!
Nandan Thakur    @Nthakur20   ·   9/9/2021
🚨New 🍻BEIR preprint on Arxiv🚨 What’s 🆕? Evaluated latest SOTA🚀 reranking (MiniLM), dense (TAS-B), and sparse (DeepCT, docT5query) models. Added a 🆕 dataset: Robust04. Using hole@10, found a few datasets with annotation biases. w/ @Nils_Reimers.
Y Combinator    @ycombinator   ·   8/17/2021
Welcome to S21, Unbox! Unbox is building a collaborative QA platform for the next generation of machine learning models. They make it easy to track and version all your models and datasets, allowing the team to focus on building production-ready models.
Anna Rogers    @annargrs   ·   7/28/2021
#NLPaperAlert: QA Dataset Explosion!🔥 A survey of 200+ QA/RC datasets proposing a taxonomy of formats & reasoning skills. Also in the bag: modalities, conversational QA, domains & beyond-English data. Honored to work on this with @nlpmattg & @IAugenstein
Nils Reimers    @Nils_Reimers   ·   9/8/2021
Easy access to datasets is a big game changer: It allows anyone to train on many datasets, leading to strong & robust models. Would be great to see more people contributing datasets, as this really is the limiting factor for many Machine Learning applications.
Michael Poli    @MichaelPoli6   ·   7/26/2021
Took a while. Get in touch if you're interested in contributing to open-source for neural diff eqs and implicit models! We have a lot of other interesting projects and collaborations underway.
Benedict Evans    @benedictevans   ·   8/28/2021
I've said this before, but if you write about Theranos as a 'Silicon Valley startup that raised from VCs' you will not understand what you're looking at. The fact that it didn't follow the SV model and didn't raise from VCs was an important part of the story.
Rob Salomone    @SalomoneRob   ·   7/7/2021
New version of @PyroAi just came out, exciting! Been looking forward to teaching classes using it for implementing Normalizing Flows and Deep Latent Variable models (just a small amount of what Pyro can do!) at the @DiscoverAMSI Data Science Winter School starting next week!
Nils Reimers    @Nils_Reimers   ·   9/8/2021
📂1.2 Billion Training Pairs: Training on large datasets is essential to generalize well across domains and tasks. Previous models were trained on rather small datasets of a few 100k train pairs and had issues on specialized topics. We collected 1.2B training pairs from ...
