Sylvain Gugger    @GuggerSylvain    9/27/2021      

Tired: Looking at the training loss curve while the model is training. Wired: Testing the model on any data you want while it's still training, thanks to the inference widgets on and see it getting better!


Alex Smola    @smolix    11/30/2021      

Trn1 training instances in preview today! @awscloud Thanks team!

Aran Komatsuzaki    @arankomatsuzaki    11/30/2021      

Fine-tuning on question-answer pairs struggles with math word problems, which are better solved with training on question-explanation-answer (QEA) pairs and generating E from Q and then A from QE pair. Now, can we solve other hard tasks better by building a dataset of QEA pairs?

Clare Lyle    @clarelyle    12/6/2021      

Early stopping validation loss (e.g. after 5 epochs of a 200E training budget), correlates _OK_ with models’ true final test performance ranking, but looking at the area under their training curve gets significantly higher correlation with final performance. 📈 (2/4)
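
To make the comparison above concrete, here is a minimal sketch of ranking models by the area under their training-loss curve and checking how well that ranking agrees with final test performance. The tweet does not say exactly which curve or correlation measure was used, so the trapezoidal area over per-epoch training loss and the Spearman rank correlation are assumptions. All model names and numbers are made up for illustration.

```python
def area_under_curve(losses):
    """Trapezoidal area under a per-epoch loss curve (unit epoch spacing)."""
    return sum((a + b) / 2.0 for a, b in zip(losses, losses[1:]))


def ranks(values):
    """Rank values from 0 (smallest) upward. Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for tie-free data."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical training-loss curves (one value per epoch); not real data.
curves = {
    "model_a": [2.0, 1.5, 1.2, 1.0],
    "model_b": [2.0, 1.0, 0.8, 0.7],
    "model_c": [2.0, 1.8, 1.7, 1.6],
}
# Hypothetical final test losses for the same models; also not real data.
final_test_loss = {"model_a": 1.1, "model_b": 0.8, "model_c": 1.7}

names = sorted(curves)
auc = [area_under_curve(curves[name]) for name in names]
test = [final_test_loss[name] for name in names]

# Rank agreement between "area under the training curve" and final test loss.
rho = spearman(auc, test)
print(dict(zip(names, auc)), rho)
```

A smaller area means the model's loss dropped faster and stayed lower, so a high positive correlation with final test loss means the area is a useful ranking signal on this toy data.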

Alfredo Canziani    @alfcnz    11/24/2021      

Boltzmann machines are stochastic Hopfield nets with hidden units and can be used to learn the regularities of our data. The added noise allows the model to climb energy walls and land at wider lower minima. Restricted BMs allow us to speed up inference and training…
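
As a rough illustration of why the restricted variant speeds things up: with no hidden-to-hidden or visible-to-visible connections, all hidden units can be sampled in one pass given the visible units, and vice versa. The sketch below assumes binary units and the standard RBM energy E(v, h) = -b_v·v - b_h·h - vᵀWh. The function names, weights, and biases are hypothetical, and this is not taken from the course material the tweet refers to.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def energy(v, h, W, b_v, b_h):
    """Energy of a joint binary configuration (v, h) of a restricted BM."""
    visible_term = sum(b * x for b, x in zip(b_v, v))
    hidden_term = sum(b * x for b, x in zip(b_h, h))
    pair_term = sum(v[i] * W[i][j] * h[j]
                    for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - pair_term


def sample_hidden(v, W, b_h, rng):
    """Sample every hidden unit independently given the visible units."""
    probs = [sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(b_h))]
    return [1 if rng.random() < p else 0 for p in probs]


def sample_visible(h, W, b_v, rng):
    """Sample every visible unit independently given the hidden units."""
    probs = [sigmoid(b_v[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(b_v))]
    return [1 if rng.random() < p else 0 for p in probs]


# Hypothetical tiny model: 3 visible units, 2 hidden units, made-up weights.
W = [[0.5, -0.2], [-0.3, 0.8], [0.1, 0.4]]
b_v = [0.0, 0.1, -0.1]
b_h = [0.2, -0.2]

rng = random.Random(0)
v = [1, 0, 1]
for _ in range(5):
    # One block Gibbs step: the stochastic updates are the "added noise".
    h = sample_hidden(v, W, b_h, rng)
    v = sample_visible(h, W, b_v, rng)
print(v, h, energy(v, h, W, b_v, b_h))
```

In a general Boltzmann machine each unit would have to be resampled one at a time, because its neighbours include units of the same layer. The bipartite structure is what allows the two block updates here.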

Ross Wightman    @wightmanr    11/23/2021      

No, because the baselines. There was a paper written about that ;) They used the same code, they used aug scheme that looks roughly based on DeiT (co authors of said paper) and yet didn't quote better RN scores (with same training setup) that they should be aware of.