Tom Goldstein
@tomgoldsteincs
Associate Professor at Maryland. Interested in security and privacy in machine learning, algorithmic bias, foundations of ML, optimization, applied math.
Tweets by Tom Goldstein
Is it possible to train an object detector from scratch without ImageNet pre-training? If so, what do you lose by doing this? Any caveats to be aware of?
Shared by Tom Goldstein at 5/2/2022
Our recent #CVPR2022 paper investigates how neural net architecture impacts decision boundaries. We find that two runs with the same architecture yield very similar boundaries. Two different architectures yield radically different boundaries (yet similar test acc). https://t.co/Lt92Dtwv5k
Shared by Tom Goldstein at 3/16/2022
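A hypothetical sketch of one way to compare decision boundaries between two trained classifiers: sample the 2D plane spanned by three inputs and measure where the predicted labels agree. The function name and protocol are illustrative, not necessarily the paper's.

```python
import torch

def boundary_agreement(model_a, model_b, x0, x1, x2, n=50):
    # Sample the plane through three inputs (e.g., three training images)
    # and measure how often the two classifiers predict the same label.
    u, v = x1 - x0, x2 - x0
    ts = torch.linspace(0, 1, n)
    grid = torch.stack([x0 + a * u + b * v for a in ts for b in ts])
    with torch.no_grad():
        agree = model_a(grid).argmax(1) == model_b(grid).argmax(1)
    return agree.float().mean().item()
```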
My lab has been focusing on neural networks that learn planning algorithms end-to-end. Neural nets can solve reasoning and planning problems (e.g., mazes, chess) using "thinking systems" without any hand-crafted human search. https://t.co/1EoshQiIvn https://t.co/QyGEyzihIm
Shared by Tom Goldstein at 3/9/2022
There's been a lot of inflation since the 80s. Nesterov's 1983 paper on accelerated gradient methods is 4.5 pages long. The bibliography contains exactly 1 citation. https://t.co/fQd85kK4Eu
Shared by Tom Goldstein at 2/18/2022
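For reference, a minimal sketch of the accelerated gradient method in its standard textbook form; `grad` is assumed to return the gradient of the objective, and the names and defaults are illustrative.

```python
import numpy as np

def nesterov(grad, x0, alpha, iters=100):
    # Accelerated gradient descent: take a gradient step at the
    # extrapolated point y, then extrapolate using the t_k sequence.
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_next = y - alpha * grad(y)
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_next + ((t - 1) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x
```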
To a theoretical computer scientist, $1 and $1,000,000 are equivalent because they are both O(1). This equivalence explains the career decisions that theoretical computer scientists have made.
Shared by Tom Goldstein at 11/23/2021
We have some idea what neural nets see, but what do they *hear*? We break down a speech recognition net and sonify (rather than visualize) what each neuron responds to. You can see (err...I mean hear) the process proceed from shallow to deep features. https://t.co/DoDGyEexSj
Shared by Tom Goldstein at 10/27/2021
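One generic way to probe what a single neuron in an audio network responds to is activation maximization on a raw waveform. This is only an illustrative sketch of that idea, not necessarily the method in the linked work; the hook-based setup and the assumption that the model takes a (batch, samples) waveform are mine.

```python
import torch

def sonify_neuron(model, layer, channel, sr=16000, steps=500):
    # Optimize a raw waveform to excite one channel of a chosen layer,
    # then listen to the result.
    acts = {}
    layer.register_forward_hook(lambda m, i, o: acts.update(out=o))
    wave = torch.randn(1, sr, requires_grad=True)
    opt = torch.optim.Adam([wave], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        model(wave)                              # hook stores the layer output
        loss = -acts["out"][:, channel].mean()   # maximize the channel's activation
        loss.backward()
        opt.step()
    return wave.detach()
```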
This awesome @distillpub article will make you feel like you understand Gaussian Processes 🧐 https://t.co/3hkL4eJqD6 Then, this awesome book will make you realize that you just don’t 😵💫 https://t.co/HfeY0N5tm2
Shared by Tom Goldstein at 10/18/2021
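For readers who want the formulas behind that article in code: a minimal sketch of Gaussian-process regression with an RBF kernel on 1D inputs (function name and defaults are illustrative).

```python
import numpy as np

def gp_posterior(X_train, y_train, X_test, length=1.0, noise=1e-3):
    # Posterior mean and covariance at the test points under an RBF kernel.
    k = lambda A, B: np.exp(-0.5 * ((A[:, None] - B[None, :]) / length) ** 2)
    K = k(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = k(X_train, X_test)
    K_ss = k(X_test, X_test)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, cov
```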
Bigger ≠ better. I’d like this chart more if the Y axis displayed performance on downstream tasks.
Shared by Tom Goldstein at 10/12/2021
It's hard for me to know whether transformers are an improvement for vision. The amount of GPU firepower being used to tune them is orders of magnitude greater than what we had during the resnet renaissance. It's good to see some pushback. https://t.co/7T4PLEOk11
Shared by Tom Goldstein at 10/7/2021
Deep learning theories that rely heavily on the implicit regularization of SGD to explain generalization are likely incomplete, or fail to capture major phenomena behind neural nets, as strong generalization is still observed without SGD. Thanks: @jonasgeiping @micahgoldblum
Shared by Tom Goldstein at 10/1/2021
There's no evidence that SGD plays a fundamental role in generalization. With totally deterministic full-batch gradient descent, Resnet18 still gets >95% accuracy on CIFAR10. With data augmentation, full-batch Resnet152 gets 96.76%. https://t.co/iwIqQd7U1O
Shared by Tom Goldstein at 10/1/2021
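A minimal sketch of what "totally deterministic full-batch gradient descent" means in PyTorch: every update uses the whole training set, so there is no minibatch noise. Real experiments at this scale chunk the forward/backward pass and rely on careful tuning; this only shows the idea.

```python
import torch

def full_batch_gd(model, X, y, lr=0.1, steps=1000):
    # One gradient step per pass over the entire training set (X, y).
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(steps):
        model.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        with torch.no_grad():
            for p in model.parameters():
                p -= lr * p.grad
    return model
```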
We spent millions of GPU hours tuning small batch SGD because of its fast runtime and low memory. As a result, standard hyper-parameters consistently work better on small batches. Theorists have overfit their hypothesis to this data.
Shared by Tom Goldstein at 10/1/2021
We demonstrate this recurrent approach on three problems that are traditionally solved by classical algorithms: prefix sum computation, mazes, and chess. A system trained only on “easy” instances can solve “hard” instances at test time, provided it can think for longer.
Shared by Tom Goldstein at 7/9/2021
After training on small/easy problems, the power of the network can be increased at test time just by turning up the number of iterations of the recurrent unit. By “thinking for longer” the network assembles its knowledge into more complex strategies.
Shared by Tom Goldstein at 7/9/2021
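A generic sketch of the weight-tied recurrent idea, where the same block is applied more times at test time to "think for longer". Layer sizes and names are illustrative, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class ThinkingNet(nn.Module):
    # Encode the input, apply one weight-tied core block k times,
    # then decode a per-pixel prediction (e.g., the solution path of a maze).
    def __init__(self, channels=64):
        super().__init__()
        self.encode = nn.Conv2d(3, channels, 3, padding=1)
        self.core = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.decode = nn.Conv2d(channels, 2, 3, padding=1)

    def forward(self, x, iterations=20):
        h = torch.relu(self.encode(x))
        for _ in range(iterations):   # same weights reused at every step
            h = self.core(h)
        return self.decode(h)

# Train with, say, iterations=20 on easy instances; at test time, simply pass
# iterations=100 to give the network more "thinking" steps on hard instances.
```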
My lab is building neural nets that emulate human “thinking” processes. They increase their intelligence after training by increasing their compute budget. By doing so, a net trained only on “easy” chess puzzles can solve “hard” chess puzzles without having ever seen one…
Shared by Tom Goldstein at 7/9/2021
In my grad class on optimization I teach my students to write unit tests for all these nasty math issues.
Shared by Tom Goldstein at 7/3/2021
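An example of the kind of test this refers to: checking an analytic gradient against a central finite-difference estimate. The toy objective is just for illustration.

```python
import torch

def test_gradient_matches_finite_differences():
    # Compare autograd's gradient of f(x) = sum(x^3) to a numerical estimate.
    f = lambda x: (x ** 3).sum()
    x = torch.randn(5, dtype=torch.double, requires_grad=True)
    f(x).backward()
    eps = 1e-6
    with torch.no_grad():
        for i in range(x.numel()):
            e = torch.zeros_like(x)
            e[i] = eps
            fd = (f(x + e) - f(x - e)) / (2 * eps)   # central difference
            assert torch.allclose(x.grad[i], fd, rtol=1e-4, atol=1e-8)
```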
I wish it was... Looks like a great study. I've done benchmarking of optimizers for their impacts on model robustness, but not for generalization, and definitely not at this scale! Phew... I'm happy that Adam is still a reasonable choice because I don't want to change my code 😬
Shared by Tom Goldstein at 6/16/2021
Code is now available for SAINT - attention networks for tabular data. https://t.co/1IcjHJWCQr
Shared by Tom Goldstein at 6/12/2021
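The general idea behind attention for tabular data, sketched generically: embed each column as a token and run a transformer encoder over the tokens. This is not the SAINT API; all names and sizes here are illustrative.

```python
import torch
import torch.nn as nn

class TabularAttention(nn.Module):
    # Each numeric column becomes one token; self-attention mixes columns.
    def __init__(self, n_features, d=32, n_classes=2):
        super().__init__()
        self.embed = nn.ModuleList([nn.Linear(1, d) for _ in range(n_features)])
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, n_classes)

    def forward(self, x):  # x: (batch, n_features)
        tokens = torch.stack(
            [emb(x[:, i:i + 1]) for i, emb in enumerate(self.embed)], dim=1
        )                                         # (batch, n_features, d)
        return self.head(self.encoder(tokens).mean(dim=1))
```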
If Kurt Gödel was an ML paper. Reviewer 2: "...extraordinary in the particular field of mathematical logic but that is a very limited field". Weak reject.
Shared by Tom Goldstein at 6/11/2021