Lucas Beyer
@giffmana
Researcher (@GoogleAI Brain Team in Zürich, ex-RWTH Aachen), Gamer, Hacker, Belgian
Tweets by Lucas Beyer
What's more, they can further improve sample-efficiency of the fine-tuning by first running "pre-fine-tuning" (my word) w SimCLR on unlabeled me ...
Shared by Lucas Beyer at 5/20/2022
https://twitter.com/arankomatsuzaki/status/1527449409641885696
In many talks about our "pre-train large model on large web data" line of work, I've been asked "but this only works on ImageNet right? It will ...
Shared by Lucas Beyer at 5/20/2022
Computer vision in the wild! Every shop/restaurant/mall in Bangkok is equipped with this system that checks if you're wearing a mask and takes y ...
Shared by Lucas Beyer at 5/20/2022
https://twitter.com/anselmlevskaya/status/1526475487236661248
💯 This 👇 DL, and especially dense-matmul heavy models like transformer and mixer, are thin layers on top of the most optimized piece of code e ...
Shared by Lucas Beyer at 5/17/2022
Genuine question: what is a currently very successful model that could be seen as energy based without bending over backwards?
Shared by Lucas Beyer at 5/15/2022
https://twitter.com/giffmana/status/1214240746095730688
Wholeheartedly disagree. Transfer in few shots is *exactly* where scaled up pre-training outshines everything else! We already showed this in 20 ...
Shared by Lucas Beyer at 5/15/2022
https://twitter.com/NandoDF/status/1525397036325019649
I agree with Nando 💯 on everything, except AGI. But he has to, that's DM's whole pitch :) We've already been scaling for the past 4 years in CV ...
Shared by Lucas Beyer at 5/15/2022
I now see service robots actually deployed in malls in Bangkok, while 3-4y ago there was not a single one. This one carries food around; waiters ...
Shared by Lucas Beyer at 5/15/2022
My kid's book testing the limits of visual recognition. How on earth can you recognize this is a fish if not from context!? Makes me think the ...
Shared by Lucas Beyer at 5/13/2022
https://twitter.com/MrZiruiWang/status/1522039964103430144
There are many nice things about this contrastive-generative image-text paper. But one little detail I am especially happy about is that folks d ...
Shared by Lucas Beyer at 5/5/2022
https://twitter.com/XiaohuaZhai/status/1521754471730143233
Now, regarding the improved ViT baseline. It has long been thought that ViT on "small" scale like ImageNet-1k does not work well, requires a lo ...
Shared by Lucas Beyer at 5/4/2022
https://twitter.com/TerraBlvns/status/1519728062505623554
Surprised that "good xlang transfer" is a surprise at all: it's not like the English language is unique: de/fr/es/... share A LOT of word stems. ...
Shared by Lucas Beyer at 4/30/2022
https://youtu.be/mlXzufEk-2E
Something incredible happened: I know how the brain works! It cuts inputs into non-overlapping patches and processes them via non-convolutional ...
Shared by Lucas Beyer at 4/21/2022
And also MLP-Mixer, whose point it is that attention itself is not magic, and we actually show this at scale.
Shared by Lucas Beyer at 4/19/2022
@CVPR community, please re-tweet. Hypothetically, what change to the "quiet period" policy would you vote/wish for? Reminder, currently: arxiv ...
Shared by Lucas Beyer at 4/19/2022
Reminder of our 1st "Transformers for Vision" workshop at CVPR We got stellar speakers, but we're also eager for "extended abstract" (4pg) subm ...
Shared by Lucas Beyer at 4/8/2022
https://twitter.com/__kolesnikov__/status/1512073996870688775
heh, it seems like ViT has become the de-facto architecture for big vision models, and people now genuinely just forget to cite it. Maybe the ar ...
Shared by Lucas Beyer at 4/7/2022
clarification: - I was mostly thinking of software/algorithms, and mostly open-source ones. (except TPUs) Additions: - SGD - Momentum - Backprop ...
Shared by Lucas Beyer at 4/2/2022
https://twitter.com/ethancaballero/status/1508880761314807824
hahaha, another one! What's going on? Is this also a massive grant application? If yes, why publish it? If no, go spend the time training model ...
Shared by Lucas Beyer at 3/29/2022
This new setting (LiT) halves the gap between zero-shot transfer and fine-tuning with fully-labeled dataset. By the way, it achieves a new zero- ...
Shared by Lucas Beyer at 3/28/2022
I hear you scream "but the model saw labels during pre-training!" In a controlled study, we show LiT works as well with unsup. image encoders (D ...
Shared by Lucas Beyer at 3/28/2022
Third, and coolest of all, a fixed good image embedding can be a bridge across languages, letting us train multilingual image-text models, witho ...
Shared by Lucas Beyer at 3/28/2022
Yes, freezing the image encoder is truly necessary, just init is not enough. It's key in several aspects. First, the pre-trained backbones are v ...
Shared by Lucas Beyer at 3/28/2022
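The core of the LiT recipe described in this thread can be sketched in a few lines of numpy: image embeddings come from a frozen, pre-trained tower (so they never receive gradients and can even be pre-computed once), while only the text side is trained with a symmetric contrastive loss. All names, shapes, and the single-matrix "text tower" below are illustrative, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Hypothetical embeddings from a FROZEN pre-trained image tower:
# no gradients flow into them, so they can be cached ahead of time.
image_emb = normalize(rng.normal(size=(4, 8)))

# Only the text side is trained (here reduced to one projection matrix).
W_text = rng.normal(size=(16, 8))
text_feat = rng.normal(size=(4, 16))
text_emb = normalize(text_feat @ W_text)

def contrastive_loss(img, txt, temperature=0.07):
    """Symmetric InfoNCE: matching image/text pairs sit on the diagonal."""
    logits = img @ txt.T / temperature
    n = len(img)
    def xent(l):
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()
    return 0.5 * (xent(logits) + xent(logits.T))

loss = contrastive_loss(image_emb, text_emb)
```

In a real training loop, only `W_text` (the text tower) would be updated by the optimizer; the frozen image embeddings act as fixed targets.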
Want to turn any vision backbone into an image-text model? Want to show the age-old "your model wouldn't recognize a cow on the beach" is a red ...
Shared by Lucas Beyer at 3/28/2022
https://twitter.com/LerrelPinto/status/1507342646427144199
This is super cool: using computer vision (MediaPipe) to transfer human's hand pose onto the robot, and then use that for imitation learning!
Shared by Lucas Beyer at 3/26/2022
https://twitter.com/coallaoh/status/1506972712744411140
Recordings of our ImageNet NeurIPS workshop are now online, check them out 👇 We had a great set of speakers with a mix of interesting and thou ...
Shared by Lucas Beyer at 3/25/2022
https://twitter.com/arankomatsuzaki/status/1503543031923945475
Pretty cool to see Mixer-like models now eating Transformer's lunch in LMs too. I'm too lazy to replace the ConvNet in this meme by LSTM and ge ...
Shared by Lucas Beyer at 3/15/2022
https://arxiv.org/abs/2106.05237
Sam is spot on, that's what's happening. We actually do train much much longer than usual on ImageNet for a few ablation studies in our distilla ...
Shared by Lucas Beyer at 3/9/2022
https://twitter.com/sandeepkaushik/status/1499109875384823809
100% agreed. Our "are we done with ImageNet?" paper is almost begging the community to stop making large test-sets full of labeling noise. That ...
Shared by Lucas Beyer at 3/2/2022
https://arxiv.org/abs/2003.07845
The issue is that the mismatch between minibatch statistics and full-dataset statistics becomes significant. Back then, we fixed this by switch ...
Shared by Lucas Beyer at 2/15/2022
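The minibatch-vs-full-dataset statistics mismatch is easy to see numerically: the error of a minibatch mean around the dataset mean shrinks like std/sqrt(batch size), so small batches give very noisy normalization statistics. A minimal numpy sketch with synthetic activations and illustrative batch sizes (not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "full dataset" of activations for one channel.
data = rng.normal(loc=2.0, scale=3.0, size=100_000)
full_mean = data.mean()

def batch_stat_error(batch_size, n_batches=1000):
    """Average absolute gap between minibatch means and the full-dataset mean."""
    idx = rng.integers(0, len(data), size=(n_batches, batch_size))
    batch_means = data[idx].mean(axis=1)
    return np.abs(batch_means - full_mean).mean()

# Smaller batches -> noisier statistics -> larger train/eval mismatch.
err_small = batch_stat_error(8)
err_large = batch_stat_error(1024)
```

This is the effect that motivates batch-independent alternatives (e.g. the GroupNorm + Weight Standardization switch the tweet alludes to).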
(left) A very clear and interesting figure about the difference between "typical" vision (blue) and "typical" NLP (orange) stacks. When we scal ...
Shared by Lucas Beyer at 2/15/2022
https://twitter.com/ylecun/status/1485748926003167232
M/FAIR(?) unveils their GPU cluster, whose main motivation seems to be training.... Transformers! 😁 I hope ConvNets will be allowed to keep it ...
Shared by Lucas Beyer at 1/24/2022
https://arxiv.org/abs/2111.09162
It's about time: analog clock reading in the wild https://t.co/j9WmQHS92r A great example of an applied vision paper, let me walk you through w ...
Shared by Lucas Beyer at 11/18/2021
- If your paper is about sth specific, e.g. mobile, then measure exactly that, e.g. actual latency on device. Measure it for baselines too! - If ...
Shared by Lucas Beyer at 10/27/2021
https://arxiv.org/abs/2110.12894
You may know that I like to 🚮-talk model comparison that's based on number of parameters, since it's meaningless. @m__dehghani @YiTayML @anura ...
Shared by Lucas Beyer at 10/27/2021
And when I asked "why use Caffe" the answer was always the same, and depressing: "it's made for CV so it's better" - "better how?" - "faster con ...
Shared by Lucas Beyer at 10/18/2021
https://twitter.com/ylecun/status/1449013889304350732
I believe that the second thing that made the deep learning revolution happen, next to AlexNet results, is actually good and flexible libraries. ...
Shared by Lucas Beyer at 10/18/2021
🤯🥳🤯🥳 "achieves tests accuracies with vanilla deep networks that are competitive with ResNets (of the same width/depth)"
Shared by Lucas Beyer at 10/16/2021
https://twitter.com/dimadamen/status/1448395484410822657
Hey machine learning people 👋 This is not representative of computer vision people. Let's work together and be open-minded instead of resortin ...
Shared by Lucas Beyer at 10/14/2021
https://twitter.com/theshawwn/status/1446076902607888385
Evolution is just a search heuristic among many (gradient descent being another popular one). Its main advantage is a hype-able name. The searc ...
Shared by Lucas Beyer at 10/8/2021
I guess I'll retire before we stop doing this. Comparing different model architectures based on #params means nothing but "storage space needed ...
Shared by Lucas Beyer at 10/7/2021
[2/2] Again, boring but needed! However, I much prefer papers that are upfront: "Large improvement by replacing X w/ Transformer. Novelty Y hel ...
Shared by Lucas Beyer at 10/6/2021
[1/2] About half a decade ago, the computer vision field saw tons of domain-specific papers that show large gains replacing previous models with ...
Shared by Lucas Beyer at 10/6/2021
https://twitter.com/ethancaballero/status/1427679507062923268
This paper is all about large-scale pre-training of DL models. It completely lacks any mention of our work on this exact topic over the last > ...
Shared by Lucas Beyer at 8/18/2021
https://twitter.com/_clashluke/status/1414290906279186434
MLP-Mixer, the conceptually simplest of all recent architectures, also seems to work best, according to this independent replication. The bitte ...
Shared by Lucas Beyer at 7/11/2021
https://twitter.com/ak92501/status/1410761103626293250
I beg the community to please stop using parameters as x axis. It is *especially* meaningless for ViT-style models: B/32 has *more* params than ...
Shared by Lucas Beyer at 7/2/2021
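The B/32-vs-B/16 point checks out with back-of-the-envelope arithmetic: the two models differ essentially only in the patch-embedding stem, where a larger patch means a *bigger* projection matrix but *fewer* tokens (hence less attention/MLP compute). A sketch assuming the standard ViT-B configuration (768-dim embedding, 224×224 RGB input, biases ignored):

```python
# Patch-embedding parameters and token counts for ViT-B at 224x224 input.
def patch_embed_params(patch, dim=768, channels=3):
    """Parameters of the linear patch projection (bias omitted)."""
    return patch * patch * channels * dim

def num_tokens(patch, image=224):
    """Number of patch tokens the transformer has to process."""
    return (image // patch) ** 2

p32, p16 = patch_embed_params(32), patch_embed_params(16)
t32, t16 = num_tokens(32), num_tokens(16)

# B/32 has MORE stem parameters, yet B/16 processes 4x the tokens,
# so B/16 costs roughly 4x the per-layer FLOPs. Params != compute.
```

So a params-on-the-x-axis plot would rank B/32 as the "bigger" model even though B/16 is far more expensive to run, which is exactly why the tweet calls the metric meaningless.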
Consider pre-training on ImageNet-21k more. It produces wildly more general features. Pre-training for 30ep on i21k costs the same compute as t ...
Shared by Lucas Beyer at 6/21/2021
We show overwhelming evidence that, for any practical purpose, one is better off transferring (fine-tuning) a pre-trained ViT. Using AugReg dur ...
Shared by Lucas Beyer at 6/21/2021
We pre-train many models in order to map out ViT's AugReg vs data size vs model size landscape and get solid insights. AugReg is not always goo ...
Shared by Lucas Beyer at 6/21/2021
"AugReg" is just a shorthand for "data augmentation and model regularization". We focus on the (now standard) RandAugment, Mixup, Dropout, and S ...
Shared by Lucas Beyer at 6/21/2021
4. Importantly, this simple strategy works on many datasets of various sizes, down to only 1020 training images, where anything else we tried ov ...
Shared by Lucas Beyer at 6/10/2021
2. Patience: The function matching task is HARD! We need to train *a lot* longer than typical, and actually we were not able to reach saturation ...
Shared by Lucas Beyer at 6/10/2021
0. Intuition: Want the student to replicate _the whole function_ represented by the teacher, everywhere that we expect data in input space. Thi ...
Shared by Lucas Beyer at 6/10/2021
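The "replicate the whole function" intuition means the student is trained against the teacher's full output *distribution*, not just its argmax label. A minimal numpy sketch of a temperature-scaled KL distillation loss (illustrative, not the paper's training code):

```python
import numpy as np

def softmax(z, t=1.0):
    """Numerically stable softmax with temperature t."""
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student): the student matches the teacher's whole
    output distribution, everywhere an input is presented."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return (p * (np.log(p) - np.log(q))).sum(axis=-1).mean()

# A student that exactly matches the teacher has zero loss.
teacher = np.array([[2.0, 0.5, -1.0]])
student = np.array([[0.0, 0.0, 0.0]])
```

Under this view, aggressive input augmentation simply queries the teacher function at many more points of input space, which is what makes the matching task hard (and long).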
https://arxiv.org/abs/2106.05237
So you think you know distillation; it's easy, right? We thought so too with @XiaohuaZhai @__kolesnikov__ @_arohan_ and the amazing @royaleerie ...
Shared by Lucas Beyer at 6/10/2021
6. Because of complex XLA optimization, one can't say upfront what will fit at the memory limit. We use an empirical "shapefinder" approach and s ...
Shared by Lucas Beyer at 6/9/2021
1. The scaling laws. It seems that in image classification too, Transformers follow a power-law (eg straight line in log-log), although it satur ...
Shared by Lucas Beyer at 6/9/2021
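"Straight line in log-log" can be checked mechanically: if error follows a power law e = a·c^(-b) in compute c, then log e is linear in log c, and a degree-1 fit on the logs recovers the exponent. A sketch with synthetic points (the constants a, b are made up for illustration, not results from the paper):

```python
import numpy as np

# Hypothetical (compute, error) points lying exactly on e = a * c**(-b).
compute = np.array([1e1, 1e2, 1e3, 1e4])
a, b = 2.0, 0.3
error = a * compute ** (-b)

# A power law is a straight line in log-log space, so a linear fit on
# the logs recovers slope = -b and intercept = log(a).
slope, intercept = np.polyfit(np.log(compute), np.log(error), 1)
```

With real measurements the points only approximately sit on the line, and (as the tweet notes) the trend can saturate at the extremes, where a pure power law stops being a good fit.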
https://arxiv.org/abs/2106.04560
With @XiaohuaZhai @__kolesnikov__ @neilhoulsby we scale up plain old ViT on ~infinite data (3B🤯😬) We share our results (incl. scaling laws, I ...
Shared by Lucas Beyer at 6/9/2021
4/N What else? You could interpret A as adjacency matrix of fully-connected graph, and figure out what λ of that means. I don't know, I'm not g ...
Shared by Lucas Beyer at 6/2/2021
5a/N What else? A is softmax'd, so it can be interpreted as a MDP transition matrix, where the representation z is the "state". abs(λ)<=1 fo ...
Shared by Lucas Beyer at 6/2/2021