Yann LeCun
@ylecun
Professor at NYU. Chief AI Scientist at Meta. Researcher in AI, Machine Learning, Robotics, etc. ACM Turing Award Laureate.
Tweets by Yann LeCun
This is not a statement against modeling uncertainty. I'm all for that! (Though, I don't think *probabilistic* models are necessary for that). It's a statement about having given too much importance to it *at the expense* of other key questions, like representation learning.
Shared by Yann LeCun at 5/14/2022
This is not to say that accurate probabilistic inference is not useful. Comments? Opinions? 10/N
Shared by Yann LeCun at 5/14/2022
As it turned out, solving the problem of learning hierarchical representations and complex functional dependencies was a much more important issue than being able to perform accurate probabilistic inference with shallow models. 9/N
Shared by Yann LeCun at 5/14/2022
It's interesting to observe that almost none of this is relevant to today's top speech, vision, and NLP systems. 8/N
Shared by Yann LeCun at 5/14/2022
In fact, "exponential family" pretty much means "shallow": the log-likelihood can be expressed as a linearly parameterized function of features (or simple combinations thereof). Learning the parameters of the model was seen as just another variational inference problem. 7/N
Shared by Yann LeCun at 5/14/2022
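For concreteness, a generic exponential-family log-likelihood in standard (illustrative) notation, linear in the parameters given fixed features:

\[ \log p(x, z \mid \theta) = \theta^\top \phi(x, z) - A(\theta) \]

Here \(\phi(x, z)\) are the given features over observed variables \(x\) and latent variables \(z\), and \(A(\theta)\) is the log-partition function; "shallow" refers to the fact that learning only adjusts the linear coefficients \(\theta\), while inference computes marginals such as \(p(z \mid x)\).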
All one had to do was to compute some sort of log likelihood by linearly combining features, and then use one of the above-mentioned sophisticated inference methods to produce marginal distributions over the unknown variables, one of which is the answer, e.g. a category. 6/N
Shared by Yann LeCun at 5/14/2022
But almost none of this work was concerned with the problem of learning representations. Features were assumed to be given. The structure of the graphical model, with its latent variables, was assumed to be given. 5/N
Shared by Yann LeCun at 5/14/2022
Much of this was riding on previous work on Bayesian networks, factor graphs, and other graphical models. That's how one learned about exponential families, belief propagation, loopy belief prop, variational inference, Chinese restaurant processes, Indian buffet processes, etc. 4/N
Shared by Yann LeCun at 5/14/2022
There were debates at computer vision workshops about "generative models vs discriminative models". There were heroic-yet-futile attempts to build object recognition systems with non-parametric Bayesian methods. 3/N
Shared by Yann LeCun at 5/14/2022
This led to a flurry of work on probabilistic generative models such as Hidden Markov Models in speech, Markov random fields and constellation models in vision, and probabilistic topic models in NLP, e.g. with latent Dirichlet allocation. 2/N
Shared by Yann LeCun at 5/14/2022
Researchers in speech recognition, computer vision, and natural language processing in the 2000s were obsessed with accurate representations of uncertainty. 1/N
Shared by Yann LeCun at 5/14/2022
Actually, AI R&D has *3* flavors: 1. Applied AI: developing real-world products. 2. Press Release AI: maximally-impressive splashy demos. 3. AI research: new *published* methods that make the field progress. - 3 feeds into 1 and 2. - FAIR is pretty much entirely focused on 3.
Shared by Yann LeCun at 5/14/2022
Attention may be all you need. But short attention span is all you get.
Shared by Yann LeCun at 5/11/2022
I've always agreed with Paul on this: make reasoning compatible with the neural substrate. That translates into: make reasoning compatible with gradient-based learning (i.e. compatible with deep learning).
Shared by Yann LeCun at 5/8/2022
Empty semantics. Obviously, *any* complete AI system, even one built around a big neural net, will have all kinds of other code around it. It's particularly true of systems designed to act in the real world. Basic engineering, really. This is a completely vacuous debate.
Shared by Yann LeCun at 5/8/2022
What it takes to successfully train a large language model, in excruciating detail. There was open science. This is open-open science.
Shared by Yann LeCun at 5/4/2022
OPT-175b: Open Pre-Trained language model with 175 billion parameters is now available to the research community. Blog post: https://t.co/NBacfKGUHC Paper: https://t.co/tLYUoWisZA Code + small pre-trained models: https://t.co/M9nLCOf3Rx (using OPT-175b requires a registration)
Shared by Yann LeCun at 5/3/2022
The Adaptive Systems Research Dept, where I worked, was located in aisle 4G3 (which I believe is now called 4G2). The location is now occupied by an Nvidia group that works on autonomous driving. In fact, that group is led by Urs Muller, who used to work in the same department!
Shared by Yann LeCun at 4/24/2022
Nice tutorial+github about VICReg, a general method for Self-Supervised Learning. The VICReg paper by @AdrienBardes, Jean Ponce and me is to be presented at ICLR next week: https://t.co/H7crDPq1Qn
Shared by Yann LeCun at 4/22/2022
Very important thread. - social networks are the new battle ground of societal issues. - despite what many think, social networks don't take positions on issues. They moderate behavior. Because bad behavior hurts people. - Elon is in for a world of pain if he runs Twitter.
Shared by Yann LeCun at 4/16/2022
Cool demos of a vision system pre-trained with self-supervised learning (DINO method): segmentation, image retrieval, similarity search for art pieces and monuments, patch-level retrieval, patch similarity search. https://t.co/xhpq5hJiq3
Shared by Yann LeCun at 4/11/2022
Conversely, color jitter does not affect shape or texture-based categories (e.g. zebra), but affects color-based categories (like basketball). Strangely, even weight decay affects different classes differently.
Shared by Yann LeCun at 4/8/2022
New paper: "The Effects of Regularization and Data Augmentation are Class Dependent" by Randall Balestriero, Leon Bottou, Yann LeCun TL;DR: Turns out some types of data augmentation help some categories and hurt others... https://t.co/0K4ufsu0Na
Shared by Yann LeCun at 4/8/2022
It wouldn't take long for some of us. Attention is a form of multiplicative interactions. Those were around in the 1990s (even in the 1980s). You might think the main difference is the softmax normalization of coefficients. But those existed in mixture of experts networks.
Shared by Yann LeCun at 4/5/2022
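To make the "multiplicative interactions + softmax" point concrete, here is a minimal NumPy sketch (toy shapes, not any particular paper's formulation) comparing scaled dot-product attention with a 1990s-style mixture-of-experts gate:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(Q, K, V):
    # Multiplicative interaction (Q @ K.T), then softmax-normalized mixing of V.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

def mixture_of_experts(x, gate_W, expert_Ws):
    # 1990s-style gating: softmax-normalized coefficients weight expert outputs.
    gates = softmax(x @ gate_W)                      # (n_experts,)
    outputs = np.stack([x @ W for W in expert_Ws])   # (n_experts, d_out)
    return gates @ outputs

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
print(dot_product_attention(Q, K, V).shape)          # (4, 8)
x = rng.normal(size=16)
print(mixture_of_experts(x, rng.normal(size=(16, 3)),
                         [rng.normal(size=(16, 5)) for _ in range(3)]).shape)  # (5,)
```

In both cases the output is a softmax-weighted combination; what changed over the years is mostly where the coefficients come from and the scale at which this is applied.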
I sat down with Tiernan Ray from ZDNet for an interview about my vision for AI, Self-Supervised Learning, world models, energy-based models, and speculations about consciousness. https://t.co/9Wzxck3MRE
Shared by Yann LeCun at 3/31/2022
TL;DR: neural nets with unitary matrices are interesting beasts: invertible, no vanishing/exploding gradient, computation akin to quantum computing. Training them is hard. We propose a low-rank (low-cost) update method to update unitary weight matrices with gradient descent.
Shared by Yann LeCun at 3/29/2022
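A quick NumPy illustration of the "no vanishing/exploding gradient" property, using random orthogonal matrices (the real-valued special case of unitary). This is just the norm-preservation argument, not the low-rank update method from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, depth = 64, 100

# Random orthogonal matrices via QR of Gaussian matrices.
layers = [np.linalg.qr(rng.normal(size=(d, d)))[0] for _ in range(depth)]

x = rng.normal(size=d)
h = x.copy()
for U in layers:
    h = U @ h

# Norm is preserved through 100 layers: no vanishing, no explosion.
# The same argument applies to backpropagated gradients.
print(np.linalg.norm(x), np.linalg.norm(h))
```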
Agreed. Also "batch". Relevant metric: number of single-sample gradient evaluations. Countless students have told me "I used 10% of the training set because 100 epochs on the full set takes too long." My answer: "there is *never* *any* *reason* to drop training samples."
Shared by Yann LeCun at 3/29/2022
Siamese nets, Self-Supervised Learning, & data augmentation aren't just for image recognition. They also apply to textual information retrieval. With Distributed FAISS, they enable very large-scale information retrieval systems. A good thing to ground dialog systems in reality.
Shared by Yann LeCun at 3/29/2022
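A minimal sketch of the retrieval side with FAISS (a single-machine IndexFlatIP for brevity rather than the distributed setup mentioned above); `encode()` is a hypothetical stand-in for a trained Siamese/self-supervised text encoder, not a real API:

```python
# pip install faiss-cpu
import numpy as np
import faiss

d = 256
rng = np.random.default_rng(0)

def encode(texts):
    # Placeholder: returns random unit-scale embeddings of dimension d.
    return rng.normal(size=(len(texts), d)).astype("float32")

corpus_vecs = encode(["doc %d" % i for i in range(10_000)])
faiss.normalize_L2(corpus_vecs)              # cosine similarity via inner product
index = faiss.IndexFlatIP(d)                 # exact search; IVF/sharding for large scale
index.add(corpus_vecs)

query_vecs = encode(["what grounds a dialog system?"])
faiss.normalize_L2(query_vecs)
scores, ids = index.search(query_vecs, 5)    # top-5 nearest documents
print(ids[0], scores[0])
```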
Omnivore, FLAVA, CM3, Data2Vec: a new crop of large-scale, multi-modal, multi-task, Self-Supervised Learning systems from Meta AI / FAIR.
Shared by Yann LeCun at 3/29/2022
Contrastive Self-Supervised Learning can fall victim to dimensional collapse. One reason I prefer non-contrastive methods these days.
Shared by Yann LeCun at 3/28/2022
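One simple way to check for dimensional collapse, assuming you have a batch of embeddings, is to look at the singular-value spectrum of the embedding matrix (summarized here as an entropy-based effective rank). This is a generic diagnostic, not a method from the linked work:

```python
import numpy as np

def effective_rank(embeddings):
    # Entropy-based effective rank of the centered embedding matrix:
    # a collapsed representation concentrates its singular values on few directions.
    z = embeddings - embeddings.mean(axis=0)
    s = np.linalg.svd(z, compute_uv=False)
    p = s / s.sum()
    return float(np.exp(-(p * np.log(p + 1e-12)).sum()))

rng = np.random.default_rng(0)
healthy = rng.normal(size=(1000, 128))
collapsed = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 128))  # rank ~4
print(effective_rank(healthy), effective_rank(collapsed))
```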
A biologically-plausible continuous-time combination of Barlow Twins/VICReg and Slow Feature Analysis, with connections to Hebbian learning and Bienenstock-Cooper-Munro theory.
Shared by Yann LeCun at 3/21/2022
I never called what I was working on AI (AI was supposed to designate "symbolic" methods). Then around 2013, the public and the media became interested in deep learning & *they* called it AI. We could not explain that AI people didn't view DL as AI. Because it made no sense.
Shared by Yann LeCun at 3/15/2022
The code for VICReg is open sourced. "VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning" by Adrien Bardes, Jean Ponce, Yann LeCun ICLR 2022 paper: https://t.co/H7crDPHCHV Code: https://t.co/oadSBT61P3
Shared by Yann LeCun at 3/3/2022
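A back-of-the-envelope NumPy sketch of the three VICReg terms (variance, invariance, covariance) as described in the paper; the released code linked above is the reference implementation, and the constants here are illustrative:

```python
import numpy as np

def vicreg_terms(z_a, z_b, gamma=1.0, eps=1e-4):
    # z_a, z_b: (batch, dim) embeddings of two augmented views of the same inputs.
    n, d = z_a.shape
    invariance = np.mean((z_a - z_b) ** 2)                    # views should match

    def variance(z):                                          # keep every dimension "alive"
        std = np.sqrt(z.var(axis=0) + eps)
        return np.mean(np.maximum(0.0, gamma - std))

    def covariance(z):                                        # decorrelate dimensions
        zc = z - z.mean(axis=0)
        cov = zc.T @ zc / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d

    return invariance, variance(z_a) + variance(z_b), covariance(z_a) + covariance(z_b)

rng = np.random.default_rng(0)
z1 = rng.normal(size=(256, 32))
z2 = z1 + 0.1 * rng.normal(size=(256, 32))
print(vicreg_terms(z1, z2))   # the training loss is a weighted sum of these three terms
```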
Yay! Graph Transformer Networks. Best thing ever for structured sequence prediction.
Shared by Yann LeCun at 3/3/2022
Any function (trained or not) that maps multiple different inputs to identical or similar outputs is subject to adversarial examples. It's particularly true of functions with high-dim inputs and discrete outputs, like image classifiers. Nothing to do with DL or CNN specifically.
Shared by Yann LeCun at 2/28/2022
Democracy, free expression, content moderation, disinformation, political polarization, AI, AR, VR, the Metaverse, the meaning of life.
Shared by Yann LeCun at 2/26/2022
- JEPAs can be stacked to make long-term/long-range predictions in more abstract representation spaces. - Hierarchical JEPAs can be used for hierarchical planning. Short explanatory blog post: https://t.co/JADOFrvocN
Shared by Yann LeCun at 2/25/2022
- JEPA can be trained non-contrastively by (1) making the representations of inputs maximally informative, (2) making the representations predictable from each other, (3) regularizing latent variables necessary for prediction. 4/
Shared by Yann LeCun at 2/25/2022
TL;DR: - autonomous AI requires predictive world models - world models must be able to perform multimodal predictions - solution: Joint Embedding Predictive Architecture (JEPA) 2/
Shared by Yann LeCun at 2/25/2022
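A purely schematic sketch of the JEPA idea with toy linear maps: encode both inputs, predict one representation from the other (plus a latent variable), and measure the mismatch in representation space rather than in input space. All names and shapes here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_rep, d_lat = 32, 16, 4

# Toy linear "encoders" and "predictor" -- schematic stand-ins, not trained modules.
Enc_x = rng.normal(size=(d_rep, d_in))
Enc_y = rng.normal(size=(d_rep, d_in))
Pred  = rng.normal(size=(d_rep, d_rep + d_lat))

def jepa_energy(x, y, z):
    # Predict the representation of y from the representation of x and a latent z
    # (which absorbs what is unpredictable), then score the match in representation
    # space, so irrelevant details of y never need to be reconstructed.
    s_x = Enc_x @ x
    s_y = Enc_y @ y
    s_y_hat = Pred @ np.concatenate([s_x, z])
    return np.sum((s_y - s_y_hat) ** 2)

x, y = rng.normal(size=d_in), rng.normal(size=d_in)
# Inference over the latent: pick the z with the lowest energy (here by crude sampling).
zs = rng.normal(size=(128, d_lat))
print(min(jepa_energy(x, y, z) for z in zs))
```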
JEPAs are: 1. non-generative: output is encoded, details are eliminated 2. non-probabilistic: energy-based, not normalizable. 3. trained non-contrastively (with VICReg). I think 1 and 2 would very much go against Josh's Bayesian philosophy. 3/3
Shared by Yann LeCun at 2/24/2022
...so they can learn hierarchical representations and enable hierarchical planning. The novelty (I think) is to use Joint Embedding Predictive Architectures (JEPA), which perform predictions in representation space. 2/
Shared by Yann LeCun at 2/24/2022
Yeah so, the idea that, somehow, AI is an environmental disaster in the making? Total BS! Nice study by our colleagues at Google on the projected carbon footprint of AI training. An analogous analysis for Meta would be qualitatively similar. https://t.co/v1XDnDY4NN
Shared by Yann LeCun at 2/24/2022
Blog post on my vision of a possible path towards autonomous AI: ML that learns more like animals & humans, machine common sense... Builds on SSL, Hierarchical Joint Embedding Predictive Architecture, Energy-Based Models, & planning under uncertainty. https://t.co/7t5TlMW9OE
Shared by Yann LeCun at 2/23/2022
Every time the learning process in a deep neural net picks one side of a saddle point, there is a symmetry being broken.
Shared by Yann LeCun at 2/22/2022
A moderately aligned super-intelligence today is called a corporation. We align their objective function to the common good using laws. Nothing new here.
Shared by Yann LeCun at 2/20/2022
"Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision" from FAIR Paris & NYC Large-scale experiment with 10 billion param RegNet pre-trained by SSL with SwAV on 1 billion random public Instagram photos. https://t.co/ABSXKFSgqs
Shared by Yann LeCun at 2/19/2022
Unsupervised, non-linear, non-convex, non-probabilistic: words that designate domains whose enormous size and diversity are difficult to fathom because we are so familiar with their opposites.
Shared by Yann LeCun at 2/19/2022
Julien left FB a while back to build products around Skip (like this one: https://t.co/V2p6XP5Hi8 ) He implemented an experimental deep learning framework in Skip, based on the ArrayFire tensor engine, and inspired by FlashLight 5/Fin
Shared by Yann LeCun at 2/19/2022
- things that can be pre-computed and cached are. - Lisp-like: closures, sequences of instructions return the last value. Now, people just want to use Python at the top level. This is unlikely to change any time soon. 3/
Shared by Yann LeCun at 2/19/2022
- safe parallelism - strongly typed with type inference. - simple syntax - functional, but can also be object oriented - variables are immutable by default - garbage collector is predictable 2/
Shared by Yann LeCun at 2/19/2022
Skip is an open-source programming language with lots of interesting features. Could it be the basis of a new breed of deep learning frameworks? Skip website: https://t.co/gerYouLPlc Prototype DL framework in Skip: https://t.co/1EmE4Xj2BM Features: 1/
Shared by Yann LeCun at 2/19/2022
Can large language models fine-tuned for medicine imitate doctors? Spoiler alert: NO! But that doesn't mean they can't be useful. Nice work by @NablaTech
Shared by Yann LeCun at 2/17/2022
Nope. Not even true for small values of "slightly conscious" and large values of "large neural nets". I think you would need a particular kind of macro-architecture that none of the current networks possess.
Shared by Yann LeCun at 2/12/2022
The prevalence of hate speech on Facebook is 0.03%. This number has decreased in recent years due to progress in multilingual content understanding, with self-supervised learning and "few-shot learning", which enables fast adaptation to changing conditions https://t.co/NLyDzYG10X
Shared by Yann LeCun at 12/10/2021
PyTorch Developer Day is upon us. Starting tomorrow at 10:00 EST.
Shared by Yann LeCun at 11/30/2021
People who think evolution works through random mutations and selection need to explain how intelligent life appeared using nothing else. Clearly, any optimization process is more efficient if it uses some sort of gradient estimation.
Shared by Yann LeCun at 11/23/2021
Yoshua and I talk about what we are most excited about in the quest for better machine intelligence.
Shared by Yann LeCun at 11/22/2021
XLS-R: multilingual speech recognition system pre-trained with SSL on 128 languages. Smashing results. And it's open source.
Shared by Yann LeCun at 11/18/2021
Happy birthday human neural nets! You are nearly 1,000,000 years old now, and still utterly fooled by images like these. https://t.co/GHRpu7irQg
Shared by Yann LeCun at 11/16/2021
He told me that our 1989 Neural Comp paper on ConvNets shocked him because he was working on training the Neocognitron with backprop. He abandoned the project after our paper. Fukushima's work influenced me. But multilayer training (through backprop) is a pretty big deal.
Shared by Yann LeCun at 10/15/2021
Using deep learning to predict the (in)stability of planetary systems. Another example of training deep nets to cheaply predict the long-term outcomes and global properties of systems that can be predicted expensively by "traditional" simulation.
Shared by Yann LeCun at 10/15/2021
There is a lot more to natural languages than text: tone, accent, expression, prosody, timbre, pitch... "Textless NLP" represents speech through a stream of discrete tokens, automatically learned through self-supervised learning, directly fed with raw speech waveform! A new era.
Shared by Yann LeCun at 9/10/2021
Excellent article about fiscal policies that fairly distribute the increased wealth brought about by automation. TL;DR: Don't tax the robot, that will lead to more unemployment. Tax capital more relative to labor. Also, what's a robot, really?
Shared by Yann LeCun at 8/26/2021
Largest private funding round ever for a biotech startup in Canada. Congrats @frey_brendan and the @deepgenomics team.
Shared by Yann LeCun at 7/28/2021
2 more lectures from the Spring'21 NYU Deep Learning course. Energy-Based Models, a general framework to describe most ML/stat methods, whether probabilistic or not, supervised, weakly sup, self-sup, or unsup, contrastive or not, with latent variables or not, generative or not.
Shared by Yann LeCun at 7/28/2021
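In standard energy-based notation (symbols here are generic, not taken from the lecture slides), inference picks the output with the lowest energy, and a probabilistic model is recovered, when needed, through a Gibbs distribution:

\[ \hat{y} = \arg\min_{y} E_\theta(x, y), \qquad p_\theta(y \mid x) = \frac{e^{-\beta E_\theta(x, y)}}{\int e^{-\beta E_\theta(x, y')} \, dy'} \]

The energy view never requires the normalizing integral in the denominator, which is what lets the framework cover non-probabilistic methods as well.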
Modern number notation. Algebra, trigonometry, calculus. Optics. Atomic theory, thermodynamics, and chemistry. Thermal engines. Electricity and electromagnetism. Computation. Basically: first 3 years of undergraduate engineering program.
Shared by Yann LeCun at 7/28/2021
Another lecture of NYU Deep Learning, Spring 2021 is available. @alfcnz gave me magic fingers.
Shared by Yann LeCun at 7/21/2021
Congrats @tydsh @endernewton and @SuryaGanguli ! The paper explains why/how some non-contrastive self-supervised methods for joint embedding architectures (e.g. BYOL) actually work and manage to avoid collapse despite having no explicit provision to do so.
Shared by Yann LeCun at 7/20/2021
85% of FB users have been or want to be vaccinated. Many countries that use FB as much as the US have much lower levels of anti-vaxxers. The cause of anti-vaxx sentiment in the US is not social media. In fact FB has done a lot to promote vaccinations, as Guy Rosen writes here.
Shared by Yann LeCun at 7/18/2021
BlenderBot 2.0: the first chatbot that can hold a conversation on any topic. It has a long term memory and can search for information on the Internet.
Shared by Yann LeCun at 7/16/2021
Pantheon of scientific territorial hubris: Chemistry is just physics Biology is just chemistry Neuroscience is just biology Psychology is just neuroscience Economics is just psychology Computer science is just math Machine learning is just statistics All of it is just philosophy!
Shared by Yann LeCun at 7/15/2021
The first 2 weeks of the 2021 edition of the NYU Deep Learning course are now online, including lectures, practicum, homework, etc.
Shared by Yann LeCun at 7/14/2021
Note: even in the late 1980s, I was in the "Early 2020s" category. The time it took to train the "ConvNet du jour" on the "problem du jour" has consistently been around 10 days over the last 33 years. This is independent of compute power and data.
Shared by Yann LeCun at 7/13/2021
Interesting tutorial next week by @HazanPrinceton. I'd say: "RL is control without gradients and (mostly) without models" Since control preceded RL, we could view RL as a "degraded" version of on-line stochastic optimal control where the objective is not differentiable.
Shared by Yann LeCun at 7/13/2021
ML researchers: Late 1990s: "Method X is worthless because the Matlab code takes more than 20 minutes to converge" Early 2020s: "Method X is great because with <favorite_DL_framework>, I can train it on 10 billion samples using 1000 {GPU,TPU}s in less than a week."
Shared by Yann LeCun at 7/13/2021
Now, by "new paradigms", I mean new learning paradigms. And there is no doubt that they will involve some sort of gradient-based optimization applied to complex architectures (aka "deep learning"). The "new" part should focus on learning world models in a task independent way.
Shared by Yann LeCun at 7/13/2021
We do not have an answer to that question, and the gap to bridge is enormous (how can people learn to drive a car in 20h of practice?) Decisive advances towards an answer will mark a new era in AI. That's why I work on self-supervised learning. It's our best shot at the moment.
Shared by Yann LeCun at 7/13/2021
So many exciting new frontiers in ML, it's hard to give a short list, particularly in new application areas (e.g. in the physical and biological sciences). But the Big Question is: "How could machines learn as efficiently as humans and animals?" This requires new paradigms.
Shared by Yann LeCun at 7/13/2021
A new large-scale language model that can predict the effect of mutations on the function of a protein. By the FAIR+NYU Protein research group led by @alexrives .
Shared by Yann LeCun at 7/12/2021
An article at Forbes about new research at FAIR on AI for robotics. https://t.co/UPC4Abo2k0
Shared by Yann LeCun at 7/9/2021
A blog post detailing the new work on Rapid Motor Adaptation for legged robot locomotion from FAIR+BAIR+CMU https://t.co/7l2CK54Ehs
Shared by Yann LeCun at 7/9/2021
Rapid Motor Adaptation: great new work on adaptive robot locomotion from FAIR+BAIR+CMU.
Shared by Yann LeCun at 7/9/2021
Predatory academic publishers must die. In fact, for-profit academic publishers must go away. Even non-profit publishers that are not open access must disappear, or else change their economic model. Look at JMLR, ICLR, NeurIPS, ICML: all free and open access with no fees.
Shared by Yann LeCun at 7/9/2021
Moral of the story: the patent system can be very counterproductive when patents are separated from the people best positioned to build on them. Patents make sense for certain things, mostly physical things. But they almost never make sense for "software", broadly speaking. 9/N, N=9
Shared by Yann LeCun at 7/6/2021
It wasn't until I left AT&T in early 2002 that I restarted work on ConvNets. I was hoping that no one at NCR would realize they owned the patent on what I was doing. No one did. I popped the champagne when the patents expired in 2007! 🍾🥂 8/N
Shared by Yann LeCun at 7/6/2021
So I stopped working on ML. Neural nets were becoming unpopular anyways. I started a project on image compression for the Web called DjVu with Léon Bottou. And we wrote papers on all the stuff we did in the early 1990s. 7/N
Shared by Yann LeCun at 7/6/2021
A complete check reading system was eventually built that was reliable enough to be deployed. Commercial deployment in banks started in 1995. The system could read about half the checks (machine printed or handwritten) and sent the other half to human operators. 3/N
Shared by Yann LeCun at 7/6/2021
We started working with a development group that built OCR systems from it. Shortly thereafter, AT&T acquired NCR, which was building check imagers/sorters for banks. Images were sent to humans for transcription of the amount. Obviously, they wanted to automate that. 2/N
Shared by Yann LeCun at 7/6/2021
There were two patents on ConvNets: one for ConvNets with strided convolution, and one for ConvNets with separate pooling layers. They were filed in 1989 and 1990 and allowed in 1990 and 1991. 1/N
Shared by Yann LeCun at 7/6/2021
This is part of a 10-year project that involved some 3D data collection (with drones and all) and some 3D reconstruction wizardry from Jean Ponce and his collaborators. More info on the project here: https://t.co/4MIfdKxHIj 2/N
Shared by Yann LeCun at 7/4/2021
A *fantastic* interactive website showing a navigable 3D reconstruction of Villa Diomedes in Pompeii. You can navigate the archeological site and smoothly blend the current state of the remains with the way it looked at the time it was inhabited. 1/N https://t.co/eYkR5PCHMU
Shared by Yann LeCun at 7/4/2021
Every gradient I estimated has been improved by increasing the batch size. But it also made learning slower. Peer review is good. Spontaneous peer review (on Twitter or elsewhere) is also good. What's true in open source is also true in science: Release early, release often.
Shared by Yann LeCun at 7/1/2021
Habitat 2.0 is released today! A realistic simulation environment where virtual robots learn to navigate and manipulate. 1000 photo-realistic 3D models of houses with manipulable objects. Blog post: https://t.co/U60FqWomi5 Paper: https://t.co/NbgqRRpphQ
Shared by Yann LeCun at 6/30/2021
Making sense of common sense? A piece by Jake Browning about common sense in machines, animals, and humans. Jake is a Berggruen Fellow and the resident philosopher in my lab at NYU. The spoiler? Common sense is a set of skills, not facts. It is a collection of models of the world.
Shared by Yann LeCun at 6/29/2021
A thread on self-supervised learning, with links.
Shared by Yann LeCun at 6/29/2021
This is another piece that basically says "deep learning is not as impressive as you think because it's mere interpolation resulting from glorified curve fitting". But in high dimension, there is no such thing as interpolation. In high dimension, everything is extrapolation.
Shared by Yann LeCun at 6/29/2021
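A toy NumPy experiment illustrating the claim: with a fixed number of training samples, the fraction of new points that land even inside the training set's per-coordinate bounding box (a superset of its convex hull) collapses as the dimension grows:

```python
import numpy as np

# Count test points inside the per-coordinate bounding box of the training set.
# Being outside the box implies being outside the convex hull, so even this
# loose notion of "interpolation" becomes vanishingly rare in high dimension.
rng = np.random.default_rng(0)
n_train, n_test = 100, 10_000

for d in (2, 10, 50, 200):
    train = rng.normal(size=(n_train, d))
    test = rng.normal(size=(n_test, d))
    lo, hi = train.min(axis=0), train.max(axis=0)
    inside = np.all((test >= lo) & (test <= hi), axis=1).mean()
    print(f"d={d:4d}  fraction of test points inside the box: {inside:.4f}")
```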
Everything you always wanted to know about self-supervised learning.
Shared by Yann LeCun at 6/29/2021
Exactly. Early and fast communication through arXiv and social media has *demonstrably* accelerated the progress of science. Slowing down progress for the sole purpose of counting points is unethical. #OpenScience
Shared by Yann LeCun at 6/28/2021
Limiting the communication of scientific information hurts progress and is contrary to ethics. This new CVPR policy is nothing short of insane. @amy_tabb is right.
Shared by Yann LeCun at 6/28/2021
Graph Transformer Networks: deep learning architectures whose states are not tensors but graphs. Very useful for end-to-end training of speech recognition and NLP systems. Not connected with graph NNs. Not connected with transformer architectures. They predate those by over 20 years.
Shared by Yann LeCun at 6/25/2021
"Deep Learning for AI" A paper and accompanying video by Yoshua Bengio, @geoffreyhinton and I in Communications of the ACM about our vision for the future of AI. The content is derived from our Turing Lecture.
Shared by
Yann LeCun
at
6/22/2021
A new benchmark dataset for speech processing.
Shared by Yann LeCun at 6/19/2021
New book on the theory of deep learning coming up soon by @ShoYaida and @danintheory .
Shared by Yann LeCun at 6/18/2021
DETR, the awesome ConvNet+Transformer vision pipeline by @alcinos26 (now a postdoc at @CILVRatNYU ) , is now available from @huggingface .
Shared by Yann LeCun at 6/17/2021
"Energy-Based Learning" Video+slides of a talk I gave on 2021-05-18 in the Mathematical Language Picture Seminar at Harvard, hosted by renowned mathematical physicist Arthur Jaffe. Slides: https://t.co/RYe7AysayK Video: https://t.co/KDMrmVWh5Y
Shared by Yann LeCun at 6/17/2021
Video of my talk on self-supervised learning, energy-based models, and training methods for joint-embedding architectures (e.g. Siamese nets) in contrastive and non-contrastive modes. Given at the French-German Symposium on ML. (with panel discussion). https://t.co/zCf4PBm9O7
Shared by Yann LeCun at 6/17/2021
AugLy: an open-source data augmentation library for text, images, audio and video. From @facebookai
Shared by Yann LeCun at 6/17/2021
Using a diffusion kernel to represent the neighborhood structure in a graph for processing by a transformer. Cool stuff.
Shared by Yann LeCun at 6/16/2021
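A minimal sketch of a graph diffusion (heat) kernel, K = exp(-tL) with L the graph Laplacian, on a toy ring graph; how such a kernel is actually injected into the transformer (as an attention bias, as positional features, etc.) depends on the paper and is not shown here:

```python
import numpy as np
from scipy.linalg import expm

n, t = 8, 1.0
A = np.zeros((n, n))
for i in range(n):                      # ring: each node linked to its two neighbors
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

L = np.diag(A.sum(axis=1)) - A          # combinatorial graph Laplacian
K = expm(-t * L)                        # diffusion kernel: symmetric, positive semidefinite

print(np.round(K[0], 3))                # mass decays smoothly with graph distance from node 0
```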
HuBERT: turns continuous speech signals into discrete tokens in a self-supervised manner.
Shared by Yann LeCun at 6/15/2021
Between "gauge-invariant ML" ( https://t.co/KbdQbiLO7p ) and "gauge-equivariant ConvNets" ( https://t.co/xo8o8oGOH9 ), a flurry of work at the boundary of deep learning and {equi,in}-variance principles in physics these days from @davidwhogg, @wellingmax and others.
Shared by Yann LeCun at 6/15/2021
There are two objectives: - AI as dedicated tool: using AI (and computers, more generally) to help humans with tasks they are not very good at. - AI as a science: building intelligent systems as a means to understand the principles of intelligence, biological or otherwise.
Shared by Yann LeCun at 6/12/2021
1. That there are many different types of problems and their solutions require different types of intelligence. 2. That human intelligence is not good at everything. Humans suck at many tasks, like playing go, chess, and poker, calculating integrals, reasoning logically. #noAGI
Shared by Yann LeCun at 6/12/2021
This is the kind of problem where gradient-free optimization must be applied, because the objectives are not differentiable with respect to the relevant variables. [Continued...]
Shared by Yann LeCun at 6/10/2021
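For illustration, a minimal cross-entropy-method loop (one common gradient-free optimizer) on a deliberately non-differentiable toy objective; the actual problem referenced in the (truncated) thread is not specified here:

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    # Piecewise-constant cost: the gradient is zero almost everywhere, so
    # gradient descent is useless, but sampling-based search still works.
    return np.sum(np.floor(np.abs(x - 3.0)))

mu, sigma = np.zeros(5), np.ones(5) * 5.0
for _ in range(50):
    samples = mu + sigma * rng.normal(size=(200, 5))
    scores = np.array([objective(s) for s in samples])
    elite = samples[np.argsort(scores)[:20]]     # keep the best 10%
    mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3

print(mu.round(2), objective(mu))                # converges near x = 3 without any gradient
```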
Very nice work from Google on deep RL-based optimization for chip layout. Simulated annealing and its heirs are finally dethroned after 40 years. This uses graph NNs and deConvNets, among other things. I did not imagine back in the 90s that (de)ConvNets could be used for this.
Shared by Yann LeCun at 6/10/2021
New architectural concepts for (very) large NLP models: - Hash mixture of experts. - Staircase transformers. They allow disentangling the number of parameters from the computational complexity. Great performance on standard benchmarks.
Shared by Yann LeCun at 6/9/2021
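A toy sketch of hash-based expert routing, which is one way to read the "disentangle parameters from compute" claim: parameter count scales with the number of experts, while each token only touches one of them. This is illustrative, not the FAIR architecture itself:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts = 64, 16
experts = rng.normal(size=(n_experts, d, d)) / np.sqrt(d)   # 16x the parameters of one layer

def hash_moe(token_ids, x):
    # Fixed "hash" of the token id picks exactly one expert per token, so the
    # per-token compute stays at a single expert's worth of FLOPs.
    routes = token_ids % n_experts            # stand-in for a real hash function
    out = np.empty_like(x)
    for e in range(n_experts):
        mask = routes == e
        if mask.any():
            out[mask] = x[mask] @ experts[e]  # one matmul per routed token
    return out

tokens = rng.integers(0, 50_000, size=32)
h = rng.normal(size=(32, d))
print(hash_moe(tokens, h).shape)              # (32, 64)
```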
Awesome new work by FAIR+ENS+Université Paris-Saclay showing that activations in large language models correlate with fMRI brain activity during language comprehension.
Shared by Yann LeCun at 6/9/2021
FLORES-101: a many-to-many language dataset to evaluate multilingual translation systems.
Shared by Yann LeCun at 6/4/2021
Wav2vec-Unsupervised is particularly useful for languages, dialects, and accents for which little or no transcribed data is available. Here is an example demo in Swahili.
Shared by Yann LeCun at 5/21/2021