robocrunch
Sasha Rush
@srush_nlp
Professor, Programmer in NYC. Cornell, Hugging Face 🤗
Tweets by Sasha Rush
Yesterday I expressed my naivety about the connection between Effective Altruism / Large Language Models (https://t.co/VPzhCrUF5K) I woke up to dozens of intense replies. This thread tries to summarize the articles people recommended.
Shared by Sasha Rush at 5/3/2022
I'm almost scared to ask, but what is EA and what does it have to do with language modeling? (out-of-touch person on the east coast)
Shared by Sasha Rush at 5/2/2022
The Annotated Transformer [v2022] A community refresh of the original blog with modern PyTorch and data-science tools. https://t.co/6jydMTO8if (Led by @austinvhuang, @subramen, @JonathanSumDL, @eKhalid_, @BlancheMinerva )
Shared by Sasha Rush at 5/2/2022
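For context, the operation at the heart of what that post walks through is scaled dot-product attention. A minimal PyTorch sketch of that computation (an editor's illustration, not the post's code):

```python
import math
import torch

def attention(query, key, value, mask=None):
    # query, key, value: (batch, heads, seq, d_k)
    d_k = query.size(-1)
    # similarity scores, scaled by sqrt(d_k) to keep the softmax well-behaved
    scores = query @ key.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = scores.softmax(dim=-1)
    return weights @ value, weights
```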
Tensor Puzzles v2: Do you really know PyTorch? (https://t.co/B5L2bja1DN) These turned out to be harder than I expected, so I added a tensor visualization that shows your answer against the spec on quickcheck examples. (Don't worry, there are still puppies.)
Shared by Sasha Rush at 4/22/2022
Dylan Madisetti (@DylanMadisetti) has extremely impressive pytorch skills (and doesn't even need the `where` function)
Shared by Sasha Rush at 4/19/2022
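For readers wondering how one gets by without `where`: a boolean mask turned into arithmetic does the same job. A minimal sketch of that trick (an editor's illustration, not Dylan's actual solution):

```python
import torch

def where_free(cond, a, b):
    # cond is a boolean tensor; multiplying by 0/1 replaces torch.where
    cond = cond.long()
    return cond * a + (1 - cond) * b

x = torch.tensor([-2, -1, 0, 1, 2])
print(where_free(x > 0, x, torch.zeros_like(x)))  # relu via arithmetic: [0, 0, 0, 1, 2]
```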
Do you (really) know PyTorch 🔥? Try out my Tensor Puzzles 🧩. 16 mini-puzzles for those ready to take off the stackoverflow training wheels. https://t.co/B5L2bja1DN
Shared by Sasha Rush at 4/18/2022
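To give a flavor of the puzzle style, everything is built from broadcasting and elementwise ops rather than loops or library calls. A made-up example in the same spirit (not one of the 16 puzzles):

```python
import torch

def eye(n):
    # identity matrix from broadcasting a range against itself
    r = torch.arange(n)
    return (r[:, None] == r[None, :]).long()

print(eye(3))
```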
SSMs classify images from pixels and speech from waveforms. Yet did you know that S4 is just a linear RNN (and DSS is a diagonal RNN)? Lots still to be understood about sequence models. Come by to talk about S4 and JAX in an hour. https://t.co/v7CjkjUEfo https://t.co/VM1yzgnQst
Shared by Sasha Rush at 4/8/2022
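The "just a (diagonal) linear RNN" observation, as a rough sketch with hypothetical parameter names rather than the actual S4/DSS parameterization: a diagonal state space model unrolled over time is a recurrence with no nonlinearity.

```python
import torch

def diagonal_linear_rnn(u, a, b, c):
    # u: (seq_len,) input; a, b, c: (state,) diagonal SSM parameters (illustrative names)
    x = torch.zeros_like(a)
    ys = []
    for u_t in u:
        x = a * x + b * u_t          # linear state update, no nonlinearity
        ys.append((c * x).sum())     # linear readout
    return torch.stack(ys)

u = torch.randn(16)
a = torch.rand(4) * 0.9              # stable diagonal transition
y = diagonal_linear_rnn(u, a, torch.randn(4), torch.randn(4))
```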
"A hapax legomenon is a word or an expression that occurs only once within a context: either in the written record of an entire language, in the works of an author, or in a single text." (https://t.co/cBCbF6SWNG , A+ wiki page)
Shared by Sasha Rush at 3/23/2022
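For the corpus-statistics-minded, finding the hapax legomena of a text is a one-line filter over a word count. A toy sketch with whitespace tokenization only:

```python
from collections import Counter

text = "the cat sat on the mat and the dog barked"
counts = Counter(text.split())
hapaxes = [w for w, c in counts.items() if c == 1]
print(hapaxes)  # words occurring exactly once in this tiny corpus
```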
If any significant part of the model has a transformer involved, I will admit defeat. I don't think that is crazy. If you told me in 2016 that by 2020 no model would have a significant RNN component, I would have been surprised.
Shared by Sasha Rush at 3/2/2022
Request for Counterparty: I would like to make an Erlich / Buffett bet on a proposition similar to: "On 1/1/2027, a Transformer-like model will continue to hold the SoTA in effectively all benchmarked NLP tasks." I would be taking the negative position.
Shared by Sasha Rush at 3/1/2022
Additions for 2021:
* Many more unit tests
* Expanded CUDA examples (matmul / reduce)
* Interactive visualizations in @streamlit
* NLP classification example
* Speed and memory improvements
* Slides from the course.
Shared by Sasha Rush at 12/6/2021
Minitorch🔥(https://t.co/41DIF6vAUl, v2021) Build-it-yourself deep learning. Learn about auto-diff, tensors, GPUs, and advanced NN models. Build everything from scratch in pure Python with full unit-test coverage and interactive visualizations.
Shared by Sasha Rush at 12/6/2021
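To give a flavor of the build-it-yourself part: reverse-mode auto-diff at its smallest is a value that records how it was produced and replays the chain rule backwards. A toy sketch (an editor's illustration; MiniTorch's actual API differs):

```python
class Scalar:
    """Toy reverse-mode autodiff value (illustration only, not MiniTorch's API)."""
    def __init__(self, data, parents=()):
        self.data, self.grad, self.parents = data, 0.0, parents

    def __mul__(self, other):
        return Scalar(self.data * other.data,
                      parents=[(self, other.data), (other, self.data)])

    def __add__(self, other):
        return Scalar(self.data + other.data, parents=[(self, 1.0), (other, 1.0)])

    def backward(self, grad=1.0):
        # accumulate this contribution, then push it to parents via the chain rule
        self.grad += grad
        for parent, local in self.parents:
            parent.backward(grad * local)

x, y = Scalar(2.0), Scalar(3.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0
```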
Differential Inference: A Criminally Underused Tool (https://t.co/zSwhvk806r) An annotated talk about elementary probability (coins & dice) in pytorch. Nothing new, I just think we should mostly do discrete inference with auto-diff. Slides: https://t.co/tJHQlLwNrp
Shared by Sasha Rush at 11/30/2021
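The trick, roughly (a minimal sketch of the idea, not the talk's code): put the log-probabilities of the discrete outcomes in a tensor with `requires_grad`, compute the log-probability of the evidence with `logsumexp`, and the gradient hands back the posterior for free.

```python
import math
import torch

# Two fair dice; condition on their sum being 7, ask for the posterior over die 1.
log_p = torch.full((6, 6), -math.log(36.0), requires_grad=True)  # joint log-probs
faces = torch.arange(1, 7)
event = (faces[:, None] + faces[None, :]) == 7                   # evidence indicator

log_evidence = torch.logsumexp(log_p[event], dim=0)              # log p(sum = 7)
log_evidence.backward()

# d log p(evidence) / d log p(outcome) = p(outcome | evidence)
posterior_die1 = log_p.grad.sum(dim=1)
print(posterior_die1)  # 1/6 for every face: any face of die 1 can complete a 7
```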
Twitter was less than helpful, but the 90's neural network heroes came through! Appendix A contains all local optima of the 2-2-1 network in detail. I also learned that there are no finite local optima in this model! All observed optima are precision issues. https://t.co/5GQ4KezYzY https://t.co/Ytf9z22SS6
Shared by Sasha Rush at 11/19/2021
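For readers outside that literature, the architecture in question is tiny. A sketch of what "2-2-1" denotes (an editor's illustration):

```python
import torch.nn as nn

# The "2-2-1 network": 2 inputs, a hidden layer of 2 sigmoid units, 1 sigmoid output,
# the small architecture classically studied (e.g. on XOR) in the local-optima literature.
net = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid(), nn.Linear(2, 1), nn.Sigmoid())
```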
Torch Golf #5 🏌️♂️⛳️ Hidden Markov Model (Fast general inference with no dynamic programming) http
Shared by Sasha Rush at 11/10/2021
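One related trick for HMM inference in plain PyTorch (an editor's sketch, not necessarily the golf solution): run the forward pass as log-space matrix products and let autograd recover all posterior state marginals, so you never write the backward recursion by hand.

```python
import torch

def hmm_marginals(log_pi, log_A, log_B, obs):
    # Posterior state marginals from autograd on the log-likelihood.
    T = len(obs)
    log_b = log_B[:, obs].T.clone().detach().requires_grad_(True)  # (T, K) leaf
    alpha = log_pi + log_b[0]
    for t in range(1, T):
        # logsumexp over the previous state = one matrix-vector product in log space
        alpha = torch.logsumexp(alpha[:, None] + log_A, dim=0) + log_b[t]
    torch.logsumexp(alpha, dim=0).backward()
    return log_b.grad  # grad[t, k] = p(z_t = k | obs)

K, V, T = 3, 5, 8
log_pi = torch.log_softmax(torch.randn(K), dim=0)
log_A = torch.log_softmax(torch.randn(K, K), dim=1)
log_B = torch.log_softmax(torch.randn(K, V), dim=1)
marg = hmm_marginals(log_pi, log_A, log_B, torch.randint(V, (T,)))
print(marg.sum(dim=1))  # each time step's marginals sum to 1
```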
Torch Golf #4 ⛳️🥏 - Bayesian Linear Regression (by the book)
Shared by Sasha Rush at 11/5/2021
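"By the book" here presumably refers to the closed-form conjugate posterior. A sketch of that textbook computation in PyTorch (standard formulas, not the golf code):

```python
import torch

def blr_posterior(X, y, alpha=1.0, beta=1.0):
    # Posterior over weights for y = Xw + noise, with prior w ~ N(0, I/alpha)
    # and noise precision beta (textbook conjugate formulas).
    d = X.shape[1]
    precision = alpha * torch.eye(d) + beta * X.T @ X   # posterior precision
    cov = torch.linalg.inv(precision)                   # posterior covariance
    mean = beta * cov @ X.T @ y                         # posterior mean
    return mean, cov

X = torch.randn(50, 3)
w_true = torch.tensor([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * torch.randn(50)
mean, cov = blr_posterior(X, y, alpha=1.0, beta=100.0)
print(mean)  # close to w_true
```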
ML4H (https://t.co/HBZjbsI183) running this year on Mini-Conf, put together by Elena Sizikova and Young Joon Kwon. Still neat to see the care and effort folks put into making engaging virtual conferences with small teams.
Shared by Sasha Rush at 11/4/2021
"When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute" by Tao Lei @taolei15949106 - Outstanding Paper at EMNLP https://t.co/7IR25d9Sz2 (Tao's work is always must read. Combines algorithmic cleverness with practical engineering and experiments.)
Shared by Sasha Rush at 10/30/2021
Just to clarify, I wasn't advocating Gradient Descent, but using autodiff to compute the exact posterior / E-step in EM. https://t.co/iGb83ipDKW (I agree you might as well do the M in closed-form.)
Shared by Sasha Rush at 10/24/2021
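Concretely, the kind of thing being described (a toy editor's illustration): for a mixture model, the E-step statistics fall out as the gradient of the log-marginal-likelihood with respect to the per-component log-mixing weights, and the M-step can then proceed in closed form.

```python
import torch

# Toy 1-D Gaussian mixture (unit variance): the E-step falls out of autograd.
x = torch.tensor([0.1, 0.3, 2.9, 3.2])
mu = torch.tensor([0.0, 3.0])
log_pi = torch.log(torch.tensor([0.5, 0.5])).requires_grad_(True)

# log p(x_n) = logsumexp_k [ log pi_k + log N(x_n | mu_k, 1) ]  (constants dropped)
log_lik = torch.logsumexp(log_pi + (-0.5 * (x[:, None] - mu) ** 2), dim=1)
log_lik.sum().backward()

# Expected counts N_k = sum_n p(z_n = k | x_n); give each point its own copy of
# log_pi if you want per-point responsibilities.
print(log_pi.grad)  # ~[2, 2]: two points belong to each component
```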
Torch Golf! 🏌️♀️⛳️ - Multiclass logistic regression. https://t.co/YBIFpDY1sq http
Shared by Sasha Rush at 10/19/2021
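For reference, the "golf" target here fits in a handful of lines of standard PyTorch (an editor's sketch, not the posted solution): a linear layer trained with cross-entropy.

```python
import torch

# Multiclass logistic regression on synthetic data with 3 classes.
X = torch.randn(200, 4)
y = (X[:, 0] + X[:, 1] > 0).long() + (X[:, 2] > 0).long()   # labels in {0, 1, 2}

W = torch.zeros(4, 3, requires_grad=True)
b = torch.zeros(3, requires_grad=True)
opt = torch.optim.SGD([W, b], lr=0.5)

for _ in range(200):
    loss = torch.nn.functional.cross_entropy(X @ W + b, y)
    opt.zero_grad(); loss.backward(); opt.step()

print((X @ W + b).argmax(dim=1).eq(y).float().mean())        # training accuracy
```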
Rationales for Sequential Prediction (@keyonV https://t.co/ME47tWFTmh, https://t.co/K3F6WoTpPz) What subset of words triggers the next prediction in text generation? We propose greedy rationalization as a fast combinatorial algorithm for finding a small rationale set. /1
Shared by Sasha Rush at 9/15/2021
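In rough pseudocode, the greedy procedure described here grows the rationale one word at a time (an editor's paraphrase of the stated idea; `model.prob` and `model.argmax` are hypothetical calls, not the paper's interface):

```python
def greedy_rationale(model, context, target):
    # Grow a subset of the context until the model predicts `target` from it alone.
    rationale = set()
    while model.argmax(rationale) != target:       # hypothetical: top prediction given subset
        # add the single context word that most increases p(target | subset)
        best = max(set(context) - rationale,
                   key=lambda w: model.prob(target, rationale | {w}))
        rationale.add(best)
    return rationale
```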
New preprint: Datasets (https://t.co/8aUjNjMpYh) documents the Hugging Face Datasets project, now containing more than 700 NLP datasets from over 300 contributors. NLP models haven't changed much recently, but datasets, and how we use and document them, have changed a lot ...
Shared by Sasha Rush at 9/8/2021
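The library side of this is the familiar one-liner (standard `datasets` usage; the dataset name is just an example):

```python
from datasets import load_dataset

dataset = load_dataset("imdb")             # downloads, caches, and versions the dataset
print(dataset["train"].features)           # typed schema shipped with the dataset
print(dataset["train"][0]["text"][:80])    # standard split / row / column access
```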
There is an NLP grad student phase where they become convinced they should just induce a manifold over text. I've never seen anyone get close. Can't decide if this is a grad student bug or like, the most important problem.
Shared by Sasha Rush at 7/14/2021