robocrunch
Bojan Tunguz
@tunguz
#MachineLearning at @Nvidia. @Kaggle Quadruple Grandmaster. Data Scientist. Physicist. Catholic. Husband. Father. @Stanford Alum. Opinions my own.
Tweets by Bojan Tunguz
In the age of increasingly ubiquitous and powerful AI, your datasets are your moat.
Shared by
Bojan Tunguz
at
5/14/2022
This week DeepMind released Gato - a single transformer-based architecture that is capable of working on over 600 different tasks. 1/3 https://t.co/C3KlA90MIU #ArtificialIntelligence #AI #MachineLearning #ML #DeepLearning #DL
Shared by
Bojan Tunguz
at
5/14/2022
LOL, now I’m being quoted in various articles. 🤣 ‘“This is something that many of us in the Python community have been hoping for for a long time,” noted Bojan Tunguz, a machine learning modeler at @Nvidia.’ https://t.co/FQSDqoXpGp via @DEVCLASS
Shared by
Bojan Tunguz
at
5/5/2022
From the very beginnings of the research in AI, the ability to understand natural language has been seen as the pinnacle and the end goal of what it means for an agent to be intelligent (think of the Turing Test). 1/6
Shared by
Bojan Tunguz
at
5/5/2022
That’s why only college dropouts should be in charge of startups. https://t.co/POhaOvnT0C
Shared by
Bojan Tunguz
at
5/5/2022
"A fox knows many things, but a hedgehog knows one big thing." - Archilochus This was a quote that Isaiah Berlin expanded into one of his essays, first published in 1953. The quote was used to classify writers and thinkers into two big categories - those who view the world 1/18
Shared by
Bojan Tunguz
at
5/3/2022
• Python ecosystem: Run many popular Python packages and the scientific stack (such as numpy, pandas, scikit-learn, and more) • Python with JavaScript: Bi-directional communication between Python and JavaScript objects and namespaces 3/6
Shared by
Bojan Tunguz
at
5/2/2022
Key features include: • Python in the browser: Enable drop-in content, external file hosting, and application hosting without relying on server-side configuration 2/6
Shared by
Bojan Tunguz
at
5/2/2022
How Banks Are Winning with AI and Automated Machine Learning https://t.co/zCpFPf04Iu via @rightrelevance thanks @datarobot
Shared by
Bojan Tunguz
at
4/30/2022
DeepSolar Dataset on #kaggle via @KaggleDatasets https://t.co/c9cxs056VN
Shared by
Bojan Tunguz
at
4/30/2022
If you are working in Python and find yourself needing to encode categorical variables for various ML tasks, I'd strongly encourage you to take a look at the Category Encoders library. It's a very good and fairly comprehensive sklearn-style library. https://t.co/N8wubPlKDF
Shared by
Bojan Tunguz
at
4/28/2022
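To make the idea of encoding categorical variables concrete, here is a minimal pure-Python sketch of one-hot encoding with an sklearn-style `fit`/`transform` interface. The class name `TinyOneHotEncoder` and its handling of unseen categories are assumptions made for illustration only. This is not the Category Encoders API, which offers many more encoders and options.

```python
class TinyOneHotEncoder:
    """Illustrative one-hot encoder for a single column of category labels."""

    def fit(self, values):
        # Remember each distinct category in sorted order so columns are stable.
        self.categories_ = sorted(set(values))
        self._index = {cat: i for i, cat in enumerate(self.categories_)}
        return self

    def transform(self, values):
        rows = []
        for value in values:
            row = [0] * len(self.categories_)
            # Assumption: a category not seen during fit becomes an all-zero row.
            if value in self._index:
                row[self._index[value]] = 1
            rows.append(row)
        return rows

    def fit_transform(self, values):
        return self.fit(values).transform(values)


colors = ["red", "green", "red", "blue"]
encoder = TinyOneHotEncoder()
encoded = encoder.fit_transform(colors)
print(encoder.categories_)  # column order of the encoded rows
print(encoded)
```

Each input label becomes a row with a single 1 in the column for its category. In practice a maintained library is the better choice, since it also covers target, ordinal, and other encodings.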
I’ve reached 40,000 followers on Twitter. Grateful for every one of them. 🙏 I tweet about: 🤖 Machine Learning/AI 🐍 Python 🟩 NVIDIA 🤪 Memes 👨‍💻 Future of work 🔭 Science 🔐 Blockchain Follow me to learn more about those topics.
Shared by
Bojan Tunguz
at
4/27/2022
Three ways you can provide insights: 1. Analysis 2. Visualization 3. Models
Shared by
Bojan Tunguz
at
4/27/2022
I honestly don't believe that ML/AI will ever have a substantial impact on TradFi. For ML to have a substantial impact on *any* domain, we need to go through a major digital transformation first. And that seems to be virtually impossible to pull off in TradFi.
Shared by
Bojan Tunguz
at
4/23/2022
Here are my most-used ML algos for tabular data problems: 1. XGBoost 2. HistGradientBoosting 3. Logistic regression/Ridge regression 4. LightGBM 5. MLP 6. A blend of 1 and 5.
Shared by
Bojan Tunguz
at
4/22/2022
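The tweet above does not say how the blend of models 1 and 5 is built. One common reading is a weighted average of the two models' predictions, sketched below. The function name `blend`, the `weight_a` parameter, and the sample prediction lists are all hypothetical.

```python
def blend(preds_a, preds_b, weight_a=0.5):
    """Weighted average of two equally long lists of predictions."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be between 0 and 1")
    weight_b = 1.0 - weight_a
    return [weight_a * a + weight_b * b for a, b in zip(preds_a, preds_b)]


# Made-up predicted probabilities standing in for a boosted-tree model and an MLP.
xgb_preds = [0.25, 0.75, 0.5]
mlp_preds = [0.75, 0.25, 0.5]
blended = blend(xgb_preds, mlp_preds)
print(blended)
```

The weight would normally be tuned on held-out validation data rather than fixed at 0.5.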
One of the things that people who have never given Kaggle a serious try don’t understand is what it *really* takes to be successful there, and in particular what it takes to win a competition. No, it’s NOT about squeezing the maximum performance out of an algorithm. 1/5
Shared by
Bojan Tunguz
at
4/22/2022
I believe that going forward the biggest bottleneck for the continuing growth of the tech sector will be the availability of well-trained tech talent. The demand is far outstripping the supply that's coming down the educational pipeline. 1/2
Shared by
Bojan Tunguz
at
4/21/2022
“We face danger whenever information growth outpaces our understanding of how to process it.” - @NateSilver538
Shared by
Bojan Tunguz
at
4/18/2022
All the best machine learning modelers I know are making their best models on @numerai now. Probably nothing.
Shared by
Bojan Tunguz
at
4/18/2022
Them: large language models are getting too powerful! Large language models:
Shared by
Bojan Tunguz
at
4/18/2022
Trying to build a tech startup at the steepest part of the sigmoid adoption curve is a very precarious thing to do.
Shared by
Bojan Tunguz
at
4/14/2022
I am excited to share that a paper that I contributed to “Combinatorial optimization of mRNA structure, stability, and translation for RNA-based therapeutics” has been published in Nature Communications. https://t.co/R7cDHDlWyw #mRNA #biology #vaccine #machinelearning 1/5
Shared by
Bojan Tunguz
at
4/14/2022
One of my favorite things about XGBoost is how easy it is to install. Here it was a breeze to do so on a Raspberry Pi Zero! From supercomputing clusters, to the edge micro devices #XGBoostIsAllYouNeed.
Shared by
Bojan Tunguz
at
4/13/2022
I decided that in order to understand the large language models I needed to inspect all the weights by hand, so I printed them all out.
Shared by
Bojan Tunguz
at
4/8/2022
Just FYI, there is *much* more to Kaggle the platform than the Kaggle-sponsored ML competitions. If your concern is that those competitions are not representative of the "full" DS workflow, you *could* set up your own that mimic the rest of the DS pipeline as you see it. 1/2
Shared by
Bojan Tunguz
at
4/4/2022
As I was saying earlier … DataRobot’s vision to democratize machine learning with no-code AI https://t.co/dg1v6eTFSu
Shared by
Bojan Tunguz
at
4/3/2022
NB: there are currently at least two tech unicorns, many other high-value startups, and countless extremely successful careers that were built by Kagglers using the skills and insights they gained on that platform.
Shared by
Bojan Tunguz
at
4/3/2022
XGBoost Is All You Need Deep Neural Networks and Tabular Data: A Survey https://t.co/Z2KsHP3fvp
Shared by
Bojan Tunguz
at
3/30/2022
Here is a great post about the genealogy of the most popular transformer models by Xavier Amatriain. @xamat https://t.co/w2Z7Kc7opG #NLP #ML #DeepLearning #ArtificialIntelligence #AI
Shared by
Bojan Tunguz
at
3/28/2022
Deep Learning with PyTorch https://t.co/yjuSB9QxvK Introducing Python https://t.co/iUmywBeHRw Machine Learning Using TensorFlow Cookbook https://t.co/bKlQkiiJMV Natural Language Processing with Transformers https://t.co/D1X8dPgHNW 5/6
Shared by
Bojan Tunguz
at
3/23/2022
Hands-On Gradient Boosting with XGBoost and scikit-learn https://t.co/sRLIFOI8DT Deep Learning with Python https://t.co/TjL36JKjqA Effective Pandas https://t.co/2UPkRsvDa7 Machine Learning with PyTorch and Scikit-Learn https://t.co/J0sRKyC2Wb 6/6
Shared by
Bojan Tunguz
at
3/23/2022
Books: Approaching (Almost) Any Machine Learning Problem https://t.co/fYVtINpZ4z Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists https://t.co/d4myngd69o Hands-On Unsupervised Learning Using Python https://t.co/VbuK177Clr 4/6
Shared by
Bojan Tunguz
at
3/23/2022
Today @NVIDIA announced a new version of its high-end server-class GPU - the H100 Tensor Core GPU. This is an order-of-magnitude leap over the previous generation in computational performance. #GPU #gpucomputing #machinelearning #artificialintelligence https://t.co/rQBIm0ui5b
Shared by
Bojan Tunguz
at
3/22/2022
What is something that you can say during sex, but also while your ML model is training?
Shared by
Bojan Tunguz
at
3/19/2022
Wordceling is just shape rotation in the latent embedding space.
Shared by
Bojan Tunguz
at
3/15/2022
Don't forget to shift your learning rate one hour today. Otherwise none of your models will converge.
Shared by
Bojan Tunguz
at
3/13/2022
Very interesting work on Unsupervised Feature Selection using Supervised Algorithms. It turns out that even for unsupervised learning, XGBoost is all you need. :) Blog post: https://t.co/o2nC7wUl7d Code repo: https://t.co/aOR3mJ9VEy #machinelearning #unsupervisedlearning
Shared by
Bojan Tunguz
at
3/12/2022
4. Many thousands to about a billion datapoints is where gradient boosted trees rule. If you need just one algorithm, go with this. You'll never go wrong. 3/
Shared by
Bojan Tunguz
at
3/10/2022
OK, here is my honest take on when to use which approach/technique with a given dataset. These are my rules of thumb, and caveats could fill out the entire internet. 1. Up to a few hundred datapoints, use stats 2. For a few hundred to a few thousand, use linear/logistic regression 1/
Shared by
Bojan Tunguz
at
3/10/2022
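The rules of thumb that survive in this feed (rules 1, 2, and 4) can be sketched as a small lookup on dataset size. The numeric cutoffs below (500, 5,000, 50,000) are assumptions chosen only to stand in for "a few hundred", "a few thousand", and "many thousands". Rule 3 does not appear in this excerpt, so the sketch reports that range as not covered rather than guessing.

```python
NOT_COVERED = "not covered in this excerpt"


def suggest_approach(n_rows):
    """Map a dataset size to the rule of thumb quoted in the thread."""
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    if n_rows <= 500:            # assumed cutoff for "up to a few hundred"
        return "statistics"
    if n_rows <= 5_000:          # assumed cutoff for "a few thousand"
        return "linear/logistic regression"
    if n_rows < 50_000:          # rule 3 is missing from the excerpt
        return NOT_COVERED
    if n_rows <= 1_000_000_000:  # "many thousands to about a billion"
        return "gradient boosted trees"
    return NOT_COVERED


for size in (100, 2_000, 10_000, 1_000_000):
    print(size, "->", suggest_approach(size))
```

As the thread itself stresses, these are rules of thumb with many caveats, so the cutoffs should be read as rough orders of magnitude, not hard boundaries.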
Woohoo, I'm winning! Thanks @abhi1thakur for organizing this one! "Overfitting"? Nay, it's called #winning. #kaggle - https://t.co/Cl9JscwRy2
Shared by
Bojan Tunguz
at
3/9/2022
Hot take: Statistics is just poor man's Machine Learning. Send tweet.
Shared by
Bojan Tunguz
at
3/9/2022
LinkedIn and DeepMind co-founders form AI startup to help humans talk to computers https://t.co/w08cXvpwSg via @engadget
Shared by
Bojan Tunguz
at
3/8/2022
Everyone is like 4D chess this, 10D chess that, and I am like dude ever heard of Hilbert space??? Unless you are playing chess in infinite dimensional vector spaces, you are ngmi.
Shared by
Bojan Tunguz
at
3/7/2022
but we are only scratching the surface of what massive parallelization can achieve. Once your datasets contain all of human knowledge, it's hard to imagine where else you can find more data to feed your algorithms with. 5/7
Shared by
Bojan Tunguz
at
3/6/2022
For at least a decade now we have been enjoying an unprecedented AI Springtime. A perfect storm of major advances in algorithms (deep learning), computational architecture (GPUs), and availability of large, high-quality datasets has enabled the field to grow - exponentially! 2/7
Shared by
Bojan Tunguz
at
3/6/2022
You think your image segmentation deep learning model is SOTA? Try it on this image.
Shared by
Bojan Tunguz
at
3/5/2022
I just came across this cool deep learning image library. Anomalib is a deep learning library that aims to collect state-of-the-art anomaly detection algorithms for benchmarking on both public and private datasets. https://t.co/hThDksM5m5 #computervision #dl #ml #ai
Shared by
Bojan Tunguz
at
3/3/2022
Being able to think in terms of probabilities is a superpower, period. And that includes your own beliefs and opinions. Probabilistic and statistical thinking unfortunately doesn’t come naturally to us. It takes years of training and experience to develop.
Shared by
Bojan Tunguz
at
3/3/2022
Just received my review copy of Machine Learning with PyTorch and Scikit-Learn. Thank you @rasbt for making this happen! The book is huge, much bigger than Python Machine Learning. Seems really comprehensive, covers almost all relevant areas of modern ML. https://t.co/aEFpjo9gek
Shared by
Bojan Tunguz
at
3/2/2022
I’ve gotten a lot of support for this tweet, far more than I had expected. I also got some pushback, but NONE of the pushback challenged the basic facts of my allegations. Almost all of the pushback was just empty ideological posturing.
Shared by
Bojan Tunguz
at
3/1/2022
way forward. I feel that we are getting close to the limit of what this line of attack can achieve. My *intuition* is that we'll probably need to take a step or two back, and try to somehow incorporate more insights from how humans understand and process meaning in language. 15/
Shared by
Bojan Tunguz
at
2/28/2022
I just got my review copy of “Natural Language Processing with Transformers” from @OReillyMedia. Thank you @_lewtun for making this happen! Really looking forward to digging deeper into transformers and @huggingface. 🤗 https://t.co/g1NGNPLXlz
Shared by
Bojan Tunguz
at
2/26/2022
I’ll never stop being fascinated by the fact that 99.999% of Artificial Intelligence researchers know absolutely nothing about a century+ of solid fundamental research in *human* Intelligence. 1/
Shared by
Bojan Tunguz
at
2/26/2022
200,000+ Jeopardy! Questions on #kaggle via @KaggleDatasets https://t.co/nyohzyj1mX
Shared by
Bojan Tunguz
at
2/23/2022
Took a look at my spam folder for the first time in a long while. One thing that struck me was how *visually* obvious the spam message titles are these days. I wonder if the spam detection software might be using vision ML models on titles - I know I would try it.
Shared by
Bojan Tunguz
at
2/23/2022
Adversarial Attacks on Deep Neural Networks: an Overview - https://t.co/ZUQ1k5gTpS https://t.co/kU1Imy1Y7I via @rightrelevance thanks @kirkdborne
Shared by
Bojan Tunguz
at
2/20/2022