robocrunch
AK (@ak92501): paper tweets, dms are open. ML @Gradio (acq. by @HuggingFace 🤗)
Tweets by AK
Lifting the Curse of Multilinguality by Pre-training Modular Transformers abs: https://t.co/rNzSbBrlmx github: https://t.co/BnZVgZnamp
Shared by AK at 5/13/2022
Simple Open-Vocabulary Object Detection with Vision Transformers abs: https://t.co/ytb2Tvliu1
Shared by AK at 5/13/2022
Symphony Generation with Permutation Invariant Language Model abs: https://t.co/I62NVnsu9C project page: https://t.co/ri89btU9zS github: https://t.co/0ngNO7bG69
Shared by AK at 5/12/2022
Unifying Language Learning Paradigms abs: https://t.co/YVhXlLqOyR github: https://t.co/QinnLMs9pk model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization
Shared by AK at 5/12/2022
.@Gradio Demo for MindsEye Lite, run multiple text-to-image models in one place by @multimodalart on @huggingface Spaces demo: https://t.co/Im3KH7lZmi
Shared by AK at 5/11/2022
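For context on how these @Gradio Spaces demos are typically wired up, here is a minimal sketch of a text-to-image Interface; the generate stub below is hypothetical and stands in for a real model call, it is not the MindsEye Lite implementation.

```python
import gradio as gr
from PIL import Image

def generate(prompt):
    # Hypothetical stand-in: a real Space would run a text-to-image
    # model here and return the generated PIL image.
    return Image.new("RGB", (256, 256), color="gray")

demo = gr.Interface(
    fn=generate,
    inputs=gr.Textbox(label="Prompt"),
    outputs=gr.Image(label="Generated image"),
    title="Text-to-image demo (sketch)",
)

if __name__ == "__main__":
    demo.launch()  # serves the demo locally; Spaces hosts the same app
```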
Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers abs: https://t.co/PKSGQ5g5N6 few-shot learning emerges only from applying the right architecture to the right data distribution; neither component is sufficient on its own
Shared by AK at 5/11/2022
Building Machine Translation Systems for the Next Thousand Languages abs: https://t.co/Vg5gIFvY2y
Shared by AK at 5/10/2022
.@Gradio Demo for CaptchaCracker, an open source Python library that provides functions to create and apply deep learning models for Captcha Image recognition on @huggingface Spaces demo: https://t.co/jXbFUsSgqx github: https://t.co/tEFCEB43uM
Shared by AK at 5/9/2022
InCoder: A Generative Model for Code Infilling and Synthesis abs: https://t.co/qAbrJzgVkw project page: https://t.co/Sp87l2oGix
Shared by AK at 4/14/2022
Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis abs: https://t.co/ja83CLkool
Shared by AK at 4/13/2022
Scalable Training of Language Models using JAX pjit and TPUv4 abs: https://t.co/8dv5RogjQ9
Shared by AK at 4/13/2022
Panoptic, Instance and Semantic Relations: A Relational Context Encoder to Enhance Panoptic Segmentation abs: https://t.co/RNwGJoKrxC
Shared by AK at 4/13/2022
Unified Speech-Text Pre-training for Speech Translation and Recognition abs: https://t.co/HnltZX3db4 achieves a 1.7–2.3 BLEU improvement over the sota on the MUST-C speech translation dataset and WERs comparable to wav2vec 2.0 on the LibriSpeech speech recognition benchmark
Shared by AK at 4/13/2022
VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration abs: https://t.co/KiEuJwgCLc project page: https://t.co/X7O8H4YFHq
Shared by AK at 4/13/2022
Few-shot Learning with Noisy Labels abs: https://t.co/wGnBAoCH8D results show that TraNFS is on par with leading FSL methods on clean support sets, yet far outperforms them in the presence of label noise
Shared by AK at 4/13/2022
Large-Scale Streaming End-to-End Speech Translation with Neural Transducers abs: https://t.co/A6pKJrj8Yi
Shared by AK at 4/13/2022
GARF: Gaussian Activated Radiance Fields for High Fidelity Reconstruction and Pose Estimation abs: https://t.co/QtAfo9yvTX project page: https://t.co/m59sGsRzZx
Shared by AK at 4/12/2022
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? abs: https://t.co/Lk71qAPdzm github: https://t.co/hIzImwwFoD
Shared by AK at 4/12/2022
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension abs: https://t.co/j1mrXHqzOB reduce the gap between zero-shot baselines from prior work and supervised models by as much as 29% on RefCOCOg
Shared by AK at 4/12/2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback abs: https://t.co/VLl22Ib5NO apply preference modeling and reinforcement learning from human feedback (RLHF) to finetune language models to act as helpful and harmless assistants
Shared by AK at 4/12/2022
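The preference-modeling half of this recipe is commonly implemented as a reward model trained on pairwise human comparisons; below is a minimal sketch of that standard pairwise loss (an assumption about the general RLHF recipe, not the paper's code).

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen, reward_rejected):
    # Pairwise preference-modeling loss: push the reward of the
    # human-preferred response above that of the rejected one.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: scalar rewards for a batch of 8 comparison pairs.
loss = preference_loss(torch.randn(8), torch.randn(8))
```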
Correcting Robot Plans with Natural Language Feedback abs: https://t.co/nXxdMv5Orj
Shared by AK at 4/12/2022
End-to-End Speech Translation for Code Switched Speech abs: https://t.co/UVSILSDAKJ
Shared by AK at 4/12/2022
The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink abs: https://t.co/dxP7Sy8MEG
Shared by AK at 4/12/2022
No Token Left Behind: Explainability-Aided Image Classification and Generation abs: https://t.co/n5Jeu5Q8c7
Shared by AK at 4/12/2022
FLAVA: A Foundational Language And Vision Alignment Model abs: https://t.co/MbmbbQKnHt demonstrates impressive performance on a wide range of 35 tasks spanning vision, language, and multimodal domains
Shared by AK at 12/9/2021
Improving language models by retrieving from trillions of tokens abs: https://t.co/GndkHppvaf With a 2 trillion token database, Retrieval-Enhanced Transformer (Retro) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters
Shared by AK at 12/9/2021
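Retro's core move is retrieving nearest-neighbor chunks from the token database and attending to them; a toy sketch of the retrieval step follows, assuming chunk embeddings are precomputed (Retro itself uses frozen BERT embeddings and an approximate index at trillion-token scale).

```python
import numpy as np

def retrieve_neighbors(query_emb, db_embs, k=2):
    # Cosine-similarity nearest neighbors over precomputed chunk
    # embeddings; returns indices of the top-k database chunks.
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    return np.argsort(-(db @ q))[:k]

db = np.random.randn(10_000, 128)            # stand-in chunk embeddings
idx = retrieve_neighbors(np.random.randn(128), db)
```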
Grounded Language-Image Pre-training abs: https://t.co/iFk1qhKnjX pre-train GLIP on 27M grounding data, 3M human-annotated and 24M web-crawled image-text pairs. Evaluated on COCO and LVIS, GLIP achieves 49.8 AP and 26.9 AP, respectively, surpassing many supervised baselines
Shared by AK at 12/8/2021
Theseus: A library for differentiable nonlinear optimization built on PyTorch to support constructing various problems in robotics and vision as end-to-end differentiable architectures github: https://t.co/m4rCmyMDFn
Shared by AK at 12/3/2021
N-ImageNet: Towards Robust, Fine-Grained Object Recognition with Event Cameras abs: https://t.co/AfSpjQ0wB2
Shared by AK at 12/3/2021
SEAL: Self-supervised Embodied Active Learning abs: https://t.co/LnD6Am8rKH project page: https://t.co/5BptA0ZMwm
Shared by AK at 12/3/2021
BEVT: BERT Pretraining of Video Transformers abs: https://t.co/6BI5E3f9Cv
Shared by AK at 12/3/2021
Task2Sim : Towards Effective Pre-training and Transfer from Synthetic Data abs: https://t.co/FXQBdf7nWA
Shared by AK at 12/2/2021
VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion abs: https://t.co/rHvq4wtvx2 an end-to-end volumetric 3D reconstruction network using transformers for wide-baseline, multi-view feature fusion
Shared by AK at 12/2/2021
Show Your Work: Scratchpads for Intermediate Computation with Language Models abs: https://t.co/kTHsnECVyW On complex tasks ranging from long addition to execution of arbitrary programs, scratchpads dramatically improve the ability of LMs to perform multi-step computations
Shared by AK at 12/2/2021
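The technique itself is just prompting (or training) the model to emit its intermediate work before the answer; an illustrative scratchpad-style example for long addition, written here as a Python string with hypothetical formatting, would look roughly like:

```python
# Illustrative scratchpad prompt for long addition (hypothetical
# formatting): the model writes out carries before the final answer.
prompt = """Input: 2 9 + 5 7
Scratchpad:
9 + 7 = 16, write 6 carry 1
2 + 5 + 1 = 8, write 8
Target: 8 6"""
```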
SegDiff: Image Segmentation with Diffusion Probabilistic Models abs: https://t.co/WA3RFRyFRK method obtains sota results on the Cityscapes validation set, the Vaihingen building segmentation benchmark, and the MoNuSeg dataset
Shared by AK at 12/2/2021
Text Mining Drug/Chemical-Protein Interactions using an Ensemble of BERT and T5 Based Models abs: https://t.co/Gx7VMhYJcV
Shared by AK at 12/1/2021
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition abs: https://t.co/JkGgzi64CW in experiments on ImageNet, the method obtains more than a 2× efficiency improvement over sota vision transformers with only a 0.8% drop in accuracy
Shared by AK at 12/1/2021
NeRFReN: Neural Radiance Fields with Reflections abs: https://t.co/GSrJpBWjqP project page: https://t.co/fBqtVcrZ23 Experiments on self-captured scenes show method achieves high-quality NVS and physically sound depth estimation results while enabling scene editing applications
Shared by AK at 12/1/2021
Video Frame Interpolation Transformer abs: https://t.co/JtQik8ahVt
Shared by AK at 11/30/2021
Searching the Search Space of Vision Transformer abs: https://t.co/Swel8VTzqO The searched models, S3, achieve superior performance to the recent prevalent ViT and Swin transformer model families under aligned settings
Shared by AK at 11/30/2021
Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity abs: https://t.co/U4I6gnNvej Sparse DETR achieves better performance than Deformable DETR even with only 10% encoder tokens on the COCO dataset
Shared by AK at 11/30/2021
Mesa: A Memory-saving Training Framework for Transformers abs: https://t.co/vklQFjvMEy github: https://t.co/XVIw5wxpa3
Shared by AK at 11/29/2021
ToAlign: Task-oriented Alignment for Unsupervised Domain Adaptation abs: https://t.co/4CHhZZHXfo github: https://t.co/WW9H3cmHUK
Shared by AK at 11/29/2021
PolyViT co-trained on multiple modalities is even more parameter efficient, still competitive with the state-of-the-art, and learns feature representations that generalize across multiple modalities
Shared by AK at 11/29/2021
Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs abs: https://t.co/lZB3yVYf7D
Shared by AK at 11/29/2021
Interesting Object, Curious Agent: Learning Task-Agnostic Exploration abs: https://t.co/SVS23oWWe1
Shared by AK at 11/29/2021
Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model abs: https://t.co/bA81dr0wnq
Shared by AK at 11/29/2021
Active Learning at the ImageNet Scale abs: https://t.co/HEHhncxUyE propose Balanced Selection (BASE), a simple, scalable AL algorithm that outperforms random sampling consistently by selecting more balanced samples for annotation than existing methods
Shared by AK at 11/29/2021
Improving the Perceptual Quality of 2D Animation Interpolation abs: https://t.co/Wd5uMeoOYd
Shared by AK at 11/29/2021
True Few-Shot Learning with Prompts -- A Real-World Perspective abs: https://t.co/3r1VfrNA5D
Shared by AK at 11/29/2021
Less is More: Generating Grounded Navigation Instructions from Landmarks abs: https://t.co/251C5jJiNq Supported by a new bootstrapped dataset of 1.1m grounded landmarks, MARKY-MT5 almost eliminates the gap between model-generated and human-written instructions on R2R paths
Shared by AK at 11/29/2021
Semi-Supervised Music Tagging Transformer abs: https://t.co/qUqNcnP9bT
Shared by AK at 11/29/2021
µNCA: Texture Generation with Ultra-Compact Neural Cellular Automata abs: https://t.co/aZlVfFrICB
Shared by AK at 11/29/2021
Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers abs: https://t.co/EcsWER8kXT For Cityscapes segmentation with the Segformer-B3 backbone, AFNO can handle a sequence size of 65k and outperforms other efficient self-attention mechanisms
Shared by AK at 11/29/2021
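AFNO mixes tokens in the Fourier domain; a heavily simplified sketch of FFT-based token mixing follows, using a learned per-frequency filter in place of the paper's block-diagonal MLP with soft thresholding.

```python
import torch

class FourierTokenMixer(torch.nn.Module):
    # Simplified FFT token mixing: FFT along the token axis, apply a
    # learned complex filter per frequency and channel, inverse FFT.
    def __init__(self, n_tokens, dim):
        super().__init__()
        n_freq = n_tokens // 2 + 1
        self.filt = torch.nn.Parameter(torch.randn(n_freq, dim, 2) * 0.02)

    def forward(self, x):                      # x: (batch, tokens, dim)
        f = torch.fft.rfft(x, dim=1)
        f = f * torch.view_as_complex(self.filt)
        return torch.fft.irfft(f, n=x.shape[1], dim=1)

y = FourierTokenMixer(64, 32)(torch.randn(2, 64, 32))  # same shape out
```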
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning abs: https://t.co/ZghdXc2YSc experiments on 5 video captioning datasets show that SwinBERT achieves across-the-board performance improvements over previous methods
Shared by AK at 11/29/2021
Global Interaction Modelling in Vision Transformer via Super Tokens abs: https://t.co/oDe5XATTDX In standard image classification on ImageNet-1K, STTS25 achieves 83.5% accuracy, matching the Swin transformer (Swin-B) with roughly half the number of parameters (49M)
Shared by AK at 11/29/2021
arxiv: https://t.co/ga00IjnZ6z By co-training PolyViT on a single modality, the authors achieve sota results on three video and two audio datasets, while reducing the total number of parameters linearly compared to single-task models
Shared by AK at 11/29/2021
NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition abs: https://t.co/fpbaXpkCdB achieves 84.5% top-1 classification accuracy on ImageNet with only 73M parameters, and also shows promising performance on dense prediction tasks
Shared by AK at 11/29/2021
Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations abs: https://t.co/3ZwaX7YA6i project page: https://t.co/kBiloukNiP
Shared by AK at 11/29/2021
Conditional Image Generation with Score-Based Diffusion Models abs: https://t.co/N8Ua9D5Lf0
Shared by AK at 11/29/2021
Amortized Prompt: Lightweight Fine-Tuning for CLIP in Domain Generalization abs: https://t.co/ycHe83e5hC
Shared by AK at 11/29/2021
Sparse is Enough in Scaling Transformers abs: https://t.co/wuWtZC6Fq6 sparse layers are enough to obtain the same perplexity as the standard Transformer with the same number of parameters
Shared by AK at 11/29/2021
On the Unreasonable Effectiveness of Feature propagation in Learning on Graphs with Missing Node Features abs: https://t.co/aboq1hkhdU
Shared by AK at 11/25/2021
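As I read it, the method iteratively diffuses the known features over the graph and resets observed entries each step; a minimal dense-matrix sketch (function and variable names are mine):

```python
import numpy as np

def feature_propagation(adj, x, known_mask, n_iters=40):
    # adj: (n, n) adjacency; x: (n, d) features; known_mask: (n, d) bool.
    deg = adj.sum(axis=1).astype(float)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    a_norm = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    out = np.where(known_mask, x, 0.0)
    for _ in range(n_iters):
        out = a_norm @ out                    # diffuse over edges
        out = np.where(known_mask, x, out)    # clamp observed features
    return out
```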
Extracting Triangular 3D Models, Materials, and Lighting From Images abs: https://t.co/cLRgkuKqxr present an efficient method for joint optimization of topology, materials and lighting from multi-view image observations
Shared by AK at 11/25/2021
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion abs: https://t.co/lwYYEzc5PZ presents a unified multimodal pretrained model that can generate new or manipulate existing visual data (i.e., images and videos) for various visual synthesis tasks
Shared by AK at 11/25/2021
A Free Lunch from the Noise: Provable and Practical Exploration for Representation Learning abs: https://t.co/5ab62ABAeX
Shared by AK at 11/24/2021
Efficient Video Transformers with Spatial-Temporal Token Selection abs: https://t.co/xVW722l9Ng
Shared by AK at 11/24/2021
Can Pre-trained Language Models be Used to Resolve Textual and Semantic Merge Conflicts? abs: https://t.co/gtDhek8hBe while LMs provide sota performance on semantic merge conflict resolution for Edge, they do not yet obviate the benefits of fine-tuning neural models
Shared by AK at 11/24/2021
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields abs: https://t.co/Vjf7oQF1Fo reduces mean-squared error by 54% compared to mip-NeRF, and is able to produce realistic synthesized views and detailed depth maps for highly intricate, unbounded real-world scenes
Shared by AK at 11/24/2021
Efficient Training of Visual Transformers with Small Datasets abs: https://t.co/xCDfL1TkM1
Shared by AK at 11/23/2021
Chasing Sparsity in Vision Transformers: An End-to-End Exploration abs: https://t.co/dDSo7ZAqny github: https://t.co/DDA9kJ9L6p
Shared by AK at 11/23/2021
Machine Learning for Mechanical Ventilation Control abs: https://t.co/4iMBYrRVcj
Shared by AK at 11/23/2021
Neural Fields in Visual Computing and Beyond abs: https://t.co/RWqcMlGISm
Shared by AK at 11/23/2021
RedCaps: web-curated image-text data created by the people, for the people abs: https://t.co/3QVHnfH4eQ project page: https://t.co/D8lHvjBT6s a large-scale dataset of 12M image-text pairs collected from Reddit
Shared by AK at 11/23/2021
Benchmarking Detection Transfer Learning with Vision Transformers abs: https://t.co/2qo2GVjZ7N masking-based unsupervised learning methods may, for the first time, provide convincing transfer learning improvements on COCO, increasing APbox up to 4% (absolute)
Shared by AK at 11/23/2021
DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion abs: https://t.co/O3M8MPfGIW
Shared by AK at 11/23/2021
Discrete Representations Strengthen Vision Transformer Robustness abs: https://t.co/GRHJEJQmQ1 adding discrete representation on 4 architecture variants strengthens ViT robustness by up to 12% across 7 ImageNet robustness benchmarks while maintaining the performance on ImageNet
Shared by AK at 11/23/2021
Florence: A New Foundation Model for Computer Vision abs: https://t.co/sxNJitYcL0 sota results on the majority of 44 representative benchmarks: ImageNet-1K zero-shot classification with top-1 accuracy of 83.74% and top-5 accuracy of 97.18%, and 62.4 mAP on COCO fine-tuning
Shared by AK at 11/23/2021
Scaling Law for Recommendation Models: Towards General-purpose User Representations abs: https://t.co/kHvTsyZxJW presents CLUE, trained on billion-scale real-world user behavior data, to learn general-purpose user representations
Shared by AK at 11/23/2021
Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction abs: https://t.co/Njll2XpXm0 github: https://t.co/QCXmh1Fohd approach achieves NeRF comparable quality and converges rapidly from scratch in less than 15 minutes with a single GPU
Shared by AK at 11/23/2021
PointMixer: MLP-Mixer for Point Cloud Understanding abs: https://t.co/4hTJCwGOJw experiments show the competitive or superior performance of PointMixer in semantic segmentation, classification, and point reconstruction against transformer-based methods
Shared by AK at 11/23/2021
ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning abs: https://t.co/kFjjMLQHX2 EXT5 outperforms strong T5 baselines on SuperGLUE, GEM, Rainbow, Closed-Book QA tasks, and several tasks outside of EXMIX
Shared by AK at 11/23/2021
L-Verse: Bidirectional Generation Between Image and Text abs: https://t.co/yDuXka25MO a framework for bidirectional generation between image and text; its AugVAE achieves a new sota reconstruction FID and shows potential as a universal backbone encoder-decoder for generative models
Shared by AK at 11/23/2021
StyleNeRF: A Style-based 3D Aware Generator for High-resolution Image Synthesis abs: https://t.co/lCxOVWmqUi a 3D-aware generative model that can synthesize high-resolution images with high multi-view consistency
Shared by AK at 10/6/2021
Optimal Stroke Learning with Policy Gradient Approach for Robotic Table Tennis pdf: https://t.co/eUXessD8br abs: https://t.co/3jkbTkLYbo
Shared by AK at 9/8/2021
Multimodal Conditionality for Natural Language Generation pdf: https://t.co/jRtGIjRo4T abs: https://t.co/l6YOcaLCrl a novel approach for adapting pretrained language models into multimodal conditional NLG models
Shared by AK at 9/5/2021
Challenges in Generalization in Open Domain Question Answering abs: https://t.co/7zel8Y5SWe experimental findings establish that novel entities are not the main bottleneck for non-parametric models and identify key factors that impact their performance
Shared by AK at 9/5/2021
Learning to Prompt for Vision-Language Models pdf: https://t.co/7vZdovDppj abs: https://t.co/XWFG7Xd5jO a differentiable approach that focuses on continuous prompt learning to facilitate deployment of pre-trained vision language models in downstream datasets
Shared by AK at 9/3/2021
MergeBERT: Program Merge Conflict Resolution via Neural Transformers abs: https://t.co/DdZYkiQR0q model achieves 64–69% precision of merge resolution synthesis, yielding nearly a 2× performance improvement over existing structured and neural program merge tools
Shared by AK at 9/1/2021
Deep Reinforcement Learning at the Edge of the Statistical Precipice pdf: https://t.co/jn1FDRUh5I abs: https://t.co/YCyhCtBVKK
Shared by AK at 8/31/2021
Drop-DTW: Aligning Common Signal Between Sequences While Dropping Outliers pdf: https://t.co/MyPQISWNk5 abs: https://t.co/E1Emoa6t8b an extension to the classic DTW algorithm that relaxes the constraints of matching the endpoints of paired sequences and the continuity of the alignment path
Shared by AK at 8/30/2021
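For reference, the classic DTW recursion that Drop-DTW builds on is a small dynamic program; Drop-DTW adds an extra transition that skips outlier elements at a drop cost (the sketch below is plain DTW, not the paper's variant).

```python
import numpy as np

def dtw(cost):
    # cost[i, j] = distance between x[i] and y[j]; D[i, j] = best
    # cumulative cost aligning the prefixes x[:i] and y[:j].
    n, m = cost.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

x, y = np.random.randn(5, 3), np.random.randn(7, 3)
print(dtw(np.linalg.norm(x[:, None] - y[None, :], axis=-1)))
```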
The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers abs: https://t.co/H8uDvueFjD models improve accuracy from 50% to 85% on the PCFG productivity split, and from 35% to 81% on COGS
Shared by AK at 8/29/2021
Multi-Task Self-Training for Learning General Representations pdf: https://t.co/c6IlbxEymv abs: https://t.co/t1Hf7o4vVb a scalable multi-task self-training method for learning general representations
Shared by AK at 8/25/2021
.@Gradio web demo for MDETR: Modulated Detection for End-to-End Multi-Modal Understanding now on @huggingface Spaces demo: https://t.co/DWjhTVGOgK paper: https://t.co/tV2Ux2yxsX github: https://t.co/yM3pjwYLoy
Shared by AK at 8/24/2021
How Can Increased Randomness in Stochastic Gradient Descent Improve Generalization? pdf: https://t.co/Jsj1hpi3vB abs: https://t.co/nEGOGZ2Z8v
Shared by AK at 8/24/2021
Fastformer: Additive Attention is All You Need pdf: https://t.co/HelF2hT4Te abs: https://t.co/ch8O4kG6oA a Transformer variant based on additive attention that can handle long sequences efficiently with linear complexity
Shared by AK at 8/23/2021
Image2Lego: Customized LEGO® Set Generation from Images pdf: https://t.co/yHBU4o5qSt abs: https://t.co/3pBPqI1dFz project page: https://t.co/96SPLGgO06 a pipeline for producing 3D LEGO® models from 2D images
Shared by AK at 8/20/2021
Neural-GIF: Neural Generalized Implicit Functions for Animating People in Clothing pdf: https://t.co/JEnRjpK03Z abs: https://t.co/Ykt8cfJyBr model to learn articulation and pose-dependent deformation for humans in complex clothing using an implicit 3D surface representation
Shared by AK at 8/19/2021
ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis abs: https://t.co/dTknlBbANY a hierarchical approach to introduce bidirectional context into autoregressive transformer models for high-fidelity controllable image synthesis
Shared by AK at 8/19/2021
Do Vision Transformers See Like Convolutional Neural Networks? pdf: https://t.co/5Yz5F2PZwO abs: https://t.co/bpHO2rOYDv find striking differences between the two architectures, such as ViT having more uniform representations across all layers
Shared by AK at 8/19/2021
MUSIQ: Multi-scale Image Quality Transformer pdf: https://t.co/LKoJ05canf abs: https://t.co/gUmNW7aX6Q a multi-scale image quality Transformer, which can handle full-size image input with varying resolutions and aspect ratios
Shared by AK at 8/15/2021
Mobile-Former: Bridging MobileNet and Transformer pdf: https://t.co/Ssr6oFOjy7 abs: https://t.co/lctrhRG2Oq achieves 77.9% top-1 accuracy at 294M FLOPs, gaining 1.3% over MobileNetV3 but saving 17% of computations
Shared by AK at 8/13/2021
Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision pdf: https://t.co/h6Z30bGMDS abs: https://t.co/n70eDomcPz project page: https://t.co/eICW1QO6l3
Shared by AK at 8/13/2021
Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion paper: https://t.co/OtNYzgYWJn successfully handles the long-horizon, dense, multi-modal histories of ALFRED, and is the first ALFRED model to utilize object-centric navigation targets
Shared by AK at 8/11/2021
Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP pdf: https://t.co/KSjm8txURN abs: https://t.co/olELgzAMqI a method that automatically segments images into semantically meaningful regions without human supervision
Shared by AK at 7/27/2021
Towards Generative Video Compression pdf: https://t.co/ZNguNuZ2QW abs: https://t.co/uUqoOt5Vqf a neural video compression method based on GANs that outperforms previous neural video compression methods and is comparable to HEVC in a user study
Shared by AK at 7/27/2021
Self-supervised Augmentation Consistency for Adapting Semantic Segmentation pdf: https://t.co/4eSx78aV93 abs: https://t.co/19u97Oko0H github: https://t.co/aQ4cd3ifT6
Shared by AK at 5/4/2021
Larger-Scale Transformers for Multilingual Masked Language Modeling pdf: https://t.co/GPnk0T7I69 abs: https://t.co/2gkgwV7KDU XLM-R model scaled up to 10.7B parameters and obtained stronger performance than previous XLM-R models on cross-lingual understanding benchmarks
Shared by AK at 5/4/2021
@lucidrains attention is all we need
Shared by AK at 5/4/2021
Audio Transformers: Transformer Architectures For Large Scale Audio Understanding. Adieu Convolutions pdf: https://t.co/4Bgj34xfP8 abs: https://t.co/q7IqEITpGU a Transformer architecture without any convolutional filters can be adapted for large-scale audio understanding
Shared by AK at 5/4/2021
One Model to Rule them All: Towards Zero-Shot Learning for Databases pdf: https://t.co/hoNBr4qONK abs: https://t.co/Sb9IRmIoYI a new approach for learned database components that can support new databases without running any training query on that database
Shared by AK at 5/4/2021
CoCon: Cooperative-Contrastive Learning pdf: https://t.co/BxFs21DHel abs: https://t.co/y5vwjgu3QY github: https://t.co/n0xYs7ZksZ a cooperative version of contrastive learning, called CoCon, for self-supervised video representation learning
Shared by AK at 5/3/2021
DriveGAN: Towards a Controllable High-Quality Neural Simulation pdf: https://t.co/awz1lz0BaX abs: https://t.co/fpzJ5u2nmZ DriveGAN is a fully differentiable simulator that allows re-simulation of a given video sequence, letting an agent drive through a recorded scene again
Shared by AK at 5/3/2021
With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations pdf: https://t.co/DdianuJ0fG abs: https://t.co/iTAhX2yTYS (NNCLR), samples the nearest neighbors from the dataset in the latent space, and treats them as positives
Shared by AK at 4/30/2021
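NNCLR's twist on contrastive learning is the positive-selection step: instead of the other augmented view, it uses that view's nearest neighbor from a support queue of past embeddings. A minimal sketch of that step, with queue management omitted:

```python
import torch
import torch.nn.functional as F

def nn_positive(z, queue):
    # For each embedding z[i], return its cosine nearest neighbor in
    # the support queue; NNCLR uses that neighbor as the InfoNCE
    # positive in place of the usual augmented view.
    z_n = F.normalize(z, dim=1)
    q_n = F.normalize(queue, dim=1)
    return queue[(z_n @ q_n.t()).argmax(dim=1)]

positives = nn_positive(torch.randn(8, 128), torch.randn(4096, 128))
```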
Shot Contrastive Self-Supervised Learning for Scene Boundary Detection pdf: https://t.co/Wf1aVckayx abs: https://t.co/SoKySKJzEp
Shared by AK at 4/29/2021