Aran Komatsuzaki   @arankomatsuzaki

ML PhD @ GaTech


Aran Komatsuzaki    @arankomatsuzaki    9 hours      

I feel that the SotA open-domain QA models (e.g. Fusion-in-Decoder) on closed NQ and TriviaQA are already human-level. Looking at sample questions from NQ, I bet I'd score much worse than the SotA, since the questions are so random and specific.



Aran Komatsuzaki    @arankomatsuzaki    11/30/2021      

LAFITE: Towards Language-Free Training for Text-to-Image Generation
Obtains competitive results in zero-shot text-to-image generation on MS-COCO, with only around 1% of the model size and training data size of DALL-E, and with language-free training. https://t.co/mNtwBzzDR0



Aran Komatsuzaki    @arankomatsuzaki    11/30/2021      

Also, would it be effective to fine-tune a pretrained T5 to generate an EA pair for a given Q via RL, with a verifier as the value function (by analogy with OpenAI's recent paper)? Ref: https://t.co/Ty1vySMM8N Ref2: https://t.co/97KFwxPR1E
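
For concreteness, a minimal PyTorch sketch of one version of this idea: REINFORCE-style fine-tuning of T5, where a verifier's score on the sampled EA pair serves as the reward. The stub verifier, `t5-small`, the sampling settings, and the plain REINFORCE objective (rather than an actor-critic setup with the verifier as the value function, as the tweet suggests) are all my assumptions, not anything from the refs.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
policy = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-5)

def verifier_score(question: str, ea_pair: str) -> float:
    """Stub: a trained verifier would return P(the answer is correct)."""
    return 0.5  # placeholder reward

def reinforce_step(question: str) -> float:
    inputs = tokenizer(question, return_tensors="pt")
    # Sample an explanation-answer (EA) pair from the current policy.
    sampled = policy.generate(**inputs, do_sample=True, max_new_tokens=64)
    ea_pair = tokenizer.decode(sampled[0], skip_special_tokens=True)
    reward = verifier_score(question, ea_pair)
    # Drop the decoder-start token and mask padding so the sample can
    # serve as labels for scoring its own log-likelihood.
    labels = sampled[:, 1:].clone()
    labels[labels == tokenizer.pad_token_id] = -100
    # out.loss is the mean negative log-likelihood of the sample, so
    # minimizing reward * out.loss ascends reward * log pi (REINFORCE).
    out = policy(**inputs, labels=labels)
    loss = reward * out.loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return reward
```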



Aran Komatsuzaki    @arankomatsuzaki    11/30/2021      

Fine-tuning on question-answer pairs struggles with math word problems, which are better solved by training on question-explanation-answer (QEA) pairs: generate E from Q, then A from the QE pair. Now, can we solve other hard tasks better by building datasets of QEA pairs?
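
A minimal sketch of the two-stage decoding described here, assuming a single seq2seq model fine-tuned on QEA pairs; the `explain:`/`answer:` task prefixes are illustrative, not a fixed convention from the tweet.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)

def answer_with_explanation(question: str) -> tuple[str, str]:
    # Stage 1: generate an explanation E conditioned on the question Q.
    explanation = generate(f"explain: {question}")
    # Stage 2: generate the answer A conditioned on the full QE pair.
    answer = generate(f"answer: {question} explanation: {explanation}")
    return explanation, answer
```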



Aran Komatsuzaki    @arankomatsuzaki    11/29/2021      

PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Co-training PolyViT on multiple modalities and tasks leads to a model that is even more parameter-efficient and learns representations that generalize across multiple domains. https://t.co/OvWXAQWEWI



Aran Komatsuzaki    @arankomatsuzaki    11/29/2021      

Sparse is Enough in Scaling Transformers
Sparsifying the attention and FFN layers leads to a dramatic speedup in decoding. https://t.co/RySSyfZTvo
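
As a rough illustration of the FFN side only (not the paper's exact recipe), here is a PyTorch sketch of a block-sparse FFN in which a small controller picks one active block of the hidden layer per token. The block count, top-1 selection, and the dense-compute-then-mask formulation are simplifying assumptions; a fast decoder would compute only the selected block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseFFN(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_blocks=16):
        super().__init__()
        self.n_blocks, self.block = n_blocks, d_ff // n_blocks
        self.w_in = nn.Linear(d_model, d_ff)
        self.w_out = nn.Linear(d_ff, d_model)
        self.controller = nn.Linear(d_model, n_blocks)  # picks the block

    def forward(self, x):  # x: (..., d_model)
        idx = self.controller(x).argmax(-1)      # active block per token
        h = F.relu(self.w_in(x))                 # dense here for brevity
        mask = F.one_hot(idx, self.n_blocks)     # zero out all hidden
        mask = mask.repeat_interleave(self.block, -1).float()  # units outside
        return self.w_out(h * mask)              # the chosen block
```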



Aran Komatsuzaki    @arankomatsuzaki    11/26/2021      

It's fascinating to me how localized the information the human eye processes at any given moment is (via the fovea), compared with current vision models. I hope more papers exploring this in vision models will come up.



Aran Komatsuzaki    @arankomatsuzaki    11/25/2021      

On a related note, the (phenomenal) EfficientZero paper took 5 months until release, like many other NeurIPS submissions, which was a loss. We should discuss, as a community, ways to enable and incentivize rapid sharing of papers/code/datasets/models from academia and industry.



Aran Komatsuzaki    @arankomatsuzaki    11/25/2021      

NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Achieves SotA results on text-to-image generation, text-to-video generation, video prediction, etc. Outperforms DALL-E on text-to-image. abs: https://t.co/LgrUVjCAEB repo: https://t.co/xlLetCJi1P



Aran Komatsuzaki    @arankomatsuzaki    11/25/2021      

Towards Learning Universal Audio Representations
Proposes a novel normalizer-free SlowFast NFNet and achieves SotA performance across various domains of audio representation learning. https://t.co/OMKvllF8jB



Aran Komatsuzaki    @arankomatsuzaki    11/23/2021      

Florence: A New Foundation Model for Computer Vision
Florence achieves new SotA on the majority of 44 representative CV benchmarks, including classification, retrieval, object detection, VQA, image captioning, video retrieval, and action recognition. https://t.co/Afvkc9pp5N