Ross Wightman   @wightmanr

Angel Investor. Ex head of Software, Firmware Engineering at a Canadian 🦄. Currently building ML, AI systems or investing in startups that do it better.

  Tweets by Ross Wightman  

Ross Wightman    @wightmanr    11/30/2021      

I'd love to train some larger V2 variants but need bigger GPUs to avoid burning lots of time iterating hparams and fiddling with stability. If someone can pass me the ssh keys to a 4x 40GB+ GPU system (i.e. 4x A100, 4x A6000, 4x RTX8000) for ~1-2 months, I can train V2-S, maybe M.
  



Ross Wightman    @wightmanr    11/30/2021      

I've tried a few times to get some normalizer-free (NFNet + EffDet) object detection models working well. No luck so far (the NF head is posing stability problems). BUT the AGC (adaptive grad clipping) from https://t.co/gGg3XvYp1T works decently with standard EfficientDet models.
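
For reference, a minimal sketch of AGC as described in the NFNet paper (unit-wise clipping of gradients relative to parameter norms); the function and argument names here are illustrative, not the actual timm/effdet implementation:

```python
import torch

def unitwise_norm(x: torch.Tensor) -> torch.Tensor:
    # L2 norm per output unit: whole tensor for biases/scalars,
    # per output channel (dim 0) for conv / linear weights.
    if x.ndim <= 1:
        return x.norm(p=2)
    return x.norm(p=2, dim=tuple(range(1, x.ndim)), keepdim=True)

@torch.no_grad()
def adaptive_grad_clip(parameters, clip_factor: float = 0.01, eps: float = 1e-3):
    for p in parameters:
        if p.grad is None:
            continue
        p_norm = unitwise_norm(p).clamp_(min=eps)
        g_norm = unitwise_norm(p.grad)
        max_norm = p_norm * clip_factor
        # Rescale only the units whose gradient norm exceeds clip_factor * ||w||.
        scale = torch.where(g_norm > max_norm,
                            max_norm / g_norm.clamp(min=1e-6),
                            torch.ones_like(g_norm))
        p.grad.mul_(scale)
```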
  



Ross Wightman    @wightmanr    11/30/2021      

I retrained the V2 'Tiny' EfficientDet model: a slight bump to my prev 46.1 mAP at train res 768, and 47 mAP at 896, vs. the original 45.8. The gain was from using AGC w/ a bit higher LR; could likely go higher but runs take a while. AGC support added to the train script. https://t.co/xyLzftOh3D
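
Roughly where such clipping slots into a training step (the loop below is illustrative, not the actual effdet train script), reusing the adaptive_grad_clip sketch above:

```python
for images, targets in loader:                                # hypothetical dataloader
    optimizer.zero_grad()
    loss = model(images, targets)['loss']                     # assumed EfficientDet-style loss dict
    loss.backward()
    adaptive_grad_clip(model.parameters(), clip_factor=0.01)  # clip grads before the step
    optimizer.step()
```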
  



Ross Wightman    @wightmanr    11/25/2021      

And of course there is the normalizer-free approach in NFNets, w/o norm and w/ carefully placed gains, weight standardization, etc. It's far from hands-off to adapt across architectures and certainly wasn't very easy to throw into an object detection network 😓
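
The core building block, roughly: a weight-standardized conv with a learnable per-channel gain. This is a simplified sketch (the gamma here is a plain default, not the nonlinearity-specific gains NFNets actually use), not timm's ScaledStdConv2d:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    """Conv whose weights are standardized per output channel and scaled by fan-in."""
    def __init__(self, *args, gamma: float = 1.0, eps: float = 1e-6, **kwargs):
        super().__init__(*args, **kwargs)
        self.gain = nn.Parameter(torch.ones(self.out_channels, 1, 1, 1))
        self.scale = gamma * self.weight[0].numel() ** -0.5  # gamma / sqrt(fan-in)
        self.eps = eps

    def forward(self, x):
        w = self.weight
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        std = w.std(dim=(1, 2, 3), keepdim=True)
        w = self.gain * self.scale * (w - mean) / (std + self.eps)
        return F.conv2d(x, w, self.bias, self.stride, self.padding,
                        self.dilation, self.groups)
```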
  



Ross Wightman    @wightmanr    11/25/2021      

ProxyNorm from Graphcore looks intriguing (https://t.co/ILGTazRhNK) and I still haven't tried FilterResponseNorm (https://t.co/0w2SemtPLg). Lots of others that seem to come and go without getting traction...
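
FilterResponseNorm itself is simple to sketch from the paper: per-channel RMS over spatial dims (no batch statistics), an affine, then a thresholded linear unit. Written from the paper's description, not from any particular library:

```python
import torch
import torch.nn as nn

class FilterResponseNorm2d(nn.Module):
    def __init__(self, num_channels: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.tau = nn.Parameter(torch.zeros(1, num_channels, 1, 1))  # TLU threshold

    def forward(self, x):
        # nu^2: mean squared activation over spatial dims, per channel, per sample.
        nu2 = x.pow(2).mean(dim=(2, 3), keepdim=True)
        x = x * torch.rsqrt(nu2 + self.eps)
        return torch.max(self.gamma * x + self.beta, self.tau)  # thresholded linear unit
```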
  



Ross Wightman    @wightmanr    11/25/2021      

Has anyone out there done some extensive experiments with normalization layers outside the usual suspects (BN, GN, LN, IN) on image tasks w/ large(ish) natural image datasets (ImageNet, COCO, or larger, etc.) and found some good setups that they're not sharing with the rest of us?
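
For concreteness, the usual suspects for an NCHW feature map with C channels, as plain torch.nn modules (the group count of 32 is just the common default, not a recommendation):

```python
import torch.nn as nn

C = 256
norms = {
    'BN': nn.BatchNorm2d(C),                            # stats over (N, H, W), per channel
    'GN': nn.GroupNorm(num_groups=32, num_channels=C),  # stats over channel groups, per sample
    'LN': nn.GroupNorm(num_groups=1, num_channels=C),   # LayerNorm-style: over (C, H, W), per sample
    'IN': nn.InstanceNorm2d(C, affine=True),            # stats over (H, W), per channel, per sample
}
```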
  



Ross Wightman    @wightmanr    11/25/2021      

I'm aware of `Beyond BatchNorm: Towards a Unified Understanding...` - https://t.co/3qysPQXzzt. It's great, but the tasks / networks were small, and moving to larger tasks and a wider variety of model architectures (and performing well across them) is non-trivial.
  



Ross Wightman    @wightmanr    11/25/2021      

A different view -- minus the dataset / training labels, and with GMACs (FLOPs) -- the positions shift. Here, the NS EfficientNets are sitting above everything else, including LeViT.
  



Ross Wightman    @wightmanr    11/25/2021      

The timm monster 'pareto curve' (x-axis log scale). 500+ points in one crazy graph. Dug this up today due to recent discussions. This variant is color / shape coded across architecture variants and pretrain dataset / technique. On GPUs, LeViT stands out where most don't.
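
A rough sketch of how a plot like this can be assembled from a results CSV; the file name and column names ('top1', 'infer_samples_per_sec') are assumptions for illustration, not the exact columns timm publishes:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('results.csv')                   # hypothetical: one row per model variant
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(df['infer_samples_per_sec'], df['top1'], s=8)
ax.set_xscale('log')                              # x-axis log scale, as above
ax.set_xlabel('GPU inference throughput (samples/sec)')
ax.set_ylabel('ImageNet top-1 (%)')
plt.show()
```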
  



Ross Wightman    @wightmanr    11/24/2021      

A lot of architectures and training techniques are trumpeted on the basis of 'improvements' within that range. Lots of ablation studies have steps that could easily be noise (many hparam adjustments could alter the path of random number generators or sample selection over training).
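
One way to put a number on that noise floor: re-run the baseline with a handful of seeds and look at the spread. The accuracies below are made-up placeholders:

```python
import statistics

baseline_top1 = [80.1, 80.3, 79.9, 80.2, 80.0]      # hypothetical repeated runs, different seeds
mean = statistics.mean(baseline_top1)
std = statistics.stdev(baseline_top1)
print(f'baseline top-1: {mean:.2f} +/- {std:.2f}')  # a claimed +0.1 'gain' sits inside this band
```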
  



Ross Wightman    @wightmanr    11/23/2021      

No, because of the baselines. There was a paper written about that ;) They used the same code, and an aug scheme that looks roughly based on DeiT (co-authors of said paper), and yet didn't quote the better RN scores (with the same training setup) that they should be aware of.
  