“Computer vision foundation models, which are trained on diverse, large-scale dataset and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.”
The Foundation Models nomenclature is getting picked up🧐 "Florence: A New Foundation Model for Computer Vision" aka SOTA SOTA SOTA in pretty much everything CV😅 by Lu Yuan et al. including @ddongchen @NoelCodella @dgdsxyushi @jw2yang4ai
Stanford: "we call our models 'Foundation Models'". Microsoft: Hold my beer "we use the name of Florence as the origin of the trail for exploring vision foundation models, as well as the birthplace of Renaissance."

Florence: A New Foundation Model for Computer Vision abs: sota results in the majority of 44 representative benchmarks, ImageNet-1K zero-shot classification with top-1 accuracy of 83.74 and top-5 accuracy of 97.18, 62.4 mAP on COCO fine-tuning
A strange form of tunnel vision I often see in AI consists in beginning to think that humans work just like ML models, that we are in the end just RL agents or language models (depending on what one works on). The human experience is much more diverse & complex than any of these
