I've noticed that lots of papers learn "low"-dimensional embeddings of images (say d=50) and then run UMAP/t-SNE/whatever to bring d down to 2-3. Has anyone tried to directly learn a 2D embedding instead? Is something fundamental (e.g., a bumpier loss landscape) preventing this from being the default?
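
To be concrete about what I mean by "directly learn a 2D embedding", here is a minimal sketch: a linear autoencoder whose bottleneck is 2-dimensional, trained with plain gradient descent. Everything in it is an assumption made for illustration, not taken from any paper: the synthetic 50-dimensional data stands in for image features, and the function name `train_2d_autoencoder`, the learning rate, and the step count are hypothetical. Note that a linear bottleneck like this can at best recover the top-2 PCA subspace, so it says nothing about whether the nonlinear case is harder to optimize.

```python
import numpy as np

def train_2d_autoencoder(X, steps=500, lr=0.05, seed=0):
    """Linear autoencoder with a 2-D bottleneck, trained by gradient descent.

    Hypothetical helper for illustration only; hyperparameters are guesses.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, 2))  # encoder: d -> 2
    W_dec = rng.normal(scale=0.1, size=(2, d))  # decoder: 2 -> d
    losses = []
    for _ in range(steps):
        Z = X @ W_enc              # (n, 2) embedding, learned directly
        err = Z @ W_dec - X        # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size            # d(loss) / d(reconstruction)
        grad_dec = Z.T @ g_out                  # gradient w.r.t. W_dec
        grad_enc = X.T @ (g_out @ W_dec.T)      # gradient w.r.t. W_enc
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec, losses

# Toy stand-in for image features: 200 points near a 2-D plane in 50-D (assumed data).
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 50)) + 0.1 * rng.normal(size=(200, 50))

W_enc, W_dec, losses = train_2d_autoencoder(X)
Z = X @ W_enc  # the 2-D embedding, with no UMAP/t-SNE step afterwards
print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

The question is essentially whether swapping the two linear maps for deep networks, with the bottleneck still fixed at 2, trains as well as the d=50-then-project pipeline.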
