Model FLOPs utilization (MFU) was introduced in the Pathways LM (PaLM) work (https://t.co/5K8SIwk9WD, sec 4.1).
— Awni Hannun (@awnihannun) May 12, 2022
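MFU is roughly the ratio of the measured throughput (iterations per second) to the theoretical max for a given piece of hardware running at peak FLOPs. Mostly designed to penalize activation checkpointing (recomputation) and other redundant work, which costs time but does not count as model FLOPs. https://t.co/4TJ2EHbcQJ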
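
To make the ratio concrete, here is a minimal Python sketch. It assumes the common approximation of about 6 FLOPs per parameter per token for a forward plus backward pass, with an optional attention term of 12 * layers * heads * head_dim * seq_len. The function names, parameter names, and the numbers in the example are all hypothetical and chosen only for illustration; they are not measurements from any real system.

```python
def flops_per_token(n_params, n_layers=0, n_heads=0, head_dim=0, seq_len=0):
    """Approximate model FLOPs per training token (forward + backward).

    Assumption: ~6 FLOPs per parameter per token for the dense matmuls,
    plus an optional self-attention term. Recomputed (rematerialized)
    activations are deliberately NOT counted here.
    """
    dense = 6 * n_params
    attention = 12 * n_layers * n_heads * head_dim * seq_len
    return dense + attention


def mfu(tokens_per_second, n_params, peak_flops_per_second, **attention_dims):
    """Model FLOPs utilization: observed throughput / theoretical max throughput."""
    per_token = flops_per_token(n_params, **attention_dims)
    # Throughput the hardware would reach if it ran at peak FLOPs
    # and spent every FLOP on the model's required computation.
    max_tokens_per_second = peak_flops_per_second / per_token
    return tokens_per_second / max_tokens_per_second


if __name__ == "__main__":
    # Hypothetical setup: 1B-parameter model, 100 TFLOP/s peak, 5000 tokens/s observed.
    utilization = mfu(tokens_per_second=5_000, n_params=1e9, peak_flops_per_second=100e12)
    print(f"MFU = {utilization:.1%}")
```

Because only the model's required FLOPs appear in the numerator, any extra work such as recomputing activations lowers the measured tokens per second without raising the FLOP count, so it shows up directly as a lower MFU.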