METR - Model Evaluation & Threat Research
METR claims to be an independent R&D organisation that objectively assesses the quality of LLMs. Whilst posing as critical, they foremost share positive scenarios. E.g. their main page prominently shares the statistic for a 50% chance of succeeding at a 10-hour task, while an 80% success rate is probably what people are really after, and that is still at 1-hour tasks.
The only negative statistic I was able to find concerned the actual productivity boost from these models, and they heavily downplay its importance.