METR - Model Evaluation & Threat Research

A murb'ed feed, posted 27 days ago filed in ai, performance, coding & productivity.

METR claims to be an independent R&D organisation that objectively assesses the quality of LLMs. Whilst posing as critical, they mostly share positive scenarios. For example, their main page prominently shares the statistic for a 50% chance of succeeding at a 10-hour task, while an 80% chance is probably what people are really after, and that is still at 1-hour tasks.

The only negative statistic I was able to find was on the actual productivity boost from these models, but they heavily downplay its importance.
