Scalable evaluation

We aim to reduce the human effort needed to evaluate large language models. For example, MAUVE automatically evaluates generated text at the distributional level, using information divergences to measure the gap between machine-generated and human-written text.
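As a rough illustration of the idea, the sketch below computes a divergence-frontier style score between two discrete distributions. It is a simplified toy under stated assumptions, not the official MAUVE implementation: it takes explicit histograms `p` and `q` as input, whereas the full method first embeds text with a language model and quantizes the embeddings. The function names `kl` and `frontier_score`, the scaling constant `c=5.0`, and the number of mixture weights are all illustrative choices.

```python
import numpy as np


def kl(p, q):
    """KL divergence KL(p || q) for discrete distributions (natural log)."""
    mask = p > 0  # terms with p_i = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def frontier_score(p, q, c=5.0, num_points=99):
    """Area under a divergence frontier between histograms p and q.

    Hypothetical helper: for mixtures r = lam * p + (1 - lam) * q it
    collects the points (exp(-c * KL(q || r)), exp(-c * KL(p || r)))
    and integrates the resulting curve with the trapezoid rule.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()

    # Interior mixture weights; lam = 0 and lam = 1 are handled as endpoints.
    lambdas = np.linspace(0.0, 1.0, num_points + 2)[1:-1]
    xs, ys = [], []
    for lam in lambdas[::-1]:  # descending lam gives ascending x
        r = lam * p + (1.0 - lam) * q
        xs.append(np.exp(-c * kl(q, r)))
        ys.append(np.exp(-c * kl(p, r)))

    # Close the curve with the corner points (0, 1) and (1, 0).
    xs = np.array([0.0] + xs + [1.0])
    ys = np.array([1.0] + ys + [0.0])

    # Trapezoid rule written out by hand.
    return float(np.sum(np.diff(xs) * (ys[1:] + ys[:-1]) / 2.0))


same = frontier_score([0.2, 0.3, 0.5], [0.2, 0.3, 0.5])
disjoint = frontier_score([1.0, 0.0], [0.0, 1.0])
close = frontier_score([0.2, 0.3, 0.5], [0.25, 0.3, 0.45])
```

In this toy setting, identical histograms score close to 1 and histograms with disjoint support score close to 0, which matches the intuition that a higher score means the two distributions are closer.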

Related publications

2023

MAUVE Scores for Generative Models: Theory and Practice

Krishna Pillutla, Lang Liu, John Thickstun, and 6 more authors

JMLR, Nov 2023

2021

Divergence Frontiers for Generative Models: Sample Complexity, Quantization Effects, and Frontier Integrals

Lang Liu, Krishna Pillutla, Sean Welleck, and 3 more authors

In Advances in Neural Information Processing Systems, Nov 2021

MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers

Krishna Pillutla, Swabha Swayamdipta, Rowan Zellers, and 4 more authors

In Advances in Neural Information Processing Systems, Nov 2021

NeurIPS 2021 Outstanding Paper Award