Scalable evaluation

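We aim to reduce the human effort needed to evaluate large language models. For example, MAUVE enables automatic evaluation of generated text at the level of distributions: it compares the distribution of model-generated text with that of human-written text using information divergences.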
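To make the distributional idea concrete, here is a minimal sketch in Python with NumPy of a MAUVE-style score between two discrete distributions. It assumes the text has already been embedded and quantized into a shared set of bins, so that p (human text) and q (model text) are histograms over the same bins. The published method performs that step with language-model embeddings and clustering, which is omitted here. For each mixture r = λp + (1 − λ)q, the sketch records the point (exp(−c·KL(q‖r)), exp(−c·KL(p‖r))), and the score is the area under the resulting divergence curve. The function name `mauve_sketch`, the scaling constant c = 5 and the grid of λ values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def kl_divergence(p: np.ndarray, r: np.ndarray) -> float:
    """KL(p || r) for discrete distributions; bins with p_i = 0 contribute 0."""
    mask = p > 0
    kl = float(np.sum(p[mask] * np.log(p[mask] / r[mask])))
    return max(kl, 0.0)  # KL is non-negative; clip tiny negative rounding errors


def mauve_sketch(p, q, c: float = 5.0, num_lambdas: int = 99) -> float:
    """Area under the divergence curve between histograms p (human) and q (model).

    Hypothetical helper: assumes p and q are already quantized histograms over
    the same bins. c and num_lambdas are illustrative choices.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()

    # Extreme points of the curve: (0, 1) and (1, 0).
    xs, ys = [0.0, 1.0], [1.0, 0.0]
    for lam in np.linspace(0.0, 1.0, num_lambdas + 2)[1:-1]:  # lambda strictly in (0, 1)
        r = lam * p + (1.0 - lam) * q  # mixture; positive wherever p or q is
        xs.append(np.exp(-c * kl_divergence(q, r)))
        ys.append(np.exp(-c * kl_divergence(p, r)))

    xs, ys = np.array(xs), np.array(ys)
    # Sort by x; on ties put larger y first so the curve runs from (0, 1) down to (1, 0).
    order = np.lexsort((-ys, xs))
    xs, ys = xs[order], ys[order]
    # Trapezoidal rule for the area under the curve.
    return float(np.sum(np.diff(xs) * (ys[:-1] + ys[1:]) / 2.0))
```

With this sketch, identical histograms score 1 (up to rounding), and the score drops toward 0 as the two distributions separate, for instance when their supports are disjoint.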

Related publications

2023

  1. Krishna Pillutla, Lang Liu, John Thickstun, and 6 more authors
    ArXiv, Nov 2023

2021

  1. Lang Liu, Krishna Pillutla, Sean Welleck, and 3 more authors
    In Advances in Neural Information Processing Systems, Nov 2021
  2. Krishna Pillutla, Swabha Swayamdipta, Rowan Zellers, and 4 more authors
    In Advances in Neural Information Processing Systems, Nov 2021