Learning from feedback

We draw on ideas from reinforcement learning (RL) to design new algorithms for learning from feedback. For example, Quark brings ideas from batch RL and goal-conditioned RL to language.

Left: Quantized Reward Konditioning (Quark); Right: Unlikelihood training.
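To make the goal-conditioned idea behind Quark concrete, here is a minimal sketch of the reward-quantization step: sampled texts are scored, sorted into quantile bins by reward, and each text is prefixed with a token naming its bin so a language model can later be trained on the tagged data. The function names, the `<reward_k>` token format, and the equal-sized binning are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def quantize_rewards(rewards: Sequence[float], num_bins: int) -> List[int]:
    """Assign each reward to a quantile bin (0 = lowest, num_bins - 1 = highest)."""
    if num_bins < 1:
        raise ValueError("num_bins must be positive")
    # Rank samples by reward; ties keep their original relative order.
    order = sorted(range(len(rewards)), key=lambda i: rewards[i])
    bins = [0] * len(rewards)
    for rank, idx in enumerate(order):
        # Split the ranks into (roughly) equal-sized bins.
        bins[idx] = min(rank * num_bins // len(rewards), num_bins - 1)
    return bins


def add_reward_tokens(
    texts: Sequence[str], rewards: Sequence[float], num_bins: int
) -> List[str]:
    """Prefix each text with a hypothetical control token naming its reward bin."""
    bins = quantize_rewards(rewards, num_bins)
    return [f"<reward_{b}> {t}" for b, t in zip(bins, texts)]


if __name__ == "__main__":
    samples = ["a", "b", "c", "d"]
    scores = [0.1, 0.9, 0.5, 0.3]
    for line in add_reward_tokens(samples, scores, num_bins=2):
        print(line)
```

In this sketch, a model trained on the tagged texts would be prompted with the highest bin's token at generation time, which is the sense in which the reward acts as a goal to condition on.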

Related publications


  1. Ximing Lu, Faeze Brahman, Peter West, and 14 more authors
    EMNLP, 2023
  2. Skyler Hallinan, Faeze Brahman, Ximing Lu, and 3 more authors
    EMNLP Findings, 2023
  3. Sean Welleck, Ximing Lu, Peter West, and 4 more authors
    In The Eleventh International Conference on Learning Representations, 2023


  1. Jiacheng Liu, Skyler Hallinan, Ximing Lu, and 4 more authors
    In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Dec 2022
  2. Ximing Lu, Sean Welleck, Jack Hessel, and 5 more authors
    In Advances in Neural Information Processing Systems, Dec 2022


  1. Sean Welleck and Kyunghyun Cho
    In AAAI Conference on Artificial Intelligence, Dec 2020
  2. Margaret Li, Stephen Roller, Ilia Kulikov, and 4 more authors
    In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Jul 2020
  3. Sean Welleck, Ilia Kulikov, Stephen Roller, and 3 more authors
    In International Conference on Learning Representations, Jul 2020