Figure: Left, Quantized Reward Konditioning (Quark); right, unlikelihood training.
Related publications
2024
- Zhiqing Sun, Longhui Yu, Yikang Shen, and 4 more authors
arXiv preprint arXiv:2403.09472, Mar 2024
2023
- Ximing Lu, Faeze Brahman, Peter West, and 14 more authors
EMNLP, Dec 2023
- Skyler Hallinan, Faeze Brahman, Ximing Lu, and 3 more authors
EMNLP Findings, Dec 2023
- Sean Welleck, Ximing Lu, Peter West, and 4 more authors
In The Eleventh International Conference on Learning Representations, May 2023
2022
- Jiacheng Liu, Skyler Hallinan, Ximing Lu, and 4 more authors
In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Dec 2022
Knowledge underpins reasoning. Recent research demonstrates that providing relevant knowledge as additional context in commonsense question answering (QA) can substantially improve performance, even on top of state-of-the-art models. The fundamental challenge is where and how to find knowledge that is high quality and on point with respect to the question: knowledge retrieved from knowledge bases is incomplete, and knowledge generated by language models is inconsistent. We present Rainier, a Reinforced Knowledge Introspector that learns to generate contextually relevant knowledge in response to given questions. Our approach starts by imitating knowledge generated by GPT-3, then learns to generate its own knowledge via reinforcement learning, where rewards are shaped by the resulting improvement in question-answering performance. Rainier demonstrates substantial and consistent gains across 9 different commonsense benchmarks, including 5 datasets seen during model training and 4 datasets kept unseen. Our work is the first to report that knowledge generated by models orders of magnitude smaller than GPT-3, even without direct supervision on the knowledge itself, can exceed the quality of commonsense knowledge elicited from GPT-3.
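To make the reward described above concrete, here is a minimal sketch of that kind of signal: a generated knowledge statement is rewarded by how much it raises a fixed QA model's confidence in the gold answer. The helper `qa_answer_prob` and its interface are hypothetical placeholders for illustration, not Rainier's actual implementation.

```python
# Illustrative sketch only: the knowledge introspector is rewarded by the gain
# in a fixed QA model's confidence in the gold answer when the generated
# knowledge is added to the prompt. `qa_answer_prob` is a hypothetical helper
# returning P(answer | question, optional knowledge).

def knowledge_reward(question, knowledge, gold_answer, qa_answer_prob):
    """Reward = QA confidence with the generated knowledge minus without it."""
    with_knowledge = qa_answer_prob(question, gold_answer, knowledge=knowledge)
    without_knowledge = qa_answer_prob(question, gold_answer, knowledge=None)
    return with_knowledge - without_knowledge
```

A positive reward means the knowledge actually helped the downstream QA model, which is the signal used to fine-tune the generator with reinforcement learning.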
- Ximing Lu, Sean Welleck, Jack Hessel, and 5 more authors
In Advances in Neural Information Processing Systems, Dec 2022
2020
- Sean Welleck and Kyunghyun Cho
In AAAI Conference on Artificial Intelligence, Dec 2020
- Margaret Li, Stephen Roller, Ilia Kulikov, and 4 more authors
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Jul 2020
Generative dialogue models currently suffer from a number of problems that standard maximum likelihood training does not address. They tend to produce generations that (i) rely too much on copying from the context, (ii) contain repetitions within utterances, (iii) overuse frequent words, and (iv) at a deeper level, contain logical flaws. In this work we show how all of these problems can be addressed by extending the recently introduced unlikelihood loss (Welleck et al., 2019) to these cases. We show that appropriate loss functions which regularize generated outputs to match human distributions are effective for the first three issues. For the last, more general issue, we show that applying unlikelihood to collected data of what a model should not do is effective for improving logical consistency, potentially paving the way to generative models with greater reasoning ability. We demonstrate the efficacy of our approach across several dialogue tasks.
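For context, here is a minimal sketch of the token-level unlikelihood objective (Welleck et al., 2019) that this work extends: the usual likelihood term for the target token is combined with a penalty of -log(1 - p) on a set of negative candidate tokens (e.g., repeated or overly frequent tokens). The shapes, names, and weighting below are illustrative rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def unlikelihood_loss(logits, targets, negative_mask, alpha=1.0):
    """Likelihood loss on the target token plus an unlikelihood penalty.

    logits:        (batch, vocab) next-token scores
    targets:       (batch,)       gold next-token ids
    negative_mask: (batch, vocab) 1 for tokens the model should NOT produce
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Standard likelihood term: -log p(target)
    mle = F.nll_loss(log_probs, targets)
    # Unlikelihood term: -log(1 - p(c)) summed over negative candidates c
    penalty = -torch.log((1.0 - log_probs.exp()).clamp(min=1e-6))
    ul = (penalty * negative_mask).sum(dim=-1).mean()
    return mle + alpha * ul
```

Driving p(c) toward zero for the masked tokens is what discourages copying, repetition, and overuse of frequent words in the settings described above.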
- Sean Welleck, Ilia Kulikov, Stephen Roller, and 3 more authors
In International Conference on Learning Representations, Apr 2020