Advanced Natural Language Processing / Spring 2026

Advanced Natural Language Processing is a graduate-level course for students who are interested in doing cutting-edge NLP research. The course focuses on modern neural network methods, covering the fundamentals of generative models with particular attention to the foundations of large language models. This includes the modeling, learning, and inference algorithms required for building state-of-the-art NLP systems. The class culminates in an open-ended, creative research project on a current topic in NLP.

Course Details

Instructor

Sean Welleck

Teaching Assistants

Logistics

Class times: TR 3:30pm - 4:50pm
Room: TEP 1403
Course identifier: LTI 11-711
Piazza: Piazza
Code: GitHub

Office hours:

Name	Location	Day	Time
Weihua Du	GHC 6711	Monday	10:30am - 11:30am
Ibrahim Aldarmaki	WEH 3002	Monday	4:00pm - 5:00pm
Zhen Wu	GHC 6717	Wednesday	11:00am - 12:00pm
Dareen Alharthi	WEH 3110	Wednesday	3:30pm - 4:30pm
Daniel Chechelnitsky	GHC 5701	Thursday	2:00pm - 3:00pm
Andy Liu	GHC 6601	Thursday	5:00pm - 6:00pm
Sean Welleck	GHC 6513	Friday	11:00am - 12:00pm
Arnav Yayavaram	GHC 5417	Friday	3:00pm - 4:00pm
Siddharth Yayavaram	GHC 5417	Friday	5:00pm - 6:00pm

Grading

Quizzes: 5 in-class quizzes.
Exam: 1 in-class exam.
Assignments: 4 assignments (the final one being the project).

Grade weights

Component	Weight
Assignment 1	10%
Assignment 2	10%
Assignment 3	15%
Assignment 4 (Final Project)	25%
Exam	20%
Quizzes	15%
In-class Participation	5%

The final grades will be determined based on the weighted average of the components above. Cutoffs for final grades will be approximately 97+ A+, 93+ A, 90+ A-, 87+ B+, 83+ B, 80+ B-, etc., although we reserve some flexibility to change these thresholds slightly.
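As a rough illustration of the weighted average described above, the sketch below combines component scores using the weights from the table and maps the result to the listed cutoffs. The function names and the example scores are hypothetical, the cutoffs are only the approximate ones stated above, and grades below B- are left unspecified because the text does not list them.

```python
# Weights copied from the "Grade weights" table (they sum to 100%).
WEIGHTS = {
    "Assignment 1": 0.10,
    "Assignment 2": 0.10,
    "Assignment 3": 0.15,
    "Assignment 4 (Final Project)": 0.25,
    "Exam": 0.20,
    "Quizzes": 0.15,
    "In-class Participation": 0.05,
}

# Only the approximate cutoffs listed in the text; lower grades are not spelled out there.
CUTOFFS = [(97, "A+"), (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-")]


def weighted_average(scores):
    """Return the weighted average of component scores given on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def approximate_letter(average):
    """Map an average to a letter using the listed cutoffs, or None below 80."""
    for cutoff, letter in CUTOFFS:
        if average >= cutoff:
            return letter
    return None


# Hypothetical scores, for illustration only.
example_scores = {
    "Assignment 1": 95,
    "Assignment 2": 90,
    "Assignment 3": 88,
    "Assignment 4 (Final Project)": 92,
    "Exam": 85,
    "Quizzes": 90,
    "In-class Participation": 100,
}
average = weighted_average(example_scores)
print(round(average, 2), approximate_letter(average))
```

Because the thresholds may shift slightly, treat the letter returned here as an estimate rather than a final grade.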

Course description

The course covers key algorithmic foundations and applications of advanced natural language processing. While there are no hard course prerequisites, programming experience in Python and knowledge of probability and linear algebra are expected. In practice, prior experience with deep learning is highly recommended.

Acknowledgements. This semester's course is based on previous versions: Advanced NLP Fall 2025 (Sean Welleck), Advanced NLP Spring 2025 (Sean Welleck), Advanced NLP Fall 2024 (Graham Neubig).

Class format

Lectures: For each class there will be:

Reading: Most classes will have associated reading material. You are expected to read the main readings. Topics within the main readings that are covered in the lecture are eligible for quizzes and the exam. The Additional References provide additional content that is either related to the lecture or referenced in the lecture; you may find the additional references useful for gaining a better understanding of the lecture content.
Lecture and Interactive Activities: There will be a lecture about the class materials, including interactive elements such as discussions and polls.
Code/Data Walkthrough: Some classes will involve looking through code or data.

Quizzes: There will be 5 in-class quizzes throughout the semester. Quizzes take place in the first 20 minutes of the lecture. See the Quizzes section for details.

Exam: There will be 1 in-class exam. See the Exam section for details.

Questions and Discussion: Please ask questions in class or on Piazza when possible so that the answers are shared with the whole class. Coming to office hours is also encouraged.

Schedule

Class

Type

Topic

Resources
# 1 01/13/2026

Lecture

Fundamentals
Introduction & Fundamentals
[slides]
[code]
Main readings:
- Natural Language Understanding with Distributed Representation (Ch. 1) (Cho 2015)
- Machine Learning: a Lecture Note (Ch. 1) (Cho 2025)
# 2 01/15/2026

Lecture

Fundamentals
Fundamentals: Learned Representations
[slides]
[code]
Main readings:
- Natural Language Understanding with Distributed Representation (Ch. 2, Ch. 3) (Cho 2015)
Additional references
- (Video) Let's build the GPT Tokenizer (Karpathy 2024)
# 3 01/20/2026

Lecture

Fundamentals
Fundamentals: Autoregressive Language Modeling
[slides]
[code]
Main readings:
- Natural Language Understanding with Distributed Representation (Ch. 5 up to 5.4.2) (Cho 2015)
Additional references
- A Neural Probabilistic Language Model (Bengio et al 2003)
- Understanding the difficulty of training deep feedforward neural networks (Glorot & Bengio 2010)
# 4 01/22/2026

Lecture

Architectures
Architectures I: Recurrent Neural Networks
[slides]
[code]
Main readings:
- Natural Language Understanding with Distributed Representation (Ch. 4, Ch. 5.5-5.6, Ch. 6) (Cho 2015)
Additional references
- Recurrent neural network based language model (Mikolov et al 2010)
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (Cho et al 2014)
- Why LSTMs Stop Your Gradients From Vanishing: A View from the Backwards Pass (Weber 2017)
- Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al 2015)
# 5 01/27/2026

Lecture

Architectures
Architectures II: Attention and Transformers
[slides]
[code]
Main readings:
- Attention Is All You Need (Vaswani et al 2017)
- The Annotated Transformer (Rush et al 2018)
Additional references
- Root Mean Square Layer Normalization (Zhang & Sennrich 2019)
- On Layer Normalization in the Transformer Architecture (Xiong et al 2020)
- RoFormer: Enhanced Transformer with Rotary Position Embedding (Su et al 2021)
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints (Ainslie et al 2023)
- (Helpful Blog Post): Why Are Sines and Cosines Used For Positional Encoding? (Muhammad 2023)
# 5 01/28/2026

Assignment Released
Assignment 1 Released
# 6 01/29/2026

Lecture

Learning & Inference
Learning I: Pretraining
[slides]
[code]
Main readings:
- Language Models are Unsupervised Multitask Learners (Radford et al 2019)
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale (Penedo et al 2024)
Additional references
- OLMo 3 (AI2 2025)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al 2018)
- LLaMA: Open and Efficient Foundation Language Models (Touvron et al 2023)
- OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text (Paster et al 2023)
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research (Soldaini et al 2024)
- Scaling Laws for Neural Language Models (Kaplan et al 2020)
- Training Compute-Optimal Large Language Models (Hoffmann et al 2022)
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism (DeepSeek-AI 2024)
- Language Modeling Is Compression (Delétang et al 2023)
# 7 02/03/2026

Quiz

Quiz
Quiz 1
# 7 02/03/2026

Lecture

Learning & Inference
Scaling Laws and In-Context Learning
[slides]
[code]
Main readings:
- Language Models are Few-Shot Learners (Brown et al 2020)
- Deep Learning Scaling is Predictable, Empirically (Hestness et al 2017)
Additional references
- Scaling Laws for Neural Language Models (Kaplan et al 2020)
- Training Compute-Optimal Large Language Models (Hoffmann et al 2022)
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism (DeepSeek-AI 2024)
# 8 02/05/2026

Lecture

Learning & Inference
Learning III: Fine-tuning and Distillation
[slides]
[code]
Main readings:
- LoRA: Low-Rank Adaptation of Large Language Models (Hu et al 2021)
- Sequence-Level Knowledge Distillation (Kim & Rush 2016)
Additional references
- Universal Language Model Fine-tuning for Text Classification (Howard & Ruder 2018)
- Cross-Task Generalization via Natural Language Crowdsourcing Instructions (Mishra et al 2021)
- Finetuned Language Models Are Zero-Shot Learners (Wei et al 2021)
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks (Wang et al 2022)
- Self-Instruct: Aligning Language Models with Self-Generated Instructions (Wang et al 2023)
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4 (Mukherjee et al 2023)
- Symbolic Knowledge Distillation: from General Language Models to Commonsense Models (West et al 2022)
- QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al 2023)
# 9 02/10/2026

Lecture

Learning & Inference
Inference II: Decoding Algorithms
[slides]
[code]
Main readings:
- From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models (Sections 1-3) (Welleck et al 2024)
# 10 02/12/2026

Lecture
Guest Lecture:
Akari Asai

Modeling
Modeling I: Retrieval and RAG
[slides]
Main readings:
- Retrieval-based Language Models and Applications (Asai et al 2023)
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (Asai et al 2024)
Additional references
- Task-aware Retrieval with Instructions (Asai et al 2023)
- When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories (Mallen & Asai et al 2023)
- Reliable, Adaptable, and Attributable Language Models with Retrieval (Asai et al 2024)
- Scaling Retrieval-Based Language Models with a Trillion-Token Datastore (Shao et al 2024)
- OpenScholar: Synthesizing Scientific Literature with Retrieval-Augmented LMs (Asai et al 2024)
# 10 02/12/2026

Assignment Due
Assignment 1 Due
# 10 02/12/2026

Assignment Released
Assignment 2 Released
# 11 02/17/2026

Quiz

Quiz
Quiz 2
# 11 02/17/2026

Lecture

Modeling
Modeling II: Multimodal I
[slides]
[code]
Main readings:
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Dosovitskiy et al 2020)
- Learning Transferable Visual Models From Natural Language Supervision (Radford et al 2021)
Additional references
- Visual Instruction Tuning (Liu et al 2023)
- Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models (Deitke et al 2024)
- PaliGemma: A versatile 3B VLM for transfer (Beyer et al 2024)
# 12 02/19/2026

Lecture

Modeling
Modeling III: Multimodal II
[slides]
[code]
Main readings:
- Neural Discrete Representation Learning (van den Oord et al 2017)
- Taming Transformers for High-Resolution Image Synthesis (Esser et al 2021)
Additional references
- Chameleon: Mixed-Modal Early-Fusion Foundation Models (Meta 2024)
- (Blog Post) Image GPT (OpenAI 2020)
- Generative Pretraining from Pixels (Chen et al 2020)
- Pixel Recurrent Neural Networks (van den Oord et al 2016)
- Zero-Shot Text-to-Image Generation (Ramesh et al 2021)
- Auto-Encoding Variational Bayes (Kingma & Welling 2013)
- Rethinking Generative Image Pretraining: How Far Are We from Scaling Up Next-Pixel Prediction? (Yan et al 2025)
# 13 02/24/2026

Lecture

Evaluation
Evaluation Techniques
[slides]
[code]
Main readings:
- Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations (Miller 2024)
# 14 02/26/2026

Lecture

Evaluation
Research Skills and Experimental Design
[slides]
[code]
Main readings:
- Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations (Miller 2024)
# 14 02/26/2026

Assignment Due
Assignment 2 Due
# 14 02/26/2026

Assignment Released
Assignment 3, 4 Released
# 03/03/2026

Break

No Class
Spring Break
# 03/05/2026

Break

No Class
Spring Break
# 15 03/10/2026

Quiz

Quiz
Quiz 3
# 15 03/10/2026

Lecture

Modeling
Modeling IV: Diffusion and Flows
Main readings:
- Denoising Diffusion Probabilistic Models (Ho et al 2020)
- Flow Matching for Generative Modeling (Lipman et al 2022)
# 16 03/12/2026

Lecture

RL and Agents
Reinforcement Learning I: Fundamentals
Main readings:
- Deep Reinforcement Learning: Pong from Pixels (Karpathy 2016)
- Spinning Up in Deep RL (Part 1, Part 3, Vanilla PG, PPO) (OpenAI)
Additional references
- Proximal Policy Optimization Algorithms (Schulman et al 2017)
- High-Dimensional Continuous Control Using Generalized Advantage Estimation (Schulman et al 2015)
# 17 03/17/2026

Lecture

RL and Agents
Reinforcement Learning II: Applications
Main readings:
- Training language models to follow instructions with human feedback (Ouyang et al 2022)
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (DeepSeek-AI 2025)
Additional references
- AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback (Dubois et al 2023)
- Deep reinforcement learning from human preferences (Christiano et al 2017)
- Fine-Tuning Language Models from Human Preferences (Ziegler et al 2019)
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al 2023)
# 18 03/19/2026

Lecture

RL and Agents
Agents
Main readings:
- World of Bits: An Open-Domain Platform for Web-Based Agents (Shi et al 2017)
- WebGPT: Browser-assisted question-answering with human feedback (Nakano et al 2022)
Additional references
- WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents (Yao et al 2022)
- SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering (Yang et al 2024)
- VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks (Koh et al 2024)
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments (Xie et al 2024)
- Programming with Pixels: Computer-Use Meets Software Engineering (Aggarwal & Welleck 2025)
# 03/24/2026

Project Hours

Course Project
Project Hours / Assignment 3.1 Presentations
# 03/24/2026

Assignment Due
Assignment 3.1 Due
# 19 03/26/2026

Quiz

Quiz
Quiz 4
# 19 03/26/2026

Lecture

Scaling and Efficiency
Quantization
Main readings:
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale (Dettmers et al 2022)
- QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al 2023)
Additional references
- 8-bit Optimizers via Block-wise Quantization (Dettmers et al 2021)
- The case for 4-bit precision: k-bit Inference Scaling Laws (Dettmers & Zettlemoyer 2022)
# 20 03/31/2026

Lecture

Scaling and Efficiency
Parallelism and Distributed Training
Main readings:
- The Ultra-Scale Playbook: Training LLMs on GPU Clusters (Tazi et al 2025)
# 21 04/02/2026

Lecture

Scaling and Efficiency
Mixture of Experts
Main readings:
- A Review of Sparse Expert Models in Deep Learning (Fedus et al 2022)
- OLMoE: Open Mixture-of-Experts Language Models (Muennighoff et al 2024)
Additional references
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models (Dai et al 2024)
# 21 04/02/2026

Assignment Due
Assignment 3.2 Due
# 22 04/07/2026

Quiz

Quiz
Quiz 5
# 22 04/07/2026

Lecture

Scaling and Efficiency
Scaling Sequence Length
Main readings:
- Self-attention Does Not Need O(n^2) Memory (Rabe & Staats 2021)
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Gu & Dao 2023)
Additional references
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (Dao et al 2022)
- Ring Attention with Blockwise Transformers for Near-Infinite Context (Liu et al 2023)
# 04/09/2026

Break

No Class
Spring Carnival
# 23 04/14/2026

Lecture

Scaling and Efficiency
Test-Time Scaling
Main readings:
- From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models (Sections 4-7) (Welleck et al 2024)
Additional references
- NeurIPS 2024 LLM Inference Tutorial (Reading List)
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (DeepSeek-AI 2025)
- s1: Simple test-time scaling (Muennighoff et al 2025)
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning (Aggarwal & Welleck 2025)
# 04/16/2026

Exam

Exam
Exam
# 04/21/2026

Poster Session

Course Project
Poster Session 1
# 04/23/2026

Poster Session

Course Project
Poster Session 2
# 04/27/2026

Assignment Due
Assignment 4 Due

Quizzes

All quizzes are closed-book and in-person. They take place in the first 20 minutes of the lecture (15 minutes for the quiz, 5 minutes for handing out and collecting the quizzes). You are required to attend all quizzes.

Quizzes cover content from the preceding lectures, not including lectures that have already been covered by a quiz.

When computing your final average quiz grade, we will drop your lowest quiz score.
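A minimal sketch of the drop-lowest rule, assuming each quiz is scored on the same scale and weighted equally (the page does not state how quizzes are weighted relative to each other). The function name and the example scores are hypothetical.

```python
def quiz_average(scores):
    """Average quiz scores after dropping the single lowest one."""
    if len(scores) < 2:
        raise ValueError("need at least two quiz scores to drop one")
    kept = sorted(scores)[1:]  # discard the lowest score
    return sum(kept) / len(kept)


# Hypothetical scores for the 5 quizzes; the 60 is dropped.
print(quiz_average([60, 80, 90, 100, 70]))  # prints 85.0
```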

Exam

The exam is closed-book. It is in-person and occurs in the time and place of the lecture. You are required to attend the exam.

Assignments

The aim of the assignments and project is to build the basic understanding and advanced implementation skills needed to build cutting-edge systems or do cutting-edge research using neural networks for NLP, culminating in a project that demonstrates these abilities.

Read all the instructions on this page carefully
You are responsible for reading these instructions and following them carefully. If you do not, you may be marked down as a result.

Assignment Policies

Working in Teams:

There are 4 assignments in the class. Assignments 1 and 2 must be done individually, while Assignments 3 and 4 must be done in teams of 2-3 (individual submissions will not be accepted for these assignments). If you are having trouble finding a group, the instructor and TAs will help you find one after the initial survey.

Submission Information:

To submit an assignment, upload a zip file to Canvas containing:

your code (A1, A2, A3.2, A4): This should be in a directory "code" in the top directory unless specified otherwise.
system outputs (A1, A2): The format will be specified separately for each assignment.
a report (A2, A3.1, A3.2, A4, optional for A1): This should be named "report.pdf" in the top directory. The report can be up to 7 pages for A2, 5 pages for A3.1 and A3.2, and 9 pages for A4. References are not included in the page count, and it is OK to submit appendices that include supplementary information such as hyperparameter settings or additional output examples, although there is no guarantee that the TAs will read them. Submissions that exceed the page limit will be penalized one third of a letter grade for each page over the limit (e.g., A to A- or A- to B+). You may also submit report.pdf for Assignment 1 if you have anything interesting to convey to the TAs, for example, if you went above and beyond the minimal requirements.
a link to a GitHub repository containing your code (A2, A3.2, A4): This should be a single-line file "github.txt" in the top directory. Your GitHub repository must be viewable by the TAs in charge of the assignment by the submission deadline. If your repository is private, make it accessible to the TAs by the submission deadline. If your repository is not visible to the TAs, your assignment will not be considered complete, so if you are worried, please submit well in advance of the deadline so we can confirm the submission is visible. For group assignments (A3, A4), we use this repository to check contributions of all team members.
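Before uploading, it may help to check the archive layout yourself. The sketch below is an unofficial helper (not a course tool) that looks for a "code" directory, "report.pdf", and "github.txt" at the top level of a zip file. Which items are actually required differs by assignment, so pass the appropriate `required` tuple; the system-output files are not checked because their format is specified separately.

```python
import io
import zipfile


def missing_items(zip_file, required=("code/", "report.pdf", "github.txt")):
    """Return the required top-level entries that are absent from the zip.

    Entries ending in "/" are treated as directories: they count as present
    when at least one archived path starts with that prefix.
    """
    with zipfile.ZipFile(zip_file) as archive:
        names = archive.namelist()
    missing = []
    for item in required:
        if item.endswith("/"):
            found = any(name.startswith(item) for name in names)
        else:
            found = item in names
        if not found:
            missing.append(item)
    return missing


# Build a small in-memory archive (hypothetical contents) to try the check on.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("code/run.py", "print('hello')\n")
    archive.writestr("report.pdf", b"placeholder")
buffer.seek(0)
print(missing_items(buffer))  # prints ['github.txt']
```

The same function accepts a path on disk, for example `missing_items("submission.zip")`, where the file name is a placeholder.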

Late Day Policy:

In case unforeseen circumstances prevent you from turning in an assignment on time, you are allowed a total of 5 late days across Assignments 1, 2, 3.1, and 3.2. Other than these late days, we will not make exceptions or extend deadlines except for documented health reasons, so please be frugal with your late days and use them only if necessary. Assignments that are late beyond the allowed late days will be graded down one third of a letter grade per day late (e.g., A to A- for one day, and A to B+ for two days).
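The late-day arithmetic can be sketched as follows. This is only an illustration under one assumption that the policy above does not spell out: free late days are consumed automatically, in submission order. The function name and the example numbers are hypothetical.

```python
LATE_DAY_BUDGET = 5  # shared across Assignments 1, 2, 3.1, and 3.2


def late_penalties(days_late):
    """Return penalty steps (thirds of a letter grade) per assignment.

    `days_late` maps assignment name to days late, listed in submission order.
    Assumption: free late days are used up automatically, earliest first.
    """
    remaining = LATE_DAY_BUDGET
    penalties = {}
    for name, days in days_late.items():
        covered = min(days, remaining)  # days absorbed by the free budget
        remaining -= covered
        penalties[name] = days - covered  # one step down per uncovered day
    return penalties


# Hypothetical semester: 7 late days in total, so 2 of them are penalized.
print(late_penalties({"A1": 2, "A2": 0, "A3.1": 4, "A3.2": 1}))
# prints {'A1': 0, 'A2': 0, 'A3.1': 1, 'A3.2': 1}
```

In this example, A3.1 uses the last 3 free days and is graded down one step for its fourth day, and A3.2 is graded down one step because no free days remain.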

Plagiarism/Code Reuse Policy:

All assignments are expected to be conducted under the CMU policy for academic integrity. All rules here apply and violations will be subject to penalty including zero credit on the assignment, failing the course, or other disciplinary measures. In particular, in your implementation:

Code or pseudo-code provided by the TAs or instructor may be used freely without restriction.
For assignment 2, you may not just re-use an existing implementation written by someone else. The implementation should basically be your own.
Code written by other students in the class cannot be used (except, obviously, you can share code within your group for assignments 3 and 4).
If you are doing a similar project for a graded class at CMU (including independent studies or directed research), you must declare so on your report, and note which parts of the project are for 11-711, and which parts are for the other class. Consult with the Instructor during office hours or on Piazza if you are unsure.

Use of AI Tools:

We adopt the policy from Stanford CS336:

Prompting LLMs such as ChatGPT is permitted for low-level programming questions or high-level conceptual questions about language models, but using it directly to solve the problem is prohibited. We strongly encourage you to disable AI autocomplete (e.g., Cursor Tab, GitHub CoPilot) in your IDE when completing assignments (though non-AI autocomplete, e.g., autocompleting function names is totally fine). We have found that AI autocomplete makes it much harder to engage deeply with the content.

Note: special exceptions and restrictions exist for the course project (Assignments 3 and 4). These will be described in the assignment writeup.

Consulting w/ Instructors/TAs:

For assignments and projects, you are free to consult with the TAs and instructors during office hours, project hours, and through Piazza. If you don't have much experience with NLP, it will be helpful to consult with the instructors and TAs to learn about how to do the assignments and course project.

Because this is a project-based course, we assume that many of the students taking the course will be interested in turning their assignments or project into research papers. In this case, if you have received useful advice from the instructor or TAs that made the project significantly better, consider inviting them to be co-authors on the paper. Of course, you do not need to do so just because the paper is a result of the class, only if you feel that their advice or help made a contribution.

Details of Each Assignment

Assignment 1: Build Your Own LLaMa (Individual assignment)
- Released: Jan 28
- Due: Feb 12
Assignment 2: End-to-end NLP System Building (Individual assignment)
- Released: Feb 12
- Due: Feb 26
Assignment 3: Project Proposal & State-of-the-art Reimplementation (Group assignment)
- Assignment 3.1: Literature Review & Project Proposal
  - Released: Feb 26
  - Due: Mar 24
- Assignment 3.2: Baseline Reproduction
  - Released: Feb 26
  - Due: Apr 2
Assignment 4: Final Project (Group assignment)
- Released: Feb 26
- Due: Apr 27

Assignment details to be provided later. For an idea of the course project (Assignments 3.1, 3.2, 4), you can see the Fall 2025 version of the course.

Poster Presentation

Time/Location

Time: During class time (3:30pm - 4:50pm), April 21 and April 23
Location: TBD

We will announce which teams will be presenting on which days during the course. If you have a major, immovable conflict that will prevent your team from presenting on one of the days, please contact us via Piazza and we will try to make accommodations.

Goals and Grading

The poster serves several purposes:

You share your preliminary results with the TAs and instructor so that we can give feedback and you can make any last adjustments to improve your final project report.
You see the other projects in the class, learn from them, and pick up ideas that may improve your final project report.
You practice explaining the work that you did.

Posters are graded pass/fail based on (a) whether you completed a poster; (b) whether you attended your poster session. No exceptions will be made; you must attend your poster session in order to receive a grade.

What information should be included in a poster? It should be mostly:

1. What is the problem you’re solving?
2. What is your method for solving that problem?
3. What are the results?

There is no set format for creating a poster, but if you would like some guidance, we suggest three columns: the left one describes the problem (“1”), the middle one describes the method (“2”), and the right one describes the results (“3”). The middle one can be a bit wider.

Poster Printing

If you are a member of SCS, we suggest that you use SCS poster printing. If you are not a member of SCS, you can send your PDF to the TAs at least 5 days before your presentation, and we will print it for you.