Advanced Natural Language Processing / Spring 2025
Advanced Natural Language Processing is an introductory graduate-level course on natural language processing aimed at students who are interested in doing cutting-edge research in the field. In it, we describe fundamental tasks in natural language processing as well as methods to solve these tasks. The course focuses on modern methods using neural networks, and covers the basic modeling, learning, and inference algorithms they require. The class culminates in a project in which students attempt to reimplement and improve upon a research paper on a topic of their choosing.
Course Details
Instructor
Teaching Assistants
Darsh Agrawal
Hugo Contant
Alex Fang
Akshita Gupta
Trisha Sarkar
Sanidhya Vijayvargiya
Logistics
- Class times: TR 3:30pm - 4:50pm
- Room: TEP 1403
- Course identifier: LTI 11-711
- Office hours: See Piazza
Grading
- The assignments will be given a grade of A+ (100), A (96), A- (92), B+ (88), B (85), B- (82), or below.
- The final grades will be determined based on the weighted average of the quizzes, assignments, and project. Cutoffs for final grades will be approximately 97+ A+, 93+ A, 90+ A-, 87+ B+, 83+ B, 80+ B-, etc., although we reserve some flexibility to change these thresholds slightly.
- Quizzes: Worth 20% of the grade. Your lowest 3 quiz grades will be dropped.
- Assignments: There will be 4 assignments (the final one being the project), worth respectively 15%, 15%, 20%, 30% of the grade.
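As a rough illustration of how the weights and cutoffs above combine, here is a minimal sketch. It assumes all scores are on a 0-100 scale and that the remaining quizzes are averaged with equal weight; neither detail is stated above, the function names are hypothetical, and the cutoffs are only the approximate ones listed.

```python
from typing import Sequence

# Weights from the Grading section: quizzes 20%, assignments 15/15/20/30%.
QUIZ_WEIGHT = 0.20
ASSIGNMENT_WEIGHTS = (0.15, 0.15, 0.20, 0.30)

# Approximate cutoffs from the syllabus; the thresholds may shift slightly.
CUTOFFS = ((97, "A+"), (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"))


def quiz_average(quiz_scores: Sequence[float], num_dropped: int = 3) -> float:
    """Average quiz score after dropping the lowest `num_dropped` scores.

    Assumption: every remaining quiz counts equally.
    """
    kept = sorted(quiz_scores)[num_dropped:]
    if not kept:
        raise ValueError("need more quiz scores than the number dropped")
    return sum(kept) / len(kept)


def final_score(quiz_scores: Sequence[float],
                assignment_scores: Sequence[float]) -> float:
    """Weighted average of quizzes and the four assignments (0-100)."""
    if len(assignment_scores) != len(ASSIGNMENT_WEIGHTS):
        raise ValueError("expected one score per assignment (4 in total)")
    weighted = sum(w * s for w, s in zip(ASSIGNMENT_WEIGHTS, assignment_scores))
    return QUIZ_WEIGHT * quiz_average(quiz_scores) + weighted


def letter_grade(score: float) -> str:
    """Map a final score to a letter using the approximate cutoffs."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "below B-"


if __name__ == "__main__":
    quizzes = [100] * 10 + [0, 0, 0]   # three missed quizzes are dropped
    assignments = [96, 92, 88, 96]     # A, A-, B+, A on the assignment scale
    score = final_score(quizzes, assignments)
    print(round(score, 1), letter_grade(score))
```

For example, grades of A, A-, B+, and A on the four assignments with a perfect quiz average after drops give 0.2 × 100 + 0.15 × 96 + 0.15 × 92 + 0.2 × 88 + 0.3 × 96 = 94.6.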
Course description
The course covers key algorithmic foundations and applications of advanced natural language processing.
There are no hard pre-requisites for the course, but programming experience in Python and knowledge of probability and linear algebra are expected. It will be helpful if you have used neural networks previously.
Acknowledgements. This semester's course is adapted from Advanced NLP Fall 2024, designed and taught by Graham Neubig. The course structure (e.g., grading, course description, class format, assignments, poster presentation) is from Advanced NLP Fall 2024. Many lectures are adapted from Advanced NLP Fall 2024; please refer to individual slides.
Class format
Lectures: For each class there will be:
- Reading: Most classes will have associated reading material that we recommend you read before the class to familiarize yourself with the topic.
- Lecture and Discussion: There will be a lecture and discussion regarding the class material. This will be recorded and posted online for those who cannot make the in-person class.
- Code/Data Walkthrough: Some classes will involve looking through code or data.
- Quiz: There will be a quiz covering the reading material and/or lecture material that you can fill out on Canvas. The quiz will be released by the end of the day of the class and will be due at the end of the following day.
Questions and Discussion: Ideally, ask questions in class or on Piazza so we can share information with the whole class, but emailing the TA mailing list and coming to office hours are also encouraged.
Schedule
Class | Type | Topic | Resources
# 1 01/14/2025 | Lecture | |
# 2 01/16/2025 | Lecture | |
# 3 01/21/2025 | Lecture | Intro and Basics > Sequence Modeling I > Language Modeling Fundamentals | Main readings and additional references: A Neural Probabilistic Language Model (Bengio et al 2003); Understanding the difficulty of training deep feedforward neural networks (Glorot & Bengio 2010); Adam: A Method for Stochastic Optimization (Kingma & Ba 2015)
# 4 01/23/2025 | Lecture | Intro and Basics > Sequence Modeling II > Recurrent neural networks | Main readings: Natural Language Understanding with Distributed Representation (Ch. 4, Ch. 5.5-5.6, Ch. 6) (Cho 2015); Recurrent neural network based language model (Mikolov et al 2010); Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (Cho et al 2014); Why LSTMs Stop Your Gradients From Vanishing: A View from the Backwards Pass (Weber 2017); Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al 2015)
# 5 01/28/2025 | Lecture | Building Blocks > Sequence Modeling III > Attention and Transformers |
# 5 Office Hours (See Piazza) | Recitation | Building Blocks > Annotated Transformer |
# 6 01/30/2025 | Lecture | Building Blocks > Learning I > Pretraining |
# 7 02/04/2025 | Lecture | Building Blocks > Inference I > Decoding and Generation Algorithms |
# 7 Office Hours (See Piazza) | Recitation | Building Blocks > HuggingFace Transformers |
# 8 02/06/2025 | Lecture | Building Blocks > Inference II > In-Context Learning and Prompting |
# 8 Office Hours (See Piazza) | Recitation | Building Blocks > LiteLLM and LLM APIs |
# 9 02/11/2025 | Lecture | Building Blocks > Learning II > Supervised Fine-Tuning |
# 10 02/13/2025 | Lecture | Building Blocks > Modeling IV > Retrieval and RAG |
# 10 Office Hours (See Piazza) | Recitation | Building Blocks > LangChain/LlamaIndex |
# 11 02/18/2025 | Lecture | Building Blocks > Learning III > Reinforcement Learning |
# 12 02/20/2025 | Lecture | Building Blocks > Evaluation > Evaluating Language Generators |
# 13 02/25/2025 | Lecture | Building Blocks > Research Skills > Experimental Design and Human Annotation |
# 14 02/27/2025 | Lecture | Advanced Topics > Efficiency > Distillation, Quantization, and Pruning |
# 15 03/04/2025 | Break | No Class > Spring Break |
# 16 03/06/2025 | Break | No Class > Spring Break |
# 17 03/11/2025 | Lecture | Advanced Topics > Model Coordination > Ensembling, Merging, and Mixture of Experts |
# 18 03/13/2025 | Lecture | Advanced Topics > Learning III > Advanced Pretraining: Parallelism and Advanced Techniques |
# 19 03/18/2025 | Lecture | Course Project > Project Discussion |
# 20 03/20/2025 | Lecture | Advanced Topics > Learning IV > Advanced Post Training: RLHF Alternatives |
# 20 Office Hours (See Piazza) | Recitation | Advanced Topics > OpenRLHF |
# 21 03/25/2025 | Lecture | Advanced Topics > Inference III > Meta-Generation Algorithms |
# 22 03/27/2025 | Lecture | Advanced Topics > Inference IV > Speeding Up Inference |
# 22 Office Hours (See Piazza) | Recitation | Advanced Topics > vLLM / SGLang |
# 23 04/01/2025 | Lecture | Advanced Topics > Modeling III > Long Sequence Models |
# 24 04/03/2025 | Break | No Class > Spring Carnival |
# 25 04/08/2025 | Lecture | Applications and Society > Multimodal models |
# 26 04/10/2025 | Lecture | Applications and Society > Multilingual NLP |
# 27 04/15/2025 | Lecture | Applications and Society > Agents |
# 28 04/17/2025 | Lecture | Applications and Society > Safety and Security: Bias, Fairness, and Privacy |
# 29 04/22/2025 | Lecture | Course Project > Posters |
# 30 04/24/2025 | Lecture | Course Project > Posters |
Assignments
The aim of the assignments is to build the basic understanding and advanced implementation skills needed to build cutting-edge systems or do cutting-edge research using neural networks for NLP, culminating in a project that demonstrates these abilities.
Read all the instructions on this page carefully
You are responsible for reading these instructions and following them carefully. If you do not, you may be marked down as a result.
Assignment Policies
Working in Teams:
There are 4 assignments in the class. Assignment 1 must be done individually, while Assignments 2, 3, and 4 must be done in teams of 2-3 (individual submissions will not be accepted for these assignments). If you are having trouble finding a group, the instructor and TAs will help you find one after the initial survey.
Submission Information:
To submit your assignment you must submit via Canvas a zip file containing:
- your code: This should be in a directory “code” in the top directory unless specified otherwise.
- system outputs (assignments 1 and 2): The format will be specified separately for each assignment.
- a report (assignments 2, 3 and 4, optional for assignment 1): This should be named “report.pdf” in the top directory. The report can be up to 7 pages for assignments 2 and 3 and up to 9 pages for assignment 4. References are not included in the page count, and it is OK to submit appendices that include supplementary information such as hyperparameter settings or additional output examples, although there is no guarantee that the TAs will read them. Submissions that exceed the page count will be penalized one third of a letter grade for each page over (e.g., A to A- or A- to B+). You may also submit report.pdf for assignment 1 if you have any interesting information to convey to the TAs, for example, if you did anything interesting above and beyond the minimal requirements.
- a link to a GitHub repository containing your code (assignments 2, 3 and 4): This should be a single-line file “github.txt” in the top directory. Your GitHub repository must be viewable by the TAs in charge of the assignment by the submission deadline, so if it is private, grant them access before then. If your repository is not visible to the TAs, your assignment will not be considered complete, so if you are worried, please submit well in advance of the deadline so we can confirm the submission is visible. We use this repository to check contributions of all team members.
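Before uploading, it can help to check the zip layout locally. The following is a minimal sketch, not an official checker: the function name `check_submission` and its flags are hypothetical, and it assumes "top directory" means the root of the archive (no wrapper folder) and that the default "code" directory name applies.

```python
import zipfile
from typing import List


def check_submission(zip_path: str,
                     needs_report: bool = True,
                     needs_github: bool = True) -> List[str]:
    """Return a list of layout problems; an empty list means none were found.

    Assumption: required files sit at the root of the archive.
    """
    problems = []
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()

        # The code should live in a top-level directory named "code".
        if not any(name.startswith("code/") for name in names):
            problems.append('missing top-level "code" directory')

        # report.pdf is required for assignments 2-4, optional for assignment 1.
        if needs_report and "report.pdf" not in names:
            problems.append('missing "report.pdf" in the top directory')

        # github.txt should hold exactly one line: the repository link.
        if needs_github:
            if "github.txt" not in names:
                problems.append('missing "github.txt" in the top directory')
            else:
                text = zf.read("github.txt").decode("utf-8", errors="replace")
                lines = [line for line in text.splitlines() if line.strip()]
                if len(lines) != 1:
                    problems.append('"github.txt" should contain a single line')
    return problems
```

For assignment 1, where the report is optional and no repository link is requested, the same check could be run with `needs_report=False, needs_github=False`. The sketch does not check page counts, system outputs, or repository visibility.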
Late Day Policy:
In case unforeseen circumstances prevent you from turning in your assignment on time, 5 late days in total are allowed across assignments 2 and 3. Note that other than these late days, we will not make exceptions or extend deadlines except for health reasons, so please be frugal with your late days and use them only if necessary. Assignments that are late beyond the allowed late days will be graded down one third of a letter grade per day late (e.g., A to A- for one day, and A to B+ for two days).
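The penalty arithmetic can be sketched as follows. This is an illustration under two assumptions not stated above: that remaining late days are used up before any penalty applies, and that anything past B- is simply reported as "below B-" because the grade scale is not spelled out further. The function names are hypothetical.

```python
from typing import Tuple

# Assignment grade scale from the Grading section, best to worst.
GRADE_STEPS = ["A+", "A", "A-", "B+", "B", "B-"]


def step_down(grade: str, thirds: int) -> str:
    """Lower a letter grade by `thirds` one-third-grade steps."""
    index = GRADE_STEPS.index(grade) + max(thirds, 0)
    if index >= len(GRADE_STEPS):
        return "below B-"  # the syllabus does not enumerate lower grades
    return GRADE_STEPS[index]


def apply_late_policy(grade: str, days_late: int,
                      late_days_remaining: int) -> Tuple[str, int]:
    """Return (penalized grade, late days left) for one late submission.

    Assumption: free late days are consumed first; only the days beyond
    them cost one third of a grade each.
    """
    used = min(days_late, late_days_remaining)
    penalized_days = days_late - used
    return step_down(grade, penalized_days), late_days_remaining - used


if __name__ == "__main__":
    # Two days late with all 5 late days left: no penalty, 3 days remain.
    print(apply_late_policy("A", 2, 5))
    # Three days late with 2 late days left: one penalized day, A becomes A-.
    print(apply_late_policy("A", 3, 2))
```

The `step_down` helper matches the examples in the policy (A to A- for one day, A to B+ for two days), and the same step applies to the per-page penalty for reports that exceed the page limit.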
Plagiarism/Code Reuse Policy:
All assignments are expected to be conducted under the CMU policy for academic integrity. All of its rules apply, and violations will be subject to penalty including zero credit on the assignment, failing the course, or other disciplinary measures. In particular, in your implementation:
- Code or pseudo-code provided by the TAs or instructor may be used freely without restriction.
- For assignment 2, you may not just re-use an existing implementation written by someone else. The implementation should basically be your own.
- Code written by other students in the class cannot be used (except, obviously, you can share code within your group for assignments 2, 3, and 4).
- If you are doing a similar project for a graded class at CMU (including independent studies or directed research), you must declare so on your report, and note which parts of the project are for 11-711, and which parts are for the other class. Consult with the TA mailing list if you are unsure.
Consulting w/ Instructors/TAs:
For assignments and projects, you are free to consult with the instructors and TAs as much as you want, any time you want. That is what we’re here for, and in no way is this considered cheating. In fact, if you don’t have much previous experience with NLP, it will be helpful to consult liberally with the instructors and TAs to learn how to do the implementation and finish the assignments. So please do so.
Because this is a project-based course, we assume that many of the students taking the course will be interested in turning their assignments or project into research papers. In this case, if you have received useful advice from the instructor or TAs that made the project significantly better, consider inviting them to be co-authors on the paper. Of course, you do not need to do so just because the paper is a result of the class, only if you feel that their advice or help made a contribution.
Details of Each Assignment
- Assignment 1: To be released
- Assignment 2: To be released
- Assignment 3: To be released
- Assignment 4: To be released
Poster Presentation
Time/Location
- Time: TBD
- Location: TBD
Goals and Grading
The intention of the poster is several-fold:
- That you share your preliminary results with the TAs and instructor so we can give feedback to make any last adjustments to improve your final project report.
- That you can see the other projects in the class to learn from them and get any ideas that may improve your final project report.
- That you can practice explaining the work that you did.
What information should be included in a poster? It should be mostly:
- What is the problem you’re solving
- What is your method for solving that problem
- What are the results