Course Description

Models of language based on large Neural Networks, otherwise known as Deep Learning, are revolutionizing the way we work, learn, and communicate. But how do these models work at a fundamental level? What enables their (seemingly) human-like command of language? In this course, we will build an understanding of neural language models from the ground up: starting with mathematical fundamentals, introducing crucial topics from Computational Linguistics and Machine Learning, surveying current and historical approaches to building neural LMs, and gaining hands-on experience training and analyzing such models on UR's BlueHive computing cluster.

Days: Monday and Wednesday
Time: 10:25 - 11:40 AM
Location: Lattimore 513

Teaching Staff

Instructor: C.M. Downey
Office: Lattimore 507
Office Hours: TBD

"Required" Textbook

Required readings are posted in the schedule below and are drawn mostly from the following online textbook (abbreviated JM in the schedule), which is a very good general resource:

Dan Jurafsky and James H. Martin, Speech and Language Processing (3rd edition draft)

Prerequisites

  • 1 semester of Calculus (differentiation concepts and rules)
  • Programming in Python
  • Linux/Unix commands

Grading and Policies

Attendance

Class attendance and participation are expected and count toward your grade. I will keep track of attendance. Students are allowed to be absent from up to four sessions for any reason, without needing to contact me and without penalty. These excused absences may be used for travel, illness, catching up on other courses, etc. However, unexcused absences beyond these four sessions will count against the student's final attendance grade, with the exception of the important obligations listed under Exceptions below.

Quizzes

Short quizzes will be held at the beginning of class on most Mondays, and occasionally on Wednesdays when there is no class on Monday. All quiz dates and topics are noted on the course calendar (see below). The quizzes are primarily meant to promote active engagement with the material, and grades will be adjusted so that the median score is 85%, computed separately for the undergraduate and graduate sections. Grades will ONLY be adjusted upward, never downward. For example, if the median undergraduate grade on a quiz is 75%, all undergraduate grades will be raised by 10 percentage points. HOWEVER, quiz grades are capped at 100%. If the median grade is 85% or higher, grades will not be adjusted.
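
For concreteness, the adjustment rule can be sketched in Python as follows (an illustrative sketch of the policy described above; the function and variable names are my own and not part of any course tooling):

    # Sketch of the quiz-curve rule described above (illustrative only).
    from statistics import median

    def curve(scores):
        """Shift all scores up so the median reaches 85; never shift down; cap at 100."""
        shift = max(0.0, 85.0 - median(scores))  # upward adjustment only
        return [min(100.0, s + shift) for s in scores]

    # Example: the median is 75, so every score gains 10 points (capped at 100).
    print(curve([60, 75, 95]))  # [70.0, 85.0, 100.0]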

Homework

Students will complete 6-8 homeworks, comprising both written and (Python) programming assignments. Unless noted otherwise on the schedule, homeworks will be released on Wednesdays and due at 11:00 PM on the following Wednesday. All homework will be submitted via Blackboard.

Term Project

Students will work in assigned groups to complete a substantial term project focused on answering a question about language or linguistic theory with deep learning methods. This will minimally involve training or fine-tuning a neural model of language (though not necessarily a Language Model in the technical sense). The project is scientifically oriented, i.e., it should go beyond simply engineering a model to solve an NLP task and seek to extend scientific understanding of Language or Language Models. Within these parameters, student groups are encouraged to creatively pursue a topic of interest to them.

Project milestones will be assigned throughout the semester to ensure timely progress and feasible goals (more details will be given as the semester progresses). At the end of the semester, each project group will present their work and results to the class, submit a GitHub repository containing the project software, and submit a final writeup in the style of a scientific research paper.

Late work

All deadlines and meeting times for this class are in Eastern Time. Please note: on Sunday, November 2, this changes from Eastern Daylight Time (EDT/UTC-4) to Eastern Standard Time (EST/UTC-5). All work should be submitted by 11:00 PM on the day it is due. Work that is received late will incur the following penalties:

  • Up to 1 hour late: 5%
  • Up to 24 hours late: 10%
  • Up to 48 hours late: 20%
  • Later than 48 hours: not graded (0 for the assignment)
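
As a sketch, this schedule corresponds to the following piecewise function (an illustration only; the function name and the fraction-based units are my own):

    # Sketch of the late-penalty schedule above (illustrative only).
    def late_penalty(hours_late):
        """Return the fraction deducted from the assignment's score."""
        if hours_late <= 0:
            return 0.0    # on time
        if hours_late <= 1:
            return 0.05   # up to 1 hour late
        if hours_late <= 24:
            return 0.10   # up to 24 hours late
        if hours_late <= 48:
            return 0.20   # up to 48 hours late
        return 1.0        # later than 48 hours: not graded (0 for the assignment)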

Extensions (without penalty) may be offered if they are requested within a reasonable amount of time (relative to the reason for the extension) before the work is due. Please don't hesitate to ask for an extension if you need one.

Final grading

  • 40%: Homework assignments
  • 30%: Term project
  • 20%: In-class quizzes
  • 10%: Attendance/participation
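
Equivalently, the final grade is a weighted average of the four components (a hypothetical illustration; the function and parameter names are mine):

    # Sketch of the final-grade weighting above (illustrative only).
    def final_grade(homework, project, quizzes, attendance):
        """All component grades on a 0-100 scale."""
        return (0.40 * homework + 0.30 * project
                + 0.20 * quizzes + 0.10 * attendance)

    # Example: final_grade(90, 85, 88, 100) -> 89.1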

Exceptions

Students will not be penalized for important civic, ethnic, family, or religious obligations, or for university service. Whenever feasible, you will have a chance to make up, within a reasonable time, any assignment missed for these reasons. Absences for these reasons will count as excused for the purposes of the participation grade. However, it is your responsibility to inform me of any expected missed work in advance, as soon as possible.

Academic honesty

All assignments and activities associated with this course must be performed in accordance with the University of Rochester's Academic Honesty Policy. More information is available here. Please note: The use of Generative AI to produce any part of the written or programming homeworks is not allowed. Generative AI is allowed for programming work on the Term Project only (the final writeup must be your own work).

Schedule

(subject to change)


Aug 25
  Topics: Introduction, Deep Learning History

Aug 27
  Topics: Vectors and Linear Transformations
  Readings: Essence of Linear Algebra, Ch. 1-8 (YouTube)

Sep 1
  Labor Day: no class

Sep 3
  Topics: The Perceptron
  Readings: JM 4.0-4.3
  Events: in-class quiz (vectors, matrices, linear transformations)

Sep 8
  Topics: Supervised Learning, Gradient Descent
  Readings: JM 4.5-4.7
  Events: in-class quiz (function derivatives; Calculus prerequisites)

Sep 10
  Topics: Computation Graphs, Backpropagation
  Readings:
    • JM 6.6.3-6.6.5
    • Calculus on computational graphs
    • Yes, you should understand backprop
  Events: hw1 released [pdf, tex] (due Sep 17)

Sep 15
  Topics: Word Vectors, word2vec
  Readings: JM 5.0, 5.2-5.9
  Events: in-class quiz (gradients, computation graphs)

Sep 17
  Topics: word2vec (cont.); Language Modeling, N-Grams
  Readings: JM 3.0-3.6
  Events: hw1 due

Sep 22
  Class canceled

Sep 24
  Topics: BlueHive Cluster
  Events: hw2 released [pdf, tex] (due Oct 6)

Sep 29
  Topics: PyTorch; Feed-forward Language Models
  Readings:
    • JM 6.0-6.3, 6.5-6.7
    • A Neural Probabilistic Language Model (Bengio et al. 2003)

Oct 1
  Topics: Feed-forward Language Models (cont.); Recurrent Neural Networks
  Readings:
    • JM 13.0-13.3
    • The Unreasonable Effectiveness of Recurrent Neural Networks

Oct 6
  Topics: RNNs (cont.); Vanishing Gradients, RNN Variants
  Readings:
    • JM 13.5-13.6
    • Understanding LSTMs
    • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
    • On the difficulty of training recurrent neural networks
  Events: in-class quiz (Chain Rule for probability, n-grams, FFNNs); hw2 due

Oct 8
  Topics: Sampling and Generation
  Readings: JM 7.4, 8.6

Oct 13
  Fall Break: no class

Oct 15
  Topics: Sequence-to-Sequence, Attention
  Readings:
    • JM 13.7-13.8
    • Sequence to Sequence Learning with Neural Networks (original seq2seq paper)
    • Neural Machine Translation by Jointly Learning to Align and Translate (original seq2seq + attention paper)

Oct 20
  Topics: Transformers 1
  Readings:
    • JM 8.0-8.7
    • Attention is All You Need (original Transformer paper)
    • The Annotated Transformer
    • The Illustrated Transformer
  Events: in-class quiz (RNNs, LSTMs, seq2seq)

Oct 22
  Topics: Transformers 2
  Readings: JM 8.0-8.7
  Events: hw3 released [pdf, tex] (due Oct 29)

Oct 27
  Topics: Pre-training & Fine-tuning 1
  Readings:
    • JM 7.1, 7.5, 10
    • Contextual Word Representations: Putting Words into Computers
    • The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)

Oct 29
  Topics: Pre-training & Fine-tuning 2
  Readings: same as Oct 27
  Events: hw3 due

Nov 3
  Topics: Text Tokenization
  Events: in-class quiz (Attention, Transformers)

Nov 5
  Topics: Multilingual Language Models
  Readings: Cross-Lingual Language Model Pretraining
  Optional (peruse if interested):
    • Are All Languages Created Equal in Multilingual BERT?
    • Emerging Cross-lingual Structure in Pretrained Language Models
    • On the Cross-lingual Transferability of Monolingual Representations
    • Word Translation Without Parallel Data
    • Bilingual alignment transfers to multilingual alignment for unsupervised parallel text mining

Nov 10
  Class canceled

Nov 12
  Topics: "Large Language Models" (LLMs) 1
  Readings: JM 7.2-7.3, 9.0-9.1
  Events: in-class quiz (Pre-training, Tokenization, Multilingual LMs)

Nov 17
  Topics: "Large Language Models" (LLMs) 2
  Readings: JM 9.2-9.4

Nov 19
  Topics: Speech Data and Acoustics
  Readings: JM 15.0-15.3

Nov 24
  Topics: Neural Networks for Speech
  Readings: JM 15.5-15.6

Nov 26
  Thanksgiving Break: no class

Dec 1
  No class session (office hours instead)

Dec 3
  Project Presentations

Dec 8
  Project Presentations