Course Description

Models of language based on large Neural Networks, otherwise known as Deep Learning, are revolutionizing the way we work, learn, and communicate. But how do these models work at a fundamental level? What enables their (seemingly) human-like command of language? In this course, we will build an understanding of neural language models from the ground up: starting with mathematical fundamentals, introducing crucial topics from Computational Linguistics and Machine Learning, surveying current and historical approaches to building neural LMs, and gaining hands-on experience training and analyzing such models on UR's BlueHive computing cluster.

Days: Monday and Wednesday
Time: 10:25 - 11:40 AM
Location: Lattimore 513

Teaching Staff

Instructor: C.M. Downey
Office: Lattimore 507
Office Hours: TBD

"Required" Textbook

Required readings are posted in the schedule below and are drawn mostly from the following online textbook (abbreviated JM in the schedule), which is a very good general resource:

Dan Jurafsky and James H. Martin, Speech and Language Processing (3rd edition draft)

Prerequisites

  • 1 semester of Calculus (differentiation concepts and rules)
  • Programming in Python
  • Linux/Unix commands

Grading and Policies

Attendance

Class attendance and participation are expected and count toward your grade. I will keep track of attendance. Students are allowed to be absent from up to four sessions for any reason, without needing to contact me and without penalty. These excused absences may be used for travel, illness, catching up on other courses, etc. However, unexcused absences beyond these four sessions will count against the student's final attendance grade, with the exception of the important obligations listed under Exceptions below.

Quizzes

Short quizzes will be held at the beginning of class on most Mondays, and occasionally on Wednesdays when there is no class on Monday. All quiz dates and topics are noted on the course calendar (see below). The quizzes are primarily meant to promote active engagement with the material, and grades will be adjusted so that the median score is 85%, computed separately for the undergraduate and graduate sections. Grades will ONLY be adjusted upward, never downward. For example, if the median undergraduate grade on a quiz is 75%, all undergraduate grades will be raised by 10 percentage points. HOWEVER, quiz grades are capped at 100%. If the median grade is 85% or higher, grades will not be adjusted.
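
For concreteness, the adjustment rule can be sketched in Python as follows (an illustrative sketch of the policy described above; the function and variable names are my own and not part of any course tooling):

    # Sketch of the quiz-curve rule described above (illustrative only).
    from statistics import median

    def curve(scores):
        """Shift all scores up so the median reaches 85; never shift down; cap at 100."""
        shift = max(0.0, 85.0 - median(scores))  # upward adjustment only
        return [min(100.0, s + shift) for s in scores]

    # Example: the median is 75, so every score gains 10 points (capped at 100).
    print(curve([60, 75, 95]))  # [70.0, 85.0, 100.0]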

Homework

Students will complete 6-8 homeworks, comprising both written and (Python) programming assignments. Unless noted otherwise on the schedule, homeworks will be released on Wednesdays and due at 11:00 PM on the following Wednesday. All homework will be submitted via Blackboard.

Term Project

Students will work in assigned groups to complete a substantial term project focused on answering a question about language or linguistic theory with deep learning methods. This will minimally involve training or fine-tuning a neural model of language (though not necessarily a Language Model in the technical sense). The project is scientifically oriented, i.e., it should go beyond simply engineering a model to solve an NLP task and seek to extend scientific understanding of Language or Language Models. Within these parameters, student groups are encouraged to creatively pursue a topic of interest to them.

Project milestones will be assigned throughout the semester to ensure timely progress and feasible goals (more details will be given as the semester progresses). At the end of the semester, each project group will present their work and results to the class, submit a GitHub repository containing the project software, and submit a final writeup in the style of a scientific research paper.

Late work

All deadlines and meeting times for this class are in Eastern Time. Please note: on Sunday, November 2, this changes from Eastern Daylight Time (EDT/UTC-4) to Eastern Standard Time (EST/UTC-5). All work should be submitted by 11:00 PM on the day it is due. Work that is received late will incur the following penalties:

  • Up to 1 hour late: 5%
  • Up to 24 hours late: 10%
  • Up to 48 hours late: 20%
  • Later than 48 hours: not graded (0 for the assignment)
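
As a sketch, this schedule corresponds to the following piecewise function (an illustration only; the function name and the fraction-based units are my own):

    # Sketch of the late-penalty schedule above (illustrative only).
    def late_penalty(hours_late):
        """Return the fraction deducted from the assignment's score."""
        if hours_late <= 0:
            return 0.0    # on time
        if hours_late <= 1:
            return 0.05   # up to 1 hour late
        if hours_late <= 24:
            return 0.10   # up to 24 hours late
        if hours_late <= 48:
            return 0.20   # up to 48 hours late
        return 1.0        # later than 48 hours: not graded (0 for the assignment)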

Extensions (without penalty) may be offered if they are requested within a reasonable amount of time (relative to the reason for the extension) before the work is due. Please don't hesitate to ask for an extension if you need one.

Final grading

  • 40%: Homework assignments
  • 30%: Term project
  • 20%: In-class quizzes
  • 10%: Attendance/participation
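
Equivalently, the final grade is a weighted average of the four components (a hypothetical illustration; the function and parameter names are mine):

    # Sketch of the final-grade weighting above (illustrative only).
    def final_grade(homework, project, quizzes, attendance):
        """All component grades on a 0-100 scale."""
        return (0.40 * homework + 0.30 * project
                + 0.20 * quizzes + 0.10 * attendance)

    # Example: final_grade(90, 85, 88, 100) -> 89.1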

Exceptions

Students will not be penalized for important civic, ethnic, family, or religious obligations, or for university service. Whenever feasible, you will have a chance to make up, within a reasonable time, any assignment missed for these reasons. Absences for these reasons will count as excused for the purposes of the participation grade. However, it is your responsibility to inform me of any expected missed work in advance, as soon as possible.

Academic honesty

All assignments and activities associated with this course must be performed in accordance with the University of Rochester's Academic Honesty Policy. More information is available here. Please note: The use of Generative AI to produce any part of the written or programming homeworks is not allowed. Generative AI is allowed for programming work on the Term Project only (the final writeup must be your own work).

Schedule

(subject to change)


Aug 25
  Topics: Introduction, Deep Learning History

Aug 27
  Topics: Vectors and Linear Transformations
  Readings: Essence of Linear Algebra, Ch. 1-8 (YouTube)

Sep 1
  Labor Day: no class

Sep 3
  Topics: The Perceptron
  Readings: JM 4.0-4.3
  Events: in-class quiz (vectors, matrices, linear transformations)

Sep 8
  Topics: Supervised Learning, Gradient Descent
  Readings: JM 4.5-4.7
  Events: in-class quiz (function derivatives; Calculus prerequisites)

Sep 10
  Topics: Computation Graphs, Backpropagation
  Readings:
    • JM 6.6.3-6.6.5
    • Calculus on computational graphs
    • Yes, you should understand backprop
  Events: hw1 released [pdf, tex] (due Sep 17)

Sep 15
  Topics: Word Vectors, word2vec
  Readings: JM 5.0, 5.2-5.9
  Events: in-class quiz (gradients, computation graphs)

Sep 17
  Topics: word2vec (cont.); Language Modeling, N-Grams
  Readings: JM 3.0-3.6
  Events: hw1 due

Sep 22
  Class canceled

Sep 24
  Topics: BlueHive Cluster
  Events: hw2 released [pdf, tex] (due Oct 6)

Sep 29
  Topics: PyTorch; Feed-forward Language Models
  Readings:
    • JM 6.0-6.3, 6.5-6.7
    • A Neural Probabilistic Language Model (Bengio et al. 2003)

Oct 1
  Topics: Feed-forward Language Models (cont.); Recurrent Neural Networks
  Readings:
    • JM 13.0-13.3
    • The Unreasonable Effectiveness of Recurrent Neural Networks

Oct 6
  Topics: RNNs (cont.); Vanishing Gradients, RNN Variants
  Readings:
    • JM 13.5-13.6
    • Understanding LSTMs
    • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
    • On the difficulty of training recurrent neural networks
  Events: in-class quiz (Chain Rule for probability, n-grams, FFNNs); hw2 due

Oct 8
  Topics: Sampling and Generation
  Readings: JM 7.4, 8.6

Oct 13
  Fall Break: no class

Oct 15
  Topics: Sequence-to-Sequence, Attention
  Readings:
    • JM 13.7-13.8
    • Sequence to Sequence Learning with Neural Networks (original seq2seq paper)
    • Neural Machine Translation by Jointly Learning to Align and Translate (original seq2seq + attention paper)

Oct 20
  Topics: Transformers 1
  Readings:
    • JM 8.0-8.7
    • Attention is All You Need (original Transformer paper)
    • The Annotated Transformer
    • The Illustrated Transformer
  Events: in-class quiz (RNNs, LSTMs, seq2seq)

Oct 22
  Topics: Transformers 2
  Readings: JM 8.0-8.7
  Events: hw3 released [pdf, tex] (due Oct 29)

Oct 27
  Topics: Pre-training & Fine-tuning 1
  Readings:
    • JM 7.1, 7.5, 10
    • Contextual Word Representations: Putting Words into Computers
    • The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)

Oct 29
  Topics: Pre-training & Fine-tuning 2
  Readings: same as Oct 27
  Events: hw3 due

Nov 3
  Topics: Text Tokenization
  Events: in-class quiz (Attention, Transformers)

Nov 5
  Topics: Multilingual Language Models
  Readings: Cross-Lingual Language Model Pretraining
  Optional (peruse if interested):
    • Are All Languages Created Equal in Multilingual BERT?
    • Emerging Cross-lingual Structure in Pretrained Language Models
    • On the Cross-lingual Transferability of Monolingual Representations
    • Word Translation Without Parallel Data
    • Bilingual alignment transfers to multilingual alignment for unsupervised parallel text mining

Nov 10
  Class canceled

Nov 12
  Topics: "Large Language Models" (LLMs) 1
  Readings: JM 7.2-7.3, 9.0-9.1
  Events: in-class quiz (Pre-training, Tokenization, Multilingual LMs)

Nov 17
  Topics: "Large Language Models" (LLMs) 2
  Readings: JM 9.2-9.4

Nov 19
  Topics: Speech Data and Acoustics
  Readings: JM 15.0-15.3

Nov 24
  Topics: Neural Networks for Speech
  Readings: JM 15.5-15.6

Nov 26
  Thanksgiving Break: no class

Dec 1
  No class session (office hours instead)

Dec 3
  Project Presentations

Dec 8
  Project Presentations