Schedule
Legend: * = asynchronous
Date | Topic | Assignments | Other |
9/23 Wed | Introduction and logistics | A1 released | lesson 1 |
9/25 Fri | Linear regression 1 | A1 due before class | lesson 2 least squares solution Gdrive |
9/28 Mon | Linear regression 2 | watch for tomorrow: polynomial fits on Gdrive | lesson 3 optional extras: SVD overview SVD & least squares |
9/29 Tue | Polynomials & overfitting | lesson 4 notebook | |
9/30 Wed | SVD | A2 released, code, tex | lesson 5 |
10/2 Fri | PCA | lesson 6 | |
10/5 Mon | Homework help, PCA 2 | lesson 7 PCA notebook | |
10/6 Tue | Bias-variance and overfitting/underfitting | lesson 8 Linear algebra extra 12-1 pm | |
10/7 Wed | Ridge regression 1 | lesson 9 notebook | |
10/9 Fri | Ridge regression 2 | lesson 10 | |
10/12 Mon | Ridge 3, Probability and priors | A2 due before midnight (pdf on Canvas, code on github) | lesson 11 |
10/13 Tue* | Lasso (asynchronous) | A3 released, tex, code | Watch Gdrive video on Lasso, sparsity notebook lasso 1 |
10/14 Wed | Lasso 2 | lesson 13 | |
10/16 Fri | A2 feedback | Note: typo in A3 fixed | solutions on Piazza |
10/19 Mon | Classification intro | lesson 14 | |
10/20 Tue | Logistic regression 1 | lesson 15 | |
10/21 Wed | Logistic 2: losses, gradient descent | lesson 16 | |
10/23 Fri | GD details, SGD | A3 due | lesson 17 |
10/26 Mon | Mini-batching and convexity | A4 tex code (note: due Monday 11/2) | lesson 18 |
10/27 Tue | Getting started with nonlinear models: nearest-neighbors and trees | | |
10/28 Wed | Features and kernels | | |
10/30 Fri | Project discussion | | |
11/2 Mon | | A4 due | |
Resources
There are no required textbooks for the course, but several that I recommend are:
- Hastie, Tibshirani and Friedman "The Elements of Statistical Learning"
- James, Witten, Hastie and Tibshirani "An Introduction to Statistical Learning"
- Goodfellow, Bengio and Courville "Deep Learning"
- Boyd and Vandenberghe "Convex Optimization"
- List from Kevin Jamieson and Jamie Morgenstern, including great math resources!
- Hastie and Tibshirani video lectures
Syllabus
Course information
Time: M,T,W,F 2–2:50 pm (sync & async)
Place: Remote
Communication: This website, Piazza, occasional email blasts
Lessons: Zoom (link on Canvas)
Office hours: W 10-11 am, F 10-11 am, Zoom (link on Canvas)
Prerequisites:
- CSCI 241, Math 341
- Grad status for 571
- Familiarity with multivariable calculus, linear algebra, and probability highly recommended
Contact
Name: Kameron Decker Harris
Office: CF 475
Phone: 360-650-7366
Email: kameron.harris@wwu.edu
Learning in an unusual year
This academic quarter is not normal. We are all adjusting to the new online environment and many of us may be dealing with extra stress and uncertainty. My goal is to facilitate your learning as best I can.
The class is listed as synchronous (Zoom); however, a minority of lessons will be provided asynchronously (video posted to Google Drive). The class schedule (this website) will keep you up to date at least a week in advance about which lessons require synchronous attendance. I will also be offering multiple office hours, but if these do not work for you and you would like to meet, send me an email with three times that work for you.
Course description and outcomes
Covers important machine learning research areas such as neural nets, kernel methods, graphical models, Bayesian learning, decision tree learning, evolutionary computation and computational learning theory. Models and algorithms from these research areas will be analyzed.
On completion of CSCI 471, students will demonstrate:
- A thorough understanding of classification, regression and clustering.
- A thorough understanding of supervised and unsupervised learning.
- A basic understanding of assorted, advanced machine learning algorithms.
- A basic understanding of feature selection and transformation.
- The ability to implement standard machine learning algorithms and run machine learning experiments.
On completion of CSCI 571, students will demonstrate:
- A thorough understanding of classification, regression and clustering.
- A thorough understanding of supervised and unsupervised learning.
- A basic understanding of assorted, advanced machine learning algorithms.
- A thorough understanding of one advanced machine learning algorithm.
- A basic understanding of feature selection and transformation.
- The ability to implement standard machine learning algorithms and run machine learning experiments.
Grading
All students will be graded on homework assignments, participation and quizzes, and a final project. There will be no exams. 571 students will also give a slideshow presentation to the class on an advanced machine learning topic. Sometimes the homework will include extra, more difficult problems that will be graded only for grad students.
- Homework: 70%, Final project: 20%, Participation & quizzes: 10%
- 571 students: multiply the above numbers by 90% and add 10% for your presentation
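To make the 571 weighting concrete, here is a small illustrative sketch (not an official grading script). The function names `weights_571` and `course_score` are hypothetical and chosen only for this example; the percentages come from the list above.

```python
# Base weights from the syllabus (these apply directly to CSCI 471).
BASE_WEIGHTS = {"homework": 0.70, "project": 0.20, "participation": 0.10}


def weights_571(base=BASE_WEIGHTS, presentation=0.10):
    """Scale the base weights by (1 - presentation), then add the presentation."""
    scaled = {name: w * (1.0 - presentation) for name, w in base.items()}
    scaled["presentation"] = presentation
    return scaled


def course_score(scores, weights):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(weights[name] * scores[name] for name in weights)


w571 = weights_571()
for name, w in w571.items():
    print(f"{name}: {w:.0%}")
```

Under these assumptions, the 571 breakdown works out to 63% homework, 18% final project, 9% participation and quizzes, and 10% presentation.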
I may curve grades at the end of the course if it provides a spread of grades that more accurately represents the quality of work in the class. I will only do this if it improves your grade.
Participation and quizzes
Part of your grade will reflect whether or not you attend the synchronous sessions, since these will be interactive and the class will benefit from being able to ask questions and work in breakout groups. Short quizzes may be used to check your understanding of background and recently covered material.
Deadlines
Please refer to this website for the most up-to-date deadlines. You have 2 late days that you can use for up to two assignments (i.e., you may submit 1 assignment 2 days late or 2 assignments 1 day late each). If you miss a deadline or expect to miss one due to a medical or family emergency, please contact me as soon as possible to discuss arrangements.
Submitting your work
Program and project code will be developed and stored in git version control repositories, which can be accessed in Linux using the command line client. Specifically, we will use private GitHub repositories under the "kamdh-teaching" organization. For each assignment, you will receive an invitation link to create a repository, complete the assignment in a local copy of the repo, and submit by pushing your final changes to GitHub. You will receive feedback in the same repository in a branch called "grading."
Written homework must be submitted in a single PDF, with any math preferably typeset in LaTeX. You may write an assignment out by hand and scan it, but it must be legible and neat. Using your phone as a “scanner” is not ideal but okay so long as the result is as legible as a good photocopy, cropped, and submitted as a single PDF.
For group homework assignments, each student must write up and submit their own answers to the assignment. List everyone you worked with.
Homework guidelines
Teamwork is allowed on the assignments unless it is explicitly an individual assignment or problem. Working in groups is one of the best ways to learn from each other. An ideal group size is 3 people, since it's hard for everyone to contribute in large groups.
Each student must write up and submit their own answers to the assignment, with all group members listed. That means you may discuss the steps in an algorithm or mathematical argument, but write it out or type the code yourself. Doing this helps fix the ideas in your memory. Don't copy-paste from colleagues or the internet. Do not post your code to the internet.
Math problems are graded for correctness and quality of explanation ("X follows from Y, because Z"). If you make a math mistake early on that leads to an incorrect final result, you will still receive partial credit if the rest of the logic is sound.
Writing assignments are graded as they would be in a writing class, since communication is an important skill even for engineers and other technical people.
Coding language: We will work exclusively in Python 3, one of the most common languages among machine learning practitioners. For the assignments, you will be required to write your own algorithms using basic linear algebra routines in NumPy and SciPy, without any machine learning-specific libraries such as scikit-learn. Each assignment will specify which packages are allowed. In any case, your programs must run on the department computers running Linux. If you install Python on your home machine, I recommend the Anaconda distribution.
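As a rough illustration of what "basic linear algebra routines only" can look like, the sketch below fits a least squares model with NumPy and no machine learning library. The function name `fit_least_squares` and the synthetic data are assumptions made for this example; it is not a template for any particular assignment.

```python
import numpy as np


def fit_least_squares(X, y):
    """Return the weights w minimizing ||X w - y||_2, using only NumPy."""
    # np.linalg.lstsq returns (solution, residuals, rank, singular values).
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


# Synthetic, noiseless data so the true weights are recoverable.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w_hat = fit_least_squares(X, y)
print(np.allclose(w_hat, w_true))
```

Because the data are noiseless and `X` has full column rank, the recovered weights should match `w_true` up to floating-point error.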
Programs will be graded on correctness, clarity, and efficiency (in that order).
Correctness: A correct program is one that always produces the correct output for a given input. Also, a correct program does not produce unintended side-effects. The most effective way to ensure that your program is correct is to test each component as you introduce it; once you are confident that a component works you can use it in other routines and test them. Try to break your program in testing, and if you succeed, locate the bug and fix it. Most of your grade will depend on the correctness of your code.
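One way to test a component as you introduce it is with a few assertions, including an input chosen to break it. The sketch below is only an illustration; the helper `standardize` and its handling of constant columns are assumptions for this example, not a course requirement.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit standard deviation.

    Constant columns (zero standard deviation) are centered but not scaled,
    which avoids dividing by zero.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against division by zero
    return (X - mean) / std


def test_standardize():
    X = np.array([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
    X_before = X.copy()
    Z = standardize(X)
    # Expected behavior: zero mean per column, unit std for a varying column.
    assert np.allclose(Z.mean(axis=0), 0.0)
    assert np.isclose(Z[:, 0].std(), 1.0)
    # Attempt to break it: the second column is constant.
    assert np.all(np.isfinite(Z))
    # No unintended side effects: the input array is unchanged.
    assert np.array_equal(X, X_before)


test_standardize()
```

Once a component like this passes its tests, it can be reused in larger routines with more confidence.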
Clarity: The easier it is to read and maintain your code, the easier it is to locate and remove bugs. Your program should be organized, appropriately commented, and easy to maintain. To have a well-organized program, design it thoughtfully and modularly. Think before you code: hacking away blindly leads to ugly, hard-to-read code. If you do hack away, producing a functional but ugly wall of code, check it into your version control repository and then try to clean it up. If your cleaning introduces bugs, you can always revert to the original version. If you have two separate code blocks doing essentially the same thing, try to create one subroutine and call it from both places (copying and pasting code is a big red flag). The less code you have to maintain, the fewer chances you have to introduce bugs (and the less code you have to change when you find one).
Please follow these simple commenting guidelines proposed by Emeritus Professor Osborne:
- A comment at the top of the program states the program's purpose, the author(s), and date.
- Each subprocedure is accompanied by a comment that states its purpose, describes its pre- and postconditions, and mentions any exceptions raised.
- Within a procedure, precede code sections by a comment that says what the section does.
- Occasionally, in tricky pieces of code, a comment to the right explains what the code does.
- For each major variable, a comment explains its purpose.
- Don't over-comment. Brief is best, but not so brief as to be cryptic.
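The guidelines above might look roughly like this in a short Python file. This is one possible reading, not a required template; the function `mean_squared_error` and the placeholder author and date fields are hypothetical.

```python
# Purpose: Compute the mean squared error of a set of predictions.
# Author(s): (your name)
# Date: (date written)

import numpy as np


def mean_squared_error(y_true, y_pred):
    # Purpose: return the average of (y_true - y_pred)^2.
    # Precondition: y_true and y_pred have the same shape and are nonempty.
    # Postcondition: returns a nonnegative float; the inputs are not modified.
    # Raises: ValueError if the shapes differ or the inputs are empty.

    # Convert to float arrays so that plain Python lists are accepted.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Check the preconditions before doing any arithmetic.
    if y_true.shape != y_pred.shape or y_true.size == 0:
        raise ValueError("inputs must be nonempty and have the same shape")

    residuals = y_true - y_pred  # per-example prediction errors
    return float(np.mean(residuals ** 2))


print(mean_squared_error([1, 2, 3], [1, 2, 5]))
```

A docstring would be the more idiomatic Python place for the purpose and conditions; plain comments are used here only to mirror the wording of the guidelines.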
Efficiency: Your programs should be efficient, but I also want you to understand the individual operations in our machine learning algorithms. In some cases, we will be implementing algorithms which are not "state-of-the-art" in order to learn.
Technical assistance
If you are having problems with any of the machines in a Computer Science Department lab, contact CS Support at cs.support@wwu.edu.
In-class Behavior, Norms, and University Policies
In this course, you should act professionally, respectfully, and maturely. This includes both our time together and when interacting with classmates outside of class. Some of our discussions may deal with sensitive topics (e.g., politics and race). Please consider how your words may sound to others. It is okay to disagree, but I expect you to treat each other with civility.
Please review the University policies outlined at http://syllabi.wwu.edu regarding:
- Academic Honesty
- Accommodations
- Ethical Conduct with WWU Network and Computing Resources
- Equal Opportunity
- Medical Excuse Policy
- Student Conduct Code