Who are you?
The best way to reach us is to use this course’s online discussion forum, available through the course’s Canvas site. You can also visit us during our respective office hours, which we will set during the first week of class and update below.
- Rich: TBD
- Sara: TBD
- Beena: TBD
What will I learn?
You will build, “from scratch,” the basic components of a data analysis pipeline: collection, preprocessing, storage, analysis, and visualization. You will see several examples of high-level data analysis questions, concepts and techniques for formalizing those questions into mathematical or computational tasks, and methods for translating those tasks into code. Beyond programming and best practices, you’ll learn elementary data processing algorithms, notions of program correctness and efficiency, and numerical methods for linear algebra and mathematical optimization.
The basic philosophy of this course is that you’ll learn the material best by actively doing. Therefore, you should make an effort to complete all assignments, including any ungraded (“optional”) parts, and go a bit beyond on your own (see “How much time and effort do you expect of me?” below).
How will I do all that?
(Assignments and grading.)
Your grade will be based on a combination of “lab notebooks” (programming homework assignments) and three exams.
- Notebooks: 50%
- Midterm 1: 10%
- Midterm 2: 15%
- Final exam: 25%
There is approximately one assignment (lab notebook) or exam due every week. The assignments vary in difficulty but are weighted roughly equally. Some students find this pace very demanding; the reason we set it up this way is that we believe learning to program is like learning a foreign language, which demands constant and consistent practice.
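To see how the weights above combine into a course grade, here is a minimal Python sketch. The function name `final_grade` and the 0-100 percentage scale are illustrative assumptions, and the sketch averages notebooks with exactly equal weight, whereas the actual weighting is only roughly equal.

```python
# Weights taken from the grading breakdown (fractions of the final grade).
WEIGHTS = {
    "notebooks": 0.50,
    "midterm1": 0.10,
    "midterm2": 0.15,
    "final": 0.25,
}


def final_grade(notebook_scores, midterm1, midterm2, final_exam):
    """Return the weighted course grade on a 0-100 scale.

    All inputs are percentages (0-100). Assumption: every notebook
    counts exactly equally toward the notebook average.
    """
    notebook_avg = sum(notebook_scores) / len(notebook_scores)
    return (
        WEIGHTS["notebooks"] * notebook_avg
        + WEIGHTS["midterm1"] * midterm1
        + WEIGHTS["midterm2"] * midterm2
        + WEIGHTS["final"] * final_exam
    )


# Hypothetical scores, for illustration only.
print(final_grade([100, 80], midterm1=70, midterm2=80, final_exam=90))
```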
What should I know already?
You should have at least an undergraduate-level understanding in the following topics:
- General programming proficiency in Python or a similar language
- Basic calculus
- Probability and statistics
- Linear algebra
What does “programming proficiency” mean? For context, this course aims to fill in gaps in your programming background that might keep you from succeeding in other programming-intensive courses of Georgia Tech’s MS Analytics program, most notably, CSE 6242. If you already have a significant programming background, consider placing out. If you have no programming background, you will need to ramp up very quickly. See below for more specific guidance on what we expect on the two hardest gaps to fill, namely, programming proficiency and linear algebra.
When are things due?
(Deadlines and late submission policies.)
All assignments are due at 11:59 UTC. Here is a handy online tool, Time Zone Converter, for converting UTC to your local time: https://www.timeanddate.com/worldclock/converter.html
Please make sure you are aware of the due date and time for your local area. We will not grant extensions based on your misunderstanding of how to translate dates and times.
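If you prefer to do the conversion in code, the following Python sketch uses only the standard library. The helper name `to_local` and the date are hypothetical, and the sketch reads "11:59" literally on a 24-hour clock, which is an assumption; always confirm against the posted schedule.

```python
from datetime import datetime, timedelta, timezone


def to_local(deadline_utc, utc_offset_hours=None):
    """Convert a timezone-aware UTC deadline to local time.

    With no offset given, use this computer's own time zone setting.
    A fixed offset ignores daylight saving time, so double-check near
    clock changes.
    """
    if utc_offset_hours is None:
        return deadline_utc.astimezone()
    return deadline_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# Hypothetical date, for illustration only. The time reads "11:59"
# literally on a 24-hour clock, which is an assumption of this sketch.
deadline = datetime(2030, 1, 15, 11, 59, tzinfo=timezone.utc)
print(to_local(deadline, -5))   # a zone 5 hours behind UTC
print(to_local(deadline, 5.5))  # a zone 5.5 hours ahead of UTC
print(to_local(deadline))       # whatever zone this computer is set to
```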
Late policy. For your lab notebooks, you get an automatic 72-hour extension on every assignment. (This extension does not apply to exams.) However, you will lose points every day the assignment is late, and we will not accept any assignment after the 72-hour period.
The penalty is a deduction of 15% of the value of the assignment each day. For instance, if the total points for the assignment is 25 points, then you will lose (0.15 * 25) = 3.75 points out of 25 for each day it is late, up to 3 days.
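The penalty arithmetic can be sketched in Python as follows. The function name `late_deduction` is hypothetical, and the sketch assumes that any part of a day counts as a full late day; the policy above does not spell out how partial days are counted.

```python
import math

DAILY_PENALTY = 0.15  # 15% of the assignment's value per late day
MAX_LATE_HOURS = 72   # nothing is accepted after this window


def late_deduction(total_points, hours_late):
    """Return the points deducted, or None if the work is too late to accept.

    Assumption: any part of a day counts as a full late day.
    """
    if hours_late <= 0:
        return 0.0
    if hours_late > MAX_LATE_HOURS:
        return None
    days_late = math.ceil(hours_late / 24)
    return DAILY_PENALTY * total_points * days_late


# The 25-point example from the text, at one, two, and three days late.
for hours in (24, 48, 72):
    print(hours, late_deduction(25, hours))
```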
The reason we do not grant extensions beyond 72 hours is that we want to post sample solutions so your classmates can benefit from seeing them; we do not want to delay everyone else’s learning because a few people need significantly more time. Keep in mind that there are many assignments, so any given assignment is only worth a couple percent of your final grade.
Exam procedures. For the exams, you will receive a window of about five (5) days in which to attempt the exam, with a hard deadline to submit (absolutely no extensions). Once you start an exam, you must submit all your work within 24 hours or by the hard deadline, whichever comes first. (That is, if you start the exam 12 hours before the hard deadline, you’ll only have 12 hours.)
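The "whichever comes first" rule amounts to taking the earlier of two times. Here is a minimal Python sketch; the function name `exam_submit_by` and the dates are hypothetical and chosen only to illustrate the rule.

```python
from datetime import datetime, timedelta, timezone

EXAM_DURATION = timedelta(hours=24)  # time allowed once you start


def exam_submit_by(start, hard_deadline):
    """Return the latest submission time: start + 24 hours or the hard
    deadline, whichever comes first."""
    return min(start + EXAM_DURATION, hard_deadline)


# Hypothetical hard deadline, for illustration only.
hard_deadline = datetime(2030, 3, 10, 12, 0, tzinfo=timezone.utc)

early_start = hard_deadline - timedelta(days=3)
late_start = hard_deadline - timedelta(hours=12)

print(exam_submit_by(early_start, hard_deadline))  # a full 24 hours
print(exam_submit_by(late_start, hard_deadline))   # cut off at the deadline
```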
How much time and effort do you expect of me?
At Georgia Tech, this course is a 3 credit-hour graduate-level (Master’s degree) course. So what does that mean?
The “3 credit hours” part translates into an average amount of time of about 10-12 hours per week. However, the actual amount of time you will spend depends heavily on your background and preparation. Past students who are very good at programming and math report spending much less time per week (maybe as few as 4-5 hours), and students who are rusty or novices at programming or math have reported spending more (maybe 15 or more hours).
The “graduate-level” part means you are mature and independent enough to try to understand the material at more than a superficial level. That is, you don’t just go to lectures or watch some videos, go through the assignments, and stop there; rather, you spend some extra time looking at the code and examples in detail, trying to cook up your own examples, and coming up with self-tests to check your understanding. Also, you will need to figure out, quickly, where your gaps are and make time to get caught up.
As noted above, in past runs of this course we’ve found the two hardest parts for many students are catching up on (a) basic programming proficiency and (b) linear algebra, which are both prerequisites to this course. We’ll supply some refresher material but expect that you can catch up. Here is some additional advice on these two areas.
Programming proficiency. Regarding programming proficiency, we expect that you have taken at least one introductory programming course in any language, though Python will save you the most time. You should be familiar with basic programming ideas at least at the level of the Python Bootcamp that most on-campus MS Analytics students take just before they start. We also strongly recommend having gone through a course like CS 1301x, which is Georgia Tech’s undergraduate introduction to Python class. Students who struggled with this course in the past have reported success when taking CS 1301x and re-taking this class later. Beyond that, code drill sites, like CodeSignal and codewars.com (the latter’s absurdly combative name notwithstanding) can help improve your speed at general computational problem solving. Please spend time looking at these or similar resources.
Part of developing and improving your programming proficiency is learning how to find answers. We can’t give you every detail you might need; but, thankfully, you have access to the entire internet! Getting good at formulating queries, searching for helpful code snippets, and adapting those snippets into your solutions will be a lifelong skill and is common practice in the “real world” of software development, so use this class to practice doing so. (During exams, you will be allowed to search for stuff on the internet!) It’s also a good skill to have because whatever we teach now might no longer be state-of-the-art five years from now, so knowing how to pick up new things quickly will be a competitive advantage for you. Of course, the time spent searching may make the assignments harder and more time-consuming, but you’ll find that you get better and faster at it as you go, which will spare you the same learning curve when you’re on the job.
Math proficiency. Regarding math, and more specifically, your linear algebra background, we do provide some refresher material within this course. However, it is non-graded self-study material. Therefore, you should be prepared to fill in any gaps you find when you encounter unfamiliar ideas. We strongly recommend looking at the notes from the edX course, Linear Algebra: Foundations to Frontiers (LAFF). Its website includes a freely downloadable PDF with many nice examples and exercises.
What about collabs – can I work with others?
You may collaborate on the lab notebooks at the “whiteboard” level. That is, you can discuss ideas and have technical conversations with other students in the class, which we especially encourage on the online forums. However, each student must write up and submit his or her own notebooks.
But what does “whiteboard level” mean? It’s hard to define precisely, but here is what we have in mind.
The spirit of this policy is that we do not want someone posting their solution attempt (possibly with bugs) and then asking their peers, “Hey can someone help me figure out why this doesn’t work?” That’s essentially asking others to debug your work for you. That’s a no-no.
Okay, but what can I do instead? In such situations, try to reduce the problem to the simplest possible example that also fails. Posting code, in that case, would be OK. (And the process of distilling an example often reveals the bug!)
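As a purely hypothetical illustration (it is not taken from any assignment), a reduced example might look like the Python snippet below. Suppose a long solution misbehaves because a list keeps growing between function calls; everything unrelated has been stripped away, leaving a few lines that still show the surprise. The name `add_tag` is made up for this sketch.

```python
# Minimal example: the default list is created once and shared by every call.
def add_tag(tag, tags=[]):
    tags.append(tag)
    return tags


first = add_tag("a")
second = add_tag("b")

# One might expect ['b'] here, but the earlier "a" is still in the list.
print(second)
```

A post like this asks about one specific behavior and gives readers something they can run, without exposing an assignment solution.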
In other words, it’s fine and encouraged to post and discuss code examples as a way of learning. But you want to avoid doing so in a way that might reveal the solution to an assignment that you are being asked to produce.
You must do all exams completely on your own, without any assistance from others.
Honor code. All course participants—you and we—are expected and required to abide by the letter and the spirit of the Georgia Tech Academic Honor Code. In particular, always keep the following in mind:
- Ethical behavior is extremely important in all facets of life. Honest and ethical behavior is expected at all times.
- You are responsible for completing your own work.
- Any learner found in violation of the Honor Code will be subject to any or all of the actions listed therein.
Will I need school supplies?
(Books, materials, equipment.)
The main pieces of equipment you will need are a pen or pencil, paper, an internet-enabled device, and your brain!
We highly recommend the following textbook for this course.
- William McKinney. Python for Data Analysis: Data wrangling with Pandas, NumPy, and IPython, 2nd edition. O’Reilly Media, September 2017. ISBN-13: 978-1449319793.
I have more questions. Where do I go for help?
(Course discussion forum and office hours.)
The main way for us to communicate is the online discussion forum, hosted on Piazza. (The professor’s email situation is bad, so do not expect timely responses to email queries.) We will make all course announcements and host all course discussion there. Therefore, it is imperative that you access and refer to this forum when you have questions or issues, or want to know what is going on as we progress. You can post your questions or issues anonymously, if you wish. You can also opt in to receive email notifications on new posts or follow-up discussions to your posts.
Here are some tips to improve the response time for your questions. First, make your post public (rather than private to the instructors), so that anyone in the class can see and respond to your post. Secondly, adhere to the “Collaboration Policy,” above. If you create a post that violates this policy, the instructors may ignore your post or even delete it. Thirdly, post during the week rather than the weekend; the instructors are also trying to maintain some semblance of work-life balance, so you can expect slower responses over the weekend. Lastly, be sure to tag your post with the relevant notebook assignment so we can better triage issues. (In Piazza, a “tag” is also called a “folder,” though unlike desktop folders, you can place a post in more than one folder.)
What if my question is private in nature? In that case, you can make your post private to the instructors. (After pressing “new post” to create the post, look for the “Post to” field and select “Individual student(s)/instructor(s)” and then type “Instructors” to make the post visible only to all instructors. It’s important to include all instructors so that all of them will see and have a chance to address your post, which will be faster than addressing only one person.)
Office hours (GT students only). Watch Piazza for an announcement and logistical details.
Accommodations for individuals with disabilities (GT students only). If you have learning needs that require special accommodation, please contact the Office of Disability Services at (404) 894-2563 or http://disabilityservices.gatech.edu/, as soon as possible, to make an appointment to discuss your special needs and to obtain an accommodations letter.