BIA660 Web Analytics



Course Catalog Description

Introduction

Prerequisites:

Students must have programming experience. It is also highly recommended that students have taken Multivariate Data Analytics (BIA 652), Data & Knowledge Management (MIS 630), and Knowledge Discovery in Databases (MIS 637).


Campus Fall Spring Summer
On Campus
Web Campus

Instructors

Professor: Theodoros Lappas
Email: Theodoros.Lappas@stevens.edu
Office: Babbio 639

More Information

Course Outcomes

In this course, students will learn through hands-on experience how to extract data from the web and how to analyze web-scale data using distributed computing. Students will learn a variety of analysis methods that are widely used by internet companies, from start-ups to online giants such as Amazon and Google. At the end of the course, students will apply these methods to answer a real scientific question.

Additional learning objectives include the development of:

  • Data Collection and Preprocessing Skills: students will learn how to identify and profile candidate sources of valuable data, and how to automatically collect and manage the information they need for their analytics tasks.
  • Diverse Analytic Skills: students will be exposed to a wide range of quantitative and qualitative analytics techniques with applications across multiple business domains.
  • Team Skills: students will be organized into teams and will collaborate on projects for the duration of the course. Each student will evaluate his or her teammates twice during the semester using a customized team survey tool. The tool provides a detailed analysis of each person's contributions to the different stages of the team's work and will be used to identify and address problems promptly.

Grading

Grading Policies

Grading Percentages:

Class work: 30%; Midterm Project: 30%; Final Project: 40%. The weighted total will be multiplied by a value between 0 and 1 to produce the final grade. This multiplier is based on a peer evaluation process conducted twice during the semester: once after the midterm project and once after the final project.

Midterm project:

Collect, clean and organize online data from two different websites of your choice. The deliverables are the two datasets, the collection and cleaning scripts, and a presentation given in class.

Final project:

Choose an important research question that arises from one of the two datasets collected for the midterm project. Develop, apply and document an analytics methodology to address it. This work will be presented in class.

Grading Scale:
Grade Score
A 93 - 100
A- 90 - 92
B+ 87 - 89
B 83 - 86
B- 80 - 82
C+ 77 - 79
C 73 - 76
C- 70 - 72
F < 70
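The grading rules above can be sketched in Python, the language used throughout the course. This is a minimal illustration, not an official grade calculator. The function and variable names are hypothetical, and each component score is assumed to be on a 0-100 scale. The letter-grade ranges are also assumed to work as lower-bound cutoffs, so a score that falls between two listed ranges, such as 92.5, maps to A-. As the syllabus states, the peer-evaluation multiplier is applied to the weighted total before the letter grade is assigned.

```python
# Hypothetical grade calculator; weights are the syllabus percentages
# (class work 30%, midterm project 30%, final project 40%).
WEIGHTS = {"class_work": 0.30, "midterm": 0.30, "final": 0.40}

# Grading-scale cutoffs, read as lower bounds (an assumption: scores between
# two listed ranges fall into the lower letter grade).
CUTOFFS = [
    (93, "A"),
    (90, "A-"),
    (87, "B+"),
    (83, "B"),
    (80, "B-"),
    (77, "C+"),
    (73, "C"),
    (70, "C-"),
]


def final_score(scores: dict, peer_multiplier: float) -> float:
    """Weighted total (0-100 scale) scaled by the peer-evaluation multiplier."""
    if not 0.0 <= peer_multiplier <= 1.0:
        raise ValueError("peer multiplier must be between 0 and 1")
    weighted = sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)
    return weighted * peer_multiplier


def letter_grade(score: float) -> str:
    """Return the first letter whose cutoff the score meets, else F."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    scores = {"class_work": 95, "midterm": 90, "final": 92}
    full = final_score(scores, peer_multiplier=1.0)
    reduced = final_score(scores, peer_multiplier=0.9)
    print(round(full, 2), letter_grade(full))        # 92.3 A-
    print(round(reduced, 2), letter_grade(reduced))  # 83.07 B
```

The usage example shows how much the peer evaluation can matter. The same component scores earn an A- with a multiplier of 1.0 but drop to a B with a multiplier of 0.9.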

Lecture Outline

Recommended Readings: Links to free material will be provided in class.

Week Topic
Week 1 Introduction to the course; Introduction to Python I (basic concepts)
Week 2 Introduction to Python II (parsing & using libraries)
Week 3 Using Python to scrape the web I (regex & other libraries)
Week 4 Using Python to scrape the web II (data cleaning)
Week 5 Text Mining with Python (nltk)
Week 6 Midterm Project Presentations
Week 7 Sentiment Analysis with Python
Week 8 Social Network & Graph Mining with Python (networkx)
Week 9 Machine Learning & Analytics with Python I (sklearn)
Week 10 Machine Learning & Analytics with Python II (sklearn)
Week 11 Machine Learning & Analytics with Python III (sklearn)
Week 12 Visualization (matplotlib & other tools)
Week 13 Work on final projects
Week 14 Final Project Presentations