Data Analytics - DAT

DAT 500 Interactive Graphical Case Studies in Big Data 1 Credit

Students will be introduced to Data Analytics through a variety of case studies drawn from published research and successful commercial applications of analytic methods. Students will also learn to replicate the graphical presentations used in these studies and to develop alternative visual representations of the underlying data. The R statistical language will be used as students learn to produce publication-grade graphics that can be used in other courses and throughout their careers.

Offered: every summer.

DAT 511 Data Stewardship: Preparation, Exploration and Handling of Big Data 3 Credits

This course introduces students to foundational and practical skills in data stewardship, with an emphasis on reproducible research and programming in R. Students will explore the data analysis process from data acquisition and cleaning to transformation, visualization, and documentation. Topics include tidy data principles, exploratory data analysis, clustering techniques, and the ethical handling of data. Students will gain hands-on experience using R and RStudio to manage large and complex datasets, utilize packages like dplyr and data.table, and produce publication-ready reports with R Markdown. The course also incorporates version control through Git and GitHub to promote collaborative and transparent workflows.

Prerequisite: CSC 511 or CSC 111.

Offered: every fall & spring.

DAT 512 Statistical Approaches to Big Data 3 Credits

This is a core course in the Data Analytics program. It begins with a brief review of univariate statistics and then covers selected topics usually taught in courses on multivariate statistical analysis and regression analysis. Students are assumed to have completed at least one college-level statistics course. The theoretical knowledge and analytical skills gained in this course are an essential component of the Data Analytics program.

Prerequisites: BAN 609 or equivalent, CSC 512 or equivalent, & MAT 500 or equivalent.

Offered: every spring.

DAT 514 Data Mining and Machine Learning 3 Credits

This core course in the Data Analytics program provides a comprehensive introduction to data mining and machine learning techniques, blending theoretical foundations with practical applications in Python. Students will explore a wide range of tools and methods essential to modern data analysis, including supervised and unsupervised learning, neural networks, logistic regression, clustering algorithms, anomaly detection, ensemble methods (random forests, XGBoost), recommender systems, and reinforcement learning. The course emphasizes key concepts such as feature engineering, model evaluation, overfitting, regularization, and iterative model refinement. Through interactive labs, assignments, and quizzes, students will build the skills necessary to develop, implement, and evaluate machine learning solutions for real-world data problems.

Prerequisites: MAT 500, CSC 511, and CSC 512 or equivalents.

Offered: every spring.

DAT 515 Visualization and Presentation of Advanced Analytics 3 Credits

Students will develop the ability to present complex results from Data Analytics to a range of audiences. The course will cover real-time interactive displays and tools, such as graphical user interface and dashboard design, as well as written, oral, and graphical communication of analytic results. Students will complete a range of projects in each of these areas.

Prerequisites: DAT 511 & DAT 521 (courses may be taken concurrently) and the ability to program in Python.

Offered: every spring.

DAT 517 Machine Learning for Natural Language Processing 3 Credits

This course covers constructing, training, and using machine learning tools (neural networks) for Natural Language Processing (NLP), including the fundamentals of how ChatGPT and other tools operate for generative language applications, translation, theme detection, text summarization, question answering, and a range of other applications. This is a programming-driven course in which students will construct and evaluate a number of machine learning applications. Students will build NLP models (neural networks) using the PyTorch and/or TensorFlow frameworks in the Python programming language within Jupyter notebooks. The course will also cover text encoding, tokenization, embeddings and other reduced-space representations, string and sentence transformations, and related topics. Basic predictive models will be covered in the introduction to PyTorch and TensorFlow. Data storage in the Apache Arrow and Hugging Face datasets systems will also be discussed. Students who do not have regular access to a computer with an NVIDIA GPU may need to subscribe to the Google Colab Pro platform at a modest cost, comparable to that of a typical electronic textbook.

Prerequisites: CSC 512 and CSC 512L.

Offered: once a year.

DAT 521 Applied Integrative Projects in Data Analytics I 3 Credits

In this course, students learn SAS. Because the course emphasizes hands-on work, all lectures are conducted in a computer lab. Students learn how to import various types of data into SAS, including text, CSV, binary, and sas7bdat files. Data cleaning is an important skill that students are expected to master. Students learn how to handle missing values and compute basic sample statistics such as the mean, standard deviation, minimum, and maximum. A range of visualization techniques is taught. In addition, students learn how to run basic statistical procedures, such as linear regression. Because this course prepares students for the next course, DAT 522 "Applied Integrative Projects in Data Analytics II", students can begin planning their projects for that course.

Offered: every fall.

DAT 522 Applied Integrative Projects in Data Analytics II 3 Credits

This course is a supervised internship or project course. Students may choose to apply for a competitive internship position in Data Analytics with a local corporation, government agency, or not-for-profit organization, or may apply to carry out a data analytics project with an employer or on-campus research sponsor.

Prerequisites: DAT 500, DAT 514, DAT 521.

Offered: every fall, spring, & summer.

DAT 555 Seminar on Deep Learning 1 Credit

Deep Learning is a computational and mathematical approach to building "deep", or many-layer, neural network architectures for solving complex machine learning tasks, such as image processing, audio processing, complex time series analysis, natural language processing, and other big data problems. This course teaches students to build and train deep learning models using current state-of-the-art tools.

Prerequisites: CSC 112 or CSC 512, and MAT 500, MAT 211, or MAT 219.

Offered: occasionally.