Note: Not all courses are offered every semester, and new courses may be added at any time. Check the schedule of classes for the latest offerings.
The goal of this class is to give students an introduction to, and hands-on experience with, all phases of the data science process using real data and modern tools. Topics covered include data formats, loading, and cleaning; data storage in relational and non-relational stores; data governance; data analysis using supervised and unsupervised learning with R and similar tools, and sound evaluation methods; data visualization; and scaling up with cluster computing, MapReduce, Hadoop, and Spark. Prerequisite: Enrollment in the Data Science program. Other students may be admitted with instructor permission.
This course provides a broad introduction to the practical side of machine learning and data analysis. It examines the end-to-end processing pipeline, from extracting and identifying useful features that best represent data, through a few of the most important machine learning algorithms, to evaluating their performance in modeling data. Topics covered include decision trees, logistic regression, linear discriminant analysis, linear and non-linear regression, basis functions, support vector machines, neural networks, Bayesian networks, bias/variance theory, ensemble methods, clustering, evaluation methodologies, and experiment design. Prerequisite: Enrollment in the Data Science program. Other students may be admitted with instructor permission. Corequisite: DATA 601: Introduction to Data Science.
The goal of this course is to introduce methods, technologies, and computing platforms for performing data analysis at scale. Topics include the theory and techniques for data acquisition, cleansing, aggregation, management of large heterogeneous data collections, processing, and information and knowledge extraction. Students are introduced to MapReduce, streaming, and external memory algorithms and their implementations using Hadoop and its ecosystem (HBase, Hive, Pig, and Spark). Students will gain practical experience in analyzing large existing databases. Prerequisite: Enrollment in the Data Science program. Other students may be admitted with instructor permission. Corequisite: DATA 601: Introduction to Data Science.
This course introduces students to the data management, storage, and manipulation tools common in data science. Students will get an overview of relational database management systems and various NoSQL database technologies, and apply them to real scenarios. Topics include ER and relational data models, storage and concurrency preliminaries, relational databases and SQL queries, NoSQL databases, and data governance. Prerequisite: Enrollment in the Data Science program. Other students may be admitted with instructor permission. Corequisite: DATA 601: Introduction to Data Science.
This course provides a comprehensive overview of important legal and ethical issues pertaining to the full life cycle of data science. Students learn how to think through the ethics of making decisions and inferences based on data, and how important cases and laws have shaped the data science field. Students will use real and hypothetical case studies across various domains to explore these issues. Prerequisite: Enrollment in the Data Science program. Other students may be admitted with instructor permission. Corequisite: DATA 601: Introduction to Data Science.
This is a semi-independent course that gives advanced graduate students in the Data Science program the opportunity to apply the knowledge, skills, and tools they have learned to a real-world data science project. Students will work with a real data set and go through the entire process of completing the project. The project will be conducted with industry, government, and academic partners, who will be responsible for providing the data set, with guidance and feedback from the instructor. Prerequisite: Completion of the required courses.
Students learn effective management and communication skills through case-study analysis, reading, class discussion, and role-playing. The course covers topics such as effective listening, setting expectations, delegation, coaching, performance evaluations, conflict management, negotiation with senior management, and managing with integrity.
This course addresses the concepts, tools, and techniques of GIS modeling. It presents modeling concepts and theory and provides opportunities for hands-on model design, construction, and application. The focus is on model calibration and validation.
This course investigates statistical techniques for exploring and characterizing spatial phenomena. The course covers local and global cluster analysis, spatial autocorrelation, interpolation, and kriging, and gives students exposure to prominent GIS statistical packages. Emphasis is placed on exploratory spatial data analysis (ESDA) to develop spatial cognition and analytical skills, with practical applications to modeling spatial phenomena in computer environments.
Integrating systems and seamlessly exchanging the information stored in them addresses a very common problem: organizations merge and inherit information systems that are not compatible with each other. For the organization to succeed, its data systems and information should interoperate easily. This course investigates the various technologies in the field of information integration, with an emphasis on semantic interoperation of systems. Topics covered include modeling data semantics, semantic interoperability, metadata, semantic integration patterns, context-awareness, semantic networks, mediation and wrapper techniques, data warehouses, and integration servers. Students will keep abreast of the latest technologies and research on data semantics and information integration, and will gain practical experience integrating information from disparate and heterogeneous systems.
The purpose of this course is to provide a comprehensive discussion of using organizational databases to enable decision support through the warehousing and mining of data. This course will provide an in-depth understanding of the technical, business, and research issues in each of these two areas. Issues in data warehousing include designing a multidimensional data model; cleansing and loading data; determining refresh cycles and methods; administrative aspects of running a data warehouse, including efficient data retrieval using bitmap and join indexes; reporting; ad hoc querying; and multidimensional operations such as slicing, dicing, pivoting, drill-down, and roll-up. Topics in data mining include justifying the need for knowledge discovery in databases, and data mining methods such as clustering, classification, Bayesian networks, association rules, and visualization. New areas of research and development in data mining and warehousing will also be discussed.