The emergence of massive datasets is the primary driver of the field; such data arise from video surveillance, social media, speech, text, telecommunications, astronomy, and other sources. This course emphasizes practical techniques for working with large-scale data.
The course covers the basic statistical principles of supervised machine learning, as well as common algorithmic paradigms such as deep neural networks and kernel methods.
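To give a flavor of the material, the following is a minimal sketch of supervised learning with a kernel method, written against scikit-learn; the library, the synthetic dataset, and all parameter choices are assumptions made for illustration rather than prescribed course tools.

    # Minimal supervised-learning sketch with a kernel method (RBF-kernel SVM).
    # scikit-learn and the synthetic data are illustrative assumptions.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Synthetic binary-classification data stands in for a real dataset.
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )

    # The RBF kernel implicitly maps inputs into a high-dimensional feature space.
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    clf.fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))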
This course is an introduction to computer architecture and distributed systems, with an emphasis on warehouse-scale computing systems. Topics include fundamental tradeoffs in computer systems and hardware and software techniques for exploiting instruction-level, data-level, and task-level parallelism.
This course covers the fundamentals of data visualization, including the layered grammar of graphics, perception of discrete and continuous variables, an introduction to Mondrian, mosaic plots, parallel coordinate plots, an introduction to GGobi, linked plots, brushing, dynamic graphics, model visualization, and clustering and classification.
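To illustrate the layered grammar of graphics mentioned above, the sketch below builds a plot as data plus aesthetic mappings plus a geometry layer, using plotnine, a Python implementation of ggplot2; the choice of library and dataset is an illustrative assumption, not part of the course description.

    # Layered-grammar-of-graphics sketch using plotnine (illustrative assumption).
    from plotnine import ggplot, aes, geom_point, labs
    from plotnine.data import mpg  # example dataset bundled with plotnine

    plot = (
        ggplot(mpg, aes(x="displ", y="hwy", color="class"))  # data + aesthetic mappings
        + geom_point()                                        # geometry layer: points
        + labs(x="engine displacement (L)", y="highway miles per gallon")
    )
    plot.save("mpg_scatter.png")  # writes the rendered figure to disk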
This lab covers the main programming paradigms and platforms for machine learning, including programming in Python and R and using tools such as Anaconda.
This lab covers more advanced projects in data engineering, as well as data management and cloud engineering.
In addition to the core courses, participants can choose 2 elective courses (3 credit hours each) from the list below: