JNTUA College of Engineering(Autonomous),Ananthapuramu
Department of Computer Science & Engineering
MINOR DEGREE (R20)
Introduction to Data Science
Course Code L T P C : 3 1 0 4
Course Objectives:
● The objective of the data scientist is to explore, sort, and analyze large volumes of data from
various sources in order to draw conclusions that optimize business processes or support
decision-making.

Course Outcomes:
After completion of the course, students will be able to:
● Develop relevant programming abilities.
● Demonstrate proficiency with statistical analysis of data.
● Build and assess data-based models.
● Execute statistical analyses with professional statistical software.
● Demonstrate skill in data management.
● Apply data science concepts and methods to solve problems in real-world contexts and
communicate these solutions effectively.
UNIT – I
High-dimensional space: introduction, the law of large numbers, the geometry of high dimensions,
properties of the unit ball, generating points uniformly at random from a ball, Gaussians in high
dimensions, random projection and the Johnson-Lindenstrauss lemma, separating Gaussians, fitting a
spherical Gaussian to data
Best-fit subspaces and singular value decomposition: introduction, preliminaries, singular vectors,
SVD, best rank-k approximations, left singular vectors, power method for singular value
decomposition, singular vectors and eigenvectors, applications of SVD
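As an illustration of the power method for singular value decomposition listed above, the following is a minimal sketch in plain Python, not part of the official syllabus. The function name `top_singular_value`, the iteration count, and the random seed are assumptions made for the example. It repeatedly applies A^T A to a random unit vector, which converges to the top right singular vector when the largest singular value is strictly larger than the next one.

```python
import math
import random


def top_singular_value(a, iterations=200, seed=0):
    """Estimate the largest singular value of `a` (a list of rows) and the
    corresponding right singular vector by power iteration on A^T A."""
    rng = random.Random(seed)
    n_rows, n_cols = len(a), len(a[0])

    # Start from a random unit vector.
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    for _ in range(iterations):
        # u = A v
        u = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        # w = A^T u = (A^T A) v
        w = [sum(a[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            # A is the zero matrix (or v lies in its null space).
            return 0.0, v
        v = [x / norm for x in w]

    # The singular value is the length of A v for the converged unit vector v.
    av = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in av))
    return sigma, v


if __name__ == "__main__":
    sigma, v = top_singular_value([[3.0, 0.0], [0.0, 1.0]])
    print(sigma, v)
```

For the diagonal matrix in the usage example, the estimate should approach the larger diagonal entry, and the returned vector should align with the first coordinate axis up to sign.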
UNIT – II
Random walks and Markov chains: stationary distribution, Markov chain Monte Carlo, areas and
volumes, convergence of random walks on undirected graphs, electrical networks and random walks,
random walks on undirected graphs with unit edge weights, random walks in Euclidean space,
the web as a Markov chain.
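As an illustration of the stationary distribution topic above, here is a minimal sketch in plain Python, not part of the official syllabus. The function name `stationary_distribution` and the example transition matrix are assumptions made for the example. It assumes a finite chain that is irreducible and aperiodic, so that repeatedly multiplying a starting distribution by the transition matrix converges to the unique stationary distribution.

```python
def stationary_distribution(p, iterations=1000):
    """Approximate the stationary distribution of a row-stochastic
    transition matrix `p` (list of rows) by repeated multiplication.

    Assumes the chain is irreducible and aperiodic; otherwise the
    iteration may not converge to a unique limit.
    """
    n = len(p)
    # Start from the uniform distribution over states.
    pi = [1.0 / n] * n
    for _ in range(iterations):
        # One step of the chain: pi_next[j] = sum_i pi[i] * p[i][j]
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi


if __name__ == "__main__":
    # Hypothetical two-state chain used only for illustration.
    p = [[0.9, 0.1],
         [0.5, 0.5]]
    print(stationary_distribution(p))
```

For the two-state chain in the usage example, balancing the flow between the states (0.1 times the first probability equals 0.5 times the second) gives the distribution (5/6, 1/6), which the iteration should approach.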

UNIT – III
Machine learning: introduction, the perceptron algorithm, kernel functions and non-linearly separable
data, generalizing to new data, VC-dimension, VC-dimension and machine learning, other measures of
complexity, deep learning, gradient descent, online learning, boosting
Algorithms for massive data problems (sampling, streaming, and sketching): introduction, frequency
moments, matrix algorithms using sampling, sketches of documents.
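As an illustration of the perceptron algorithm listed above, here is a minimal sketch in plain Python, not part of the official syllabus. The function name `perceptron`, the `max_epochs` cap, and the sample points are assumptions made for the example. Labels are +1 or -1, a constant feature is appended to each point to act as a bias term, and the weight vector is updated only on misclassified points.

```python
def perceptron(points, labels, max_epochs=1000):
    """Train a linear separator with the perceptron update rule.

    `points` is a list of feature tuples and `labels` a list of +1/-1.
    Returns the weight vector; its last entry is the bias weight.
    Stops early once a full pass makes no mistakes, which is only
    guaranteed to happen when the data are linearly separable.
    """
    w = [0.0] * (len(points[0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            xb = list(x) + [1.0]  # append constant feature for the bias
            score = sum(wi * xi for wi, xi in zip(w, xb))
            if y * score <= 0:
                # Misclassified (or on the boundary): move w toward y * x.
                w = [wi + y * xi for wi, xi in zip(w, xb)]
                mistakes += 1
        if mistakes == 0:
            break
    return w


def predict(w, x):
    """Classify a single point with the learned weights."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score > 0 else -1


if __name__ == "__main__":
    # Hypothetical linearly separable data used only for illustration.
    points = [(2.0, 2.0), (1.0, 3.0), (-1.0, -2.0), (-2.0, -1.0)]
    labels = [1, 1, -1, -1]
    w = perceptron(points, labels)
    print(w, [predict(w, x) for x in points])
```

The `max_epochs` cap is a practical safeguard: on data that are not linearly separable the update loop would otherwise never terminate.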
UNIT – IV
Clustering: introduction, k-means clustering, k-center clustering, finding low-error clusterings, spectral
clustering, approximation stability, high-density clustering, kernel methods, recursive clustering based