Data Science and Big Data (MCS-226) is a course in the Master of Computer Applications programme of IGNOU.
This course introduces students to the concepts of data science and big data, the architecture of big data systems, and R, a programming language that can be used to analyse big data.
1. Basics of Data Science
Introduction to Data Science: Probability: Conditional Probability and Bayes Theorem, Random Variables and Basic Distributions: Binomial Distribution, Probability Distribution of Continuous Random Variable & The Normal Distribution, Sampling Distribution and the Central Limit Theorem, Statistical Hypothesis Testing: Estimation of Parameters of the Population, Significance Testing of Statistical Hypothesis, Example Using Correlation and Regression & Types of Errors in Hypothesis Testing.
Data Preparation for Analysis: Need for Data Preparation, Data Preprocessing: Data Cleaning, Data Integration, Data Reduction & Data Transformation, Selection and Data Extraction, Data Curation: Steps of Data Curation, Importance of Data Curation, Data Integration: Data Integration Techniques & Data Integration Approaches, Knowledge Discovery.
Analysis of Simple Algorithm: Complexity analysis of Algorithms, Euclid Algorithm for GCD, Polynomial Evaluation Algorithm, Exponent Evaluation, Sorting Algorithm
Data Visualization and Interpretation: Different types of plots, Histograms, Box plots, Scatter plots, Heatmap, Bubble Chart, Bar Chart, Distribution plot, Pair plot, Line graph, Pie Chart, Doughnut Chart, Area Chart.
2. Big Data and its Management
Big Data Architecture: Big Data Characteristics and Big Data Applications, Structured vs Semi-structured and Unstructured Data, Big Data vs Data Warehouse, Distributed File System, HDFS and MapReduce, Apache Hadoop 1 and 2 (YARN).
Programming Using MapReduce: MapReduce Operations, Loading Data into HDFS: Installing Hadoop and Loading Data, Executing the MapReduce Phases: Execution of Map Phase, Shuffling and Sorting, Reduce Phase Execution & Node Failure and MapReduce, Algorithms Using MapReduce: Word Counting and Matrix-Vector Multiplication.
Other Big Data Architectures and Tools: Apache Spark Framework, Hive: Working of Hive Queries, Installation of Hive and Writing Queries in Hive, HBase: HBase Installation & Working with HBase, Other Tools.
3. Big Data Analysis
Mining Big Data: Finding Similar Items: Jaccard Similarity of Sets, Documents Similarity & Collaborative Filtering and Set Similarity, Finding Similar Documents: Shingles, Minhashing & Locality Sensitive Hashing, Distance Measures: Euclidean Distances, Jaccard Distance, Cosine Distance, Edit Distance & Hamming Distance, Introduction to Other Techniques.
Mining Data Streams: Data Streams: Model for Data Stream Processing, Data Stream Management: Queries of Data Stream, Example of Data Stream & Issues and Challenges of Data Stream, Data Sampling in Data Streams: The Representative Sample, Filtering of Data Streams: Bloom Filter, Algorithm to Count Different Elements in Stream.
Link Analysis: Link Analysis, Page Ranking, Different Mechanisms of Finding PageRank: Finding PageRank & Web Structure and Associated Issues, Use of PageRank in Search Engines: PageRank Computation Using MapReduce, Topic-Sensitive PageRank, Link Spam, Hubs and Authorities.
Web and Social Network Analysis: Introduction to Web Analytics, Advertising on the Web: Issues & Algorithm, Recommendation Systems: The Long Tail, The Model and Content-Based Recommendations, Mining Social Networks: Social Networks as Graphs, Varieties of Social Networks, Distance Measure of Social Network Graphs & Clustering of Social Network Graphs.
4. Programming for Data Analysis
Basics of R Programming: Environment of R, Data types, Variables, Operators, Factors, Decision Making, Loops, Functions, Data Structures in R: Strings and Vectors, Lists & Matrices, Arrays and Frames.
Data Interfacing and Visualisation in R: Reading Data from Files: CSV File, Excel File, Binary Files, XML Files, JSON Files, Interfacing with Databases & Web Data, Data Cleaning and Pre-processing, Visualisations in R: Bar Charts, Box Plots, Histograms, Line Graphs & Scatterplots.
Data Analysis and R: Chi-Square Test, Linear Regression, Multiple Regression, Logistic Regression, Time Series Analysis.
Advanced Analysis Using R: Decision Trees, Random Forest, Classification, Clustering, Association Rules.
Course Status: Ongoing
Course Type: Core
Language for course content: English
Duration: 12 weeks
Category:
Credit Points: 4
Level: Postgraduate
Start Date: 01 Jan 2025
End Date: 30 Apr 2025
Enrollment Ends: 28 Feb 2025
Exam Date: 25 May 2025 IST
NCrF Level: 7.0
Exam Shift: Shift-I
Note: The exam date is subject to change based on seat availability. You can check the final exam date on your hall ticket.
Dr. Sandeep Singh Rawat is a distinguished researcher, educator, and consultant specializing in Knowledge Extraction, Machine Learning, Data Mining, Big Data Analytics, and Group Decision Support Systems. Currently serving as Professor and Director of the School of Computer & Information Sciences at IGNOU, New Delhi, he has a rich teaching background at both graduate and undergraduate levels in Computer Engineering and IT. His research career spans India and the USA, including a tenure as Visiting Professor at Iowa State University (2016–2018).
Dr. Sandeep holds a B.E. from NIT Surat, an M.Tech. from IIT Roorkee, and a Ph.D. from Osmania University, Hyderabad. At IGNOU, he plays a pivotal role as the Swayam Prabha Channel Coordinator and contributes to key committees focused on online education, innovation, entrepreneurship, IT infrastructure, and cyber crisis management.
A Senior Member of IEEE and a Life Member of ISTE and CSI, he is also a co-editor for Springer’s AISC and LNNS series, a reviewer for prestigious journals, and a technical committee member for various conferences. A globally published researcher, Dr. Sandeep balances academic excellence with a passion for music, playing the harmonica, singing, and traveling.