Big Data Specialist Certificate

12 Mar
- Data Science School
- 7315 (Registered)
1. Learning Methodology:
- Instructor-Led Classroom Training (ILT).
2. Prerequisites:
- Data Science Specialist Certificate
3. Training Program Description:
- The capability to collect and store huge amounts of versatile data necessitates the development and use of new techniques and methodologies for processing and analyzing big data. This course provides comprehensive coverage of a number of technologies at the foundation of the big data movement, including the Hadoop architecture and its ecosystem of tools.
- The program provides an introduction to machine learning and statistical data analysis, starting with basic probability theory, statistics, and statistical data analysis. Topics such as parameter estimation, hypothesis testing, and regression analysis are covered. In addition, the course focuses on machine learning topics including Bayes classifiers, k-nearest neighbors (KNN), decision trees, support vector machines (SVM), k-means, principal component analysis, independent component analysis, and neural networks.
- Trainees build on their combined knowledge of Big Data technologies (e.g., Hadoop, Spark) and data science (e.g., statistics, machine learning) and learn how this combination is used to solve real-world problems. Beyond this main goal, the program aims to familiarize trainees with the latest technological and scientific trends in the field and with how Big Data and data science are used in modern business enterprises. Use cases drawn from real problems, such as network traffic, text analytics, and financial applications, are addressed in this course.
- The program links machine learning theories and methods to practical, real-life use cases. The hands-on labs reinforce these theories and methods, with a deeper focus on applying them to enable customers to realize business value.
- Duration of Program: 5 weeks
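The parameter estimation and regression analysis topics above can be illustrated with ordinary least squares for a single predictor. The sketch below is a minimal, stdlib-only Python illustration, not course material; the function name `fit_simple_regression` is hypothetical. The slope estimate is the sum of cross-deviations of x and y divided by the sum of squared x-deviations, and the intercept makes the fitted line pass through the point of means.

```python
def fit_simple_regression(xs, ys):
    """Least-squares estimates (slope, intercept) for y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # fitted line passes through (mean_x, mean_y)
    return slope, intercept


if __name__ == "__main__":
    slope, intercept = fit_simple_regression([1, 2, 3, 4], [3, 5, 7, 9])
    print(slope, intercept)  # 2.0 1.0
```

Points lying exactly on y = 1 + 2x recover a slope of 2 and an intercept of 1; with noisy data the same formulas give the least-squares fit.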
4. Projects:
- This program comprises many career-oriented projects. Each project you build is an opportunity to demonstrate what you’ve learned in the lessons. Your completed projects will become part of a career portfolio that demonstrates to potential employers that you have skills in data analysis and feature engineering, machine learning algorithms, and training and evaluating models.
- One of our main goals at EAII is to help you create a job-ready portfolio of completed projects. Building a project is one of the best ways to test the skills you’ve acquired and to demonstrate your newfound abilities to future employers or colleagues. Throughout this program, you’ll have the opportunity to prove your skills by building the following projects:
- Project 1: Simulating and Predicting Traffic
- Project 2: Crime Prediction
- Project 3: Fraud Detection
- Project 4: Explore Weather Trends
- Project 5: Big Data with Spark
- Capstone Project
5. Training Program Curriculum:
I- Introduction to Big Data, Developing with Spark and Hadoop
- Introduction to Hadoop and MapReduce
- SQL Joins
- Hadoop Ecosystems
- Hadoop Clusters
- MapReduce API Concepts
- Basic Writing and testing MapReduce programs
- Hadoop API
- ToolRunner Class
- Accessing HDFS Programmatically
- Using the Hadoop API's Library of Mappers, Reducers and Partitioners
- Managing Data Input and Output
- Common MapReduce Algorithms
- Sorting and Searching Large Data Sets
- Indexing Data
- Computing Term Frequency-Inverse Document Frequency (TF-IDF)
- Calculating Word Co-occurrence
- Joining Data Sets in MapReduce Jobs
- Hadoop Tools for Data Acquisition
- Practical Development Tips and Techniques
- Strategies for Debugging and Testing MapReduce Code
- Reusing Objects
- Creating Map-only MapReduce Jobs
- Pig
- Complex Data Analysis with Pig
- Multi Dataset Operations with Pig
- Extending Pig
- Pig Troubleshooting and Optimization
- Hive
- Relational Data Analysis with Hive
- Hive Data Management
- Text Processing with Hive
- Hive Optimization
- Extending Hive
- Analyzing Data with Impala
- Introduction to Spark
- Spark Basics
- Working with Resilient Distributed Datasets (RDDs)
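The MapReduce topics above, in particular computing TF-IDF with chained jobs, can be sketched without a cluster by simulating the map, shuffle and reduce phases in memory. This is a hedged Python illustration, not the Hadoop Java API: `run_job`, the mapper and reducer functions, and the choice of raw-count TF with log(N/df) IDF weighting are assumptions made for brevity. The first job counts each term per document; the second groups those counts by term so the reducer can derive the document frequency.

```python
import math
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, shuffle/sort by key, reduce."""
    groups = defaultdict(list)
    for record in records:                  # map phase
        for key, value in mapper(record):
            groups[key].append(value)       # shuffle: group values by key
    output = []
    for key in sorted(groups):              # keys are sorted before reducing
        output.extend(reducer(key, groups[key]))
    return output


# Job 1: term frequency per (term, document).
def tf_mapper(record):
    doc_id, text = record
    for word in text.lower().split():
        yield (word, doc_id), 1


def tf_reducer(key, counts):
    yield key, sum(counts)


# Job 2: regroup by term to get document frequency, then weight each count.
def df_mapper(record):
    (term, doc_id), tf = record
    yield term, (doc_id, tf)


def make_tfidf_reducer(num_docs):
    def tfidf_reducer(term, postings):
        df = len(postings)                  # one posting per document containing the term
        idf = math.log(num_docs / df)
        for doc_id, tf in postings:
            yield (term, doc_id), tf * idf
    return tfidf_reducer


def tf_idf(docs):
    """docs maps a document id to its text; returns {(term, doc_id): score}."""
    term_counts = run_job(docs.items(), tf_mapper, tf_reducer)
    return dict(run_job(term_counts, df_mapper, make_tfidf_reducer(len(docs))))


if __name__ == "__main__":
    scores = tf_idf({"d1": "big data big", "d2": "data science"})
    for (term, doc_id), score in sorted(scores.items()):
        print(term, doc_id, round(score, 3))
```

A term that appears in every document, such as `data` in the example, gets an IDF of log(1) = 0 and therefore a score of 0. On a real cluster, the framework performs the shuffle and sort between the map and reduce phases that `run_job` imitates here.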
II- Advanced Big Data Analytics Technologies and Applications
- Analyzing Data with Scala and Spark
- Predicting Forest Cover with Decision Trees
- Anomaly Detection in Network Traffic with K-means Clustering
- Understanding Wikipedia with Latent Semantic Analysis
- Analyzing Co-occurrence Networks with GraphX
- Geospatial and Temporal Data Analysis on Taxi Trip Data
- Estimating Financial Risk through Monte Carlo Simulation
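For the Monte Carlo financial-risk topic, the following is a minimal sketch of estimating one-period Value at Risk (VaR). It assumes, purely for illustration, a single portfolio whose returns are normally distributed with a known mean and standard deviation; the function name `monte_carlo_var` and its parameters are hypothetical. Realistic risk models usually simulate several correlated market factors, which this sketch omits.

```python
import random


def monte_carlo_var(portfolio_value, mu, sigma, confidence=0.95, trials=100_000, seed=42):
    """Estimate one-period VaR by simulating normally distributed returns (an assumption)."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    rng = random.Random(seed)  # seeded so runs are reproducible
    # A negative return is a loss, so loss = -value * return.
    losses = sorted(-portfolio_value * rng.gauss(mu, sigma) for _ in range(trials))
    # VaR is the loss level not exceeded with the given confidence.
    return losses[int(confidence * trials)]


if __name__ == "__main__":
    var_95 = monte_carlo_var(1_000_000, mu=0.0005, sigma=0.02)
    print(f"95% one-period VaR: {var_95:,.0f}")
```

Under the normal-returns assumption the closed-form 95% VaR for these inputs is roughly 1,000,000 × (1.645 × 0.02 − 0.0005), about 32,400, so the simulated value should land close to it. Simulation becomes worthwhile when the return model has no such closed form.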
III- Hands-on Group Project Based on Real-life Use Case