Big Data Infrastructure Specialist Certificate

1. Learning Methodology
- Instructor-Led Classroom Training (ILT).
2. Prerequisites:
- Data Science Specialist Certificate
3. Training Program Description:
- The capability of collecting and storing huge amounts of versatile data necessitates the development and use of new techniques and methodologies for processing and analyzing big data. This course provides comprehensive coverage of a number of technologies at the foundation of the big data movement, including the Hadoop architecture and its ecosystem of tools.
- Provide an introduction to machine learning and statistical data analysis, starting from basic probability theory and statistics. Topics such as parameter estimation, hypothesis testing, and regression analysis will be covered. In addition, the course will focus on machine learning topics including Bayes classifiers, KNN, decision trees, SVM, K-means, principal component analysis, independent component analysis, and neural networks (see the sketch after this list).
- Develop trainees' combined knowledge of Big Data technologies (e.g., Hadoop, Spark) and Data Science (e.g., statistics, machine learning) and show how this combination is used to solve real-world problems. In addition to this main goal, the program aims to familiarize trainees with the latest technological and scientific trends in the field and with how Big Data and data science are used in modern business enterprises. Use cases of real problems such as network traffic, text analytics, and financial applications will be addressed in this course.
- Link machine learning theories and methods to practical, real-life use cases. Hands-on labs will reinforce the material, with a deeper focus on applying it to enable customers to realize business value.
- Duration of Program: 5 weeks
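As a taste of the machine learning topics listed above, here is a minimal sketch of one of them (a KNN classifier) using scikit-learn on a toy dataset. The library and dataset are illustrative assumptions, not the program's prescribed tooling.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Toy illustration of one listed topic (KNN); the Iris dataset stands in
# for whatever data the course actually uses.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```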
4. Projects
- This program comprises several career-oriented projects. Each project you build will be an opportunity to demonstrate what you’ve learned in the lessons. Your completed projects will become part of a career portfolio that will demonstrate to potential employers that you have skills in data analysis and feature engineering, machine learning algorithms, and training and evaluating models.
- One of our main goals at EAII is to help you create a job-ready portfolio of completed projects. Building a project is one of the best ways to test the skills you’ve acquired and to demonstrate your newfound abilities to future employers or colleagues. Throughout this program, you’ll have the opportunity to prove your skills by building the following projects.
5. Training Program Curriculum:
I. Big Data Infrastructure
- Introduction to Big Data
- Hadoop Overview and History
- Overview of the Hadoop Ecosystem
- Using Hadoop’s Core: HDFS and MapReduce
- MapReduce: What it is, and how it works
- How MapReduce distributes processing
- MapReduce example: Break down movie ratings by rating score (see the sketch after this section)
- Installing Python, MRJob, and nano
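The ratings-breakdown example above can be expressed as a small MRJob job. This is a minimal sketch that assumes the MovieLens u.data layout (tab-separated userID, movieID, rating, timestamp); the class and file names are illustrative.

```python
from mrjob.job import MRJob

class RatingsBreakdown(MRJob):
    # Mapper: emit (rating, 1) for each input line.
    # Assumes tab-separated fields: userID, movieID, rating, timestamp.
    def mapper(self, _, line):
        user_id, movie_id, rating, timestamp = line.split('\t')
        yield rating, 1

    # Reducer: sum the counts for each rating score.
    def reducer(self, rating, counts):
        yield rating, sum(counts)

if __name__ == '__main__':
    RatingsBreakdown.run()
```

Run it locally with `python RatingsBreakdown.py u.data`, or against a cluster with mrjob's `-r hadoop` runner.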
- Programming Hadoop with Pig
- Apache Ambari
- Apache Spark
- Programming Hadoop with Spark (see the sketch after this section)
- Why Spark?
- The Resilient Distributed Dataset (RDD)
- Datasets and Spark 2.0
- SparkML
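The same ratings histogram can be built with Spark's RDD API. A minimal PySpark sketch follows; the HDFS path is a placeholder assumption.

```python
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("RatingsHistogram")
sc = SparkContext(conf=conf)

# Load the ratings file and pull out the third field (the rating score)
lines = sc.textFile("hdfs:///user/demo/ml-100k/u.data")
ratings = lines.map(lambda line: line.split()[2])

# countByValue() returns a dict of {rating: count} to the driver
result = ratings.countByValue()
for rating, count in sorted(result.items()):
    print(rating, count)
```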
- Using relational data stores with Hadoop
- What is Hive?
- How Hive works (see the sketch after this section).
- Integrating MySQL with Hadoop.
- Use Sqoop to import data from MySQL to HDFS/Hive.
- Use Sqoop to export data from Hadoop to MySQL.
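Once data is in Hive, it can be queried directly from Spark. A minimal sketch, assuming a hypothetical Hive table named `ratings`:

```python
from pyspark.sql import SparkSession

# enableHiveSupport() lets Spark read tables from the Hive metastore
spark = (SparkSession.builder
         .appName("HiveExample")
         .enableHiveSupport()
         .getOrCreate())

# 'ratings' is a hypothetical table, not one defined by the course materials
top_movies = spark.sql("""
    SELECT movie_id, COUNT(*) AS num_ratings
    FROM ratings
    GROUP BY movie_id
    ORDER BY num_ratings DESC
    LIMIT 10
""")
top_movies.show()
```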
- Using non-relational data stores with Hadoop
- Why NoSQL?
- What is HBase?
- Use HBase with Pig to import data at scale.
- Cassandra overview
- Write Spark output into Cassandra
- MongoDB Overview
- Install MongoDB, and integrate Spark with MongoDB
- Using the MongoDB shell (see the sketch after this section)
- Choosing a database technology
- Choose a database for a given problem
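The MongoDB shell operations above have direct Python equivalents via pymongo. A minimal sketch; the host, database, and collection names are assumptions.

```python
from pymongo import MongoClient

# Connect to a local MongoDB instance (default port 27017)
client = MongoClient("mongodb://localhost:27017/")
collection = client["movielens"]["users"]

# Insert one document and read it back
collection.insert_one({"user_id": 1, "age": 24, "occupation": "technician"})
print(collection.find_one({"user_id": 1}))
```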
- Querying your Data Interactively
- Overview of Drill
- Querying across multiple databases with Drill (see the sketch after this section)
- Overview of Phoenix
- Install Phoenix and query HBase with it
- Integrate Phoenix with Pig
- Overview of Presto
- Install Presto, and query Hive with it.
- Query both Cassandra and Hive using Presto
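Drill accepts SQL over a REST endpoint (port 8047 by default), so queries can be issued from any HTTP client. A rough sketch using Python's requests library; the table path `hive.ratings` is a hypothetical example.

```python
import requests

# POST a SQL query to Drill's REST API and print the returned rows
response = requests.post(
    "http://localhost:8047/query.json",
    json={"queryType": "SQL",
          "query": "SELECT COUNT(*) FROM hive.ratings"},
)
print(response.json()["rows"])
```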
- Managing your Cluster
- YARN explained
- Tez explained
- Use Hive on Tez and measure the performance benefit
- Mesos explained
- ZooKeeper explained
- Simulating a failing master with ZooKeeper (see the sketch after this section)
- Oozie explained
- Set up a simple Oozie workflow
- Zeppelin overview
- Hue overview
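The failing-master exercise hinges on ZooKeeper's ephemeral znodes: a znode created by a master vanishes when its session dies, so followers can detect the failure. A minimal sketch using kazoo, a common Python ZooKeeper client; the host and znode path are assumptions.

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

# An ephemeral znode disappears if this "master" process dies,
# which is how followers detect the failure and re-elect.
zk.create("/demo/master", b"master-host:9000", ephemeral=True, makepath=True)

data, stat = zk.get("/demo/master")
print(data.decode())
zk.stop()
```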
- Feeding Data to your Cluster
- Kafka explained
- Setting up Kafka and publishing some data (see the sketch after this section)
- Publishing weblogs with Kafka
- Flume explained
- Set up Flume and publish logs with it.
- Set up Flume to monitor a directory and store its data in HDFS
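Publishing data to Kafka from Python takes only a few lines with the kafka-python package. A minimal sketch; the broker address, topic name, and log file path are all assumptions.

```python
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")

# Publish each line of a web log to the 'weblogs' topic
with open("access_log") as log:
    for line in log:
        producer.send("weblogs", line.encode("utf-8"))

producer.flush()  # make sure buffered messages are actually sent
```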
- Analyzing Streams of Data
- Spark Streaming
- Analyze weblogs published with Flume using Spark Streaming
- Monitor Flume-published logs for errors in real-time
- Aggregating HTTP access codes with Spark Streaming (see the sketch after this section)
- Apache Storm
- Count words with Storm
- Apache Flink
- Counting words with Flink
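A sketch of the HTTP-access-code aggregation using Spark Streaming's classic DStream API. It assumes log lines arrive on a local socket in Apache common log format, where the status code is the second-to-last whitespace-separated field; in the labs the source would be Flume or Kafka instead.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="AccessCodeCounts")
ssc = StreamingContext(sc, 1)        # 1-second micro-batches
ssc.checkpoint("/tmp/checkpoint")    # required for windowed state

lines = ssc.socketTextStream("localhost", 9999)
codes = lines.map(lambda line: line.split()[-2])  # HTTP status code

# Keep a 5-minute sliding count of each status code
counts = codes.map(lambda c: (c, 1)).reduceByKeyAndWindow(
    lambda a, b: a + b,    # add counts entering the window
    lambda a, b: a - b,    # subtract counts leaving the window
    windowDuration=300, slideDuration=1)
counts.pprint()

ssc.start()
ssc.awaitTermination()
```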
- Designing Real-World Systems
- Sample application: consume web server logs and keep track of top sellers
- Sample application: serving movie recommendations to a website
- Design a system to report web sessions per day (see the sketch after this section)
- Exercise solution: Design a system to count daily sessions
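One simple way to approach the daily-sessions exercise with Spark SQL, assuming logs have already been parsed into a table with `user_ip` and `timestamp` columns (both the input path and the column names are hypothetical):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("DailySessions").getOrCreate()
logs = spark.read.parquet("hdfs:///data/weblogs.parquet")

# Count distinct visitors per calendar day as a rough proxy for sessions
daily = (logs
         .withColumn("day", F.to_date("timestamp"))
         .groupBy("day")
         .agg(F.countDistinct("user_ip").alias("sessions")))
daily.orderBy("day").show()
```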