Course Objectives

By the end of this course, students should be able to:

  • Use standard software development tools such as the Linux command line (bash), Git, and Docker.

  • Store and manipulate files in HDFS.

  • Write PySpark scripts from within a Python notebook (Jupyter) and perform analysis to extract insights (see the PySpark sketch after this list).

  • Create both external and managed ("internal") Hive tables, and understand the difference between them. Use Hive and/or Presto to extract insights (see the Hive sketch below).

  • Consume streaming messages from Kafka, and join/enrich streaming data using KSQL (see the Kafka sketch below).

  • Stream data into NoSQL datastores such as Elasticsearch or Cassandra, and visualize the results with Kibana.
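
The following is a minimal PySpark sketch of the kind of notebook analysis the course aims at: reading a file from HDFS and aggregating it. The path `hdfs:///data/orders.csv` and its columns are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-summary").getOrCreate()

# Read a CSV stored on HDFS, inferring the schema from the header row.
orders = (spark.read
          .option("header", True)
          .option("inferSchema", True)
          .csv("hdfs:///data/orders.csv"))

# Aggregate total and average order amount per country.
summary = (orders.groupBy("country")
           .agg(F.sum("amount").alias("total_amount"),
                F.avg("amount").alias("avg_amount"))
           .orderBy(F.desc("total_amount")))

summary.show(10)
```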
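
A sketch of the external vs. managed ("internal") Hive table distinction, expressed as HiveQL run through a Hive-enabled Spark session. The table names, columns, and the HDFS directory are hypothetical. Dropping an external table removes only the metadata, while dropping a managed table also deletes the underlying data files.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-tables")
         .enableHiveSupport()
         .getOrCreate())

# External table: Hive records only the schema and the location;
# dropping it leaves the files on HDFS untouched.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS orders_ext (
        order_id INT,
        country  STRING,
        amount   DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 'hdfs:///data/orders_csv/'
""")

# Managed (internal) table: Hive owns the data in its warehouse directory;
# dropping the table deletes the data files as well.
spark.sql("""
    CREATE TABLE IF NOT EXISTS orders_managed
    STORED AS PARQUET
    AS SELECT * FROM orders_ext
""")
```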
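
A minimal consume-from-Kafka sketch. It uses Spark Structured Streaming (which fits the PySpark notebook environment) rather than KSQL, so it only illustrates the consume step; the broker address and topic name are assumptions, and the spark-sql-kafka package must be on the classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-consume").getOrCreate()

# Subscribe to a topic; each record arrives with binary key/value columns.
clicks = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "kafka:9092")  # assumed broker
          .option("subscribe", "clicks")                     # assumed topic
          .load()
          .selectExpr("CAST(value AS STRING) AS json"))

# Echo the raw messages to the console to verify the stream is flowing.
query = (clicks.writeStream
         .format("console")
         .option("truncate", False)
         .start())

query.awaitTermination()
```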
