Course Objectives
By the end of this course students should be able to:
- Use standard software development tools such as the Linux command line (bash), git, and docker.
- Store and manipulate files in HDFS.
- Write pyspark scripts from within a Python notebook (jupyter), and perform analysis to extract insights.
- Create both "external" and "internal" hive tables, and understand the difference. Use Hive and/or Presto to extract insights.
- Consume streaming messages from Kafka, and join/enrich streaming data using ksql.
- Stream data into NoSQL datastores such as Elasticsearch or Cassandra, and visualize using Kibana.