Course Objectives

By the end of this course, students should be able to:

  • Use standard software development tools such as the Linux command line (bash), Git, and Docker.

  • Store and manipulate files in HDFS.

  • Write PySpark scripts from within a Python notebook (Jupyter) and perform analysis to extract insights (see the PySpark sketch after this list).

  • Create both external and managed ("internal") Hive tables, and understand the difference between them. Use Hive and/or Presto to extract insights (see the Hive sketch below).

  • Consume streaming messages from Kafka, and join/enrich streaming data using KSQL (see the Kafka sketch below).

  • Stream data into NoSQL datastores such as Elasticsearch or Cassandra, and visualize the results with Kibana.
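
The following is a minimal PySpark sketch of the kind of notebook analysis the course aims at: reading a file from HDFS and aggregating it. The path `hdfs:///data/orders.csv` and its columns are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-summary").getOrCreate()

# Read a CSV stored on HDFS, inferring the schema from the header row.
orders = (spark.read
          .option("header", True)
          .option("inferSchema", True)
          .csv("hdfs:///data/orders.csv"))

# Aggregate total and average order amount per country.
summary = (orders.groupBy("country")
           .agg(F.sum("amount").alias("total_amount"),
                F.avg("amount").alias("avg_amount"))
           .orderBy(F.desc("total_amount")))

summary.show(10)
```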
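
A sketch of the external vs. managed ("internal") Hive table distinction, expressed as HiveQL run through a Hive-enabled Spark session. The table names, columns, and the HDFS directory are hypothetical. Dropping an external table removes only the metadata, while dropping a managed table also deletes the underlying data files.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-tables")
         .enableHiveSupport()
         .getOrCreate())

# External table: Hive records only the schema and the location;
# dropping it leaves the files on HDFS untouched.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS orders_ext (
        order_id INT,
        country  STRING,
        amount   DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 'hdfs:///data/orders_csv/'
""")

# Managed (internal) table: Hive owns the data in its warehouse directory;
# dropping the table deletes the data files as well.
spark.sql("""
    CREATE TABLE IF NOT EXISTS orders_managed
    STORED AS PARQUET
    AS SELECT * FROM orders_ext
""")
```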
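
A minimal consume-from-Kafka sketch. It uses Spark Structured Streaming (which fits the PySpark notebook environment) rather than KSQL, so it only illustrates the consume step; the broker address and topic name are assumptions, and the spark-sql-kafka package must be on the classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-consume").getOrCreate()

# Subscribe to a topic; each record arrives with binary key/value columns.
clicks = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "kafka:9092")  # assumed broker
          .option("subscribe", "clicks")                     # assumed topic
          .load()
          .selectExpr("CAST(value AS STRING) AS json"))

# Echo the raw messages to the console to verify the stream is flowing.
query = (clicks.writeStream
         .format("console")
         .option("truncate", False)
         .start())

query.awaitTermination()
```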
