Unstructured Data
release 1.0
  • MSBX 5420 - Spring 2019
  • Course Information
    • Disability Services
    • Course Materials
    • Course Outline
    • Course Objectives
    • Grading
    • Untitled
    • Classroom Behavior
    • Academic Integrity
    • Discrimination and Harassment
    • Schedule
    • Policy Regarding Religious Observances
  • Homework
    • Week 3
  • Python
  • Learning Environment
    • Git
      • Install Git for MacOS
      • Install Git for Linux
      • Install Git for Windows
    • Docker
      • Install Docker for MacOS
      • Install Docker for Linux
      • Install Docker for Windows
    • Vagrant
      • Install Vagrant on Mac
    • VirtualBox
      • Install local VM
      • Install VirtualBox on Mac
    • Virtual Environment
    • Autograder
  • Scale, Scale, Scale ...
  • Hadoop Ecosystem
    • Operating Systems
    • HDFS
      • Install HDFS
      • Working with HDFS
    • MapReduce
    • Hive
    • Pig
  • Functional Programming
    • Lambda Expressions
    • Map Abstraction
    • Filter Abstraction
    • Reduce Abstraction
  • Apache Spark Ecosystem
    • Architecture
    • Installation and Getting Started
    • Starting Your Spark Notebook
    • Programming with RDDs
    • Spark SQL
    • Spark DataFrames
    • Spark Streaming
      • Spark Streaming with TCP
      • Spark Streaming with Windows
    • Resources
      • Mastering Apache Spark 2.3.2
  • Apache Kafka
    • APIs
    • Installation and Getting Started
    • Lab - Confluent on the VM
    • Resources
      • Oracle - Introduction to Stream Processing
      • IBM - An introduction to Kafka
      • Confluent - Making Sense of Stream Processing
      • Kafka: The Definitive Guide
      • Kafka, Samza, and the Unix philosophy of distributed data
  • ElasticSearch and Kibana
  • Reference
    • Reference Material
      • Pandas Cheat Sheet
      • Pandas
      • NumPy
      • Matplotlib
      • Scikit-Learn
      • Docker
      • Jupyter Notebooks
      • Apache Spark
        • Apache Spark
        • Apache Spark RDD Guide
        • Apache Spark SQL and DataFrames
        • Apache Spark Streaming
      • Apache Kafka
        • Apache Kafka
        • Apache Kafka Streams
      • Streaming
        • Streaming 101: The world beyond batch
        • Streaming 102: The world beyond batch
        • Apache Structured Streaming Programming Guide
        • Streaming SQL for Apache Kafka
      • Apache ElasticSearch for Hadoop

Apache Spark

  • Apache Spark
  • Apache Spark RDD Guide
  • Apache Spark SQL and DataFrames
  • Apache Spark Streaming

Last updated 6 years ago