Working with HDFS

Introduction

Hadoop ships with a distributed filesystem, the Hadoop Distributed File System (HDFS). In this section you will learn basic HDFS commands and how to move data between your local filesystem and HDFS.

We assume you have installed the HDFS Docker container as described in the previous step. You will load a local file into HDFS and read it back.

Vagrant up

cd msbx5420vagrant
vagrant halt
vagrant up

Stop and remove all containers - you will see errors if no containers are running

docker kill $(docker ps -q)
docker rm $(docker ps -a -q)

Start the HDFS container

docker run -it --name hdfs sequenceiq/hadoop-docker:2.7.1 /etc/bootstrap.sh -bash

You should see the bash prompt

bash-4.1#

Run commands

Update an Environment Variable
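A sketch of this step, assuming the sequenceiq image's install location of /usr/local/hadoop: adding Hadoop's bin directory to PATH lets you run hdfs without typing the full path each time.

```shell
# Add the Hadoop binaries to PATH (path assumes the sequenceiq image layout)
export PATH=$PATH:/usr/local/hadoop/bin
```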

Create a data directory
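For example (the /tmp/data path is illustrative; any writable directory works):

```shell
# Create a working directory for the sample data and move into it
mkdir -p /tmp/data
cd /tmp/data
```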

Create a CSV file - enter these commands one at a time!
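One way to build the people.csv file used below is with echo, one line at a time. The header and rows here are sample data, not the original file's contents.

```shell
# Build people.csv one line at a time; > creates the file, >> appends to it
echo "name,age" > people.csv
echo "Alice,34" >> people.csv
echo "Bob,29" >> people.csv
cat people.csv
```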

HDFS Operations

  1. Create an HDFS directory

  2. Write a file to HDFS

  3. Cat the HDFS file

  4. Copy the file from HDFS to your local file system

Make a HDFS directory
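A sketch of this step; the /user/root/data path is an assumption (the container runs as root, so /user/root is the conventional HDFS home directory).

```shell
# Create a directory in HDFS; -p creates parent directories as needed
hdfs dfs -mkdir -p /user/root/data
```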

Copy the people.csv file to HDFS
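With `-put`, a local file is copied into HDFS. The paths below assume the earlier steps (local file in /tmp/data, HDFS directory /user/root/data).

```shell
# Copy the local CSV into the HDFS directory created above
hdfs dfs -put /tmp/data/people.csv /user/root/data/people.csv
```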

Examine people.csv on HDFS
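`-cat` streams an HDFS file's contents to stdout, much like the local `cat` command (HDFS path assumed from the previous step):

```shell
# Print the HDFS copy of the file to the terminal
hdfs dfs -cat /user/root/data/people.csv
```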

Copy the HDFS file to your local file system
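`-get` is the reverse of `-put`: it copies a file out of HDFS onto the local filesystem. The destination filename here is illustrative, chosen so it does not overwrite the original local file.

```shell
# Copy the file back from HDFS to the local filesystem
hdfs dfs -get /user/root/data/people.csv /tmp/data/people_from_hdfs.csv
```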

Read the local file
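Reading the retrieved copy with the ordinary `cat` command verifies the round trip (filename assumed from the `-get` step above):

```shell
# Confirm the file that came back from HDFS matches what was written
cat /tmp/data/people_from_hdfs.csv
```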

List HDFS files
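`-ls` lists the contents of an HDFS directory; add `-R` to recurse into subdirectories (path assumed from the earlier steps):

```shell
# List the files now stored under the HDFS data directory
hdfs dfs -ls /user/root/data
```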

Exit the Docker container (bash-4.1#)
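Leaving the container's shell returns you to the VM prompt:

```shell
# End the interactive shell session inside the container
exit
```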

Stop all docker containers on the VM
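These are the same cleanup commands used at the start of this exercise:

```shell
# Kill all running containers, then remove all containers
docker kill $(docker ps -q)
docker rm $(docker ps -a -q)
```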

Shut down the VM - from the native machine
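From the host (native) machine, in the same directory used for `vagrant up`:

```shell
# Stop the Vagrant VM
cd msbx5420vagrant
vagrant halt
```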
