Working with HDFS
Introduction
Hadoop comes with a distributed filesystem, the Hadoop Distributed File System (HDFS). You will learn basic HDFS commands and how to move data between your local filesystem and HDFS.
We will assume you have installed the HDFS docker container as in the previous step. You will learn to load a local file into HDFS and read it back.
Vagrant up
cd msbx5420vagrant
vagrant halt
vagrant up
Stop and remove all containers - you will see errors if no containers are running
docker kill $(docker ps -q)
docker rm $(docker ps -a -q)
Start the HDFS container
docker run -it --name hdfs sequenceiq/hadoop-docker:2.7.1 /etc/bootstrap.sh -bash
You should see the bash prompt
bash-4.1#
Run commands
Update the environment variable
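A command along these lines should work; in the sequenceiq image Hadoop typically lives under /usr/local/hadoop, so adjust the path if your installation differs.
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin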
Create a data directory
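For example (the directory name here is only an illustration; use whatever location your course instructions call for):
mkdir data
cd data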
Create a CSV file - enter these commands one at a time!
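Something like the following builds a small people.csv; the column layout and values are made up for illustration.
echo "name,age,city" > people.csv
echo "Alice,30,Denver" >> people.csv
echo "Bob,25,Boulder" >> people.csv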
HDFS Operations
Create an HDFS directory
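A minimal example, assuming you are running as root and /user/root/test is an acceptable target path:
hdfs dfs -mkdir -p /user/root/test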
Write a file to HDFS
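For instance, you can put a small local file into the directory created above (hello.txt is a hypothetical file name used here for illustration):
echo "hello hdfs" > hello.txt
hdfs dfs -put hello.txt /user/root/test/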
Cat the HDFS file
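Assuming the example paths used above:
hdfs dfs -cat /user/root/test/hello.txt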
Copy the file from HDFS to your local file system
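Again assuming the example paths above; the local destination name is arbitrary:
hdfs dfs -get /user/root/test/hello.txt hello_from_hdfs.txt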
Make an HDFS directory
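For example, a separate directory for the CSV (the path is illustrative):
hdfs dfs -mkdir -p /user/root/data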
Copy the people.csv file to HDFS
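Assuming people.csv is in your current directory and you created the directory above:
hdfs dfs -put people.csv /user/root/data/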
Examine people.csv on HDFS
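Assuming the same example path:
hdfs dfs -cat /user/root/data/people.csv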
Copy the HDFS file to your local file system
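The local destination name here is arbitrary:
hdfs dfs -get /user/root/data/people.csv people_from_hdfs.csv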
Read the local file
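Using the example name above:
cat people_from_hdfs.csv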
List HDFS files
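To list everything under your HDFS home directory recursively (assuming the root user):
hdfs dfs -ls -R /user/root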
Exit the docker container (bash-4.1#)
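Typing exit at the container prompt returns you to the VM shell:
exit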
Stop all docker containers on the VM
Shut down the VM - from the host machine