Working with HDFS
Introduction
Hadoop ships with a distributed filesystem, the Hadoop Distributed File System (HDFS). In this lab you will learn basic HDFS commands and how to move data between your local filesystem and HDFS.

We assume you have installed the HDFS docker container as in the previous step. You will load a local file into HDFS and read it back.
Vagrant up

From your native machine, restart the course VM:
cd msbx5420vagrant
vagrant halt
vagrant up
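The docker commands below run on the VM, not on your native machine. If your Vagrantfile matches the previous step, these standard Vagrant commands let you confirm the VM is up and open a shell on it:

vagrant status   # confirm the VM is running
vagrant ssh      # open a shell inside the VM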
Stop and remove all containers - you will see errors if no containers are running

docker kill $(docker ps -q)
docker rm $(docker ps -a -q)
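If you want to double-check that the cleanup worked, listing containers with the standard -a flag (which includes stopped ones) should come back empty:

docker ps -a   # should show no containers after the kill and rm above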
Start the HDFS container

docker run -it --name hdfs sequenceiq/hadoop-docker:2.7.1 /etc/bootstrap.sh -bash

You should see the bash prompt

bash-4.1#

Run commands
Update Environment Variable

export PATH=$PATH:$HADOOP_PREFIX/bin
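To confirm the PATH change took effect, these standard commands should now resolve the Hadoop tools (exact output may differ slightly in your container):

which hdfs       # should point inside $HADOOP_PREFIX/bin
hadoop version   # should report Hadoop 2.7.1 in this image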
Create a data directory

mkdir mydata
cd mydata

Create a csv file - enter these one at a time!
echo "Mary,Smith" >> people.csv
echo "William,Jones" >> people.csv
cat people.csv
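If the echo commands above ran as shown, cat should print the two rows:

Mary,Smith
William,Jones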
HDFS Operations

In this section you will:

Create an HDFS directory
Write a file to HDFS
Cat the HDFS file
Copy the file from HDFS to your local file system
Make an HDFS directory

hdfs dfs -mkdir myinput
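Two optional checks using standard HDFS flags: with no path, -ls lists your HDFS home directory (likely /user/root in this container), and -mkdir -p creates parent directories in one step. The nested path below is just an illustration:

hdfs dfs -ls                  # myinput should now appear in your HDFS home
hdfs dfs -mkdir -p demo/a/b   # -p creates parent directories as needed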
Copy the people.csv file to HDFS

hdfs dfs -put people.csv myinput
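Note that -put fails if the destination file already exists. The closely related -copyFromLocal command, which in Hadoop 2.7 documents a -f flag for overwriting, handles that case:

hdfs dfs -copyFromLocal -f people.csv myinput   # -f overwrites an existing copy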
Examine people.csv on HDFS

hdfs dfs -cat myinput/people.csv
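For a file this small -cat is fine; on large files you may prefer the standard -tail command, which prints only the last kilobyte:

hdfs dfs -tail myinput/people.csv   # last 1 KB of the file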
Copy the HDFS file to your local file system

hdfs dfs -get myinput/people.csv local_people.csv
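The standard -copyToLocal command does the same job as -get (the output filename below is just an illustration):

hdfs dfs -copyToLocal myinput/people.csv local_people2.csv   # equivalent to -get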
Read the local file

cat local_people.csv

List HDFS files
hdfs dfs -ls
hdfs dfs -ls myinput
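With no path, -ls shows your HDFS home directory (likely /user/root in this container). Absolute paths and the standard -R recursive flag also work:

hdfs dfs -ls /          # list the HDFS root
hdfs dfs -ls -R /user   # -R recurses into subdirectories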
Exit the docker machine (bash-4.1#)

exit

Stop all docker containers on the VM
docker stop hdfs
docker rm hdfs
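Note that docker rm deletes the container along with everything stored in its HDFS. If you would rather return to the same container later, skip the rm, stop the container, and use these standard Docker commands next time; the Hadoop daemons should come back up when the container restarts:

docker start hdfs           # restart the stopped container
docker exec -it hdfs bash   # open a new shell inside it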
Shut down the VM - from the native machine

vagrant halt