Working with HDFS
Introduction
Hadoop comes with a distributed filesystem, the Hadoop Distributed File System (HDFS). In this section you will learn basic HDFS commands and how to move data between your local filesystem and HDFS.
We assume you have set up the HDFS Docker container as described in the previous step. You will load a local file into HDFS and read it back.
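Most HDFS file operations mirror familiar Unix commands and are run through hdfs dfs. As a preview of the pattern used throughout this section (the file and directory names here are only placeholders):
hdfs dfs -ls
hdfs dfs -put somelocalfile.csv some_hdfs_dir
hdfs dfs -get some_hdfs_dir/somelocalfile.csv copy.csv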
Vagrant up
cd msbx5420vagrant
vagrant halt
vagrant up
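The Docker commands below run inside the Vagrant VM, not on your native machine. If your shell is not already connected to the VM, connect first (this assumes the standard Vagrant workflow from the earlier setup step):
vagrant ssh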
Stop and remove all containers - you will see errors if no containers are running
docker kill $(docker ps -q)
docker rm $(docker ps -a -q)
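If you prefer the cleanup to succeed silently even when no containers exist, one alternative on a GNU/Linux VM is to pipe the container IDs through xargs, which does nothing on empty input (optional, not required):
docker ps -q | xargs -r docker kill
docker ps -a -q | xargs -r docker rm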
Start the HDFS container
docker run -it --name hdfs sequenceiq/hadoop-docker:2.7.1 /etc/bootstrap.sh -bash
You should see the container's bash prompt
bash-4.1#
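As an optional sanity check, you can open a second terminal on the VM and confirm the container is running:
docker ps --filter name=hdfs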
Run commands inside the container
Update the PATH environment variable
export PATH=$PATH:$HADOOP_PREFIX/bin
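To check that the Hadoop binaries are now on your PATH, you can ask for the version (for this image it should report Hadoop 2.7.1):
which hdfs
hdfs version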
Create a data directory
mkdir mydata
cd mydata
Create a CSV file - enter these commands one at a time!
echo "Mary,Smith" >> people.csv
echo "William,Jones" >> people.csv
cat people.csv
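Given the two echo commands above, the cat output should be:
Mary,Smith
William,Jones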
HDFS Operations
Create an HDFS directory
Write a file to HDFS
Cat the HDFS file
Copy the file from HDFS to your local file system
Make an HDFS directory
hdfs dfs -mkdir myinput
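Note that a relative HDFS path like myinput is created under your HDFS home directory, typically /user/<username>; in this container you are running as root, so it should be /user/root. You can verify with:
hdfs dfs -ls /user/root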
Copy the people.csv file to HDFS
hdfs dfs -put people.csv myinput
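An equivalent upload command is -copyFromLocal (shown only as an alternative; do not run it a second time, since people.csv now already exists in HDFS):
hdfs dfs -copyFromLocal people.csv myinput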
Examine people.csv on HDFS
hdfs dfs -cat myinput/people.csv
Copy the HDFS file to your local file system
hdfs dfs -get myinput/people.csv local_people.csv
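Similarly, -copyToLocal is equivalent to -get for downloads. The target name here is just an example, chosen so it does not overwrite the file you just fetched:
hdfs dfs -copyToLocal myinput/people.csv local_people2.csv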
Read the local file
cat local_people.csv
List HDFS files
hdfs dfs -ls
hdfs dfs -ls myinput
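The listing uses a Unix-style long format: permissions, replication factor, owner, group, size in bytes, modification time, and path. For people.csv the line will look roughly like this (your date, time, and owner may differ):
-rw-r--r--   1 root supergroup         25 2015-06-01 12:00 myinput/people.csv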
Exit the Docker container (the bash-4.1# prompt)
exit
Stop and remove the HDFS container on the VM
docker stop hdfs
docker rm hdfs
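Optionally, confirm that no containers remain:
docker ps -a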
Shut down the VM - run this from the host (native) machine
vagrant halt
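If you want to confirm the VM actually powered off, Vagrant can report its state:
vagrant status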