Hadoop FileSystem Commands
In this article we will learn the basic and most commonly used Hadoop File System commands. The File System (FS) shell commands interact with the Hadoop Distributed File System (HDFS). Most of the commands in the FS shell behave like their corresponding Unix commands.
Hadoop FS Commands
All FS shell commands take path URIs as arguments. For the HDFS scheme, a file or directory is specified like hdfs://NameNode/log_analysis/php_error_log, and for the local filesystem scheme like file:///root/logfiles
Let's walk through the FS commands along with examples:
help
Displays help for a specific FS command, or for all commands if none is specified.
root@EdgeNode:~# hadoop fs -help
root@EdgeNode:~# hadoop fs -help mkdir
usage
This is helpful to quickly check the available usage options and arguments for any command.
root@EdgeNode:~# hadoop fs -usage
root@EdgeNode:~# hadoop fs -usage mkdir
mkdir
Creates a directory at the specified location. The -p option creates parent directories as needed and keeps the command from failing if the directory already exists.
root@EdgeNode:~# hadoop fs -mkdir /myanalysis
root@EdgeNode:~# hadoop fs -mkdir -p /myanalysis/log/web/
root@EdgeNode:~# hadoop fs -mkdir -p /myanalysis/log/web/access /myanalysis/log/web/error
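Since FS shell commands mirror their Unix counterparts, the -p semantics can be sketched with the local mkdir (the /tmp paths below are illustrative, not part of the original example):

```shell
# Local Unix analog of hadoop fs -mkdir -p: parent directories are
# created as needed, and an existing directory is not an error.
mkdir -p /tmp/demo_myanalysis/log/web/access /tmp/demo_myanalysis/log/web/error
mkdir -p /tmp/demo_myanalysis/log/web/access   # re-running succeeds silently
ls /tmp/demo_myanalysis/log/web
```

Without -p, both the FS shell and Unix mkdir fail when a parent directory in the path is missing.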
touchz
Create a file of zero length at the specified path. An error is returned if the file exists with non-zero length.
root@EdgeNode:~# hadoop fs -touchz /myanalysis/log/web/access/access.log
put
Copies single or multiple files from the local file system to the destination file system. Copying fails if the file already exists, unless the -f flag is given, which overwrites the destination. It can also read input from stdin and write it to the destination file system.
root@EdgeNode:~# echo "Dummy,Access,Log,File,In,Local,Filesystem" > /root/dummy.csv
root@EdgeNode:~# hadoop fs -put -f /root/dummy.csv /myanalysis/log/web/access/access.log
copyFromLocal
Similar to the put command, except that the source is restricted to a local file reference. Similarly, the -f option will overwrite the destination if it already exists.
root@EdgeNode:~# echo "Dummy,Error,Log,File,In,Local,Filesystem" > /root/gummy.csv
root@EdgeNode:~# hadoop fs -copyFromLocal -f /root/gummy.csv /myanalysis/log/web/error/error.log
moveFromLocal
Similar to the put command, except that the local source is deleted after it is copied.
root@EdgeNode:~# hadoop fs -moveFromLocal /root/gummy.csv /myanalysis/log/web/error/error2.log
ls
Lists the contents that match the specified file pattern. If no path is specified, the contents of /user/<currentUser> are listed. Two additional options are often used:
-h formats file sizes in a human-readable fashion rather than as a raw number of bytes.
-R lists the contents of directories recursively.
root@EdgeNode:~# hadoop fs -ls
root@EdgeNode:~# hadoop fs -ls /
root@EdgeNode:~# hadoop fs -ls -R /myanalysis
root@EdgeNode:~# hadoop fs -ls -h -R /myanalysis/log/web/
find
Finds all files that match the specified expression and applies selected actions to them. If no <path> is specified, it defaults to the current working directory; if no expression is specified, it defaults to -print.
root@EdgeNode:~# hadoop fs -find / -name web -print
root@EdgeNode:~# hadoop fs -find / -name access.log -print
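The HDFS find command follows the familiar Unix find expression syntax, so its behavior can be sketched locally (the /tmp tree below is a made-up stand-in for the HDFS paths):

```shell
# Local Unix analog of hadoop fs -find / -name access.log -print:
# walk the tree and print every path whose name matches.
mkdir -p /tmp/demo_find/log/web/access
touch /tmp/demo_find/log/web/access/access.log
find /tmp/demo_find -name access.log -print
```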
cat
Copies source paths to stdout: it fetches all files that match the file pattern in the specified source and displays their content.
root@EdgeNode:~# hadoop fs -cat /myanalysis/log/web/access/access.log /myanalysis/log/web/error/error.log
tail
Displays the last kilobyte of the file to stdout. The -f option will output appended data as the file grows, just as in Unix.
root@EdgeNode:~# hadoop fs -tail /myanalysis/log/web/error/error.log
root@EdgeNode:~# hadoop fs -tail -f /myanalysis/log/web/access/access.log
chown
Changes the owner and group of the specified files. The -R option applies the change recursively. The user must be a super-user.
root@EdgeNode:~# hadoop fs -chown -R root:supergroup /myanalysis
chgrp
Change group association of files. The -R option will make the change recursively through the directory structure. The user must be the owner of files, or else a super-user.
root@EdgeNode:~# hadoop fs -chgrp -R supergroup /myanalysis
chmod
Change the permissions of files. With -R, make the change recursively through the directory structure. The user must be the owner of the file, or else a super-user.
root@EdgeNode:~# hadoop fs -chmod -R 1777 /myanalysis
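The mode 1777 in the example carries the same meaning as in Unix: the leading 1 sets the sticky bit, so in a world-writable directory only a file's owner (or the superuser) may delete or rename that file. A local illustration (assumes GNU stat, as found on typical Linux systems):

```shell
# Sticky-bit illustration on a local directory; /tmp/demo_shared is
# a hypothetical path, not part of the original example.
mkdir -p /tmp/demo_shared
chmod 1777 /tmp/demo_shared
stat -c '%a' /tmp/demo_shared    # prints 1777
```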
cp
Copies files that match the file pattern from source to destination. This command allows multiple sources as well, in which case the destination must be a directory. The -f option will overwrite the destination if it already exists.
root@EdgeNode:~# hadoop fs -cp /myanalysis/log/web/error/error.log /myanalysis/log/web/error/error3.log
mv
Moves files that match the file pattern from source to destination. This command allows multiple sources as well, in which case the destination must be a directory.
root@EdgeNode:~# hadoop fs -mv /myanalysis/log/web/error/error3.log /myanalysis/log/web/error/error.log.bkp
get
Copies files that match the file pattern from HDFS to the local file system. When copying multiple files, the destination must be a directory.
root@EdgeNode:~# hadoop fs -get /myanalysis/log/web/error/*.log /root
copyToLocal
Similar to the get command, except that the destination is restricted to a local file reference.
root@EdgeNode:~# hadoop fs -copyToLocal /myanalysis/log/web/access/*.log /root
getmerge
Takes a source directory and a destination file as input and concatenates files in source into the destination local file. Optionally -nl can be set to enable adding a newline character (LF) at the end of each file.
root@EdgeNode:~# hadoop fs -getmerge -nl /myanalysis/log/web/access/access.log /myanalysis/log/web/access/access.log /root/error.log
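What -getmerge -nl produces can be sketched locally with cat: the bytes of each source file in order, with a newline (LF) appended after each file. The /tmp files here are hypothetical:

```shell
# Local sketch of hadoop fs -getmerge -nl: concatenate source files
# into one destination file, appending a newline after each source.
printf 'first'  > /tmp/demo_a.log
printf 'second' > /tmp/demo_b.log
{ cat /tmp/demo_a.log; echo; cat /tmp/demo_b.log; echo; } > /tmp/demo_merged.log
cat /tmp/demo_merged.log
```

Without -nl, files whose contents lack a trailing newline would run together on one line in the merged output.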
df
Displays free space. The -h option will format file sizes in a human-readable fashion (e.g., 64.0m instead of 67108864).
root@EdgeNode:~# hadoop fs -df -h /
rm
Delete files specified as args. The -f option will not display a diagnostic message or modify the exit status to reflect an error if the file does not exist. The -R option deletes the directory and any content under it recursively. The -skipTrash option will bypass trash, if enabled, and delete the specified file(s) immediately.
root@EdgeNode:~# hadoop fs -rm /myanalysis/log/web/access/access.log
root@EdgeNode:~# hadoop fs -rm -skipTrash /myanalysis/log/web/error/*
rmdir
Deletes directories at the specified locations. Removes the directory entry specified by each directory argument, provided it is empty. The --ignore-fail-on-non-empty option keeps the command from failing when a directory still contains files or subdirectories. To delete a non-empty directory and its contents, use -rm -r instead.
root@EdgeNode:~# hadoop fs -rmdir /myanalysis/log/web/*
root@EdgeNode:~# hadoop fs -rm -r /myanalysis