Hadoop FileSystem Commands
In this article we will learn the basic and most commonly used Hadoop File System commands. The File System (FS) shell commands interact with the Hadoop Distributed File System (HDFS). Most of the commands in the FS shell behave like their corresponding Unix commands.
Hadoop FS Commands
All FS shell commands take path URIs as arguments. For the HDFS scheme, a file or directory is specified like hdfs://NameNode/log_analysis/php_error_log, and for the local filesystem scheme like file:///root/logfiles
Let's walk through the FS commands along with examples:
help
Displays help for a specific FS command, or for all commands if none is specified.
root@EdgeNode:~# hadoop fs -help
root@EdgeNode:~# hadoop fs -help mkdir
usage
This is helpful to quickly check the available usage options and arguments for any command.
root@EdgeNode:~# hadoop fs -usage
root@EdgeNode:~# hadoop fs -usage mkdir
mkdir
Creates a directory at the specified location. The -p option creates parent directories as needed and keeps the command from failing if the directory already exists.
root@EdgeNode:~# hadoop fs -mkdir /myanalysis
root@EdgeNode:~# hadoop fs -mkdir -p /myanalysis/log/web/
root@EdgeNode:~# hadoop fs -mkdir -p /myanalysis/log/web/access /myanalysis/log/web/error
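Since FS shell commands mirror their Unix counterparts, the -p semantics can be sketched with the local mkdir (the /tmp paths below are illustrative, not part of the original example):

```shell
# Local Unix analog of hadoop fs -mkdir -p: parent directories are
# created as needed, and an existing directory is not an error.
mkdir -p /tmp/demo_myanalysis/log/web/access /tmp/demo_myanalysis/log/web/error
mkdir -p /tmp/demo_myanalysis/log/web/access   # re-running succeeds silently
ls /tmp/demo_myanalysis/log/web
```

Without -p, both the FS shell and Unix mkdir fail when a parent directory in the path is missing.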
touchz
Create a file of zero length at the specified path. An error is returned if the file exists with non-zero length.
root@EdgeNode:~# hadoop fs -touchz /myanalysis/log/web/access/access.log
put
Copies single or multiple files from the local file system to the destination file system. Copying fails if the file already exists, unless the -f flag is given, which overwrites the destination. It can also read input from stdin and write it to the destination file system.
root@EdgeNode:~# echo "Dummy,Access,Log,File,In,Local,Filesystem" > /root/dummy.csv
root@EdgeNode:~# hadoop fs -put -f /root/dummy.csv /myanalysis/log/web/access/access.log
copyFromLocal
Similar to the put command, except that the source is restricted to a local file reference. Similarly, the -f option will overwrite the destination if it already exists.
root@EdgeNode:~# echo "Dummy,Error,Log,File,In,Local,Filesystem" > /root/gummy.csv
root@EdgeNode:~# hadoop fs -copyFromLocal -f /root/gummy.csv /myanalysis/log/web/error/error.log
moveFromLocal
Similar to the put command, except that the local source is deleted after it is copied.
root@EdgeNode:~# hadoop fs -moveFromLocal /root/gummy.csv /myanalysis/log/web/error/error2.log
ls
Lists the contents that match the specified file pattern. If no path is specified, the contents of /user/<currentUser> are listed. Two additional options are often used:
-h formats file sizes in a human-readable fashion rather than as a raw number of bytes.
-R lists the contents of directories recursively.
root@EdgeNode:~# hadoop fs -ls
root@EdgeNode:~# hadoop fs -ls /
root@EdgeNode:~# hadoop fs -ls -R /myanalysis
root@EdgeNode:~# hadoop fs -ls -h -R /myanalysis/log/web/
find
Finds all files that match the specified expression and applies selected actions to them. If no <path> is specified, it defaults to the current working directory; if no expression is specified, it defaults to -print.
root@EdgeNode:~# hadoop fs -find / -name web -print
root@EdgeNode:~# hadoop fs -find / -name access.log -print
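The HDFS find command follows the familiar Unix find expression syntax, so its behavior can be sketched locally (the /tmp tree below is a made-up stand-in for the HDFS paths):

```shell
# Local Unix analog of hadoop fs -find / -name access.log -print:
# walk the tree and print every path whose name matches.
mkdir -p /tmp/demo_find/log/web/access
touch /tmp/demo_find/log/web/access/access.log
find /tmp/demo_find -name access.log -print
```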
cat
Copies source paths to stdout: it fetches all files that match the file pattern in the specified source and displays their content.
root@EdgeNode:~# hadoop fs -cat /myanalysis/log/web/access/access.log /myanalysis/log/web/error/error.log
tail
Displays the last kilobyte of the file to stdout. The -f option will output appended data as the file grows, just as in Unix.
root@EdgeNode:~# hadoop fs -tail /myanalysis/log/web/error/error.log
root@EdgeNode:~# hadoop fs -tail -f /myanalysis/log/web/access/access.log
chown
Changes the owner and group of the specified files. The -R option applies the change recursively. The user must be a super-user.
root@EdgeNode:~# hadoop fs -chown -R root:supergroup /myanalysis
chgrp
Change group association of files. The -R option will make the change recursively through the directory structure. The user must be the owner of files, or else a super-user.
root@EdgeNode:~# hadoop fs -chgrp -R supergroup /myanalysis
chmod
Change the permissions of files. With -R, make the change recursively through the directory structure. The user must be the owner of the file, or else a super-user.
root@EdgeNode:~# hadoop fs -chmod -R 1777 /myanalysis
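The mode 1777 in the example carries the same meaning as in Unix: the leading 1 sets the sticky bit, so in a world-writable directory only a file's owner (or the superuser) may delete or rename that file. A local illustration (assumes GNU stat, as found on typical Linux systems):

```shell
# Sticky-bit illustration on a local directory; /tmp/demo_shared is
# a hypothetical path, not part of the original example.
mkdir -p /tmp/demo_shared
chmod 1777 /tmp/demo_shared
stat -c '%a' /tmp/demo_shared    # prints 1777
```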
cp
Copies files that match the file pattern from source to destination. This command allows multiple sources as well, in which case the destination must be a directory. The -f option will overwrite the destination if it already exists.
root@EdgeNode:~# hadoop fs -cp /myanalysis/log/web/error/error.log /myanalysis/log/web/error/error3.log
mv
Moves files that match the file pattern from source to destination. This command allows multiple sources as well, in which case the destination must be a directory.
root@EdgeNode:~# hadoop fs -mv /myanalysis/log/web/error/error3.log /myanalysis/log/web/error/error.log.bkp
get
Copies files that match the file pattern from HDFS to the local file system. When copying multiple files, the destination must be a directory.
root@EdgeNode:~# hadoop fs -get /myanalysis/log/web/error/*.log /root
copyToLocal
Similar to the get command, except that the destination is restricted to a local file reference.
root@EdgeNode:~# hadoop fs -copyToLocal /myanalysis/log/web/access/*.log /root
getmerge
Takes a source directory and a destination file as input and concatenates files in source into the destination local file. Optionally -nl can be set to enable adding a newline character (LF) at the end of each file.
root@EdgeNode:~# hadoop fs -getmerge -nl /myanalysis/log/web/access/access.log /myanalysis/log/web/access/access.log /root/error.log
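What -getmerge -nl produces can be sketched locally with cat: the bytes of each source file in order, with a newline (LF) appended after each file. The /tmp files here are hypothetical:

```shell
# Local sketch of hadoop fs -getmerge -nl: concatenate source files
# into one destination file, appending a newline after each source.
printf 'first'  > /tmp/demo_a.log
printf 'second' > /tmp/demo_b.log
{ cat /tmp/demo_a.log; echo; cat /tmp/demo_b.log; echo; } > /tmp/demo_merged.log
cat /tmp/demo_merged.log
```

Without -nl, files whose contents lack a trailing newline would run together on one line in the merged output.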
df
Displays free space. The -h option will format file sizes in a human-readable fashion (e.g., 64.0m instead of 67108864).
root@EdgeNode:~# hadoop fs -df -h /
rm
Delete files specified as args. The -f option will not display a diagnostic message or modify the exit status to reflect an error if the file does not exist. The -R option deletes the directory and any content under it recursively. The -skipTrash option will bypass trash, if enabled, and delete the specified file(s) immediately.
root@EdgeNode:~# hadoop fs -rm /myanalysis/log/web/access/access.log
root@EdgeNode:~# hadoop fs -rm -skipTrash /myanalysis/log/web/error/*
rmdir
Deletes directories at the specified locations. Removes the directory entry specified by each directory argument, provided it is empty. The --ignore-fail-on-non-empty option keeps the command from failing when a directory still contains files or subdirectories. To delete a non-empty directory and its contents, use -rm -r instead.
root@EdgeNode:~# hadoop fs -rmdir /myanalysis/log/web/*
root@EdgeNode:~# hadoop fs -rm -r /myanalysis