Hadoop - ... or should we say "The Apache Hadoop software library"

copyFromLocal

Usage

copy one or more local files to the destination filesystem :
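
A minimal sketch of the call, wrapped in a shell function so the argument order is explicit. The function name `upload_to_hdfs` and the paths in the note below are made up for illustration, and `hdfs dfs` is assumed to be on the PATH (`hadoop fs` should behave the same way).

```shell
# Hypothetical helper: copy local files into an HDFS directory.
# Usage: upload_to_hdfs <hdfs_destination> <local_file>...
upload_to_hdfs() {
  dest="$1"
  shift
  # All local sources come first, the destination comes last.
  hdfs dfs -copyFromLocal "$@" "$dest"
}
```

For instance, `upload_to_hdfs /data/incoming report1.csv report2.csv` runs `hdfs dfs -copyFromLocal report1.csv report2.csv /data/incoming`.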

mkdir

Usage

create directories :

Flags

Flag  Usage
-p    create parent directories along the path, just like Bash's mkdir -p does
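
As a sketch, the `-p` form can be wrapped like this. `make_hdfs_dirs` is a hypothetical name, and the command is assumed to accept several paths at once, as the stock `mkdir` subcommand does.

```shell
# Hypothetical helper: create one or more HDFS directories,
# together with any missing parent directories.
make_hdfs_dirs() {
  hdfs dfs -mkdir -p "$@"
}
```

`make_hdfs_dirs /data/2024/01 /data/2024/02` would create both leaf directories and `/data/2024` if it is missing.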

distcp

Usage

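distcp (distributed copy) : copy large amounts of data within a cluster or between clusters, running the copy as a MapReduce job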
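
A minimal sketch of an inter-cluster copy. The helper name and the NameNode addresses in the note are placeholders, not values from a real setup.

```shell
# Hypothetical helper: copy a directory tree from one cluster to another.
# Usage: copy_between_clusters <source_uri> <destination_uri>
copy_between_clusters() {
  hadoop distcp "$1" "$2"
}
```

A call could look like `copy_between_clusters hdfs://nn1:8020/data/logs hdfs://nn2:8020/backup/logs`, where `nn1` and `nn2` stand for the two NameNodes.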

Hadoop glossary

trash
  • default location : /user/userName/.Trash/Current
  • deleted data
    • stays in the trash for the number of minutes set by fs.trash.interval (0 disables the trash)
    • can be restored simply by moving it out of the .Trash/ directory
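
Restoring a deleted path could be sketched as below. This assumes the trash keeps the original absolute path underneath `Current`, that the HDFS user name equals the local `$USER`, and that the parent directory of the original path still exists. `restore_from_trash` is a hypothetical helper name.

```shell
# Hypothetical helper: move a deleted path back out of the trash.
# Usage: restore_from_trash <original_absolute_path>
restore_from_trash() {
  original_path="$1"
  # Assumption: the HDFS user name is the same as the local $USER.
  trash_root="/user/${USER}/.Trash/Current"
  hdfs dfs -mv "${trash_root}${original_path}" "$original_path"
}
```

For a user named alice, `restore_from_trash /data/report.csv` would move `/user/alice/.Trash/Current/data/report.csv` back to `/data/report.csv`.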

count

Usage
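
The original usage line is missing here. As a sketch: without extra flags, `-count` is assumed to print one line per path with the columns DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME, so the number of files is the second field. `hdfs_file_count` is a hypothetical helper name.

```shell
# Hypothetical helper: print the number of files below an HDFS path.
# Assumes the default -count columns:
#   DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
hdfs_file_count() {
  hdfs dfs -count "$1" | awk '{ print $2 }'
}
```

`hdfs_file_count /data` would then print only the file count for `/data`.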

du

Usage

display the size of files and directories

Flags

Flag  Usage
-h    use human-readable units : K, M, G, ...
-s    sum the file sizes rather than displaying each file size individually

Example
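
The original example is missing. A sketch that combines both flags listed above follows; `hdfs_total_size` is a hypothetical helper name, and the exact output columns depend on the Hadoop version.

```shell
# Hypothetical helper: print one summarized, human-readable size per path.
hdfs_total_size() {
  hdfs dfs -du -s -h "$@"
}
```

`hdfs_total_size /user/alice /tmp` would print one summary line for each of the two directories.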

find

Usage
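
The original usage line is missing here. As a sketch, a search by file name using the `-name` and `-print` expressions of the `find` subcommand could look like this. `find_in_hdfs` is a hypothetical helper name.

```shell
# Hypothetical helper: list the paths below a root whose base name
# matches a glob pattern.
# Usage: find_in_hdfs <root_path> <name_pattern>
find_in_hdfs() {
  hdfs dfs -find "$1" -name "$2" -print
}
```

Quote the pattern so that the local shell does not expand it first, for instance `find_in_hdfs /data '*.csv'`.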

ls

Usage

list files. Default output format :

Flags

Flag  Usage
-C    display only the paths of files and directories, without the other fields (permissions, owner, group, dates, ...), like ls -1 does
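
Because `-C` prints bare paths, its output is easy to feed into other tools. The sketch below filters a listing with `grep`; `hdfs_paths_matching` is a hypothetical helper name.

```shell
# Hypothetical helper: list the paths in an HDFS directory that match
# a regular expression.
# Usage: hdfs_paths_matching <directory> <regex>
hdfs_paths_matching() {
  hdfs dfs -ls -C "$1" | grep -- "$2"
}
```

`hdfs_paths_matching /data '\.csv$'` would keep only the entries of `/data` whose path ends in `.csv`.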

rm

Usage

delete files

Flags

Flag        Usage
-r, -R      recursively delete a directory and its content
            The rmr command has been deprecated in favor of rm -r
-safely     before deleting a directory, prompt for confirmation if the total number of files > hadoop.shell.delete.limit.num.files (set in core-site.xml, default: 100)
              • it can be used with -skipTrash to prevent accidental deletion of large directories
              • expect a delay while the directory is walked recursively to count the files to be deleted before the confirmation prompt
-skipTrash  delete the specified files immediately instead of moving them to the trash
            This is a dangerous option whose use is discouraged.
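
A cautious recursive delete could be sketched as follows. `safe_rm_dir` is a hypothetical helper name, and `-safely` is assumed to be available in the installed Hadoop release.

```shell
# Hypothetical helper: recursively delete HDFS directories, asking for
# confirmation when they hold more files than
# hadoop.shell.delete.limit.num.files.
safe_rm_dir() {
  hdfs dfs -rm -r -safely "$@"
}
```

Since `-skipTrash` is left out, the deleted data goes to the trash and can still be restored until it expires.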