Map outputs are temporary, intermediate data that serve no purpose for the user running the job. They exist only to feed the reduce phase: the framework shuffles and sorts them, and the reducers combine them to produce the final output, after which they can be discarded. Storing them in HDFS is not recommended, because the data would be replicated across the cluster, the NameNode would have to update its metadata for every file, etc. … Continue reading Why Map outputs are stored in local FS and not in HDFS?
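To make the lifecycle concrete, here is a toy, single-process word-count simulation in Python. It is a sketch under stated assumptions and not Hadoop's actual implementation. Each hypothetical map task partitions and sorts its output and writes one file per reducer into a temporary directory, which stands in for the node's local disk. Each reducer then "shuffles" by reading its partition from every map task and merges the sorted runs. The whole directory is deleted when the job finishes. The function names and file layout are illustrative only.

```python
import heapq
import os
import pickle
import shutil
import tempfile
import zlib
from collections import defaultdict


def run_map_task(task_id, lines, num_reducers, local_dir):
    """Map phase: emit (word, 1), partition by key, sort each partition,
    and spill it to a plain local file (no replication, no NameNode)."""
    partitions = defaultdict(list)
    for line in lines:
        for word in line.split():
            # Stable hash partitioner (hypothetical stand-in for Hadoop's).
            r = zlib.crc32(word.encode()) % num_reducers
            partitions[r].append((word, 1))
    paths = {}
    for r, records in partitions.items():
        records.sort()  # each spill file is a sorted run
        path = os.path.join(local_dir, f"map{task_id}_part{r}.pkl")
        with open(path, "wb") as f:
            pickle.dump(records, f)
        paths[r] = path
    return paths


def run_reduce_task(r, map_outputs):
    """Reduce phase: fetch partition r from every map task (the 'shuffle'),
    merge the sorted runs, and sum the counts per word."""
    runs = []
    for paths in map_outputs:
        if r in paths:
            with open(paths[r], "rb") as f:
                runs.append(pickle.load(f))
    counts = {}
    for word, n in heapq.merge(*runs):
        counts[word] = counts.get(word, 0) + n
    return counts


def run_job(splits, num_reducers=2):
    # Temporary directory playing the role of the node's local disk.
    local_dir = tempfile.mkdtemp(prefix="map_output_")
    try:
        map_outputs = [
            run_map_task(i, split, num_reducers, local_dir)
            for i, split in enumerate(splits)
        ]
        result = {}
        for r in range(num_reducers):
            # Each key lands in exactly one partition, so updates never collide.
            result.update(run_reduce_task(r, map_outputs))
        return result
    finally:
        # Intermediate data is thrown away once the job is done.
        shutil.rmtree(local_dir)


if __name__ == "__main__":
    print(run_job([["the cat sat", "the dog"], ["a cat ran"]]))
```

Only the final result outlives the job; the spill files were never needed by anyone but the reducers, which is why paying for replicated, NameNode-tracked storage for them would be wasted effort.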