Sometimes I want to know how a file is stored in HDFS. The program below lists the blocks that make up a given file, along with the datanodes that hold each block.
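import java.io.*;
import java.util.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;

public class HDFSFileBlocks {
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: HDFSFileBlocks <path>");
            System.exit(1);
        }
        // Picks up the cluster settings from the Hadoop configuration files on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        try {
            Path file = new Path(args[0]);
            FileStatus fileStatus = fs.getFileStatus(file);
            // Ask for the locations of every block between byte 0 and the end of the file.
            BlockLocation[] blocks = fs.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
            for (BlockLocation block : blocks) {
                // toString() prints the block's offset, length and hosts, comma-separated.
                System.out.println(block.toString());
            }
        } finally {
            fs.close();
        }
    }
}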
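For example, on a cluster with two datanodes, a file of roughly 16MB stored in 1MB blocks produces the output below. Each line gives a block's starting offset in the file, its length in bytes, and the hosts that hold a replica of it.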
# hadoop-0.23.7/bin/hadoop HDFSFileBlocks i7.txt
0,1048576,expressdb1,expressdb2
1048576,1048576,expressdb1,expressdb2
2097152,1048576,expressdb1,expressdb2
3145728,1048576,expressdb1,expressdb2
4194304,1048576,expressdb1,expressdb2
5242880,1048576,expressdb1,expressdb2
6291456,1048576,expressdb2,expressdb1
7340032,1048576,expressdb2,expressdb1
8388608,1048576,expressdb1,expressdb2
9437184,1048576,expressdb1,expressdb2
10485760,1048576,expressdb2,expressdb1
11534336,1048576,expressdb2,expressdb1
12582912,1048576,expressdb1,expressdb2
13631488,1048576,expressdb1,expressdb2
14680064,1048576,expressdb1,expressdb2
15728640,960255,expressdb1,expressdb2
#
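The listing above is easy to check by hand for a small file, but for a larger one it can help to total it up. What follows is only a rough sketch, using plain Java with no Hadoop dependency: BlockSummary is a made-up helper, not part of Hadoop, and it assumes every block line has the form offset,length,host1,host2,... with plain decimal numbers, as in the output above. It counts the blocks, adds up their lengths, and counts how many block replicas each host holds.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Hadoop): summarizes the lines printed by HDFSFileBlocks.
class BlockSummary {
    final int blockCount;                       // number of block lines seen
    final long totalBytes;                      // sum of the block lengths
    final Map<String, Integer> replicasPerHost; // host name -> number of blocks it holds

    BlockSummary(int blockCount, long totalBytes, Map<String, Integer> replicasPerHost) {
        this.blockCount = blockCount;
        this.totalBytes = totalBytes;
        this.replicasPerHost = replicasPerHost;
    }

    // Assumes each block line looks like "offset,length,host1,host2,..." with
    // plain decimal numbers. Lines without a comma (shell prompts, blank lines)
    // are skipped.
    static BlockSummary parse(List<String> lines) {
        int blocks = 0;
        long bytes = 0;
        Map<String, Integer> perHost = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split(",");
            if (fields.length < 2) {
                continue;
            }
            blocks++;
            bytes += Long.parseLong(fields[1]);
            for (int i = 2; i < fields.length; i++) {
                perHost.merge(fields[i], 1, Integer::sum);
            }
        }
        return new BlockSummary(blocks, bytes, perHost);
    }

    // Reads the listing from standard input and prints the totals.
    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        BlockSummary s = parse(lines);
        System.out.println(s.blockCount + " blocks, " + s.totalBytes + " bytes");
        for (Map.Entry<String, Integer> e : s.replicasPerHost.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue() + " block replicas");
        }
    }
}
```

Fed the sixteen block lines of the listing above, this reports 16 blocks and 16688895 bytes, with 16 block replicas on each of expressdb1 and expressdb2. In other words, both datanodes hold a copy of every block.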