HDFS – Where are my file blocks?

Sometimes I want to know how a file is physically stored in HDFS. The program below lists the blocks that make up a given file, along with the datanodes that hold each block.

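import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSFileBlocks {
  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("Usage: HDFSFileBlocks <path>");
      System.exit(1);
    }

    // Loads core-default.xml and core-site.xml from the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    try {
      Path file = new Path(args[0]);
      FileStatus fileStatus = fs.getFileStatus(file);
      // Ask for the locations of every block in the byte range [0, file length).
      BlockLocation[] blocks = fs.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
      for (BlockLocation block : blocks) {
        // Prints offset,length,host1,host2,...
        System.out.println(block);
      }
    } finally {
      fs.close();
    }
  }
}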

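For example, on a cluster with two datanodes, a 16MB file stored in 1MB blocks might produce the output below. Each line gives a block's byte offset within the file, its length in bytes, and the hosts holding its replicas.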

# hadoop-0.23.7/bin/hadoop HDFSFileBlocks i7.txt
0,1048576,expressdb1,expressdb2
1048576,1048576,expressdb1,expressdb2
2097152,1048576,expressdb1,expressdb2
3145728,1048576,expressdb1,expressdb2
4194304,1048576,expressdb1,expressdb2
5242880,1048576,expressdb1,expressdb2
6291456,1048576,expressdb2,expressdb1
7340032,1048576,expressdb2,expressdb1
8388608,1048576,expressdb1,expressdb2
9437184,1048576,expressdb1,expressdb2
10485760,1048576,expressdb2,expressdb1
11534336,1048576,expressdb2,expressdb1
12582912,1048576,expressdb1,expressdb2
13631488,1048576,expressdb1,expressdb2
14680064,1048576,expressdb1,expressdb2
15728640,960255,expressdb1,expressdb2
#
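
If you want a per-host summary rather than a raw listing, the output above is easy to post-process. Below is a minimal sketch, not part of Hadoop, that adds up how many bytes of block replicas each host holds. It assumes the offset,length,host,... line format shown above and Java 8 or later. The class name BlockReportSummary and the method bytesPerHost are made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: summarizes lines of the form
// offset,length,host1,host2,... as printed by the program above.
class BlockReportSummary {

  // Returns the total bytes of block replicas stored on each host.
  static Map<String, Long> bytesPerHost(List<String> lines) {
    Map<String, Long> totals = new TreeMap<>();
    for (String line : lines) {
      String[] fields = line.trim().split(",");
      if (fields.length < 3) {
        continue; // skip blank lines and blocks with no host listed
      }
      long length = Long.parseLong(fields[1]);
      // Every host after the first two fields holds one replica of the block.
      for (int i = 2; i < fields.length; i++) {
        totals.merge(fields[i], length, Long::sum);
      }
    }
    return totals;
  }

  public static void main(String[] args) {
    // The first and last lines of the listing above, used as sample input.
    List<String> sample = Arrays.asList(
        "0,1048576,expressdb1,expressdb2",
        "15728640,960255,expressdb1,expressdb2");
    for (Map.Entry<String, Long> e : bytesPerHost(sample).entrySet()) {
      System.out.println(e.getKey() + ": " + e.getValue() + " bytes");
    }
  }
}
```

With a replication factor of two on a two-node cluster, every host ends up with the same total, so this is more interesting on larger clusters where replicas are spread unevenly.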
