Are files in HDFS immutable?

Call me cynical, but I am a bit of a doubting Thomas, so I wanted to see this for myself.

Using our previous write test code, we simply run the exact same test twice against the same file name.

[root@cmhlpdlkedat15 ~]# hadoop HDFSWriteTest foobar.txt
hdfs://cmhlpdlkedat14.expressco.com:8020
[root@cmhlpdlkedat15 ~]# hdfs dfs -ls /user/root
Found 3 items
drwx------   - root hdfs          0 2015-04-03 02:00 /user/root/.Trash
drwxr-xr-x   - root hdfs          0 2015-04-06 13:37 /user/root/.hiveJars
-rw-r--r--   3 root hdfs   16688895 2015-04-10 15:20 /user/root/foobar.txt
[root@cmhlpdlkedat15 ~]# hadoop HDFSWriteTest foobar.txt
hdfs://cmhlpdlkedat14.expressco.com:8020
Exception in thread "main" org.apache.hadoop.fs.FileAlreadyExistsException: /user/root/foobar.txt for client 172.27.2.64 already exists

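The second run fails with a FileAlreadyExistsException, because our test calls create with the overwrite flag set to false. However, there is also an append method on the FileSystem object. If we change one line in our class…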

   
    // Comment out the create and change it to an append operation
    // FSDataOutputStream outStream = fs.create(file, false, 4096, (short) 3, (long) 1048576);
    FSDataOutputStream outStream = fs.append(file);

…we find that it does in fact let us append to an existing file. Note the file size in the listing: it doubles from 16688895 to 33377790 bytes.

[root@cmhlpdlkedat15 ~]# hadoop HDFSWriteTest foobar.txt
hdfs://cmhlpdlkedat14.expressco.com:8020
[root@cmhlpdlkedat15 ~]# hdfs dfs -ls /user/root
Found 3 items
drwx------   - root hdfs          0 2015-04-03 02:00 /user/root/.Trash
drwxr-xr-x   - root hdfs          0 2015-04-06 13:37 /user/root/.hiveJars
-rw-r--r--   3 root hdfs   33377790 2015-04-10 15:28 /user/root/foobar.txt
[root@cmhlpdlkedat15 ~]#

So we need to be precise: files in HDFS are not strictly immutable. While you can't change existing content, you can add to it. Append support actually has a long history, which you can find here.
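If you want to play with the same two behaviours without a cluster, here is a minimal sketch that uses only the JDK's java.nio.file API against the local filesystem. It is an analogy, not HDFS code: the class name AppendSemanticsDemo and the helper names createNew and append are made up for illustration, with CREATE_NEW standing in for create with overwrite=false and APPEND standing in for fs.append.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Local-filesystem analogy only; this does NOT talk to HDFS.
class AppendSemanticsDemo {

    // Roughly analogous to fs.create(file, false, ...):
    // throws FileAlreadyExistsException if the file is already there.
    static void createNew(Path file, byte[] data) throws IOException {
        Files.write(file, data, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    }

    // Roughly analogous to fs.append(file):
    // adds bytes at the end and leaves the existing bytes untouched.
    static void append(Path file, byte[] data) throws IOException {
        Files.write(file, data, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("append-demo").resolve("foobar.txt");
        byte[] data = "some bytes\n".getBytes(StandardCharsets.UTF_8);

        // First write succeeds because the file does not exist yet.
        createNew(file, data);
        System.out.println("size after create: " + Files.size(file));

        // Second create is refused, just like our second HDFSWriteTest run.
        try {
            createNew(file, data);
        } catch (FileAlreadyExistsException e) {
            System.out.println("second create refused: " + e.getMessage());
        }

        // Append is allowed and the size grows by the length of the data.
        append(file, data);
        System.out.println("size after append: " + Files.size(file));
    }
}
```

The design point is the same as in the HDFS test: refusing to overwrite and allowing append are separate decisions, so a file can be protected from rewrites and still grow.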
