Today, after downloading a lot of files with Nutch, we had a problem with HDFS. Look at these errors:
2011-08-05 09:29:06,878 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-08-05 09:29:06,906 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 378928
2011-08-05 09:29:11,507 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 4
2011-08-05 09:29:11,511 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 53011779 loaded in 4 seconds.
2011-08-05 09:29:11,653 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /opt/hdfs/name/current/edits of size 115413 edits # 969 loaded in 0 seconds.
2011-08-05 09:29:11,657 ERROR org.apache.hadoop.hdfs.server.common.Storage: Error replaying edit log at offset 4114
2011-08-05 09:29:11,657 ERROR org.apache.hadoop.hdfs.server.common.Storage: Last 4 opcodes at offsets: 3494 3661 3952 4110
2011-08-05 09:29:11,659 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: Incorrect data format. logVersion is -19 but writables.length is 0.
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:566)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1039)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:845)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:379)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:99)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:347)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:321)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:267)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:461)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1208)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1217)
2011-08-05 09:29:11,661 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: Incorrect data format. logVersion is -19 but writables.length is 0.
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:566)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1039)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:845)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:379)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:99)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:347)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:321)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:267)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:461)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1208)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1217)
2011-08-05 09:29:11,662 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
The Namenode's local directory for HDFS had become full, so the Namenode shut down. After googling, we found that the problem was connected with the edits files under:
/dfs/name/current/edits
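Before changing anything, it can help to look at what the edits file actually contains. The helper below is only a sketch: the name show_edits_header is ours, and it simply prints the first five bytes of a file in hex. Read as a signed big-endian integer, ff ff ff ed is -19, the logVersion reported in the log above; that the file begins with this version number is our reading of the format, not something the log states.

```shell
# show_edits_header FILE: print the first 5 bytes of FILE as hex.
# Hypothetical helper, not part of Hadoop.
show_edits_header() {
    od -An -tx1 -N5 "$1"
}

# Example (paths are assumptions, adjust to your own name directory):
#   df -h /opt/hdfs/name
#   ls -l /opt/hdfs/name/current/
#   show_edits_header /opt/hdfs/name/current/edits
```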
Someone said that the solution was:
- Delete some directory from the Namenode's local directory for HDFS (HDFS replicates data, so you should have no problem deleting one copy of the files). Now the Namenode should have some free space. Check it using:
df -h
- Remove the Namenode machine from the list of data nodes (its disk must be almost full, so we don't want to take the risk of keeping it in the cluster as a data node as well as Namenode!)
- Back up these files to your own machine:
/dfs/name/current/edits
and
/dfs/name/current/edits.new
- Try overwriting the edits file with this short binary string (it stands in for an empty edit log):
printf "\xff\xff\xff\xee\xff" > /opt/hdfs/name/current/edits
- Then restart the HDFS cluster. Note that this solution didn't work for us, but it worked for other HDFS users. If it doesn't work for you…
- …restore the edits file from your backup
- and try:
printf "\xff\xff\xff\xee\xff" > /opt/hdfs/name/current/edits.new
- Then restart the HDFS cluster.
- This worked nicely for us: HDFS came back up and started replicating the lost blocks!
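The backup and overwrite steps above can be wrapped in a small function so that the file is never overwritten without a copy being made first. This is a sketch under our own assumptions: reset_edits and its two arguments are hypothetical names, and the backup directory is whatever location you choose. It writes the same five bytes as the printf commands above, but with octal escapes, because \x escapes are not guaranteed by POSIX printf (dash, for example, does not support them).

```shell
# reset_edits FILE BACKUP_DIR
# Copy FILE into BACKUP_DIR, then overwrite FILE with the 5 bytes
# ff ff ff ee ff. Prints the path of the backup copy.
# Hypothetical helper; run it only while the Namenode is stopped.
reset_edits() {
    file="$1"
    backup="$2/$(basename "$file").bak"
    # Stop here if the backup cannot be made.
    cp -p "$file" "$backup" || return 1
    # \377\377\377\356\377 is the same byte sequence as \xff\xff\xff\xee\xff.
    printf '\377\377\377\356\377' > "$file" || return 1
    echo "$backup"
}

# Example (paths are assumptions):
#   reset_edits /opt/hdfs/name/current/edits.new /root/edits-backup
```

If the restart still fails, the copy in the backup directory can be put back with cp before trying the other file.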