29 March, 2010

The most beautiful bug I ever had

While testing my ext2 server for MINIX 3 I found a bug that only showed up on file systems with a 1024-byte block size. My workflow was: download about 1 GB of assorted files (either with wget or via HGFS/MFS), remove some of them, download some more, and then check the md5sums. Everything was fine. But in Linux, e2fsck started to enlarge the sizes of files bigger than 64 MB. The only thing that could differ from file systems with larger block sizes was the ranges of the indirect blocks (single, double and triple), and the most wonderful thing I discovered was that the problem was in double indirection, which worked fine with other block sizes. The correct md5sums pushed me in the wrong direction...
So after some time I realized that I had to check the md5sum in Linux before running e2fsck: it differed from the md5sum I got in MINIX! After running cmp I noticed that the problem was in the last block addressed through double indirection, which was really, really, really strange (the function which maps blocks to positions was very well tested).
And the cause was the pow() function as implemented in MINIX 3 (during tests I had used a Python prototype and then gcc)... Instead of shifting or multiplying (block_size^{2,3}) I used pow()... What a bad idea that was! Triple indirect blocks should start at block 65804, but in MINIX the same code produced 65803, which is the last block addressed through double indirection. So I simply had a working implementation of ext2 that was incompatible with all other ext2 implementations :D
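
For the curious, here is a rough Python sketch of the arithmetic involved. It is not the code from my server: the function names are made up for illustration, and I am only assuming that MINIX's pow() returned a value slightly below 65536 which was then truncated to an integer (the 1e-9 error below is an invented stand-in, not a measured value). With 1024-byte blocks and 4-byte block numbers there are 256 pointers per block, so after the 12 direct blocks triple indirection should begin at 12 + 256 + 256*256 = 65804.

```python
EXT2_NDIR_BLOCKS = 12  # direct block pointers in an ext2 inode
POINTER_SIZE = 4       # bytes per 32-bit block number


def indirect_starts(block_size):
    """First file block reached through single, double and triple
    indirection, computed with integer arithmetic only."""
    addr_per_block = block_size // POINTER_SIZE
    single_start = EXT2_NDIR_BLOCKS
    double_start = single_start + addr_per_block
    triple_start = double_start + addr_per_block * addr_per_block
    return single_start, double_start, triple_start


def triple_start_with_inexact_pow(block_size, error=1e-9):
    """Same boundary, but the square goes through a float that is
    slightly too small (assumed behaviour) and is then truncated."""
    addr_per_block = block_size // POINTER_SIZE
    inexact_square = float(addr_per_block * addr_per_block) - error
    return EXT2_NDIR_BLOCKS + addr_per_block + int(inexact_square)


print(indirect_starts(1024))                # (12, 268, 65804)
print(triple_start_with_inexact_pow(1024))  # 65803
```

The off-by-one lands exactly on the last block addressed through double indirection, which is why only that single block differed under cmp. Shifting or plain integer multiplication never leaves integer arithmetic, so it cannot round the wrong way.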
