30 March, 2010

On the dangers of drugs

First I smiled, but at the end I was trying to сlose my mouth in surprise...

29 March, 2010

Most beatiful bug I ever had

While testing my ext2 MINIX 3 server I found a bug which caused problems on file systems with 1024 block size. I used following work flow: download about 1 Gb of different files (either wget or using HGFS/MFS), remove some files and download some more. Then I checked md5sums. Everything was fine. But in Linux e2fsck started to enlarge filesizes of files greater than 64 Mb. The only thing that could differ from file systems with greater block sizes was ranges of indirect blocks (singe, double and triple) and most wonderful thing I discovered is that I had problem with double indirection, which was fine with another block sizes. Correct md5sums pushed me in the wrong direction...
So after some time I discovered that I had to check md5sum in Linux before running e2fsck: it differed from md5sum I got in MINIX! After running cmp I noticed the problem was in last double indirect addressed block, which was really really really strange (function wich maps blocks to positions was very well tested).
And the reason was in pow() function implemented in MINIX 3 (during tests I used python prototype and then gcc)... Instead of shifting or multiplication (block_size^{2,3}) I used pow()... What a bad idea it was! Triple indirect blocks should start at 65804 block and in MINIX same code produced 65803 which was last double indirect addressed block. So I just had working implementation of ext2 incompatible with all another ext2 implementations :D

11 March, 2010

Evolution of one simple function (ext2_max_size)

For a while I've been working on ext2 fs implementation for Minix 3 (ext2 server in Minix terms). I've already implemented ext2 server which can read and write. One of functions required by implementation is ext2_max_size which calculate max file size for ext2/ext3 fs.
Here is a function from linux (with my comments):

/*
 *  linux/fs/ext2/super.c
 *
 * Copyright (C) 1992, 1993, 1994, 1995
 * Remy Card (card@masi.ibp.fr)
 * Laboratoire MASI - Institut Blaise Pascal
 * Universite Pierre et Marie Curie (Paris VI)
 *
 *  from
 *
 *  linux/fs/minix/inode.c
 *
 *  Copyright (C) 1991, 1992  Linus Torvalds
 *
 *  Big-endian to little-endian byte-swapping/bitmaps by
 *        David S. Miller (davem@caip.rutgers.edu), 1995
 */

...
/*
 * Maximal file size.  There is a direct, and {,double-,triple-}indirect
 * block limit, and also a limit of (2^32 - 1) 512-byte sectors in i_blocks.
 * We need to be 1 filesystem block less than the 2^32 sector limit.
 */
static loff_t ext2_max_size(int bits)
{
        /* eivanov: bits = 10 + log2(block_size/1024),
         * note that 2^10 is 1024, which is minimal block_size on ext2,
         */
        loff_t res = EXT2_NDIR_BLOCKS;
        int meta_blocks;
        loff_t upper_limit;

        /* This is calculated to be the largest file size for a
         * dense, file such that the total number of
         * sectors in the file, including data and all indirect blocks,
         * does not exceed 2^32 -1
         * __u32 i_blocks representing the total number of
         * 512 bytes blocks of the file
         */
        upper_limit = (1LL << 32) - 1;

        /* total blocks in file system block size */
        /* eivanov: same as "upper_limit /= (block_size/512)" */
        upper_limit >>= (bits - 9);


        /* indirect blocks */
        meta_blocks = 1;
        /* double indirect blocks */
        meta_blocks += 1 + (1LL << (bits-2));
        /* tripple indirect blocks */
        meta_blocks += 1 + (1LL << (bits-2)) + (1LL << (2*(bits-2)));

        upper_limit -= meta_blocks;
        upper_limit <<= bits; /* upper_limit in bytes */

        res += 1LL << (bits-2);
        res += 1LL << (2*(bits-2));
        res += 1LL << (3*(bits-2)); /* eivanov: total blocks addressed by inode */
        res <<= bits; /* eivanov: res in bytes */
        if (res > upper_limit)
                res = upper_limit;

        if (res > MAX_LFS_FILESIZE)
                res = MAX_LFS_FILESIZE;

        return res;
}
This function can work with any block_size, but the only supported block_sizes are 1024, 2048, 4096 and on rare platforms 8096. Thus it only calculates constants. I can recall any reason to calculate constants instead of using literals.
Here what I wrote while I was reading linux' function (used arithmetics instead of shifting):

long long ext2_max_size2(int block_size)
{
    /* Note, that this one takes block_size instead of bits */
  long addr_in_block = block_size/4; /* 4 bytes per addr, long for further arithmetic */
  long sectors_in_block = block_size/512;
  long meta_blocks; /* single, double and triple indirect blocks */
  long long out_range_s; /* max blocks addressed by inode */
  unsigned long long max_bytes;
  unsigned long long upper_limit;

  /* 1 indirect block, 1 + addr_in_block dindirect and 1 + addr_in_block +
   * + addr_in_block*addr_in_block triple indirect blocks */
  meta_blocks = 2*addr_in_block + addr_in_block*addr_in_block + 3;
  out_range_s = 12 + addr_in_block + pow(addr_in_block, 2) + pow(addr_in_block, 3);
  max_bytes = out_range_s * block_size;

  upper_limit = (1LL << 32) - 1; /* max 512-byte blocks by i_blocks */
  upper_limit /= sectors_in_block; /* total block_size blocks */
  upper_limit -= meta_blocks; /* total data blocks */
  upper_limit *= (long long)block_size; /* max size in bytes */

  if (max_bytes > upper_limit)
      max_bytes = upper_limit;

  return max_bytes;
}

IMHO it's much more readable. Probably compiler will not gnerate shifting instructions, but this function is called just once, when fs is mounted, so don't claim on performance.

Then I came to decision use following function (similar):

unsigned long ext2_max_size2(int block_size)
{
    switch(block_size)
    {
        case 1024: return 67383296L;
        case 2048: return 537944064L;
        case 4096: return 4286562304U;
        /* add default panic, when move to kernel server */
    }
}
Also there is a small pitfall: these constants also depend on EXT2_NDIR_BLOCKS, but the code listed above doesn't use it. So ideal variant is to check if EXT2_NDIR_BLOCKS is the same as was used in calculations (99.9 that it is, since changing it will break compatibility with older on-disk filesystems) and recalculate if it was changed.
Would it be the fastest? Sure! Simplest? Of course :) The only missing thing is missing 8096 block size, since it's used on some rare architecture (don't remember name) which will not be supported by Minix 3 (x86 and arm/x86-64 in plans). Also there is a small pitfall: these constants also depend on EXT2_NDIR_BLOCKS, but the code listed above doesn't use it. So ideal variant is to check if EXT2_NDIR_BLOCKS is the same as was used in calculations (99.9 that it is, since changing it will break compatibility with older on-disk filesystems) and recalculate if it was changed.