Karamba

16 September 2004

Summary

Karamba boots to both CPUs, the OS is on /dev/hda1, and there's 250GB of software RAID1 with the XFS file system on /dev/md0. The RAID drives are Maxtor 7Y250M0. The RAID is currently defined in /etc/mdadm/mdadm.conf and possibly mdadm.conf.init.
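A quick way to verify the state of the array described here (the exact output will vary; these are standard md tools, not commands taken from the original notes):

    cat /proc/mdstat          # should show md0 active as raid1 with both disks up
    mdadm --detail /dev/md0   # component devices, array state, and any running resync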
Hardware

For information on SATA support in Linux, see http://www.linuxmafia.com/faq/Hardware/sata.html:

    Silicon Image 3112 / 3114 (integrated), and 3512 (PCI) (CMD Technology, Inc.) -- libata driver set provides beta-level support (as of 2004-07-08) via the sata_sil driver. Note that enabling libata support for this chipset requires enabling CONFIG_BROKEN (under "Code maturity level options") in your kernel configuration, for reasons Jeff Garzik has explained.

It is not entirely clear from this whether sata_sil is a RAID driver, which I had assumed, or whether you need a RAID driver on top of sata_sil -- and in that case, which? It turns out we need md, or Linux software RAID.

RAID

We should get a third identical drive and define the drives as RAID5 at the level of the RAID card (SiI 3112) by pressing F4 at boot and hopefully making sense of the menu choices. Then we'll need to define them as RAID5 in Linux and finally put a single file system on the whole thing -- the three 250GB drives will appear as a single 500GB drive in RAID5.
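For the Linux half of that plan, a minimal sketch of creating the three-drive RAID5 array with mdadm and putting a single file system on it -- assuming the drives show up as sda, sdb and sdc and each carries one full-size partition of type FD (the device names are assumptions, not taken from these notes):

    mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 \
        /dev/sda1 /dev/sdb1 /dev/sdc1     # ~500GB usable from three 250GB drives
    mkfs.xfs /dev/md0                     # one file system across the whole array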
I've checked out RAID a bit and found this:

    By the same token, a set of RAID1 arrays, which just does mirroring, is safer, since it survives without data loss if half the drives fail.

Software RAID

It turns out that the Silicon Image 3112 is not a full hardware RAID card but a hybrid software/hardware RAID solution. Under the 2.4 kernel it can be run with the Medley driver, but people report it runs faster under 2.6 as a SCSI system using MD, or Linux software RAID. This is what we should be using. Without MD, the sata_sil driver just sees the individual drives, not the RAID array.

MD ("multiple devices") uses the admin program mdadm, which I've installed. The current setup is RAID1 (mirroring), so I've loaded the RAID1 module (using modconf), just so we get used to how MD works.

To configure mdadm, run dpkg-reconfigure mdadm -- it has some interesting parameters. For instance, it will e-mail you if a disk fails.

Configuration file -- I started by issuing

    cp /usr/share/doc/mdadm/examples/mdadm.conf-example /etc/mdadm.conf

Once the array is created, change the configuration file to reflect the details. You don't need it to create the array; it's just used for reassembling it later, if required.

I ended up using raidtools2 instead of mdadm, as the latter gave me trouble. For details, see /etc/raidtab.

Installation history

Update 22 September 2004

The kernel should from now on be compiled with gcc 3.4; I changed the symlink. It turns out lowmem only handles 896MB of memory, so I enabled highmem -- and 2GB of RAM showed up! I also removed DRI, since the mach64 driver isn't included in the kernel (though we could patch it).

Comparing dmesg files, I noticed that swap is no longer being initialized, ever since devfs mount at boot was included in the kernel. In fact swapon -a doesn't work, as the /dev/hda2 partition isn't even seen! On the other hand, the system boots off /dev/hda1, so it's not as if there's a problem seeing the disk.

    # grep swap *

But swap is enabled in all kernels -- and the Real Time Clock Driver also doesn't show. Now,

    # fdisk -l

produces nothing, and cfdisk /dev/hda says "FATAL ERROR: Cannot open disk drive". Recall this is the drive the operating system is currently running on -- that is, /dev/hda1. The system in fact can't see the other partitions:

    # swapon -a

I rebooted with devfs=nomount to test the hypothesis that devfs automounting was hiding the partitions. Indeed, that was the problem -- dmesg now shows

    Adding 2096472k swap on /dev/hda2.  Priority:-1 extents:1

Whew! That's a relief. Make sure never to use devfs automount. fdisk now works fine. The kernel should now be in good shape; it's tweaked and checked.
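To make that permanent rather than typed at the boot prompt, devfs=nomount can be passed by the boot loader. A sketch assuming LILO -- the boot loader isn't named in these notes, and the image path and label are placeholders; with GRUB the parameter would go on the kernel line instead:

    # /etc/lilo.conf -- relevant stanza only
    image=/boot/vmlinuz
            label=linux
            root=/dev/hda1
            append="devfs=nomount"   # stop devfs from automounting at boot

    # then reinstall the boot sector
    lilo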
19 September 2004: ATI Rage XL AGP card

The ATI Rage XL is in the Mach64 family. In official XFree86 releases there is currently no hardware-accelerated 3D support for Mach64. However, the mach64 branch in DRI CVS has an almost complete 3D driver. In XF86Config-4, use "ati" as the driver name; it automatically selects the correct driver.

16 September 2004: XFS

Karamba is now running SATA RAID (md software RAID) and the XFS file system; the RAID array is now autodetected at boot. XFS is the file system developed by SGI for Hollywood and other IRIX users -- a powerful, fast, journaled file system. Ideally we'd put the metadata (the journal) on a small, separate RAID array, but that seems a bit too complicated at this point. This is the file system we should be using for the archives.

Hardware RAID is currently limited to around 12 drives. Software RAID has a limit somewhere, but it's something like a couple of hundred drives. For practical purposes, however, we'll probably want to create several arrays of five or six drives. RAID systems are currently limited to the size you define when you establish them. To create file systems that can be shrunk and grown at will, we should use LVM, the logical volume manager. I don't think we need it, but there are cases where it may be useful.

On karamba, the current setup is RAID1 and is ready for stress-testing. The RAID config file is at /etc/raidtab, and you can see the status with cat /proc/mdstat.

15 September 2004

The software RAID array is not found at boot. To get it autodetected, you have to give the individual drives partitions of type FD (Linux raid autodetect). Then you have to build every component required to get the array going into the kernel itself, not as modules. To get this working I had to build sata_sil, libata, SCSI, md, and raid1 into the kernel, and use cfdisk to set the component drives (sda and sdb) to partition type FD.

I found mdadm a bit limited and installed raidtools2, which seems more robust -- though mdadm has some superior features. I created a raidtab (first line shown; see /etc/raidtab for the full file):

    raiddev /dev/md0

Using raidtools2, I created the RAID with this command:

    mkraid /dev/md0

Nice and simple -- with good error messages. Use the -R switch to force overwriting an old array.

13 September 2004 update: creating a software RAID array

We now need to configure the drives to be managed by MD. Once that is done, we can set mdadm to identify the RAID1 system at boot (and the RAID5 array once we have the new drive). I created the RAID1 array with this command:

    mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

The response was

    mdadm: size set to 245111616K

Then I put the XFS file system on it:

    mkfs.xfs /dev/md0

That took no time -- a lot faster than ext2 and ext3. Possible remaining issues: I've not tested that this setup survives a reboot -- it probably won't, as some piece of configuration is likely missing. We should run tests on this to make sure the configuration is robust.
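The reboot question was in fact addressed later by the kernel-level autodetection described in the 15 September entry above. For reference, a minimal sketch of the alternative route through the mdadm config file named in the summary, plus the fstab entry that is needed either way -- the /archive mount point is hypothetical, not something named in these notes:

    # append an ARRAY line describing the running array to the config file
    mdadm --detail --scan >> /etc/mdadm/mdadm.conf

    # /etc/fstab entry so the XFS file system is mounted at boot
    # (the /archive mount point is an example only)
    /dev/md0   /archive   xfs   defaults   0   2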
9 September 2004 update

Andrey went straight for 2.6, and things looked good -- but the drives were seen as separate drives, not as a RAID array. The kernel was configured for a single 386, so I made a new kernel and booted to the dual Xeons. Note that you need either an initrd or the drive drivers compiled into the kernel, rather than built as modules.

Karamba appears to have a so-called watchdog card -- in this case a built-in chip. Its function appears to be to reboot the machine under certain failure conditions. It's supported by the i8xx_tco module. I've loaded the module, which we might regret, but I haven't looked at the configuration.

30 August update

Andrey booted into Knoppix 2.4 on 30 August 2004, sent me a brief report, and gave me access. (Note that you can't ping the machines, but you can ssh to paco.)

The ataraid module was loaded, but not medley. I attempted to insert medley, but it didn't find the hardware. This may be because it only supports RAID0 and the machine was configured for RAID1. In that case, we have to use the libata and sata_sil drivers.

I advised Andrey to install a new hard drive for the OS and use the new Debian installer, with the 2.4 kernel plus ataraid and medley for the RAID. Later, we should switch to the 2.6 kernel with libata and sata_sil.
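A quick check that the libata path actually sees the disks once a 2.6 kernel is running -- assuming sata_sil is built as a module (the 15 September entry above ends up compiling it into the kernel, in which case the modprobe is unnecessary):

    modprobe sata_sil                 # pulls in libata as a dependency
    dmesg | grep -i -e sata -e sil    # driver messages for the SiI 3112 and attached drives
    cat /proc/partitions              # the two Maxtors should show up as sda and sdb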
Hardware inventory

IDE drives

There's a CDRW and two Maxtor 250GB drives:

    hdparm -i /dev/hdc

    /dev/hdc (CDRW):

    Model=FX54++M, FwRev=Y01G, SerialNo=
    Config={ Fixed Removeable DTR<=5Mbs DTR>10Mbs nonMagnetic }
    RawCHS=0/0/0, TrkSize=0, SectSize=0, ECCbytes=0
    BuffType=unknown, BuffSize=0kB, MaxMultSect=0
    (maybe): CurCHS=0/0/0, CurSects=0, LBA=yes, LBAsects=0
    IORDY=yes, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
    PIO modes: pio0 pio1 pio2 pio3 pio4
    DMA modes: mdma0 mdma1 *mdma2
    UDMA modes: udma0 udma1 udma2
    AdvancedPM=no

    hdparm -i /dev/hde

    /dev/hde:

    Model=Maxtor 7Y250M0, FwRev=YAR51EW0, SerialNo=Y62QTH3E
    Config={ Fixed }
    RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
    BuffType=DualPortCache, BuffSize=7936kB, MaxMultSect=16, MultSect=16
    CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
    IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
    PIO modes: pio0 pio1 pio2 pio3 pio4
    DMA modes: mdma0 mdma1 mdma2
    UDMA modes: udma0 udma1 udma2
    AdvancedPM=yes: disabled (255) WriteCache=enabled
    Drive conforms to: (null):

    hdparm -i /dev/hdg

    /dev/hdg:

    Model=Maxtor 7Y250M0, FwRev=YAR51EW0, SerialNo=Y62QT4XE
    Config={ Fixed }
    RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
    BuffType=DualPortCache, BuffSize=7936kB, MaxMultSect=16, MultSect=16
    CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
    IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
    PIO modes: pio0 pio1 pio2 pio3 pio4
    DMA modes: mdma0 mdma1 mdma2
    UDMA modes: udma0 udma1 udma2
    AdvancedPM=yes: disabled (255) WriteCache=enabled
    Drive conforms to: (null):

Knoppix 2.4 boot

    lspci

    0000:00:00.0 Host bridge: Intel Corp. E7505 Memory Controller Hub (rev 03)
    0000:00:00.1 Class ff00: Intel Corp. E7000 Series RAS Controller (rev 03)
    0000:00:01.0 PCI bridge: Intel Corp. E7000 Series Processor to AGP Controller (rev 03)
    0000:00:02.0 PCI bridge: Intel Corp. E7000 Series Hub Interface B PCI-to-PCI Bridge (rev 03)
    0000:00:02.1 Class ff00: Intel Corp. E7000 Series Hub Interface B PCI-to-PCI Bridge RAS Controller (rev 03)
    0000:00:1d.0 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #1 (rev 02)
    0000:00:1d.1 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #2 (rev 02)
    0000:00:1d.2 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #3 (rev 02)
    0000:00:1d.7 USB Controller: Intel Corp. 82801DB (ICH4) USB2 EHCI Controller (rev 02)
    0000:00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to PCI Bridge (rev 82)
    0000:00:1f.0 ISA bridge: Intel Corp. 82801DB (ICH4) LPC Bridge (rev 02)
    0000:00:1f.1 IDE interface: Intel Corp. 82801DB (ICH4) Ultra ATA 100 Storage Controller (rev 02)
    0000:00:1f.3 SMBus: Intel Corp. 82801DB/DBM (ICH4) SMBus Controller (rev 02)
    0000:02:1c.0 PIC: Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 04)
    0000:02:1d.0 PCI bridge: Intel Corp. 82870P2 P64H2 Hub PCI Bridge (rev 04)
    0000:02:1e.0 PIC: Intel Corp. 82870P2 P64H2 I/OxAPIC (rev 04)
    0000:02:1f.0 PCI bridge: Intel Corp. 82870P2 P64H2 Hub PCI Bridge (rev 04)
    0000:04:02.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller (rev 02)
    0000:05:02.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
    0000:05:03.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 0d)
    0000:05:04.0 RAID bus controller: Silicon Image, Inc. (formerly CMD Technology Inc) Silicon Image Serial ATARaid Controller [ CMD/Sil 3112/3112A ] (rev 02)