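1. What is a block (sector)
Background: we recently bought a batch of new servers whose underlying storage devices default to a physical sector size of 4K instead of the previous 512B.
After installing the system, xtrabackup reported the error below while restoring a physical database backup. The same backup restores without problems on the old systems with a 512B sector size.
The error was: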
InnoDB: Error: tried to read 2048 bytes at offset 0 0.
InnoDB: Was only able to read 0.
140304 18:48:12 InnoDB: Operating system error number 22 in a file operation.
InnoDB: Error number 22 means 'Invalid argument'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.MySQL.com/doc/refman/5.1/en/operating-system-error-codes.html
InnoDB: File operation call: 'read'.
InnoDB: Cannot continue operation.
innobackupex-1.5.1: Error:
innobackupex-1.5.1: ibbackup failed at /usr/bin/innobackupex-1.5.1 line 386.
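Leaving the cause aside for a moment, here is the fix: http://bazaar.launchpad.net/~akopytov/percona-xtrabackup/bug1190779-2.0/revision/561#src/xtrabackup.cc
Upgrading to xtrabackup 2.0.7 or later resolves the problem.
Why does the same program behave differently on a 512B block size and on a 4K block size?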
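First, what is the device block (sector) size? A block, also called a sector, is the smallest unit a block device can read or write. On a device with a 512B block size, even if the caller only needs 10 bytes, the device still reads a full 512B; the surplus is then discarded and only the requested bytes are returned to the caller.
Above the device block size sits the filesystem block size. For a filesystem, a block is likewise the smallest unit of reading and writing, so a file holding a single byte still occupies one full block on the underlying device.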
For a fuller explanation of block sizes, see the related documents listed at the end.
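The sizes discussed above can be inspected on a running Linux system. The following is a minimal sketch, assuming the standard sysfs layout under /sys/block; the function names and the `sysfs_root` parameter are illustrative and not part of any tool mentioned in this article.

```python
import os
from pathlib import Path


def read_block_sizes(device, sysfs_root="/sys/block"):
    """Return (logical, physical) sector sizes in bytes for a block device.

    `device` is a kernel device name such as "sda" (hypothetical here).
    `sysfs_root` is only a parameter so the function can be pointed at a
    different directory when experimenting.
    """
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text())
    physical = int((queue / "physical_block_size").read_text())
    return logical, physical


def filesystem_block_size(path):
    """Return the block size that statvfs reports for the filesystem holding `path`."""
    return os.statvfs(path).f_bsize
```

On a machine like the new servers described here, `read_block_sizes("sda")` would be expected to report 4096 for the physical size; the logical size depends on whether the drive emulates 512B sectors.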
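2. What is aligned IO
Once there is a block size, the notion of alignment follows naturally. An IO request is aligned when its boundaries coincide with the block boundaries of the layer below, that is, when both its starting offset and its length are integer multiples of the lower device's block size. To read the same 512B of data, an aligned request needs a single IO on the lower device, whereas an unaligned request that straddles a block boundary needs two IOs plus trimming of the data before and after. For this reason aligned IO performs better than unaligned IO.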
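The arithmetic behind this can be written down in a few lines. This is an illustrative sketch only; the helper names are made up for this example, and `length` is assumed to be greater than zero.

```python
def is_aligned(offset, length, block_size):
    """True when both the start offset and the length are multiples of block_size."""
    return offset % block_size == 0 and length % block_size == 0


def blocks_touched(offset, length, block_size):
    """Number of device blocks that the byte range [offset, offset + length) covers."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1


# A 512-byte read starting at offset 0 stays inside one 512B block.
print(blocks_touched(0, 512, 512))    # 1
# The same read shifted by 256 bytes straddles two blocks.
print(blocks_touched(256, 512, 512))  # 2
# A 2048-byte request is aligned for 512B blocks but not for 4K blocks.
print(is_aligned(0, 2048, 512), is_aligned(0, 2048, 4096))  # True False
```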
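The figure below shows an example of strict alignment from top to bottom (from the DB down to the disk).
However, neither the Linux operating system nor MySQL strictly requires IO to be aligned. Unaligned IO should only make requests somewhat slower; it should not produce access errors.
So what caused xtrabackup to fail on a device with a 4K sector size?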
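3. O_DIRECT and unaligned IO
The Linux documentation shows that files opened in O_DIRECT mode are subject to IO alignment restrictions. xtrabackup opened the file with O_DIRECT and then issued an unaligned IO request, and in that situation the request is rejected. This matches the log above: the failing read asks for 2048 bytes, which is a multiple of 512 but not of 4096, and it fails with error 22 (Invalid argument).
The relevant passage from the documentation reads: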
Users must always take care to use properly aligned and sized IO. This is especially important for Direct I/O access. Direct I/O should be aligned on a 'logical_block_size' boundary and in multiples of the 'logical_block_size'. With native 4K devices (logical_block_size is 4K) it is now critical that applications perform Direct I/O that is a multiple of the device's 'logical_block_size'. This means that applications that do not perform 4K aligned I/O, but 512-byte aligned I/O, will break with native 4K devices. Applications may consult a device's "I/O Limits" to ensure they are using properly aligned and sized I/O. The "I/O Limits" are exposed through both sysfs and block device ioctl interfaces (also see: libblkid).
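To make the requirement concrete, here is a hedged sketch of how an application could satisfy it: widen the request to block boundaries, read into an aligned buffer, and slice out the bytes actually wanted. It is not how xtrabackup or InnoDB implement their IO. The helper names are hypothetical, `os.O_DIRECT` and `os.preadv` are Linux-only, and the default `block_size` of 4096 is assumed to divide the system page size.

```python
import mmap
import os


def aligned_span(offset, length, block_size):
    """Expand (offset, length) to the smallest block-aligned (start, size) covering it."""
    start = (offset // block_size) * block_size
    end = -(-(offset + length) // block_size) * block_size  # round up
    return start, end - start


def direct_read(path, offset, length, block_size=4096):
    """Read `length` bytes at `offset` through O_DIRECT (Linux only).

    The file offset and the request size are widened to block boundaries.
    The buffer is an anonymous mmap, which is page-aligned and therefore
    satisfies the buffer-address alignment as long as block_size divides
    the page size (an assumption of this sketch).
    """
    start, span = aligned_span(offset, length, block_size)
    buf = mmap.mmap(-1, span)
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
        try:
            got = os.preadv(fd, [buf], start)
        finally:
            os.close(fd)
        skip = offset - start
        return bytes(buf[skip:min(got, skip + length)])
    finally:
        buf.close()
```

With these helpers, the 2048-byte request from the error log becomes a full 4096-byte read on a 4K device, since `aligned_span(0, 2048, 4096)` returns `(0, 4096)`. Some filesystems do not support O_DIRECT at all, in which case `os.open` itself raises an error.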
The xtrabackup 2.0.7 description of this bug shows that the fix simply removes the O_DIRECT flag when opening the file. The relevant change log entry reads:
The problem was in an length-unaligned I/O request issued while manipulating xtrabackup_logfile with O_DIRECT enabled.
We don't actually need O_DIRECT in those cases, so the fix was to disable O_DIRECT. The patch also removes userspace buffer alignment code and implements other minor cleanups.
4. Related documents
http://www.orczhou.com/index.php/2009/08/innodb_flush_method-file-io/
https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1055547
https://bugs.launchpad.net/percona-xtrabackup/+bug/902567
https://bugs.launchpad.net/percona-server/+bug/1033051
http://www.linuxintro.org/wiki/Blocks,_block_devices_and_block_sizes
http://www.mysqlperformanceblog.com/2011/06/09/aligning-io-on-a-hard-disk-raid-the-theory/
http://people.redhat.com/msnitzer/docs/io-limits.txt