Connecting to MySQL from Python and Filtering Special Characters in fetchall() Results

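Here is a simple example of working with a database from Python. Compared with Java's JDBC it is considerably simpler: much of the repetitive boilerplate disappears, and you only need to care about fetching and manipulating the data.
Preparation
The following environment and modules are required: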

Ubuntu 14.04 64bit
Python 2.7.6
MySQLdb

Note: Ubuntu ships with Python preinstalled, but connecting to a database from Python also requires the MySQLdb module. Installing it is straightforward:

sudo apt-get install python-mysqldb

Then start the Python interpreter and import the package. If no error is reported, the installation succeeded:

python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
>>>

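Python's standard database interface is the Python DB-API, and most Python database modules follow it (including the ones for MySQL). Each database needs its own module; since MySQL is installed on my machine, I use MySQLdb. In principle, switching to another database only means swapping the underlying module that implements the interface, while the calling code stays largely the same. That is the benefit of a common interface.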
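To make the DB-API point concrete, here is a minimal sketch that uses the standard library's sqlite3 module, which follows the same connect / cursor / execute / fetchall pattern. The in-memory database, the demo table, and the fetch_all_rows helper are assumptions made up for illustration, and the snippet is written for Python 3 rather than the Python 2.7 used elsewhere in this article. In practice SQL dialects and parameter placeholder styles still differ between drivers, so "swap the module" is an ideal rather than a guarantee.

```python
import sqlite3


def fetch_all_rows(conn, sql):
    """Run a query through a DB-API 2.0 connection and return all rows."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        cursor.close()


# An in-memory SQLite database stands in for MySQL here (demo assumption);
# with MySQLdb, mainly the connect() call would differ.
conn = sqlite3.connect(":memory:")
setup = conn.cursor()
setup.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, name TEXT)")
setup.execute("INSERT INTO demo (name) VALUES ('alice')")
setup.execute("INSERT INTO demo (name) VALUES ('bob')")
setup.close()
conn.commit()

rows = fetch_all_rows(conn, "SELECT id, name FROM demo ORDER BY id")
print(rows)  # [(1, 'alice'), (2, 'bob')]
conn.close()
```

The helper only touches methods defined by the DB-API (cursor, execute, fetchall, close), which is why it does not care which driver produced the connection.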
Database operations in Python
First we need a test table.
Table creation statements:

CREATE DATABASE study;
use study;
DROP TABLE IF EXISTS python_demo;
CREATE TABLE python_demo (
  id int NOT NULL AUTO_INCREMENT COMMENT 'primary key, auto-increment',
  user_no int NOT NULL COMMENT 'user number',
  user_name VARBINARY(50) NOT NULL COMMENT 'user name',
  password VARBINARY(50) NOT NULL COMMENT 'user password',
  remark VARBINARY(255) NOT NULL COMMENT 'user remark',
  PRIMARY KEY (id,user_no)
) ENGINE = innodb DEFAULT CHARSET = utf8 COMMENT 'user test table';
INSERT INTO python_demo(user_no, user_name, password, remark) VALUES (1001,'張三01','admin','我是張三');
INSERT INTO python_demo(user_no, user_name, password, remark) VALUES (1002,'張三02','admin','我是張三');
INSERT INTO python_demo(user_no, user_name, password, remark) VALUES (1003,'張三03','admin','我是張三');
INSERT INTO python_demo(user_no, user_name, password, remark) VALUES (1004,'張三04','admin','我是張三');
INSERT INTO python_demo(user_no, user_name, password, remark) VALUES (1005,'張三05','admin','我是張三');
INSERT INTO python_demo(user_no, user_name, password, remark) VALUES (1006,'張三06','admin','我是張三');
INSERT INTO python_demo(user_no, user_name, password, remark) VALUES (1007,'張三07','admin','我是張三');
INSERT INTO python_demo(user_no, user_name, password, remark) VALUES (1008,'張三08','admin','我是張三');

Python code

# -*- coding: utf-8 -*-
import ConfigParser
import sys
import MySQLdb


def init_db():
    # Connection settings come from the [Database] section of mysql.conf
    try:
        conn = MySQLdb.connect(host=conf.get('Database', 'host'),
                               user=conf.get('Database', 'user'),
                               passwd=conf.get('Database', 'passwd'),
                               db=conf.get('Database', 'db'),
                               charset='utf8')
        return conn
    except MySQLdb.Error:
        print "Error: failed to connect to the database"
        return None


def select_demo(conn, sql):
    # Run a query and return every row as a tuple of tuples
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    except MySQLdb.Error:
        print "Error: failed to execute the query"
        return None


def update_demo():
    pass


def delete_demo():
    pass


def insert_demo():
    pass


if __name__ == '__main__':
    conf = ConfigParser.ConfigParser()
    conf.read('mysql.conf')
    conn = init_db()
    if conn is None:
        sys.exit(1)
    sql = "select * from %s" % conf.get('Database', 'table')
    data = select_demo(conn, sql)
    conn.close()
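The script reads its connection settings from mysql.conf, which is not shown in this article. Judging from the conf.get() calls, the file needs a [Database] section with the keys host, user, passwd, db and table. The sketch below shows one plausible layout; every value is a placeholder, and the parsing uses Python 3's configparser module (the Python 2 code above imports the same module as ConfigParser).

```python
import configparser

# Hypothetical contents of mysql.conf; all values are placeholders.
SAMPLE_CONF = """
[Database]
host = 127.0.0.1
user = root
passwd = yourpassword
db = study
table = python_demo
"""

conf = configparser.ConfigParser()
# The article's script calls conf.read('mysql.conf'); a string is parsed
# here so that the example runs without a file on disk.
conf.read_string(SAMPLE_CONF)

settings = {key: conf.get("Database", key)
            for key in ("host", "user", "passwd", "db", "table")}
print(settings["db"], settings["table"])  # study python_demo
```

Keeping credentials in a separate file like this means the script itself can be shared or committed without exposing the password.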

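Filtering special characters from fetchall() fields
I have recently been working on a data warehouse migration. The warehouse data used to be extracted with shell scripts, and we later switched to Python scripts.
When the extracted data was loaded into Hadoop, however, a problem showed up:
The database has many fields, and there is no way to know in advance what they will contain. The Hive table is delimited with \t (fields) and \n (rows), so if a MySQL field value itself contains \t or \n, the fields become misaligned. To make things worse, the MySQL table has more than 100 fields, and it is unclear which of them are affected.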
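The misalignment is easy to reproduce without any database. The following sketch (Python 3, with made-up sample values) writes one logical row in the tab/newline-delimited layout described above and then reads it back the way a delimiter-based reader would.

```python
# Three logical fields; the second one contains an embedded tab and newline.
row = ("1001", "likes\ttabs and\nnewlines", "ok")

# Naive export: join fields with tabs, terminate the record with a newline.
line = "\t".join(row) + "\n"

# A reader that splits on newlines first and tabs second sees two broken
# records instead of one record with three fields.
records = [rec.split("\t") for rec in line.split("\n") if rec]
print(records)
# [['1001', 'likes', 'tabs and'], ['newlines', 'ok']]
```

One row has turned into two, and neither has the original three values in the right columns, which is exactly the kind of shift that shows up in the Hive table.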
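The shell scripts handled this by applying MySQL's replace function to each extracted field. For example, a field defined as column1 varchar(2000) may well contain special characters, so the query wraps it like this: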

select replace(replace(replace(column1,'\r',''),'\n',''),'\t','')

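That is how it was always done, but the SQL becomes very long, especially with more than 100 fields. Since nobody knows which ones contain special characters, every field has to be wrapped.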
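To give a sense of how long that SQL gets, here is a small sketch that generates the nested replace() wrapper for a list of columns. This is not what the original shell scripts did; the helper names, the table name some_table and the column names are all hypothetical, and the column list is assumed to come from a trusted source, since the names are interpolated directly into the SQL text.

```python
def strip_control_chars_expr(column):
    """Wrap a column in nested replace() calls that drop \\r, \\n and \\t."""
    expr = "`%s`" % column
    # Raw strings keep the backslash so MySQL, not Python, interprets it.
    for escape in (r"\r", r"\n", r"\t"):
        expr = "replace(%s,'%s','')" % (expr, escape)
    return "%s AS `%s`" % (expr, column)


def build_select(table, columns):
    """Build a SELECT that applies the wrapper to every listed column."""
    select_list = ", ".join(strip_control_chars_expr(c) for c in columns)
    return "select %s from `%s`" % (select_list, table)


sql = build_select("some_table", ["column1", "column2"])
print(sql)
```

Even with only two columns the statement is already hard to read, which is part of the motivation for filtering on the Python side instead.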
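The Python script at first did no such processing, which left the Hive table columns misaligned. So every field fetched from MySQL needs to be filtered in Python before it is written to the file.
Let's look at an example using a MySQL test table. First, create the table: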

CREATE TABLE `filter_fields` (
  `field1` varchar(50) DEFAULT NULL,
  `field2` varchar(50) DEFAULT NULL,
  `field3` varchar(50) DEFAULT NULL,
  `field4` varchar(50) DEFAULT NULL,
  `field5` varchar(50) DEFAULT NULL,
  `field6` varchar(50) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

There are six fields, all of type varchar, so the inserted data can contain special characters. Insert a few rows to test:

insert into filter_fields(field1,field2,field3,field4,field5,field6) VALUES('test01','test02','test03','test04','test05','test06');
insert into filter_fields(field1,field2,field3,field4,field5,field6) VALUES('test11\ntest11','test12\n\n','test13','test14','test15','test16');
insert into filter_fields(field1,field2,field3,field4,field5,field6) VALUES('test21\ttest21','test22\ttest22\ttest22','test23\t\t\t','test4','test5','test6');
insert into filter_fields(field1,field2,field3,field4,field5,field6) VALUES('test21\rest21','test22\r\rest22\r\rest22','test23\r\r\r','test4','test5','test6');

The special characters in this data sometimes appear consecutively and sometimes apart.
Python test code:

# coding=utf-8
import MySQLdb
import sys

db_host = '127.0.0.1'       # database host
db_port = 3306              # database port
db_user = 'root'            # MySQL user name
db_pwd = 'yourpassword'     # MySQL password; replace with your own
db_name = 'test'            # database name
db_table = 'filter_fields'  # table name


# Strip \t, \n and \r from every field of the query result
def extract_data(table_name):
    try:
        conn = MySQLdb.connect(host=db_host, port=db_port, user=db_user,
                               passwd=db_pwd, db=db_name, charset="utf8")
        cursor = conn.cursor()
    except MySQLdb.Error as e:
        print 'Database connection error: %s' % e
        sys.exit(1)

    try:
        sql = 'select * from %s;' % table_name
        cursor.execute(sql)
        rows = cursor.fetchall()

        print '====Query result before filtering===='
        for row in rows:
            print row

        print '====Result after filtering===='
        rows_list = []
        for row in rows:
            row_list = []
            for column in row:
                # Only strings are filtered; NULL (None) and other types pass through
                if isinstance(column, basestring):
                    column = column.replace('\t', '').replace('\n', '').replace('\r', '')
                row_list.append(column)
            rows_list.append(row_list)
            print rows_list[-1]  # [-1] is the last element of the list
        return rows_list
    except MySQLdb.Error as e:
        print 'Failed to execute SQL statement: %s' % e
        sys.exit(1)
    finally:
        # Release the cursor and connection on success as well as on failure
        cursor.close()
        conn.close()


if __name__ == '__main__':
    print 'begin:'
    rows = extract_data(db_table)

Here is the output:

Query result before filtering

(u'test01', u'test02', u'test03', u'test04', u'test05', u'test06')
(u'test11\ntest11', u'test12\n\n', u'test13', u'test14', u'test15', u'test16')
(u'test21\ttest21', u'test22\ttest22\ttest22', u'test23\t\t\t', u'test4', u'test5', u'test6')
(u'test21\rest21', u'test22\r\rest22\r\rest22', u'test23\r\r\r', u'test4', u'test5', u'test6')

Result after filtering

[u'test01', u'test02', u'test03', u'test04', u'test05', u'test06']
[u'test11test11', u'test12', u'test13', u'test14', u'test15', u'test16']
[u'test21test21', u'test22test22test22', u'test23', u'test4', u'test5', u'test6']
[u'test21est21', u'test22est22est22', u'test23', u'test4', u'test5', u'test6']

As you can see, the tabs, newlines, and carriage returns have all been removed.
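The filtering step can also be pulled out into a small reusable function that does not depend on the database at all. The sketch below is one possible way to do that in Python 3; clean_value and clean_rows are hypothetical names, and str.translate() is swapped in for the chained replace() calls used in the script. Non-string values such as None or numbers are passed through untouched, which matters because the test table allows NULL.

```python
# Map each unwanted character to None so str.translate() deletes it.
_STRIP_TABLE = str.maketrans("", "", "\t\n\r")


def clean_value(value):
    """Remove tabs, newlines and carriage returns from string values only."""
    if isinstance(value, str):
        return value.translate(_STRIP_TABLE)
    return value  # None, numbers, dates and so on pass through unchanged


def clean_rows(rows):
    """Return a cleaned copy of a fetchall()-style sequence of row tuples."""
    return [tuple(clean_value(col) for col in row) for row in rows]


sample = [("test11\ntest11", "test12\n\n", None, 42),
          ("test21\rest21", "test23\t\t\t", "ok", 3.5)]
cleaned = clean_rows(sample)
print(cleaned)
# [('test11test11', 'test12', None, 42), ('test21est21', 'test23', 'ok', 3.5)]
```

Because the function takes plain tuples, it can be unit-tested with hand-written rows like the ones above before it is pointed at real query results.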
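A final aside: do not underestimate \r, the carriage return. Many people assume a carriage return is the same as a newline, but it is not: \r is a carriage return, while \n is a line feed (new line). My earlier code did filter out \t and \n, yet the extracted data was still wrong. Only after reading the source code did I find that \r had not been filtered, and that one difference was enough to corrupt a lot of the extracted data.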
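A short Python 3 sketch of why a leftover \r is dangerous. Python's str.splitlines() is used here only as a stand-in for a line-oriented reader that treats \r as a line boundary; whether a particular Hadoop or Hive input format behaves this way is something to verify for your own setup.

```python
# \r and \n are two different characters.
print(ord("\r"), ord("\n"))  # 13 10

value = "abc\rdef"

# Filtering only \t and \n, as the earlier version of the code did,
# leaves the carriage return in place.
partly_filtered = value.replace("\t", "").replace("\n", "")
still_has_cr = "\r" in partly_filtered
print(still_has_cr)  # True

# A reader that treats \r as a line boundary splits the value in two.
pieces = partly_filtered.splitlines()
print(pieces)  # ['abc', 'def']
```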