Quickly Deleting Duplicate Records

2024-09-06 23:58:10

Contributed by: a reader

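Deleting duplicate records from a database has always been a nuisance. I have collected methods for quickly deleting duplicate records in Oracle and SQL Server for reference, and hope they prove helpful.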

SQL Server
Every SQL Server developer has probably had a similar experience: when querying or aggregating data, duplicate records in a table occasionally make the query and statistical results inaccurate. The fix is to delete the duplicates and keep only one copy of each record.

In SQL Server, apart from deleting rows by hand in tables with only a dozen or so records, duplicates are usually removed by writing code that uses a cursor to check the table row by row and delete the duplicate records. Because this approach traverses the entire table, it is workable while the table is not very large. If the table holds millions of rows, deleting with a cursor is a nightmare, because it runs for a very long time.

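Four steps to easily eliminate duplicate records

In fact, SQL Server offers a simpler method that needs no cursor: a single simple INSERT statement is enough to remove the duplicates. To explain it clearly, assume there is a product information table products with the following structure: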

create table products (
productid int,
productname nvarchar (40),
unit char(2),
unitprice money
)

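The data in the table is shown in Figure 1:

[Figure 1: data in the products table]

Figure 1 shows that the records for the products chang and tofu are duplicated in the product information table. We now want to delete these duplicates and keep only one copy of each. The steps are as follows: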

Step 1: create a temporary table with the same structure

create table products_temp (
productid int,
productname nvarchar (40),
unit char(2),
unitprice money
)

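Step 2: add an index to the temporary table and make it ignore duplicate values

In Enterprise Manager, find the temporary table products_temp created above, right-click it, choose All Tasks, choose Manage Indexes, and then choose New, as shown in Figure 2.

Set the index options as circled in Figure 2 (a unique index that ignores duplicate values).

[Figure 2: index options for products_temp]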
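The same index can presumably be created in T-SQL instead of through the GUI. The sketch below is an assumption rather than the article's own script: the index name ix_products_temp_dedup is hypothetical, and it assumes that a duplicate means a row identical in all four columns.

```sql
-- Hypothetical index name; the column list is an assumption (all four columns
-- together define a duplicate). Adjust it to whatever identifies a duplicate
-- in your own data.
CREATE UNIQUE INDEX ix_products_temp_dedup
    ON products_temp (productid, productname, unit, unitprice)
    WITH (IGNORE_DUP_KEY = ON);  -- duplicate keys are skipped with a warning
                                 -- instead of failing the whole insert
```

The WITH (IGNORE_DUP_KEY = ON) form applies to SQL Server 2005 and later; older releases wrote the option as WITH IGNORE_DUP_KEY.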

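Step 3: copy the product information into the temporary table

insert into products_temp select * from products

SQL Server then returns the following message:

Server: Msg 3604, Level 16, State 1, Line 1
Duplicate key was ignored.

This means the duplicate rows were discarded during the insert, so the temporary table products_temp contains no duplicate rows.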

Step 4: load the new data back into the original table

Empty the original product information table products, load the data from the temporary table products_temp into it, and finally drop products_temp.

delete products
insert into products select * from products_temp
drop table products_temp
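Between the delete and the insert above, products is briefly empty, so a failure at that point would lose data. One way to guard against this, offered as a sketch and not as part of the original method, is to wrap the two statements in a transaction. It assumes SQL Server 2012 or later (for THROW) and spells out the column list instead of using select *.

```sql
BEGIN TRY
    BEGIN TRANSACTION;

    -- Empty the original table and reload it from the deduplicated copy.
    DELETE FROM products;
    INSERT INTO products (productid, productname, unit, unitprice)
    SELECT productid, productname, unit, unitprice
    FROM products_temp;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo the delete if anything failed, then re-raise the error.
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;
END CATCH;

-- Intended to run only after a successful commit.
DROP TABLE products_temp;
```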

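This completes the removal of the duplicate records. Because the method works on the whole set of rows at once instead of row by row, it runs quite fast even on large tables, and because there is almost no code to write, there is little room for mistakes.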

Tip: which records this method deletes depends on the columns chosen when creating the unique index. In practice, first confirm that the columns of the unique index are correct, so that useful data is not deleted.
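One way to check the chosen columns before deleting anything is to list the groups that would be treated as duplicates. The query below is a minimal sketch that assumes all four columns of products make up the unique index; the alias copies is made up for illustration.

```sql
-- Each row returned is one group of duplicates; "copies" is how many rows
-- currently share those values. Only one row per group would survive.
SELECT productid, productname, unit, unitprice, COUNT(*) AS copies
FROM products
GROUP BY productid, productname, unit, unitprice
HAVING COUNT(*) > 1;
```

If the column list is narrowed, for example to productid alone, the same query shows which rows would be collapsed under that narrower definition.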

Oracle
In Oracle, duplicate records can be deleted by means of the unique rowid, or by creating a temporary table. Only a few simple and practical methods are covered here, using the table employee as the example.

SQL> desc employee

Name      Null?  Type
emp_id           NUMBER(10)
emp_name         VARCHAR2(20)
salary           NUMBER(10,2)

1. The following statements can be used to query the duplicate records:

SQL> select * from employee;

emp_id emp_name salary
1      sunshine 10000
2      semon    20000
3      xyz      30000
2      semon    20000

SQL> select distinct * from employee;

emp_id emp_name salary
1      sunshine 10000
2      semon    20000
3      xyz      30000

SQL> select emp_id, emp_name, salary from employee group by emp_id, emp_name, salary having count(*) > 1;

emp_id emp_name salary
2      semon    20000

SQL> select * from employee e1
where rowid in (select max(rowid) from employee e2
where e1.emp_id=e2.emp_id and
e1.emp_name=e2.emp_name and e1.salary=e2.salary);

emp_id emp_name salary
1      sunshine 10000
3      xyz      30000
2      semon    20000

2. Several ways to delete them:

(1) Using a temporary table

SQL> create table temp_emp as (select distinct * from employee);

SQL> truncate table employee; -- empty the employee table

SQL> insert into employee select * from temp_emp; -- insert the contents of the temporary table back
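The listing above stops after the re-insert. A plausible way to finish, shown here as a sketch and not as part of the original steps, is to verify the result, commit, and drop the helper table. Note that truncate is DDL in Oracle and cannot be rolled back, so temp_emp is the only copy of the data until the insert is committed. The alias copies is made up for illustration.

```sql
-- Confirm that no duplicate groups remain; this should return no rows.
SELECT emp_id, emp_name, salary, COUNT(*) AS copies
FROM employee
GROUP BY emp_id, emp_name, salary
HAVING COUNT(*) > 1;

COMMIT;               -- make the re-inserted rows permanent
DROP TABLE temp_emp;  -- the helper table is no longer needed
```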

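(2) Using the unique rowid. In Oracle every record has a rowid that is unique for each row of the table and identifies the data file, block, and row where the record is stored. Duplicate records may have identical values in every column, but their rowids always differ. So it is enough to pick, within each group of duplicates, the record with the largest or smallest rowid and delete all the others.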
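As a quick check before deleting, the rowid pseudocolumn can be selected like any other column. The query below is a small sketch that uses emp_id = 2 only because that record is duplicated in the sample data; the rowid values themselves depend on the database.

```sql
-- List every copy of one duplicated record together with its rowid.
-- The column values are identical, but each copy has a different rowid.
SELECT rowid, emp_id, emp_name, salary
FROM employee
WHERE emp_id = 2
ORDER BY rowid;
```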

SQL> delete from employee e2 where rowid not in (
select max(e1.rowid) from employee e1 where
e1.emp_id=e2.emp_id and e1.emp_name=e2.emp_name and e1.salary=e2.salary); -- min(rowid) works here too

SQL> delete from employee e2 where rowid < (
select max(e1.rowid) from employee e1 where
e1.emp_id=e2.emp_id and e1.emp_name=e2.emp_name and e1.salary=e2.salary);

(3) Also using rowid, but more efficient.

SQL> delete from employee where rowid not in (
select max(t1.rowid) from employee t1 group by t1.emp_id,t1.emp_name,t1.salary); -- min(rowid) works here too

The remaining rows:

emp_id emp_name salary
1      sunshine 10000
3      xyz      30000
2      semon    20000
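Another formulation, not taken from the text above, uses the ROW_NUMBER() analytic function to number the copies within each group and then deletes every copy after the first. The sketch below assumes the same employee table; the aliases rid and rn are made up for illustration.

```sql
DELETE FROM employee
WHERE rowid IN (
    SELECT rid
    FROM (
        SELECT rowid AS rid,
               -- Number the copies inside each group of identical rows,
               -- starting from the smallest rowid.
               ROW_NUMBER() OVER (
                   PARTITION BY emp_id, emp_name, salary
                   ORDER BY rowid
               ) AS rn
        FROM employee
    )
    WHERE rn > 1   -- rn = 1 is the copy that is kept
);
```

Like the group by version, this keeps exactly one row per group; because of order by rowid it keeps the copy with the smallest rowid.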
