跳转至

检索排名前三和删除重复


准备工作

创建表

CREATE TABLE `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(100) DEFAULT NULL,
  `score` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

插入数据

INSERT INTO test.test (name,score) VALUES
     ('zhangsan',100),
     ('zhangsan',100),
     ('lisi',99),
     ('whangwu',99),
     ('lisi',98),
     ('zhaoliu',97);

题目1: 查出来分数前三的同学

步骤一: 检索分数最高的前三个, 同时考虑到可能出线分数相同(名次并列)的情况

-- 方法1, 分数去重
select distinct score from test limit 3
-- 方法2, 分数分组
select score from test group by score order by score desc limit 3

步骤二: 检索分数等于最高的前三个的同学即是答案

直接执行sql: select * from test where score in (select distinct score from test limit 3)

会报错并提示: This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'

即不支持在子查询中使用limit, 解决办法就是把子查询变成一张派生表: select * from (select distinct score from test limit 3) as t

然后再使用子查询(即多嵌套一层):

select * from test where score in
(select * from
(select distinct score from test limit 3) as t
)

题目2: 删除名称重复的数据

步骤一: 检索出名称重复的数据

直接执行sql: select name, count(name) as counts from test where count(name) > 1 group by name

会报错并提示: Invalid use of group function

原因是where后面的条件中不能使用聚合函数, 要使用having语法替代:

select name, count(name) as counts from test group by name having counts > 1

步骤二: 删除重复的数据

通过max()或min()函数选择保留ID最大或最小的数据: select min(id) as min_id from test group by name

剩余的即是需要删除的

直接执行sql: delete from test where id not in (select min(id) as min_id from test group by name)

会报错并提示: You can't specify target table 'test' for update in FROM clause

原因是, 不能先查出同一个表中的数据以后再进行更新(包括删除), 解决办法与题目1中的步骤2一样, 生成派生表即可, 然后通过派生表删除:

delete from test where id not in
(select min_id from
(select min(id) as min_id from test group by name) as t
)

其他解法:

-- 方法1
select a.id 
from (
   select ROW_NUMBER() over (partition by name order by id) as d
   ,name
   ,id 
   from test) as a
where a.d>1

-- 方法2
select t.id  from test t 
inner join 
  (select count(1),name,min(id) as id from test group by name having count(1)>1) as a 
 on t.id <> a.id and t.name =a.name

 -- 上面两个都能直接查出来需要删除的数据, 把select改成delete即可, 且效率比嵌套子查询高

题目3: 删除名称和分数重复的数据

题目3解答思路同题目2, 但是所有字段重复才算重复, 所以select和group by要列出所有字段:

delete from test where id in
(select max_id from
(select max(id) as max_id, name, score, count(*) as counts from test group by name, score having counts > 1) as t
)

PS: count(name) count(score) count(*) 效果一样