How to remove duplicates in Postgres SQL

Mary-Kate Olsen

Nov 26, 2024 pm 03:48 PM

Crossposted on my blog
You can read it here

Our schema

create table "post" (
  id SERIAL PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  content TEXT NOT NULL
);

-- "user" is a reserved word in Postgres, so it must always be double-quoted
create table "user" (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL
);

create table "post_like" (
  id SERIAL PRIMARY KEY,
  post_id INTEGER NOT NULL REFERENCES "post"(id),
  user_id INTEGER NOT NULL REFERENCES "user"(id)
);

Now we want to ensure that a user cannot like the same post more than once.
This can be enforced by either:

adding a unique constraint on the (post_id, user_id) pair of columns of the post_like table,
or removing the id column of the post_like table and using a composite primary key on (post_id, user_id).
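
As a rough sketch, the two options could look like the statements below. The constraint name post_like_post_id_user_id_key is an assumption chosen for illustration; it does not come from the schema above. Both options fail with an error while duplicate (post_id, user_id) pairs still exist in the table.

```sql
-- Option 1: keep the id column and add a unique constraint on the pair.
-- The constraint name is hypothetical; any valid identifier works.
alter table post_like
  add constraint post_like_post_id_user_id_key unique (post_id, user_id);

-- Option 2 (an alternative to option 1, not an additional step):
-- drop the surrogate id column and use the pair as a composite primary key.
-- Dropping the column also drops the primary key constraint defined on it.
alter table post_like drop column id;
alter table post_like add primary key (post_id, user_id);
```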

But, assuming we are at a point where duplicates are already there, we need to remove them.

Check if there are duplicates

select
  post_id,
  user_id,
  count(*)
from post_like
group by post_id, user_id
having count(*) > 1 -- a pair that appears more than once is duplicated
;

| post_id | user_id | count |
| ------- | ------- | ----- |
| 3       | 2       | 2     |

This output tells us that user 2 has liked post 3 more than once, specifically 2 times.

Remove duplicates

Now that we know that there are duplicates, we can remove them.

We split this process into three steps:

read duplicates
remove duplicates (dry run)
remove duplicates (real run)

Read duplicates

Transaction rollback

To test our queries without removing real data until we are sure they are correct, we use the transaction rollback feature.

Because the transaction is rolled back, our changes are never committed. This is similar to the "dry run" concept found in other tools (like rsync).
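
A minimal sketch of this dry-run behaviour, using the post_like table from the schema above. The delete here is deliberately drastic to make the effect visible; it is only an illustration, not a step of the procedure.

```sql
begin; -- start transaction

delete from post_like;            -- would empty the table if committed

select count(*) from post_like;   -- 0: the delete is visible inside the transaction

rollback; -- discard everything done since begin

select count(*) from post_like;   -- the original number of rows again
```

Keep in mind that the rows touched inside the transaction stay locked until the rollback, so on a busy database a dry run should be kept short.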

CTE

We use a CTE (common table expression) because it provides a good developer experience.

With a CTE, we can run a query, give its result a name, and then reference that name like a table in the following queries of the same statement.

This mental model is similar to what we usually do in code when we create a temporary variable.

The CTE syntax is:
 with 
 <cte_name> as (
   <query>
 ),
 <cte_name_2> as (
   <query_2> -- here we can reference <cte_name>
 )
 <final_query> -- here we can reference <cte_name> and <cte_name_2>
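
To make the placeholder syntax concrete, here is a small sketch against the schema above. The CTE names likes_per_post and popular_posts are made up for this example; the second CTE reads from the first, and the final query reads from the second.

```sql
with
likes_per_post as (
  -- one row per post with its number of likes
  select post_id, count(*) as like_count
  from post_like
  group by post_id
),
popular_posts as (
  -- here we reference likes_per_post as if it were a table
  select post_id, like_count
  from likes_per_post
  where like_count > 1
)
select post.title, popular_posts.like_count
from popular_posts
join post on post.id = popular_posts.post_id
;
```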

With both transaction and CTE, we can do the following:

begin; -- start transaction

with
duplicates_info as (
  select
    -- order by id makes the numbering deterministic:
    -- the row with the lowest id in each group gets 1
    row_number() over (
      partition by post_id, user_id order by id
    ) as group_index,
    id,
    post_id,
    user_id
  from post_like
)
select *
from duplicates_info
;

rollback; -- end the transaction, discarding every change to the database

| group_index | id | post_id | user_id |
| ----------- | -- | ------- | ------- |
| 1           | 1  | 1       | 1       |
| 1           | 2  | 2       | 2       |
| 1           | 3  | 3       | 2       |
| 2           | 4  | 3       | 2       |

The last row of the results, where group_index is 2, is the second row in the group with post_id = 3 and user_id = 2.

What happens here with the syntax?

row_number() over (partition by ...) as group_index is a window function call that first groups the rows by the columns in the partition by clause, and then assigns a number to each row based on its position within its group.

partition by is similar to group by, because it groups rows by common column values. But while group by returns only one row per group, partition by keeps every row and lets us add new columns computed from the group each row belongs to.

group_index is a column alias, which is regular SQL syntax.
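
The difference between group by and partition by can be seen by counting likes per (post_id, user_id) pair in both ways. This is a sketch against the post_like table above; the like_count alias is chosen just for the example.

```sql
-- group by: collapses each (post_id, user_id) group into a single row,
-- so the id of the individual likes is no longer available
select post_id, user_id, count(*) as like_count
from post_like
group by post_id, user_id
;

-- partition by: keeps every row of post_like (including its id)
-- and adds the size of its group as an extra column
select
  id,
  post_id,
  user_id,
  count(*) over (partition by post_id, user_id) as like_count
from post_like
;
```

Because the window version keeps the id of every row, it can tell us exactly which rows to delete, which group by alone cannot.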

Filter only duplicates

Now let's keep only items with group_index > 1, which means that the row is not the first one in the group, or in other words, it is a duplicate.

begin; -- start transaction

with
duplicates_info as (
  select
    row_number() over (
      partition by post_id, user_id order by id
    ) as group_index,
    id,
    post_id,
    user_id
  from post_like
)
select *
from duplicates_info
where group_index > 1 -- keep only the duplicates
;

rollback; -- end the transaction, discarding every change to the database

| group_index | id | post_id | user_id |
| ----------- | -- | ------- | ------- |
| 2           | 4  | 3       | 2       |

We need to remove only this row, with id 4.

Remove duplicates - dry run

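Now rewrite the final query so that it reads from the post_like table rather than from the duplicates_info CTE.
We still use the duplicates_info CTE to get the ids of the duplicates.

begin; -- start transaction

with
duplicates_info as (
  select
    row_number() over (
      partition by post_id, user_id order by id
    ) as group_index,
    id,
    post_id,
    user_id
  from post_like
)
select *
from post_like
where id in (
  select id
  from duplicates_info
  where group_index > 1
)
;

rollback; -- end the transaction, discarding every change to the database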

We will see the records that we want to remove.

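After we have checked that they are correct, we swap select with delete.

begin; -- start transaction

with
duplicates_info as (
  select
    row_number() over (
      partition by post_id, user_id order by id
    ) as group_index,
    id,
    post_id,
    user_id
  from post_like
)
delete
from post_like
where id in (
  select id
  from duplicates_info
  where group_index > 1
)
;

rollback; -- end the transaction, discarding every change to the database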

This last query is what we finally want to execute.

But because we still have the rollback statement, these changes are simulated and not applied to the database.
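
One optional refinement for the dry run, not part of the steps above: Postgres supports a returning clause on delete, which makes the statement return the rows it removed. A sketch, assuming the same duplicates_info CTE (here ordered by id, so that the row with the lowest id in each group is the one kept):

```sql
begin; -- start transaction

with
duplicates_info as (
  select
    row_number() over (
      partition by post_id, user_id order by id
    ) as group_index,
    id
  from post_like
)
delete from post_like
where id in (
  select id
  from duplicates_info
  where group_index > 1
)
returning * -- show the deleted rows
;

rollback; -- dry run: nothing is actually deleted
```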

Remove duplicates - real run

Finally we can remove the duplicates for real.
Here we use commit instead of rollback, so that the changes are applied to the database.

Final Code

begin; -- start transaction

with
duplicates_info as (
  select
    row_number() over (
      partition by post_id, user_id order by id
    ) as group_index,
    id,
    post_id,
    user_id
  from post_like
)
delete
from post_like
where id in (
  select id
  from duplicates_info
  where group_index > 1
)
;

commit; -- end the transaction, applying the changes to the database
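
As a possible follow-up, the delete and the unique constraint mentioned at the beginning can be run in the same transaction. This is only a sketch: the constraint name post_like_post_id_user_id_key is an assumption made for the example. If adding the constraint fails, for example because a duplicate is still present, the transaction is aborted and the delete is not applied either.

```sql
begin; -- start transaction

with
duplicates_info as (
  select
    row_number() over (
      partition by post_id, user_id order by id
    ) as group_index,
    id
  from post_like
)
delete from post_like
where id in (
  select id
  from duplicates_info
  where group_index > 1
)
;

-- hypothetical constraint name; prevents new duplicates from now on
alter table post_like
  add constraint post_like_post_id_user_id_key unique (post_id, user_id);

commit; -- apply both the delete and the new constraint
```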

Conclusion

I write articles mainly to help my future self or to help the growth of the tools I use in my work.

If this article was helpful to you, leave a like.

Would you like me to talk about a particular topic?

Tell me in the comments!
