
How to remove duplicate lines from a file?

Jul 15, 2025 am 01:25 AM

When removing duplicate lines from a file, pay attention to whether the original order must be preserved and how large the file is. 1. Combining sort and uniq deduplicates quickly, but the original order is lost; 2. To preserve the original order, use awk; 3. For large files, consider chunked processing, importing into a database, or a memory-conscious script; 4. A Python script suits medium-sized files and allows more customization; 5. Back up the file and check for hidden characters before deduplicating. Choose the method that fits your specific needs.

Removing duplicate lines from a file is not difficult, but a few details matter. The most direct way is to use command-line tools, such as the combination of sort and uniq on Linux or macOS, or to write a short script. If you only need to remove exactly identical lines, nothing complicated is required; if you also care about preserving order, handling large files, or partial matching, you have to choose the method accordingly.


Quickly deduplicate with the command line

This is the most common and fastest approach, and it suits most text files. The basic idea is to sort first and then merge duplicates:

 sort filename.txt | uniq > output.txt
  • sort orders the lines (alphabetically by default) so that identical lines end up next to each other.
  • uniq collapses adjacent repeated lines into a single line.
  • The result is saved to output.txt , and the original file remains unchanged.

Note: This method does not keep the original order. If the order matters, you cannot use it as is.
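
To make the loss of order concrete, here is a minimal Python sketch of what the pipeline does, assuming the lines fit in memory. The function name sort_uniq is made up for illustration, and Python compares strings by code point, which may differ from the locale-aware order that sort uses.

```python
from itertools import groupby


def sort_uniq(lines):
    """Mimic `sort | uniq`: sort, then collapse adjacent equal lines."""
    ordered = sorted(lines)
    # groupby only groups neighbouring equal items, which is how uniq behaves
    return [key for key, _group in groupby(ordered)]


sample = ["pear\n", "apple\n", "pear\n", "banana\n", "apple\n"]
result = sort_uniq(sample)
print(result)  # ['apple\n', 'banana\n', 'pear\n']
```

The input started with pear, but the output starts with apple: the duplicates are gone and so is the original order.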


Keep the original order to deduplicate

If you want to keep the first occurrence of each line and remove later repeats, you can use awk :

 awk '!seen[$0]++' filename.txt > output.txt

The meaning of this command is:

  • Each line that is read is looked up in the array seen , using the whole line as the key ( seen[$0] ).
  • If the line has not appeared yet ( !seen[$0] is true), it is printed. The ++ then increases its count by one.
  • Later repeats of the line have a nonzero count, so they are not printed.

This method is very practical, especially for logs and lists whose order must be preserved.
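
The check-then-count logic described above can be spelled out step by step. The following is only an illustrative Python sketch: the function name awk_style_dedupe is hypothetical, and a dict stands in for awk's associative array.

```python
def awk_style_dedupe(lines):
    """Yield a line only if its counter was still 0 before counting it."""
    seen = {}  # plays the role of awk's associative array `seen`
    for line in lines:
        count = seen.get(line, 0)  # unseen keys read as 0, as in awk
        seen[line] = count + 1     # count this occurrence
        if count == 0:             # the test uses the value before counting
            yield line


log = ["start\n", "retry\n", "start\n", "done\n", "retry\n"]
deduped = list(awk_style_dedupe(log))
print(deduped)  # ['start\n', 'retry\n', 'done\n']
```

Every distinct line is kept in memory as a key, so memory use grows with the number of unique lines, not with the total file size.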


How to handle large files more efficiently?

If the file is particularly large (hundreds of MB or more than a GB), loading it into memory all at once may be unrealistic. In that case, consider the following:

  • Chunked processing: Split the file into several smaller files, deduplicate each one, then merge the results and deduplicate once more, because the same line can appear in different chunks.
  • Use a database: Import the lines into a lightweight database such as SQLite and use DISTINCT to deduplicate.
  • Memory-conscious script: Read the file line by line, for example with a Python generator, instead of loading all of the content at once.
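
As a rough sketch of the database option, the following uses Python's built-in sqlite3 module. The function name dedupe_with_sqlite, the table name lines and the column name content are assumptions made for this example. Note that SELECT DISTINCT does not promise any particular output order.

```python
import io
import sqlite3


def dedupe_with_sqlite(in_lines, out_file, db_path=":memory:"):
    """Load lines into SQLite and write the distinct ones to out_file."""
    # ":memory:" keeps the demo self-contained; for a really large input,
    # pass a path to a new database file so the rows are stored on disk.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE lines (content TEXT)")
        conn.executemany(
            "INSERT INTO lines (content) VALUES (?)",
            ((line,) for line in in_lines),  # rows are streamed one at a time
        )
        for (content,) in conn.execute("SELECT DISTINCT content FROM lines"):
            out_file.write(content)
    finally:
        conn.close()


buffer = io.StringIO()
dedupe_with_sqlite(["a\n", "b\n", "a\n"], buffer)
print(sorted(buffer.getvalue().splitlines()))  # ['a', 'b']
```

With real files, an open input file can be passed as in_lines and an open output file as out_file, since both are handled one line at a time.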

Python example (for medium-sized files):

seen = set()  # lines that have already been written
with open('output.txt', 'w') as out_file:
    with open('filename.txt', 'r') as in_file:
        for line in in_file:  # the file is read one line at a time
            if line not in seen:
                seen.add(line)
                out_file.write(line)

Although this approach is somewhat slower than the command-line tools, it gives you control over more details, such as ignoring blank lines or comparing case-insensitively.
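
A hedged sketch of such customization follows. The function name dedupe_custom and its two options are invented for this example; it compares a normalized key but writes the first original spelling of each line.

```python
def dedupe_custom(lines, ignore_case=True, skip_blank=True):
    """Yield first occurrences, comparing lines by a normalized key."""
    seen = set()
    for line in lines:
        if skip_blank and not line.strip():
            continue  # drop blank or whitespace-only lines entirely
        key = line.rstrip("\r\n")  # ignore line-ending differences
        if ignore_case:
            key = key.casefold()   # case-insensitive comparison
        if key not in seen:
            seen.add(key)
            yield line             # keep the original spelling


sample = ["Apple\n", "\n", "apple\n", "Banana\n", "APPLE\n"]
custom = list(dedupe_custom(sample))
print(custom)  # ['Apple\n', 'Banana\n']
```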


A few tips

  • It is best to back up the original file before deduplicating.
  • Make sure no hidden characters affect the comparison, such as trailing spaces or different line endings.
  • If you are not sure whether lines are really duplicates, first use diff to compare the original file with the deduplicated file.
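
One possible way to look for hidden characters before deduplicating is to group lines that become identical once trailing whitespace and line endings are removed. This is only a sketch, and the function name find_hidden_variants is hypothetical.

```python
from collections import defaultdict


def find_hidden_variants(lines):
    """Group lines that differ only in trailing whitespace or line endings."""
    groups = defaultdict(set)
    for line in lines:
        groups[line.rstrip()].add(line)  # rstrip removes spaces, tabs, \r, \n
    # keep only the keys that were written in more than one way
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


sample = ["alpha\n", "alpha \n", "beta\r\n", "beta\n", "gamma\n"]
variants = find_hidden_variants(sample)
for key, forms in variants.items():
    print(key, [repr(form) for form in forms])  # repr makes \r and spaces visible
```

Any key reported here marks lines that look the same on screen but would not be treated as duplicates by an exact comparison.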

That is basically all there is to it. The methods are not complicated, but the details are easy to overlook. Choose the method that fits your specific needs.
