

Large CSV Processing Using Go

Nov 27, 2024 am 12:54 AM

The idea is: given a large dummy CSV (1 million rows) containing sample customer data, process it to meet the following goals:

  • Extract the data from the CSV
  • Count the number of rows
  • Group customers by city and count them
  • Sort cities by customer count, from highest to lowest
  • Measure the processing time

Sample customer CSV files can be downloaded from https://github.com/datablist/sample-csv-files

Load and Extract Data

Go has a CSV package (encoding/csv) in its standard library, so we don't need a third-party dependency to solve this problem, which is nice. The solution is pretty straightforward:

  // requires imports: encoding/csv, log, os
  // open the file; *os.File satisfies the io.Reader interface
  c, err := os.Open("../data/customers-1000000.csv")
  if err != nil {
    log.Fatal(err)
  }
  defer c.Close()

  // wrap the file in a csv.Reader
  // FieldsPerRecord = -1 disables the per-row field count check
  r := csv.NewReader(c)
  r.FieldsPerRecord = -1
  // ReuseRecord only affects Read; ReadAll always allocates a new slice per row
  r.ReuseRecord = true
  records, err := r.ReadAll()
  if err != nil {
    log.Fatal(err)
  }

  1. Open the file at the given path.
  2. Wrap the opened file in a CSV reader.
  3. Read all extracted rows into the records slice for later processing.

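FieldsPerRecord is set to -1 so the reader skips the per-row field count check, because the number of columns may differ between CSV formats. By default (0), the first row sets the expected field count and any row with a different count is reported as an error.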

At this point we can load and extract all the data from the CSV and are ready for the next processing step. We can also get the number of rows with len(records); note that this count includes the header row.

Grouping Total Customers by City

Now we can iterate over the records and build a map from each city name to its total number of customers, which looks like this:

["Jakarta": 10, "Bandung": 200, ...]

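In this code the city is taken from the 7th column of each row (index 6, since Go slices are zero-based). If your file orders its columns differently, adjust the index to match its header row: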

  // build a map of city name -> total customers from the CSV rows
  // e.g. ["city name": 100, ...]
  m := map[string]int{}
  for i, record := range records {
    // skip the header row
    if i == 0 {
      continue
    }
    if _, found := m[record[6]]; found {
      m[record[6]]++
    } else {
      m[record[6]] = 1
    }
  }

If the city is not yet a key in the map, add it with a customer count of 1. Otherwise, just increment the count for that city.
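Two small refinements are possible here, sketched below. Reading a missing key from a Go map returns the zero value, so m[city]++ alone both creates and increments the entry, which makes the if/else unnecessary. Looking the column up by its header name also avoids relying on a hard-coded index like record[6]. The helper names are hypothetical, and the sketch assumes the header row contains a column literally named "City".

```go
package main

import (
	"errors"
	"fmt"
)

// columnIndex returns the position of name in the header row, or -1.
func columnIndex(header []string, name string) int {
	for i, h := range header {
		if h == name {
			return i
		}
	}
	return -1
}

// countByColumn counts how many data rows share each value of the named
// column. It assumes records[0] is the header row.
func countByColumn(records [][]string, name string) (map[string]int, error) {
	if len(records) == 0 {
		return nil, errors.New("no rows")
	}
	idx := columnIndex(records[0], name)
	if idx < 0 {
		return nil, fmt.Errorf("column %q not found in header", name)
	}
	m := map[string]int{}
	for _, record := range records[1:] {
		if idx >= len(record) {
			continue // skip short rows (possible with FieldsPerRecord = -1)
		}
		// A missing key reads as 0, so ++ creates or increments the entry.
		m[record[idx]]++
	}
	return m, nil
}

func main() {
	records := [][]string{
		{"Index", "City", "Country"},
		{"1", "Jakarta", "Indonesia"},
		{"2", "Bandung", "Indonesia"},
		{"3", "Jakarta", "Indonesia"},
	}
	m, err := countByColumn(records, "City")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(m["Jakarta"], m["Bandung"])
}
```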

Now map m holds every city and how many customers it has, which solves the problem of grouping customers by city.

Sorting by Highest Customer Count

I looked for a function in the standard library to sort a map, but there isn't one: Go maps are unordered, so sorting is only possible on a slice, where elements can be rearranged by index position. So let's build a slice from our map.

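  // CityDistribution pairs a city name with its customer count
  type CityDistribution struct {
    City          string
    CustomerCount int
  }
  // convert the map to a slice first, since only slices can be sorted
  dc := make([]CityDistribution, 0, len(m))
  for k, v := range m {
    dc = append(dc, CityDistribution{City: k, CustomerCount: v})
  }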

Now, how do we sort it by CustomerCount from highest to lowest? A simple algorithm for this is bubble sort. It is not the fastest, but it does the job.

Bubble Sort is the simplest sorting algorithm that works by repeatedly swapping the adjacent elements if they are in the wrong order. This algorithm is not suitable for large data sets as its average and worst-case time complexity is quite high.

Reference: https://www.geeksforgeeks.org/bubble-sort-algorithm/

Applied to our slice, bubble sort loops over the data, compares each element's CustomerCount with the next one, and swaps the two if the current count is smaller, so higher counts move toward the front. You can check the details of the algorithm on the reference website.
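A minimal bubble sort sketch over the CityDistribution slice, ordering by CustomerCount from highest to lowest. The function name bubbleSortDesc and the early exit when a pass makes no swaps are my own choices for illustration, not necessarily how the original program does it.

```go
package main

import "fmt"

// CityDistribution pairs a city with its customer count (as in the post).
type CityDistribution struct {
	City          string
	CustomerCount int
}

// bubbleSortDesc sorts dc in place by CustomerCount, highest first.
func bubbleSortDesc(dc []CityDistribution) {
	n := len(dc)
	for i := 0; i < n-1; i++ {
		swapped := false
		// After pass i, the i+1 smallest counts are already at the end.
		for j := 0; j < n-1-i; j++ {
			if dc[j].CustomerCount < dc[j+1].CustomerCount {
				dc[j], dc[j+1] = dc[j+1], dc[j]
				swapped = true
			}
		}
		if !swapped {
			break // no swaps in a full pass: already sorted
		}
	}
}

func main() {
	dc := []CityDistribution{
		{City: "Jakarta", CustomerCount: 10},
		{City: "Bandung", CustomerCount: 200},
		{City: "Surabaya", CustomerCount: 50},
	}
	bubbleSortDesc(dc)
	fmt.Println(dc) // [{Bandung 200} {Surabaya 50} {Jakarta 10}]
}
```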

My complete sorting code is in the GitHub repository linked at the end of this post. By the end of the loop, the slice is sorted by CustomerCount from highest to lowest.
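As an alternative to a hand-written loop, the standard library's sort.Slice sorts any slice using a caller-supplied "less" function, with O(n log n) comparisons instead of bubble sort's O(n²) (Go 1.21+ also offers slices.SortFunc). sort.Slice is not stable, so this sketch breaks ties by city name to keep the output deterministic; that tie-break is my own addition.

```go
package main

import (
	"fmt"
	"sort"
)

// CityDistribution pairs a city with its customer count (as in the post).
type CityDistribution struct {
	City          string
	CustomerCount int
}

// sortByCountDesc orders dc by CustomerCount, highest first; equal counts
// are ordered by city name so the result does not depend on map order.
func sortByCountDesc(dc []CityDistribution) {
	sort.Slice(dc, func(i, j int) bool {
		if dc[i].CustomerCount != dc[j].CustomerCount {
			return dc[i].CustomerCount > dc[j].CustomerCount
		}
		return dc[i].City < dc[j].City
	})
}

func main() {
	m := map[string]int{"Jakarta": 10, "Bandung": 200, "Surabaya": 50}
	dc := make([]CityDistribution, 0, len(m))
	for k, v := range m {
		dc = append(dc, CityDistribution{City: k, CustomerCount: v})
	}
	sortByCountDesc(dc)
	fmt.Println(dc) // [{Bandung 200} {Surabaya 50} {Jakarta 10}]
}
```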

Calculate Processing Time

Calculating the processing time is simple: take a timestamp before and after the main part of the program runs and compute the difference. In Go, time.Now() and time.Since() from the standard time package are enough for this.

The Result

Run the program (for example with go run).

The program prints the row count, the sorted data, and the processing time.

As expected from Go, it processed the 1-million-row CSV in under 1 second!

The complete code is published in my GitHub repository:

https://github.com/didikz/csv-processing/tree/main/golang

Lessons Learned

  • CSV processing is already available in Go's standard library (encoding/csv), so there is no need for a third-party library
  • Processing the data is quite easy. The challenge was figuring out how to sort the data, since a map cannot be sorted and I ended up sorting a slice manually

What Comes to Mind?

I think my current solution could be optimized further. I loop over all the extracted records to build the map, and, as the source of ReadAll() shows, it also loops over the file reader to build the records slice. So 1 million rows means two loops over 1 million rows, which is not ideal.

If I read the data directly from the file reader, a single loop would be enough, because I could build the map while reading. The only reason to keep the records slice would be if it were used elsewhere, which is not the case here.

I haven't had time to work it out yet, but I can see some downsides to doing it manually:

  • I would probably need to handle more parsing errors myself
  • I am not sure the reduction in processing time would be significant enough to make the workaround worthwhile

Happy Coding!
