

How I processed over 1 Million Entries per request - With Go!

Nov 05, 2024 am 04:45 AM


Before we start, a bit of context about who I am and why it matters in this case. I'm a software developer at Notebook Manufacturing Company, working here for the past two years. In the team I'm in right now, I'm the sole developer, responsible for creating, monitoring and maintaining data pipelines, automations, and a monitoring system with Go and Grafana.
Another point I may add: in the previous year, I was an intern. And I'm self-taught.

Ok, but why does that matter?

Well, I had no senior developer, or any form of guidance from inside, to turn to when I was facing roadblocks or problems I could not figure out on my own, and that's the main reason I'm writing. I think it was a fun thing to do and I want to share it. It won't be an amazing piece of software, a breakthrough or anything like that, but a reminder that it's possible to build things from the ground up. You don't need to be the one-in-a-million 10x developer or anything like that.

P.S: I use neovim btw.

One Million Lines

This number may look like an exaggeration or something like that, and, although I wish that were the case, it isn't. See, when working in the manufacturing context, we sometimes don't take into account how many items are involved, and how much data is generated when you have to keep track of every point of the many manufacturing queues, for each component.
Be it a little screw, be it a last-gen CPU, it must be tracked, with timestamps!
So, yes, the amount of data generated is correct, and that's just the start. Here are the other points that make the problem cool (from my perspective):

  • The source of this data wasn't compressed or anything like that, so the average request time was close to 2 to 3 minutes on a good day. It could go over 5 minutes if the source was lagging (it's a Java system older than me).
  • About the format, I could choose between dealing with an XML response or a CSV response; of course I took the CSV route. I'm no masochist. (A streaming-read sketch follows this list.)
  • And there was a need to choose the interval of the data fetching, with a minimum of one hour (no, I don't know why; yes, I've tried shorter intervals).
  • The source had no lazy loading or cache system, so two identical requests would generate the same response, just at different times.
  • And finally, to spice things up, it's a Microsoft thing, known as SQL Server Reporting Services, with its own obscure caveats. Oh, and the database I'm required to use is MSSQL, which is a pain.
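
As promised above, here is a minimal sketch of streaming the CSV response with Go's encoding/csv instead of loading the whole thing at once. The local file stands in for the HTTP response body; none of this is the real report:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"os"
)

func main() {
	// Stand-in for the HTTP response body; in the real pipeline this
	// would be resp.Body from the report request.
	f, err := os.Open("report.csv")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	r := csv.NewReader(f)
	header, err := r.Read() // the first record holds the column names
	if err != nil {
		panic(err)
	}
	fmt.Println("columns:", len(header))

	rows := 0
	for {
		// Read one record at a time instead of ReadAll, so memory stays
		// flat no matter how many lines the report returns.
		if _, err := r.Read(); err == io.EOF {
			break
		} else if err != nil {
			panic(err)
		}
		rows++
	}
	fmt.Println("rows:", rows)
}
```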

That's the whole context, and now the fun part begins.

The first iteration - DMP

The way I like to approach software is pretty simple. Make it ugly and barely functional -> Keep it ugly and improve functionality -> Still ugly, but better optimized -> Pretty, optimized and working -> And then your manager says that the work you did is way too good and the source can't handle it, causing outages. FML.

Anyway, DMP stands for Dumb Mode Protocol: just make it work in the dumbest way you can, which means making it barely work.

So, for the first round of it, my goal was simple: authenticate, make the request, parse the data, send it to the database. Pretty simple, right? And, on paper, it was, until I discovered that the authentication and authorization method I had to use was NTLMSSP, a challenge-response auth method that I didn't know existed. Matter of fact, I had to go into legacy .NET 4 code to find it. I had never done C#.
After going through legacy code older than me, suffering trying to understand it because, for some reason, whoever wrote it thought it was a good idea to hide it under 5 layers of abstraction, constructors and OOP things. Not fun, humbling experience. And then I made it work. An hour later I got my user blocked because, apparently, the source has rate limiting. BUT NO CACHE OR LAZY LOADING.
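
For the curious, NTLM doesn't have to be reimplemented by hand in Go. A minimal sketch, assuming the Azure/go-ntlmssp package handles the challenge-response handshake, would look like this; the endpoint and credentials are placeholders, not the real system:

```go
package main

import (
	"fmt"
	"net/http"

	"github.com/Azure/go-ntlmssp"
)

func main() {
	// Wrap the transport with an NTLM negotiator so the challenge-response
	// handshake happens transparently on each request.
	client := &http.Client{
		Transport: ntlmssp.Negotiator{
			RoundTripper: &http.Transport{},
		},
	}

	// Hypothetical SSRS endpoint asking for a CSV export.
	req, err := http.NewRequest("GET", "https://reports.example.local/ReportServer?%2FPlant%2FTracking&rs:Format=CSV", nil)
	if err != nil {
		panic(err)
	}
	req.SetBasicAuth(`DOMAIN\serviceuser`, "password")

	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```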

Well, after all of that, I just needed to pass the query parameters and get the data, no more complications with the source. Ok, let's look up the documentation for the query parameters.

...

At this point, all things considered, you probably guessed it. Documentation? Is this some kind of exotic food that is only served in Silicon Valley?
Anyway, after racking my brain, I decided to inspect the site/interface of this SQL Server Reporting Services and, to my joy, and then hatred, it had a way to see the filters and their values.

Only to find out that the ("All") filter on the page, to select, let's say, all the manufacturing lines, is just an abstraction for having all the boxes checked. Ok, let's copy the filters, put them into the querystring, encode it and be happy!
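
In Go terms, "copy the filters into the querystring" boils down to something like this; the report path and parameter names are made up, and each checked box becomes one more repetition of the same parameter:

```go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	params := url.Values{}
	params.Set("rs:Format", "CSV")

	// The "(All)" option in the UI is just every checkbox ticked, so the
	// querystring repeats the same (hypothetical) parameter once per value.
	for _, line := range []string{"LINE_01", "LINE_02", "LINE_03"} {
		params.Add("ManufacturingLine", line)
	}

	reportURL := "https://reports.example.local/ReportServer?%2FPlant%2FTracking&" + params.Encode()
	fmt.Println(reportURL)
}
```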

IT WORKED! Or so I thought.

Remember when I said that my user got blocked? Well, it turns out that having the admin permissions to perform such tasks, and being authorized by the C-Suite to do it (in fact, it was requested by them), wasn't enough to allow me to make multiple requests. My user got blocked, but to my manager it was the same as saying "nothing happened". Fortunately, I managed to get it unblocked fairly quickly.

In the meantime, I decided to work on how I would approach the rest of the project. At this point I already had a sample of the data, and it lived up to the title of this post. Seeing the seven digits made me question whether I was able to do it, since I had zero experience with this data volume.

To get my ideas in shape, I hopped into Excalidraw and designed what I wanted to do. I knew the concept of worker pools but hadn't implemented one before. So, I read about it, saw a few implementations and asked a friend of mine, who helped me a lot. He's the senior I didn't have.

After writing it out in pseudocode, I thought to myself:

"Wow, this is pretty neat, it surely won't generate memory leaks with all those channels and goroutines, right?"

Ok, it might not have been those exact words, but it was along those lines. After doing the first iteration of it, managing to do the request (without getting blocked), and loading it into memory, I applied the worker pool concept.
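
A worker pool in Go is essentially a shared jobs channel with a bunch of goroutines draining it. Here's a stripped-down sketch of the concept (the worker count and Row type are illustrative, not the actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// Row stands in for one parsed CSV record.
type Row []string

func main() {
	jobs := make(chan Row, 1000)
	var wg sync.WaitGroup

	// Spin up a fixed number of workers that all consume from the same channel.
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for row := range jobs {
				_ = row // the database insert would happen here
			}
		}()
	}

	// Producer side: feed every parsed row into the pool.
	for i := 0; i < 1_000_000; i++ {
		jobs <- Row{fmt.Sprint(i)}
	}
	close(jobs) // closing the channel lets the workers drain and exit
	wg.Wait()
}
```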

And then I faced the BSOD. Funnily enough, it was on the same day as the CrowdStrike incident; I surely didn't think I had caused a major outage at the manufactory.
Also, yes, I gotta use Windows to work, but fret not! I use WSL2.

After going through the enormous stack trace of my first stack overflow, I discovered the mistake. When sending the data to the consumer channel, I didn't take into account that some of the data could error, mainly due to primary key violations, or simply because, maybe, I wasn't properly handling errors.

Lesson learned: use an error channel to avoid major problems. And handle your errors, be it via a simple string check (ugly, but works) or simply by logging them at the top level, and be happy. Since the errors I faced weren't critical, I could keep going.
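
In practice that looks something like this: a buffered error channel the workers report into, drained at the top level. The string check and the fake insert are illustrative only:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

func main() {
	jobs := make(chan string, 100)
	errs := make(chan error, 100) // buffered so workers never block on reporting
	var wg sync.WaitGroup

	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for row := range jobs {
				if err := insert(row); err != nil {
					errs <- fmt.Errorf("insert %q: %w", row, err)
				}
			}
		}()
	}

	// Close the error channel only after every worker has finished.
	go func() {
		wg.Wait()
		close(errs)
	}()

	for _, row := range []string{"a", "b", "dup"} {
		jobs <- row
	}
	close(jobs)

	// Top-level handling: non-critical errors get logged, not crashed on.
	for err := range errs {
		if strings.Contains(err.Error(), "PRIMARY KEY") { // the ugly-but-works check
			fmt.Println("skipping duplicate:", err)
			continue
		}
		fmt.Println("error:", err)
	}
}

// insert is a stub standing in for the real database write.
func insert(row string) error {
	if row == "dup" {
		return errors.New("Violation of PRIMARY KEY constraint")
	}
	return nil
}
```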

The second iteration. Memory leaks.

The amount of memory leaks I generated in this step made me think I was writing C. But it was just major skill issues.
Anyway, you may be thinking to yourself:

"How did you created memory leaks in such a simple process?"

It's simple, I was learning and trying.

The main problem during this step was how I, naively, didn't make sure the data was being properly inserted, and the amount of primary key violations I was getting was violating my memory. This was a problem I knew how to solve: let's cache the data!
Wait, how do I create a unique identifier for each row, considering each one of them is unique?

Now, that's the part many will laugh at me for, because, to be fair, it's the part that cracks me up the most. I simply joined the current row of information and ran it through a hash function, and that hash became my key in the map.
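
Roughly like this, with made-up column values: join the row, hash it, and the hex digest becomes the map key:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// rowKey joins every column of a record and hashes the result, giving a
// deterministic identifier for a row that has no natural primary key.
func rowKey(record []string) string {
	sum := sha256.Sum256([]byte(strings.Join(record, "|")))
	return hex.EncodeToString(sum[:])
}

func main() {
	seen := make(map[string]struct{})

	record := []string{"LINE_01", "SCREW-42", "2024-07-19T10:00:00"} // illustrative row
	key := rowKey(record)

	if _, ok := seen[key]; ok {
		fmt.Println("duplicate, skip insert")
		return
	}
	seen[key] = struct{}{}
	fmt.Println("new row, insert:", key)
}
```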

"Why not Redis?" - You may be asking yourself.

It's simple: bureaucracy. It's not a small company. Matter of fact, many of you are probably using a laptop that they manufactured. The simple act of requesting a Redis instance would take at least three work days, four meetings, a sacrifice to the bureaucracy god, and a complete document explaining why and how.
Sooo, yeah, let's go with a simple hash map and pre-initialize it before the first run. It will add to the overall loading time, but it will be quicker than the request.
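
A sketch of that pre-initialization, assuming a hypothetical table with a persisted hash column and the go-mssqldb driver (just one option for talking to MSSQL from Go):

```go
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/microsoft/go-mssqldb" // registers the "sqlserver" driver
)

// warmCache loads the identifiers of rows already in the table, so the first
// run doesn't re-insert (and re-fail on) data that is already there.
// "tracking_events" and "row_hash" are made-up names for illustration.
func warmCache(db *sql.DB) (map[string]struct{}, error) {
	seen := make(map[string]struct{})
	rows, err := db.Query("SELECT row_hash FROM tracking_events")
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	for rows.Next() {
		var h string
		if err := rows.Scan(&h); err != nil {
			return nil, err
		}
		seen[h] = struct{}{}
	}
	return seen, rows.Err()
}

func main() {
	db, err := sql.Open("sqlserver", "sqlserver://user:pass@dbhost?database=plant")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	seen, err := warmCache(db)
	if err != nil {
		panic(err)
	}
	fmt.Println("cached keys:", len(seen))
}
```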

By doing that, the overall process improved like it had a new motor: the memory leaks stopped, the batches weren't failing every time, and the amount of errors and disconnections decreased as well. Pretty nice, right? Right?

By now, you know something got jumbled.

The third iteration. Life could be good.

One point I did not take into account when doing batch inserts was data validation. Here's a simple representation of the flow.

Fetch data -> Check if the hash of the data exists in the hashmap -> Batch & Insert

And what is the problem with that? Well, what happens if a single insert in my batch fails? Would it retry without the entry? If so, how many retries can I have without cluttering the system and losing the advantages of the worker pool implementation?
Only one way to find out! Let's check it.
One point I may add is that this source returned over 25 columns, so I had to be careful about the amount of data I was inserting per batch so as not to go over 2100 parameters, which is MSSQL's limit.
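
The math itself is trivial, it's just easy to forget: parameters per statement divided by columns per row gives the largest safe batch size. A toy sketch:

```go
package main

import "fmt"

func main() {
	const (
		maxParams = 2100 // SQL Server's limit on parameters per statement
		columns   = 25   // roughly what the report returned
	)
	// Integer division gives the largest batch that stays under the limit.
	batchSize := maxParams / columns // 84 rows per INSERT

	rows := make([]int, 1000) // stand-in for parsed records
	for start := 0; start < len(rows); start += batchSize {
		end := start + batchSize
		if end > len(rows) {
			end = len(rows)
		}
		fmt.Printf("inserting rows %d..%d (%d rows, %d parameters)\n",
			start, end-1, end-start, (end-start)*columns)
	}
}
```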

At this point, I was already running things in a Docker container that mimics the production environment with limited resources. To add context, this process was running with 1GB of RAM and about 0.5 CPU. I could've allocated more resources, but that would simply be brute forcing my way out.
By running this new iteration inside the container, adding some timestamps and logging everything to a file to analyze later, I discovered an increase of about 5 minutes due to the amount of retries. This wouldn't work; removing the 'dirty' entry wasn't an option.

The fourth iteration. Life IS good.

To solve this issue, I increased the number of workers. I was using about 50 workers or so, and thanks to a random guess from a Discord user in ThePrimeagen's top-shelf, I increased it to 1000 workers, and made it so that each worker validated each entry in a transaction. If the transaction failed, I simply rolled it back.
By doing that, I was able to solve the core issue and improve the overall speed of the process. Now it was time to put it into prod and monitor it, because, you know, the prod goblins may jumble your software. (They're also known as skill issues, but this name is forbidden.)
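
Per-entry transactions look roughly like this with database/sql; the table, columns and driver are placeholders rather than the real schema:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/microsoft/go-mssqldb" // registers the "sqlserver" driver
)

// Entry is a made-up struct standing in for one validated record.
type Entry struct {
	Hash    string
	Line    string
	Payload string
}

// insertOne writes a single entry in its own transaction, so a primary key
// violation rolls back only that entry instead of poisoning a whole batch.
func insertOne(db *sql.DB, e Entry) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	if _, err := tx.Exec(
		"INSERT INTO tracking_events (row_hash, line, payload) VALUES (@p1, @p2, @p3)",
		e.Hash, e.Line, e.Payload,
	); err != nil {
		tx.Rollback() // discard only this entry and keep the worker going
		return err
	}
	return tx.Commit()
}

func main() {
	db, err := sql.Open("sqlserver", "sqlserver://user:pass@dbhost?database=plant")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := insertOne(db, Entry{Hash: "abc123", Line: "LINE_01", Payload: "..."}); err != nil {
		log.Println("insert failed, rolled back:", err)
	}
}
```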

Since it was requested that this system be near real time (which means fast enough for them not to notice delays), I created a new container, this time a little more robust, to check whether it would be able to hold the load, and set the fetch interval to 3 minutes. Considering the mean fetch time, that meant I could have some overlaps, but I really wanted to see what would happen.
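
The scheduling part is just a ticker. A sketch of the overlap-tolerant version, with the actual fetch left as a stub:

```go
package main

import (
	"log"
	"time"
)

func main() {
	// 3 minutes was the test interval; production ended up at 30 minutes.
	ticker := time.NewTicker(3 * time.Minute)
	defer ticker.Stop()

	for range ticker.C {
		// Running each fetch in its own goroutine means a slow run overlaps
		// with the next tick instead of delaying it.
		go func() {
			start := time.Now()
			// fetchAndLoad() would do the request, parse and insert here.
			log.Println("run finished in", time.Since(start))
		}()
	}
}
```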

I let it run overnight, logging the results to check it out later, and, to my surprise, before the workday ended, I received a call from my manager. Mind you, he's not a tech person or something like that.

"Hey, have 'we' deployed something that interacts with the [System Name I Can't Disclose]?"

"Yeah, I deployed the data fetching system in real time that was requested. Why?"

"Can you, hm, stop it? It caused a outage and we can't work here."_

This, ladies and gentlemen, is a whole other level of breaking prod. I, literally, stopped over 20 production lines for a minute or so. Anyway, I stopped the system and everything went back to normal.
The next day, I was asked to increase the interval between fetches to 30 minutes instead of 3. Ok, fine, I guess. I won't lie, I was a bit sad that I wouldn't be able to see it performing at max speed, but hey, at least I made it work.

With this new system, the average update time of the reports that used this data decreased to 8~10 seconds, which caused a lot more reports from the same source to be requested by managers. Good work is rewarded with more work!

Considerations

It was a fun experience, mainly because it made me truly realize how powerful Go really is. Using less memory than Google Chrome, less storage than a Microsoft app and less CPU power than the Windows calculator, I was able to improve an old process that literally brute forced its way through (it literally checked each line in the database before inserting; I don't know how the previous person thought that was a good idea). It was fun, really.

Anyway, feel free to share your thoughts on this whole process: how would you approach it, and what would you have done differently? Since I have no dev co-workers, I want to hear more views on it.
