Memory Performance Boosts with Generators and Nikic/Iter
Feb 16, 2025, 09:17 AM
PHP iterators and generators: powerful tools for efficiently processing large data sets
Arrays and iteration are cornerstones of any application. As we gain new tools, the way we work with arrays should improve too.
Generators are one such tool. At first we only had arrays. Then we gained the ability to define our own array-like structures (called iterators). And since PHP 5.5, we can quickly create iterator-like structures called generators.
Generators look like functions, but we can use them as iterators. They give us a simple syntax for creating what are essentially interruptible, repeatable functions. They are amazing!
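To make "interruptible" concrete, here is a minimal sketch of a generator function. The name countTo is made up for illustration and is not part of the article's sample code.

```php
<?php
// A generator function: calling it returns a Generator object
// without running any of the body yet.
function countTo($limit)
{
    for ($i = 1; $i <= $limit; $i++) {
        // Execution pauses here and resumes on the next iteration.
        yield $i;
    }
}

$seen = [];
foreach (countTo(3) as $number) {
    $seen[] = $number;
}

print implode(", ", $seen) . PHP_EOL; // 1, 2, 3
```

Each pass through the foreach loop resumes the function just after the yield, so only one value exists at a time.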
We will look at several areas where generators can be used and explore some issues to watch out for when using them. Finally, we will look at a great library created by the talented Nikita Popov.
The sample code can be found at https://github.com/sitepoint-editors/generators-and-iter.
Key Points
- Generators (available since PHP 5.5) are a powerful way to create iterators. They let you write interruptible, repeatable functions, which simplifies processing large data sets and reduces memory use.
- Nikita Popov's Nikic/Iter library introduces functions that work with iterators and generators, saving a significant amount of memory by avoiding unnecessary intermediate arrays.
- Generators and Nikic/Iter are especially useful with large CSV files, because they can process large data sets without loading everything into memory at once.
- While generators can significantly improve memory performance, they bring challenges of their own, such as being incompatible with array_filter and array_map, so you need other tools such as Nikic/Iter to process that kind of data.
The Problem
Suppose you have a lot of relational data and want to do some eager loading. Perhaps the data is stored as comma-separated values, so you need to load each data type and then join them together.
You can start with the following simple code:
function readCSV($file) {
    $rows = [];
    $handle = fopen($file, "r");

    while (!feof($handle)) {
        $rows[] = fgetcsv($handle);
    }

    fclose($handle);

    return $rows;
}

// array_filter() without a callback drops the false that fgetcsv() returns at end of file
$authors = array_filter(
    readCSV("authors.csv")
);

$categories = array_filter(
    readCSV("categories.csv")
);

$posts = array_filter(
    readCSV("posts.csv")
);
You might then try to join related elements by iterating or by using higher-order functions:
function filterByColumn($array, $column, $value) {
    return array_filter(
        $array,
        function($item) use ($column, $value) {
            return $item[$column] == $value;
        }
    );
}

$authors = array_map(function($author) use ($posts) {
    // column 1 of a post holds the author id
    $author["posts"] = filterByColumn(
        $posts, 1, $author[0]
    );

    // make other changes to $author

    return $author;
}, $authors);

$categories = array_map(function($category) use ($posts) {
    // column 2 of a post holds the category id
    $category["posts"] = filterByColumn(
        $posts, 2, $category[0]
    );

    // make other changes to $category

    return $category;
}, $categories);

$posts = array_map(function($post) use ($authors, $categories) {
    foreach ($authors as $author) {
        if ($author[0] == $post[1]) {
            $post["author"] = $author;
            break;
        }
    }

    foreach ($categories as $category) {
        // compare against column 2, matching filterByColumn() above
        if ($category[0] == $post[2]) {
            $post["category"] = $category;
            break;
        }
    }

    // make other changes to $post

    return $post;
}, $posts);
Looks good, right? So, what happens when we have a large number of CSV files to parse? Let's analyze the memory usage a little...
function formatBytes($bytes, $precision = 2) {
    $kilobyte = 1024;
    $megabyte = 1024 * 1024;

    if ($bytes >= 0 && $bytes < $kilobyte) {
        return $bytes . " b";
    }

    if ($bytes >= $kilobyte && $bytes < $megabyte) {
        return round($bytes / $kilobyte, $precision) . " kb";
    }

    return round($bytes / $megabyte, $precision) . " mb";
}

print "memory:" . formatBytes(memory_get_peak_usage());
(The sample code contains generate.php, which you can use to create these CSV files...)
If you have large CSV files, this code should show just how much memory it takes to link these arrays together. It is at least the size of the files you have to read, because PHP has to keep everything in memory.
Generators to the rescue!
One way to address this problem is to use generators. If you are not familiar with them, now is a good time to learn more.
Generators let you load a small part of the total data at a time, and adopting them takes very little work: the reader yields each row as it is read instead of collecting every row in an array.
If you loop over the CSV data that way, you will notice an immediate drop in the amount of memory required.
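The difference can be observed with a small experiment. The sketch below uses plain numbers as a stand-in for CSV rows, and all three function names are hypothetical. It sums the same sequence once from an array and once from a generator, and reports how much memory each approach holds while iterating. Exact figures depend on the PHP version and platform.

```php
<?php
// Builds the whole list in memory before returning it.
function numbersAsArray($count)
{
    $numbers = [];
    for ($i = 0; $i < $count; $i++) {
        $numbers[] = $i;
    }
    return $numbers;
}

// Yields one value at a time; nothing is stored between iterations.
function numbersAsGenerator($count)
{
    for ($i = 0; $i < $count; $i++) {
        yield $i;
    }
}

// Sums the values and reports how much memory was still held afterwards.
function sumWithMemoryDelta($factory, $count)
{
    $before = memory_get_usage();
    $values = $factory($count);
    $sum = 0;
    foreach ($values as $value) {
        $sum += $value;
    }
    // Measured while $values is still alive.
    $delta = memory_get_usage() - $before;
    unset($values);
    return [$sum, $delta];
}

list($arraySum, $arrayDelta) = sumWithMemoryDelta('numbersAsArray', 100000);
list($generatorSum, $generatorDelta) = sumWithMemoryDelta('numbersAsGenerator', 100000);

print "array:     sum=$arraySum, extra bytes=$arrayDelta" . PHP_EOL;
print "generator: sum=$generatorSum, extra bytes=$generatorDelta" . PHP_EOL;
```

Both runs compute the same sum. Only the array version has to keep all 100,000 values alive at once.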
If you've seen megabytes of memory usage before, you'll now see kilobytes. This is a huge improvement, but it is not without its problems.
First of all, array_filter and array_map do not work with generators. You must find other tools to process this kind of data. Here is one you can try: Nikic/Iter.
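To see what such a tool has to provide, here is a plain-PHP sketch of lazy stand-ins for array_map and array_filter. The names lazyMap, lazyFilter and sampleRows are hypothetical. This is not the library's implementation, only an illustration of the idea behind its map and filter functions.

```php
<?php
// Hypothetical helper: applies $callback to each item lazily.
function lazyMap(callable $callback, $items)
{
    foreach ($items as $key => $item) {
        yield $key => $callback($item);
    }
}

// Hypothetical helper: yields only the items that $predicate accepts.
function lazyFilter(callable $predicate, $items)
{
    foreach ($items as $key => $item) {
        if ($predicate($item)) {
            yield $key => $item;
        }
    }
}

// Stand-in data source; any generator (such as a CSV reader) works the same way.
function sampleRows()
{
    yield ["1", "Alice"];
    yield ["2", "Bob"];
    yield ["3", "Carol"];
}

$names = lazyMap(
    function ($row) { return $row[1]; },
    lazyFilter(
        function ($row) { return $row[0] != "2"; },
        sampleRows()
    )
);

// Passing false discards the keys, giving a plain list.
$result = iterator_to_array($names, false);
print implode(", ", $result) . PHP_EOL; // Alice, Carol
```

Nothing is read from sampleRows() until iterator_to_array() starts pulling values, and no intermediate array is built between the filter and the map.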
The library introduces functions that work with iterators and generators. So how can you still relate all this data without keeping it all in memory? The starting point is a reader that yields each row instead of returning an array:
function readCSVGenerator($file) {
    $handle = fopen($file, "r");

    while (!feof($handle)) {
        yield fgetcsv($handle);
    }

    fclose($handle);
}
Using it is simple. Loop over the generator, handle one row at a time, and check the peak memory afterwards:
foreach (readCSVGenerator("posts.csv") as $post) {
    // do something with $post
}

print "memory:" . formatBytes(memory_get_peak_usage());
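One way to relate rows without holding any file in memory is to re-read the related file for each parent row. The following is only a sketch under stated assumptions, not the article's original code. The helpers postsByAuthor and authorsWithPosts are hypothetical, the reader is a variant of readCSVGenerator that stops at end of file, and the column layout (author id in column 1 of a post) follows the earlier array-based example. The sample files are created on the fly so the sketch runs on its own.

```php
<?php
// Variant of the article's reader: yields one parsed row at a time.
function readCSVGenerator($file)
{
    $handle = fopen($file, "r");
    while (($row = fgetcsv($handle, 0, ",", '"', "\\")) !== false) {
        yield $row;
    }
    fclose($handle);
}

// Hypothetical helper: lazily yields the posts written by one author.
function postsByAuthor($postsFile, $authorId)
{
    foreach (readCSVGenerator($postsFile) as $post) {
        if ($post[1] == $authorId) {
            yield $post;
        }
    }
}

// Hypothetical helper: pairs every author with a generator of their posts.
function authorsWithPosts($authorsFile, $postsFile)
{
    foreach (readCSVGenerator($authorsFile) as $author) {
        // The posts file is re-read from the start for every author.
        $author["posts"] = postsByAuthor($postsFile, $author[0]);
        yield $author;
    }
}

// Tiny sample files (id,name and id,author,category).
$authorsFile = tempnam(sys_get_temp_dir(), "authors");
$postsFile = tempnam(sys_get_temp_dir(), "posts");
file_put_contents($authorsFile, "1,Alice\n2,Bob\n");
file_put_contents($postsFile, "10,1,5\n11,2,5\n12,1,6\n");

$postIdsByAuthor = [];
foreach (authorsWithPosts($authorsFile, $postsFile) as $author) {
    foreach ($author["posts"] as $post) {
        $postIdsByAuthor[$author[1]][] = $post[0];
    }
}

unlink($authorsFile);
unlink($postsFile);

print json_encode($postIdsByAuthor) . PHP_EOL;
```

Memory stays flat because only one author row and one post row exist at any moment. The cost is that the posts file is scanned once per author.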
(Re-reading each data source every time is inefficient. Consider keeping the smaller related data sets, such as authors and categories, in memory...)
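That trade-off could look like the sketch below: the small files are loaded into arrays keyed by id, while the large posts file is still streamed. The helpers indexById and postsWithRelations are hypothetical names, the reader is a variant of readCSVGenerator that stops at end of file, and the column layout is assumed to match the earlier example.

```php
<?php
// Variant of the article's reader: yields one parsed row at a time.
function readCSVGenerator($file)
{
    $handle = fopen($file, "r");
    while (($row = fgetcsv($handle, 0, ",", '"', "\\")) !== false) {
        yield $row;
    }
    fclose($handle);
}

// Hypothetical helper: loads a small CSV into an array keyed by column 0.
function indexById($file)
{
    $index = [];
    foreach (readCSVGenerator($file) as $row) {
        $index[$row[0]] = $row;
    }
    return $index;
}

// Hypothetical helper: streams posts and attaches the related rows.
function postsWithRelations($postsFile, array $authors, array $categories)
{
    foreach (readCSVGenerator($postsFile) as $post) {
        $post["author"] = isset($authors[$post[1]]) ? $authors[$post[1]] : null;
        $post["category"] = isset($categories[$post[2]]) ? $categories[$post[2]] : null;
        yield $post;
    }
}

// Tiny sample files so the sketch runs on its own.
$authorsFile = tempnam(sys_get_temp_dir(), "authors");
$categoriesFile = tempnam(sys_get_temp_dir(), "categories");
$postsFile = tempnam(sys_get_temp_dir(), "posts");
file_put_contents($authorsFile, "1,Alice\n2,Bob\n");
file_put_contents($categoriesFile, "5,PHP\n6,Tools\n");
file_put_contents($postsFile, "10,1,5\n11,2,6\n");

$authors = indexById($authorsFile);
$categories = indexById($categoriesFile);

$summary = [];
foreach (postsWithRelations($postsFile, $authors, $categories) as $post) {
    $summary[] = $post[0] . ":" . $post["author"][1] . ":" . $post["category"][1];
}

unlink($authorsFile);
unlink($categoriesFile);
unlink($postsFile);

print implode(", ", $summary) . PHP_EOL; // 10:Alice:PHP, 11:Bob:Tools
```

Each file is now read exactly once, and only the lookup tables for the small files stay in memory.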
Other interesting things
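That is just the tip of the iceberg for Nikic's library. Install it with Composer:
composer require nikic/iter
Ever wanted to flatten an array (or iterator/generator)? Its iter\flatten function does that lazily, and iter\toArray collects the result into an array.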
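The idea behind flattening can be sketched in plain PHP with a recursive generator. The function flattenLazily is a hypothetical name used for illustration. It is not the library's implementation and needs no installation.

```php
<?php
// Hypothetical plain-PHP sketch of lazy flattening.
function flattenLazily($items)
{
    foreach ($items as $item) {
        if (is_array($item) || $item instanceof Traversable) {
            // Recurse into nested collections and re-yield their values.
            foreach (flattenLazily($item) as $inner) {
                yield $inner;
            }
        } else {
            yield $item;
        }
    }
}

// Passing false discards keys so nested values do not overwrite each other.
$flat = iterator_to_array(flattenLazily([1, 2, [3, 4, [5]], 6, 7]), false);

print implode(", ", $flat) . PHP_EOL; // 1, 2, 3, 4, 5, 6, 7
```

Because the function yields values one by one, it also works when the input is itself a generator that produces nested arrays.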
You can use functions such as slice and take to return slices of iterable variables:
// ... (the code here is similar to the original but optimized with iter library functions; omitted to save space) ...
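As an illustration of why slicing matters for generators, here is a plain-PHP sketch. The helpers lazySlice, lazyTake and naturalNumbers are hypothetical and are not the library's code. A bounded slice makes it safe to consume even an endless generator.

```php
<?php
// Hypothetical helper: skips $start items, then yields at most $length items.
function lazySlice($items, $start, $length)
{
    $position = 0;
    foreach ($items as $item) {
        if ($position >= $start + $length) {
            return; // Stop early instead of draining the source.
        }
        if ($position >= $start) {
            yield $item;
        }
        $position++;
    }
}

// Hypothetical helper: the first $count items.
function lazyTake($items, $count)
{
    return lazySlice($items, 0, $count);
}

// An endless generator: only safe to consume through a bounded slice.
function naturalNumbers()
{
    $number = 1;
    while (true) {
        yield $number++;
    }
}

$slice = iterator_to_array(lazySlice(naturalNumbers(), 2, 4), false);
$firstThree = iterator_to_array(lazyTake(naturalNumbers(), 3), false);

print implode(", ", $slice) . PHP_EOL;      // 3, 4, 5, 6
print implode(", ", $firstThree) . PHP_EOL; // 1, 2, 3
```

Calling iterator_to_array() directly on naturalNumbers() would never finish, which is why the slice has to sit between the source and the consumer.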
As you work with generators more, you may find that you cannot always reuse them. Consider the following example:
// ... (code simplified with iter library functions; omitted to save space) ...
If you try to run that code, you will see an exception saying "Cannot traverse an already closed generator". Each iterator function in this library has a rewindable counterpart:
// ... (example code using the iter\flatten and iter\toArray functions; omitted to save space) ...
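The reuse problem is easy to reproduce without the library. In this minimal sketch, doubled is a hypothetical generator function. A finished generator is iterated a second time and the resulting exception is caught so its message can be inspected.

```php
<?php
function doubled(array $numbers)
{
    foreach ($numbers as $number) {
        yield $number * 2;
    }
}

$generator = doubled([1, 2, 3]);

// First pass: consumes the generator completely.
$first = iterator_to_array($generator, false);

// Second pass: the generator is finished and cannot be rewound.
$message = null;
try {
    iterator_to_array($generator, false);
} catch (Exception $exception) {
    $message = $exception->getMessage();
}

print implode(", ", $first) . PHP_EOL; // 2, 4, 6
print $message . PHP_EOL;
```

Calling doubled([1, 2, 3]) again creates a fresh generator, which is the basic idea that rewindable wrappers build on.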
You can reuse this mapping function as many times as you like. You can even make your own generators rewindable:
// ... (example code using the iter\slice and iter\toArray functions; omitted to save space) ...
What you get from it is a reusable generator!
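One way such reuse can be achieved is to keep the generator function itself and call it again whenever iteration restarts. The class below is a hypothetical sketch of that idea, not the library's implementation. It assumes PHP 7 or later because of the return type declaration.

```php
<?php
// Hypothetical wrapper: stores the generator function and its arguments,
// and creates a fresh generator every time iteration starts.
class RewindableGenerator implements IteratorAggregate
{
    private $factory;
    private $arguments;

    public function __construct(callable $factory, array $arguments = [])
    {
        $this->factory = $factory;
        $this->arguments = $arguments;
    }

    // foreach and iterator_to_array() call this at the start of each traversal.
    public function getIterator(): Traversable
    {
        return call_user_func_array($this->factory, $this->arguments);
    }
}

$countdown = new RewindableGenerator(function ($from) {
    for ($i = $from; $i > 0; $i--) {
        yield $i;
    }
}, [3]);

$firstPass = iterator_to_array($countdown, false);
$secondPass = iterator_to_array($countdown, false);

print implode(", ", $firstPass) . PHP_EOL;  // 3, 2, 1
print implode(", ", $secondPass) . PHP_EOL; // 3, 2, 1
```

The wrapper trades a little indirection for the ability to loop over the same object any number of times. Each traversal re-runs the generator body from the start, so any side effects in it are repeated too.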
Conclusion
For every loop you write, it is worth considering whether a generator might be an option. They are useful for other things, too. Where the language's built-in features fall short, Nikic's library provides a large number of higher-order functions.
Are you already using generators? Would you like to see more examples of how to use them in your own applications to improve performance? Please tell us!