PHP performance

PHP & Parallelism: A Web Powerhouse's Achilles' Heel?


Can PHP, the language powering most of the web, truly conquer parallelism?

PHP, a behemoth in the web development world, has an undeniable dominance. Yet, a glaring omission has plagued it for years: native, seamless support for parallelism. In an era where hardware boasts multiple cores and users expect snappy responses, this limitation is increasingly conspicuous.

The Need for Speed (and Parallelism):

For the past few months, I have been looking at ways to speed up Google’s PHP libraries, especially the handwritten ones for Spanner, Cloud Storage, BigQuery, Firestore, and Datastore, to name a few. Along the way I ran into a problem I find hard to accept. We all know the famous statistic that PHP powers as much as 70% of the web, yet it still lacks parallelism out of the box. I found this so frustrating that I am considering switching teams within Google rather than staying optimistic about changing the state of affairs.

PHP’s Parallelism Predicament:

So, how big is the impact? PHP was created to serve websites, and it treats each web request as a single thread of execution. That may have been adequate for prehistoric websites, but today even browsers fetch resources in parallel. PHP does NOT! It has grown a whole ecosystem without parallelism. PHP’s philosophy seems to be that if something needs parallel execution (like downloading a large file in chunks from non-blocking parallel scripts), you should NOT do it in PHP.

Simple goals in life:

My goal was to upgrade MultipartUploader so that it makes its calls in parallel. I considered several approaches before abandoning ship and shelving the work:

1. PHP implementation based on curl-multi (promising)

curl-multi is already available in PHP, but using it would require redesigning MultipartUploader. Currently, Google’s libraries build a multipart Guzzle stream and send it in a network request.

I hoped that if I created several curl handles, assigned each part of the upload to its own handle, and added them all to a curl_multi_* handle, the uploads would run in parallel. In practice this is concurrency, not parallelism: a single thread multiplexes the transfers, as in this example:

// create both cURL resources
$ch1 = curl_init();
$ch2 = curl_init();

// set URL and other appropriate options
curl_setopt($ch1, CURLOPT_URL, "http://example.com/");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch2, CURLOPT_HEADER, 0);

//create the multiple cURL handle
$mh = curl_multi_init();

//add the two handles
curl_multi_add_handle($mh,$ch1);
curl_multi_add_handle($mh,$ch2);

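//execute the multi handle
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        // Block until there is activity on any of the handles
        curl_multi_select($mh);
    }
} while ($active && $status == CURLM_OK);

//remove the handles and close the multi handle
curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_multi_close($mh);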
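
Applied to a multipart upload, the same loop might look roughly like the sketch below. This is not how MultipartUploader works today; splitIntoParts(), uploadParts(), the PUT method, and the partNumber query parameter are all hypothetical names chosen for illustration. Every part gets its own easy handle, yet all transfers are still driven by one thread.

```php
<?php
// Sketch only: helper names, HTTP method and URL layout are assumptions.

/** Split a payload into fixed-size parts (the last part may be shorter). */
function splitIntoParts(string $payload, int $partSize): array
{
    return str_split($payload, $partSize);
}

/** Send every part through one multi handle; returns HTTP status codes keyed by part index. */
function uploadParts(array $parts, string $url): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($parts as $i => $part) {
        // Hypothetical endpoint layout: one request per part number.
        $ch = curl_init($url . '?partNumber=' . ($i + 1));
        curl_setopt_array($ch, [
            CURLOPT_CUSTOMREQUEST  => 'PUT',
            CURLOPT_POSTFIELDS     => $part,
            CURLOPT_RETURNTRANSFER => true,
        ]);
        curl_multi_add_handle($mh, $ch);
        $handles[$i] = $ch;
    }

    // One thread multiplexes all transfers: concurrent I/O, not parallel execution.
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active) {
            curl_multi_select($mh);
        }
    } while ($active && $status === CURLM_OK);

    $codes = [];
    foreach ($handles as $i => $ch) {
        $codes[$i] = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);

    return $codes;
}

// Only the splitting step runs here; no network request is made.
$parts = splitIntoParts(str_repeat('x', 10), 4);
```

Calling uploadParts($parts, $someUploadUrl) would then start all part uploads at once. A real implementation would also need to cap the number of in-flight handles and retry failed parts.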

2. Redesigning MultipartUploader with Fibers

Fibers arrived in PHP 8.1 and bring controlled, cooperative concurrency to PHP. The minimum PHP version Google’s libraries currently support is 8.0 (which has already reached end-of-life), so I saw merit in this approach for concurrency once that minimum moves to 8.1. But the more I read, the more disappointed I was: for PHP developers, Fibers are a lollipop and not much else. They are CONCURRENT, not parallel. After all these years of waiting, Fibers turned out to be a dud for my use case.
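
To make the "concurrent, not parallel" point concrete, here is a minimal sketch (PHP 8.1+). The makeTask() helper and the task names are made up for illustration. Two fibers take turns on the same thread; neither ever runs at the same time as the other.

```php
<?php
// Sketch only: makeTask() and the "chunk" wording are illustrative assumptions.

/** Build a Fiber that records $steps log entries, suspending after each one. */
function makeTask(string $name, int $steps, array &$log): Fiber
{
    return new Fiber(function () use ($name, $steps, &$log): void {
        for ($i = 1; $i <= $steps; $i++) {
            $log[] = "$name: chunk $i";
            Fiber::suspend(); // hand control back to the scheduler loop
        }
    });
}

$log = [];
$tasks = [makeTask('A', 2, $log), makeTask('B', 2, $log)];

// Start each fiber; each runs until its first suspend().
foreach ($tasks as $task) {
    $task->start();
}

// A trivial round-robin scheduler, all on the single PHP thread.
do {
    $pending = false;
    foreach ($tasks as $task) {
        if ($task->isSuspended()) {
            $task->resume();
        }
        if (!$task->isTerminated()) {
            $pending = true;
        }
    }
} while ($pending);

echo implode(PHP_EOL, $log), PHP_EOL;
```

The log comes out interleaved (A, B, A, B) because each fiber explicitly yields. If one of them did CPU-heavy work between suspensions, the other would simply wait.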

3. Writing my own extension

At one point I seriously considered writing my own Zend extension (gRPC is also a Zend extension) to enable parallelism. I learned that although PHP extensions are written in C, an extension becomes part of the php-fpm process and its code lives in the same address space. So spawning a pthread from the extension seemed like it would work like magic. But there were surprises here too. I would have to manage every resource shared across threads, which means synchronization problems, memory leaks, and what not! It is really not easy to do. I realized I was drifting towards reinventing the pthreads extension, which is only available via the PHP CLI.

4. Inspiration from Aws\CommandPool

I observed that Aws\CommandPool does a lot of magic to meet requirements like mine. The AWS libraries use an async pool to achieve concurrency. Frankly, I was not at all happy with the state of affairs even in the AWS libraries.

5. External Libraries (ReactPHP, Spatie, Parallel)

As a library developer, I cannot afford to depend on these packages, especially when their maintenance is not guaranteed while ours has to be because of enterprise agreements. Still, it was heartening to see how many people share my pain and go as far as creating their own libraries.

What Does This Mean for Developers?

  1. Some developers (me included) might be happier using the client libraries for another language; their code and system utilization would likely be better there.

  2. Embrace asynchronous programming to work within the limitations. But this adds software development cost (think of longer debugging time).

  3. Seriously consider ReactPHP, AMPHP, or similar libraries to work within the limitations. Otherwise, you will waste a lot of time.

  4. Consider running performance-sensitive workloads in a more suitable language, say Go or Node.js. However, you will pay for serializing and deserializing the data.
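
The hand-off cost in the last point can be sketched as follows. To stay in one language, child PHP processes stand in for the Go or Node.js worker; runInParallel() is a hypothetical helper, and the sketch assumes it runs under the PHP CLI (so that PHP_BINARY points at a php executable) on PHP 7.4 or later. Each job is encoded as JSON on the way out and decoded on the way back, which is exactly the overhead mentioned above.

```php
<?php
// Sketch only: runInParallel() and the summing workload are illustrative assumptions.

/** Run one child PHP process per job; each child sums the integers 1..n. */
function runInParallel(array $jobs): array
{
    // Child script: read a JSON job from STDIN, write a JSON result to STDOUT.
    $child = '$n = json_decode(stream_get_contents(STDIN), true);'
           . 'echo json_encode(array_sum(range(1, $n)));';

    $procs = [];
    foreach ($jobs as $key => $job) {
        $spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w']];
        $proc = proc_open([PHP_BINARY, '-r', $child], $spec, $pipes);
        if (!is_resource($proc)) {
            throw new RuntimeException("Could not start worker for job $key");
        }
        fwrite($pipes[0], json_encode($job)); // serialize the input
        fclose($pipes[0]);                    // signal end of input
        $procs[$key] = [$proc, $pipes[1]];
    }

    // All children are already running; now collect their results.
    $results = [];
    foreach ($procs as $key => [$proc, $out]) {
        $results[$key] = json_decode(stream_get_contents($out), true); // deserialize
        fclose($out);
        proc_close($proc);
    }

    return $results;
}

$results = runInParallel([10, 100, 1000]);
print_r($results);
```

Unlike Fibers or curl-multi, the children here really do run in parallel, because the operating system schedules them as separate processes. The price is process start-up time plus the encode and decode step on every job.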

Conclusion

So, who’s to blame? IMHO, it is the PHP maintainers who should make the call to modernize the language. The historical design of PHP prioritized simplicity and ease of use, which has contributed to its widespread adoption. However, modern web applications require more advanced concurrency and parallelism capabilities.

Parallelism was one key reason Facebook was forced to create its own PHP-derived programming language, Hack (Hacklang). And believe it or not, it’s much more performant than PHP itself. Take a bow, Mr. Zuckerberg.

The PHP community and maintainers are aware of these limitations and are making incremental improvements (Fibers, for example). As developers, we should push for these changes while also exploring existing tools and libraries that can help us bridge the gap in the meantime.
