Performance Zone is brought to you in partnership with:

Open-source software enthusiast, Node.js and PHP expert, Lean & Agile management fan, Leadership & motivation experimenter, CEO at marmelab Francois has posted 12 posts at DZone. You can read more from them at their website. View Full User Profile

Node.js for PHP Programmers #1: Event-driven programming... and Pasta.

01.15.2013
| 16559 views |
  • submit to reddit

For a PHP developer, asynchronicity is the most puzzling aspect of the Node.js runtime. It's simply a new way to write programs. And once you pass the first learning steps, event-driven programming opens a world of possibilities PHP programmers would never dream of. I'll try to explain you how it works, but first, let's talk about pasta.

Spaghetti_with_tomato_sauce

A Simple Pasta Recipe

I love good pasta. But it's very easy to cook bad pasta, so I have a motto: Do Not Eat Pasta Cooked by People You Don't Trust. Since you're a person I trust, I'll give you a simple recipe that you can't screw up, so that when we meet in real life, we can eat together.

The recipe is spaghetti with fresh tomato sauce. Put a large quantity of water in a pan, together with a pinch of salt - it will take approximately 10 minutes to boil. In the meantime, peel a bunch of fresh tomatoes and chop them roughly. Also peel and mince an onion, a carrot, some garlic, and a stalk of celery. This mix is called a mirepoix, or, as the Italians say, a soffritto. At that time, the water should be boiling, so you can put the spaghetti in the pan. While it boils, put some olive oil in a second pan and heat the mirepoix for a few minutes. Then add the tomatoes, some herbs, salt and pepper, and stirr. You should heat the tomatoes for five minutes at most, because after that they create a certain acidity that only disappears after an hour of heating. Taste the spaghetti regularly to be sure to take it out of the water when it's cooked al dente. Drain it, serve in a warm plate with the sauce, add some basil cut in a chiffonade, and some parmiggiano cheese. Voilà.

Synchronous Programming

If you're not starving after reading this, you can probably follow the next exercise: how would you ask a computer to cook my spaghetti with fresh tomato sauce? In PHP, that would look like the following:

<?php
// time is 0
$pastaPan = new Pan();
$water = new Water();
$pastaPan->fill($water);
$pastaPan->warm($duration = 10);
// now time is 10
$pastaPan->fill(new Spaghetti());
$pastaPan->warm($duration = 8);
// now time is 18
$pastaPan->remove($water);

$saucePan = new Pan();
$saucePan->fill(new OliveOil());
$saucePan->warm($duration = 2);
// now time is 20
$saucePan->fill(MirepoixFactory::create($withGarlic = true));
$saucePan->warm($duration = 5);
// now time is 25
$saucePan->fill(TomatoFactory::create());
$saucepan->warm($duration = 4);
// now time is 29

$plate = new Plate();
$plate->addContentsOf($pastaPan);
$plate->addContentsOf($saucePan);
$plate->serve('Voilà');

There is a big problem with this program: the whole recipe takes 29 minutes, and the spaghetti arrives in the plate already cold, because it waits for more than ten minutes until the sauce is ready. Cold spaghetti is an offense, you should avoid it by all means. When I cook this spaghetti, I boil the water at the same time as I chop the vegetables. Read again the recipe: it contains indications such as "in the meantime", "while it boils", and "taste regularly". The PHP program doesn't do that, no wonder the spaghetti it produces is inedible.

Asynchronous Programming

Let's teach PHP to cook properly. For this purpose, the best design pattern is event-driven programming. The program now relies on a central EventLoop class, which represents a service that loops, incrementing an internal tick at each loop. Other services can add callbacks to be executed at a given tick. When all added callbacks are executed, the loop stops.

<?php
class EventLoop
{
  protected $tick = 0;
  protected $callbacksForTick = array();

  public function start()
  {
    while ($this->callbacksForTick) {
      $this->tick++;
      $this->executeCallbacks();
    }
  }

  public function executeCallbacks()
  {
    echo "Tick is " . $this->tick . "\n";
    if (!isset($this->callbacksForTick[$this->tick])) {
      return; // no callback to execute
    }
    foreach ($this->callbacksForTick[$this->tick] as $callback) {
      call_user_func($callback, $this);
    }
    // clean up
    unset($this->callbacksForTick[$this->tick]);
  }

  public function executeLater($delay, $callback)
  {
    $this->callbacksForTick[$this->tick + $delay] []= $callback;
  }
}

Taking advantage of this event loop, a new kind of service class can be created: asynchronous services. For instance, the asynchronous pan:

<?php
class AsynchronousPan extends Pan
{
  protected $eventLoop;

  public function __construct(EventLoop $eventLoop)
  {
    $this->eventLoop = $eventLoop;
  }

  public function warm($duration, $callback)
  {
    $this->eventLoop->executeLater($duration, $callback);
  }
}

Combining the EventLoop and the AsynchronousPan basically requires passing a callback to warm(), and this callback can be a simple anonymous function:

<?php
$eventLoop = new EventLoop();
$pan = new AsynchronousPan($eventLoop);
echo "Starting to warm\n";
$pan->warm(10, function() {
  echo "Now it's cooked\n";
});
$eventLoop->start();

When you execute this script, it produces the following output:

Starting to warm
Tick is 1
Tick is 2
Tick is 3
Tick is 4
Tick is 5
Tick is 6
Tick is 7
Tick is 8
Tick is 9
Tick is 10
Now it's cooked

Asynchronous Cooking

Now everything is ready for another spaghetti sauce. This time, can the sauce and the spaghetti be ready at the same time? Since the spaghetti takes 18 minutes and the sauce 11 minutes, the program should wait about 7 minutes to start cooking the sauce.

<?php
$eventLoop = new EventLoop();

$plate = new Plate();

$pastaPan = new AsynchronousPan($eventLoop);
$water = new Water();
$pastaPan->fill($water);
echo "pastaPan: Starting to boil water\n";
$pastaPan->warm($duration = 10, function() use ($pastaPan, $plate, $water) {
  echo "pastaPan: Water is boiling\n";
  $pastaPan->fill(new Spaghetti());
  echo "pastaPan: Starting to boil spaghetti\n";
  $pastaPan->warm($duration = 8, function() use ($pastaPan, $plate, $water) {
    echo "pastaPan: Spaghetti is ready\n";
    $pastaPan->remove($water);
    $plate->addContentsOf($pastaPan);
  });
});

$eventLoop->executeLater($delay = 7, function() use ($plate, $eventLoop) {
  $saucePan = new AsynchronousPan($eventLoop);
  $saucePan->fill(new OliveOil());
  echo "saucePan: Starting to warm olive oil\n";
  $saucePan->warm($duration = 2, function() use($saucePan, $plate) {
    echo "saucePan: Olive oil is warm\n";
    $saucePan->fill(MirepoixFactory::create($withGarlic = true));
    echo "saucePan: Starting to cook the Mirepoix\n";
    $saucePan->warm($duration = 5, function() use($saucePan, $plate) {
      echo "saucePan: Mirepoix is ready to welcome tomato\n";
      $saucePan->fill(TomatoFactory::create());
      echo "saucePan: Starting to cook tomato\n";
      $saucePan->warm($duration = 4, function() use($saucePan, $plate) {
        echo "saucePan: Tomato sauce is ready\n";
        $plate->addContentsOf($saucePan);
      });
    });
  });
});

$eventLoop->start();

$plate->serve('Voilà');

Ask PHP to execute the program, and asynchronicity starts to shine:

pastaPan: Starting to boil water
Tick is 1
...
Tick is 7
saucePan: Starting to warm olive oil
Tick is 8
Tick is 9
saucePan: Olive oil is warm
saucePan: Starting to cook the Mirepoix
Tick is 10
pastaPan: Water is boiling
pastaPan: Starting to boil spaghetti
Tick is 11
....
Tick is 14
saucePan: Mirepoix is ready to welcome tomato
saucePan: Starting to cook tomato
Tick is 15
...
Tick is 18
pastaPan: Spaghetti is ready
saucePan: Tomato sauce is ready

Awesomesauce! Everything is ready at the same time after a short 18 minutes.

One side note: When you compare the first script and the last one, you immediately recognize a visual difference. In the first script, all lines of code align vertically; if you want to understand what happens sequentially, you just need to read from top to bottom. In the last script, on the other hand, it's indentation that means that statements get executed one after the other. In order to follow the course of the program, you must read from left to right. In event-driven programming, indentation means sequence.

When Asynchronous Callbacks Must Synchronize

There is just one problem. The plate gets served at the end of the script because the event loop runs out of callbacks. But what if the script had to schedule more callbacks for later execution, like washing the dishes for instance? The plate would be served very late, with cold spaghetti. That's something you should never, ever do. It is sacrilege.

So the script must not rely on the end of the event loop. That means no code should appear after the $eventLoop->start() line. But how to synchronize the fact that the plate gets the contents of both pans, since they run in separate sequences?

You could wait until the tick is 18 and then serve the plate, but that would be betting on chance. Although the asynchronous services of this example are based on timing, this is seldom the case. Most asynchronous callbacks are triggered by the return of a third-party service, like a database query, the opening of a file on disk, or the response from an HTTP request. You never really know when this is going to happen. So you have to rely on new techniques for resynchronization.

On the case of the spaghetti with tomato sauce, it's quite easy:

<?php
class PlateOfSpaghettiWithSauce extends Plate
{
  protected $hasSpaghetti = false;
  protected $hasSauce = false;

  public function addContentsOf(Pan $pan)
  {
    parent::addContentsOf($pan);
    if ($pan->contains('Spaghetti')) {
      $this->hasSpaghetti = true;
    }
    if ($pan->contains('Tomato')) {
      $this->hasSauce = true;
    }
    if ($this->hasSpaghetti && $this->hasSauce) {
      $this->serve('Voilà');
    }
  }
}

The plate needs to know what it expects, and check whether it is complete each time it receives the contents of a pan. The example doesn't really matter, but you must remember that you are in charge of synchronization in event-driven programming, not the runtime.

There Is Still Only One Cook

The first script takes 29 minutes to cook bad pasta, the second one takes only 18 minutes for simple yet delicious pasta. And yet, there is still only one cook - or, in programming terms, one execution thread. You could spawn a new thread to cook faster, but they are things that can't be accelerated: If you use a flamethrower to warm up the tomato, it won't be ready sooner: it will be burned.

Let me rephrase this: This is not parallel programming. The code executed in the last script executes sequentially, however the event loop gives the illusion that code gets executed by "someone else". Also, this last script packs more actions in a shorter time. Seen from a CPU, that means less cycles wasted waiting for blocking operations, and more cycles actually spent computing. Event-driven programming allows you to use more of your CPU.

In sequential computing, the wasted CPU cycles by a given thread are salvaged by other, parallel threads. That's why, when using PHP with Apache and mod_php, you define the number of concurrent server processes. The trouble is, each time you spawn a new thread to recuperate wasted CPU cycles by another thread, it consumes a lot of memory. Conclusion: to use the CPU as efficiently as an event-driven program, a sequential program needs a lot more memory. And memory is the most expensive part of web servers.

What About Node.js?

How does Node.js compare to PHP? The same way the first script compares to the last one. Node.js promotes event driven programming, and facilitates it. First, it relies on JavaSript, which provides an advanced event system. Second, Node provides an EventLoop object, and starts it for you at the end of the main script. Last, Node provides new (native) implementations for most blocking I/O operations that takes advantage of asynchronicity.

For example, to delete a file on the disk using Node, the code looks like this:

var fs = require('fs');
fs.unlink('/tmp/hello', function (err) {
  if (err) throw err;
  console.log('successfully deleted /tmp/hello');
});
// more code
console.log('deletion script');

Node starts by executing the main script until the end. The unlink() function tells the disk to look for a particular file, remove it, and call back node when it is done. In the meantime, Node can continue executing more code. The 'deletion script' message is therefore logged before the deletion actually occurs. When it reaches the end of the script, it starts the event loop. It is now ready to receive a signal from the filesystem signaling the actual deletion, which will trigger the execution of the anonymous function that logs the 'successfully deleted' message.

JavaScript also adds some syntactic sugar to PHP closures. Thanks to lexical scoping, it is not necessary to mention the use statement when declaring an anonymous function, as JavaScript remembers its state between executions. So a PHP asynchronous call:

<?php
$plate = new Plate();
$pastaPan = new AsynchronousPan($eventLoop);
$water = new Water();
$pastaPan->fill($water);
echo "pastaPan: Starting to boil water\n";
$pastaPan->warm($duration = 10, function() use ($pastaPan, $plate, $water) {
  echo "pastaPan: Water is boiling\n";
  $pastaPan->fill(new Spaghetti());
  echo "pastaPan: Starting to boil spaghetti\n";
  $pastaPan->warm($duration = 8, function() use ($pastaPan, $plate, $water) {
    echo "pastaPan: Spaghetti is ready\n";
    $pastaPan->remove($water);
    $plate->addContentsOf($pastaPan);
  });
});

Translates to JavaScript as:

plate = new Plate();
pastaPan = new AsynchronousPan();
water = new Water();
pastaPan.fill(water);
console.log('pastaPan: Starting to boil water');
pastaPan.warm(duration = 10, function() {
  console.log('pastaPan: Water is boiling');
  pastaPan.fill(new Spaghetti());
  console.log('pastaPan: Starting to boil spaghetti');
  pastaPan.warm(duration = 8, function() {
    console.log('pastaPan: Spaghetti is ready');
    pastaPan.remove(water);
    plate.addContentsOf(pastaPan);
  });
});

Lastly, Node allows to write dynamic servers, while PHP needs a server on its own (like Apache) to handle the raw HTTP request. That means that a Node server stays in memory and is compiled only once. Compared to mod_php, which loads the entire PHP runtime and all the required scripts for every single HTTP request, that's a big boost.

Note: One important JavaScript gotcha PHP programmers should be aware of is the evil nature of the this keyword. I already mentioned it in a previous post.

Conclusion

Overall, Node.js is faster than PHP, and consumes less memory. However, Node programming becomes increasingly difficult as the server codebase increases, because you then need to synchronize manually. That makes Node suitable for small to medium size servers doing simple operations - REST APIs, for instance, are a good fit.

To me, Node's true value, besides the event loop, is the new asynchronous implementation of database queries, file operations, HTTP queries, etc. This is pure gold, especially when you realize that other implementations basically waste your CPU cycles.

Now that you're aware of the most distinctive feature of Node, I hope you're convinced to give it a try. I also hope you're convinced that cooking your own pasta is easy and that you will try my open-source recipe.



Published at DZone with permission of its author, Francois Zaninotto. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Farrukh Shahzad replied on Thu, 2013/10/10 - 2:35am

Thanks! This is a great article about Node.JS that can give me (a php guy) a CLEAR sense of what is the fuss about "Non-blocking I/O Event driven etc etc"

5 stars (Y)

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.