Sensible Development

Software Development Blog

Apparently I’ve been living under a rock, and just crawled out of it, but reactive is all the rage right now.  Maybe I am showing my age, but reactive seems like cooperative multitasking.
So, my question is… Why do it and why all the fuss around it?
Let me give you a bit of background…
I tried to build a complex, high-performance reactive system about 10 years ago.  That seems like a long time ago, but the reality is that the Linux kernel has not changed all that much since then, nor has the environment in which user-space programs run under Linux, so the takeaways from my experience then fully apply to today's systems environment.

Let’s list the supposed advantages of a reactive system:

  • No need for as many threads as there are client connections
  • Decreased memory footprint due to fewer threads
  • Increased capacity due to decreased resource usage (threads being the resource)

What are reactive’s disadvantages?

  • Programming model is much more complex
  • Potential for increased latency, since fewer threads process the concurrent connections
  • Lower throughput and higher latency (discussed below)

Let’s compare the user-level and kernel-level steps required for a synchronous, blocking, multi-threaded system vs. a non-blocking reactive system.
The task here is one cycle through the event loop in the case of a reactive system, or one read/write in a synchronous blocking system, which accomplishes the same thing.

Blocking / Thread-Pool:

  1. (user space) read/write() - calls into the kernel
  2. (kernel) multiplex multiple blocked threads and dispatch the ones that are ready to read/write
  3. (kernel) fill in the read() result or send out the write() result
  4. (user space) return from read/write() with the data sent or filled in
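As a concrete (if minimal) sketch of the blocking model, here is a hypothetical echo round-trip in Java; the class name, pool size, and echo protocol are all illustrative. The worker's readLine() is where the thread blocks in the kernel (steps 1-2) and returns with the data filled in (steps 3-4):

```java
import java.io.*;
import java.net.*;
import java.util.concurrent.*;

public class BlockingEcho {
    // One request/response cycle in the blocking model: the worker thread's
    // read blocks in the kernel until data arrives, then returns with the
    // data filled in. No event loop, no readiness bookkeeping.
    static String echoOnce(String msg) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1); // one thread per connection
        try (ServerSocket server = new ServerSocket(0)) {       // ephemeral loopback port
            pool.submit(() -> {
                try (Socket conn = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(conn.getInputStream()));
                     PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
                    out.println(in.readLine());                 // plain blocking echo
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            try (Socket client = new Socket("localhost", server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream()))) {
                out.println(msg);
                return in.readLine();                           // blocks until the echo arrives
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echoOnce("hello"));
    }
}
```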

Reactive / Non-Blocking / Single-threaded:

  1. (user space) set up select/poll/epoll/kqueue/etc. data structures with the appropriate descriptors
  2. (user space) call poll() or one of its relatives
  3. (kernel) figure out which descriptors are ready to read/write and dispatch
  4. (kernel) fill in the poll() results and return them to user space
  5. (user space) parse the poll() results and dispatch the appropriate user code
  6. (user space) user code calls read/write() with the appropriate descriptor
  7. (kernel) fill in the read() result with data or send out the write() result
  8. (user space) return from read/write() with the data sent or filled in
  9. (user / kernel) repeat steps 6 through 8 for the other ready descriptors
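The reactive loop above can be sketched with Java NIO's Selector, which wraps the select/poll/epoll family; the names are illustrative, and a real server would loop forever and handle partial reads. Note how many of the steps are explicit user-space bookkeeping:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

public class ReactiveEcho {
    // One message through the event loop: a single thread multiplexes the
    // listening socket and the accepted connection over a Selector.
    static String echoOnce(String msg) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);   // step 1: register descriptors

        SocketChannel client = SocketChannel.open(
                new InetSocketAddress("localhost", server.socket().getLocalPort()));
        client.write(ByteBuffer.wrap(msg.getBytes(StandardCharsets.UTF_8)));

        String reply = null;
        while (reply == null) {
            selector.select();                               // steps 2-4: poll() and its results
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {                           // step 5: parse results, dispatch
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel conn = server.accept();
                    conn.configureBlocking(false);
                    conn.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel conn = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(64);
                    conn.read(buf);                          // steps 6-8: per-descriptor read()
                    buf.flip();
                    reply = StandardCharsets.UTF_8.decode(buf).toString();
                }
            }                                                // step 9: repeat for other descriptors
        }
        client.close();
        server.close();
        selector.close();
        return reply;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(echoOnce("hello"));
    }
}
```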

As you can see, the reactive architecture requires a lot, really a LOT, more steps to accomplish the same task.  Also, in the blocking case, all of the tasks are exactly the ones the kernel does very well and is optimized to perform.  The same can't be said for the reactive system, where much of the multiplexing and dispatching work is repeated in user space, which is not optimized for it.
As a result, the reactive architecture executes more slowly and uses more CPU for the same task, and benchmarks support this.  There are also latency issues: since a single thread deals with multiple connections, latency naturally increases.
The benchmarks in question are essentially load tests.

Of course, real life is not like a load test.  There are HTTP 1.1 keep-alive, WebSocket, and other scenarios that require many idle connections.  This is where reactive has its only advantage.
Let’s talk about this and other advantages that reactive has:
Idle connections, and only idle connections, can be handled effectively in a reactive way.  The web server knows when a connection is likely to be idle, i.e. after an HTTP request completes in HTTP 1.1, and it can put that connection into a pool handled by a poll() loop in a separate thread.  The connection is never marked non-blocking.
This way, when a connection stops being idle, it can be put back into the thread pool, which will process it in the regular, blocking way.
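A hypothetical sketch of this idle-connection pool in Java NIO follows (class, pool size, and the demo queue are all illustrative). One Java-specific wrinkle: unlike poll() in C, a Java Selector only accepts channels in non-blocking mode, so this sketch toggles blocking mode at the handoff points rather than leaving the connection blocking throughout:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.concurrent.*;

// One selector thread watches parked (idle) connections; as soon as one
// becomes readable it is handed back to an ordinary blocking worker pool.
public class IdlePool {
    final Selector idle = Selector.open();
    final ExecutorService workers = Executors.newFixedThreadPool(4);
    final BlockingQueue<String> received = new LinkedBlockingQueue<>(); // demo output only

    public IdlePool() throws IOException {}

    // Park a connection that just went idle
    void park(SocketChannel ch) throws IOException {
        ch.configureBlocking(false);             // Selector registration requires non-blocking
        ch.register(idle, SelectionKey.OP_READ);
        idle.wakeup();
    }

    // One pass of the idle loop: wait for activity, then hand active connections off
    void runOnce() throws IOException {
        idle.select();
        List<SocketChannel> active = new ArrayList<>();
        Iterator<SelectionKey> it = idle.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            key.cancel();                        // stop watching: the connection is active again
            active.add((SocketChannel) key.channel());
        }
        idle.selectNow();                        // flush cancelled keys so blocking mode can be restored
        for (SocketChannel ch : active) {
            workers.submit(() -> {
                try {
                    ch.configureBlocking(true);  // back to plain blocking reads in the pool
                    ByteBuffer buf = ByteBuffer.allocate(64);
                    ch.read(buf);
                    buf.flip();
                    received.add(StandardCharsets.UTF_8.decode(buf).toString());
                } catch (IOException e) {
                    // drop the connection
                }
            });
        }
    }
}
```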
Another claimed advantage of reactive is its lower memory footprint.  This, too, can be mitigated in a blocking system by setting a low thread stack size for the connection-processing thread pool.
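For example, in Java a connection-processing thread can be created with an explicit small stack; the stack-size argument below is a hint that the JVM may round up or ignore on some platforms, and the 256 KiB figure is illustrative:

```java
public class SmallStackThreads {
    static volatile boolean ran = false;

    // Create a worker thread with a 256 KiB stack instead of the JVM default
    // (typically 512 KiB to 1 MiB); the fourth constructor argument is the
    // requested stack size in bytes, treated as a platform-dependent hint.
    static Thread newConnectionThread(Runnable task) {
        return new Thread(null, task, "conn-worker", 256 * 1024);
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = newConnectionThread(() -> ran = true);
        t.start();
        t.join();
        System.out.println("ran = " + ran);
    }
}
```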

Conclusions:

  • Reactive really has no place in user (non-infrastructure) code
  • User code should be written in a threaded, blocking way, at least in a web, request-response environment. The exception is service orchestration, but even that should be handled synchronously, just on a separate thread using a Future-based API
  • Reactive can be used in a web server infrastructure only, and only to handle idle connections
  • Stack size can be lowered by the web server infrastructure for the connection-processing threads to reduce the memory footprint
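A minimal sketch of that Future-based, synchronous orchestration style in Java; the two service calls are hypothetical stand-ins for real remote calls:

```java
import java.util.concurrent.*;

public class Orchestration {
    // Fan two (hypothetical) service calls out to separate threads, then block
    // for both results: synchronous composition, no reactive callbacks needed.
    static String orchestrate(ExecutorService pool) throws Exception {
        Future<String> user  = pool.submit(() -> "alice"); // stand-in for one remote call
        Future<String> quota = pool.submit(() -> "10GB");  // stand-in for another
        return user.get() + ":" + quota.get();             // plain blocking joins
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        System.out.println(orchestrate(pool)); // prints "alice:10GB"
        pool.shutdown();
    }
}
```

Both calls run concurrently, but the orchestrating code reads top to bottom like ordinary blocking code.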