Hi Lukas,
On Tue, Nov 25, 2014 at 7:14 AM, Lukas Eder <
lukas...@gmail.com> wrote:
> Hi Ben,
>
> Hmm, I thought that this was indeed a queue implementation. Very
> sophisticated, with recursive SQL. I'm personally curious (having recently
> written a somewhat controversial article on the subject:
>
http://blog.jooq.org/2014/09/26/using-your-rdbms-for-messaging-is-totally-ok/).
> Were you at the point where you also evaluated the performance and locking
> implications of your implementation? It looks like you have implemented
> something similar to Oracle's magical FOR UPDATE SKIP LOCKED clause, which
> is the implementation basis of Oracle AQ that has been made public (=
> documented) with Oracle 10g, I believe.
TL;DR: I haven't benchmarked this implementation in real world
conditions, so I can't comment on real world performance.
And to be clear - the idea with the advisory lock is not my own - I
shamelessly stole it from here:
https://github.com/chanks/que/blob/master/lib/que/sql.rb
Back in the day I worked at an organization where AQ was our only
officially supported choice of distributed FIFO (i.e. Oracle was the
only piece of middleware we were allowed to run in production). 10g
used to have a limitation (not sure whether this is still true) that
the AQ proc for enqueuing and consuming point to point "messages" was
generally available, but the equivalent proc for pub-sub "messaging"
was implemented in a different package called something like
DBMS_AQ_ADMIN. Because of the x_ADMIN suffix, the DBAs refused to grant
execute privileges on this package for non-DBA usage. So we ended up hacking a
polling-based pub-sub solution. Basically we re-invented the wheel and
turned it into a pentagon, but even pentagons can roll.
At the time, I was also part of the core RabbitMQ dev team, so I
benchmarked our solution against a Rabbit solution, and the AQ based
solution turned out to be a bit of a tortoise (legal disclaimer - for
the use case we were looking at, and also because I was fully aware of
the internals of Rabbit, so I knew where its sweet spot lay), but it
was fast enough for our purposes, so we stuck with it. Rabbit was still
pre-1.0 and besides which, we would never have been allowed to run it
in production anyway.
I did briefly peruse the AQ proc at the time, saw that it was using
a skip list, but I didn't go as far as profiling the IO
characteristics on the server, so I'm not exactly sure how smart the
AQ implementation is under the covers.
The main point of contention with using a DB as a queue is that to get
the FIFO semantics right in conjunction with exclusive consumption, in
the general case you need to lock a row. That's where Postgres has
this handy little proc which allows you to try to acquire an in-memory
mutex of your own liking. So, cobbled together with a recursive query,
you can try to lock the head of the queue, and in the instant that you
have an outstanding acknowledgement (i.e. a row has gone out over the
wire without a commit), you can skip to the nearest unlocked row.
Because the lock is in memory, there is no IO penalty. The downside is
that you can't scale this across multiple instances (unless you've
mastered the art of multi-master writes in Postgres). But being an old
school person who still believes in CAP, I avoid this issue by only
having a single master (i.e. I'm trading off write-availability for
consistency).
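To make that skip-to-nearest-unlocked-row behaviour concrete, here's a
minimal in-memory Python sketch of the pattern (no Postgres involved;
the AdvisoryLocks and Queue classes below are hypothetical stand-ins for
pg_try_advisory_lock and the recursive query): consumer A locks the head
of the queue, and consumer B, finding the head locked, skips to the
nearest unlocked row.

```python
class AdvisoryLocks:
    """Stand-in for Postgres advisory locks: an in-memory try-lock keyed
    by an arbitrary integer, so taking or testing a lock costs no IO."""

    def __init__(self):
        self._held = set()

    def try_lock(self, key):
        # Non-blocking: succeed only if nobody else holds this key.
        if key in self._held:
            return False
        self._held.add(key)
        return True

    def unlock(self, key):
        self._held.discard(key)


class Queue:
    """A FIFO of rows with exclusive consumption via advisory locks."""

    def __init__(self):
        self.rows = []  # oldest first: list of (row_id, payload)
        self.locks = AdvisoryLocks()

    def enqueue(self, row_id, payload):
        self.rows.append((row_id, payload))

    def consume(self):
        # Walk rows oldest-first and lock the first row nobody holds;
        # a locked head means another consumer has an outstanding
        # acknowledgement, so we skip to the nearest unlocked row.
        for row_id, payload in self.rows:
            if self.locks.try_lock(row_id):
                return (row_id, payload)
        return None  # queue empty, or every row is locked

    def ack(self, row_id):
        # "Commit": remove the row and release the advisory lock.
        self.rows = [r for r in self.rows if r[0] != row_id]
        self.locks.unlock(row_id)


q = Queue()
q.enqueue(1, "first")
q.enqueue(2, "second")

a = q.consume()  # consumer A locks the head: (1, "first")
b = q.consume()  # consumer B skips the locked head: (2, "second")
q.ack(1)         # A commits: row 1 is gone, its lock released
```

The real thing does this walk inside a single recursive query against
the table, but the control flow is the same: try-lock, else advance.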
If you need distributed FIFOs, using a proper MQ usually doesn't get
you sacked these days. But even grown-up MQs are subject to the CAP
triangle, so YMMV.
Ironically, back in the day, the design philosophy of RabbitMQ was to
deliver messages, not to queue them. It was only the fact that the
product had the letter Q in the name that led people to believe it was
a good idea to queue up messages. So then the MQ turned into a
database. Everything went full circle.
Personally I don't have any strong feelings either way. If you can
achieve your FIFO distribution requirements with your DB, you don't
have any flow control considerations, and the performance penalty is
negligible, why not go for that? It saves you from having to deploy,
manage, and monitor another piece of infrastructure.