I'm looking at starting a new project with Event Sourcing, and I just keep running into the same fatal flaw in every PHP implementation I can find. First reported by Fritz on the Prooph project, the Prooph people thought it was an unavoidable side-effect of using a document DB to store log information (it is, but that's beside the point).
The problem is that some records are not available when doing a SELECT, but records that have been inserted at a later time do appear in the result set. We can tell that some records were actually inserted before others because the PKEYs hare been allocated. When you get a result set with IDs like:
100
106
107
108
...
If you repeat the query, you'll get:
100
101
102
103
104
105
106
107
108
...
MySQL Projections skipping events (SingleStreamStrategy)
Concurrency
When writing to a database, you always have concurrency. You will have more than 1 client writing to the DB if you have any kind of Web site. At first pass, you might think, oh, I'll use transactions to get a consistent view of the data. But, transactions are only self-consistent, there's no way to get an absolute view of the data at any point in time.
There is no now. ~Justin Sheehy
All of your SELECTs are broken
Yes, all of your regular SELECT
statements are subject to bad reads.
Wait, why don't I see problems all the time with regular SELECT
statements?
Because you normally select WHERE some condition, and you're not trying to tail a log table. You rarely encounter a section of data that is currently being written to because you are normally selecting rows that meet a criteria that isn't under active writing. When you are trying to tail a log table, you're constantly interescting the latest records - records that are currently being written to - with your query.
Transactions and isolation levels
So, you're probably thinking - like I did - that if you increase the isolation level of the transactions that write to the table, you'll get serialized writes and you won't see gaps in the pkeys. Well, that doesn't fix the situation.
To avoid reading gaps in the DB if you're using a polling select, you need that select to have a consistent view of the DB,
which means the SELECT
needs to be in a transaction.
Every implementation I've seen of ES polling has the transactions on the inserts and not on the selects.
This leads to the inconsistent gaps seen in the original bug.
Take away
Databases are hard. Actually, yes they are. Things seem simple on a local dev environment, but that's because your dev environment lacks concurrency.
This problem is similar to "Help, Laravel ate my session data" where I detail how PHP's sessions' file locking has no good database cognate. Yeah, transactions are locks, but you can't hold one transaction open from the beginning of the request to the end (with MySQL). The session callbacks are designed to work with file locks and nothing else. If you are doing lots of session updates during ajax, you're bound to have some data get erased.
https://github.com/EventSaucePHP/DoctrineMessageRepository/blob/master/src/DoctrineMessageRepository.php
https://github.com/broadway/event-store-dbal/blob/master/src/DBALEventStore.php
MySQL Projections skipping events (SingleStreamStrategy)