So I thought that instead of giving some generic talk, I would prepare a presentation describing in detail how we worked on reliability @Codewise. The story should be interesting to people, as our company is well known for running low-latency, high-traffic systems (400k requests/s in total).
I gave this presentation three times, the second time even as a keynote at the DevopsDays Poznań event.
Interestingly, this presentation was originally meant to tell the story of a no-ops company running its own Cassandra clusters. However, after the Geecon edition I changed it a bit and removed the Cassandra-related slides, as it turned out that people are very interested in reliability, but in a generic approach (not focused on a single service).
So here are the slides (videos are not available yet): click