A Maelstrom of Pythons (using Python in disaster recovery at scale)

Facebook services are deployed in multiple data centers across the globe. To ensure uptime in the face of thousands of incidents, we have developed Maelstrom, a Python-based engine that lets us test disaster recovery (DR) scenarios and mitigate actual incidents. In this talk I will present how we leverage Maelstrom to improve the resilience of our services and how the engine helps us understand the complex nature of interdependent services deployed across multiple physical locations.