A post this week on Netflix’s tech blog explained a suite of tools dubbed “The Simian Army.” Developed to test the limits of the company’s cloud-based services, the programs share the monkey surname and each performs a specific task – poking and prodding a different part of the infrastructure for weaknesses and promptly correcting them. For example, “Janitor Monkey” cleans up unnecessary resources in the cloud, while “Security Monkey” identifies possible gaps in protection and maintains DRM and SSL certificates.
Yury Izrailevsky, Director of Cloud and Systems Infrastructure at Netflix, explained that the new members of the Simian family are based on the previously deployed “Chaos Monkey.”
“[Chaos Monkey] randomly disables our production instances to make sure we can survive this common type of failure without any customer impact,” Izrailevsky wrote. “The name comes from the idea of unleashing a wild monkey with a weapon in your data center (or cloud region) to randomly shoot down instances and chew through cables – all the while we continue serving our customers without interruption.” He said that “Chaos Monkey” has so far been successful, giving Netflix engineers insight into possible system flaws and how to avoid or counter them in the future.
Izrailevsky admitted that “no single component can guarantee 100 percent uptime.” The goal is to make it so even if one component fails the system perseveres, he said. “But just designing a fault tolerant architecture is not enough. We have to constantly test our ability to actually survive these ‘once in a blue moon’ failures.”
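The approach Izrailevsky describes, disabling instances at random and then checking that requests are still served, can be sketched as a toy simulation in Python. This is only an illustration under stated assumptions, not Netflix's tool: the names `Instance`, `Service`, and `unleash_chaos_monkey` are hypothetical, and the "instances" are plain in-memory objects rather than real cloud servers.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Instance:
    # Hypothetical stand-in for a production server.
    name: str
    healthy: bool = True


@dataclass
class Service:
    # Hypothetical service backed by several redundant instances.
    instances: list = field(default_factory=list)

    def handle_request(self) -> str:
        # Route to the first healthy instance; fail only if none remain.
        for inst in self.instances:
            if inst.healthy:
                return f"served by {inst.name}"
        raise RuntimeError("no healthy instances left")


def unleash_chaos_monkey(service, rng, kill_count=1):
    """Disable up to kill_count randomly chosen healthy instances.

    Returns the names of the instances that were disabled.
    """
    healthy = [inst for inst in service.instances if inst.healthy]
    victims = rng.sample(healthy, min(kill_count, len(healthy)))
    for victim in victims:
        victim.healthy = False
    return [victim.name for victim in victims]


# Three redundant instances; the monkey takes out two of them.
service = Service([Instance(f"node-{n}") for n in range(3)])
rng = random.Random(42)  # seeded so the run is repeatable
killed = unleash_chaos_monkey(service, rng, kill_count=2)

# The request should still succeed, served by the surviving instance.
response = service.handle_request()
print("disabled:", killed)
print(response)
```

With three instances and two disabled, the request still succeeds; raising `kill_count` to 3 makes `handle_request` raise, which is the kind of hidden single point of failure this sort of test is meant to surface.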
The new batch of monkey-inspired programs is only the beginning.
“Ideas for new simians are coming in faster than we can keep up and if you have ideas, we’d love to hear them,” said Izrailevsky, adding that the virtual monkey family is just “one of many initiatives” used to keep streaming uninterrupted for subscribers.
Because placing all your faith in a gang of virtual primates would just be bananas. (Sorry!)