I was talking to a colleague the other day about a proposed tool and architecture that seemed to need a lot of human intervention in its weekly or monthly existence, and might even just randomly shit on everything. Figuratively, of course. This is when we coined the term “tamagotchi design”.
Tamagotchi design is an anti-pattern, so not one to follow. If you are not familiar with what a Tamagotchi is:
“The Tamagotchi (たまごっち Tamagocchi?) is a handheld digital pet, created in Japan by Akihiro Yokoi of WiZ and Aki Maita of Bandai. It was first sold by Bandai in 1996 in Japan.”
The idea is that you have to look after it or it dies, or gets into a mess. So you regularly feed it, clean up after it, play with it and cuddle it, maybe?
Do you regularly have to restart servers or micro-services?
What we were saying was: if you find yourself in a world where you have to regularly restart your servers or micro-services, need to check they haven’t lost their connection to plugins, or have implemented a queuing system that needs a manual kick once in a while to keep processing, then you may have tamagotchi design on your hands.
To avoid tamagotchi design, I would always try to have a plan from an architectural point of view, talk a lot about points of failure, and discuss how you will recover from them. Of course some playbooks may include steps like restarting servers, and sometimes that is inevitable, but you don’t want this to be in your general runbook.
Use an architectural diagram
Drawing an architectural diagram, or asking a team member to draw one, is so useful. Then just take away one of the lines and ask what happens. Can transactions in the system still happen if that line is missing? Do we know what the customer sees when this happens? If you don’t know, start testing and providing that information.
Can you also test your playbooks, to see if they will actually work?
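The “take away a line” exercise can even be captured as an automated test. As a minimal sketch, assuming a hypothetical order flow with a payment gateway dependency (all names here are made up for illustration), you can simulate the missing line with a stub and assert what the customer actually sees:

```python
def place_order(payment_gateway, order):
    """Hypothetical transaction that depends on one 'line' on the diagram."""
    try:
        payment_gateway.charge(order["total"])
    except ConnectionError:
        # The line to the payment gateway is missing: degrade gracefully
        # instead of failing the whole transaction.
        return {"status": "pending", "message": "Payment queued, we will retry"}
    return {"status": "confirmed"}


class DownGateway:
    """Simulates the removed line: every call fails to connect."""

    def charge(self, amount):
        raise ConnectionError("gateway unreachable")


# What does the customer see when this line is missing?
result = place_order(DownGateway(), {"total": 42})
assert result["status"] == "pending"
```

A test like this answers the diagram question permanently, rather than relying on someone remembering the answer.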
If your system does require frequent human intervention, can it be automated? Can you create a feeding schedule, so to speak? Or a cleaning-up schedule? How about alerts, so you can make sure the system does not get into a right mess?
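If a restart really is unavoidable for now, even the feeding schedule can be automated rather than done by hand. As a rough sketch (the health URL and restart command are hypothetical; substitute whatever your platform uses), a scheduled script could probe a health endpoint and restart the service only when it is actually unhealthy:

```python
import subprocess
import urllib.error
import urllib.request


def is_healthy(url, timeout=5.0):
    """Probe a health endpoint; healthy means an HTTP 200 response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def feed_if_hungry(url, restart_cmd, probe=is_healthy):
    """Restart the service only when the health probe fails.

    Returns True if a restart was triggered. restart_cmd is a
    hypothetical placeholder, e.g. a service-manager restart command.
    """
    if probe(url):
        return False  # happy tamagotchi, leave it alone
    subprocess.run(restart_cmd, check=True)
    return True
```

Run it from cron or a systemd timer, but treat it as a stop-gap: the real fix is removing the need for the restart in the first place.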
Monitoring and alerting
How well are you monitoring your system? Do you really know what it needs to be happy, and what happy and unhappy really look like? Maybe you are playing with the system when really it just wants to be cleaned up? Monitoring can tell you a lot about what is going on, and learning what happy looks like versus the different unhappy states can be invaluable when working on a system.
The worst bit about a Tamagotchi was always having to react to it. If you could prepare, and feed it a bit earlier or play with it before it got bored, you would have a much happier pet. Alerts can do the same for you in your tamagotchi architecture: they could warn you that queues are getting full, for example, and you could try to remedy this early.
Do you work with a tamagotchi design? I hope no-one does but maybe you do and can share a bit about your experience?
EDIT: When I told my other half about this anti-pattern, he mentioned the pets vs cattle analogy, which is very similar to this, and maybe I am just muddying the waters with my tamagotchi design message? You can read the history of pets vs cattle on the linked blog. Here is a quick explanation from the same blog:
Pets: Servers or server pairs that are treated as indispensable or unique systems that can never be down. Typically they are manually built, managed, and “hand fed”. Examples include mainframes, solitary servers, HA loadbalancers/firewalls (active/active or active/passive), database systems designed as master/slave (active/passive), and so on.
Cattle: Arrays of more than two servers, that are built using automated tools, and are designed for failure, where no one, two, or even three servers are irreplaceable. Typically, during failure events no human intervention is required as the array exhibits attributes of “routing around failures” by restarting failed servers or replicating data through strategies like triple replication or erasure coding. Examples include web server arrays, multi-master datastores such as Cassandra clusters, multiple racks of gear put together in clusters, and just about anything that is load-balanced and multi-master.