Insights from Animal-AI Olympics
It’s a jungle out there (the Unity arena) … and our AI agent is learning to survive. He needs to catch food (green balls) … If he fails to feed himself within the stipulated time, he will die of starvation. He also needs to save himself from the predators (red balls); he’ll die a painful death if he gets caught. There are many kinds of obstacles too: some to avoid, some to use as tools. There are also a few zones that can cause significant damage and can even kill.
To create an intelligent agent that can survive and thrive, we explored reinforcement learning algorithms. These algorithms are driven by two functions: a loss function, which the agent needs to minimize, and a reward function, which it needs to maximize. Under these two constraints, the agent builds its strategy.
We used a similar approach in creating the algorithm behind our AI Agent. We call it Zombie Intelligence (ZI).
APPROACH TO ZI
While learning to survive in the training arenas, whenever ZI encounters a new situation that calls for a specific job, it instantly creates another agent (we call them minions). Minions are specialized agents: each acquires skill in one particular maneuver. To ensure that minions do what they are required to do, we have coded ZI so that it devises a unique reward function for each minion.
For example, a minion created to climb ramps gets a reward only when it walks up the ramp.
This way ZI keeps creating hundreds of minions, each specialized and willing to do one particular job.
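The idea of one reward function per minion can be sketched in a few lines. This is only an illustration, not the code behind ZI: the `Step` record, its fields, and both reward functions are hypothetical names invented for the example, and the real agent would read its signals from the arena.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical per-step record; the field names are assumptions for this sketch.
@dataclass
class Step:
    on_ramp: bool        # is the agent standing on a ramp?
    height_gain: float   # change in height since the previous step
    ate_food: bool       # did the agent collect a green ball this step?


RewardFn = Callable[[Step], float]


def ramp_climber_reward(step: Step) -> float:
    """Pays only for upward progress made while on a ramp."""
    if step.on_ramp and step.height_gain > 0:
        return step.height_gain
    return 0.0


def food_catcher_reward(step: Step) -> float:
    """Pays only when food is collected."""
    return 1.0 if step.ate_food else 0.0


# Each minion is trained against its own reward function.
MINION_REWARDS: Dict[str, RewardFn] = {
    "ramp_climber": ramp_climber_reward,
    "food_catcher": food_catcher_reward,
}


def episode_return(minion: str, steps: List[Step]) -> float:
    """Total reward the named minion would collect over the same steps."""
    reward_fn = MINION_REWARDS[minion]
    return sum(reward_fn(s) for s in steps)
```

The same sequence of steps scores differently for each minion, which is what pushes each one toward its own specialty.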
When ZI is sent to the test arena, it summons a specialized minion at every step. ZI only decides which minion to deploy in a given situation; it never chooses actions itself.
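A rough sketch of that division of labor follows. The post does not say how ZI picks a minion, so the rule used here (the first registered minion whose condition matches, with a fallback policy when none does) is an assumption, as are all the names: `ZI.register`, `ZI.select`, the observation keys, and the action strings.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: an observation is a dict of named boolean cues,
# and an action is just a label.
Observation = Dict[str, bool]
Action = str
Policy = Callable[[Observation], Action]
Condition = Callable[[Observation], bool]


class ZI:
    """Chooses which minion acts; never produces an action of its own."""

    def __init__(self, fallback: Policy) -> None:
        self._minions: List[Tuple[str, Condition, Policy]] = []
        self._fallback = fallback  # assumed default when no minion applies

    def register(self, name: str, applies: Condition, policy: Policy) -> None:
        """Add a specialized minion and the situation it is meant for."""
        self._minions.append((name, applies, policy))

    def select(self, obs: Observation) -> Tuple[str, Policy]:
        """Return the first registered minion whose condition matches."""
        for name, applies, policy in self._minions:
            if applies(obs):
                return name, policy
        return "fallback", self._fallback

    def act(self, obs: Observation) -> Action:
        """Delegate the action choice to the selected minion."""
        _, policy = self.select(obs)
        return policy(obs)
```

A possible usage, again with made-up cues and actions:

```python
zi = ZI(fallback=lambda obs: "explore")
zi.register("ramp_climber", lambda o: o.get("ramp_ahead", False), lambda o: "forward")
zi.register("predator_dodger", lambda o: o.get("red_ball_near", False), lambda o: "turn_away")
action = zi.act({"ramp_ahead": True})
```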
By analogy, evolution is the ZI, our universe is the arena, and living beings are the minions.
I was watching the Netflix documentary on Bill Gates and noticed that Bill was trying to build toilets and eradicate polio. I was perplexed: given the resources and intelligence he has access to, why isn’t he building space robots … but Gates is no fool, presumably :). That leaves only one hypothesis that makes sense. His reward function is placed in altruism and in serving fellow humans who share his own time and space, not those in some distant future.
Evolution knew that one planet cannot hold life for long. There would be many threats to the planet, and even without them it is bound to die its own natural death someday. The only way to ensure the survival of life was to develop a neocortex in the brains of a few minions. It knew that the neocortex, the faculty to think, is a double-edged sword. But it had no other option.
So it gifted the neocortex to humans. But it couldn’t leave it at that; it built in a few preventive measures too … It knew that a few minions would use this sword to exploit the environment, which might bring death to the planet sooner. To counterbalance this, it devised reward functions in such a way that minions like Greta Thunberg would be created to delay the process.
To be continued …