
Touchy Simulations are fuel for Strong AI

Many companies are working to push the state of the art in machine intelligence forward, and most of them focus on narrow AI: specific problems like image classification or regression models trained on static data sets. It is mostly autonomous vehicle companies and research companies that are trying to solve Strong AI, or AGI (artificial general intelligence), by using semi-realistic simulations.

“An unknown, but potentially large, fraction of animal and human intelligence is a direct consequence of the perceptual and physical richness of our environment, and is unlikely to arise without it.” - John Locke

These simulations, or virtual environments, are theoretically a much better place for us to train AGIs because they allow the AI to interact with and explore a world similar to ours.

And for the car and robotics companies, it is easier and cheaper to test models inside a computer than to build a robot that drives around our world and crashes.

I believe that simulators are the fastest way to build AGIs, but I will argue that our simulators are toys compared to what we actually need. If large data sets like ImageNet power computer vision models, then simulations fuel AGIs.

Current Simulators

There are about a dozen open source simulators used for artificial intelligence research and many more obscure ones. An older simulation environment that many researchers still use is The Arcade Learning Environment. It is a collection of 50 Atari 2600 games with an abstraction layer on top, so that you can write simple code to interact with the games. Because these are Atari games, they are low resolution (210 x 160 pixels in RGB color), they are in 2D, and the controls consist of a joystick and one action button.
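To make those numbers concrete, here is a small sketch of what an agent sees and can do in an Atari-style environment. It is not the Arcade Learning Environment API. The names (`JOYSTICK_POSITIONS`, `build_action_set`, `values_per_frame`) are made up for illustration, and it assumes a joystick with 8 directions plus a neutral position.

```python
from itertools import product

# Frame size quoted above: 210 x 160 pixels, 3 color channels (RGB).
FRAME_HEIGHT, FRAME_WIDTH, CHANNELS = 210, 160, 3

# Assumption: a digital joystick has a neutral position plus 8 directions.
JOYSTICK_POSITIONS = [
    "neutral", "up", "down", "left", "right",
    "up-left", "up-right", "down-left", "down-right",
]
BUTTON_STATES = [False, True]  # the single action button: released or pressed


def build_action_set():
    """Combine every joystick position with every button state."""
    return list(product(JOYSTICK_POSITIONS, BUTTON_STATES))


def values_per_frame():
    """Count how many numbers the agent receives for one screen."""
    return FRAME_HEIGHT * FRAME_WIDTH * CHANNELS


ACTIONS = build_action_set()

if __name__ == "__main__":
    print(len(ACTIONS), "possible actions")
    print(values_per_frame(), "values per frame")
```

Under these assumptions the whole world is a small grid of numbers coming in and a short list of choices going out.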

Since deep learning took off, environments have been improving at a much faster rate. Most of them are in 3D now.


Gazebo

Gazebo has been around the longest and is used mostly for robotics simulation, not as much for artificial intelligence research. The project started in 2002, and its paper has been cited over 700 times.

Paper: http://ieeexplore.ieee.org/abstract/document/1389727/


VizDoom

This came out around May 2016. It uses the Doom engine, and I believe it is the first 3D simulator made for training reinforcement learning AIs.

Paper: https://arxiv.org/pdf/1605.02097.pdf

Project Malmo

Project Malmo is Minecraft customized to train artificially intelligent agents. The project was released around August 2016. Honestly, I don’t see much difference from the others besides it being set in Minecraft.

Source: https://github.com/Microsoft/malmo

DeepMind Lab

From Google’s DeepMind. It uses Quake 3 as the base of its engine, so it theoretically has better graphics than VizDoom and Malmo. It was released in December 2016.

Paper: https://arxiv.org/pdf/1612.03801.pdf


Roboschool

From OpenAI, Roboschool was released in May 2017. It replaces some of the simulations in OpenAI’s gym, because those were using a non-open-source physics engine. You can use it with gym to try to teach a model different tasks. What is interesting is that it includes joint information for the agent, which is used to control its locomotion.

Source: https://github.com/openai/roboschool/


Isaac

Isaac is a robotics platform from NVIDIA. I have no idea what it’s like, as they announced it in the summer of 2017 and it has not been released yet. Its main use case is robotics simulation, and there seems to be some integration with OpenAI’s gym, so that could be interesting.

Udacity self-driving-car

From Udacity, this is the easiest to get running on your computer, as it is built on top of the Unity engine. It is a simple and fast way to train autonomous cars.

Source: https://github.com/udacity/self-driving-car

All of these simulators focus on getting a good physics base and visual processing. They all give you RGB values from pixels on the screen. For controls, you get x, y, z values and some action buttons. DeepMind Lab differs a little in that it can also give you RGBD, which includes a depth value for each pixel, and Roboschool gives some joint information. Most of these simulations allow only 1 agent to be trained at a time, meaning you can’t have multiple agents in the same simulation learning and interacting, but that is changing.
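As a rough picture of that shared interface, here is a minimal sketch of the data flowing between a simulator and an agent. None of these simulators expose exactly these types. `Observation`, `Action`, and `blank_observation` are hypothetical names I made up to summarize the paragraph above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # one (R, G, B) value per pixel


@dataclass
class Observation:
    rgb: List[List[Pixel]]                     # rows of pixels; every simulator above gives this
    depth: Optional[List[List[float]]] = None  # RGBD-style extra: one depth value per pixel
    joints: Optional[List[float]] = None       # Roboschool-style joint readings

    def channels(self) -> int:
        """Number of values per pixel: 3 for RGB, 4 when depth is present."""
        return 4 if self.depth is not None else 3


@dataclass
class Action:
    move: Tuple[float, float, float] = (0.0, 0.0, 0.0)  # x, y, z
    buttons: List[bool] = field(default_factory=list)   # a few action buttons


def blank_observation(height: int, width: int, with_depth: bool = False) -> Observation:
    """Build an all-black frame, optionally with an all-zero depth map."""
    rgb = [[(0, 0, 0) for _ in range(width)] for _ in range(height)]
    depth = [[0.0] * width for _ in range(height)] if with_depth else None
    return Observation(rgb=rgb, depth=depth)
```

Notice that nothing in this structure describes contact with the world. Pixels come in and movement goes out.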

Models and simulations are 2 sides of the same coin

Simulations are models. Simulations are just more complicated models made to mimic specific environments and run over time. With reinforcement learning, you have 2 models: the simulation and the model you are building. They interact with each other, in the hope that your model will be able to learn.
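Here is a toy sketch of those 2 models interacting. It assumes a made-up 1D corridor as the simulation and a small table of action values (tabular Q-learning) as the model being trained. The class names and all the parameter values are mine, chosen only to keep the example short.

```python
import random


class CorridorSim:
    """The simulation: a 1D corridor where the goal is the rightmost cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action: 0 = move left, 1 = move right (walls clamp the position)
        delta = 1 if action == 1 else -1
        self.position = max(0, min(self.length - 1, self.position + delta))
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


class QAgent:
    """The model being trained: a table of action values per position."""

    def __init__(self, n_states, n_actions=2, lr=0.5, discount=0.9,
                 epsilon=0.2, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.lr, self.discount, self.epsilon = lr, discount, epsilon
        self.rng = random.Random(seed)

    def greedy(self, state):
        values = self.q[state]
        return values.index(max(values))

    def act(self, state):
        values = self.q[state]
        if self.rng.random() < self.epsilon:      # explore
            return self.rng.randrange(len(values))
        best = max(values)                        # exploit, ties broken at random
        return self.rng.choice([a for a, v in enumerate(values) if v == best])

    def learn(self, state, action, reward, next_state, done):
        future = 0.0 if done else self.discount * max(self.q[next_state])
        target = reward + future
        self.q[state][action] += self.lr * (target - self.q[state][action])


def train(episodes=200, max_steps=50):
    """Run the interaction loop: the agent acts, the simulation responds."""
    sim, agent = CorridorSim(), QAgent(n_states=5)
    for _ in range(episodes):
        state = sim.reset()
        for _ in range(max_steps):
            action = agent.act(state)
            next_state, reward, done = sim.step(action)
            agent.learn(state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return agent
```

Calling `train()` returns an agent whose table prefers moving right from every non-goal cell. The point is the loop itself: everything the agent ever learns comes from what `step` chooses to model.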

There are so many different tradeoffs to make in your simulation. Do you go for realistic graphics, realistic physics, gravity, lots of inputs and outputs, speed, realistic sound, molecular chemistry, etc.? Computational power is a limiting factor in what we can realistically model, so we must be deliberate and careful in what we choose to model given the constraints of the system we have. When robotics and car companies train models in simulators and then move them to production, they still have to do a lot of tweaking because of all the differences between the simulator and the real world. Imagine how much faster we could move if the model you trained in a simulator could be directly deployed to a physical robot.

There is no doubt that we do not have the right level of abstraction for our agents and simulators.

To improve our simulator/models, maybe the key is to model accurate sound that mimics wavelengths, amplitudes, and oscillations.

Or maybe the key is to incorporate solar radiation and its effects on the “skin” of the AI agent. Or maybe we need to model smell, with objects that leave chemical compounds behind and air flow that spreads those molecules so that agents can smell them.

We can’t know unless we do extensive research, theorizing, implementing, and testing, repeatedly. At times it feels hopeless.

But just as so many researchers spend time creating new model architectures, I think there need to be a lot more people studying and building simulations to train our AI models in. I think OpenAI was mostly on the right path when they implemented gym, a collection of environments for you to try to train your models in. I wish they had also focused on tooling to make it easier to create and add new environments. Building new simulations is not a small task, though. The reason there are only a handful of them is that they are extremely hard to build, and you need a lot of knowledge to know which parts are the right ones to build. Some people do implement new simulation libraries; an awesome physics library that came out just in the past year is matter.js.

I think that to find the right abstraction, we might need to build the simulator and the model at the same time and test them out together, sort of like how GANs (DCGANs, for example) train 2 related but opposing models concurrently.

Touch and embodied cognition

All of the simulators I mentioned above focus on vision. Vision is something we all use throughout the day. You cannot live a normal human life without vision: no driving the car, no watching TV, no reading, etc. But close your eyes and you might notice there is another sense that is even more pervasive. Touch is everywhere. It is so integrated into our lives that it feels like it’s not even there, and most people don’t think about it.

Touch is so pervasive because it is not confined to a specific point on our body like our eyes, ears, or nose. It literally covers our whole body via sensory neurons embedded in our skin. You can lose your vision, but you can never lose touch; it is always there while you are conscious. Touch is also the first sense to become functional in utero, at about 8 weeks gestation. Ever since I read Helen Keller’s autobiography, I have been fascinated with the idea of building machine learning models that work with touch sensors. She lived her life primarily through touch because at 18 months old she became ill and lost her vision and hearing, but she still led a fulfilling and intelligent life.

There are many scientists who believe that there is one central learning algorithm in humans. If that is true, then the algorithms we use for vision should work on sound, touch, and the other senses. Most of the recent success in AI has come from Convolutional Neural Networks (CNNs) that process visual information. I think this focus on vision has brought us closer to AGI, but it has also led us down the wrong path. CNNs are mostly static; they don’t have locomotion, location, or time components in their basic building blocks. Recently, modifications such as attention and RNNs have been added to give them some of these abilities, but they are not part of the core algorithm.

With touch, locomotion, location, and time are always there. At the higher levels of cognition, you can’t recognize an object with your finger unless you are moving it. You cannot move and feel an object without knowing where your hand is relative to the object. So to get our neural networks to work with touch, locomotion, location, and time must be built into the basic building blocks. We are currently optimizing for a local minimum rather than a global minimum.

For AIs to become intelligent like us, they need “bodies” that can interact with, feel, and touch our environment, just like us. Right now the agents in our environments get the screen pixels, the ability to move along the x, y, and z axes, and a couple of action buttons. We are missing touch in our simulations, and the fidelity is too low.
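A tiny sketch shows why movement, location, and time matter for touch. It assumes a single fingertip sensor and 2 made-up 1D surfaces, where each number is the height of the surface at that position. The surfaces and function names are hypothetical.

```python
# Two hypothetical 1D surfaces: the height of the surface at each position.
SMOOTH_RIDGE = [0, 1, 1, 1, 0]
DOUBLE_BUMP = [0, 1, 0, 1, 0]


def press(surface, position):
    """Pressure reading of a single fingertip sensor held at one position."""
    return surface[position]


def stroke(surface, start=0, velocity=1, steps=None):
    """Slide the fingertip across the surface, logging (time, position, pressure)."""
    if steps is None:
        steps = len(surface)
    trace = []
    position = start
    for t in range(steps):
        if not 0 <= position < len(surface):
            break  # the finger has moved off the edge of the surface
        trace.append((t, position, press(surface, position)))
        position += velocity
    return trace


if __name__ == "__main__":
    # A static press at position 1 reads the same on both surfaces.
    print(press(SMOOTH_RIDGE, 1), press(DOUBLE_BUMP, 1))
    # Only the moving stroke, with time and location attached, tells them apart.
    print(stroke(SMOOTH_RIDGE))
    print(stroke(DOUBLE_BUMP))
```

A single static reading cannot separate the 2 surfaces. The sequence of readings can, but only because each one carries when and where it was taken.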

In our hands alone, we possess at least 2500 receptors for every square centimeter, and that number is probably a low estimate since we can’t measure it accurately. The glabrous, or hairless, parts of our hands contain 4 types of touch receptors: Meissner’s corpuscles, Merkel’s discs, Pacinian corpuscles, and Ruffini endings.

[Figure: touch receptors in glabrous skin, from Wikipedia]
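To get a feel for that density, here is a minimal sketch of 1 square centimeter of simulated skin. It assumes the receptors sit on a uniform square grid, which real skin does not do, and it treats all of them as one kind of pressure sensor. The names `make_patch` and `press_disc` are hypothetical.

```python
import math

RECEPTORS_PER_CM2 = 2500                       # the density quoted above
TAXELS_PER_CM = math.isqrt(RECEPTORS_PER_CM2)  # 50 sensors along each centimeter


def make_patch(width_cm=1, height_cm=1):
    """Create a grid of pressure readings, all zero (nothing touching)."""
    rows = height_cm * TAXELS_PER_CM
    cols = width_cm * TAXELS_PER_CM
    return [[0.0] * cols for _ in range(rows)]


def press_disc(patch, center_cm, radius_cm, pressure=1.0):
    """Set the pressure on every sensor whose center lies under a flat disc.

    Returns the number of sensors that were touched.
    """
    cx, cy = center_cm
    spacing = 1.0 / TAXELS_PER_CM
    touched = 0
    for r, row in enumerate(patch):
        for c in range(len(row)):
            x = (c + 0.5) * spacing  # sensor center, in centimeters
            y = (r + 0.5) * spacing
            if math.hypot(x - cx, y - cy) <= radius_cm:
                row[c] = pressure
                touched += 1
    return touched


if __name__ == "__main__":
    patch = make_patch()
    count = press_disc(patch, center_cm=(0.5, 0.5), radius_cm=0.1)
    print(len(patch) * len(patch[0]), "sensors in the patch,", count, "touched")
```

Even this crude version needs a 50 x 50 grid for a single square centimeter, and a disc just 2 millimeters across lights up dozens of sensors at once.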

So given what we know about touch (hint: very little), how can we implement a touch simulator that mimics reality and gets us closer to AGI? A simulation can only do a small subset of what is possible, so we have to figure out how to implement enough of touch to capture the properties we are looking for. I have been studying and thinking about the problem for a long time, trying to figure out what a realistic pathway forward is. I definitely had a lot of analysis paralysis.

While reading some old cognitive science papers, I recently came up with an idea to kickstart touch sensors into machine learning in a short timeline. I’m targeting Winter 2018 for a prototype and I’m excited at the potential for this project. This is all still new territory for me, so if this sounds interesting, please contact me to collaborate. I hope to show it to you all soon.