Submitted to 7th Joint Symposium on Neural Computation
7 Apr 00
Once built, the NNet uses a version of the standard Hebbian learning rule that has been adapted to our robotic task so that it can incorporate reinforcement information. Specifically, we imagine a single, extrinsic reinforcement signal that is perfectly correlated with success in the environment. Because we are now building only relatively small bundles of $O(100)$ cells, which can be expected to have diameters of $O(10)$ cells, we make the further assumption that this reinforcement signal is available to all neurons. With these simplifications, our learning rule becomes:
\[ \delta w_{ij} = \eta \, a_i \, \mathrm{Corr}\bigl( (a_j > 0.5),\, Val \bigr) \]
where $a_i$ and $a_j$ are the activities of the pre- and post-synaptic neurons, respectively, and $Val$ is the reinforcement signal.
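As an illustration only, the rule can be sketched in Python. The text does not say how $Corr$ is computed, so this sketch assumes a Pearson correlation taken over a short window of recent time steps; the names `pearson`, `delta_w`, `a_j_history`, and `val_history` are hypothetical and are not taken from our implementation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant (an assumption made
    here so that the weight change is zero when nothing varies).
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def delta_w(eta, a_i, a_j_history, val_history, threshold=0.5):
    """Weight change eta * a_i * Corr((a_j > threshold), Val).

    a_i          -- current pre-synaptic activity
    a_j_history  -- recent post-synaptic activities (assumed window)
    val_history  -- reinforcement signal over the same window
    """
    # Threshold the post-synaptic activity into a 0/1 indicator.
    post_active = [1.0 if a > threshold else 0.0 for a in a_j_history]
    return eta * a_i * pearson(post_active, val_history)
```

Under these assumptions, a synapse is strengthened when the post-synaptic cell tends to be active while reinforcement is present and weakened when the two are anti-correlated; a silent pre-synaptic cell ($a_i = 0$) leaves the weight unchanged.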
Our current implementation is motivated by our instantiation of these robot {\em Bauplans} as \URL{}{Mindstorm Lego creatures}. This means that we have constrained ourselves to the very limited capacities of the RCX controller (8-bit, 32k RAM, 6 I/O channels) and crude light sensors. All locomotion is accomplished by excitation or inhibition of forward and backward motor cells associated with each of a pair of coaxially-mounted wheels.
Testing the effectiveness of these designs can be accomplished in two ways. The most obvious is to actually build each Lego creature and test its performance. But this construction step is by far the most labor-intensive, and hence expensive, so we also need a cheaper alternative. We have therefore developed a simulation environment in which these designs can be tested. This environment allows us to model a single ``light'' source and the effects of our simulated robots' motions in this world.
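To make the idea of such a simulation concrete, the following Python sketch shows one way a two-wheeled robot and a single light source might be modelled. Everything in it is an assumption for illustration: the differential-drive kinematics, the Euler time step, the rule that a wheel's speed is its forward motor-cell activity minus its backward one, and the distance-based light falloff are not taken from our simulator, and the names `wheel_speed`, `step`, and `light_reading` are hypothetical.

```python
from math import cos, sin, hypot

def wheel_speed(forward, backward):
    """Net wheel speed from forward and backward motor-cell activity (assumed)."""
    return forward - backward

def step(pose, left, right, axle=1.0, dt=1.0):
    """Advance a differential-drive robot by one Euler step.

    pose  -- (x, y, theta), with theta in radians
    left  -- speed of the left wheel
    right -- speed of the right wheel
    axle  -- distance between the two coaxial wheels
    """
    x, y, theta = pose
    v = (left + right) / 2.0        # forward speed of the axle midpoint
    omega = (right - left) / axle   # turning rate
    return (x + v * cos(theta) * dt,
            y + v * sin(theta) * dt,
            theta + omega * dt)

def light_reading(pose, light):
    """Sensor reading that decreases with squared distance to the light (assumed)."""
    x, y, _ = pose
    d = hypot(light[0] - x, light[1] - y)
    return 1.0 / (1.0 + d * d)
```

In a sketch like this, equal wheel speeds drive the robot straight ahead, opposite speeds turn it in place, and the reading returned by `light_reading` rises as the robot approaches the light, which is one simple quantity a reinforcement signal could be derived from.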
From entirely random initial genomes, we reliably evolve simulated robots that are effective at the light-seeking behavior. An example of a good solution, as it behaves in the simulated environment, is shown in this animation of the evolved robot's behavior. The same individual is shown at two scales at right and left. The larger image at the right shows sensor, motor, neuron and body cells (as blue, yellow, green, and black squares, respectively), together with excitatory (red) and inhibitory (blue) synapses connecting the neurons. These weighted edges change according to the learning rule and the ambient reinforcement, which is signalled by the red dot at the bottom. The simultaneous light-seeking {\em behavior} of the robot is shown in smaller scale at the left.
Despite our many simplifying assumptions, the range of alternative solutions discovered across multiple runs is quite remarkable; many of them are nearly equivalent in performance.
Last updated by rik on 7 Apr 00