Patent US7328196 Architecture for multiple interacting robot intelligences (2008)

Richard Alan Peters II:

The links between behaviors are created by the SAN agent during task planning but may also be created by a dream agent during the dream state. The links are task dependent and different behaviors may be linked together depending on the assigned goal.

When the robot is tasked to achieve a goal, the spreading activation network (SAN) agent constructs a sequence of behaviors that will take the robot from its current state to the goal state (active map) in the DBAM by back-propagating from the goal state to the current state. For each behavior added to the active map, the SAN agent performs a search for behaviors that have a pre-condition state close to the post-condition state of the added behavior and adds a link connecting the close behavior to the added behavior. An activation term characterizing the link and based on the inverse vector space distance between the linked behaviors is also added to the added behavior. The SAN agent may create several paths connecting the current state to the goal state.
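The planning step above can be sketched in a few lines of Python. This is an illustrative sketch only, not the patent's implementation: the function name `build_active_map`, the dictionary layout of the DBAM, the `cutoff` threshold and the small `eps` guard against division by zero are all assumptions. It also adopts one reading of the paragraph, in which a predecessor is linked when its post-condition state lies near the pre-condition state of the behavior already on the map.

```python
import math
from collections import deque

def build_active_map(behaviors, current, goal, cutoff=0.5, eps=1e-6):
    """Backward search from the goal state (hypothetical sketch).

    behaviors: dict mapping name -> (pre_state, post_state), each a tuple
    of floats. Returns (links, starts): links maps (pred, succ) to an
    activation term, starts holds behaviors that can begin near `current`.
    """
    links = {}
    starts = set()
    # Seed the map with behaviors whose post-condition is near the goal.
    queue = deque(name for name, (_, post) in behaviors.items()
                  if math.dist(post, goal) <= cutoff)
    visited = set(queue)
    while queue:
        succ = queue.popleft()
        pre_succ = behaviors[succ][0]
        if math.dist(pre_succ, current) <= cutoff:
            starts.add(succ)  # this behavior can start from the current state
        for pred, (_, post_pred) in behaviors.items():
            if pred == succ:
                continue
            d = math.dist(post_pred, pre_succ)
            if d <= cutoff:
                # Activation term: inverse vector-space distance (eps is assumed).
                links[(pred, succ)] = 1.0 / (d + eps)
                if pred not in visited:
                    visited.add(pred)
                    queue.append(pred)
    return links, starts

# Toy DBAM with 2-D states; "wave" is unrelated to the task.
toy = {
    "reach": ((0.0, 0.0), (1.0, 0.0)),
    "grasp": ((1.0, 0.0), (1.0, 1.0)),
    "lift":  ((1.0, 1.0), (2.0, 1.0)),
    "wave":  ((5.0, 5.0), (6.0, 6.0)),
}
links, starts = build_active_map(toy, current=(0.0, 0.0), goal=(2.0, 1.0))
```

Because every near neighbor is linked rather than only the nearest one, the resulting map can contain several paths from the current state to the goal, as the text allows.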

A command context agent enables the robot to receive a goal-defined task and to transition the robot between active mode, dream mode, and training mode.

During periods of mechanical inactivity when not performing or learning a task, or when the current task does not use the full processing capabilities of the robot, the robot may transition to a dream state. While in the dream state, the robot modifies or creates new behaviors based on its most recent activities and creates new scenarios (behavior sequences never before executed by the robot) for possible execution during future activity.

Each time the robot dreams, the dream agent analyzes R(t) for the recent active period since the last dream state by identifying episode boundaries and episodes. Each recent episode is first compared to existing behaviors in the DBAM to confirm if the recent episode is another instance of the existing behavior. The comparison may be based on the average distance or end point distances between the recent episode and the existing behavior or any other like criteria. If the episode is close to the behavior, the behavior may be modified to account for the new episode.
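A minimal Python sketch of this comparison follows. It rests on several assumptions that the text does not state: episodes and exemplars are lists of state vectors resampled to the same length, the threshold is a free parameter, and "modifying" a behavior means folding the episode into a running mean. All function names are hypothetical.

```python
import math

def average_distance(episode, exemplar):
    """Mean point-to-point distance; assumes both have the same length."""
    return sum(math.dist(p, q) for p, q in zip(episode, exemplar)) / len(episode)

def endpoint_distance(episode, exemplar):
    """Larger of the start-to-start and end-to-end distances."""
    return max(math.dist(episode[0], exemplar[0]),
               math.dist(episode[-1], exemplar[-1]))

def closest_behavior(episode, dbam, threshold, metric=average_distance):
    """Return the name of the nearest behavior, or None if none is close."""
    best_name, best_d = None, math.inf
    for name, behavior in dbam.items():
        if len(behavior["exemplar"]) != len(episode):
            continue  # this sketch only compares equal-length trajectories
        d = metric(episode, behavior["exemplar"])
        if d < best_d:
            best_name, best_d = name, d
    return best_name if best_d <= threshold else None

def absorb(behavior, episode):
    """Fold a matching episode into the exemplar as a running mean (assumed rule)."""
    n = behavior["count"]
    behavior["exemplar"] = [
        tuple((n * x + y) / (n + 1) for x, y in zip(p, q))
        for p, q in zip(behavior["exemplar"], episode)
    ]
    behavior["count"] = n + 1

dbam = {"reach": {"exemplar": [(0.0, 0.0), (1.0, 0.0)], "count": 1}}
episode = [(0.0, 0.2), (1.0, 0.2)]
match = closest_behavior(episode, dbam, threshold=0.5)
if match is not None:
    absorb(dbam[match], episode)
```

Passing `metric=endpoint_distance` switches to the end point criterion mentioned in the text.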

If the episode is distinct from the existing behaviors, the dream agent creates a new behavior based on the episode and finds and creates links to the nearest behaviors. The default activation link to the nearest existing behaviors may be based in part on the number of episodes represented in the exemplar behavior, such that a new behavior generated from a single episode may be assigned a smaller activation value than behaviors generated from many episodes. The new behavior is added to the DBAM for possible future execution.
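One way to picture the default activation is a weight that grows with the number of episodes behind a behavior. The Python sketch below is hypothetical throughout: the saturating weight in `episode_weight`, the `1 / (1 + d)` distance factor, the choice of `k` nearest neighbors and the dictionary layout are assumptions made for illustration, not formulas from the patent.

```python
import math

def episode_weight(count, saturation=10):
    """Weight in (0, 1] that grows with episode count (assumed form)."""
    return min(count, saturation) / saturation

def add_behavior(dbam, name, episode, k=2):
    """Create a behavior from one episode and link it to its k nearest behaviors."""
    pre, post = episode[0], episode[-1]
    new = {"pre": pre, "post": post, "count": 1, "links": {}}
    # Nearest behaviors: those whose pre-condition is closest to the new post-condition.
    nearest = sorted(dbam, key=lambda n: math.dist(post, dbam[n]["pre"]))[:k]
    for other in nearest:
        d = math.dist(post, dbam[other]["pre"])
        # Default activation depends in part on episode count, in part on distance.
        new["links"][other] = episode_weight(new["count"]) / (1.0 + d)
    dbam[name] = new
    return new

store = {
    "grasp": {"pre": (1.0, 0.0), "post": (1.0, 1.0), "count": 8, "links": {}},
    "wave":  {"pre": (5.0, 5.0), "post": (6.0, 6.0), "count": 3, "links": {}},
}
created = add_behavior(store, "reach2", [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)], k=1)
```

With this weighting, a behavior built from a single episode starts with a smaller activation than one supported by many episodes, which matches the ordering the text describes.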

If a robot is limited to behavior sequences learned only through teleoperation or other known training techniques, the robot may not be able to respond to a new situation. In a preferred embodiment, a dream agent is activated during periods of mechanical inactivity and creates new plausible behavior sequences that may allow the robot during its active state to react purposefully and positively to contingencies never before experienced. The dream agent randomly selects a pair of behaviors from the DBAM and computes the end point distances between the selected behaviors. The endpoint distances are the distances between the pre-condition state of one behavior and the post-condition state of the other behavior. The distance may be a vector distance or any appropriate measure known to one of skill in the art. If the computed distance is less than a cut-off distance, the preceding behavior (the behavior with the post-condition state close to the succeeding behavior's pre-condition state) is modified to include a link to the succeeding behavior.
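The random pairing step can be sketched as follows in Python. The names `dream_link`, `cutoff` and `trials` are hypothetical, Euclidean distance stands in for "a vector distance", and storing the distance on the link is an assumption, since the text does not say what value a dream-created link carries.

```python
import math
import random

def dream_link(dbam, cutoff, trials=200, rng=None):
    """Randomly pair behaviors and link those whose end points nearly meet."""
    rng = rng or random.Random()
    names = list(dbam)
    added = []
    for _ in range(trials):
        a, b = rng.sample(names, 2)  # two distinct behaviors
        # Check both orderings: either one may be the preceding behavior.
        for first, second in ((a, b), (b, a)):
            d = math.dist(dbam[first]["post"], dbam[second]["pre"])
            if d < cutoff and second not in dbam[first]["links"]:
                dbam[first]["links"][second] = d  # stored value is an assumption
                added.append((first, second))
    return added

memory = {
    "reach": {"pre": (0.0, 0.0), "post": (1.0, 0.0), "links": {}},
    "grasp": {"pre": (1.0, 0.1), "post": (1.0, 1.0), "links": {}},
    "wave":  {"pre": (5.0, 5.0), "post": (6.0, 6.0), "links": {}},
}
new_links = dream_link(memory, cutoff=0.25, rng=random.Random(0))
```

In this toy memory only the post-condition of "reach" lies within the cut-off of the pre-condition of "grasp", so that is the only link the dream pass can create.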

The robots of Pfeifer and Cohen must be trained to identify episodes that lead to the accomplishment of a task. The training usually involves an external handler that observes and rewards robot behaviors that advance the robot through the completion of the task. The robot either makes a random move or a best estimate move and receives positive or negative feedback from the handler depending on whether the move advances the robot toward the goal. This move-feedback cycle must be repeated for each step toward the goal. The advantage of such a training program is that the robot learns both actions that lead toward a goal and actions that do not accomplish a goal. The disadvantage of such a system is that the training time is very long because, in addition to learning how to accomplish a task, the robot learns many more methods that do not accomplish a task.

A more efficient method of learning a task is to teach the robot only the tasks required to accomplish a goal. Instead of allowing the robot to make random moves, the robot is guided through the completion of the task by an external handler via teleoperation. During teleoperation, the handler controls all actions of the robot while the robot records the state (sensor and actuator information) of the robot during the teleoperation. The task is repeated several times under slightly different conditions to allow the formation of episode clusters for later analysis. After one or more training trials, the robot is placed in the dream state, where the recorded state information is analyzed by the robot to identify episodes, episode boundaries, and to create exemplar episodes for each episode cluster.
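As a rough illustration of the last step, an exemplar episode for a cluster could be taken as the point-by-point mean of the recorded trials. This is an assumption made for the Python sketch below: the text does not say how exemplars are computed, and the sketch presumes that episode boundaries have already been found and that every episode in a cluster has the same number of samples.

```python
def exemplar_episode(cluster):
    """Point-by-point mean of a cluster of episodes (assumed exemplar rule).

    cluster: list of episodes, each a list of state vectors (tuples of
    floats) with identical length and dimension.
    """
    n = len(cluster)
    return [
        tuple(sum(values) / n for values in zip(*samples))
        for samples in zip(*cluster)  # samples: the t-th state of every episode
    ]

# Two teleoperated trials of the same task under slightly different conditions.
trials = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.0, 0.2), (1.0, 0.4)],
]
mean_episode = exemplar_episode(trials)
```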



  • Dream-like
  • Dream agent
  • Dream architecture
  • Dream mode
  • Robot dream (dreams, dreaming, daydream, daydreams, reverie, reveries)
  • Robots dream (dreams, dreaming, daydream, daydreams, reverie, reveries)
  • Robotic dream (dreams, dreaming, daydream, daydreams, reverie, reveries)
  • Robotics dream (dreams, dreaming, daydream, daydreams, reverie, reveries)



