
Telepresence robotics with Nao and Kinect

Most of you have probably seen the film Avatar. In it, Jake Sully takes control of the humanoid body of a so-called Na’vi, one of the indigenous people of the planet Pandora. The real-world technology behind this idea is called “telepresence robotics.” What most people don’t know is that you can already create an immersive out-of-body experience today using only off-the-shelf hardware.

In this blog post, I will show you how a few technology enthusiasts achieved this goal. Our prototype combines a Nao robot, an Oculus Rift, a Microsoft Kinect camera and plenty of IoT hardware.

Have a look at the Gesture Guys’ newest video:

Multimodal telepresence system

Theoretical foundation


First, let’s look at the building blocks of a multimodal telepresence robotics system. The human operator controls a so-called haptic display (for example, a force-feedback joystick). The operator system interprets the input signals and sends them over a communication channel to a teleoperator system. The robot interprets the commands received via that channel and performs the actions the operator wants, thereby manipulating the remote environment. The remote system is also equipped with sensors such as cameras or microphones, and the data from these sensors is sent back to the operator site, normally over the same communication channel as the control signals. At the operator site, this data can be shown on an audio-visual display (for example, a computer screen) or fed back through the haptic display mentioned above.
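
To make this data flow concrete, here is a minimal Python sketch of one round trip through such a system. Everything in it is hypothetical: the `Command` and `Feedback` message formats, the `Teleoperator` stand-in for the robot, and the use of two in-process queues to model the two directions of a single communication channel.

```python
import queue
from dataclasses import dataclass


@dataclass
class Command:
    """Control signal sent from the operator site (hypothetical format)."""
    joint: str
    angle: float  # radians


@dataclass
class Feedback:
    """Sensor reading sent back from the remote site (hypothetical format)."""
    joint: str
    measured_angle: float  # radians


class Teleoperator:
    """Stand-in for the remote robot: applies commands and reports its state."""

    def __init__(self) -> None:
        self.joints: dict[str, float] = {}

    def apply(self, cmd: Command) -> Feedback:
        # "Manipulating the remote environment" is reduced to storing the angle.
        self.joints[cmd.joint] = cmd.angle
        return Feedback(cmd.joint, self.joints[cmd.joint])


def run_loop(operator_inputs: list[Command]) -> list[Feedback]:
    """Pass each command to the robot and collect the feedback for display."""
    # The two directions of one logical communication channel.
    to_remote: "queue.Queue[Command]" = queue.Queue()
    to_operator: "queue.Queue[Feedback]" = queue.Queue()
    robot = Teleoperator()
    displayed: list[Feedback] = []
    for cmd in operator_inputs:
        to_remote.put(cmd)                             # operator site -> channel
        to_operator.put(robot.apply(to_remote.get()))  # remote site -> channel
        displayed.append(to_operator.get())            # shown on the display
    return displayed
```

Calling `run_loop([Command("HeadYaw", 0.3)])` returns one `Feedback` reporting the angle the stand-in robot now holds; a real system would replace the queues with a network link and the stand-in with actual motors and sensors.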

Practical implementation


Let us now turn to the robotics system that the software consultants from TNG Technology Consulting built. Here, the human operator controls the teleoperator with their own body movements. A Microsoft Kinect camera films the operator’s body, and this data is transformed into joint angles for the robot. The head-tracking sensor of an Oculus Rift controls the movement of the robot’s head. Our software prepares and combines the sensor data and sends it to the robot at the remote site; all communication runs over Wi-Fi. Two cameras attached to the Nao’s head film the remote environment. On the robot’s back, we installed an additional computer connected to these cameras, which sends the video streams back to the operator site over Wi-Fi. There, the Oculus Rift displays the video so that the operator sees what the robot sees, while the robot mirrors what the operator does in front of the gesture camera.
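
The post doesn’t say how the Kinect skeleton is converted into joint angles. One common approach, sketched below in Python under our own assumptions, computes the angle at a joint from three tracked 3-D positions (for example shoulder, elbow and wrist) and clamps the headset’s yaw and pitch to the range of the robot’s head. The head limits are values we assume from the Nao documentation and should be checked for your robot version; mapping the computed angles onto specific Nao joints (signs, zero offsets) is not covered.

```python
import math

# Assumed Nao head joint limits in radians; verify against the
# documentation for your robot version before relying on them.
HEAD_YAW_LIMITS = (-2.0857, 2.0857)
HEAD_PITCH_LIMITS = (-0.6720, 0.5149)


def angle_at(a, b, c):
    """Angle ABC in radians between 3-D points, e.g. shoulder, elbow, wrist.

    A fully straight arm gives pi; a right-angled elbow gives pi / 2.
    """
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.hypot(*ba) * math.hypot(*bc)
    # Clamp to [-1, 1] so floating-point rounding cannot break acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def clamp(value, limits):
    """Restrict value to the closed interval given by limits."""
    lo, hi = limits
    return max(lo, min(hi, value))


def head_command(yaw, pitch):
    """Map headset yaw/pitch (radians) to head angles the robot can reach."""
    return clamp(yaw, HEAD_YAW_LIMITS), clamp(pitch, HEAD_PITCH_LIMITS)
```

Clamping matters because a person can turn their head further than the robot can; without it, the robot would receive targets outside its joint range.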



To make this work, we needed a computer at the operator site, while at the remote site a Nao robot manipulated the environment. The laptop we used was connected directly to the Kinect camera and the Oculus Rift. We sent the data from these controllers to the Nao using the WAMP (Web Application Messaging Protocol) communication protocol. The data first went to a WAMP router called “Crossbar” running on a Raspberry Pi (Revision 3). The Raspberry Pi also ran the controller software, which transformed the incoming data into control commands and sent them to the Nao over Wi-Fi. In the other direction, the robot sent data such as its current posture back to the laptop over the reverse communication channel.
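
WAMP routing follows a publish/subscribe pattern: the laptop publishes controller data to a topic, and the controller software subscribes to it. The Python sketch below is an in-process stand-in that only illustrates this routing; it is not WAMP and does not use the Crossbar or Autobahn APIs. The topic names are hypothetical, and the command dictionary is merely shaped like the names/angles arguments of NAOqi’s `ALMotion.setAngles`; the post doesn’t say which robot API the controller actually called.

```python
from collections import defaultdict
from typing import Callable


class Router:
    """In-process publish/subscribe stand-in (not a WAMP implementation)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        # Deliver the payload to every handler registered for this exact topic.
        for handler in self._subscribers[topic]:
            handler(payload)


def make_controller(router: Router, send_to_robot: Callable[[dict], None]) -> None:
    """Controller on the router host: turns operator data into robot commands."""

    def on_operator_data(payload: dict) -> None:
        joints = payload["joints"]  # hypothetical payload: {joint name: angle}
        send_to_robot({"names": list(joints), "angles": list(joints.values())})

    router.subscribe("avatar.operator.joints", on_operator_data)
```

The reverse channel works the same way: the laptop subscribes to a topic such as the hypothetical `avatar.robot.posture`, and whatever publishes the robot’s posture there reaches it without either side knowing the other’s address.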

We connected the cameras to a second single-board computer (such as the Intel Edison) and attached them to the robot’s head using custom 3D-printed glasses. The two horizontally aligned cameras captured video using GStreamer, which transferred the streams to the laptop over UDP.
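
The post doesn’t list the exact GStreamer pipelines. The following Python sketch only assembles `gst-launch-1.0` pipeline descriptions under several assumptions: the cameras deliver MJPEG natively (`image/jpeg` caps from `v4l2src`), each frame is packetized into RTP with `rtpjpegpay` before going into `udpsink` (the post only says “UDP stream”, so RTP is our choice), and the device paths, laptop address and ports are hypothetical.

```python
def sender_pipeline(device: str, host: str, port: int,
                    width: int = 640, height: int = 480, fps: int = 30) -> str:
    """Pipeline for the camera computer: one camera's MJPEG sent over UDP."""
    return (
        f"v4l2src device={device} "
        f"! image/jpeg,width={width},height={height},framerate={fps}/1 "
        "! rtpjpegpay "  # split JPEG frames into RTP packets sized for UDP
        f"! udpsink host={host} port={port}"
    )


def receiver_pipeline(port: int) -> str:
    """Pipeline for the laptop: receive, decode and show one eye's stream."""
    return (
        f"udpsrc port={port} "
        "! application/x-rtp,encoding-name=JPEG,payload=26 "
        "! rtpjpegdepay ! jpegdec ! autovideosink"
    )


# Hypothetical devices, laptop address and ports: one UDP port per eye.
EYES = {"left": ("/dev/video0", 5000), "right": ("/dev/video1", 5001)}
LAPTOP = "192.168.1.10"

pipelines = {eye: sender_pipeline(dev, LAPTOP, port)
             for eye, (dev, port) in EYES.items()}
```

Each string can be passed to `gst-launch-1.0` for quick experiments; using a separate port per eye keeps the two streams independent, so the laptop can assign them to the left and right halves of the headset display.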


Network bandwidth
One of the main challenges of this design was transferring the camera images. We had to keep the latency of the video stream low without exceeding the network bandwidth. Our solution had two parts. First, we created two separate networks using a custom router board: the first 802.11n network carried only the video streams, while the second carried all other communication (such as control commands and feedback channels). Second, to keep both latency and CPU usage on the camera computer low, we used an MJPEG stream with a resolution of 640×480 pixels for each eye. This way, we kept the bandwidth below 40 Mbit/s most of the time and still got images without any lag. We did not get two full HD images this way, but that would not have been possible with a USB 2.0 controller anyway.
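
These bandwidth figures can be sanity-checked with a little arithmetic. The Python sketch below assumes about 50 KB per 640×480 MJPEG frame at 30 frames per second; neither number comes from the post, and real JPEG sizes vary with quality setting and scene content. For comparison, it also computes the uncompressed rate at 2 bytes per pixel (as in a YUYV 4:2:2 format).

```python
def mjpeg_bitrate_mbit(frame_kbytes, fps, streams=2):
    """Approximate MJPEG bitrate in Mbit/s, assuming similar-sized frames."""
    return frame_kbytes * 1000 * 8 * fps * streams / 1e6


def raw_bitrate_mbit(width, height, fps, bytes_per_pixel=2, streams=2):
    """Uncompressed bitrate in Mbit/s for the given resolution and frame rate."""
    return width * height * bytes_per_pixel * fps * 8 * streams / 1e6


def mbit_to_mbyte(mbit):
    """Convert megabits to megabytes (8 bits per byte): Mbit/s vs. MB/s."""
    return mbit / 8


both_eyes_mjpeg = mjpeg_bitrate_mbit(50, 30)       # 24.0 Mbit/s
both_eyes_raw = raw_bitrate_mbit(640, 480, 30)     # about 295 Mbit/s
```

Under these assumptions, both MJPEG streams together need about 24 Mbit/s, which fits under the 40 Mbit/s budget (5 MB/s), while the same two streams uncompressed would need roughly 295 Mbit/s.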

Taking control
Another problem was making it clear to the operator when they were in control. For this, we implemented a simple gesture for taking over the Nao: a circle drawn with the left hand. Here is how a session of the Avatar project starts. First, you put on the Oculus Rift. Two cameras mounted on its front show you the real world around you. When you perform the circle gesture, a tunnel animation plays, after which you see the video stream from the Nao’s glasses instead of your own field of view. At the same time, you take control of the robot’s movements.
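
The post doesn’t describe how the circle gesture is recognized. One simple heuristic, sketched here in Python as an assumption rather than the team’s method, takes the left hand’s Kinect positions projected onto a 2-D plane, sums the angle swept around the trajectory’s centroid and checks that the radius stays roughly constant. The thresholds, the `Mode` names and the `next_mode` helper are hypothetical.

```python
import math
from enum import Enum, auto


def is_circle(points, min_turns=0.9, max_radius_spread=0.35):
    """Heuristic circle check on a 2-D hand trajectory [(x, y), ...].

    Thresholds are illustrative guesses, not tuned values.
    """
    if len(points) < 8:
        return False
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    radii = [math.hypot(x - cx, y - cy) for x, y in points]
    mean_r = sum(radii) / len(radii)
    # Reject paths whose distance from the centroid varies too much.
    if mean_r == 0 or (max(radii) - min(radii)) / mean_r > max_radius_spread:
        return False
    angles = [math.atan2(y - cy, x - cx) for x, y in points]
    swept = 0.0
    for a0, a1 in zip(angles, angles[1:]):
        # Wrap each step into [-pi, pi) so crossing the +/-pi boundary
        # counts as a small turn rather than a near-full revolution.
        swept += (a1 - a0 + math.pi) % (2 * math.pi) - math.pi
    return abs(swept) >= min_turns * 2 * math.pi


class Mode(Enum):
    PASSTHROUGH = auto()  # headset shows its own front cameras
    IN_CONTROL = auto()   # headset shows the robot's view; movements are forwarded


def next_mode(mode, left_hand_track):
    """Hand control to the operator once a left-hand circle is detected."""
    if mode is Mode.PASSTHROUGH and is_circle(left_hand_track):
        return Mode.IN_CONTROL  # the tunnel animation would play at this point
    return mode
```

Requiring both a nearly full turn and a stable radius keeps ordinary arm movements, such as a straight swipe, from accidentally handing over control.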

Oculus Rift
One last technical issue was getting the Oculus Rift to run on a laptop, since it normally only works with a desktop system. However, by using the newest Nvidia drivers and deactivating most of the Optimus features, we got it working on a Lenovo W540 laptop with a GeForce Mobile card and an Oculus Rift DK2. We currently use Oculus SDK 0.8 and Windows 8.1; unfortunately, newer versions of the SDK or Windows did not work with this configuration.

The road ahead

At the moment, we are focusing on adding augmented reality and further robotics features to our Avatar experience. We are also working on integrating the Intel RealSense camera at the remote site. In the meantime, please have a look at our promotional video and, as always, stay tuned for more.

All IoT Agenda network contributors are responsible for the content and accuracy of their posts. Opinions are those of the writers and do not necessarily convey the thoughts of IoT Agenda.