Robots Learn to Use Kitchen Tools by Watching YouTube Videos

Imagine having a personal robot prepare your breakfast every morning. Now, imagine that this robot didn't need any help figuring out how to make the perfect omelet, because it learned all the necessary steps by watching videos on YouTube. It might sound like science fiction, but a team at the University of Maryland has just made a significant breakthrough that will bring this scenario one step closer to reality.

Researchers at the University of Maryland Institute for Advanced Computer Studies (UMIACS) partnered with a scientist at the National Information Communications Technology Research Centre of Excellence in Australia (NICTA) to develop robotic systems that are able to teach themselves. Specifically, these robots are able to learn the intricate grasping and manipulation movements required for cooking by watching online cooking videos. The key breakthrough is that the robots can "think" for themselves, determining the best combination of observed motions that will allow them to efficiently accomplish a given task.

The work will be presented on Jan. 29, 2015, at the Association for the Advancement of Artificial Intelligence Conference in Austin, Texas. The researchers achieved this milestone by combining approaches from three distinct research areas: artificial intelligence, or the design of computers that can make their own decisions; computer vision, or the engineering of systems that can accurately identify shapes and movements; and natural language processing, or the development of robust systems that can understand spoken commands. Although the underlying work is complex, the team wanted the results to reflect something practical and relatable to people's daily lives.

"We chose cooking videos because everyone has done it and understands it," said Yiannis Aloimonos, UMD professor of computer science and director of the Computer Vision Lab, one of 16 labs and centers in UMIACS. "But cooking is complex in terms of manipulation, the steps involved and the tools you use. If you want to cut a cucumber, for example, you need to grab the knife, move it into place, make the cut and observe the results to make sure you did them properly."

One key challenge was devising a way for the robots to parse individual steps appropriately, while gathering information from videos that varied in quality and consistency. The robots needed to be able to recognize each distinct step, assign it to a "rule" that dictates a certain behavior, and then string together these behaviors in the proper order.
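As a rough illustration of the recognize, assign and sequence idea described above, the following Python sketch turns noisy per-segment detections into an ordered list of behaviors. It is not the researchers' system: the `Detection` format, the `RULES` table, the behavior names and the confidence threshold are all hypothetical, chosen only to make the three stages concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One recognized step from a video segment (hypothetical format)."""
    start_s: float      # segment start time in seconds
    action: str         # recognized action label, e.g. "grasp" or "cut"
    target: str         # object the action is applied to
    confidence: float   # recognizer score in [0, 1]


# Hypothetical rule table: recognized action -> robot behavior template.
RULES = {
    "grasp": "close_gripper_on({target})",
    "move": "move_tool_to({target})",
    "cut": "slice({target})",
    "inspect": "look_at({target})",
}


def build_plan(detections, min_confidence=0.5):
    """Turn noisy detections into an ordered list of behaviors."""
    plan = []
    previous = None
    # Order the steps by when they were observed in the video.
    for det in sorted(detections, key=lambda d: d.start_s):
        # Skip unreliable detections and actions with no matching rule.
        if det.confidence < min_confidence or det.action not in RULES:
            continue
        key = (det.action, det.target)
        # Skip a step that merely repeats the previously kept step.
        if key == previous:
            continue
        plan.append(RULES[det.action].format(target=det.target))
        previous = key
    return plan


# Detections loosely modeled on the cucumber example, deliberately out of
# order and including a duplicate and a low-confidence entry.
detections = [
    Detection(4.0, "cut", "cucumber", 0.88),
    Detection(0.0, "grasp", "knife", 0.93),
    Detection(1.5, "grasp", "knife", 0.90),
    Detection(2.0, "stir", "bowl", 0.30),
    Detection(3.0, "move", "cucumber", 0.75),
    Detection(5.0, "inspect", "cucumber", 0.81),
]
plan = build_plan(detections)
print(plan)
```

In this toy version, the duplicate "grasp" and the low-confidence "stir" are dropped, and the remaining steps come out in the order they were observed: grasp the knife, move to the cucumber, cut, then inspect.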

"We are trying to create a technology so that robots eventually can interact with humans," said Cornelia Fermüller, an associate research scientist at UMIACS. "So they need to understand what humans are doing. For that, we need tools so that the robots can pick up a human's actions and track them in real time. We are interested in understanding all of these components. How is an action performed by humans? How is it perceived by humans? What are the cognitive processes behind it?"

Aloimonos and Fermüller compare these individual actions to words in a sentence. Once a robot has learned a "vocabulary" of actions, it can then string them together in a way that achieves a given goal. In fact, this is precisely what distinguishes their work from previous efforts.

"Others have tried to copy the movements. Instead, we try to copy the goals. This is the breakthrough," Aloimonos explained. This approach allows the robots to decide for themselves how best to combine various actions, rather than reproducing a predetermined series of actions.

The work also relies on a specialized software architecture known as deep-learning neural networks. While this approach is not new, it requires lots of processing power to work well, and it took a while for computing technology to catch up. Similar versions of neural networks are responsible for the voice recognition capabilities in smartphones and the facial recognition software used by Facebook and other websites.

While robots have been used to carry out complicated tasks for decades--think automobile assembly lines--these robots must be carefully programmed and calibrated by human technicians. Self-learning robots could instead gather the necessary information by watching others, which is the same way humans learn. Aloimonos and Fermüller envision a future in which robots tend to the mundane chores of daily life while humans are freed to pursue more stimulating tasks.

"By having flexible robots, we're contributing to the next phase of automation. This will be the next industrial revolution," said Aloimonos. "We will have smart manufacturing environments and completely automated warehouses. It would be great to use autonomous robots for dangerous work--to defuse bombs and clean up nuclear disasters such as the Fukushima event. We have demonstrated that it is possible for humanoid robots to do our human jobs."

In addition to Aloimonos and Fermüller, study authors included Yezhou Yang, a UMD computer science doctoral student, and Yi Li, a former doctoral student of Aloimonos and Fermüller from NICTA.

This research was supported by the European Union (project POETICON++), the National Science Foundation (Award No. SMA 1248056) and the U.S. Army (Award No. W911NF-14-1-0384 - MSEE DARPA). The content of this article does not necessarily reflect the views of these organizations.

The study, "Robot Learning Manipulation Action Plans by 'Watching' Unconstrained Videos from the World Wide Web," Yezhou Yang, Yi Li, Cornelia Fermüller and Yiannis Aloimonos, will be presented on Jan. 29, 2015, at the Association for the Advancement of Artificial Intelligence Conference in Austin, Texas.
