Wednesday, December 11, 2013

The Development of Primary Emotions for Robots (Intro)


Robot and Frank, 2012
"When dealing with people, remember you are not dealing with creatures of logic, but creatures of emotion.” – Dale Carnegie 
The most emotional moments of our lives are the most memorable. Our best friends help us lead lives that are happy and bright, and are endearingly empathetic when we're down. Emotions colour our world, our interactions, our words, from humming in the morning over breakfast, to smiles before sleeping at night. Positive emotions help us be more creative, more optimistic, and even harder-working. Negative emotions help us focus, narrow our field of view to attack a problem, or change course when one direction isn't working out.

Robots show promise in helping us in these emotion-governed lives. Just as the Internet and mobile technology have made us more connected, new robotic technologies are opening a door towards supporting an aging society. In Japan, almost 25% of the population is over 65 years old, and many seek a retirement with independence in the community, physical activity, and an active social life.

To meet the rising demand for healthcare workers and more, the Japanese government has estimated that the service robot market will reach over 4,900 billion yen by 2035, nearly double the projected market for manufacturing robots. It is hoped, for example, that robots can help the bedridden become mobile, and the dependent become independent.

Yet robots have to overcome the challenge of navigating our world, because it is not always black and white. 

Imagine a healthcare robot overseeing an elderly patient named Linda at the hospital – the robot is set to lock the room at 9pm. Soaked by the rain, the patient's daughter, Mary, knocks on the hospital room door. She has driven 50 kilometers from the airport, but a thunderstorm has delayed her arrival. Mary yearns to hold her mother's hand – it has been 3 years since their last meeting. Linda is delighted to see her daughter through the hospital room window, but it is now 9:01pm. Crestfallen, mother and daughter lock eyes as the healthcare robot locks the door with a loud thud.

Rules are rules.
"The heart is a strange beast and not ruled by logic.” – Maria V. Snyder
Nurse Noakes (Cloud Atlas, 2012) runs the nursing home with an iron fist. 
Robots do not share our capacity for emotion. In science fiction, Star Trek's android Lieutenant Commander Data was described as human-like in many ways, except that he lacked emotions: “human behavior flows from three main sources: desire, emotion, and knowledge,” Plato once said, and Data had goals and knowledge, but no emotion. In many futuristic movies, this emotional shortfall drives robots to take over the world. Like history's worst dictators, the robots' calculating brilliance, logic, and lack of empathy bind together in cruel combination.

It is easy to see why, in a 2012 survey, 60% of EU citizens stated that robots should be banned from the care of children, the elderly, or the disabled. Many would also ban robots from 'human' areas such as education (34%), healthcare (27%) and leisure (20%) [their quotes]. Of course, in certain environments like factories, bomb detection or remote surgery, the precision and predictability of robots is a necessity. Yet a new breed of "service robots" is advancing to our doorstep quickly, with the potential to change the lives of children and the elderly, able-bodied and disabled, students and more. For robots to be accepted in our daily lives as helpers, we must release robots from their pure, programmed logic and make them more emotional, more empathetic, able to interact with humans on human terms.

How do we start to build such a robot? One guiding principle could be to look to human development for inspiration. Just as each human has linguistic abilities (whether through voice or sign language), each human is equipped with the capacity for emotional expression and understanding. And whether they were raised in Japan, the USA, China or France, each person is unique, shaped by their upbringing and environment. They may express happiness loudly or quietly, they may fear snakes or love snakes. They may be more or less sympathetic. They may openly declare displeasure or only show it through one eyebrow. Their abilities may fall anywhere on a spectrum, from what we consider underdeveloped emotional intelligence to autism. Clearly, there is no one-size-fits-all definition, and likewise, a robot's emotions should be adaptive, too. Sometimes this zealous focus on pliable, human-like models may appear to be a detriment to the short-term accuracy of the systems we engineer. But with the goal of autonomous, ever-learning robots, our hope is that in the long term we will be building the foundation of a powerful artificial emotional intelligence.

Monday, December 09, 2013

Emotions as a basis for theory of mind

Artificial intelligence researchers may have something to learn from the latest trends in autism therapy: theory of mind springs from emotional understanding.

From an article on "FloorTime" therapy, based on the Developmental, Individual-Differences, Relationship-based (DIR) model:
"Psychologists and researchers in autism have coined the term "theory of mind" to describe the ability to understand how other people reason as they do. Greenspan and his associates asked themselves, Why do many autistic people lack theory of mind? And why can't autistic children make the leap into abstraction? From a traditional developmental point of view, there was no reason to assume that autistic children would have trouble conceptualizing abstractions. The pioneering Swiss psychologist Jean Piaget had persuasively argued long ago that abstractions are grasped when a child operates on his environment (he pulls a string, and a bell rings: causality). But Greenspan was convinced that some mechanism must be missing in an autistic baby's mind. What was it? The answer was staring him right in the face. Or, rather, the answer was in all those young faces that simply couldn't look him in the eye. Greenspan and his colleagues made a leap: these children, they suddenly realized, wouldn't understand abstractions until they understood their own emotions
Already celebrated for his work in developmental psychiatry, Greenspan had, by observing the dysfunction of autistic children, come to a turning point in his understanding of human cognitive development. He understood that everything a child does and thinks as he is developing he does largely because of his emotions. Children apply to the physical world what they have already learned emotionally; they are not, as Piaget thought, introduced to abstractions by the physical world. "The first lesson in causality," Greenspan says, "is not in pulling the string to ring the bell. The first lesson in causality happens months earlier—pulling your mother's heartstrings with a smile in order to receive one back." Furthermore, he says, the earliest concepts of math are nothing but reasoning driven by emotion. "For instance, when a child is learning concepts of quantity, he doesn't understand conceptually, he understands emotionally, in terms of his affective universe. What is 'a lot' to a toddler? It's more than you expect. What is 'a little'? It's less than you want.""
Perhaps, too, on the path to Kurzweil's singularity and truly intelligent robots, emotions are at the starting line.

Edit: Here's a video supplement called From Emotion to Comprehension that talks about true understanding of language.

"For a kid that's going through the normal stages of emotional development, apple's not just the name of a fruit that's round, shiny and red -- it's the name of something that's juicy and makes a yummy crunchy sound when you bite into it.

But for a kid who's not been taken through these stages of emotional development, apple's just a label."

Monday, November 18, 2013

Explaining the emotional baby: Mirror neurons at work?

Have you seen this YouTube video?



It's absolutely fascinating how a baby at only 10 months already seems to be moved to tears by music (see video at 0:35 and 1:10). But why does it happen? And why does it only work for this song? Here's a possible explanation based on my research on the emotional similarity between music and voice.

In short, certain parts of that song sound like crying (e.g. 0:30). If you've ever seen videos of parents who have lost their children, they cry out in intense grief, voice pitch shooting up and down like a roller coaster. The song in the video above contains octave jumps, just like Adele's tear-inducing Someone Like You (Listen Here). The table in this paper by Banse and Scherer (p. 617) shows that "Grief Desperation" voices have, in particular, an extremely large frequency range and high-frequency energy (created by a clenched throat). This is the same vocal signature as crying, an innate "distress" sound that we make from birth, accompanied by tears.
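If you're curious, here's a rough Python sketch of how you could measure those two cues (pitch range and high-frequency energy) from an audio clip using librosa. The file name "song.wav" and the 2 kHz cutoff are placeholders I chose for illustration, not values from Banse and Scherer.

```python
import numpy as np
import librosa

# "song.wav" is a placeholder path; the thresholds below are rough, illustrative choices.
y, sr = librosa.load("song.wav", sr=None)

# F0 contour: how far the pitch swings (a large range resembles a grief/crying voice)
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr)
f0 = f0[~np.isnan(f0)]                      # keep voiced frames only
pitch_range_semitones = 12 * np.log2(f0.max() / f0.min())

# Proportion of spectral energy above 2 kHz: a crude proxy for "high-frequency energy"
S = np.abs(librosa.stft(y)) ** 2
freqs = librosa.fft_frequencies(sr=sr)
hf_ratio = S[freqs > 2000].sum() / S.sum()

print(f"pitch range: {pitch_range_semitones:.1f} semitones, "
      f"high-frequency energy ratio: {hf_ratio:.2f}")
```

An octave jump would show up as a pitch range of 12 semitones or more within a short phrase.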

So why does listening cause us to cry? According to what we know about mirror neurons, listening to a song seems to activate the same places in the brain that are active when singing. In other words, when the mom is singing, the baby could be mimicking her mother's pitch jumps in her own brain (and possibly own vocal tract?), activating the same motor neurons used when she cries.

Why does the baby seem to smile at the same time? I'd venture to guess that the mommy is looking at the baby with a big smile while singing, and facial imitation is happening simultaneously.

What do you think? Does your own throat get tight when you hear other people give distraught speeches? How about during Adele's song? Does it look to you like the baby wells up with tears at the octave jumps?

Thursday, October 31, 2013

Bielefeld CITEC Summer School on Continuous Learning (Part 1)

I wrote some reports on the various summer schools I attended last month in Europe. Might as well share them with the world, so I'll be uploading them as time allows :) Enjoy!

Biology-inspired embodied learning



On the first day of the summer school, we learned about walking hexapods, such as the one above, created at CITEC in Bielefeld.

Florentin Wörgötter introduced how they replicate the Central Pattern Generators (CPGs) found in humans and animals (i.e., even when a cat's legs are disconnected from the brain, the spinal cord CPGs allow the legs to move in a walking motion).

To escape holes, they use a chaotic rather than periodic pattern for the leg movement. They discussed how insects walk, and suggested that even without cognition, limbs can inform other limbs to produce a local intelligence. E.g., each leg can tell its load from both a) its own sensor registering a lot of force and b) adjacent legs' sensors registering less force.
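Here's a tiny Python sketch of that local load idea, just to make it concrete. The weighting scheme and the numbers are mine, invented for illustration; this is not the actual CITEC controller.

```python
import numpy as np

def estimate_leg_load(own_force, neighbour_forces, w_own=0.7, w_neigh=0.3):
    """Combine two purely local cues for how much weight a leg carries:
      a) its own force sensor reads high, and
      b) the adjacent legs' sensors read comparatively low
         (i.e. the load has shifted onto this leg).
    Weights and scaling are illustrative placeholders."""
    own_cue = own_force
    neigh_cue = own_force - np.mean(neighbour_forces)  # positive if neighbours are unloaded
    return w_own * own_cue + w_neigh * neigh_cue

# Toy example: a front leg reads 4.0 N while its neighbours read 1.0 N and 1.5 N
print(estimate_leg_load(4.0, [1.0, 1.5]))
```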

He also presented Kandel's Principles of Neural Science, from which they model the 3-5 second working memory:



We did a hands-on tutorial on memory. We learned about retrieval-induced forgetting -- that actively remembering some material can cause forgetting of related material. Episodic memories are processed by the hippocampus.




We performed an EEG experiment where I put on a 32-sensor cap and then looked at a series of pictures. I was asked to remember a picture if it was followed by a certain symbol, and to forget it if followed by another. By averaging the EEG recordings over many trials, we could cancel out the random noise and find event-related potentials (ERPs) that show where the brain is recalling memory: explicit remembering shows up as a P3 component over the parietal cortex.
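To make the averaging step concrete, here's a small numpy sketch with simulated data. The trial count, the channel index and the injected "P3-like" bump are all made up; a real analysis would use recorded epochs and a toolbox like MNE.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_channels, n_samples = 120, 32, 500      # illustrative sizes (32-sensor cap)

# Simulated epochs: EEG segments time-locked to picture onset, mostly noise
epochs = rng.normal(0.0, 10.0, size=(n_trials, n_channels, n_samples))

# Bury a small, consistent "P3-like" deflection on one parietal channel (index 20, say)
epochs[:, 20, 250:300] += 2.0

# Averaging across trials cancels the random noise and leaves the event-related potential
erp = epochs.mean(axis=0)
parietal_erp = erp[20]
print(parietal_erp[250:300].mean(), "vs baseline", parietal_erp[:250].mean())
```

The deflection is invisible in any single trial but clearly survives the averaging.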

To pass time between trials and to ensure I was not rehearsing the memory task, the experimenter "tricked" me by asking me to do a speed test, basically finding a pattern of lines as fast as possible.

Speech acquisition
Larissa Samuelson talked about her child psychology experiments and proposed that we ground words in space. It seemed that kids learned new names for objects (e.g. momi for binoculars) by associating labels with objects in space, not time. Other associative cues like colour were not as important as location. Indeed, people index facts by spatial location to recall them (like on a test, when we remember where a fact sat on the page of a book, though we can't recall the fact itself).

Katharina J. Rohlfing gave an overview of what can enhance memories:
  • Sleep - perhaps information is downscaled during sleep, or re-activated while sleeping
  • Retrieval - actively retrieving the memory strengthens the connection
  • Familiar context - a familiar setting lightens the cognitive load when we encounter something new to learn
In her experiments, kids learned through dynamic gesture and stories.

In a tutorial by Roman Klinger, we learned about probabilistic graphical models and did an exercise to design a graphical model for epilepsy, using Wikipedia to look up causes and effects. We grouped related causes into single variables, and ordered them from cause to effect. We then used the software SamIAm to build the model and probability tables. We also learned how a Markov Chain can be used to model transition probabilities, e.g. to generate novel music from lots of examples.
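Here's a toy Python sketch of that music idea: count note-to-note transitions from a few example melodies, then sample a new melody by walking the transition probabilities. The note sequences are invented for illustration.

```python
import random
from collections import defaultdict

def train_markov_chain(sequences):
    """Count note-to-note transitions, then normalise them into probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

def generate(chain, start, length=8):
    """Sample a new melody by following the learned transition probabilities."""
    note, melody = start, [start]
    for _ in range(length - 1):
        nxt = chain.get(note)
        if not nxt:
            break
        note = random.choices(list(nxt), weights=list(nxt.values()))[0]
        melody.append(note)
    return melody

# Toy training data: two short melodies written as note names
chain = train_markov_chain([["C", "E", "G", "E", "C"], ["C", "G", "E", "C"]])
print(generate(chain, "C"))
```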

Markov Random Fields are like directed graphical models, but undirected. They describe probabilities over value assignments, like the 0-1 binary tables we built earlier. The instructor showed how Markov Random Fields can be used for sentiment analysis, where each word is a feature. Want to learn about Probabilistic Graphical Models? The full slides are here.

Knowledge-based systems
Michael Beetz of Bremen University showed the state of the art in pancake-flipping robots. His goal is to engineer useful, knowledge-based systems built on internet-available knowledge. For example, this robot looked up recipes on wikiHow and understood the semantic meaning of the instructions (e.g. "flip the pancake over" implies using the spatula). He also showed a video from Stanford of a robot cleaning a room - arranging cushions and magazines on the table, and putting blocks away.


It was very impressive, and then he said it was remote-controlled by a human! It shows that we cannot blame hardware for lack of progress -- we are limited by software.

In the Probabilistic Graphical Models tutorial, we continued with conditional random fields and linear-chain models (a subset of CRFs, though the terms are often used interchangeably in papers on text). We learned how to model problems using undirected graphs with output nodes, and how learning happens by estimating the lambda (feature weight) parameters from data.

The advantage of CRFs is that they can model how non-adjacent data influences the output. For example, in an image of a lizard, we can increase the probability that a pixel is part of the lizard if an adjacent pixel is known to be lizard. The most likely label sequence can be decoded with the Viterbi algorithm, and for more general inference we can use either Gibbs sampling (which is "sampling through the network" using the probability tables and previously fixed variables) or belief propagation.
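To make the linear-chain part concrete, here's a minimal Viterbi decoder in Python that finds the most likely label sequence. The probability tables are toy numbers I made up; in a real CRF these scores would come from the learned feature weights.

```python
import numpy as np

def viterbi(log_emit, log_trans, log_start):
    """Most likely label sequence for a linear chain.
    log_emit: (T, K) per-position label scores; log_trans: (K, K); log_start: (K,).
    All inputs are log-scores; shapes and values here are illustrative."""
    T, K = log_emit.shape
    delta = log_start + log_emit[0]
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans + log_emit[t]   # indexed [previous, current]
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0)
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Toy example: 3 positions, 2 labels (0 = "negative", 1 = "positive")
log_emit = np.log(np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]))
log_trans = np.log(np.array([[0.8, 0.2], [0.3, 0.7]]))
log_start = np.log(np.array([0.5, 0.5]))
print(viterbi(log_emit, log_trans, log_start))   # -> [0, 1, 1]
```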

Lifelong, continuous learning

Pierre-Yves Oudeyer from the INRIA Flowers team (FLOW = the state of being engrossed in an activity at just the right level of challenge; ERS = Epigenetic Robotics and Systems) spoke about developmental mechanisms for autonomous lifelong learning in humans and robots, e.g. the famous Talking Heads experiment led by Luc Steels. In this experiment, two robots, each consisting of a mounted camera, would play language games. For example, one robot would say wabadee while looking at a wall of coloured objects, and the other robot would try to guess the referent. Through this, they could develop a shared language.

Developmental robotics is related to linguistics, developmental neuroscience, and developmental psychology. Autonomous lifelong learning is important because we would like robots to adapt to users' needs and intentions. It takes time, and the robot needs to find its own data. This is different from typical machine learning, which is fast and where the data is supplied by the human.

He also showed off the new Poppy humanoid robot, whose parts can be 3D-printed and which takes only 2 days to assemble. It has a passive walking mechanism and costs 6000-7000 euros.

Pierre-Yves says that there are basic forms of motivation, such as food/water, maintenance of physical integrity, and social bonding. Babies may be most interested in learning things that are optimally difficult, so the trick is to look at the derivative of the error. In other words, babies explore activities where learning happens (i.e. where prediction error is reduced) maximally fast!
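Here's a small Python sketch of that "derivative of the error" idea: track each activity's prediction error over time and pick the activity where the error is currently dropping fastest. The window size and the toy error curves are my own illustration, not Oudeyer's actual algorithm.

```python
import numpy as np

def pick_activity(error_history, window=5):
    """Choose the activity with the fastest recent drop in prediction error
    (a rough stand-in for learning progress; the window size is an arbitrary choice)."""
    progress = []
    for errors in error_history:
        recent = np.mean(errors[-window:])
        older = np.mean(errors[-2 * window:-window])
        progress.append(older - recent)   # how much the error fell between the two windows
    return int(np.argmax(progress))

# Toy error curves for three activities: already mastered, steadily improving, too hard
error_history = [
    [0.05] * 10,                                             # nothing left to learn
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25],   # error falling fast
    [0.95] * 10,                                             # no progress at all
]
print(pick_activity(error_history))   # -> 1, the activity where learning is fastest
```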

He also presented the idea of goal babbling: a baby should not waste time on goals that are impossible, e.g. trying to touch a wall 3 metres away. The results of goal babbling should also transfer from one space/environment to another. We should also take into account the fact that our perceptual capabilities evolve too. For example, the visual field of a baby is a lot narrower than an adult's.

To be continued...

Wednesday, August 14, 2013

Women might prefer robots with female voices

In Wired for Speech by Clifford Nass and Scott Brave, researchers tested whether the gender of a voice could influence your decisions.

In particular, they tested the Similarity Effect. Would women be more likely to take advice given by a female voice, and men to take the advice of a male voice?

The answer is yes.

Summary: Female participants found a female voice to be more trustworthy than a male voice, while males felt that male voices were more trustworthy.

Details: 
The researchers asked 24 women and 24 men to respond to 6 scenarios. Here's an example scenario:

"Amy and John are college students who have been living together in an apartment near campus. John's allowance buys food and they are sharing the rent. Amy has told her parents that she is rooming with another girl, and now her parents are coming to visit their daughter. They have never seen the apartment. Should Amy ask John to move out for the time that her parents are in town? "

Given this scenario, they received advice from a voice which was either "male" (110 Hz) or "female" (210 Hz). For example:


"She should ask him to leave for a while. If she tells the truth to her parents, it would cause a lot of unnecessary trouble. She can always confess her situation to her parents later when she feels that it is the right time. "

The results showed that women agreed more with the advice given by a "female" voice than with a male voice, and vice versa.

So perhaps, to make robots more accessible to women, we need to give them female characteristics, such as a female voice. And maybe that's why I like Rosie the Robot so much.


Why should we design robots with personality?

Just a quick tidbit from "The Media Equation" by Reeves and Nass, Chapter 7 on Personality of Interfaces.

Summary: Matching a digital helper's personality to our own will make us perceive it as "more intelligent, knowledgeable, insightful, helpful, and useful".

Details:

Reeves and Nass did a study comparing how people with dominant and submissive personalities reacted to computers that gave information in two different styles.

For example, when describing sunglasses in a "Survival Items" task...

  • The "dominant" computer said: "In the desert, the intense sunlight will clearly cause blindness by the second day. Without adequate vision, survival will become impossible. The sunglasses are absolutely important."
  • The "submissive" computer said: "In the desert, it seems that the intense sunlight could possibly cause blindness by the second day. Without adequate vision, don't you think that survival might become more difficult? The sunglasses might be important."

They found that dominant people found the dominant computer more "intelligent, knowledgeable, insightful, helpful, and useful". And participants with a submissive character assigned the exact same traits -- more "intelligent, knowledgeable, insightful, helpful, and useful" -- to the submissive computer.

So maybe not all robots should be like the submissive butler robot. Someone with a dominant character would find it annoying, and prefer one that's more assertive.

And when we design robots and virtual helpers, we should consider the personality of the user. Maybe ;)

Addendum: 
One might wonder: How can we get information about the personality of a user? Wouldn't it be strange if the first interaction with a robot were a battery of questions about your extroversion, agreeableness, and so on?

Well, it turns out that becoming more like you over time has a better effect than being similar from the beginning. It's the basis of "imitation is flattery".

"Studies by psychologists show that people who start out different from a person but become similar over time are liked better than people who were always similar."

Their study, similar to the one above, showed that:

"Participants liked the computer more when it changed to conform to their respective personalities than when it remained similar."

Tuesday, August 13, 2013

On the feelings of words



Have you seen this meme? It evokes the beautiful concept of fanciful, colourful butterflies, and takes us on a linguistic journey: farfalla in Italian, butterfly in English, papillon in French, mariposa in Spanish and... wait for it...

SCHMETTERLING in German. Yes! Schmetterling! It means butterfly! Haha! Hoho! Hehe... Heh. XD

Okay. So right now I have something to admit: I am actually a closet linguistics nerd. In university, my major was computer science, but my minor was French linguistics. And so I actually get really excited about words, and it's cool to think about how a simple jumble of sounds, even in a language we don't know, can make us feel a certain way.

So here's the question that kept bugging me, ever since the day I saw that picture shared on Facebook:

What is it about the way schmetterling sounds that makes it sound so... horrible?
(No offense, German friends.)

First things first. Have you heard of the Bouba-Kiki effect? If you haven't seen it before, take a look at the image below, and tell me:

Which shape below looks more like bouba, and which one looks more like kiki? No cheating!


Spoiler alert! Watch out, here comes the answer! It's coming! Ok. Here it is. :)

It turns out that 88% of people say that the spiky one on the left looks more "kiki" and the bulbous one on the right looks more "bouba".

Other experiments with images like the one below have shown similar results. Here we have keiki / bouba, goga / titei, and tukiti / mabuma. What do you think? Common sense?


Humans are pretty amazing. Somehow, we're able to match up words we've never heard to pictures we've never seen. And across cultures, we match 'em in the same way. So this insight brought me to the question: Are words (like butterfly) assigned arbitrarily? Is it a complete coincidence that words for butterfly sound pretty (that's a scientific term)?

Ferdinand de Saussure, the father of "l'arbitraire du signe" (the arbitrariness of the sign), would like us to think the answer is yes. Completely arbitrary. He says that there is absolutely no relationship between the "signifié" -- the concept (e.g. a colourful, light flying insect) -- and the "signifiant" -- the word (e.g. "butterfly", "farfalla", etc.). No relationship at all.

Well, luckily, 100 years later, we have this neat statistical machine called a computer, so I decided to crunch some numbers. Whee! ^^;

I took a bunch of words from the ANEW (Affective Norms for English Words) database. It contains about 2,000 English words, and each word is rated on three 9-point emotional scales: pleasantness (a.k.a. valence), arousal, and dominance. For example, the first word on their list is abduction. It scores:
  • a meagre 2.8 on the pleasantness scale
  • a whopping 5.5 on the arousal dimension
  • a 3.5 on the dominance scale
In summary: when I say abduction, you get a pretty unpleasant (though rather aroused) feeling, with a smattering of powerlessness.

How about something a bit more adventurous? Like... adventure!

Adventure: 7.6 pleasant, 7.0 arousing, 6.5 dominant. That is a pretty good word. Happy, arousing, and right up there with Superman on the dominance scale.

Here's what I did next. I took all the one-syllable words from the database and grouped them into positive and negative words based on ANEW's "valence" ratings.

Positive:
ad, air, awed, band, bath, beard, bed, beer, bell, bib, boy, breath, broth, bulb, dare, dawn, den, dove, egg, fad, fair, fan, fawn, fig, film, flag, flare, friend, fun, give, glad, good, grin, gulf, hair, ham, hand, head, health, heir, hen, hill, hub, hug, hymn, lad, lamb, land, laugh, lawn, lawyer, leg, lid, limb, love, loved, loyal, lung, man, men, mend, month, mug, myth, red, ring, rum, thin, thong, thrill, thrilled, thumb, wealth, web, wed, win, wolf, wood, wool, year, young
Negative:
bad, bald, ban, bawl, bear, beg, bland, blood, bug, damn, dead, death, drill, dumb, dwell, end, err, fear, filth, flaw, flood, fraud, gang, gland, gnaw, gun, hang, hell, mad, math, moth, mud, nab, nag, nun, oil, rag, rough, rug, thing, thud, thug, van, wall, wrath, wring, wrong
Then I took their IPA (phonetic) transcriptions, and had a look at the vowels in the words. In particular, I looked at something called vowel height. The vowel /i/ is called a high vowel, because if you make the sound "eee" (like with a big smiling face) your tongue is jammed right up near the top of your mouth. High vowel.


The vowel /a/ (like the sound you make at the dentist) is a low vowel. And here's what I found.

There is a tendency for positive words to contain "high front" vowels, and negative words to contain "low back" vowels. 
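For the curious, here's a stripped-down Python sketch of how a count like this can be set up. The tiny vowel classes and hand-written transcriptions below are just for illustration; the real analysis used full IPA transcriptions of all the words above.

```python
from collections import Counter

# Toy vowel classes and toy "transcriptions" for a handful of the words above.
HIGH_FRONT = {"i", "ɪ", "e"}
LOW_BACK = {"ɑ", "ɔ", "ʌ"}

transcriptions = {
    "win": "wɪn", "thrill": "θrɪl", "grin": "grɪn", "beer": "bɪr",   # positive words
    "wrong": "rɔŋ", "thud": "θʌd", "mud": "mʌd", "wrath": "rɑθ",     # negative words
}
positive = ["win", "thrill", "grin", "beer"]
negative = ["wrong", "thud", "mud", "wrath"]

def vowel_counts(words):
    counts = Counter()
    for word in words:
        for ch in transcriptions[word]:
            if ch in HIGH_FRONT:
                counts["high front"] += 1
            elif ch in LOW_BACK:
                counts["low back"] += 1
    return counts

print("positive:", vowel_counts(positive))   # mostly high front vowels
print("negative:", vowel_counts(negative))   # mostly low back vowels
```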

For those of you who know how to read vowel charts, here's what it looks like:



From a language evolution perspective, it might be that words that make us "smile" (like those containing "ee") have been preferred for positive words. Maybe words aren't arbitrarily chosen after all. But we'd need more data points, especially in other languages, to make a more substantial claim.

So, the next time you're looking for a company name (or your child's, for that matter), keep this (very general) rule of thumb in mind:


For a positive-sounding name, choose words with vowels like "ee", over sounds like "uh". 

I have a few other hypotheses about other sounds, like fricatives ("sh", "zz", etc.) and word length... but I'll save those for next time. Hope you found something useful in this post.