The latest advances in human-centered AI through the eyes of a professor of computer vision

March 12, 2021

As the pandemic changes our conception of what can be done remotely, virtual and augmented reality meetings and online events have become increasingly important. With computers mediating our communication, it is possible to create more elaborate and immersive gaming and meeting interfaces. These new ways of being with one another leverage advanced technologies such as computer vision, VR, and AR, and rely on user gesture and speech input via camera and microphone in novel ways.

Our Lead AI Scientist Hedvig Kjellström has dedicated her academic career to researching the interface between humans and robots, studying how AI and computer vision can help machines understand us humans better in terms of what we say, how we gesture, and how we move. In this interview, Hedvig lets us into her world of advanced human-centered AI.

Hedvig, your research at KTH Royal Institute of Technology in Stockholm has focused on AI technologies and human communication. How do you see this growing field of human-centered AI?

– In order to communicate better through virtual channels, we humans need technology that understands us and then synthesizes our communication. The first need focuses on us humans successfully communicating to a machine what we want to be transferred, and the second aims at the machine communicating our intent to another human.

When it comes to the technology needed for understanding human communication, there have been tremendous advances in the field. Today, there are various deep learning techniques for extracting communicative content from human gaze and body motion observed in video. Another related field that has been growing fast is speech-to-text interpretation: with the emergence of deep learning, various commercial systems such as Siri and Alexa have been developed. The technology is now mature enough, and the use of spoken interfaces will likely increase in the future as more use cases come up.
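
As a concrete illustration of one such building block, the sketch below uses the open-source MediaPipe library to extract body landmarks from video frames, the kind of signal a gesture-understanding model would consume. It is a minimal sketch, and the video filename is a hypothetical placeholder.

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

# "interview.mp4" is a hypothetical placeholder for any video of a person
cap = cv2.VideoCapture("interview.mp4")
with mp_pose.Pose(static_image_mode=False) as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV delivers BGR
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            # 33 body landmarks, each with normalized x, y, z and a visibility score
            nose = results.pose_landmarks.landmark[mp_pose.PoseLandmark.NOSE]
            print(f"nose at ({nose.x:.2f}, {nose.y:.2f}), visibility {nose.visibility:.2f}")
cap.release()

Per-frame landmark trajectories like these are what downstream models interpret when recognizing gestures or estimating where a person is looking.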

If we need machines to transfer what we want to say, we need to develop technology for synthesizing human-like communication. With the urgent need to improve remote work through virtual conferences, meetings, and other gatherings, there has also been a significant rise of interest in improving human communication synthesis in the gaming industry.

In computer graphics, generating human-like motion and animating avatars, e.g. animating characters in games, is attracting more and more attention. In 2020 and 2021 we’ve seen innovative virtual experiences and gatherings which in the future could make greater use of body-language animation for robots and virtual avatars in human-computer interfaces (see the Gesticulator example below).

In the research field, one major theme regarding social robots and systems such as Siri or Alexa is the dialogue system used to generate the semantic output of the agent – in other words, the text or words it “speaks”. This can be seen as the “brain” of the AI-driven tool.

Describe one example of a human-centered AI project you’re involved in.

– At KTH Royal Institute of Technology I supervise a research project called Gesticulator, where we give robots and virtual avatars human-like body language. The method is based on deep learning and is trained on examples of how humans gesture while speaking. It is then used together with the avatar’s speech generation, producing gestures that fit the avatar’s speech.

In an extension project called GestureBot (see the image below), the gesture generation was integrated into a complete web-based dialogue system – you can follow the link and interact with the gesturing avatar in real-time!
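
To make the idea concrete, here is a minimal PyTorch sketch of the kind of speech-to-gesture mapping such a system learns. The architecture, feature sizes, and names below are illustrative assumptions, not the actual Gesticulator model, which additionally conditions on the text of the speech.

import torch
import torch.nn as nn

class SpeechToGesture(nn.Module):
    """Toy sketch: map a sequence of per-frame speech features to poses."""
    def __init__(self, audio_dim=26, hidden_dim=256, pose_dim=45):
        super().__init__()
        # a recurrent encoder lets each output pose depend on the audio so far
        self.encoder = nn.GRU(audio_dim, hidden_dim, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, pose_dim)  # e.g. 15 joints x 3D

    def forward(self, audio_features):
        # audio_features: (batch, time, audio_dim), e.g. MFCCs per frame
        hidden, _ = self.encoder(audio_features)
        return self.decoder(hidden)  # (batch, time, pose_dim)

model = SpeechToGesture()
mfccs = torch.randn(1, 100, 26)  # 100 frames of dummy audio features
poses = model(mfccs)             # one pose per audio frame
print(poses.shape)               # torch.Size([1, 100, 45])

Trained on recordings of people talking, such a model learns which gestures tend to accompany which speech, so that at run time the avatar’s generated speech can drive its body language.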

In other projects, we develop methods for agents to also interpret the non-verbal communication of human users. As mentioned above, understanding and communicating back is crucial – a complete human-agent interaction system will need to both interpret and generate verbal as well as non-verbal communication.

In a gaming context, avatars having a more human-like way of gesturing could make the experience more immersive. The work done in Gesticulator and other similar projects is a step on the way towards that goal.  

GestureBot in action. Photo: Gesticulator project (https://svito-zar.github.io/gesticulator/) and GestureBot (https://nagyrajmund.github.io/project/gesturebot/)

What is the main challenge in your work on solving communication between machines and people?

– The biggest challenge is that human communication has a really complex structure. There is also a lot of variability between individuals and cultures. It’s hard to program complex dependencies on underlying factors such as the person’s mood, previous experience, or contextual knowledge.

Moreover, the signals (a speech sound, images of the person) can be really noisy, so it is tricky for a computer to filter out the important information from all the unimportant factors, such as the color of the person’s clothing, sounds and objects in the background, or variations in lighting, to mention a few.

With 25+ years of experience in computer vision, give us some perspective on how things have been advancing during that time.

– Deep learning has made it all possible. Before, it was very difficult to reliably track a human in a video sequence – I spent my whole PhD, from 1997 to 2001, on that problem. Nowadays any iPhone can do this in real time, which is incredible.

This is of course thanks both to better algorithms – we scientists have been able to build on earlier work done by others – and to better computing power.

However, the major breakthrough has been the deep learning idea of formulating learning problems in a parallel fashion, so that the computations can be split across a huge number of computing units – GPUs.
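
As a generic illustration (not code from Hedvig’s work), the snippet below expresses a large matrix multiplication once in PyTorch and lets the framework dispatch it across thousands of GPU cores when one is available; each output element can be computed independently, which is exactly the parallel structure deep learning exploits.

import torch

# run on the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b  # every output element is independent, so the work parallelizes
print(c.device)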

How do you see the future of human-centered AI unfold?

– The current big thing to solve is to make deep learning methods interpretable and explainable – to enable humans to understand what is going on inside the learning algorithm. Deep learning methods are often referred to as “black-box methods”: you don’t understand why the decision or prediction is what it is. It’s interesting to develop “grey-box methods” where you can trace the decisions the network makes on its way from input to output.

For example, if you have a method that classifies facial expressions into a number of emotions, a black-box method would only take the image as input and produce a class label as output – “angry” – whereas a grey-box method might in addition explain that it paid attention to the shape of the eyebrows and how tight the lips were.
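
One simple, widely used step from black box toward grey box is a gradient-based saliency map, which highlights the input pixels that most influenced a given class score. The sketch below uses a toy, untrained classifier as a stand-in for a real emotion model; the class indices are hypothetical.

import torch
import torch.nn as nn

# toy stand-in for an emotion classifier over 7 classes ("angry" = index 0)
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 7),
)
model.eval()

image = torch.randn(1, 3, 64, 64, requires_grad=True)
logits = model(image)
logits[0, 0].backward()  # gradient of the "angry" score w.r.t. the input

# pixels with large gradient magnitude influenced the decision the most
saliency = image.grad.abs().max(dim=1).values  # (1, 64, 64) heat map
print(saliency.shape)

Overlaid on the face, such a map would show whether the model was indeed looking at the eyebrows and lips rather than, say, the background.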

It will also be important to have methods that can make use of and weigh together information from many different sources in the learning process. In the emotion recognition example, for instance, it’s relevant to encode knowledge from cognitive psychology about how humans express emotions. This is done to a certain extent in current systems, e.g. by using the Facial Action Coding System (FACS).
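
As a sketch of what weighing together such sources could look like, the hypothetical classifier below concatenates learned image features with FACS action-unit intensities produced by a separate detector. All dimensions and names are illustrative assumptions.

import torch
import torch.nn as nn

class FusedEmotionClassifier(nn.Module):
    """Toy sketch: fuse raw image features with FACS action-unit intensities."""
    def __init__(self, image_dim=128, num_action_units=17, num_emotions=7):
        super().__init__()
        self.fuse = nn.Linear(image_dim + num_action_units, 64)
        self.classify = nn.Linear(64, num_emotions)

    def forward(self, image_features, action_units):
        # action_units: intensities of FACS AUs (e.g. AU4, "brow lowerer"),
        # supplied by a separate detector that encodes psychological knowledge
        x = torch.cat([image_features, action_units], dim=-1)
        return self.classify(torch.relu(self.fuse(x)))

model = FusedEmotionClassifier()
img_feat = torch.randn(1, 128)  # e.g. from a CNN backbone
aus = torch.rand(1, 17)         # AU intensities from a FACS detector
print(model(img_feat, aus).shape)  # torch.Size([1, 7])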

Eventually, more interpretable and explainable technologies will help us humans leverage them in everyday use, as only when we understand can we start building trust.



Would you like to increase the level of human-centered AI at your company? Get in touch with our Managing Director in Sweden, Anna Mossberg, to talk more – via LinkedIn or email.


Author
  • Pauliina Alanen

Topics
  • Computer Vision
  • Research
