Representation Learning for Multimodal Content Understanding

A talk by Rishubh Parihar
Deep Learning, Computer Vision & Data Science, ShareChat


About this talk

Millions of posts are published each day on social media platforms, carrying rich information across multiple modalities: image, text, video, and audio. To understand such content holistically, AI models must learn a unified representation of multimodal data that effectively captures information from all of the modalities present. There are two important aspects of multimodal representation learning: first, designing deep learning architectures that effectively integrate information from each modality; and second, designing training objectives that require a good understanding of all the modalities to solve the task. In this talk, we will discuss some of these approaches to multimodal representation learning.
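To make the two aspects above concrete, here is a minimal NumPy sketch (not from the talk itself; the function names and the choice of concatenation fusion and a CLIP-style symmetric contrastive objective are illustrative assumptions): one function fuses per-modality embeddings into a single representation, and another implements a training objective that can only be minimized when paired image and text embeddings are aligned.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale each embedding to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def fuse_concat(image_emb, text_emb):
    """Aspect 1 (architecture): late fusion by concatenating per-modality
    embeddings into one unified vector (illustrative; real systems often use
    cross-attention or gated fusion instead)."""
    return np.concatenate([image_emb, text_emb], axis=-1)

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Aspect 2 (objective): a CLIP-style symmetric InfoNCE loss. Each image
    must pick out its own caption from the batch and vice versa, which forces
    both encoders to capture shared multimodal semantics."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature        # (N, N) pairwise similarities
    labels = np.arange(len(logits))           # matching pairs lie on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # subtract max for numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Used together, a batch of image and text embeddings yields both a fused representation for downstream tasks and a scalar loss that is small when paired embeddings agree and large when they do not.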
