Area 7: Bridging the “Intention Gap” in Multimedia Search & Large-Scale Ontology-based Semantic Learning

The proliferation of user-generated content in the form of text, images, and videos has led to a surge of research into “user-centric” multimedia applications, such as multimedia search, recommendation, and advertising. Pivotal to these research and development efforts is understanding users’ intents in large-scale media search. In particular, we have been working to narrow the “Intention Gap” in multimedia search. As illustrated in Figure 7.1, there often exists an “Intention Gap” between a user’s search intent and the query the user submits, which makes it difficult for the search engine to understand that intent and leads to unsatisfactory search results.

Figure 7.1. “Intention Gap” in multimedia search.


We are working along two lines of research to tackle the “Intention Gap” problem. One exploits user feedback to infer search intent; the other senses user intent from user-generated content. Figure 7.2 shows snapshots of some of our proposed techniques, including (a) Visual Query Suggestion, a new query suggestion scheme that provides users with both keyword and image suggestions; (b) Related Sample Feedback, which introduces a new category of samples, termed “related samples”, that are related to part of the query and may help find relevant samples in a relevance feedback mechanism; and (c) Attribute Feedback, a recently developed interactive multimedia search system that goes beyond conventional relevance feedback by allowing users to provide feedback on semantic attributes, which act as a bridge between the user’s search intent and low-level visual features.

Figure 7.2. Snapshots of our proposed techniques towards bridging the “Intention Gap”.
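The idea behind Attribute Feedback, with attributes sitting between the user's intent and low-level visual features, can be illustrated with a small sketch. This is not the system described above; it is a hypothetical illustration under stated assumptions: each semantic attribute is scored by a linear classifier (weights and bias) on an image's low-level feature vector, the user marks attributes as wanted (+1) or unwanted (-1), and the final ranking is a simple linear blend, controlled by a hypothetical parameter `alpha`, of the initial retrieval score and the images' agreement with that feedback. All function and parameter names are invented for this example.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(x: float) -> float:
    """Map a classifier logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def attribute_probs(feature: Sequence[float],
                    attr_weights: Sequence[Sequence[float]],
                    attr_bias: Sequence[float]) -> List[float]:
    """Score each attribute with a (hypothetical) linear classifier on low-level features."""
    return [
        sigmoid(sum(w * f for w, f in zip(weights, feature)) + b)
        for weights, b in zip(attr_weights, attr_bias)
    ]


def rerank_with_attribute_feedback(features: Sequence[Sequence[float]],
                                   initial_scores: Sequence[float],
                                   attr_weights: Sequence[Sequence[float]],
                                   attr_bias: Sequence[float],
                                   feedback: Dict[int, int],
                                   alpha: float = 0.5) -> List[int]:
    """Return image indices ordered by a blend of initial score and attribute agreement.

    feedback maps an attribute index to +1 (user wants it) or -1 (user rejects it).
    """
    combined = []
    for feature, base in zip(features, initial_scores):
        probs = attribute_probs(feature, attr_weights, attr_bias)
        # Positive feedback rewards high attribute probability, negative feedback rewards low.
        agreement = sum(sign * (probs[a] - 0.5) for a, sign in feedback.items())
        combined.append((1.0 - alpha) * base + alpha * agreement)
    # Highest combined score first; Python's stable sort keeps ties in input order.
    return sorted(range(len(features)), key=lambda i: combined[i], reverse=True)


if __name__ == "__main__":
    # Three toy images with 2-D features and two attributes using identity weights.
    feats = [[2.0, -2.0], [-2.0, 2.0], [0.0, 0.0]]
    order = rerank_with_attribute_feedback(
        feats, [0.0, 0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], {0: 1, 1: -1}
    )
    print(order)  # [0, 2, 1]
```

Routing feedback through attribute probabilities rather than raw feature vectors is what lets a user express intent in semantic terms ("has attribute 0, lacks attribute 1"); a real system would learn the attribute classifiers and the blending strategy rather than fixing them as here.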


In parallel with the above research, we are starting a new line of research on large-scale ontology-based learning of semantics from multimedia content, aiming at semantic representations of multimedia entities. We will further exploit these semantic representations to replace or complement visual representations and thereby advance current multimedia applications.