Research
Our multimedia research group focuses on the processing and analysis of both text and video. We welcome researchers from all over the world to discuss and share their ideas with us. Together we shall impact the world with innovations and more intelligent systems.
Our research is classified into two main categories:
1) Web-Scale Media Search and Multimedia Question-Answering (MMQA)
2) Text Processing - Question-Answering (QA)
Web-Scale Media Search and Multimedia Question-Answering (MMQA)
With the exponential growth of media content on the Web, the ability to search for media entities based not just on text annotations but also on visual content has become important. Commercial search engines such as Bing and Google image search now offer separate search services based on text and visual content, but these services are limited and often produce unsatisfactory results. As we move toward the next phase of Web-scale media search, we need to tackle several critical issues in media research.
We will focus on tackling three critical issues in Web-scale media search. The first is large-scale visual concept annotation, in which we want to determine whether a media entity contains one or more pre-defined concept labels. Here we explore the use of a sparse graph-based semi-supervised learning framework (as shown in Figure 1) to perform label noise removal, concept completion, training set construction, and large-scale human labeling. The second issue is indexing to support large-scale kNN content-based search. Here we explore a semantic hashing method to help capture the semantic proximities between images and preserve them in the Hamming space. The third issue is interactive search, which has been shown to be extremely effective in media search through TRECVID experience. Here we focus on alleviating the problem of insufficient relevant samples during the relevance feedback process. In particular, we introduce a new category of samples called related samples: samples that are related to part of the query and that may help to find other relevant samples in an active learning framework. To consolidate all of this research, we plan to develop a Web-scale media search engine that supports both general search and specialized search on products or surveillance data.
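To make the indexing issue more concrete, the following is a minimal sketch of kNN search in the Hamming space. It assumes that every image has already been mapped to a short binary code by some hashing method; how the codes are learned is not shown. The names `hamming_distance` and `knn_hamming` and the 8-bit toy codes are illustrative assumptions only.

```python
from typing import Dict, List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def knn_hamming(query: int, codes: Dict[str, int], k: int) -> List[Tuple[str, int]]:
    """Return the k stored items whose codes are closest to the query code."""
    ranked = sorted(
        ((name, hamming_distance(query, code)) for name, code in codes.items()),
        key=lambda pair: (pair[1], pair[0]),  # by distance, then by name for stable ties
    )
    return ranked[:k]


# Hypothetical 8-bit codes; a real index would use longer codes.
codes = {
    "beach_1": 0b10110010,
    "beach_2": 0b10110110,
    "city_1": 0b01001101,
    "city_2": 0b01001001,
}
query = 0b10110011
neighbours = knn_hamming(query, codes, k=2)
print(neighbours)
```

If the hashing preserves semantic proximity, images of the same concept end up a few bits apart, so a nearest-neighbour query reduces to XOR and bit counting. The linear scan here is only for illustration; a large-scale index would need a faster lookup structure.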

Figure 1. The sparse-graph-based semi-supervised concept annotation framework
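As a rough illustration of graph-based semi-supervised annotation, the sketch below runs plain label propagation over a small similarity graph. This is a simpler stand-in, not the sparse-graph framework of Figure 1: the graph, its edge weights, the seed labels, and the function name `propagate_labels` are all assumptions made for the example.

```python
from typing import Dict

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: edge weight}


def propagate_labels(graph: Graph, seeds: Dict[str, float],
                     iterations: int = 100) -> Dict[str, float]:
    """Spread seed scores over the graph; seed nodes stay clamped."""
    # Unlabeled nodes start at 0.5, i.e. undecided.
    scores = {node: seeds.get(node, 0.5) for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in graph.items():
            total = sum(neighbours.values())
            if node in seeds or total == 0:
                # Labeled and isolated nodes keep their current score.
                updated[node] = scores[node]
            else:
                # Weighted average of the neighbours' current scores.
                updated[node] = sum(
                    weight * scores[n] for n, weight in neighbours.items()
                ) / total
        scores = updated
    return scores


# Hypothetical chain of five images; only the two ends are labeled.
graph = {
    "img_a": {"img_b": 1.0},
    "img_b": {"img_a": 1.0, "img_c": 1.0},
    "img_c": {"img_b": 1.0, "img_d": 1.0},
    "img_d": {"img_c": 1.0, "img_e": 1.0},
    "img_e": {"img_d": 1.0},
}
seeds = {"img_a": 1.0, "img_e": 0.0}  # 1.0 = has the concept, 0.0 = does not
scores = propagate_labels(graph, seeds)
```

Unlabeled images closer to the positive seed receive higher scores than those closer to the negative seed, so thresholding the scores yields candidate concept labels for images that were never labeled by hand.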
In parallel with the above research, we will start a new line of research on multimedia QA to take advantage of the many video answers readily available on social network sites such as YouTube. In particular, we will focus on answering how-to and definition type video questions. The first stage of video QA research is to analyze the query to ascertain that image/video is the appropriate medium for the answer and that such an answer is available. We next use Yahoo!Answers to expand the textual vocabulary of the query so that it is in line with the vocabulary used on social network sites. For how-to QA, we explore the use of a visual concept space to re-rank the video answers in the consumer electronics domain. Definition video QA aims to extract a video summary to help explain or define a topic. The key observation is that key shots are those that are duplicated across multiple videos returned by YouTube that share the same theme. Hence, by performing near-duplicate shot detection, we were able to identify key shots accurately. The framework of video definition QA is shown in Figure 2. Integrating it with text-based QA techniques moves us a step towards providing precise multimedia answers to users' questions.

Figure 2. Framework for video definition QA
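The key-shot observation can be sketched as follows. The example assumes each shot is already described by a feature vector and treats two shots as near-duplicates when their cosine similarity exceeds a threshold. The feature vectors, the threshold, the `min_other_videos` parameter, and the function names are illustrative assumptions, not the detector used in the framework of Figure 2.

```python
import math
from typing import List, Sequence, Tuple

Shot = Tuple[str, str, Sequence[float]]  # (video id, shot id, feature vector)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def find_key_shots(shots: List[Shot], threshold: float = 0.95,
                   min_other_videos: int = 2) -> List[Tuple[str, str]]:
    """Return shots that have near-duplicates in enough other videos."""
    key_shots = []
    for video, shot_id, feature in shots:
        # Distinct other videos that contain a near-duplicate of this shot.
        other_videos = {
            other_video
            for other_video, _, other_feature in shots
            if other_video != video and cosine(feature, other_feature) >= threshold
        }
        if len(other_videos) >= min_other_videos:
            key_shots.append((video, shot_id))
    return key_shots


# Hypothetical shots from three videos returned for the same topic.
shots = [
    ("v1", "s1", [1.0, 0.0, 0.1]),
    ("v1", "s2", [0.0, 1.0, 0.0]),
    ("v2", "s1", [0.98, 0.02, 0.1]),
    ("v2", "s2", [0.0, 0.0, 1.0]),
    ("v3", "s1", [1.0, 0.05, 0.12]),
    ("v3", "s2", [0.3, 0.3, 0.9]),
]
key_shots = find_key_shots(shots)
print(key_shots)
```

In this toy data the first shot of each video shows nearly the same content, so those three shots are selected, while shots that appear in only one video are ignored.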
Text Processing - Question-Answering (QA)
Question answering (QA) aims to find exact answers to users' natural language queries, instead of the ranked lists of documents returned by current search engines. It is a major step towards information retrieval instead of document retrieval. As technologies for QA are still inadequate to support precise, reliable search in the general domain, we focus our research on finding good answers from reliable Web resources such as Wikipedia (wikiQA) or community QA sites like Yahoo!Answers (cQA).
For wikiQA, we focus on answering factoid and definition questions. In particular, we adopt an ontology-based information extraction approach that leverages the concept structures and inter-linking between key entities available in Wikipedia to find good answers. For definition QA, we further optimize the information coverage criteria to generate the summary from a Wiki page.
Meanwhile, our research on cQA focuses on finding similar questions for which answers are readily available. It thus transforms the harder problem of finding good answers into one of finding similar questions. The key challenge here is in matching user questions that vary widely in structure and vocabulary and that often contain grammatical errors. To tackle this problem, we explore a syntactic tree matching model that is robust to some variations in sentence structure and grammatical errors (Figure 3a). The other challenge in cQA is that users tend to ask long questions with explanatory notes. We thus need to segment the verbose questions into single-topic sub-questions with context, as well as segmenting the answers. Here we will explore a graph-based approach to identify sub-questions and link them to their context (Figure 3b).

Figure 3. (a) Question Retrieval Framework with Syntactic Tree Matching in cQA; (b) Question Retrieval Framework with Multi-sentence Question Segmentation in cQA
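To give a feel for matching questions by syntactic structure, the sketch below scores two parse trees by the overlap of their subtrees. It is a deliberately simplified stand-in, not the syntactic tree matching model of Figure 3a. The nested-tuple tree encoding, the hand-written parses, and the function names are assumptions for illustration.

```python
from collections import Counter
from typing import Optional, Union

Tree = Union[str, tuple]  # a leaf word, or (label, child, child, ...)


def subtrees(tree: Tree, bag: Optional[Counter] = None) -> Counter:
    """Collect every subtree rooted at an internal node of the parse tree."""
    if bag is None:
        bag = Counter()
    if isinstance(tree, tuple):
        bag[tree] += 1
        for child in tree[1:]:  # tree[0] is the node label
            subtrees(child, bag)
    return bag


def tree_similarity(a: Tree, b: Tree) -> float:
    """Shared subtrees divided by all subtrees (a Jaccard-style score)."""
    bag_a, bag_b = subtrees(a), subtrees(b)
    shared = sum((bag_a & bag_b).values())
    total = sum((bag_a | bag_b).values())
    return shared / total if total else 0.0


# Hypothetical, hand-written parses of three user questions.
q1 = ("SQ", ("WRB", "how"),
      ("VP", ("VB", "reset"), ("NP", ("DT", "a"), ("NN", "router"))))
q2 = ("SQ", ("WRB", "how"),
      ("VP", ("VB", "restart"), ("NP", ("DT", "a"), ("NN", "router"))))
q3 = ("SQ", ("WP", "what"),
      ("VP", ("VBZ", "is"), ("NP", ("DT", "a"), ("NN", "codec"))))

sim_12 = tree_similarity(q1, q2)
sim_13 = tree_similarity(q1, q3)
```

Because the score counts shared fragments rather than requiring identical sentences, two questions that differ in a single word still overlap in most of their subtrees, while a question about a different topic shares very little.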
Finally, cQA offers a rich information resource on topics frequently asked about by many users. One key research question is therefore: given a topic, how can we organize all QA pairs around the topic into a knowledge structure to help others better understand the topic?

Figure 4. Framework for automatic organization of QA pairs in cQA
With the integration of wikiQA, cQA as well as real-time search with sentiment analysis, we hope to explore a comprehensive system to answer a broad range of user questions.
