LMS conducts research on large-scale, real-life problems arising from social media. In particular, it focuses on analysing the live, big, multi-source and multi-faceted data that arise from user-generated content (UGC). Such data are available from a myriad of sources: live-sharing sites such as Twitter; mobile-sharing sites such as Foursquare and Instagram; forums and blogs; traditional image- and video-sharing sites such as Flickr and YouTube; and community question-answering sites such as WikiAnswers and Yahoo! Answers, as well as their counterparts in China. To analyse this content, we need to deal with multimedia, multilingual and multimodal data sources. Most importantly, we need to tackle the social aspects of this content, such as relations among users (user IDs), communities, and key users with respect to a given organization or topic. We also need to address cross-lingual issues and cross-domain data types.
Key research directions in this lab include: (a) reliable strategies for harvesting representative UGC; (b) indexing and retrieval of the huge media collections generated on these platforms; (c) organizing large amounts of unstructured UGC, and the users who produce it, on any given topic into structured knowledge and user communities; and (d) fusing UGC to generate analytics related to locations, people, topics and organizations. To support these high-level tasks, we also carry out foundational research on the analysis, retrieval, fusion and question answering of text, live discussion streams, images and videos. Our research is motivated by real-life problems and emphasizes the ability to handle large, live data streams.
The research is organized into seven sub-areas, as follows: