Area 1: Generation of Structured Knowledge from Heterogeneous User-Generated Content

With the growth in volume and variety of user-generated content (UGC) on the web, the way people seek and consume information and knowledge is changing. Given the sheer volume of UGC on any given topic, it is becoming harder and extremely time-consuming for users to grasp and follow the evolution of knowledge, even in their own domains of interest. To improve the aggregation, communication, and corroboration of the insights and knowledge of the crowd, this research explores techniques to automatically analyse, organize, and summarize large amounts of UGC on a specific topic, so as to encourage macro-level and micro-level information access and knowledge creation.


Our system comprises three major components, each of which is also a research topic:

1)  Construction of dynamic, topic-specific knowledge structures from heterogeneous UGC such as blogs, community question answering (cQA) sites, forums, and tweets.

2)  Organization and visualization of unstructured UGC based on the knowledge structure, to serve complex information needs (see Figure 1.1).

3)  Support for browsing guided, up-to-date summaries, as well as question answering, based on the dynamic knowledge structure (see Figure 1.2).


Figure 1.1. Knowledge structures extracted from a collection of reviews on MAC Cosmetics. Major MAC products and attributes, such as Mascara and Gel, are organized automatically either as a graph-based structure (a) or as a hierarchical tree structure (b) derived from (a).



Figure 1.2. User interface for browsing and searching product-related community question-answer pairs, with knowledge hierarchies serving as guides and overviews.


The generation of the above structures exploits the wide range of knowledge available on the Web on most topics. It leverages structured knowledge from Wikipedia or blogs, semi-structured knowledge from cQA sites and forums, and unstructured but live information from Twitter. The resulting knowledge structures facilitate user browsing, querying, and question answering on any topic. Most importantly, the structure is dynamic and can be regenerated from the latest UGC sources.
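
To make the idea of a graph-based structure and a tree derived from it more concrete, the following is a minimal Python sketch. It is not the method used by the system described here, whose algorithms are not specified in this overview. It assumes a hand-picked vocabulary of product and attribute terms, builds a term co-occurrence graph from a few toy review snippets, and then derives a tree by growing a maximum spanning tree from a chosen root term. The function names build_cooccurrence_graph and derive_tree, the toy reviews, and the choice of spanning-tree derivation are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(docs, vocabulary):
    """Count how often each pair of vocabulary terms appears in the same document.

    Returns a Counter keyed by alphabetically ordered term pairs.
    """
    edges = Counter()
    for doc in docs:
        tokens = set(doc.lower().split())
        present = sorted(term for term in vocabulary if term in tokens)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def derive_tree(edges, root):
    """Grow a maximum spanning tree from `root` (Prim's algorithm).

    Returns a child -> parent mapping; the root maps to None.
    Terms not connected to the root are left out of the tree.
    """
    # Build an undirected adjacency map from the weighted edge list.
    neighbours = {}
    for (a, b), weight in edges.items():
        neighbours.setdefault(a, {})[b] = weight
        neighbours.setdefault(b, {})[a] = weight

    parent = {root: None}
    while True:
        best = None  # (weight, new_node, attach_to)
        for node in parent:
            for other, weight in neighbours.get(node, {}).items():
                if other not in parent and (best is None or weight > best[0]):
                    best = (weight, other, node)
        if best is None:
            break  # no more reachable terms
        _, child, attach_to = best
        parent[child] = attach_to
    return parent


# Toy data (hypothetical review snippets, not taken from the real collection).
reviews = [
    "mac mascara is waterproof",
    "mac mascara waterproof formula",
    "mac mascara",
    "mac gel liner",
    "mac gel",
    "waterproof mascara",
    "mascara waterproof",
]
terms = {"mac", "mascara", "gel", "waterproof"}

graph = build_cooccurrence_graph(reviews, terms)
tree = derive_tree(graph, root="mac")
for child, par in tree.items():
    print(child, "<-", par)
```

Because the tree is rebuilt entirely from the current set of documents, rerunning the two functions on a fresh batch of UGC yields an updated structure, which mirrors the regeneration property described above.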