
NUS-WIDE
Citation:
Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, and Yan-Tao Zheng. "NUS-WIDE: A Real-World Web Image Database from National University of Singapore", ACM International Conference on Image and Video Retrieval. Greece. Jul. 8-10, 2009.
[pdf]
To facilitate your research, we have uploaded the URLs of all the images except those that have since been removed or are now inaccessible. [URL]
For more detailed descriptions of the dataset, see specification.
1. Low-Level Features
2. Tags
3. Groundtruth
4. Image List
5. Concept List
We provide a light version of the NUS-WIDE dataset, named NUS-WIDE-LITE, along with one object image dataset, named NUS-WIDE-OBJECT, and one scene image dataset, named NUS-WIDE-SCENE. The image dataset can be obtained by sending us a request email. Specifically, researchers interested in the dataset should download and fill out the Agreement and Disclaimer Form and send it back to us. We will then, at our discretion, email you the instructions for downloading the dataset.
1. NUS-WIDE-LITE [Link1] [Link2] (110 MB file) (specification)
2. NUS-WIDE-OBJECT [Link1] [Link2] (62 MB file) (specification)
3. NUS-WIDE-SCENE [Link1] [Link2] (70 MB file) (specification)
Digital images have become more easily accessible following the rapid advances in digital photography, networking and storage technologies. Photo sharing websites such as Flickr and Picasa are popular in daily life. For example, more than 2,000 images are uploaded to Flickr every minute. During peak times, up to 12,000 photos are served per second, and the record for the number of photos uploaded in one day exceeds 2 million. When users share their photos, they typically assign several tags to describe the contents of these images. These archives raise several natural questions for multimedia research. For example, what can we do with millions of images and their related tags? How can general image indexing and search benefit from community-shared images and tags?
In fact, how to improve the performance of existing image annotation and retrieval approaches by using machine learning and other artificial intelligence technologies has attracted much attention in the multimedia research community. However, for learning-based methods to be effective, a large number of balanced labeled samples is required, and these typically come from users through an interactive manual process. This is very time-consuming and labor-intensive. To reduce this manual effort, many semi-supervised learning and active learning approaches have been proposed. Nevertheless, there is still a need to manually annotate many images to train the learning models. On the other hand, image sharing sites offer a great opportunity to "freely" acquire a large number of images with annotated tags. The tags for the images are collectively annotated by a large group of heterogeneous users. It is believed that although most tags are correct, there are many noisy and missing tags. Thus, if we can learn accurate models from these user-shared images together with their associated noisy tags, then much of the manual effort in image annotation can be eliminated. In this case, content-based image annotation and retrieval can benefit greatly from the community-contributed images and tags.
In this paper, we present four research issues on mining community-contributed images and tags for image annotation and retrieval. The issues are: (1) how to utilize the community-contributed images and tags to annotate non-tagged images; (2) how to leverage the models learned from these images and associated tags to improve the retrieval of web images with tags or surrounding text; (3) how to achieve tag completion, which means the removal of noise in the tag set and the enrichment of missing tags; (4) how to construct an effective training set for each concept and the overall concept network from the available information sources. To these ends, we construct a benchmark dataset to focus research efforts on these issues. The dataset includes a set of images crawled from Flickr, together with the associated tags of these images, as well as the semantic ground-truth of 81 concepts for these images. We also extract six low-level visual features: a 64-D color histogram in LAB color space, a 144-D color correlogram in HSV color space, a 73-D edge distribution histogram, a 128-D wavelet texture, 225-D block-wise LAB-based color moments extracted over 5×5 fixed grid partitions, and a 500-D bag of visual words. For the image annotation task, we also provide a baseline based on the k-NN method. The set of low-level features for images, their associated tags, ground-truth, and baseline results can be downloaded at http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm. To our knowledge, this is the largest real-world web image dataset, comprising over 269,000 images with over 5,000 user-provided tags and ground-truth of 81 concepts for the entire dataset. The dataset is much larger than the widely available Corel and Caltech 101 datasets. Though some datasets comprise over 3 million images, they have ground-truth for only a small fraction of the images. Our proposed NUS-WIDE dataset has ground-truth for the entire dataset.
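The feature dimensionalities and the k-NN annotation baseline mentioned above can be illustrated with a minimal Python sketch. It is not the baseline distributed with the dataset: the description here does not state the distance measure, the value of k, or the voting rule, so Euclidean distance, `k=3`, and a `min_votes` threshold are assumptions made for illustration. The names `FEATURE_DIMS`, `total_dim`, and `knn_annotate` are hypothetical, and the toy vectors and labels are invented.

```python
import math
from collections import Counter

# Dimensionalities of the six low-level features as listed in the description.
# The dictionary keys are illustrative names, not identifiers from the dataset files.
FEATURE_DIMS = {
    "color_histogram_lab": 64,
    "color_correlogram_hsv": 144,
    "edge_histogram": 73,
    "wavelet_texture": 128,
    "color_moments_lab": 225,  # 225 = 25 blocks (5x5 grid) * 9 values per block
    "bag_of_visual_words": 500,
}


def total_dim():
    """Length of one vector if all six features were concatenated."""
    return sum(FEATURE_DIMS.values())


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_annotate(query, train_features, train_labels, k=3, min_votes=2):
    """Return the concepts carried by at least `min_votes` of the k nearest
    training images, sorted alphabetically (assumed voting rule)."""
    order = sorted(
        range(len(train_features)),
        key=lambda i: euclidean(query, train_features[i]),
    )
    votes = Counter()
    for i in order[:k]:
        votes.update(train_labels[i])
    return sorted(concept for concept, n in votes.items() if n >= min_votes)


if __name__ == "__main__":
    # Toy 2-D vectors stand in for real feature vectors.
    train_features = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
    train_labels = [{"sky", "clouds"}, {"sky"}, {"water"}, {"water", "beach"}]
    print(total_dim())
    print(knn_annotate([0.05, 0.0], train_features, train_labels))
```

With real data, each training vector would be one feature (or a concatenation of several) loaded from the feature files, and the label sets would come from the 81-concept ground-truth; the file formats are described in the specification rather than assumed here.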
Last updated: Jan. 28, 2009