CUImage Dataset


Pretraining visual features via image classification on ImageNet has been an indispensable step towards many advanced perception systems over the last decade, and ImageNet remains the most prevalent database for supervised pretraining of image features. Unlike ImageNet, which assumes that visual concepts are static and independent of each other, this work presents a never-ending learning platform, termed CUImage, which learns visual representations on a knowledge graph of billions of images, a data scale several orders of magnitude larger than ImageNet's. A novel dynamic graph convolutional network (GCN) is proposed to learn visual concepts. Once new data are presented, the GCN is updated dynamically, so that new concepts can be discovered or existing concepts can be merged. This is enabled by three main components in CUImage: Data Dispersion (DD), Data Management and Mining (DMM), and Data Evaluation (DE). These three components are built on top of a computer cluster with thousands of GPU/CPU cores and a parallel storage system of petabytes. So far, CUImage has processed and managed more than 2 million visual concepts across 2 billion images. To evaluate the learned representation, we transfer the pretrained features to several challenging benchmarks, such as image recognition on ImageNet and object detection on MS-COCO. We achieve state-of-the-art results, significantly surpassing systems that use ImageNet for pretraining.
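To make the dynamic-update idea concrete, the following is a minimal sketch (not the paper's implementation) of how a concept graph could be grown and refined as new data arrive: a plain GCN layer propagates features over the current graph, new concept nodes are appended when a concept is discovered, and near-duplicate concepts are merged. All class and function names below are hypothetical, and the update rules are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SimpleGCNLayer(nn.Module):
    """One graph-convolution step: H' = ReLU(A_hat @ H @ W)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, features, adj):
        # Symmetrically normalize the adjacency matrix (with self-loops).
        a_hat = adj + torch.eye(adj.size(0))
        deg_inv_sqrt = a_hat.sum(dim=1).clamp(min=1e-6).pow(-0.5)
        a_norm = deg_inv_sqrt[:, None] * a_hat * deg_inv_sqrt[None, :]
        return torch.relu(a_norm @ self.linear(features))


class DynamicConceptGraph:
    """Holds per-concept embeddings and concept-concept relations."""

    def __init__(self, dim):
        self.features = torch.zeros(0, dim)   # one row per concept
        self.adj = torch.zeros(0, 0)          # concept-concept edges

    def add_concept(self, embedding, related_to=()):
        """Discover a new concept: append a node and wire its edges."""
        n = self.features.size(0)
        self.features = torch.cat([self.features, embedding[None, :]])
        new_adj = torch.zeros(n + 1, n + 1)
        new_adj[:n, :n] = self.adj
        for j in related_to:
            new_adj[n, j] = new_adj[j, n] = 1.0
        self.adj = new_adj

    def merge_concepts(self, i, j):
        """Merge concept j into concept i (e.g. when they are near-duplicates)."""
        self.features[i] = 0.5 * (self.features[i] + self.features[j])
        self.adj[i] = torch.clamp(self.adj[i] + self.adj[j], max=1.0)
        self.adj[:, i] = self.adj[i]
        keep = [k for k in range(self.features.size(0)) if k != j]
        self.features = self.features[keep]
        self.adj = self.adj[keep][:, keep]


# Usage: grow the graph as new visual concepts appear, then propagate.
graph = DynamicConceptGraph(dim=8)
graph.add_concept(torch.randn(8))                   # e.g. 'cat'
graph.add_concept(torch.randn(8), related_to=[0])   # e.g. 'tiger cat'
gcn = SimpleGCNLayer(8, 8)
updated = gcn(graph.features, graph.adj)            # refreshed concept embeddings
```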

Diagram of CUImage, including a data dispersion (DD) system with an interactive user interface, a data management and mining (DMM) system driven by a large knowledge graph, and a data evaluation (DE) system equipped with various deep learning technologies. Each system comprises many different modules. These three systems are built on an internal cluster with thousands of GPU/CPU cores and a distributed file system of petabytes. The computing nodes and the storage are interconnected via InfiniBand.

(a) shows the categories of ImageNet, which have independent, flat relationships with each other. (b) shows the hierarchy of the grammatical concept tree (GCT) in CUImage.
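The sketch below illustrates the contrast in data-structure terms: a flat, ImageNet-style label list versus a hierarchical concept tree like the GCT. The concept names and tree layout are hypothetical examples, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional

flat_labels = ["tabby cat", "tiger cat", "golden retriever"]  # independent classes


@dataclass
class ConceptNode:
    name: str
    children: List["ConceptNode"] = field(default_factory=list)

    def find(self, name: str) -> Optional["ConceptNode"]:
        """Depth-first lookup of a concept anywhere in the subtree."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None


gct = ConceptNode("animal", [
    ConceptNode("cat", [ConceptNode("tabby cat"), ConceptNode("tiger cat")]),
    ConceptNode("dog", [ConceptNode("golden retriever")]),
])

assert gct.find("tiger cat") is not None  # reachable through the hierarchy
```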

The mlKG is visualized in the upper-left corner, where a small region around 'tiger cat' is shown in detail. The concept of 'tiger cat' is in the first level, with a score of 0.94. It is related to the other concepts in the second level, as shown in the upper-right corner. Given a set of images for each concept, the score indicates the probability that an image actually belongs to this concept. The scores help us select the important images in a semi-supervised way.
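A minimal sketch of the score-based selection described above, assuming the per-image score is a membership probability: images whose probability of belonging to the concept exceeds a threshold are kept. The threshold value and function name are illustrative, not from the paper.

```python
def select_images_for_concept(scored_images, threshold=0.9):
    """scored_images: list of (image_id, probability) pairs for one concept."""
    return [img_id for img_id, prob in scored_images if prob >= threshold]


# Hypothetical candidates for the 'tiger cat' concept.
candidates = [("img_001", 0.94), ("img_002", 0.41), ("img_003", 0.97)]
selected = select_images_for_concept(candidates)  # ['img_001', 'img_003']
```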