Publications

(2024). SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution. Computer Vision and Pattern Recognition (CVPR) 2024.

PDF

(2024). RegionGPT: Towards Region Understanding Vision Language Model. Computer Vision and Pattern Recognition (CVPR) 2024.

PDF

(2024). GenTron: Diffusion Transformers for Image and Video Generation. Computer Vision and Pattern Recognition (CVPR) 2024.

PDF

(2024). Part123: Part-aware 3D Reconstruction from a Single-view Image. SIGGRAPH 2024.

(2024). MotionCtrl: A Unified and Flexible Motion Controller for Video Generation. SIGGRAPH 2024.

PDF Code

(2024). Vdt: General-purpose video diffusion transformers via mask modeling. International Conference on Learning Representations (ICLR) 2024.

PDF

(2024). PixArt-alpha: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis. International Conference on Learning Representations (ICLR) 2024.

PDF

(2024). OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models. International Conference on Learning Representations (ICLR) 2024.

PDF

(2023). Visionllm: Large language model is also an open-ended decoder for vision-centric tasks. Thirty-seventh Annual Conference on Neural Information Processing Systems (NeurIPS) 2023.

PDF

(2023). Raphael: Text-to-image generation via large mixture of diffusion paths. Thirty-seventh Annual Conference on Neural Information Processing Systems (NeurIPS) 2023.

PDF

(2023). Foundation Model is Efficient Multimodal Multitask Model Selector. Thirty-seventh Annual Conference on Neural Information Processing Systems (NeurIPS) 2023.

PDF

(2023). Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection. Thirty-seventh Annual Conference on Neural Information Processing Systems (NeurIPS) 2023.

PDF

(2023). Embodiedgpt: Vision-language pre-training via embodied chain of thought. Thirty-seventh Annual Conference on Neural Information Processing Systems (NeurIPS) 2023.

PDF

(2023). Diffusiondet: Diffusion model for object detection. International Conference on Computer Vision (ICCV) 2023.

PDF

(2023). pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation. International Conference on Machine Learning (ICML) 2023.

PDF

(2023). Chipformer: Transferable chip placement via offline decision transformer. International Conference on Machine Learning (ICML) 2023.

PDF

(2022). Compression of Generative Pre-trained Language Models via Quantization. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL) 2022 (Outstanding Paper Award).

PDF

(2022). VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix. International Conference on Machine Learning (ICML) 2022.

(2022). Flow-based Recurrent Belief State Learning for POMDPs. International Conference on Machine Learning (ICML) 2022.

PDF

(2022). CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer. International Conference on Machine Learning (ICML) 2022.

(2022). Scale-Equivalent Distillation for Semi-Supervised Object Detection. Computer Vision and Pattern Recognition (CVPR) 2022.

PDF

(2022). RestoreFormer: High-Quality Blind Face Restoration from Undegraded Key-Value Pairs. Computer Vision and Pattern Recognition (CVPR) 2022.

PDF Code

(2022). Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers. Computer Vision and Pattern Recognition (CVPR) 2022.

PDF Code

(2022). Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer. Computer Vision and Pattern Recognition (CVPR) 2022 (Oral).

PDF Code

(2022). Language as Queries for Referring Video Object Segmentation. Computer Vision and Pattern Recognition (CVPR) 2022.

PDF Code

(2022). DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion. Computer Vision and Pattern Recognition (CVPR) 2022.

PDF Project

(2022). Bridging Video-text Retrieval with Multiple Choice Questions. Computer Vision and Pattern Recognition (CVPR) 2022 (Oral).

PDF Project

(2022). Don’t Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning. The 31st International Joint Conference on Artificial Intelligence (IJCAI), 2022.

PDF

(2022). Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization. International Conference on Learning Representations (ICLR) 2022.

PDF

(2022). Objects in Semantic Topology. International Conference on Learning Representations (ICLR) 2022.

PDF

(2022). Learning Versatile Neural Architectures by Propagating Network Codes. International Conference on Learning Representations (ICLR) 2022.

PDF Code Project

(2022). Dynamic Token Normalization improves Vision Transformers. International Conference on Learning Representations (ICLR) 2022.

PDF

(2022). CycleMLP: A MLP-like Architecture for Dense Prediction. International Conference on Learning Representations (ICLR) 2022 (Oral).

PDF Code

(2022). Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization. Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI), 2022.

PDF Code

(2021). Watch Only Once: An End-to-End Video Action Detection Framework. International Conference on Computer Vision (ICCV) 2021.

PDF

(2021). STAR: A Semantic-aware Transformer for Real-time Image Enhancement. International Conference on Computer Vision (ICCV) 2021.

PDF

(2021). SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. Thirty-fifth Annual Conference on Neural Information Processing Systems (NeurIPS) 2021.

PDF Code

(2021). Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning. Thirty-fifth Annual Conference on Neural Information Processing Systems (NeurIPS) 2021.

(2021). Rethinking the Pruning Criteria for Convolutional Neural Network. Thirty-fifth Annual Conference on Neural Information Processing Systems (NeurIPS) 2021.

PDF

(2021). Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions. International Conference on Computer Vision (ICCV) 2021 (Oral).

PDF

(2021). Model-Based Reinforcement Learning via Imagination with Derived Memory. Thirty-fifth Annual Conference on Neural Information Processing Systems (NeurIPS) 2021.

(2021). End-to-End Dense Video Captioning with Parallel Decoding. International Conference on Computer Vision (ICCV) 2021.

PDF

(2021). Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language. Thirty-fifth Annual Conference on Neural Information Processing Systems (NeurIPS) 2021.

(2021). DetCo: Unsupervised Contrastive Learning for Object Detection. International Conference on Computer Vision (ICCV) 2021.

PDF

(2021). Compressed Video Contrastive Learning. Thirty-fifth Annual Conference on Neural Information Processing Systems (NeurIPS) 2021.

(2021). Bringing Events Into Video Deblurring With Non-Consecutively Blurry Frames. International Conference on Computer Vision (ICCV) 2021.

PDF

(2021). An Empirical Investigation of Representation Learning for Imitation. Thirty-fifth Annual Conference on Neural Information Processing Systems (NeurIPS) 2021.

PDF

(2021). Adversarial Robustness for Unsupervised Domain Adaptation. International Conference on Computer Vision (ICCV) 2021.

PDF

(2021). RelativeNAS: Relative Neural Architecture Search via Slow-Fast Learning. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021.

PDF

(2021). Multi-frame Collaboration for Effective Endoscopic Video Polyp Detection via Spatial-Temporal Feature Transformation. Proc. of Medical Image Computing and Computer-Assisted Interventions (MICCAI) 2021.

PDF

(2021). When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks. Computer Vision and Pattern Recognition (CVPR) 2021.

PDF

(2021). What Makes for End-to-End Object Detection? International Conference on Machine Learning (ICML), 2021.

(2021). VPIPE: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training. IEEE Transactions on Parallel and Distributed Systems (TPDS) 2021.

(2021). ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search. Computer Vision and Pattern Recognition (CVPR) 2021.

PDF

(2021). Sparse R-CNN: End-to-End Object Detection With Learnable Proposals. Computer Vision and Pattern Recognition (CVPR) 2021.

PDF

(2021). Segmenting Transparent Objects in the Wild with Transformer. International Joint Conference on Artificial Intelligence (IJCAI) 2021.

PDF

(2021). PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2021.

PDF

(2021). Parser-Free Virtual Try-On via Distilling Appearance Flows. Computer Vision and Pattern Recognition (CVPR) 2021.

PDF

(2021). Multi-Compound Transformer for Accurate Biomedical Image Segmentation. Proc. of Medical Image Computing and Computer-Assisted Interventions (MICCAI) 2021.

(2021). HR-NAS: Searching Efficient High-Resolution Neural Architectures With Lightweight Transformers. Computer Vision and Pattern Recognition (CVPR) 2021.

PDF

(2021). Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2021.

PDF

(2021). Do 2D GANs Know 3D Shape? Unsupervised 3D Shape Reconstruction from 2D Image GANs. International Conference on Learning Representations (ICLR) Oral, 2021.

PDF

(2021). Disentangled Cycle Consistency for Highly-Realistic Virtual Try-On. Computer Vision and Pattern Recognition (CVPR) 2021.

PDF

(2021). Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution. International Conference on Machine Learning (ICML), 2021.

(2021). A Unified Multi-Scenario Attacking Network for Visual Object Tracking. AAAI Conference on Artificial Intelligence (AAAI) 2021.

(2020). Switchable Normalization for Learning-to-Normalize Deep Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2020.

PDF

(2020). UXNet: Searching Multi-level Feature Aggregation for 3D Medical Image Segmentation. Proc. of Medical Image Computing and Computer-Assisted Interventions (MICCAI) 2020.

PDF

(2020). Whole-Body Human Pose Estimation in the Wild. European Conference on Computer Vision (ECCV) 2020.

PDF

(2020). Towards Photo-Realistic Virtual Try-On by Adaptively Generating-Preserving Image Content. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020.

PDF

(2020). Segmenting Transparent Objects in the Wild. European Conference on Computer Vision (ECCV) 2020.

PDF

(2020). Polarmask: Single shot instance segmentation with polar representation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020.

PDF

(2020). Online Knowledge Distillation via Collaborative Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

PDF

(2020). Maskgan: Towards diverse and interactive facial image manipulation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020.

PDF

(2020). Learning depth-guided convolutions for monocular 3d object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPR) 2020.

PDF

(2020). Learning a Reinforced Agent for Flexible Exposure Bracketing Selection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020.

PDF

(2020). How Does BN Increase Collapsed Neural Network Filters? arXiv preprint arXiv:2001.11216.

PDF

(2020). Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation. European Conference on Computer Vision (ECCV) 2020.

PDF

(2020). Exemplar Normalization for Learning Deep Representation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

PDF

(2020). Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow. AAAI Conference on Artificial Intelligence (AAAI).

PDF

(2020). Domain-Adaptive Few-Shot Learning. arXiv preprint arXiv:2003.08626.

(2020). Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation. European Conference on Computer Vision (ECCV) 2020.

PDF

(2020). Convolution-Weight-Distribution Assumption: Rethinking the Criteria of Channel Pruning. arXiv preprint arXiv:2004.11627.

PDF

(2020). Channel equilibrium networks for learning deep representation. International Conference on Machine Learning (ICML) 2020.

PDF

(2020). AdaX: Adaptive Gradient Descent with Exponential Long Term Memory. arXiv preprint arXiv:2004.09740.

PDF

(2020). 3D Human Mesh Regression with Dense Correspondence. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020.

PDF

(2019). CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization. International Conference on Computer Vision (ICCV).

PDF

(2019). Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid. International Conference on Computer Vision (ICCV) 2019.

(2019). Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks. International Conference on Computer Vision (ICCV).

PDF

(2019). Deep Self-Learning From Noisy Labels. International Conference on Computer Vision (ICCV).

PDF

(2019). Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once. International Conference on Computer Vision (ICCV).

PDF

(2019). Vision-Infused Deep Audio Inpainting. International Conference on Computer Vision (ICCV).

(2019). Switchable Whitening for Deep Representation Learning. International Conference on Computer Vision (ICCV).

PDF

(2019). Differentiable Dynamic Normalization for Learning Deep Representation. Proceedings of the 36th International Conference on Machine Learning (ICML), PMLR 97:4203-4211.

PDF

(2019). Towards Understanding Regularization in Batch Normalization. International Conference on Learning Representations (ICLR).

PDF

(2019). Human Centric Visual Analysis with Deep Learning. Springer, 1st ed., 2019.

PDF

(2019). Do Normalization Layers in a Deep ConvNet Really Need to Be Distinct?. arXiv preprint arXiv:1811.07727.

PDF Code

(2019). Differentiable learning-to-normalize via switchable normalization. International Conference on Learning Representations (ICLR).

PDF Code

(2019). DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images. Computer Vision and Pattern Recognition (CVPR).

PDF Code Dataset

(2018). Two at once: Enhancing learning and generalization capacities via ibn-net. Proceedings of the European Conference on Computer Vision (ECCV).

PDF Code

(2018). Temporal Sequence Distillation: Towards Few-Frame Action Recognition in Videos. ACM MM.

(2018). Talking Face Generation by Adversarially Disentangled Audio-Visual Representation. AAAI Conference on Artificial Intelligence (AAAI), Oral.

PDF Code

(2018). Spatial as deep: Spatial cnn for traffic scene understanding. Thirty-Second AAAI Conference on Artificial Intelligence (AAAI).

PDF Code Dataset

(2018). Scheduling Large-scale Distributed Training via Reinforcement Learning. 2018 IEEE International Conference on Big Data (Big Data).

PDF

(2018). SCAN: Self-and-Collaborative Attention Network for Video Person Re-identification. IEEE Transactions on Image Processing (TIP), 2019.

(2018). Mix-and-match tuning for self-supervised semantic segmentation. Thirty-Second AAAI Conference on Artificial Intelligence.

(2018). Kalman Normalization: Normalizing Internal Representations Across Network Layers. Advances in Neural Information Processing Systems (NeurIPS).

PDF Code

(2018). From facial expression recognition to interpersonal relation prediction. International Journal of Computer Vision.

(2018). Faceness-net: Face detection through deep facial part responses. IEEE Transactions on Pattern Analysis and Machine Intelligence.

(2018). FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

PDF

(2018). FaceFeat-GAN: a Two-Stage Approach for Identity-Preserving Face Synthesis. arXiv preprint arXiv:1812.01288.

(2018). Deep learning markov random field for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI).

PDF

(2018). CUImage: A Neverending Learning Platform on a Convolutional Knowledge Graph of Billion Web Images. 2018 IEEE International Conference on Big Data (Big Data).

PDF

(2018). Batch Kalman Normalization: Towards Training Deep Neural Networks with Micro-Batches. arXiv preprint arXiv:1802.03133.

(2017). Video object segmentation with re-identification. arXiv preprint arXiv:1708.00197.

PDF

(2017). Video Classification via Relational Feature Encoding Networks. Proceedings of the Workshop on Large-Scale Video Classification Challenge.

(2017). Unconstrained fashion landmark detection via hierarchical recurrent transformer networks. Proceedings of the 2017 ACM on Multimedia Conference.

(2017). Not all pixels are equal: Difficulty-aware semantic segmentation via deep layer cascade. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

PDF

(2017). Learning object interactions and descriptions for semantic image segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

(2017). Learning deep architectures via generalized whitened neural networks. International Conference on Machine Learning (ICML).

PDF

(2017). EigenNet: Towards Fast and Structural Learning of Deep Neural Networks. International Joint Conference on Artificial Intelligence (IJCAI).

PDF

(2017). DeepID-Net: Object Detection with Deformable Part Based Convolutional Neural Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence.

(2017). Deep Learning Face Attributes for Detection and Alignment. Visual Attributes.

(2017). Deep Dual Learning for Semantic Image Segmentation. International Conference on Computer Vision (ICCV).

PDF

(2016). Wider face: A face detection benchmark. Proceedings of the IEEE conference on computer vision and pattern recognition.

(2016). Learning deep representation for face alignment with auxiliary attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence.

(2016). Learning Compositional Shape Models of Multiple Distance Metrics by Information Projection. IEEE Trans. Neural Netw. Learning Syst. (TNNLS).

PDF

(2016). Joint face representation adaptation and clustering in videos. European conference on computer vision.

(2016). Fashion Landmark Detection in the Wild. European Conference on Computer Vision (ECCV).

PDF Dataset

(2016). Face Model Compression by Distilling Knowledge from Neurons. AAAI Conference on Artificial Intelligence (AAAI).

PDF

(2016). Deepfashion: Powering robust clothes recognition and retrieval with rich annotations. Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR).

PDF Dataset

(2016). Clothes Co-Parsing via Joint Image Segmentation and Labeling with Application to Clothing Retrieval. IEEE Transactions on Multimedia.

(2015). Semantic image segmentation via deep parsing network. Proceedings of the IEEE International Conference on Computer Vision (ICCV).

PDF

(2015). Pedestrian detection aided by deep learning semantic tasks. CVPR.

(2015). Learning to recognize pedestrian attribute. arXiv preprint arXiv:1501.00901.

PDF

(2015). Learning social relation traits from face images. Proceedings of the IEEE International Conference on Computer Vision.

(2015). From facial parts responses to face detection: A deep learning approach. Proceedings of the IEEE International Conference on Computer Vision.

(2015). Deepid-net: Deformable deep convolutional neural networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

(2015). Deep Representation Learning with Target Coding. AAAI.

(2015). Deep Learning Strong Parts for Pedestrian Detection. ICCV.

(2015). Deep Learning Face Attributes in the Wild. IEEE International Conference on Computer Vision (ICCV).

PDF Dataset

(2015). A large-scale car dataset for fine-grained categorization and verification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

(2014). Switchable deep network for pedestrian detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

(2014). Recover canonical-view faces in the wild with deep neural networks. arXiv preprint arXiv:1404.3543.

(2014). Pedestrian attribute recognition at far distance. Proceedings of the 22nd ACM international conference on Multimedia (MM).

PDF Dataset

(2014). Multi-view perceptron: a deep model for learning face identity and view representations. Advances in Neural Information Processing Systems.

(2014). Learning and Transferring Multi-task Deep Representation for Face Alignment.

(2014). Facial landmark detection by deep multi-task learning. European Conference on Computer Vision.

(2014). Deepid-net: multi-stage and deformable deep convolutional neural networks for object detection. arXiv preprint arXiv:1409.3505.

(2014). Deep learning multi-view representation for face recognition. arXiv preprint arXiv:1406.6947.

(2014). Clothing Co-Parsing by Joint Image Segmentation and Labeling. IEEE Conference on Computer Vision and Pattern Recognition.

(2013). Pedestrian parsing via deep decompositional network. Proceedings of the IEEE international conference on computer vision.

(2013). Deep learning identity-preserving face space. Proceedings of the IEEE International Conference on Computer Vision.

(2013). A deep sum-product architecture for robust facial attributes analysis. Proceedings of the IEEE International Conference on Computer Vision (ICCV).

PDF

(2012). Representing and recognizing objects with massive local image patches. Pattern Recognition.

(2012). Joint semantic segmentation by searching for compatible-competitive references. Proceedings of the 20th ACM international conference on Multimedia.

(2012). Hierarchical face parsing via deep learning. 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

PDF Video

(2010). Semantics-driven portrait cartoon stylization. 2010 17th IEEE International Conference on Image Processing (ICIP).

(2010). Learning shape detector by quantizing curve segments with multiple distance metrics. European Conference on Computer Vision.

(2010). A Discriminative Model for Object Representation and Detection via Sparse Features. 2010 20th International Conference on Pattern Recognition (ICPR).

(2009). Hierarchical 3D perception from a single image. 2009 16th IEEE International Conference on Image Processing (ICIP).