Ping Luo (羅平)

Latest

GenTron: Diffusion Transformers for Image and Video Generation
RegionGPT: Towards Region Understanding Vision Language Model
SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
Part123: Part-aware 3D Reconstruction from a Single-view Image
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
PixArt-alpha: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
VDT: General-purpose video diffusion transformers via mask modeling
EmbodiedGPT: Vision-language pre-training via embodied chain of thought
Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection
Foundation Model is Efficient Multimodal Multitask Model Selector
RAPHAEL: Text-to-image generation via large mixture of diffusion paths
VisionLLM: Large language model is also an open-ended decoder for vision-centric tasks
DiffusionDet: Diffusion model for object detection
ChiPFormer: Transferable chip placement via offline decision transformer
pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
Compression of Generative Pre-trained Language Models via Quantization
CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer
Flow-based Recurrent Belief State Learning for POMDPs
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
Bridging Video-text Retrieval with Multiple Choice Questions
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
Language as Queries for Referring Video Object Segmentation
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers
RestoreFormer: High-Quality Blind Face Restoration from Undegraded Key-Value Pairs
Scale-Equivalent Distillation for Semi-Supervised Object Detection
Don’t Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning
CycleMLP: A MLP-like Architecture for Dense Prediction
Dynamic Token Normalization improves Vision Transformers
Learning Versatile Neural Architectures by Propagating Network Codes
Objects in Semantic Topology
Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization
Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization
Adversarial Robustness for Unsupervised Domain Adaptation
An Empirical Investigation of Representation Learning for Imitation
Bringing Events Into Video Deblurring With Non-Consecutively Blurry Frames
Compressed Video Contrastive Learning
DetCo: Unsupervised Contrastive Learning for Object Detection
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
End-to-End Dense Video Captioning with Parallel Decoding
Model-Based Reinforcement Learning via Imagination with Derived Memory
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Rethinking the Pruning Criteria for Convolutional Neural Network
Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
STAR: A Semantic-aware Transformer for Real-time Image Enhancement
Watch Only Once: An End-to-End Video Action Detection Framework
RelativeNAS: Relative Neural Architecture Search via Slow-Fast Learning
Multi-frame Collaboration for Effective Endoscopic Video Polyp Detection via Spatial-Temporal Feature Transformation
A Unified Multi-Scenario Attacking Network for Visual Object Tracking
Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution
Disentangled Cycle Consistency for Highly-Realistic Virtual Try-On
Do 2D GANs Know 3D Shape? Unsupervised 3D Shape Reconstruction from 2D Image GANs
HR-NAS: Searching Efficient High-Resolution Neural Architectures With Lightweight Transformers
Multi-Compound Transformer for Accurate Biomedical Image Segmentation
Parser-Free Virtual Try-On via Distilling Appearance Flows
PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond
Segmenting Transparent Objects in the Wild with Transformer
Sparse R-CNN: End-to-End Object Detection With Learnable Proposals
ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search
VPIPE: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training
What Makes for End-to-End Object Detection?
When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks
Switchable Normalization for Learning-to-Normalize Deep Representation
UXNet: Searching Multi-level Feature Aggregation for 3D Medical Image Segmentation
Recruitment and Opportunities
3D Human Mesh Regression with Dense Correspondence
AdaX: Adaptive Gradient Descent with Exponential Long Term Memory
Channel equilibrium networks for learning deep representation
Convolution-Weight-Distribution Assumption: Rethinking the Criteria of Channel Pruning
Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation
Domain-Adaptive Few-Shot Learning
Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow
Exemplar Normalization for Learning Deep Representation
Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation
How Does BN Increase Collapsed Neural Network Filters?
Learning a Reinforced Agent for Flexible Exposure Bracketing Selection
Learning depth-guided convolutions for monocular 3D object detection
MaskGAN: Towards diverse and interactive facial image manipulation
Online Knowledge Distillation via Collaborative Learning
PolarMask: Single shot instance segmentation with polar representation
Segmenting Transparent Objects in the Wild
Towards Photo-Realistic Virtual Try-On by Adaptively Generating-Preserving Image Content
Whole-Body Human Pose Estimation in the Wild
CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization
Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid
Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks
Deep Self-Learning From Noisy Labels
Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once
Vision-Infused Deep Audio Inpainting
Switchable Whitening for Deep Representation Learning
Differentiable Dynamic Normalization for Learning Deep Representation
Understanding Normalization in Deep Learning
DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images
Differentiable learning-to-normalize via switchable normalization
Do Normalization Layers in a Deep ConvNet Really Need to Be Distinct?
Human Centric Visual Analysis with Deep Learning
Towards Understanding Regularization in Batch Normalization
Learning-to-Learn-to-Normalize: Algorithms, Applications and Theory
A Brief Discussion of Deep Learning: Regularization and Generalization in Normalization
WIDER Face and Pedestrian Challenge 2018
Batch Kalman Normalization: Towards Training Deep Neural Networks with Micro-Batches
CUImage: A Neverending Learning Platform on a Convolutional Knowledge Graph of Billion Web Images
Deep learning Markov random field for semantic segmentation
FaceFeat-GAN: a Two-Stage Approach for Identity-Preserving Face Synthesis
FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis
Faceness-Net: Face detection through deep facial part responses
From facial expression recognition to interpersonal relation prediction
Kalman Normalization: Normalizing Internal Representations Across Network Layers
Mix-and-match tuning for self-supervised semantic segmentation
SCAN: Self-and-Collaborative Attention Network for Video Person Re-identification
Scheduling Large-scale Distributed Training via Reinforcement Learning
Spatial as deep: Spatial CNN for traffic scene understanding
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation
Temporal Sequence Distillation: Towards Few-Frame Action Recognition in Videos
Two at once: Enhancing learning and generalization capacities via IBN-Net
Deep Dual Learning for Semantic Image Segmentation
Deep Learning Face Attributes for Detection and Alignment
DeepID-Net: Object Detection with Deformable Part Based Convolutional Neural Networks
EigenNet: Towards Fast and Structural Learning of Deep Neural Networks
Learning deep architectures via generalized whitened neural networks
Learning object interactions and descriptions for semantic image segmentation
Not all pixels are equal: Difficulty-aware semantic segmentation via deep layer cascade
Unconstrained fashion landmark detection via hierarchical recurrent transformer networks
Video Classification via Relational Feature Encoding Networks
Video object segmentation with re-identification
Clothes Co-Parsing via Joint Image Segmentation and Labeling with Application to Clothing Retrieval
DeepFashion: Powering robust clothes recognition and retrieval with rich annotations
Face Model Compression by Distilling Knowledge from Neurons
Fashion Landmark Detection in the Wild
Joint face representation adaptation and clustering in videos
Learning Compositional Shape Models of Multiple Distance Metrics by Information Projection
Learning deep representation for face alignment with auxiliary attributes
WIDER FACE: A face detection benchmark
A large-scale car dataset for fine-grained categorization and verification
Deep Learning Face Attributes in the Wild
Deep Learning Strong Parts for Pedestrian Detection
Deep Representation Learning with Target Coding
DeepID-Net: Deformable deep convolutional neural networks for object detection
From facial parts responses to face detection: A deep learning approach
Learning social relation traits from face images
Learning to recognize pedestrian attribute
Pedestrian detection aided by deep learning semantic tasks
Semantic image segmentation via deep parsing network
Clothing Co-Parsing by Joint Image Segmentation and Labeling
Deep learning multi-view representation for face recognition
DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection
Facial landmark detection by deep multi-task learning
Learning and Transferring Multi-task Deep Representation for Face Alignment
Multi-view perceptron: a deep model for learning face identity and view representations
Pedestrian attribute recognition at far distance
Recover canonical-view faces in the wild with deep neural networks
Switchable deep network for pedestrian detection
A deep sum-product architecture for robust facial attributes analysis
Deep learning identity-preserving face space
Pedestrian parsing via deep decompositional network
Hierarchical face parsing via deep learning
Joint semantic segmentation by searching for compatible-competitive references
Representing and recognizing objects with massive local image patches
A Discriminative Model for Object Representation and Detection via Sparse Features
Learning shape detector by quantizing curve segments with multiple distance metrics
Semantics-driven portrait cartoon stylization
Hierarchical 3D perception from a single image