Biography
Currently, I am a Research Lead at ByteDance Ads GenAI, where I work on generative foundation models.
I received my Master's degree from the Department of Computer Science and Engineering, Zhejiang University.
My previous research focused on large-scale open-world visual understanding and pretraining for images and videos.
Research Interests
Visual Foundation Models, Generative Pretrained Models, and Large Language Models.
Unified visual generation and understanding of the complex world for computer vision.
Large-scale multimodal generative pretraining and alignment.
Open-world/open-vocabulary visual recognition: equipping vision with knowledge and semantics.
Highlights
Visual AutoRegressive modeling: a new visual generation framework that elevates autoregressive models beyond diffusion and indicates a scaling law in image generation.
GLEE was accepted by CVPR 2024 as a Highlight. It is an object-level foundation model for locating and identifying objects in images and videos.
UNINEXT unifies 10 instance perception tasks using a single model with the same model parameters.
ByteTrack ranks 1st among the most influential papers in ECCV 2022. Code is available on GitHub with 4.3k stars.
Unicorn was accepted by ECCV'22 as an Oral Presentation. Unicorn accomplishes the great unification of the tracking network architecture and learning paradigm.
Sparse R-CNN was accepted by CVPR'21. Sparse R-CNN is integrated into several well-known frameworks (Detectron2, MMDetection, PaddlePaddle).
Publications [Google Scholar]
(* Equal contribution, † Corresponding Author or Project Lead)
-
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Keyu Tian, Yi Jiang†, Zehuan Yuan, Bingyue Peng, Liwei Wang
Neural Information Processing Systems (NeurIPS), 2024. Oral.
Visual AutoRegressive: a new visual generation framework that elevates GPT-style models beyond diffusion and indicates a scaling law in image generation.
pdf Project Report code -
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
Junke Wang, Yi Jiang†, Zehuan Yuan, Bingyue Peng, Zuxuan Wu, Yu-Gang Jiang
Neural Information Processing Systems (NeurIPS), 2024.
pdf Project code -
Optimization Efficient Open-World Visual Region Recognition
Haosen Yang, Chuofan Ma, Bin Wen, Yi Jiang†, Zehuan Yuan, Xiatian Zhu
Neural Information Processing Systems (NeurIPS), 2024.
pdf code -
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Peize Sun, Yi Jiang†, Shoufa Chen, Shilong Zhang, Bingyue Peng, Ping Luo, Zehuan Yuan
arXiv:2406.06525, 2024
Vanilla autoregressive models achieve state-of-the-art image generation performance.
pdf Project code -
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Chuofan Ma, Yi Jiang†, Jiannan Wu, Zehuan Yuan, Xiaojuan Qi
European Conference on Computer Vision (ECCV), 2024.
A Multimodal Large Language Model (MLLM) with grounded and fine-grained visual perception ability.
pdf Project code -
General Object Foundation Model for Images and Videos at Scale
Junfeng Wu*, Yi Jiang*, Qihao Liu, Zehuan Yuan, Xiang Bai, Song Bai
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Highlight, 2024.
An object-level foundation model for locating and identifying objects in images and videos.
pdf Project code -
Generative Region-Language Pretraining for Open-Ended Object Detection
Chuang Lin, Yi Jiang†, Lizhen Qu, Zehuan Yuan, Jianfei Cai
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
pdf code -
Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders
Haosen Yang, Deng Huang, Bin Wen, Jiannan Wu, Hongxun Yao, Yi Jiang†, Xiatian Zhu, Zehuan Yuan
British Machine Vision Conference (BMVC), 2024.
pdf code -
CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
Chuofan Ma, Yi Jiang†, Xin Wen, Zehuan Yuan, Xiaojuan Qi
Neural Information Processing Systems (NeurIPS), 2023.
pdf Project code -
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo
arXiv:2312.15715, 2024. Extended version of UniRef (ICCV 2023).
pdf code -
Segment Every Reference Object in Spatial and Temporal Spaces
Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo
International Conference on Computer Vision (ICCV), 2023
pdf code -
Exploring Transformers for Open-world Instance Segmentation
Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo
International Conference on Computer Vision (ICCV), 2023
pdf -
EGC: Image Generation and Classification via a Diffusion Energy-Based Model
Qiushan Guo, Chuofan Ma, Yi Jiang, Zehuan Yuan, Yizhou Yu, Ping Luo
International Conference on Computer Vision (ICCV), 2023
pdf Project code -
UNINEXT: Universal Instance Perception as Object Discovery and Retrieval
Bin Yan, Yi Jiang†, Jiannan Wu, Dong Wang, Ping Luo, Zehuan Yuan, Huchuan Lu
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
UNINEXT unifies 10 instance perception tasks and achieves state-of-the-art performance using a single model with the same model parameters.
pdf code -
InstMove: Instance Motion for Object-centric Video Segmentation
Qihao Liu*, Junfeng Wu*, Yi Jiang, Xiang Bai, Alan Yuille, Song Bai
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
pdf code -
Spark: Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling
Keyu Tian, Yi Jiang†, Qishuai Diao, Chen Lin, Liwei Wang, Zehuan Yuan
International Conference on Learning Representations (ICLR), 2023 [Spotlight, notable-top-25% of Accepted Papers].
Spark is the first successful BERT/MAE-style pretraining on any convolutional network.
pdf code -
Learning Object-Language Alignments for Open-Vocabulary Object Detection
Chuang Lin, Peize Sun, Yi Jiang, Ping Luo, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan, Jianfei Cai
International Conference on Learning Representations (ICLR), 2023
pdf code -
Multi-Level Contrastive Learning for Dense Prediction Task
Qiushan Guo, Yizhou Yu, Yi Jiang, Jiannan Wu, Zehuan Yuan, Ping Luo
arXiv:2304.02010, 2023
pdf code -
Rethinking Resolution in the Context of Efficient Video Recognition
Chuofan Ma, Qiushan Guo, Yi Jiang†, Zehuan Yuan, Ping Luo, Xiaojuan Qi
Neural Information Processing Systems (NeurIPS), 2022.
pdf code -
Unicorn 🦄 : Towards Grand Unification of Object Tracking
Bin Yan, Yi Jiang†, Peize Sun, Dong Wang, Zehuan Yuan, Ping Luo, Huchuan Lu
European Conference on Computer Vision (ECCV), 2022. [Oral Presentation, Top 2.7%].
For the first time, we accomplish the great unification of various object tracking tasks with a unified network architecture and learning paradigm.
pdf code -
In Defense of Online Models for Video Instance Segmentation
Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai
European Conference on Computer Vision (ECCV), 2022. [Oral Presentation, Top 2.7%].
pdf code -
SeqFormer: Sequential Transformer for Video Instance Segmentation
Junfeng Wu, Yi Jiang, Song Bai, Wenqing Zhang, Xiang Bai
European Conference on Computer Vision (ECCV), 2022. [Oral Presentation, Top 2.7%].
pdf code -
Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation
Chuang Lin, Yi Jiang, Jianfei Cai, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan
European Conference on Computer Vision (ECCV), 2022.
pdf code -
ByteTrack: Multi-Object Tracking by Associating Every Detection Box
Yifu Zhang, Peize Sun, Yi Jiang, Dongdong Yu, Fucheng Weng, Zehuan Yuan, Ping Luo, Wenyu Liu, Xinggang Wang
European Conference on Computer Vision (ECCV), 2022.
ByteTrack ranks 1st among the most influential papers in ECCV 2022.
pdf code -
Language as Queries for Referring Video Object Segmentation
Jiannan Wu, Yi Jiang, Peize Sun, Zehuan Yuan, Ping Luo
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
pdf code -
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
Peize Sun, Jinkun Cao, Yi Jiang, Zehuan Yuan, Song Bai, Kris Kitani, Ping Luo
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
pdf code -
MetaFormer: A Unified Meta Framework for Fine-Grained Recognition
Qishuai Diao, Yi Jiang, Bin Wen, Jia Sun, Zehuan Yuan
arXiv:2203.02751, 2022
pdf code -
Objects in Semantic Topology
Shuo Yang, Peize Sun, Yi Jiang, Xiaobo Xia, Ruiheng Zhang, Zehuan Yuan, Changhu Wang, Ping Luo, Min Xu
International Conference on Learning Representations (ICLR), 2022
pdf -
What Makes for End-to-End Object Detection?
Peize Sun, Yi Jiang, Enze Xie, Wenqi Shao, Zehuan Yuan, Changhu Wang, Ping Luo
International Conference on Machine Learning (ICML), 2021
pdf code -
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
Peize Sun*, Rufeng Zhang*, Yi Jiang*, Tao Kong, Chenfeng Xu, Wei Zhan, Masayoshi Tomizuka, Lei Li, Zehuan Yuan, Changhu Wang, Ping Luo
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021. IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2023.
Sparse R-CNN is integrated into several well-known frameworks (Detectron2, MMDetection, PaddlePaddle).
pdf code -
TransTrack: Multiple Object Tracking with Transformer
Peize Sun, Jinkun Cao, Yi Jiang, Rufeng Zhang, Enze Xie, Zehuan Yuan, Changhu Wang, Ping Luo
arXiv:2012.15460, 2021.
pdf code -
Single Person Dense Pose Estimation via Geometric Equivariance Consistency
Qinchuan Zhang, Yi Jiang, Qin Zhou, Yiru Zhao, Yao Liu, Hongtao Lu, Xian-Sheng Hua
IEEE Transactions on Multimedia (TMM), 2021
pdf -
Learning to Segment the Tail
Xinting Hu, Yi Jiang, Kaihua Tang, Jingyuan Chen, Chunyan Miao, Hanwang Zhang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
pdf code -
SimpleDet - A Simple and Versatile Framework for Object Detection and Instance Recognition
Yuntao Chen, Chenxia Han, Yanghao Li, Zehao Huang, Yi Jiang, Naiyan Wang
Journal of Machine Learning Research (JMLR), 2019
pdf code
Honors and Awards
-
Winner of CVPR 2022 Large-scale Video Object Segmentation Challenge: Video Instance Segmentation
-
Runner-up of CVPR 2021 FGVC8 iNaturalist Challenge
-
Runner-up of ICCV 2019 WIDER Face and Person Challenge: Face Detection
-
Excellent Mentor Award, ByteDance 2021
-
Outstanding Staff Award, ByteDance 2020, 2024
-
Kaggle Competition Master, 2018
Professional Activities
-
Conference Reviewer: CVPR, ICCV, ECCV, ICLR, NeurIPS, ACM MM
-
Journal Reviewer: T-PAMI, TIP, PR, TMM
-
Workshop Organizer: ECCV 2022 Workshop: Multiple Object Tracking and Segmentation in Complex Environments