Yi Jiang
I am a Research Lead at Media Intelligence of ByteDance Ads Generative AI, where I work on computer vision and machine learning.
I received my Master's degree from the Department of Computer Science and Engineering, Zhejiang University.
My previous research focused on spatial and temporal instance understanding, including large-scale and open-vocabulary object recognition, detection, segmentation, tracking, and pretraining. Most of my work has code released on GitHub, with over 12.0K stars.
Recently, my research focuses on visual foundation models, deep generative models (diffusion models and their applications), and large language models.
Email / Google Scholar / GitHub
|
|
Research Highlight
UNINEXT accepted by CVPR'23. UNINEXT unifies 10 instance perception tasks using a single model with the same model parameters
ByteTrack ranks 1st among the most influential papers in ECCV 2022. Code is available on GitHub with 3.4k stars
Spark accepted by ICLR'23 as Spotlight. Spark is the first successful BERT/MAE-style pretraining on any convolutional network
Unicorn accepted by ECCV'22 as an Oral Presentation. Unicorn accomplishes the grand unification of the tracking network architecture and learning paradigm
IDOL and SeqFormer accepted by ECCV'22 as Oral Presentations, serving as strong baselines for video instance segmentation
Sparse R-CNN accepted by CVPR'21. Sparse R-CNN is integrated into several well-known frameworks (Detectron2, MMDetection, PaddlePaddle)
Publications (* equal contribution, † corresponding author)
|
CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
Chuofan Ma, Yi Jiang†, Xin Wen, Zehuan Yuan, Xiaojuan Qi.
Advances in Neural Information Processing Systems (NeurIPS), 2023
Paper
|
|
Segment Every Reference Object in Spatial and Temporal Spaces
Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo.
International Conference on Computer Vision (ICCV), 2023
Paper
|
|
Exploring Transformers for Open-world Instance Segmentation
Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo.
International Conference on Computer Vision (ICCV), 2023
Paper
|
|
EGC: Image Generation and Classification via a Diffusion Energy-Based Model
Qiushan Guo, Chuofan Ma, Yi Jiang, Zehuan Yuan, Yizhou Yu, Ping Luo.
International Conference on Computer Vision (ICCV), 2023
Paper / Project
|
|
UNINEXT: Universal Instance Perception as Object Discovery and Retrieval
Bin Yan, Yi Jiang†, Jiannan Wu, Dong Wang, Ping Luo, Zehuan Yuan, Huchuan Lu
Computer Vision and Pattern Recognition (CVPR), 2023
Paper / Code
UNINEXT unifies 10 instance perception tasks and achieves state-of-the-art performance using a single model with the same model parameters.
|
|
InstMove: Instance Motion for Object-centric Video Segmentation
Qihao Liu, Junfeng Wu, Yi Jiang, Xiang Bai, Alan Yuille, Song Bai
Computer Vision and Pattern Recognition (CVPR), 2023
Paper / Code
|
|
Spark: Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling
Keyu Tian, Yi Jiang†, Qishuai Diao, Chen Lin, Liwei Wang, Zehuan Yuan
International Conference on Learning Representations (ICLR), 2023 [Spotlight, notable top 25% of accepted papers]
Paper / Code
Spark is the first successful BERT/MAE-style pretraining on any convolutional network
|
|
Learning Object-Language Alignments for Open-Vocabulary Object Detection
Chuang Lin, Peize Sun, Yi Jiang, Ping Luo, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan, Jianfei Cai
International Conference on Learning Representations (ICLR), 2023
Paper / Code
|
|
Rethinking Resolution in the Context of Efficient Video Recognition
Chuofan Ma, Qiushan Guo, Yi Jiang†, Zehuan Yuan, Ping Luo, Xiaojuan Qi†
Advances in Neural Information Processing Systems (NeurIPS), 2022
Paper / Code
|
|
Unicorn 🦄: Towards Grand Unification of Object Tracking
Bin Yan, Yi Jiang†, Peize Sun, Dong Wang, Zehuan Yuan, Ping Luo, Huchuan Lu
European Conference on Computer Vision (ECCV), 2022 [Oral Presentation Top 2.7%]
Paper / Code
For the first time, we accomplish the grand unification of various object tracking tasks with a unified network architecture and learning paradigm
|
|
In Defense of Online Models for Video Instance Segmentation
Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai
European Conference on Computer Vision (ECCV), 2022 [Oral Presentation Top 2.7%]
Paper / Code
|
|
SeqFormer: Sequential Transformer for Video Instance Segmentation
Junfeng Wu, Yi Jiang, Song Bai, Wenqing Zhang, Xiang Bai
European Conference on Computer Vision (ECCV), 2022 [Oral Presentation Top 2.7%]
Paper / Code
|
|
Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation
Chuang Lin, Yi Jiang†, Jianfei Cai, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan
European Conference on Computer Vision (ECCV), 2022
Paper / Code
|
|
ByteTrack: Multi-Object Tracking by Associating Every Detection Box
Yifu Zhang, Peize Sun, Yi Jiang, Dongdong Yu, Fucheng Weng, Zehuan Yuan, Ping Luo, Wenyu Liu, Xinggang Wang
European Conference on Computer Vision (ECCV), 2022
Paper / Code
ByteTrack ranks 1st among the most influential papers in ECCV 2022.
|
|
Language as Queries for Referring Video Object Segmentation
Jiannan Wu, Yi Jiang, Peize Sun, Zehuan Yuan, Ping Luo
Computer Vision and Pattern Recognition (CVPR), 2022
Paper / Code
|
|
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
Peize Sun, Jinkun Cao, Yi Jiang, Zehuan Yuan, Song Bai, Kris Kitani, Ping Luo
Computer Vision and Pattern Recognition (CVPR), 2022
Paper / Code
|
|
Objects in Semantic Topology
Shuo Yang, Peize Sun, Yi Jiang, Xiaobo Xia, Ruiheng Zhang, Zehuan Yuan, Changhu Wang, Ping Luo, Min Xu
International Conference on Learning Representations (ICLR), 2022
Paper
|
|
What Makes for End-to-End Object Detection?
Peize Sun, Yi Jiang, Enze Xie, Wenqi Shao, Zehuan Yuan, Changhu Wang, Ping Luo
International Conference on Machine Learning (ICML), 2021
Paper / Code
|
|
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
Peize Sun*, Rufeng Zhang*, Yi Jiang*, Tao Kong, Chenfeng Xu, Wei Zhan, Masayoshi Tomizuka, Lei Li, Zehuan Yuan, Changhu Wang, Ping Luo
Computer Vision and Pattern Recognition (CVPR), 2021
Paper / Code
|
|
Single person dense pose estimation via geometric equivariance consistency
Qinchuan Zhang, Yi Jiang, Qin Zhou, Yiru Zhao, Yao Liu, Hongtao Lu, Xian-Sheng Hua
IEEE Transactions on Multimedia (TMM), 2021
Paper
|
|
Learning to Segment the Tail
Xinting Hu, Yi Jiang, Kaihua Tang, Jingyuan Chen, Chunyan Miao, Hanwang Zhang
Computer Vision and Pattern Recognition (CVPR), 2020
Paper / Code
|
|
SimpleDet - A Simple and Versatile Framework for Object Detection and Instance Recognition
Yuntao Chen, Chenxia Han, Yanghao Li, Zehao Huang, Yi Jiang, Naiyan Wang, Zhaoxiang Zhang
Journal of Machine Learning Research (JMLR), 2019
Paper / Code
|
Preprints (* equal contribution, † corresponding author)
|
Multi-Level Contrastive Learning for Dense Prediction Task
Qiushan Guo, Yizhou Yu, Yi Jiang, Jiannan Wu, Zehuan Yuan, and Ping Luo.
arXiv, 2023
arXiv / Code
|
|
Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders
Haosen Yang, Deng Huang, Bin Wen, Jiannan Wu, Hongxun Yao, Yi Jiang†, Xiatian Zhu, Zehuan Yuan
arXiv, 2022
arXiv / Code
|
|
MetaFormer: A Unified Meta Framework for Fine-Grained Recognition
Qishuai Diao, Yi Jiang†, Bin Wen, Jia Sun, Zehuan Yuan
arXiv, 2022
arXiv / Code
|
|
TransTrack: Multiple Object Tracking with Transformer
Peize Sun, Jinkun Cao, Yi Jiang, Rufeng Zhang, Enze Xie, Zehuan Yuan, Changhu Wang, Ping Luo
arXiv, 2020
arXiv / Code
|
Awards
Winner of CVPR 2022 Large-scale Video Object Segmentation Challenge: Video Instance Segmentation
Runner-up of CVPR 2021 FGVC8 iNaturalist Challenge
Runner-up of ICCV 2019 WIDER Face and Person Challenge: Face Detection
Excellent Mentor Award, ByteDance 2021
Outstanding Staff Award, ByteDance 2020
Kaggle Competition Master, 2018
Professional activities
|