Email: jiangyi0425-at-gmail.com, jiangyi.enjoy-at-bytedance.com

Biography

I am currently a Research Lead at ByteDance Seed, where I work on generative foundation models.

I received my Master's degree from the Department of Computer Science and Engineering, Zhejiang University.

My work on Visual AutoRegressive modeling won the Best Paper Award at NeurIPS 2024.

Research Interests

Visual Foundation Models, Generative Pretrained Models, and Large Language Models.

Open-World Interaction via Unified Multi-modal Generation and Understanding.

Large-scale Multi-modal Generative Pretraining and Alignment.

Invited Talks

"Elucidating the Design Space of Visual Autoregressive Models and Image Tokenizers", Tutorial: Autoregressive Models Beyond Language, NeurIPS, 2025.

"Towards autoregressive modeling for Scalable and Versatile Visual Generation", Workshop: What Makes a Good Video: Next Practices in Video Generation and Evaluation, NeurIPS, 2025.

"Visual AutoRegressive Modeling: Scalable Image Generation via Next-Scale Prediction", Invited talk at BAAI Conference, 2024.

"Spark From Large Language Models: Pretraining, Open-world, Generalized vision models", Invited talk at IDEA, 2024.

Highlights

  • Visual AutoRegressive modeling: a new visual generation framework that elevates autoregressive models beyond diffusion and indicates scaling laws in image generation.

  • Waver: a next-generation foundation model for unified image & video generation, built on rectified flow Transformers and engineered for industry-grade performance.

  • Liquid: a scalable autoregressive multi-modal model that builds a shared vocabulary for both images and text, enabling it to understand and generate visual content.

  • Unitok: a unified tokenizer for visual generation & understanding that can be seamlessly integrated into MLLMs to unlock visual generation & understanding capabilities.

  • ByteTrack ranks 1st among the most influential papers of ECCV 2022. Code is available on GitHub with 5.1k stars.

  • Sparse R-CNN was accepted to CVPR'21 and is integrated into several well-known frameworks (Detectron2, MMDetection, PaddlePaddle).

Selected Publications [Google Scholar]

(* Equal contribution, Project Lead, Corresponding Author)

Conference and Preprint

Honors and Awards

Competitions

  • Winner of CVPR 2022 Large-scale Video Object Segmentation Challenge: Video Instance Segmentation

  • Runner-up of CVPR 2021 FGVC8 iNaturalist Challenge

  • Runner-up of ICCV 2019 WIDER Face and Person Challenge: Face Detection

  • Competition Master on Kaggle, 2018

Professional Activities