Email: jiangyi0425-at-gmail.com, jiangyi.enjoy-at-bytedance.com

Biography

Currently, I am a Research Scientist at ByteDance Seed, where I work on generative foundation models.

My previous research focused on large-scale, open-world visual understanding and pretraining for images and videos.

My work on Visual AutoRegressive modeling won the Best Paper Award at NeurIPS 2024.

Research Interests

Visual Foundation Models, Generative Pretrained Models, and Large Language Models.

Open-World Interaction via Unified Multimodal Generation and Understanding.

Large-Scale Multimodal Generative Pretraining and Alignment.

Highlights

  • Visual AutoRegressive modeling: a new visual generation framework that elevates autoregressive models beyond diffusion and demonstrates scaling laws in image generation.

  • Waver: a next-generation foundation model for unified image and video generation, built on rectified-flow Transformers and engineered for industry-grade performance.

  • Liquid: a scalable autoregressive multimodal model that builds a shared vocabulary for images and text, enabling it to both understand and generate visual content.

  • UniTok: a unified tokenizer for visual generation and understanding that can be seamlessly integrated into MLLMs to unlock both generation and understanding capabilities.

  • ByteTrack ranks 1st among the most influential papers of ECCV 2022. Code is available on GitHub with 5.1k stars.

  • Sparse R-CNN was accepted at CVPR'21 and is integrated into several well-known frameworks (Detectron2, MMDetection, PaddlePaddle).

Publications [Google Scholar]

(* Equal contribution, Project Lead, Corresponding Author)

Conference and Preprint

Honors and Awards

Competitions

  • Winner of the CVPR 2022 Large-scale Video Object Segmentation Challenge: Video Instance Segmentation

  • Runner-up of the CVPR 2021 FGVC8 iNaturalist Challenge

  • Runner-up of the ICCV 2019 WIDER Face and Person Challenge: Face Detection

  • Kaggle Competition Master, 2018

Professional Activities