Jiashi Feng
- Papers: 50
Authored papers (50)
EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation
arXiv 2026
Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens
arXiv 2026
VideoWorld 2: Learning Transferable Knowledge from Real-world Videos
arXiv 2026
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
arXiv 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
arXiv 2025
Seed1.5-VL Technical Report
arXiv 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
ICCV 2025
Puppeteer: Rig and Animate Your 3D Models
arXiv 2025
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
arXiv 2025
Depth Anything 3: Recovering the Visual Space from Any Views
arXiv 2025
DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World
arXiv 2025
Trace Anything: Representing Any Video in 4D via Trajectory Fields
arXiv 2025
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams
arXiv 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
ICCV 2025
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
ICCV 2025
Depth Anything V2
arXiv 2024
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
CVPR 2025
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
CVPR 2024
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
arXiv 2024
Parallelized Autoregressive Visual Generation
CVPR 2025
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos
arXiv 2024
Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion
arXiv 2024
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
arXiv 2024
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions
arXiv 2024
Image Understanding Makes for A Good Tokenizer for Image Generation
arXiv 2024
Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders
PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator
arXiv 2024
Magic-Me: Identity-Specific Video Customized Diffusion
arXiv 2024
Classification Done Right for Vision-Language Pre-Training
arXiv 2024
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
arXiv 2023
MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration
arXiv 2023
Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens
arXiv 2023
ChatAnything: Facetime Chat with LLM-Enhanced Personas
arXiv 2023
Dataset Quantization
ICCV 2023
Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method
arXiv 2023
Expanding Small-Scale Datasets with Guided Imagination
Sharpness-Aware Training for Free
arXiv 2022
Generalizing Few-Shot NAS with Gradient Matching
Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning
arXiv 2022
Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning
NeurIPS 2021
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
ICCV 2021
Deep Long-Tailed Learning: A Survey
arXiv 2021
Self-Supervised Aggregation of Diverse Experts for Test-Agnostic Long-Tailed Recognition
arXiv 2021
ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning
ICLR 2020
The Alzheimer's Disease Prediction Of Longitudinal Evolution (TADPOLE) Challenge: Results after 1 Year Follow-up
arXiv 2020
ConvBERT: Improving BERT with Span-based Dynamic Convolution
NeurIPS 2020
Decoupling Representation and Classifier for Long-Tailed Recognition
ICLR 2020
PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer
Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search