Jiashi Feng

Papers
50

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

50

EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation

arXiv 2026

2026

Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens

arXiv 2026

2026

VideoWorld 2: Learning Transferable Knowledge from Real-world Videos

arXiv 2026

2026

Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding

arXiv 2025

2025

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

arXiv 2025

2025

Seed1.5-VL Technical Report

arXiv 2025

2025

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

ICCV 2025

2025

Puppeteer: Rig and Animate Your 3D Models

arXiv 2025

2025

Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology

arXiv 2025

2025

Depth Anything 3: Recovering the Visual Space from Any Views

arXiv 2025

2025

DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World

arXiv 2025

2025

Trace Anything: Representing Any Video in 4D via Trajectory Fields

arXiv 2025

2025

Flash-VStream: Efficient Real-Time Understanding for Long Video Streams

arXiv 2025

2025

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

ICCV 2025

2025

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

ICCV 2025

2025

Depth Anything V2

arXiv 2024

2024

Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

CVPR 2025 1

2024

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

CVPR 2024 1

2024

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

arXiv 2024

2024

Parallelized Autoregressive Visual Generation

CVPR 2025 1

2024

PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

2024

LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos

arXiv 2024

2024

Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion

arXiv 2024

2024

DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution

arXiv 2024

2024

LVD-2M: A Long-take Video Dataset with Temporally Dense Captions

arXiv 2024

2024

Image Understanding Makes for A Good Tokenizer for Image Generation

arXiv 2024

2024

Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

2024

PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator

arXiv 2024

2024

Magic-Me: Identity-Specific Video Customized Diffusion

arXiv 2024

2024

Classification Done Right for Vision-Language Pre-Training

arXiv 2024

2024

COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

arXiv 2023

2023

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration

arXiv 2023

2023

Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens

arXiv 2023

2023

ChatAnything: Facetime Chat with LLM-Enhanced Personas

arXiv 2023

2023

Dataset Quantization

ICCV 2023 1

2023

Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method

arXiv 2023

2023

Expanding Small-Scale Datasets with Guided Imagination

2022

Sharpness-Aware Training for Free

arXiv 2022

2022

Generalizing Few-Shot NAS with Gradient Matching

2022

Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning

arXiv 2022

2022

Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning

NeurIPS 2021 12

2021

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

ICCV 2021 10

2021

Deep Long-Tailed Learning: A Survey

arXiv 2021

2021

Self-Supervised Aggregation of Diverse Experts for Test-Agnostic Long-Tailed Recognition

arXiv 2021

2021

ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning

ICLR 2020 1

2020

The Alzheimer's Disease Prediction Of Longitudinal Evolution (TADPOLE) Challenge: Results after 1 Year Follow-up

arXiv 2020

2020

ConvBERT: Improving BERT with Span-based Dynamic Convolution

NeurIPS 2020 12

2020

Decoupling Representation and Classifier for Long-Tailed Recognition

ICLR 2020 1

2019

PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer

2019

Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 50 papers