Bhiksha Raj

Robust Latent Matters: Boosting Image Generation with Sampling Error

arXiv 2025

Image Tokenizer Needs Post-Training

arXiv 2025

Mellow: a small audio language model for reasoning

arXiv 2025

ControlVAR: Exploring Controllable Visual Autoregressive Modeling

arXiv 2024

Audio Entailment: Assessing Deductive Reasoning for Audio Understanding

arXiv 2024

$\text{R}^2$-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations

arXiv 2024

Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?

arXiv 2024

uDistil-Whisper: Label-Free Data Filtering for Knowledge Distillation in Low-Data Regimes

arXiv 2024

FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding

CVPR 2023

Training Audio Captioning Models without Audio

arXiv 2023

GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content

arXiv 2023

Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks

arXiv 2023

Token Prediction as Implicit Classification to Identify LLM-Generated Text

arXiv 2023

How many perturbations break this model? Evaluating robustness beyond adversarial accuracy

arXiv 2022

USB: A Unified Semi-supervised Learning Benchmark for Classification

arXiv 2022

Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus

arXiv 2022

VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning

arXiv 2022

HEAR: Holistic Evaluation of Audio Representations

arXiv 2022