Dinesh Manocha
- Papers: 26
Authored papers
Do Audio-Visual Large Language Models Really See and Hear?
arXiv 2026
EgoAVU: Egocentric Audio-Visual Understanding
arXiv 2026
Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks
arXiv 2026
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
arXiv 2025
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
arXiv 2025
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
arXiv 2024
Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
CVPR 2025
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds
arXiv 2024
Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs
arXiv 2024
HALLUCINOGEN: A Benchmark for Evaluating Object Hallucination in Large Visual-Language Models
arXiv 2024
CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP
arXiv 2024
AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models
arXiv 2024
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
arXiv 2024
Prompt Mixing in Diffusion Models using the Black Scholes Algorithm
arXiv 2024
Can an Embodied Agent Find Your "Cat-shaped Mug"? LLM-Guided Exploration for Zero-Shot Object Navigation
arXiv 2023
CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition
ICCV 2023
iPLAN: Intent-Aware Planning in Heterogeneous Traffic via Distributed Multi-Agent Reinforcement Learning
arXiv 2023
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
CVPR 2024
UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation
arXiv 2023
DALE: Generative Data Augmentation for Low-Resource Legal NLP
arXiv 2023
ASPIRE: Language-Guided Data Augmentation for Improving Robustness Against Spurious Correlations
arXiv 2023
FAR: Fourier Aerial Video Recognition
arXiv 2022
GANav: Efficient Terrain Segmentation for Robot Navigation in Unstructured Outdoor Environments
arXiv 2021
FAST-RIR: Fast neural diffuse room impulse response generator
arXiv 2021
M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers
arXiv 2021
MotionHint: Self-Supervised Monocular Visual Odometry with Motion Constraints
arXiv 2021
Frequent co-authors
10 from 26 papers