Soujanya Poria

Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision

arXiv 2025

Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics

arXiv 2025

NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards

arXiv 2025

JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment

arXiv 2025

Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned

arXiv 2025

OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!

arXiv 2025

The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles

arXiv 2025

DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models

arXiv 2025

Pixel-Level Reasoning Segmentation via Multi-turn Conversations

arXiv 2025

Demystifying deep search: a holistic evaluation with hint-free multi-hop questions and factorised metrics

arXiv 2025

PromptDistill: Query-based Selective Token Retention in Intermediate Layers for Efficient Large Language Model Inference

arXiv 2025

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

arXiv 2024

Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning

arXiv 2024

MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses

arXiv 2024

DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling

arXiv 2024

WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models

arXiv 2024

Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique

arXiv 2024

Inference Time Alignment with Reward-Guided Tree Search

arXiv 2024

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

arXiv 2024

CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models

arXiv 2024

Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

arXiv 2024

Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations

arXiv 2024

Two are better than one: Context window extension with multi-grained self-injection

arXiv 2024

Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning

arXiv 2024

MM-BigBench: Evaluating Multimodal Models on Multimodal Content Comprehension Tasks

arXiv 2023

Mustango: Toward Controllable Text-to-Music Generation

arXiv 2023

INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models

arXiv 2023

Large Language Models for Automated Open-domain Scientific Hypotheses Discovery

arXiv 2023

Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model

arXiv 2023

Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning

arXiv 2023

Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment

arXiv 2023

Contrastive Chain-of-Thought Prompting

arXiv 2023

Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding

arXiv 2023