Soujanya Poria
- Papers: 44
Authored papers
δ-mem: Efficient Online Memory for Large Language Models
arXiv 2026
From Perception to Action: An Interactive Benchmark for Vision Reasoning
arXiv 2026
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks
arXiv 2025
Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision
arXiv 2025
Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics
arXiv 2025
NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards
arXiv 2025
JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment
arXiv 2025
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles
arXiv 2025
Pixel-Level Reasoning Segmentation via Multi-turn Conversations
arXiv 2025
Demystifying deep search: a holistic evaluation with hint-free multi-hop questions and factorised metrics
arXiv 2025
PromptDistill: Query-based Selective Token Retention in Intermediate Layers for Efficient Large Language Model Inference
arXiv 2025
Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned
arXiv 2025
OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!
arXiv 2025
DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
arXiv 2025
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
arXiv 2024
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning
arXiv 2024
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
arXiv 2024
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
arXiv 2024
MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses
arXiv 2024
Two are better than one: Context window extension with multi-grained self-injection
arXiv 2024
Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning
arXiv 2024
Inference Time Alignment with Reward-Guided Tree Search
arXiv 2024
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
arXiv 2024
DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling
arXiv 2024
WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models
arXiv 2024
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
arXiv 2024
Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations
arXiv 2024
MM-BigBench: Evaluating Multimodal Models on Multimodal Content Comprehension Tasks
arXiv 2023
Mustango: Toward Controllable Text-to-Music Generation
arXiv 2023
INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models
arXiv 2023
Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model
arXiv 2023
Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning
arXiv 2023
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
arXiv 2023
Contrastive Chain-of-Thought Prompting
arXiv 2023
Large Language Models for Automated Open-domain Scientific Hypotheses Discovery
arXiv 2023
Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding
arXiv 2023
Multiview Contextual Commonsense Inference: A New Dataset and Task
arXiv 2022
A Dataset for Hyper-Relational Extraction and a Cube-Filling Approach
arXiv 2022
WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions from Paragraphs
arXiv 2022
COSMIC: COmmonSense knowledge for eMotion Identification in Conversations
Findings of the Association for Computational Linguistics 2020
Recognizing Emotion Cause in Conversations
MIME: MIMicking Emotions for Empathetic Response Generation
EMNLP 2020
Towards Multimodal Sarcasm Detection (An _Obviously_ Perfect Paper)
arXiv 2019
MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations
Frequent co-authors
10 from 44 papers