Siva Reddy

LLM2Vec-Gen: Generative Embeddings from Large Language Models

arXiv 2026

Structured Distillation of Web Agent Capabilities Enables Generalization

arXiv 2026

LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs

arXiv 2026

The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

arXiv 2026

Humans and LLMs Diverge on Probabilistic Inferences

arXiv 2026

SafeArena: Evaluating the Safety of Autonomous Web Agents

arXiv 2025

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

arXiv 2025

DeepSeek-R1 Thoughtology: Let's think about LLM Reasoning

arXiv 2025

Exploiting Instruction-Following Retrievers for Malicious Information Retrieval

arXiv 2025

The Markovian Thinker

arXiv 2025

The Promise of RL for Autoregressive Image Editing

arXiv 2025

REARANK: Reasoning Re-ranking Agent via Reinforcement Learning

arXiv 2025

How to Get Your LLM to Generate Challenging Problems for Evaluation

arXiv 2025

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

arXiv 2024

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

arXiv 2024

Learning Action and Reasoning-Centric Image Editing from Videos and Simulations

arXiv 2024

Are self-explanations from Large Language Models faithful?

arXiv 2024

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

arXiv 2024

The BrowserGym Ecosystem for Web Agent Research

arXiv 2024

Universal Adversarial Triggers Are Not Universal

arXiv 2024

Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering

arXiv 2023

The Impact of Positional Encoding on Length Generalization in Transformers


The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents

arXiv 2023

Faithfulness Measurable Masked Language Models

arXiv 2023

Combining Modular Skills in Multitask Learning

arXiv 2022

Image Retrieval from Contextual Descriptions

ACL 2022

Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model

arXiv 2022

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

Using Interactive Feedback to Improve the Accuracy and Explainability of Question Answering Systems Post-Deployment

Findings (ACL) 2022