Jindong Wang
Authored papers (31)
AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent
arXiv 2026
TorchUMM: A Unified Multimodal Model Codebase for Evaluation, Analysis, and Post-training
arXiv 2026
Masked Autoencoders Are Effective Tokenizers for Diffusion Models
arXiv 2025
RewardAnything: Generalizable Principle-Following Reward Models
arXiv 2025
UniGame: Turning a Unified Multimodal Model Into Its Own Adversary
arXiv 2025
Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation
arXiv 2025
MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents
arXiv 2025
HAROOD: A Benchmark for Out-of-distribution Generalization in Sensor-based Human Activity Recognition
arXiv 2025
TrustLLM: Trustworthiness in Large Language Models
arXiv 2024
AgentReview: Exploring Peer Review Dynamics with LLM Agents
arXiv 2024
MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms
arXiv 2024
SciEvo: A 2 Million, 30-Year Cross-disciplinary Dataset for Temporal Scientometric Analysis
arXiv 2024
Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models
arXiv 2024
FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models
arXiv 2024
Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation
arXiv 2024
MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders
arXiv 2024
NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional Stimuli
arXiv 2024
Time Series Analysis for Education: Methods, Applications, and Future Directions
arXiv 2024
Dynamic Evaluation of Large Language Models by Meta Probing Agents
arXiv 2024
A Survey on Evaluation of Large Language Models
arXiv 2023
Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning
ICCV 2023
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
arXiv 2023
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation
arXiv 2023
Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks
arXiv 2023
Supervised Knowledge Makes Large Language Models Better In-context Learners
arXiv 2023
PromptBench: A Unified Library for Evaluation of Large Language Models
arXiv 2023
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity
arXiv 2023
Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models
GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-distribution Generalization Perspective
arXiv 2022
Memory-Guided Multi-View Multi-Domain Fake News Detection
arXiv 2022
USB: A Unified Semi-supervised Learning Benchmark for Classification
arXiv 2022
Frequent co-authors
10 from 31 papers