Shafiq Joty
- Papers
- 44
Authored papers (44)
SkillOrchestra: Learning to Route Agents via Skill Transfer
arXiv 2026
References Improve LLM Alignment in Non-Verifiable Domains
arXiv 2026
Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts
arXiv 2026
Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization
arXiv 2026
Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms
arXiv 2025
Meta-Design Matters: A Self-Design Multi-Agent System
arXiv 2025
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding
arXiv 2025
Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math
arXiv 2025
Demystifying Domain-adaptive Post-training for Financial LLMs
arXiv 2025
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
arXiv 2025
What Makes a Good Natural Language Prompt?
arXiv 2025
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering
arXiv 2025
Preference Optimization for Reasoning with Pseudo Feedback
arXiv 2024
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
arXiv 2024
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
arXiv 2024
Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models
arXiv 2024
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey
arXiv 2024
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
arXiv 2024
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
arXiv 2024
How Much are Large Language Models Contaminated? A Comprehensive Survey and the LLMSanitize Library
arXiv 2024
ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning
arXiv 2024
StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs
arXiv 2024
BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models
arXiv 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
arXiv 2024
ReIFE: Re-evaluating Instruction-Following Evaluation
arXiv 2024
LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond
arXiv 2023
XGen-7B Technical Report
arXiv 2023
xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval
arXiv 2023
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning
arXiv 2023
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?
arXiv 2023
Did You Read the Instructions? Rethinking the Effectiveness of Task Definitions in Instruction Learning
arXiv 2023
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization
arXiv 2023
Exploring Self-supervised Logic-enhanced Training for Large Language Models
arXiv 2023
Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles
arXiv 2023
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation
arXiv 2023
Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework
arXiv 2023
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules
arXiv 2023
ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
Findings (ACL) 2022 5
FOLIO: Natural Language Reasoning with First-Order Logic
arXiv 2022
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
arXiv 2022
GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented Dialogue Systems
ACL 2022 5
GeDi: Generative Discriminator Guided Sequence Generation
Findings (EMNLP) 2021 11
It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations
Domain Adaptation with Adversarial Training and Graph Embeddings
Frequent co-authors
10 from 44 papers