Lijun Wu
Authored papers (34)
ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch
arXiv 2026
Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility
arXiv 2026
MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods
arXiv 2026
Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs
arXiv 2026
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
arXiv 2025
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
arXiv 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
arXiv 2025
NatureLM: Deciphering the Language of Nature for Scientific Discovery
arXiv 2025
CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges
arXiv 2025
Efficient Reasoning for LLMs through Speculative Chain-of-Thought
arXiv 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
arXiv 2025
OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value
arXiv 2025
MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion
arXiv 2025
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis
arXiv 2025
GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models
arXiv 2025
PM4Bench: A Parallel Multilingual Multi-Modal Multi-task Benchmark for Large Vision Language Model
arXiv 2025
MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer
arXiv 2025
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights
arXiv 2025
Sequential Diffusion Language Models
arXiv 2025
Revisiting Long-context Modeling from Context Denoising Perspective
arXiv 2025
ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning
arXiv 2025
Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem
arXiv 2025
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once
arXiv 2025
Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning
arXiv 2025
Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning
arXiv 2025
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
arXiv 2025
GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition
arXiv 2025
3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling
arXiv 2024
Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey
arXiv 2024
Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining
arXiv 2024
FABind: Fast and Accurate Protein-Ligand Binding
NeurIPS 2023 11
SSM-DTA: Breaking the Barriers of Data Scarcity in Drug-Target Affinity Prediction
arXiv 2022
Improving Temporal Generalization of Pre-trained Language Models with Lexical Semantic Change
arXiv 2022
A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond
arXiv 2022
Frequent co-authors
10 from 34 papers