
PengFei Liu

Papers
83


Authored papers


SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

arXiv 2026

2026

ASI-Evolve: AI Accelerates AI

arXiv 2026

2026

Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

arXiv 2026

2026

daVinci-Env: Open SWE Environment Synthesis at Scale

arXiv 2026

2026

Rubrics to Tokens: Bridging Response-level Rubrics and Token-level Rewards in Instruction Following Tasks

arXiv 2026

2026

LatentUM: Unleashing the Potential of Interleaved Cross-Modal Reasoning via a Latent-Space Unified Model

arXiv 2026

2026

Hybrid Policy Distillation for LLMs

arXiv 2026

2026

daVinci-LLM: Towards the Science of Pretraining

arXiv 2026

2026

LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces

arXiv 2026

2026

AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts

arXiv 2026

2026

One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling

arXiv 2026

2026

daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently

arXiv 2026

2026

Data Darwinism Part I: Unlocking the Value of Scientific Data for Pre-training

arXiv 2026

2026

daVinci-Dev: Agent-native Mid-training for Software Engineering

arXiv 2026

2026

AcademiClaw: When Students Set Challenges for AI Agents

arXiv 2026

2026

PRBench: End-to-end Paper Reproduction in Physics Research

arXiv 2026

2026

DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments

arXiv 2025

2025

Seed1.5-VL Technical Report

arXiv 2025

2025

Thinking with Generated Images

arXiv 2025

2025

LIMO: Less is More for Reasoning

arXiv 2025

2025

DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models

arXiv 2025

2025

Generative AI Act II: Test Time Scaling Drives Cognition Engineering

arXiv 2025

2025

Efficient Agent Training for Computer Use

arXiv 2025

2025

LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

arXiv 2025

2025

GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization

arXiv 2025

2025

Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight

arXiv 2025

2025

LIMI: Less is More for Agency

arXiv 2025

2025

LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling

arXiv 2025

2025

Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme

arXiv 2025

2025

DIVE: Diversified Iterative Self-Improvement

arXiv 2025

2025

One RL to See Them All: Visual Triple Unified Reinforcement Learning

arXiv 2025

2025

SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling

arXiv 2025

2025

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

arXiv 2025

2025

MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning

arXiv 2025

2025

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling

arXiv 2025

2025

Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States

arXiv 2025

2025

LIMR: Less is More for RL Scaling

arXiv 2025

2025

O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning

arXiv 2025

2025

RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing

arXiv 2025

2025

Visual Programmability: A Guide for Code-as-Thought in Chart Understanding

arXiv 2025

2025

Halu-J: Critique-Based Hallucination Judge

arXiv 2024

2024

PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World

arXiv 2024

2024

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

arXiv 2024

2024

RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation

arXiv 2024

2024

MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation

arXiv 2024

2024

OpenResearcher: Unleashing AI for Accelerated Scientific Research

arXiv 2024

2024

OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?

arXiv 2024

2024

Extending LLMs' Context Window with 100 Samples

arXiv 2024

2024

Weak-to-Strong Reasoning

arXiv 2024

2024

Benchmarking Benchmark Leakage in Large Language Models

arXiv 2024

2024

InFoBench: Evaluating Instruction Following Ability in Large Language Models

arXiv 2024

2024

The Critique of Critique

arXiv 2024

2024

BeHonest: Benchmarking Honesty in Large Language Models

arXiv 2024

2024

Dissecting Human and LLM Preferences

arXiv 2024

2024

A Self-feedback Knowledge Elicitation Approach for Chemical Reaction Predictions

arXiv 2024

2024

A quantitative analysis of knowledge-learning preferences in large language models in molecular science

arXiv 2024

2024

FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models

arXiv 2024

2024

ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation

arXiv 2024

2024

Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale

arXiv 2024

2024

Evaluating Mathematical Reasoning Beyond Accuracy

arXiv 2024

2024

Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate

arXiv 2024

2024

Understanding Reference Policies in Direct Preference Optimization

arXiv 2024

2024

ECon: On the Detection and Resolution of Evidence Conflicts

arXiv 2024

2024

Reformatted Alignment

arXiv 2024

2024

FELM: Benchmarking Factuality Evaluation of Large Language Models

NeurIPS 2023 11

2023

Alignment for Honesty

arXiv 2023

2023

MathPile: A Billion-Token-Scale Pretraining Corpus for Math

arXiv 2023

2023

Generative Judge for Evaluating Alignment

arXiv 2023

2023

GPTScore: Evaluate as You Desire

arXiv 2023

2023

Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization

arXiv 2023

2023

Align on the Fly: Adapting Chatbot Behavior to Established Norms

arXiv 2023

2023

On Learning to Summarize with Large Language Models as References

arXiv 2023

2023

DataFinder: Scientific Dataset Recommendation from Natural Language Descriptions

arXiv 2023

2023

How Far Are LLMs from Believable AI? A Benchmark for Evaluating the Believability of Human Behavior Simulation

arXiv 2023

2023

BRIO: Bringing Order to Abstractive Summarization

ACL 2022 5

2022

Towards a Unified Multi-Dimensional Evaluator for Text Generation

arXiv 2022

2022

reStructured Pre-training

arXiv 2022

2022

I^2R-Net: Intra- and Inter-Human Relation Network for Multi-Person Pose Estimation

arXiv 2022

2022

Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation

arXiv 2022

2022

BARTScore: Evaluating Generated Text as Text Generation

NeurIPS 2021 12

2021

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

arXiv 2021

2021

XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation

EMNLP 2021 11

2021

SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization

ACL 2021 5

2021

Affiliations

No known affiliations.

Frequent co-authors

10

from 83 papers