Hisham Cholakkal

Papers
33

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

33

MediX-R1: Open Ended Medical Reinforcement Learning

arXiv 2026

2026

CoME-VL: Scaling Complementary Multi-Encoder Vision-Language Learning

arXiv 2026

2026

Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework

arXiv 2026

2026

Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device

arXiv 2026

2026

SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training

arXiv 2026

2026

LLM Post-Training: A Deep Dive into Reasoning Large Language Models

arXiv 2025

2025

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

arXiv 2025

2025

ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark

arXiv 2025

2025

Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks

arXiv 2025

2025

LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

arXiv 2025

2025

EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards

arXiv 2025

2025

Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts

arXiv 2025

2025

A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos

arXiv 2025

2025

Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions

arXiv 2025

2025

DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding

arXiv 2025

2025

AIN: The Arabic INclusive Large Multimodal Model

arXiv 2025

2025

Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs

arXiv 2025

2025

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

CVPR 2025 1

2024

CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections

arXiv 2024

2024

Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation

arXiv 2024

2024

PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model

arXiv 2024

2024

Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning

arXiv 2024

2024

Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery

CVPR 2024 1

2024

BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

arXiv 2024

2024

BiMediX: Bilingual Medical Mixture of Experts LLM

arXiv 2024

2024

Multi-modal Generation via Cross-Modal In-Context Learning

arXiv 2024

2024

GLaMM: Pixel Grounding Large Multimodal Model

CVPR 2024 1

2023

XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models

arXiv 2023

2023

Foundational Models Defining a New Era in Vision: A Survey and Outlook

arXiv 2023

2023

Generative Multiplane Neural Radiance for 3D-Aware Image Generation

ICCV 2023 1

2023

Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation

ICCV 2023 1

2023

EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications

arXiv 2022

2022

Handwriting Transformers

ICCV 2021 10

2021

Affiliations

No known affiliations.

Frequent co-authors

10

from 33 papers