Yonatan Bitton
Authored papers (17)
Error-Driven Scene Editing for 3D Grounding in Large Language Models
arXiv 2025
TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation
arXiv 2024
ImageInWords: Unlocking Hyper-Detailed Image Descriptions
arXiv 2024
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
arXiv 2024
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions
arXiv 2024
Contrastive Sequential-Diffusion Learning: Non-linear and Multi-Scene Instructional Video Synthesis
arXiv 2024
Generating Coherent Sequences of Visual Illustrations for Real-World Manual Tasks
arXiv 2024
DataComp: In search of the next generation of multimodal datasets
NeurIPS 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
arXiv 2023
What You See is What You Read? Improving Text-Image Alignment Evaluation
IRFL: Image Recognition of Figurative Language
arXiv 2023
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
arXiv 2023
VideoCon: Robust Video-Language Alignment via Contrast Captions
CVPR 2024
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
arXiv 2023
VASR: Visual Analogies of Situation Recognition
arXiv 2022
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models
arXiv 2022
Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA
NAACL 2021
Frequent co-authors
10 from 17 papers