appletea_cache

community

AI & ML interests

None defined yet.

Recent Activity

appletea2333 authored a paper 3 days ago

OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation

appletea2333 authored a paper 3 days ago

RPiAE: A Representation-Pivoted Autoencoder Enhancing Both Image Generation and Editing

appletea2333 authored a paper 3 days ago

Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning


authored 4 papers 3 days ago

OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation

Paper • 2512.08294 • Published Dec 9, 2025 • 18

RPiAE: A Representation-Pivoted Autoencoder Enhancing Both Image Generation and Editing

Paper • 2603.19206 • Published Mar 19 • 1

Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning

Paper • 2605.21487 • Published May 20 • 23

Video-MME-Logical: A Controlled Diagnostic Benchmark for Video Temporal-Logical Reasoning

Paper • 2606.27828 • Published 8 days ago • 24

submitted a paper to Daily Papers 4 days ago

Video-MME-Logical: A Controlled Diagnostic Benchmark for Video Temporal-Logical Reasoning

authored 2 papers 3 months ago

Gen-Searcher: Reinforcing Agentic Search for Image Generation

Paper • 2603.28767 • Published Mar 30 • 58

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Paper • 2603.27538 • Published Mar 29 • 149

authored 7 papers 7 months ago

OmniAlpha: A Sequence-to-Sequence Framework for Unified Multi-Task RGBA Generation

Paper • 2511.20211 • Published Nov 25, 2025 • 12

Architecture Decoupling Is Not All You Need For Unified Multimodal Model

Paper • 2511.22663 • Published Nov 27, 2025 • 29

OneThinker: All-in-one Reasoning Model for Image and Video

Paper • 2512.03043 • Published Dec 2, 2025 • 35

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Paper • 2512.05965 • Published Dec 5, 2025 • 38

LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding

Paper • 2501.08282 • Published Jan 14, 2025

Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency

Paper • 2506.01908 • Published Jun 2, 2025 • 1

Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation

Paper • 2511.16671 • Published Nov 20, 2025 • 16