A curated list of story/novel/script generation research in the LLM era (2022-present), organized by method with strict link verification.
- Category Overview
- Papers and Projects
- Story Generation Methods
- Quality, Control, and Iteration
- Evaluation and Resources
- Maintenance Notes
- Citation
| Category | Entries |
|---|---|
| Planning / Decomposition for Story Generation | 17 |
| Agent Collaboration for Story Writing | 5 |
| Sandbox / World Simulation Narrative Generation | 10 |
| Multimodal Story Generation (Text-Image/Video/Comic/Audio) | 16 |
| Memory & Long-Context Coherence | 11 |
| Consistency / Controllability / Constraint Following | 21 |
| Refinement / Self-Critique / Iterative Editing | 13 |
| Evaluation / Benchmarks / Metrics | 35 |
| Datasets / Surveys / Resources | 22 |
| Open-source Projects (No Paper Required) | 10 |
Note: the Project column holds project/demo links; the Code column holds verified GitHub repositories.

Planning / Decomposition for Story Generation

| Title | Venue | Date | Paper | Project | Code | Citations | Tags |
|---|---|---|---|---|---|---|---|
| Narrix: Remixing Narrative Strategies from Examples for Story Writing | CHI 2026 (Conference on Human Factors in Computing Systems) | 2026-04 | arXiv | - | - | - | planning, narrative-structure |
| BiT-MCTS: A Theme-based Bidirectional MCTS Approach to Chinese Fiction Generation | ArXiv 2026 (arXiv preprint) | 2026-03 | arXiv | - | - | - | planning, narrative-structure |
| DPWriter: Reinforcement Learning with Diverse Planning Branching for Creative Writing | ArXiv 2026 (arXiv preprint) | 2026-01 | arXiv | - | - | - | planning, narrative-structure |
| Codified Foreshadowing-Payoff Text Generation | ArXiv 2026 (arXiv preprint) | 2026-01 | arXiv | - | - | - | planning, narrative-structure |
| SceneDecorator: Towards Scene-Oriented Story Generation with Scene Planning and Scene Consistency | ArXiv 2025 (arXiv preprint) | 2025-10 | arXiv | - | - | - | planning, narrative-structure |
| Long Story Generation via Knowledge Graph and Literary Theory | ArXiv 2025 (arXiv preprint) | 2025-08 | arXiv | - | - | - | planning, narrative-structure |
| STORYTELLER: An Enhanced Plot-Planning Framework for Coherent and Cohesive Story Generation | ArXiv 2025 (arXiv preprint) | 2025-06 | arXiv | - | - | - | planning, narrative-structure |
| Can LLMs Generate Good Stories? Insights and Challenges from a Narrative Planning Perspective | ArXiv 2025 (arXiv preprint) | 2025-06 | arXiv | - | - | - | planning, narrative-structure |
| Learning to Reason for Long-Form Story Generation | ArXiv 2025 (arXiv preprint) | 2025-03 | arXiv | - | Code | | planning, narrative-structure |
| Generating Long-form Story Using Dynamic Hierarchical Outlining with Memory-Enhancement | NAACL 2025 (North American Chapter of ACL) | 2024-12 | arXiv | - | - | | planning, narrative-structure |
| Ex3: Automatic Novel Writing by Extracting, Excelsior and Expanding | ACL 2024 (Annual Meeting of the Association for Computational Linguistics) | 2024-08 | arXiv | - | - | | planning, narrative-structure |
| Navigating the Path of Writing: Outline-guided Text Generation with Large Language Models | NAACL 2025 (North American Chapter of ACL) | 2024-04 | arXiv | - | - | | planning, narrative-structure |
| Creating Suspenseful Stories: Iterative Planning with Large Language Models | EACL 2024 (Conference of the European Chapter of ACL) | 2024-02 | arXiv | - | - | | planning, narrative-structure |
| Improving Pacing in Long-Form Story Planning | EMNLP Findings 2023 (Findings of EMNLP) | 2023-11 | arXiv | - | - | | planning, narrative-structure |
| End-to-End Story Plot Generator | ArXiv 2023 (arXiv preprint) | 2023-10 | arXiv | - | - | | planning, narrative-structure |
| The Next Chapter: A Study of Large Language Models in Storytelling | ArXiv 2023 (arXiv preprint) | 2023-01 | arXiv | - | - | - | planning, narrative-structure |
| DOC: Improving Long Story Coherence With Detailed Outline Control | ArXiv 2022 (arXiv preprint) | 2022-12 | arXiv | - | - | - | planning, narrative-structure |

Agent Collaboration for Story Writing

| Title | Venue | Date | Paper | Project | Code | Citations | Tags |
|---|---|---|---|---|---|---|---|
| Collaborative Multi-Agent Scripts Generation for Enhancing Imperfect-Information Reasoning in Murder Mystery Games | ACL Findings 2026 (Findings of ACL) | 2026-04 | arXiv | - | - | - | multi-agent, collaboration |
| A Cognitive Writing Perspective for Constrained Long-Form Text Generation | ArXiv 2025 (arXiv preprint) | 2025-02 | arXiv | - | Code | | multi-agent, collaboration |
| Agents' Room: Narrative Generation through Multi-step Collaboration | ICLR 2025 (International Conference on Learning Representations) | 2024-10 | arXiv | - | - | | multi-agent, collaboration |
| HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing | EMNLP Findings 2024 (Findings of EMNLP) | 2024-06 | arXiv | - | - | | multi-agent, collaboration |
| AutoAgents: A Framework for Automatic Agent Generation | IJCAI 2024 (International Joint Conference on Artificial Intelligence) | 2023-09 | arXiv | - | Code | | multi-agent, collaboration |

Sandbox / World Simulation Narrative Generation

| Title | Venue | Date | Paper | Project | Code | Citations | Tags |
|---|---|---|---|---|---|---|---|
| EvoSpark: Endogenous Interactive Agent Societies for Unified Long-Horizon Narrative Evolution | ACL 2026 (Annual Meeting of the Association for Computational Linguistics) | 2026-04 | arXiv | - | - | - | sandbox, simulation |
| StoryBox: Collaborative Multi-Agent Simulation for Hybrid Bottom-Up Long-Form Story Generation Using Large Language Models | ArXiv 2025 (arXiv preprint) | 2025-10 | arXiv | Project | - | - | sandbox, simulation |
| OPEN-THEATRE: An Open-Source Toolkit for LLM-based Interactive Drama | ArXiv 2025 (arXiv preprint) | 2025-09 | arXiv | - | - | - | sandbox, interactive |
| HAMLET: Hyperadaptive Agent-based Modeling for Live Embodied Theatrics | ArXiv 2025 (arXiv preprint) | 2025-07 | arXiv | - | - | - | sandbox, simulation |
| STORY2GAME: Generating (Almost) Everything in an Interactive Fiction Game | ArXiv 2025 (arXiv preprint) | 2025-05 | arXiv | - | - | | sandbox, interactive |
| BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation | ArXiv 2025 (arXiv preprint) | 2025-04 | arXiv | Project | - | - | sandbox, interactive |
| Towards Enhanced Immersion and Agency for LLM-based Interactive Drama | ArXiv 2025 (arXiv preprint) | 2025-02 | arXiv | - | - | | sandbox, interactive |
| IBSEN: Director-Actor Agent Collaboration for Controllable and Interactive Drama Script Generation | ACL 2024 (Annual Meeting of the Association for Computational Linguistics) | 2024-07 | arXiv | - | Code | | sandbox, interactive |
| StoryVerse: Towards Co-authoring Dynamic Plot with LLM-based Character Simulation via Narrative Planning | FDG 2024 (Foundations of Digital Games) | 2024-05 | arXiv | - | - | | sandbox, simulation |
| Generative Agents: Interactive Simulacra of Human Behavior | ArXiv 2023 (arXiv preprint) | 2023-04 | arXiv | Project | Code | - | sandbox, interactive |

Multimodal Story Generation (Text-Image/Video/Comic/Audio)

| Title | Venue | Date | Paper | Project | Code | Citations | Tags |
|---|---|---|---|---|---|---|---|
| CANVAS: Continuity-Aware Narratives via Visual Agentic Storyboarding | ArXiv 2026 (arXiv preprint) | 2026-04 | arXiv | - | - | - | multimodal, visual-story |
| OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video | ArXiv 2026 (arXiv preprint) | 2026-04 | arXiv | - | - | - | multimodal, screenplay |
| Camera Artist: A Multi-Agent Framework for Cinematic Language Storytelling Video Generation | ArXiv 2026 (arXiv preprint) | 2026-04 | arXiv | - | - | - | multimodal, video-story |
| StoryBlender: Inter-Shot Consistent and Editable 3D Storyboard with Spatial-temporal Dynamics | ArXiv 2026 (arXiv preprint) | 2026-04 | arXiv | - | - | - | multimodal, visual-story |
| LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization | ArXiv 2026 (arXiv preprint) | 2026-03 | arXiv | - | - | - | multimodal, visual-story |
| Customized Visual Storytelling with Unified Multimodal LLMs | ArXiv 2026 (arXiv preprint) | 2026-03 | arXiv | - | - | - | multimodal, visual-story |
| Directing the Narrative: A Finetuning Method for Controlling Coherence and Style in Story Generation | ArXiv 2026 (arXiv preprint) | 2026-03 | arXiv | - | - | - | multimodal, visual-story |
| EmoStory: Emotion-Aware Story Generation | ArXiv 2026 (arXiv preprint) | 2026-03 | arXiv | - | - | - | multimodal, visual-story |
| PlayWrite: A Multimodal System for AI Supported Narrative Co-Authoring Through Play in XR | ArXiv 2026 (arXiv preprint) | 2026-03 | arXiv | - | - | - | multimodal, co-creation |
| StoryComposerAI: A Multimodal Story Co-Creation Tool for Amateur Writers | CHI EA 2026 (Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems) | 2026-02 | arXiv | - | - | - | multimodal, co-creation |
| Re:Verse -- Can Your VLM Read a Manga? | ICCV AISTORY Workshop 2025 (ICCV AISTORY Workshop) | 2025-08 | arXiv | - | - | - | multimodal, visual-story |
| Lay2Story: Extending Diffusion Transformers for Layout-Togglable Story Generation | ArXiv 2025 (arXiv preprint) | 2025-08 | arXiv | - | - | - | multimodal, visual-story |
| R^2: A LLM BASED NOVEL-TO-SCREENPLAY GENERATION FRAMEWORK WITH CAUSAL PLOT GRAPHS | ICLR 2025 (International Conference on Learning Representations) | 2025-03 | arXiv | - | - | | multimodal, screenplay |
| LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models | ArXiv 2025 (arXiv preprint) | 2025-02 | arXiv | - | Code | - | multimodal, visual-story |
| SEED-Story: Multimodal Long Story Generation with Large Language Model | ArXiv 2024 (arXiv preprint) | 2024-07 | arXiv | - | Code | - | multimodal, visual-story |
| Make-A-Story: Visual Memory Conditioned Consistent Story Generation | CVPR 2023 (Conference on Computer Vision and Pattern Recognition) | 2022-11 | arXiv | - | - | - | multimodal, visual-story |

Memory & Long-Context Coherence

| Title | Venue | Date | Paper | Project | Code | Citations | Tags |
|---|---|---|---|---|---|---|---|
| Think Before you Write: QA-Guided Reasoning for Character Descriptions in Books | ArXiv 2026 (arXiv preprint) | 2026-04 | arXiv | - | - | - | long-context, coherence |
| Skeleton-based Coherence Modeling in Narratives | ArXiv 2026 (arXiv preprint) | 2026-04 | arXiv | - | - | - | long-context, coherence |
| Shifting Long-Context LLMs Research from Input to Output | ArXiv 2025 (arXiv preprint) | 2025-03 | arXiv | - | - | - | long-context, coherence |
| Language Models can Self-Lengthen to Generate Long Texts | ArXiv 2024 (arXiv preprint) | 2024-10 | arXiv | - | Code | - | long-context, coherence |
| LongGenBench: Benchmarking Long-Form Generation in Long-Context LLMs | ArXiv 2024 (arXiv preprint) | 2024-09 | Published | - | - | - | long-context, coherence |
| LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs | ArXiv 2024 (arXiv preprint) | 2024-08 | arXiv | - | Code | - | long-context, coherence |
| LongLaMP: A Benchmark for Personalized Long-form Text Generation | ArXiv 2024 (arXiv preprint) | 2024-07 | arXiv | - | - | - | long-context, coherence |
| CHIRON: Rich Character Representations in Long-Form Narratives | EMNLP Findings 2024 (Findings of EMNLP) | 2024-06 | Published | - | - | - | long-context, coherence |
| With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation | COLM 2024 (Conference on Language Modeling) | 2024-01 | arXiv | - | - | | long-context, coherence |
| LongAlign: A Recipe for Long Context Alignment of Large Language Models | ArXiv 2024 (arXiv preprint) | 2024-01 | arXiv | - | Code | - | long-context, coherence |
| RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text | ArXiv 2023 (arXiv preprint) | 2023-05 | arXiv | Project | Code | | long-context, interactive |

Consistency / Controllability / Constraint Following

| Title | Venue | Date | Paper | Project | Code | Citations | Tags |
|---|---|---|---|---|---|---|---|
| UniCreative: Unifying Long-form Logic and Short-form Sparkle via Reference-Free Reinforcement Learning | ArXiv 2026 (arXiv preprint) | 2026-04 | arXiv | - | - | - | controllability, consistency |
| Noise Steering for Controlled Text Generation: Improving Diversity and Reading-Level Fidelity in Arabic Educational Story Generation | ArXiv 2026 (arXiv preprint) | 2026-04 | arXiv | - | - | - | controllability, consistency |
| Preconditioned Test-Time Adaptation for Out-of-Distribution Debiasing in Narrative Generation | ArXiv 2026 (arXiv preprint) | 2026-03 | arXiv | - | - | - | controllability, consistency |
| TaleFrame: An Interactive Story Generation System with Fine-Grained Control and Large Language Models | ArXiv 2025 (arXiv preprint) | 2025-12 | arXiv | - | - | - | controllability, interactive |
| SCORE: Story Coherence and Retrieval Enhancement for AI Narratives | ArXiv 2025 (arXiv preprint) | 2025-03 | arXiv | - | - | | controllability, retrieval |
| Whose story is it? Personalizing story generation by inferring author styles | ArXiv 2025 (arXiv preprint) | 2025-02 | arXiv | - | - | | controllability, consistency |
| Pastiche Novel Generation Creating: Fan Fiction You Love in Your Favorite Author's Style | ArXiv 2025 (arXiv preprint) | 2025-02 | arXiv | - | - | | controllability, consistency |
| CS4: Measuring the Creativity of Large Language Models Automatically by Controlling the Number of Story-Writing Constraints | ArXiv 2024 (arXiv preprint) | 2024-10 | arXiv | - | Code | | controllability, consistency |
| Crafting Narrative Closures: Zero-Shot Learning with SSM Mamba for Short Story Ending Generation | ArXiv 2024 (arXiv preprint) | 2024-10 | arXiv | - | - | | controllability, consistency |
| MirrorStories: Reflecting Diversity through Personalized Narrative Generation with Large Language Models | EMNLP 2024 (Conference on Empirical Methods in Natural Language Processing) | 2024-09 | arXiv | - | - | | controllability, consistency |
| FACTTRACK: Time-Aware World State Tracking in Story Outlines | NAACL 2025 (North American Chapter of ACL) | 2024-07 | arXiv | - | - | | controllability, consistency |
| Suri: Multi-constraint Instruction Following for Long-form Text Generation | ArXiv 2024 (arXiv preprint) | 2024-06 | arXiv | - | Code | - | controllability, consistency |
| MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation | ACL 2024 (Annual Meeting of the Association for Computational Linguistics) | 2024-06 | arXiv | Project | Code | | controllability, consistency |
| Measuring Psychological Depth in Language Models | EMNLP 2024 (Conference on Empirical Methods in Natural Language Processing) | 2024-06 | arXiv | - | - | | controllability, consistency |
| Guiding and Diversifying LLM-Based Story Generation via Answer Set Programming | ACL Workshop 2025 (ACL Workshop) | 2024-06 | arXiv | - | - | | controllability, consistency |
| Multigenre AI-powered Story Composition | ArXiv 2024 (arXiv preprint) | 2024-05 | arXiv | - | - | | controllability, consistency |
| Returning to the Start: Generating Narratives with Related Endpoints | NAACL 2024 (North American Chapter of ACL) | 2024-04 | arXiv | Project | Code | | controllability, consistency |
| NarrativeGenie: Generating Narrative Beats and Dynamic Storytelling with Large Language Models | AIIDE 2024 (Conference on Artificial Intelligence and Interactive Digital Entertainment) | 2024-01 | Published | - | - | - | controllability, consistency |
| CAT-LLM: Prompting Large Language Models with Text Style Definition for Chinese Article-style Transfer | ArXiv 2024 (arXiv preprint) | 2024-01 | arXiv | - | - | | controllability, consistency |
| Learning to Generate Text in Arbitrary Writing Styles | ArXiv 2023 (arXiv preprint) | 2023-12 | arXiv | - | - | | controllability, consistency |
| RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment | ICLR 2024 (International Conference on Learning Representations) | 2023-07 | arXiv | - | - | | controllability, consistency |

Refinement / Self-Critique / Iterative Editing

| Title | Venue | Date | Paper | Project | Code | Citations | Tags |
|---|---|---|---|---|---|---|---|
| R2-Write: Reflection and Revision for Open-Ended Writing with Deep Reasoning | ArXiv 2026 (arXiv preprint) | 2026-04 | arXiv | - | - | - | refinement, revision |
| LLM Review: Enhancing Creative Writing via Blind Peer Review Feedback | ArXiv 2026 (arXiv preprint) | 2026-01 | arXiv | - | - | - | refinement, revision |
| All Stories Are One Story: Emotional Arc Guided Procedural Game Level Generation | ArXiv 2025 (arXiv preprint) | 2025-08 | arXiv | - | - | - | refinement, revision |
| SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models | ArXiv 2025 (arXiv preprint) | 2025-06 | arXiv | - | - | - | refinement, revision |
| Finding Flawed Fictions: Evaluating Complex Reasoning in Language Models via Plot Hole Detection | ArXiv 2025 (arXiv preprint) | 2025-04 | arXiv | - | - | | refinement, revision |
| MLD-EA: Check and Complete Narrative Coherence by Introducing Emotions and Actions | ArXiv 2024 (arXiv preprint) | 2024-12 | arXiv | - | - | | refinement, revision |
| Collective Critics for Creative Story Generation | EMNLP 2024 (Conference on Empirical Methods in Natural Language Processing) | 2024-10 | arXiv | - | - | | refinement, revision |
| SWAG: Storytelling With Action Guidance | EMNLP Findings 2024 (Findings of EMNLP) | 2024-02 | arXiv | - | - | | refinement, revision |
| GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence | EMNLP Findings 2023 (Findings of EMNLP) | 2023-10 | arXiv | - | - | | refinement, retrieval |
| EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation | ArXiv 2023 (arXiv preprint) | 2023-10 | arXiv | - | - | | refinement, revision |
| Branch-Solve-Merge Improves Large Language Model Evaluation and Generation | ArXiv 2023 (arXiv preprint) | 2023-10 | arXiv | - | - | - | refinement, revision |
| Re3: Generating Longer Stories With Recursive Reprompting and Revision | ArXiv 2022 (arXiv preprint) | 2022-10 | arXiv | - | - | - | refinement, revision |
| Model Criticism for Long-Form Text Generation | ArXiv 2022 (arXiv preprint) | 2022-10 | arXiv | - | - | - | refinement, revision |

Evaluation / Benchmarks / Metrics

| Title | Venue | Date | Paper | Project | Code | Citations | Tags |
|---|---|---|---|---|---|---|---|
| ATANT v1.1: Positioning Continuity Evaluation Against Memory, Long-Context, and Agentic-Memory Benchmarks | ArXiv 2026 (arXiv preprint) | 2026-04 | arXiv | - | - | - | benchmark, evaluation |
| Attention Flows: Tracing LLM Conceptual Engagement via Story Summaries | ArXiv 2026 (arXiv preprint) | 2026-04 | arXiv | - | - | - | benchmark, dataset |
| MCSC-Bench: Multimodal Context-to-Script Creation for Realistic Video Production | ArXiv 2026 (arXiv preprint) | 2026-04 | arXiv | - | - | - | benchmark, dataset |
| Spoiler Alert: Narrative Forecasting as a Metric for Tension in LLM Storytelling | ArXiv 2026 (arXiv preprint) | 2026-04 | arXiv | - | - | - | benchmark, evaluation |
| Lessons Without Borders? Evaluating Cultural Alignment of LLMs Using Multilingual Story Moral Generation | ArXiv 2026 (arXiv preprint) | 2026-04 | arXiv | - | - | - | benchmark, evaluation |
| Stories of Your Life as Others: A Round-Trip Evaluation of LLM-Generated Life Stories Conditioned on Rich Psychometric Profiles | ArXiv 2026 (arXiv preprint) | 2026-04 | arXiv | - | - | - | benchmark, evaluation |
| StoryScope: Investigating idiosyncrasies in AI fiction | ArXiv 2026 (arXiv preprint) | 2026-04 | arXiv | - | - | - | benchmark, evaluation |
| Humans vs Vision-Language Models: A Unified Measure of Narrative Coherence | ArXiv 2026 (arXiv preprint) | 2026-03 | arXiv | - | - | - | benchmark, evaluation |
| Creative Convergence or Imitation? Genre-Specific Homogeneity in LLM-Generated Chinese Literature | ArXiv 2026 (arXiv preprint) | 2026-03 | arXiv | - | - | - | benchmark, evaluation |
| Lost in Stories: Consistency Bugs in Long Story Generation by LLMs | ArXiv 2026 (arXiv preprint) | 2026-03 | arXiv | Project | Code | - | benchmark, evaluation |
| LLMs Exhibit Significantly Lower Uncertainty in Creative Writing Than Professional Writers | ArXiv 2026 (arXiv preprint) | 2026-02 | arXiv | - | - | - | benchmark, evaluation |
| Evaluation Framework for AI Creativity: A Case Study Based on Story Generation | ArXiv 2026 (arXiv preprint) | 2026-01 | arXiv | - | - | - | benchmark, evaluation |
| Evaluating LLM Story Generation through Large-scale Network Analysis of Social Structures | ArXiv 2025 (arXiv preprint) | 2025-10 | arXiv | - | - | - | benchmark, evaluation |
| EvolvR: Self-Evolving Pairwise Reasoning for Story Evaluation to Enhance Generation | ArXiv 2025 (arXiv preprint) | 2025-08 | arXiv | - | - | - | benchmark, evaluation |
| LitBench: A Benchmark and Dataset for Reliable Evaluation of Creative Writing | ArXiv 2025 (arXiv preprint) | 2025-07 | arXiv | - | - | - | benchmark, dataset |
| WritingBench: A Comprehensive Benchmark for Generative Writing | ArXiv 2025 (arXiv preprint) | 2025-03 | arXiv | - | - | - | benchmark, evaluation |
| CoKe: Customizable Fine-Grained Story Evaluation via Chain-of-Keyword Rationalization | ArXiv 2025 (arXiv preprint) | 2025-03 | arXiv | - | - | | benchmark, evaluation |
| LongEval: A Comprehensive Analysis of Long-Text Generation Through a Plan-based Paradigm | ArXiv 2025 (arXiv preprint) | 2025-02 | arXiv | - | Code | | benchmark, evaluation |
| Echoes in AI: Quantifying Lack of Plot Diversity in LLM Outputs | ArXiv 2025 (arXiv preprint) | 2025-01 | arXiv | - | - | | benchmark, evaluation |
| Evaluating Creative Short Story Generation in Humans and Large Language Models | ArXiv 2024 (arXiv preprint) | 2024-11 | arXiv | - | - | | benchmark, evaluation |
| Small Language Models can Outperform Humans in Short Creative Writing: A Study Comparing SLMs with Humans and LLMs | COLING 2025 (International Conference on Computational Linguistics) | 2024-09 | arXiv | - | - | | benchmark, evaluation |
| HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models | ArXiv 2024 (arXiv preprint) | 2024-09 | arXiv | - | Code | - | benchmark, evaluation |
| STORYSUMM: Evaluating Faithfulness in Story Summarization | EMNLP 2024 (Conference on Empirical Methods in Natural Language Processing) | 2024-07 | arXiv | - | - | | benchmark, evaluation |
| Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing? | ArXiv 2024 (arXiv preprint) | 2024-07 | arXiv | - | - | | benchmark, evaluation |
| Are Large Language Models Capable of Generating Human-Level Narratives? | EMNLP 2024 (Conference on Empirical Methods in Natural Language Processing) | 2024-07 | arXiv | - | - | | benchmark, evaluation |
| Do Language Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation | TACL 2024 (Transactions of the Association for Computational Linguistics) | 2024-05 | arXiv | - | - | | benchmark, evaluation |
| Reading Subtext: Evaluating Large Language Models on Short Story Summarization with Writers | TACL 2024 (Transactions of the Association for Computational Linguistics) | 2024-03 | arXiv | - | - | | benchmark, evaluation |
| Learning Personalized Alignment for Evaluating Open-ended Text Generation | EMNLP 2024 (Conference on Empirical Methods in Natural Language Processing) | 2023-10 | arXiv | - | - | | benchmark, evaluation |
| A Confederacy of Models: a Comprehensive Evaluation of LLMs on Creative Writing | EMNLP Findings 2023 (Findings of EMNLP) | 2023-10 | arXiv | - | - | | benchmark, evaluation |
| Art or Artifice? Large Language Models and the False Promise of Creativity | CHI 2024 (Conference on Human Factors in Computing Systems) | 2023-09 | arXiv | - | - | | benchmark, evaluation |
| HAUSER: Towards Holistic and Automatic Evaluation of Simile Generation | ACL 2023 (Annual Meeting of the Association for Computational Linguistics) | 2023-06 | arXiv | - | - | | benchmark, evaluation |
| Can Large Language Models Be an Alternative to Human Evaluations? | ACL 2023 (Annual Meeting of the Association for Computational Linguistics) | 2023-05 | arXiv | - | - | | benchmark, evaluation |
| DeltaScore: Evaluating Story Generation with Differentiating Perturbations | EMNLP Findings 2023 (Findings of EMNLP) | 2023-03 | arXiv | - | - | | benchmark, evaluation |
| StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning | EMNLP 2022 (Conference on Empirical Methods in Natural Language Processing) | 2022-10 | arXiv | - | Code | | benchmark, evaluation |
| Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation | COLING 2022 (International Conference on Computational Linguistics) | 2022-08 | arXiv | - | - | | benchmark, evaluation |

Datasets / Surveys / Resources

| Title | Venue | Date | Paper | Project | Code | Citations | Tags |
|---|---|---|---|---|---|---|---|
| Narrative Theory-Driven LLM Methods for Automatic Story Generation and Understanding: A Survey | ArXiv 2026 (arXiv preprint) | 2026-02 | arXiv | - | - | - | dataset, survey |
| MUSE: A Multi-agent Framework for Unconstrained Story Envisioning via Closed-Loop Cognitive Orchestration | ArXiv 2026 (arXiv preprint) | 2026-02 | arXiv | - | - | - | dataset, resource |
| StoryWriter: A Multi-Agent Framework for Long Story Generation | ArXiv 2025 (arXiv preprint) | 2025-06 | arXiv | - | - | - | dataset, resource |
| Reasoning-Enhanced Self-Training for Long-Form Personalized Text Generation | ArXiv 2025 (arXiv preprint) | 2025-01 | arXiv | Project | Code | - | dataset, resource |
| Multi-Agent Based Character Simulation for Story Writing | IN2Writing 2025 (IN2Writing Workshop) | 2025-01 | Published | - | - | - | dataset, resource |
| BookWorm: A Dataset for Character Description and Analysis | EMNLP Findings 2024 (Findings of EMNLP) | 2024-10 | arXiv | - | - | | dataset |
| What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation | ArXiv 2024 (arXiv preprint) | 2024-08 | arXiv | - | - | | dataset, survey |
| The GPT-WritingPrompts Dataset: A Comparative Analysis of Character Portrayal in Short Stories | EMNLP Workshop 2025 (EMNLP Workshop) | 2024-06 | arXiv | - | Code | | dataset |
| CollabStory: Multi-LLM Collaborative Story Generation and Authorship Analysis | NAACL Findings 2025 (Findings of NAACL) | 2024-06 | arXiv | - | Code | | dataset, resource |
| The Value, Benefits, and Concerns of Generative AI-Powered Assistance in Writing | CHI 2024 (Conference on Human Factors in Computing Systems) | 2024-03 | arXiv | - | - | | dataset, resource |
| Large Language Models Fall Short: Understanding Complex Relationships in Detective Narratives | ACL Findings 2024 (Findings of ACL) | 2024-02 | arXiv | - | - | | dataset, resource |
| CMDAG: A Chinese Metaphor Dataset with Annotated Grounds as CoT for Boosting Metaphor Generation | LREC-COLING 2024 (LREC-COLING) | 2024-02 | arXiv | - | Code | | dataset |
| Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models | ArXiv 2024 (arXiv preprint) | 2024-02 | arXiv | - | - | - | dataset, resource |
| Weaver: Foundation Models for Creative Writing | ArXiv 2024 (arXiv preprint) | 2024-01 | arXiv | - | - | | dataset, resource |
| Reflections & Resonance: Two-Agent Partnership for Advancing LLM-based Story Annotation | LREC-COLING 2024 (LREC-COLING) | 2024-01 | Published | - | - | | dataset, resource |
| CLAUSE-ATLAS: A Corpus of Narrative Information to Scale up Computational Literary Analysis | LREC-COLING 2024 (LREC-COLING) | 2024-01 | Published | - | - | | dataset, resource |
| STONYBOOK: A System and Resource for Large-Scale Analysis of Novels | ArXiv 2023 (arXiv preprint) | 2023-11 | arXiv | - | - | | dataset, resource |
| Are NLP Models Good at Tracing Thoughts: An Overview of Narrative Understanding | EMNLP Findings 2023 (Findings of EMNLP) | 2023-10 | arXiv | - | - | | dataset, resource |
| StoryWars: A Dataset and Instruction Tuning Baselines for Collaborative Story Understanding and Generation | ACL 2023 (Annual Meeting of the Association for Computational Linguistics) | 2023-05 | arXiv | - | - | | dataset |
| Open-world Story Generation with Structured Knowledge Enhancement: A Comprehensive Survey | Neurocomputing 2023 (Neurocomputing (Journal)) | 2022-12 | arXiv | - | - | - | dataset, survey |
| Co-Writing Screenplays and Theatre Scripts with Language Models: An Evaluation by Industry Professionals | ArXiv 2022 (arXiv preprint) | 2022-09 | arXiv | Project | Code | - | dataset, screenplay |
| A corpus for understanding and generating moral stories | NAACL 2022 (North American Chapter of ACL) | 2022-04 | arXiv | - | - | | dataset, resource |

Open-source Projects (No Paper Required)

| Title | Venue | Date | Paper | Project | Code | Citations | Tags |
|---|---|---|---|---|---|---|---|
| FireRed-OpenStoryline | GitHub 2026 (Open-source repository) | 2026-01 | - | Project | Code | - | tooling, open-source |
| ReasoningNCP (Official Repository) | GitHub 2025 (Open-source repository) | 2025-03 | arXiv | Project | Code | - | tooling, open-source |
| SEED-Story (Official Repository) | GitHub 2024 (Open-source repository) | 2024-07 | arXiv | Project | Code | - | tooling, open-source |
| IBSEN (Official Repository) | GitHub 2024 (Open-source repository) | 2024-07 | arXiv | Project | Code | - | tooling, open-source |
| RENarGen (Official Repository) | GitHub 2024 (Open-source repository) | 2024-04 | arXiv | Project | Code | - | tooling, open-source |
| fictionx-story-gen | GitHub 2024 (Open-source repository) | 2024-01 | - | Project | Code | - | tooling, open-source |
| SillyTavern | GitHub 2023 (Open-source repository) | 2023-01 | - | Project | Code | - | tooling, open-source |
| GOAT-Storytelling-Agent | GitHub 2023 (Open-source repository) | 2023-01 | - | Project | Code | - | tooling, open-source |
| Dramatron (Official Repository) | GitHub 2022 (Open-source repository) | 2022-09 | arXiv | Project | Code | - | tooling, screenplay |
| TavernAI | GitHub 2022 (Open-source repository) | 2022-01 | - | Project | Code | - | tooling, open-source |
- Check duplicate titles before adding new entries.
- Update README.md and README_zh.md together.
- Use YYYY-MM for Date.
- Keep Paper as one primary link (Published preferred, otherwise arXiv; use `-` if unavailable).
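
The checks in the list above can be partly automated. The following is a minimal sketch, not part of this repository: the function names `parse_rows` and `validate`, the `SAMPLE` data, and the assumption that every entry table starts with a `Title` header and has eight columns are all illustrative. It flags duplicate titles (case-insensitive), dates that are not in YYYY-MM form, and rows with the wrong number of cells.

```python
import re
from collections import Counter

# Assumed layout: Title, Venue, Date, Paper, Project, Code, Citations, Tags.
EXPECTED_COLUMNS = 8
DATE_RE = re.compile(r"\d{4}-(0[1-9]|1[0-2])")
SEPARATOR_RE = re.compile(r":?-+:?")


def parse_rows(markdown: str) -> list[list[str]]:
    """Return the cells of each data row in tables whose header starts with 'Title'."""
    rows = []
    in_entry_table = False
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            in_entry_table = False  # any non-table line ends the current table
            continue
        inner = line[1:-1] if line.endswith("|") else line[1:]
        cells = [cell.strip() for cell in inner.split("|")]
        if SEPARATOR_RE.fullmatch(cells[0]):
            continue  # the |---|---| separator row
        if cells[0] == "Title":
            in_entry_table = True  # header row of an entry table
            continue
        if in_entry_table:
            rows.append(cells)
    return rows


def validate(markdown: str) -> list[str]:
    """Collect human-readable problems found in the entry tables."""
    problems = []
    rows = parse_rows(markdown)
    for row in rows:
        title = row[0]
        if len(row) != EXPECTED_COLUMNS:
            problems.append(f"{title}: expected {EXPECTED_COLUMNS} cells, found {len(row)}")
        if len(row) > 2 and not DATE_RE.fullmatch(row[2]):
            problems.append(f"{title}: Date must be YYYY-MM, found {row[2]!r}")
    # Titles are compared case-insensitively so near-identical entries are caught.
    counts = Counter(row[0].casefold() for row in rows)
    for title, count in counts.items():
        if count > 1:
            problems.append(f"duplicate title ({count}x): {title}")
    return problems


# Hypothetical sample rows, used only to exercise the checks.
SAMPLE = """\
| Title | Venue | Date | Paper | Project | Code | Citations | Tags |
|---|---|---|---|---|---|---|---|
| Paper A | ArXiv 2025 (arXiv preprint) | 2025-03 | arXiv | - | - | - | planning |
| Paper B | ArXiv 2025 (arXiv preprint) | 2025-3 | arXiv | - | - | - | planning |
| paper a | ArXiv 2024 (arXiv preprint) | 2024-13 | arXiv | - | - | planning |
"""

if __name__ == "__main__":
    for problem in validate(SAMPLE):
        print(problem)
```

To check both files named in the notes, one could read each with `pathlib.Path("README.md").read_text(encoding="utf-8")` and pass the text to `validate`. The check does not verify links or the Paper-link preference rule, which still need manual review.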
If this repository helps your research or project, please cite:

    @misc{lijunjie2026awesomellmstorygeneration,
      title        = {Awesome LLM Story Generation},
      author       = {Lijunjie},
      year         = {2026},
      howpublished = {\url{https://github.com/lijunjie/awesome-llm-story-generation}},
      note         = {GitHub repository, accessed 2026-02-27}
    }

If you later change your GitHub account or repository name, update author and howpublished accordingly.