Anime Engagement & Popularity Dynamics

Medallion Architecture • Feature Engineering • Statistical Analysis • BI Dashboards
Data Engineering • EDA • Correlation Analysis • Time Series • Power BI • Tableau

⛩️ Executive Summary

"A full end-to-end data analytics pipeline investigating whether anime quality drives popularity — built on Databricks Medallion Architecture with statistical validation, EDA, time series analysis and BI dashboards across Power BI and Tableau."

This project analyses 3,818 anime titles from MyAnimeList (2010–2025) to answer a core industry question:

Does more anime production lead to better content — and does quality actually drive popularity?

The central finding is a confirmed Anime Quality Paradox:

More production ≠ better quality
More popularity ≠ higher quality
Quality moderately drives popularity (Pearson r = 0.567, p < 0.001)
The industry self-corrected post-2021: the Modern Renaissance era produces the highest quality scores in the dataset

📂 Repository Structure

📁 notebooks/
   ├── 01_data_ingestion.ipynb            → Bronze layer — raw CSV ingestion
   ├── 02_data_cleaning.ipynb             → Silver layer — cleaning & filtering
   ├── 03_gold_feature_engineering.ipynb  → Gold layer — feature engineering
   ├── 04_EDA_Analysis.ipynb              → 18 research questions across EDA
   ├── 05_Correlation_Analysis.ipynb      → Pearson & Spearman correlation
   └── 06_Time_Series.ipynb               → YoY production & quality trends

📁 dashboards/
   ├── PowerBI_Dashboard.pbix             → 3-page interactive business dashboard
   └── Tableau_Story.twbx                 → 5-slide narrative story

📁 data/
   └── gold_analysis.csv                  → Exported gold layer dataset

📁 report/
   └── Anime_Engagement_Popularity_Dynamics.pdf

📁 Visualization/
   ├── EDA
   ├── Correlation Analysis
   └── Time Series Analysis

🌐 Background & Overview

The global anime industry has undergone dramatic transformation over the past 15 years.

Streaming platforms like Crunchyroll and Netflix accelerated internationalisation
Annual production doubled from ~140 titles in 2010 to a peak of ~280 by 2016–2017
By 2025 the global anime market reached approximately USD 36–38 billion, growing at 7–9% CAGR

This rapid expansion created a fundamental tension between production volume and content quality — and between critical quality and audience popularity. This project quantifies that tension using a full Medallion Architecture data pipeline on Databricks, covering ingestion, cleaning, feature engineering, statistical analysis and business intelligence visualisation.

Project Type: End-to-end data analytics portfolio project
Domain: Media & Entertainment Analytics
Period: 2010 – 2025
Platform: Databricks Community Edition

📊 Dataset Overview

Property	Value
Source	MyAnimeList (MAL)
Raw dataset	28,858 titles (full MAL catalogue)
After year filter (2010–2025)	3,818 titles
Scored anime	~3,018 (79%)
Unscored anime	~800 (21%)
Score range	2.5 – 9.3
Members range	34 – 4,192,911
Unique genres	21
Unique studios	500+

Missing Value Treatment

Column	Missing %	Treatment	Reason
demographics	56.87%	"Unknown"	Too high to impute reliably
title_english	30.35%	Fill with title	Logical fallback
themes	28.33%	"None"	Categorical — not numeric
score	21.25%	0	Unrated ≠ average score
scored_by	21.25%	0	Consistent with score = 0
studios	12.83%	"Unknown"	Categorical
genres	6.01%	"None"	Categorical
episodes	4.04%	Median (12)	Standard seasonal length
rank	2.44%	0	Numeric placeholder
rating	1.92%	"Unknown"	Categorical

Design Decision: Score is filled with 0 rather than the mean or median. Imputing an average would artificially inflate quality metrics for unrated anime, so all score-based analysis filters on score > 0 before calculation.
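The fill rules above can be sketched in plain Python. This is a minimal illustration, not the notebook code: the helper names `clean_row` and `mean_score` and the sample rows are hypothetical, and missing values are assumed to arrive as `None`.

```python
from statistics import mean

# Fill values copied from the Missing Value Treatment table.
FILL_VALUES = {
    "demographics": "Unknown",
    "themes": "None",
    "score": 0,
    "scored_by": 0,
    "studios": "Unknown",
    "genres": "None",
    "episodes": 12,  # median reported in the table
    "rating": "Unknown",
    "rank": 0,
}


def clean_row(row: dict) -> dict:
    """Return a copy of the row with missing (None) values filled."""
    cleaned = dict(row)
    for column, fill in FILL_VALUES.items():
        if cleaned.get(column) is None:
            cleaned[column] = fill
    # title_english falls back to the original title.
    if cleaned.get("title_english") is None:
        cleaned["title_english"] = cleaned.get("title")
    return cleaned


def mean_score(rows: list) -> float:
    """Average score over rated titles only (score > 0)."""
    return mean(r["score"] for r in rows if r["score"] > 0)


# Hypothetical rows, for illustration only.
rows = [
    clean_row({"title": "Title A", "title_english": None, "score": 8.0}),
    clean_row({"title": "Title B", "title_english": "B (EN)", "score": None}),
    clean_row({"title": "Title C", "title_english": None, "score": 6.0}),
]
avg = mean_score(rows)
```

Because the unrated row is excluded by the `score > 0` filter, the placeholder 0 never drags the average down.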

🔧 Technical Process

🏗️ Pipeline — Medallion Architecture

Raw CSV  ──►  🥉 Bronze Layer  ──►  🥈 Silver Layer  ──►  🥇 Gold Layer  ──►  📊 Analysis & BI
              Ingestion              Cleaning &              Feature              EDA · Correlation
                                     Filtering               Engineering          Time Series · Dashboards

🔧 Feature Engineering — Gold Layer

Feature	Formula	Purpose
`engagement_score`	`log(members) + log(favorites+1)`	Composite audience interest; log normalises right-skew
`score_z`	`(score − mean) / std`	Quality normalised vs full dataset; `score > 0` only
`popularity_momentum`	`members / (scored_by+1)`	High = viral casual · Low = dedicated fanbase
`retention_proxy`	`favorites / (members+1)`	Proportion of viewers who became loyal fans
`score_tier`	Score range buckets	Excellent ≥8.0 · Good ≥7.0 · Average ≥6.0 · Below Average >0
`airing_era`	Year-based labels	Digital Boom · Streaming Revolution · Global Expansion · Modern Renaissance

Assumptions & Caveats:

Log transformation applied throughout to normalise skewed distributions

Genre analysis restricted to genres with 50+ titles (small sample bias prevention)

Studio quality analysis restricted to studios with 10+ anime (same reason)

Popularity momentum set to 0 for unscored anime

score_z is null for unscored anime, rather than an extreme negative value
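A minimal sketch of the Gold-layer features and caveats above, in plain Python. Several details are assumptions rather than confirmed choices: natural log for `log`, sample standard deviation for `std`, the "Unscored" tier label for score = 0, and the helper names `score_tier` and `add_features`. The sample rows are hypothetical.

```python
import math
from statistics import mean, stdev


def score_tier(score: float) -> str:
    """Bucket a score using the thresholds in the feature table."""
    if score >= 8.0:
        return "Excellent"
    if score >= 7.0:
        return "Good"
    if score >= 6.0:
        return "Average"
    if score > 0:
        return "Below Average"
    return "Unscored"  # assumed label for score == 0


def add_features(rows: list) -> list:
    """Return new rows with the engineered features attached."""
    scored = [r["score"] for r in rows if r["score"] > 0]
    mu, sigma = mean(scored), stdev(scored)  # sample std is an assumption
    out = []
    for r in rows:
        is_scored = r["score"] > 0
        out.append({
            **r,
            # members is assumed to be >= 1 (the dataset minimum is 34)
            "engagement_score": math.log(r["members"]) + math.log(r["favorites"] + 1),
            # null (None) for unscored titles, not an extreme negative value
            "score_z": (r["score"] - mu) / sigma if is_scored else None,
            # set to 0 for unscored titles
            "popularity_momentum": r["members"] / (r["scored_by"] + 1) if is_scored else 0.0,
            "retention_proxy": r["favorites"] / (r["members"] + 1),
            "score_tier": score_tier(r["score"]),
        })
    return out


# Hypothetical rows, for illustration only.
rows = [
    {"title": "Title A", "score": 8.5, "members": 999, "scored_by": 499, "favorites": 100},
    {"title": "Title B", "score": 6.5, "members": 1999, "scored_by": 999, "favorites": 0},
    {"title": "Title C", "score": 0, "members": 99, "scored_by": 0, "favorites": 0},
]
features = add_features(rows)
```

The `+1` terms keep the logarithm and the ratios defined when `favorites` or `scored_by` is zero.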

📚 Analysis Notebooks

Notebook	Method	Output
`04_EDA_Analysis`	18 research questions — distributions, genres, studios, correlations	Charts, tier breakdown, hidden gems, overhyped titles
`05_Correlation_Analysis`	Pearson (score vs log members) + Spearman (score vs raw members); genre-level correlations for genres with 20+ observations	r = 0.567, genre correlation table
`06_Time_Series`	YoY production growth, 3-year rolling average, quality trend, genre growth 2010–2017 vs 2018–2025	Production cycle, quality recovery curve
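The year-over-year growth and 3-year rolling average used in `06_Time_Series` can be sketched as follows. This is an illustration under assumptions: the rolling window is taken as trailing (the current year plus the two before it), the years are assumed consecutive, and the function names and yearly counts are hypothetical rather than taken from the dataset.

```python
def yoy_growth(counts: dict) -> dict:
    """Percentage change in title count versus the previous year."""
    years = sorted(counts)
    return {
        year: (counts[year] - counts[prev]) / counts[prev] * 100
        for prev, year in zip(years, years[1:])
    }


def rolling_mean(values: dict, window: int = 3) -> dict:
    """Trailing rolling mean; years without a full window are omitted."""
    years = sorted(values)
    result = {}
    for i in range(window - 1, len(years)):
        span = years[i - window + 1 : i + 1]
        result[years[i]] = sum(values[y] for y in span) / window
    return result


# Hypothetical yearly title counts, for illustration only.
titles_per_year = {2010: 100, 2011: 120, 2012: 150, 2013: 180}
growth = yoy_growth(titles_per_year)
smoothed = rolling_mean(titles_per_year)
```

The rolling mean smooths single-year spikes, which makes a multi-year production cycle easier to see than the raw YoY series.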

📈 Key Results

🎌 Core Statistical Finding

Metric	Value
Pearson r (Score vs Log Members)	0.567
Spearman ρ (Score vs Members)	0.568
p-value	< 0.001
Interpretation	Moderate positive — quality influences but does not solely determine popularity
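For readers who want to see how the two coefficients relate, here is a small stdlib-only sketch. It is not the notebook code: the function names and the five data points are hypothetical, and no p-value is computed. Pearson is applied to score vs log members, Spearman to score vs raw members.

```python
import math


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values: list) -> list:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: list, y: list) -> float:
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data points, for illustration only.
scores = [6.0, 6.5, 7.0, 7.5, 8.5]
members = [1_000, 800, 50_000, 20_000, 900_000]
log_members = [math.log(m) for m in members]

r_pearson = pearson(scores, log_members)
rho_spearman = spearman(scores, members)
```

Spearman depends only on rank order, so it gives the same value for raw and log-transformed members. That is why the raw-members Spearman figure can be compared directly with the log-members Pearson figure.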

🏆 Quality at a Glance

Metric	Value
Total anime analysed	3,818
Excellent tier (score ≥ 8.0)	Only 8%
Best quality year	2024 — avg score 7.06
Worst quality year	2017 — avg score 6.61
Quality swing (2017 → 2024)	+0.45 points
Top genre by score (50+ titles)	Drama — 7.28
Top studio by score (10+ titles)	Ufotable — 7.98
Audience drop-off	677M members → 347M rated → 7M favorites
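The audience drop-off row can be turned into stage-to-stage conversion rates with a few lines of arithmetic. The sketch below uses the rounded totals from the table, so the rates are approximate; the function name and stage labels are hypothetical.

```python
# Rounded totals from the "Audience drop-off" row.
funnel = [
    ("members", 677_000_000),
    ("rated", 347_000_000),
    ("favorites", 7_000_000),
]


def stage_conversion(stages: list) -> dict:
    """Share of each stage that carries through to the next one."""
    return {
        f"{name_a} -> {name_b}": count_b / count_a
        for (name_a, count_a), (name_b, count_b) in zip(stages, stages[1:])
    }


rates = stage_conversion(funnel)
# End-to-end share of members who marked a title as a favorite.
overall = funnel[-1][1] / funnel[0][1]
```

With these inputs, roughly half of all member additions end in a rating, while about 1% end in a favorite.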

🌸 Era Quality Analysis

Era	Period	Avg Score Z	Verdict
Digital Boom Era	2010–2012	+0.140	✅ Above average
Streaming Revolution	2013–2016	−0.115	❌ Below average
Global Expansion Era	2017–2020	−0.133	❌ Worst era
Modern Renaissance	2021–2025	+0.143	🏆 Best era
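A sketch of how `airing_era` labels and the per-era average z-score could be derived. The year boundaries come from the table above; the function names and sample rows are hypothetical, and unscored titles (null `score_z`) are assumed to be skipped.

```python
from collections import defaultdict

# Era boundaries (inclusive) as listed in the Era Quality Analysis table.
ERAS = [
    (2010, 2012, "Digital Boom Era"),
    (2013, 2016, "Streaming Revolution"),
    (2017, 2020, "Global Expansion Era"),
    (2021, 2025, "Modern Renaissance"),
]


def airing_era(year: int):
    """Return the era label for a year, or None if outside 2010-2025."""
    for start, end, label in ERAS:
        if start <= year <= end:
            return label
    return None


def mean_z_by_era(rows: list) -> dict:
    """Average score_z per era, skipping unscored titles (score_z is None)."""
    buckets = defaultdict(list)
    for year, score_z in rows:
        era = airing_era(year)
        if era is not None and score_z is not None:
            buckets[era].append(score_z)
    return {era: sum(zs) / len(zs) for era, zs in buckets.items()}


# Hypothetical (year, score_z) pairs, for illustration only.
sample = [(2010, 0.2), (2011, 0.0), (2018, -0.3), (2018, None), (2022, 0.1)]
era_means = mean_z_by_era(sample)
```

A positive era mean indicates that titles from that era score above the dataset-wide average, which is how the verdict column reads.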

🎭 Genre-Level Correlation

Genre	Pearson r	Relationship
Sports	0.633	Quality strongly predicts popularity
Supernatural	0.628	Quality strongly predicts popularity
Sci-Fi	0.603	Quality strongly predicts popularity
Drama	0.556	Moderate relationship
Action	0.551	Moderate relationship
Romance	0.441	Weaker (lowest of the six); audience preference dominates

🎛️ Dashboards

Power BI — 3-Page Interactive Dashboard

Page	Title	Content
1	Project Overview	5 key findings · 5 analytical dimensions
2	Overview	KPIs · top genres · score tier donut · year / season / source slicers
3	Analysis	Production trend · era diverging bar · engagement funnel · scatter plot

Tableau — 5-Slide Narrative Story · "From Boom to Renaissance: The Evolution of Anime Quality"

Slide	Title	Chart Type
1	Did Growth Kill Quality?	Dual-axis bar + line
2	The Streaming Effect	Diverging bar by era
3	Quality Has a Genre	Horizontal bar with colour gradient
4	Popular ≠ Good	Scatter with r = 0.57 trend line annotation
5	Hidden Gems vs Broken Promises	Scatter with category colours

🔍 Insights & Deep Dive

✔ Insight 1 — Excellence is Rare but Recovering

Only 8% of the 3,818 anime reach the Excellent tier (8.0+). The largest tier is Average at 29.4%, followed by Good at 28.3%, Unscored at 21% and Below Average at 13.2%. Despite this rarity, the Modern Renaissance era (2021–2025) has an average z-score of +0.143, the highest of all four eras. The worst year was 2017, with an average score of 6.61; by 2024 the average had recovered to 7.06. Less output, better results.

✔ Insight 2 — Volume Dilutes Genre Quality

Comedy has 1,478 titles and scores ~6.85. Drama has 411 titles and scores 7.28. The pattern is consistent across all genres: more titles = lower average score. Suspense and Drama also build the most loyal fanbases by retention proxy. The same genres that score highest retain fans most deeply. Action viewers watch casually. Drama viewers become fans.

✔ Insight 3 — Franchise Loyalty Overrides Quality

Yakusoku no Neverland S2 dropped from 8.7 → 5.25 (−3.45 points, largest in dataset) yet retained 973,450 members. Tokyo Ghoul:re scores 6.37 but holds 1.26M members. Two distinct overhype drivers: sequel disappointments and fanservice titles. Franchise equity sustains audiences long after quality has deteriorated.

✔ Insight 4 — Streaming First Hurt Then Helped Quality

Production peaked at ~280 titles in 2016–2017, and the average score hit its low of 6.61 in 2017. The inverse relationship between volume and quality is visible in the yearly averages. After 2021, platforms shifted to prestige content investment. The Modern Renaissance recovery to an average z-score of +0.143 confirms the correction is real and measurable in the data.

✔ Insight 5 — Best Anime Rarely Reach Mainstream

Hidden gems (score_z > 1, scored_by ≥ 1,000, members ≥ 5,000, popularity rank > 5,000):

Title	Score	Members	Score Z
IDOLiSH7 Third Beat! Part 2	8.34	17,101	+1.68
Chiikawa	8.26	10,352	+1.59
Pui Pui Molcar	8.01	18,750	+1.30
Love Live! Superstar!! 3rd	7.89	20,223	+1.16

All four are niche titles, either franchise continuations or short-form series, with passionate small fanbases. Outstanding quality exists throughout the catalogue, but visibility is determined by format, franchise and genre reach, not by score alone.
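The hidden-gem criteria listed under this insight can be expressed as a simple filter. The sketch below is illustrative only: the function name, the field names and the sample catalogue are hypothetical, and `popularity` is assumed to be the MAL popularity rank, where a larger number means a less popular title.

```python
def is_hidden_gem(row: dict) -> bool:
    """Apply the four hidden-gem thresholds to one title."""
    score_z = row["score_z"]
    return (
        score_z is not None          # unscored titles have a null z-score
        and score_z > 1
        and row["scored_by"] >= 1_000
        and row["members"] >= 5_000
        and row["popularity"] > 5_000  # rank: higher number = less popular
    )


# Hypothetical catalogue, for illustration only.
catalogue = [
    {"title": "Title A", "score_z": 1.6, "scored_by": 4_000, "members": 12_000, "popularity": 6_500},
    {"title": "Title B", "score_z": 1.8, "scored_by": 900_000, "members": 2_000_000, "popularity": 40},
    {"title": "Title C", "score_z": 1.2, "scored_by": 300, "members": 6_000, "popularity": 9_000},
    {"title": "Title D", "score_z": None, "scored_by": 0, "members": 7_000, "popularity": 8_000},
]
hidden_gems = [row["title"] for row in catalogue if is_hidden_gem(row)]
```

The `scored_by` and `members` floors guard against titles whose high score rests on a handful of ratings, while the popularity-rank condition excludes titles that are already mainstream.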

🏁 Recommendations

🎬 For Studios

Prioritise quality over volume — selective production consistently outperforms high-volume output on score, retention and brand value
Invest in Drama, Suspense and Mystery — highest average scores and strongest fan loyalty in the dataset
Protect sequel quality — the Neverland collapse (−3.45 pts) is the clearest data-backed example of franchise equity destruction

📺 For Streaming Platforms

Volume-based acquisition does not improve catalogue quality — the 2013–2020 period is direct evidence
Surface hidden gems through recommendation systems: titles with score_z > 1 and popularity rank > 5,000 represent an underserved audience with strong loyalty signals
Prestige content investment works — it produced the measurable quality recovery seen in the Modern Renaissance

📊 For Industry Analysts

Track annual production volume as a leading quality indicator — the inverse cycle is observable and predictable
Score trajectory across sequential seasons is more reliable than raw member retention for franchise health assessment

⚠️ Limitations

#	Limitation
1	Data sourced from MyAnimeList only — may skew toward active online communities over casual global viewers
2	Multi-valued genre entries require Python-side exploding — not natively supported in all BI tools
3	Ratings reflect only users who submitted scores — passive viewers may hold different quality opinions
4	2024–2025 data is incomplete: recently aired titles had not accumulated sufficient ratings at collection time
5	Member and favorites counts are cumulative and do not reflect when engagement occurred relative to air date
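As an illustration of limitation 2, multi-valued genre entries can be exploded in Python before aggregation, with the 50+ title threshold applied afterwards. This sketch rests on assumptions: genres are stored as a comma-separated string, "None" marks a missing genre (as in the missing value table), the threshold counts scored titles, and the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def explode_genres(rows: list, sep: str = ","):
    """Yield one (genre, score) pair per genre listed on each title."""
    for row in rows:
        for genre in row["genres"].split(sep):
            genre = genre.strip()
            if genre and genre != "None":  # skip the missing-genre placeholder
                yield genre, row["score"]


def genre_mean_scores(rows: list, min_titles: int = 50) -> dict:
    """Mean score per genre, keeping genres with at least min_titles scored titles."""
    scores = defaultdict(list)
    for genre, score in explode_genres(rows):
        if score > 0:  # unscored titles are stored as 0
            scores[genre].append(score)
    return {
        genre: sum(values) / len(values)
        for genre, values in scores.items()
        if len(values) >= min_titles
    }


# Hypothetical rows, for illustration only.
rows = [
    {"genres": "Drama, Action", "score": 8.0},
    {"genres": "Drama", "score": 7.0},
    {"genres": "Comedy", "score": 6.0},
    {"genres": "None", "score": 0},
]
# A threshold of 2 is used here only because the sample is tiny.
genre_scores = genre_mean_scores(rows, min_titles=2)
```

Exporting the exploded table alongside the Gold dataset is one way to give BI tools a genre-level view without needing native multi-value support.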

🚀 Tech Stack

🙌 Author

Karthick S

🏷️ Tags

data-analytics anime databricks pyspark medallion-architecture feature-engineering eda correlation-analysis time-series power-bi tableau media-analytics portfolio-project

✨ Built with data, curiosity, and an unhealthy amount of anime knowledge. ✨
🎌 Exploring the dynamics of anime popularity — one dataset at a time.
