Thank you for using Animagine XL 3.0 and becoming a pioneer of anime-themed open-source text-to-image models with us. We are truly glad that the model received overwhelmingly positive feedback from users on different platforms. We cannot thank all of you enough for the feedback, support, and excitement for what’s to come next.
To celebrate the success of Animagine XL 3.0, today, we are happy to introduce you to Animagine XL 3.1, the next iteration of our opinionated open-source anime text-to-image model and direct continuation of the Animagine XL V3 series. With enhanced knowledge, all-new configuration to address overexposure, and powerful new aesthetic tags, Animagine XL 3.1 represents a major leap forward in open anime image generation.
In this iteration, we aim to improve the model’s capabilities and fix issues that surfaced in the previous release. We break down the details in this blog post.
Animagine XL 3.1 is a direct continuation of Animagine XL 3.0. We received many suggestions and much feedback on making the model suitable for everyone, so we adopted incremental learning, which lets us update the model almost every month.
Starting from the base version of Animagine XL 3.0, we trained on 2x A100 80GB GPUs at Runpod. The model was trained on a data-rich collection of 870k ordered and tagged images for 15 days in the second half of February, amounting to over 350 GPU hours. In this iteration, we focused on doing 3 things: enhancing the model’s anime knowledge, reworking the training configuration to address overexposure, and introducing the new aesthetic tags.
We are truly grateful to SeaArt for funding our model training via Runpod credits. Thank you for supporting the open-source community; we truly appreciate it.
Inspired by NovelAI’s approach in their anime text-to-image model released last year, NovelAI Diffusion V3, we continue to build our datasets with tag ordering, meaning that prompt order is crucial for getting the results you want.
For optimal results, it’s recommended to follow this structured prompt template:
```
1boy/1girl, what character, from which series, everything else in random order
```
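For instance, a prompt following this template might look like the one below; the character and content tags are purely illustrative:

```
1girl, souryuu asuka langley, neon genesis evangelion, plugsuit, standing, looking at viewer, smile
```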
In Animagine XL 3.0, we primarily focused on adding characters from popular gacha games. In this iteration, we integrate numerous well-known anime franchises into the dataset, and this expansion of the training data significantly enhances Animagine XL 3.1’s knowledge base.
The model now understands a vast range of anime more deeply, from the legendary Neon Genesis Evangelion to the newly aired Kusuriya no Hitorigoto, and it spans art styles from the oldest to the most modern. This development doesn’t cover everything, but it significantly broadens Animagine XL’s ability to generate and recognize characters, themes, and styles from a wide spectrum of anime history, catering to fans of various genres.
Animagine XL 3.1 utilizes a refined set of special tags to guide the model towards generating images with specific qualities, ratings, creation dates, and aesthetics. While not strictly necessary, these tags are powerful tools for achieving your desired results.
A major addition in Animagine XL 3.1 is the set of aesthetic tags, which categorize content based on visual appeal. These tags are derived from a specialized Vision Transformer (ViT) model, shadowlilac/aesthetic-shadow-v2. Combined with the quality tags, they can guide the model toward better results. Below is the list of aesthetic tags, sorted from best to worst:

- `very aesthetic`
- `aesthetic`
- `displeasing`
- `very displeasing`
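For readers curious how such labels can be produced, here is a minimal sketch of scoring an image with an aesthetic classifier. It assumes shadowlilac/aesthetic-shadow-v2 loads as a standard transformers image-classification pipeline; the label names and bucket thresholds are illustrative assumptions, not our exact labeling pipeline:

```python
# Hedged sketch: bucketing images into aesthetic tags with a ViT classifier.
# The label names ("hq"/"lq") and thresholds below are illustrative guesses.
from transformers import pipeline
from PIL import Image

classifier = pipeline(
    "image-classification",
    model="shadowlilac/aesthetic-shadow-v2",
)

def aesthetic_tag(path: str) -> str:
    scores = {r["label"]: r["score"] for r in classifier(Image.open(path))}
    hq = scores.get("hq", 0.0)  # assumed probability of "high quality"
    # Illustrative bucketing into the four aesthetic tags.
    if hq > 0.85:
        return "very aesthetic"
    if hq > 0.50:
        return "aesthetic"
    if hq > 0.25:
        return "displeasing"
    return "very displeasing"

print(aesthetic_tag("sample.png"))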
The quality tags in Animagine XL 3.1 have been updated to consider both scores and post ratings, ensuring a more balanced distribution of quality in the generated images. We’ve also made the labels clearer, such as changing ‘high quality’ to ‘great quality’. Below is the list of quality tags, sorted from best to worst:

- `masterpiece`
- `best quality`
- `great quality`
- `good quality`
- `normal quality`
- `low quality`
- `worst quality`
The year range tags have been redefined to more accurately represent specific modern and vintage anime art styles. This simplified range focuses on distinct eras relevant to current and past anime aesthetics. Below is the list of year tags, sorted from newest to oldest:

- `newest`
- `recent`
- `mid`
- `early`
- `oldest`
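Putting the special tags together, a full prompt might look like this (an illustrative example, not an official recommendation):

```
1girl, souryuu asuka langley, neon genesis evangelion, solo, upper body, smile, outdoors, masterpiece, best quality, very aesthetic, newest
```

The low-end tags can be combined similarly in the negative prompt, e.g. `worst quality, low quality, very displeasing, oldest`.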
In Animagine XL 3.0, we encountered several problems, such as an unbalanced quality tag distribution and gradients failing to sync across GPUs due to DDP issues in multi-GPU training. These factors made the results overly sensitive and explicit, even with safe prompts, and also produced more artifacts and poor anatomy.
To address these issues in Animagine XL 3.1, we implemented the following changes:

- Rebalanced the quality tag distribution by considering both scores and post ratings, as described above.
- Fixed the multi-GPU setup so gradients sync correctly under DDP (see the sketch below).
- Reworked the training configuration to address the overexposure.
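As a minimal sketch of the DDP point (our actual trainer is more involved, and the tiny linear layer below stands in for the diffusion UNet), the key detail is that wrapping the model in DistributedDataParallel is what triggers the gradient all-reduce on backward():

```python
# Minimal sketch of a correct multi-GPU DDP setup.
# Launch with: torchrun --nproc_per_node=2 ddp_check.py
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(16, 16).cuda(local_rank)
    # Wrapping in DDP inserts the gradient all-reduce on backward();
    # skipping this wrap is a classic cause of out-of-sync gradients.
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 16, device=local_rank)  # each rank sees different data
    loss = model(x).pow(2).mean()
    loss.backward()  # gradients are averaged across all ranks here

    # Sanity check: after backward(), every rank must hold identical gradients.
    ref = model.module.weight.grad.clone()
    dist.broadcast(ref, src=0)
    assert torch.allclose(ref, model.module.weight.grad), "gradients out of sync"

    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```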
There are several ways to get started with this model: you can try it directly in the browser through the demo on Hugging Face Spaces, generate with it on partner platforms such as SeaArt, or download the weights from Hugging Face and run them locally with the diffusers library, as sketched below.
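Here is a minimal local-inference sketch using diffusers, assuming the weights are hosted on Hugging Face under cagliostrolab/animagine-xl-3.1; the prompt, resolution, and sampler settings are illustrative, not official recommendations:

```python
# Hedged sketch: generating an image locally with diffusers.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-3.1",  # assumed Hugging Face repository id
    torch_dtype=torch.float16,
    use_safetensors=True,
)
pipe.to("cuda")

# Prompt follows the structured template, with special tags appended.
prompt = (
    "1girl, souryuu asuka langley, neon genesis evangelion, solo, "
    "looking at viewer, smile, masterpiece, best quality, very aesthetic, newest"
)
negative_prompt = "lowres, bad anatomy, worst quality, low quality, very displeasing"

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=832,
    height=1216,          # a portrait resolution commonly used with SDXL models
    guidance_scale=7.0,
    num_inference_steps=28,
).images[0]
image.save("output.png")
```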
Like Animagine XL 3.0 before it, Animagine XL 3.1 falls under the Fair AI Public License 1.0-SD, which is compatible with the Stable Diffusion models’ license. Key points:

1. Modification Sharing: If you modify the model, you must share both your changes and the original license.
2. Source Code Accessibility: If your modified version is accessible over a network, you must provide a way (such as a download link) for others to obtain the source code. This applies to derived models as well.
3. Distribution Terms: Any distribution must be under this license or another with similar rules.
4. Compliance: Non-compliance must be fixed within 30 days to avoid license termination.
We chose this license to keep Animagine XL 3.1 and its derivatives open and modifiable, in line with the spirit of the open-source community. It protects both contributors and users, encouraging a collaborative, ethical open-source ecosystem. This ensures the model not only benefits from communal input but also respects the freedoms of open-source development.