CharacterGPT: Generating consistent character assets

Nov 12, 2023

Early GPT use cases

GPTs were released by OpenAI on November 6th and are already taking the community by storm. From what we've seen in the first week, there are two early use cases that we will explore here over time.

1. Automate simple workflows where chat-based iteration is an advantage
2. Provide quick-and-dirty proofs of concept for more complex workflows and sub-workflows

Here we explore an example that spans both. What we're able to create is likely sufficient for my current needs (#1), but it also shows that there is huge potential for something much more powerful. One project I've been tinkering with is training a model like Stable Diffusion on brand guidelines. The objective would be to use brand colors, images and more to generate a plethora of assets of the kind typically needed across a website and social channels. But that's for another day.

Current challenges of ChatGPT with Dall-E

As I've built buildaifirst.com, one of the things I've struggled with is generating assets that are consistent. Early on, I decided on a character style I liked after some experimentation with Dall-E. However, there are multiple issues with generating assets in the traditional ChatGPT interface:

With each new image, you start a new chat where you need to provide all the context again.
If you continue in a previous chat, the agent is difficult to orient towards your new ask, and you get stuck between prompts.
If you find an asset that you really like and want one small iteration, a single wrong keyword can send the image in completely the wrong direction.

While I can't say yet that GPTs solve all of these, they certainly help reduce the impact of many of them.

5-Minute MVP

Going from 0 to 1 here is very straightforward. One important note: my outline is useful for generating repeatable characters, but it might not extend to other design needs (ads, text-heavy images, etc.). Watch the video for a more comprehensive walkthrough, but my main learnings in getting to a GPT with repeatable outputs are the following:

1. Specify your brand colors: While they won't be used everywhere, Dall-E seems good at integrating them subtly, for example in background colors and objects.
2. Describe your character in detail: You probably cannot overdo the details here. As I add more specifics on hair color, 3D style, clothing features and more, the outputs become more and more consistent.
3. Iterate with the preview mode: Run through iterations with an idea in mind, and as you become more specific in the chat window, add those details to the general prompt. Note: once you add details to that prompt, the preview window will refresh, so try to learn enough from each round that the update is worthwhile.
4. Optimize through self-training: As you collect more and more outputs you like, feed them back as inputs to the GPT. I have yet to see drastic improvements via this method, but the more context it has, the more consistent the outputs should be.
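
To make the steps above more concrete, here is a rough sketch of how the text for a GPT's instructions could be assembled in Python. Everything in it is a hypothetical placeholder (the `CharacterBrief` class, the `build_instructions` function, the hex colors and the character traits); it is not the actual prompt behind CharacterGPT, just one way to keep the brand colors, character details and learnings from preview iterations in one place.

```python
from dataclasses import dataclass, field


@dataclass
class CharacterBrief:
    """Hypothetical container for the details that go into the GPT instructions."""
    brand_colors: list[str]
    character_traits: dict[str, str]
    learned_details: list[str] = field(default_factory=list)


def build_instructions(brief: CharacterBrief) -> str:
    """Assemble one instruction text from brand colors and character details."""
    lines = ["You generate images of one recurring character."]
    lines.append(
        "Brand colors (use subtly, e.g. in backgrounds and objects): "
        + ", ".join(brief.brand_colors)
    )
    lines.append("Character description:")
    for trait, value in brief.character_traits.items():
        lines.append(f"- {trait}: {value}")
    # Details discovered while iterating in preview mode get folded back in.
    if brief.learned_details:
        lines.append("Details learned from earlier iterations:")
        lines.extend(f"- {detail}" for detail in brief.learned_details)
    return "\n".join(lines)


brief = CharacterBrief(
    brand_colors=["#1A73E8", "#F9AB00"],  # placeholder hex values, not real brand colors
    character_traits={
        "hair": "short, dark brown",
        "style": "3D render",
        "clothing": "hooded jacket",
    },
)
brief.learned_details.append("keeps a friendly, relaxed expression")

instructions = build_instructions(brief)
print(instructions)
```

The printed text is what you would paste into the GPT's instructions. Batching several learnings into `learned_details` before updating fits the note above about the preview window refreshing on every change.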

With time, I'm sure we can split this GPT into several GPTs that each perform better in a specific domain. Right now I'm thinking about character diversity (gender, ethnicity) and maybe surrounding scenes. The more specific we can make the input description, the more closely the output will match our dream state.

Taking it to the next level

This model will serve its purpose for me right now, but I'm very interested in going much deeper on #4 above with Stable Diffusion. Having experimented with tools like tryleap.ai, it's clear that the more assets you can feed into these models, the better they get. Using this GPT, I can generate tens to hundreds of on-brand assets and feed those into my custom model, which will leave less room for ambiguity. Then we can really bring the content on the site up a level, even adding voices and movement to our heroes.
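
As a sketch of how one might plan a batch of tens to hundreds of assets, the snippet below combines a few lists into a manifest of prompts to run through the GPT. The scenes, moods, framings and the `build_prompt_manifest` helper are all made-up examples, and the snippet only builds prompt text; it does not call any image model.

```python
import itertools

# Placeholder reference to the character; the real description lives in the GPT instructions.
BASE_CHARACTER = "the brand character"

scenes = ["at a desk with a laptop", "presenting at a whiteboard", "walking through a city"]
moods = ["focused", "cheerful"]
framings = ["full body", "portrait"]


def build_prompt_manifest(scenes, moods, framings):
    """Return one prompt entry per (scene, mood, framing) combination."""
    manifest = []
    combos = itertools.product(scenes, moods, framings)
    for index, (scene, mood, framing) in enumerate(combos):
        manifest.append(
            {
                "id": f"asset_{index:03d}",
                "prompt": f"{framing} of {BASE_CHARACTER}, {mood}, {scene}",
            }
        )
    return manifest


manifest = build_prompt_manifest(scenes, moods, framings)
print(len(manifest), "prompts planned")
print(manifest[0]["prompt"])
```

Adding one more list (for example, outfits) multiplies the count again, so a handful of short lists is enough to reach hundreds of prompts. Keeping an `id` with each prompt makes it easier to pair the saved image with its description later when assembling a training set.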

As you build out your version, be sure to leave your learnings in the #how-I-built-this channel.
