AI Covers and Azure OpenAI in the Publish Flow
Photo: Unsplash
A book cover is the first thing a reader sees. For indie authors, that's often a problem: either you pay for a professional design, or you grab a generic image and hope it's good enough. Neither is satisfying.
We decided to build AI generation directly into the publish process. Not as an add-on you have to configure separately — but as an integrated step that's automatically offered when publishing.
The three-step publish wizard
Before I get into the AI covers, a bit of context: publishing on OutaStory runs through a three-step wizard.
Step 1 — Publish: The author enters title, slug, description, category, and monetization setting. Here she also decides whether ads should be enabled for the story.
Step 2 — Cover: The author chooses between three options: AI-generated cover, a flat color from a palette, or uploading her own image. The AI path is step 2a — it opens a sub-dialog.
Step 3 — Audio: The author picks a voice for the audio version and starts generation — or skips the step for now.
So the cover is its own dedicated step, not a downstream field in a long form. That turned out to be the right call during development, because cover generation is asynchronous and sometimes takes several attempts.
How AI cover generation works technically
The system uses Azure OpenAI with the gpt-image-1.5 model. Azure OpenAI is Microsoft's managed access to OpenAI's models, in this case the image generation model, with the benefits of Azure: privacy compliance, regional availability, and integration with the rest of the Azure infrastructure.
The flow looks like this:
- The author can optionally enter a description prompt, or leave it blank. If she leaves it blank, the system automatically generates a prompt from the story's title and description.
- The system sends a request to the coverprocessor Azure Service Bus queue.
- The OutaStory.ImageProcessor Azure Function reads the queue message, assembles the prompt, and calls gpt-image-1.5.
- Each round generates three candidates, not one. The author gets three options to choose from.
- If none of them appeal to her, she can start a new round. A maximum of three rounds per draft — so up to nine candidates.
After the third round, the author has to pick one of the candidates or switch to a flat color or her own upload.
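The round limit and the message that goes onto the queue can be sketched roughly like this. It is a minimal Python sketch, not OutaStory's actual code, which runs as an Azure Function. The class name CoverDraft, the method names, and the message fields are assumptions made for illustration; only the numbers (three candidates, three rounds) and the fallback to title and description come from the flow described above.

```python
import json
from dataclasses import dataclass
from typing import Optional

CANDIDATES_PER_ROUND = 3  # from the article: three candidates per round
MAX_ROUNDS = 3            # from the article: at most three rounds per draft


@dataclass
class CoverDraft:
    """Hypothetical per-draft state; all field names are illustrative."""
    story_id: str
    title: str
    description: str
    rounds_used: int = 0

    def can_start_round(self) -> bool:
        return self.rounds_used < MAX_ROUNDS

    def start_round(self, user_prompt: Optional[str] = None) -> str:
        """Return the JSON body that would be placed on the queue."""
        if not self.can_start_round():
            raise RuntimeError(
                "round limit reached: pick a candidate, a flat color, or an upload"
            )
        self.rounds_used += 1
        # A blank prompt falls back to the story's title and description.
        prompt = (user_prompt or "").strip() or f"{self.title}. {self.description}"
        return json.dumps({
            "storyId": self.story_id,            # assumed field name
            "round": self.rounds_used,           # assumed field name
            "candidates": CANDIDATES_PER_ROUND,  # assumed field name
            "prompt": prompt,
        })
```

Counting rounds on the draft rather than in the dialog means the limit would survive a page reload, which matters when generation is asynchronous. Whether OutaStory stores the counter this way is not stated here.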
Why three candidates per round?
That was an insight from early testing. Showing a single suggestion and asking "do you like this?" works worse than offering three options. With three candidates, the author can compare and choose — and if none of them fit, she gets a better sense of what she doesn't want, which makes the next round more targeted.
The three-times-three system also has an economic reason behind it: AI image generation costs money. Nine images per draft is a manageable expense. An unlimited number would not be.
How the prompt is built
The prompt for gpt-image-1.5 isn't raw free text. We assemble it from several components:
- A fixed system prefix that dictates the visual style (cover format, light/dark depending on genre, no text in the image)
- The story's genre as a stylistic signal
- The optional user prompt or the automatically generated description text
- An instruction not to include any readable text in the image — so the title can be overlaid separately
The last point cost us some time. Early versions sometimes rendered text into the images — often misspelled, always ugly. The system prefix that explicitly instructs "no text, no letters, no words" solved the problem.
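To make the assembly concrete, here is a small Python sketch of how such a prompt could be joined from its components. The function name, the prefix wording, and the set of "dark" genres are assumptions for illustration and not the production prompt; only the component order and the "no text, no letters, no words" instruction are taken from the description above.

```python
from typing import Optional

# Hypothetical prefix wording; the article only says what the prefix covers.
STYLE_PREFIX = "Book cover illustration, portrait format."
# Wording quoted in the article.
NO_TEXT_RULE = "No text, no letters, no words."

# Assumed genre-to-mood mapping, for illustration only.
DARK_GENRES = {"dark fantasy", "horror", "thriller"}


def build_cover_prompt(genre: str, title: str, description: str,
                       user_prompt: Optional[str] = None) -> str:
    """Join the prompt components in a fixed order."""
    if genre.lower() in DARK_GENRES:
        mood = "dark, moody palette"
    else:
        mood = "light, bright palette"
    # Prefer the author's prompt; otherwise derive one from the story data.
    subject = (user_prompt or "").strip() or f"{title}. {description}"
    subject = subject.rstrip(".") + "."
    parts = [
        STYLE_PREFIX,
        f"Genre: {genre}, {mood}.",
        subject,
        NO_TEXT_RULE,
    ]
    return " ".join(parts)
```

In this sketch the no-text rule always comes last, so a user prompt cannot push it out of position, and the title can still be overlaid on the finished image separately.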
Flat colors as an alternative
Not every author wants an AI-generated cover. That's why there's the flat-color option: a selection of color palettes matched to different genre moods. Dark purple for dark fantasy, orange-red for adventure, light blue for romantasy.
The flat color isn't a fallback, but a deliberate design statement: a minimalist cover with the title in large type can be just as memorable as an AI image — sometimes more so.
The upload field is the third option, for authors who already have a professional cover or want to create one themselves.
Feature flag: AiCoverGeneration
AI cover generation sits behind a feature flag: AiCoverGeneration. That lets us disable the feature on certain instances — for example when Azure quotas are exhausted, or when we're testing a new version of the prompt system.
Feature flags in OutaStory are configured per host via featureflags.json. The IFeatureFlags DI service reads the configuration and returns a simple bool. No complex flag infrastructure, no external dependency — just a JSON file that can be swapped out on deployment.
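As a rough illustration of how little machinery this needs, here is a Python stand-in for such a flag service. The real service is the IFeatureFlags DI service mentioned above; the class name JsonFeatureFlags, the method is_enabled, the helper cover_options, and the flat layout of the JSON file are all assumptions, since the actual file format is not shown here.

```python
import json
from pathlib import Path
from typing import List


class JsonFeatureFlags:
    """Illustrative stand-in for a JSON-backed flag service."""

    def __init__(self, path: Path):
        # Assumed layout: a flat object such as {"AiCoverGeneration": true}.
        self._flags = json.loads(path.read_text(encoding="utf-8"))

    def is_enabled(self, name: str) -> bool:
        # Missing flags and anything other than a JSON true count as disabled.
        return self._flags.get(name) is True


def cover_options(flags: JsonFeatureFlags) -> List[str]:
    """Cover choices offered in step 2, depending on the flag."""
    options = ["flat_color", "upload"]
    if flags.is_enabled("AiCoverGeneration"):
        options.insert(0, "ai")
    return options
```

With the flag off, the wizard in this sketch simply offers two options instead of three, so nothing else in the publish flow has to know that AI generation is unavailable.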
The audio system, next week's topic, also runs asynchronously via Service Bus and brings its own complexity.
What's next?
Next week: how every story automatically gets an audio version — with Azure Speech HD Voices, validated SSML, and chapter-by-chapter MP3 generation.
