Portions of this article were written and edited by AI.
Artificial Intelligence (AI) has made significant advancements in recent years, and one astonishing development is the ability to generate images from text descriptions. But AI image generators don’t simply copy from a database – they create images from scratch by learning patterns from training data. This article will clarify how text-to-image models work their magic.
Training Text-to-Image Generative AI Models
The first step in developing a text-to-image model involves showing the AI a large number of image and text pairs during training. For instance, the caption “a red apple” would be paired with an image of a red apple. The more examples of each subject that are included in the training dataset, the better the model performs. Depending on the model’s complexity, this dataset can range from hundreds to millions of images. This extensive training allows the AI to learn the relationships between language and visuals.
One well-known architecture for image generation is the Generative Adversarial Network (GAN). (Many newer systems use diffusion models instead, but GANs illustrate the core idea well.) GANs consist of two key components:
Generator: Creates images based on text prompts.
Discriminator: Evaluates the generated images and provides feedback.
When the generator is first created, it knows nothing about translating text into accurate images. It begins by producing random noise images, which are unrecognizable garbled messes of static pixels.
Over many training cycles, the discriminator scores each generated image against real examples and their paired captions, and feeds that judgment back to the generator. Based on this feedback, the generator gradually adjusts how it translates text into image data, learning to map words to visual features such as shapes, colors, and textures. Eventually, the generator can render coherent, sensible images from text prompts alone.
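The adversarial loop described above can be sketched in miniature. The toy below is my own illustration, not any real model's code: it replaces images with single numbers, so the "generator" is just a linear map from noise and the "discriminator" is a logistic classifier with hand-derived gradients. The generator learns to turn noise into samples the discriminator can't distinguish from "real" data centered at 4.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Real" training data: samples from a normal distribution centered at 4.
def real_batch(n):
    return rng.normal(4.0, 0.5, size=n)

# Generator: a tiny linear map from noise z to a sample, g(z) = a*z + b.
a, b = 1.0, 0.0
# Discriminator: a logistic classifier, d(x) = sigmoid(w*x + c).
w, c = 0.1, 0.0
lr = 0.05

for step in range(2000):
    z = rng.normal(size=32)
    fake = a * z + b
    real = real_batch(32)

    # Discriminator update: push d(real) toward 1 and d(fake) toward 0
    # (gradients of the binary cross-entropy loss, computed by hand).
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w -= lr * (np.mean((d_real - 1.0) * real) + np.mean(d_fake * fake))
    c -= lr * (np.mean(d_real - 1.0) + np.mean(d_fake))

    # Generator update: adjust (a, b) so d(fake) moves toward 1,
    # i.e. so the fakes fool the discriminator (chain rule through d).
    d_fake = sigmoid(w * fake + c)
    a -= lr * np.mean((d_fake - 1.0) * w * z)
    b -= lr * np.mean((d_fake - 1.0) * w)

# The generator's output distribution should now be centered near 4.
fake_mean = float(np.mean(a * rng.normal(size=1000) + b))
print(f"mean of generated samples: {fake_mean:.2f}")
```

Real text-to-image GANs work the same way in spirit, but the generator and discriminator are deep neural networks, the "samples" are images, and the text prompt conditions both networks.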
Synthesizing New Images
When someone uses the model, they type in a text prompt and the generator gets to work. It starts by generating a random noise image. Then, guided by the patterns it learned from training data related to that text, the model tweaks the noise pixel by pixel until an image emerges that matches what it has learned the description should look like.
For example, the prompt “a towering ice castle” may cause the generator to create elements similar to what it learned from images of frozen landscapes, tall buildings, and royal palaces during training. The result: a brand new icy castle scene!
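This refine-from-noise idea can be illustrated with a deliberately simplified toy (my own stand-in, not how any production model computes its updates). Here the "expected" image for a prompt is a hardcoded bright square; in a real model that expectation is never stored explicitly — it is implicit in millions of learned weights conditioned on the prompt text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for what a trained model "expects" the prompt to look like:
# a bright square on a dark 8x8 background. (Hardcoded here purely for
# illustration; real models carry this implicitly in learned weights.)
target = np.zeros((8, 8))
target[2:6, 2:6] = 1.0

image = rng.normal(0.5, 0.3, size=(8, 8))  # step 1: pure random noise
start_error = float(np.abs(image - target).mean())

# Step 2: repeatedly nudge every pixel a small step toward the expected look.
for _ in range(50):
    image += 0.1 * (target - image)

end_error = float(np.abs(image - target).mean())
print(f"error before: {start_error:.3f}, after: {end_error:.3f}")
```

After a few dozen refinement steps, the random static has converged on the expected pattern — the same trajectory, vastly simplified, that a generator follows from noise to finished image.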
This ability to create sensible, unique images from text demonstrates how AI models can build on randomness and prior learning to create novel images. The possibilities are endless as these models continue to advance.
Welcome to my Midjourney beginner’s guide. If you already have a Discord account and a Midjourney subscription, you can skip this introduction. Midjourney plans to support image creation on its website soon, but for now you need a Discord account. Once you have Discord, go to Midjourney.com and click “Join the Beta” to join the Midjourney Discord server. Then go to www.midjourney.com/account to sign up for a subscription. That’s the quick version of how to get started.
Once you’re inside the Midjourney server, you’ll see a list of channels on the left. You can make images in a Newbie or General channel, but those get crowded, so I recommend setting up a private chat with the Midjourney Bot. To do this, open the member list by selecting the people icon in the top right, then right-click on “Midjourney Bot” and choose “Message.”
Now to the fun part. To start creating, type /imagine in a message to Discord. Hit space and the word “prompt” should appear, or you can select it from the suggested actions that appear as you type. Then type what you want to see. It’s good (although not always necessary) to be specific in your description, including the subject, adjectives, background, composition, style, and other important details. For example: “child’s drawing of a small boat with two sails, in the bay, on a bright blue day, wide shot”
Think About What Details Matter
Anything left unsaid may surprise you. Be as specific or vague as you want, but anything you leave out will be randomized. Being vague is a great way to get variety, but you may not get the specific details you want.
Midjourney is a powerful tool that can create stunning images, but it has its limits and won’t always include everything you write. One way to help it pay closer attention to your details is to add --style raw to the end of your prompt. This setting is meant to reduce Midjourney’s own aesthetic influence so you get the specific style you ask for; if you leave it out, Midjourney takes more creative control of the image style. Midjourney’s documentation covers this with examples and is a great resource.
Midjourney will create four attempts at your image in a grid, numbered 1–4 from left to right, top to bottom. Click an upscale button to get one of them as a single image. The variation buttons generate new images based on one of the four shown. I’ve used the variation buttons occasionally, but I’m frequently unimpressed with the results and find it better to start over from scratch when I want tweaks.
Settings
Here’s the complete list of Parameters, and we’ll cover the ones I think are the most useful. Check out Midjourney’s documentation if you want to explore them further.
Parameters
Aspect Ratios
Chaos
No
Quality
Repeat
Seeds
Stop
Style
Stylize
Tile
Version
Video
Weird
Aspect Ratios
To get a custom aspect ratio, type --ar followed by a space and the ratio. You can use virtually any ratio, but the most common are 3:2 (standard photography size) and 16:9 (standard video size). You can use 2:1 for a more panoramic image, or 3:4 and 4:5 for common painting-canvas ratios (e.g., if you want to hang your image as a canvas print on the wall). Of course, any of these ratios can be reversed to get a vertical image. The default, if you don’t include --ar, is a square (1:1).
A quick note: there must be no space between the two dashes of a parameter. This can be tricky on Apple mobile devices, which turn two dashes into a long dash by default. I don’t know of a better solution, so I type the dashes with a space between them, then go back and delete the space.
Chaos
I like the Chaos parameter for getting more varied compositions from my prompt. Values range from the default of 0 up to 100. Say I want to create four images of a sailboat, but I want them all to be very different in composition: --chaos 100 makes that possible. The vaguer your prompt, the more the images will differ. If you leave out a style description, you could get very different styles; if you include a style, something else is more likely to vary, like the background, framing, angle, or even the appearance of your subject. This isn’t a hard-and-fast rule, though; in one of my tests, an extreme chaos value of 100 resulted in only one of the four images looking like a child’s drawing.
Negative Prompting
Negative prompting is helpful when Midjourney keeps adding something you didn’t ask for and don’t want. I’ve used it as --no text when I want to make sure Midjourney doesn’t add its own text to an image, like a poster (it doesn’t always work). I’ve also done things like “a cute monster --no sharp teeth”. Of all the parameters in this article, this is the one I’ve needed to use the least.
Stylize
We’ve already mentioned --style, which lets you pick raw mode. Stylize, on the other hand, controls the amount of detail and, well, style. Values range from 0 to 1000, with a default of 100. A higher value generally produces more detail and a more polished look. If you’re going for a more candid, less staged photograph, use a lower value.
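If you find yourself combining several of these parameters, it helps to think of a prompt as a description followed by a list of --name value flags. The tiny Python helper below is my own hypothetical convenience (build_prompt is not part of Midjourney or Discord); it just assembles that string so the parameter syntax stays consistent.

```python
# Hypothetical helper (my own invention, not a Midjourney feature) that
# assembles a prompt string with parameters in the "--name value" form.
def build_prompt(description, ar=None, chaos=None, no=None,
                 style=None, stylize=None):
    parts = [description]
    if ar:
        parts.append(f"--ar {ar}")           # aspect ratio, e.g. "16:9"
    if chaos is not None:
        parts.append(f"--chaos {chaos}")     # composition variety, 0-100
    if no:
        parts.append(f"--no {no}")           # negative prompt
    if style:
        parts.append(f"--style {style}")     # e.g. "raw"
    if stylize is not None:
        parts.append(f"--stylize {stylize}") # detail/style amount, 0-1000
    return " ".join(parts)

prompt = build_prompt("child's drawing of a small boat with two sails",
                      ar="16:9", chaos=50, style="raw")
print(prompt)
# → child's drawing of a small boat with two sails --ar 16:9 --chaos 50 --style raw
```

Whatever tool you use to build the string, the finished prompt is what you paste after /imagine in Discord.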
Conclusion
That’s my Midjourney beginner’s guide. Remember, it can take practice and finesse to master prompt craft. If you don’t get what you want the first time, don’t assume it can’t be done. Experiment, read the documentation, and ask for help and feedback from other users on Discord or in other support groups online. Although it’s not covered in this article, the quality parameter may also be worth checking out, along with the “Advanced Prompts” and “Commands” sections of the documentation. Good luck and have fun!
Do you want to save time or don’t want to buy a Midjourney subscription, but need some stunning stock images? Drop me a line with your image request, and I’ll take care of the prompt craft. Get images through AI Depository for as low as $1!