What is Generative AI?
Generative AI is a class of AI technology that can produce high-quality text, images, and video. It differs from other forms of AI in that generative models are trained to create new data rather than to predict an outcome from given data.
Where can I play with it?
https://chatgpt.com/ is the most widely known example of Gen AI: a user can chat with an AI model and receive high-quality text responses. The model draws on knowledge from a vast range of sources around the world. It can do basic and some advanced math, write and run programs, and even compose poetry and stories. Gen AI models like DALL-E and Sora can also generate high-quality images and videos, respectively.
https://www.midjourney.com/imagine is another example of Generative AI, where you can enter a text prompt and generate high-quality images. Giving the same prompt to two different models produces quite different results, so it's important to understand the nuances and take the time to experiment with a few models before committing to one for your project.
How does it work?
Generative AI text models work by splitting the user prompt into tokens and mapping each token to a numeric vector (an embedding). These vectors are fed into a Large Language Model (LLM), which repeatedly predicts the next token; the predicted tokens are then converted back to text and returned to the user. LLMs are trained on a very large corpus of text, and the models themselves are very large, often containing billions of parameters. During training, the model learns statistical patterns across large chunks of text and learns to predict which token is most likely to come next.
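The next-token idea above can be illustrated with a drastically simplified sketch: a bigram counter that, given a word, returns the word most often seen after it. The corpus here is a made-up example, and real LLMs use subword tokens, embeddings, and transformer networks rather than a lookup table - but the prediction objective is the same.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; real models train on billions of tokens.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which token follows which (a bigram model -- a stand-in for
# the statistical patterns an LLM learns at scale).
follows = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    follows[cur][nxt] += 1

def predict_next(token):
    """Return the token most often observed after `token` in training."""
    return follows[token].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often in this corpus
```

An LLM replaces the count table with a neural network that assigns a probability to every possible next token, then samples from that distribution.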
Can I build one?
Yes - and no. Building an enterprise-level model requires a detailed understanding of various ML technologies, access to a large amount of text and images/videos, and substantial compute. These requirements make it virtually impossible for an individual to create a high-quality Generative AI model from scratch. However, one can build a toy Generative AI model to get familiar with some of the technologies involved.
Here are some essential components one will need to build a text-to-image model: a dataset of image-caption pairs for training, a text encoder that turns captions into vectors, a generative image model (such as a GAN or a diffusion model), and GPU compute for training.
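Before tackling images, the "toy model" idea can be tried on text in a few lines: a word-level Markov chain that samples each next word from the words observed to follow it. The corpus is a made-up example; real generative models replace this transition table with a trained neural network.

```python
import random

random.seed(0)  # make the toy example repeatable

# Hypothetical toy corpus.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

# Transition table: token -> list of observed next tokens.
table = {}
for cur, nxt in zip(corpus, corpus[1:]):
    table.setdefault(cur, []).append(nxt)

def generate(start, length=8):
    """Generate a sequence by repeatedly sampling a next token."""
    out = [start]
    for _ in range(length - 1):
        choices = table.get(out[-1])
        if not choices:          # dead end: no observed successor
            break
        out.append(random.choice(choices))
    return " ".join(out)

print(generate("the"))
```

Tiny as it is, this captures the generative loop - sample, append, repeat - that large models perform at vastly greater scale and quality.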
What is the timeline of Generative AI evolution?
Late 1980s: Introduction of Recurrent Neural Networks (RNNs) for processing sequential data.
1997: Long Short-Term Memory (LSTM) networks improve handling of long-range dependencies in sequences.
2014: Introduction of Generative Adversarial Networks (GANs) for high-quality image generation.
2014+: Development of variational autoencoders (VAEs), diffusion models, and flow-based models for improved generative processes.
2017: Transformer architecture introduced, allowing models to process all parts of a text sequence in parallel.
2018: OpenAI creates Generative Pre-trained Transformer (GPT) model.
2022: OpenAI releases ChatGPT.
2023: Meta releases Llama (Large language model Meta AI) - a family of models.