Generative AI for Beginners
21 Lessons -- Get Started Building with Generative AI. A comprehensive course by Microsoft Cloud Advocates covering everything from GenAI fundamentals to building production applications with LLMs, RAG, agents, and fine-tuning.
Introduction to Generative AI and Large Language Models
Generative AI is artificial intelligence capable of generating text, images and other types of content. What makes it a fantastic technology is that it democratizes AI: anyone can use it with as little as a text prompt, a sentence written in natural language. There's no need to learn a language like Java or SQL to accomplish something worthwhile; all you need is to state what you want in your own language, and out comes a suggestion from an AI model. The applications and impact of this are huge: you can write or understand reports, draft applications and much more, all in seconds.
In this curriculum, we’ll explore how our startup leverages generative AI to unlock new scenarios in the education world and how we address the inevitable challenges associated with the social implications of its application and the technology limitations.
Introduction
This lesson will cover:
- Introduction to the business scenario: our startup idea and mission.
- Generative AI and how we landed on the current technology landscape.
- Inner workings of a large language model.
- Main capabilities and practical use cases of Large Language Models.
Learning Goals
After completing this lesson, you will understand:
- What generative AI is and how Large Language Models work.
- How you can leverage large language models for different use cases, with a focus on education scenarios.
Scenario: our educational startup
Generative Artificial Intelligence (AI) represents the pinnacle of AI technology, pushing the boundaries of what was once thought impossible. Generative AI models have many capabilities and applications, but in this curriculum we'll explore how the technology is revolutionizing education through a fictional startup, which we'll refer to as our startup. Our startup works in the education domain with the ambitious mission statement of
improving accessibility in learning, on a global scale, ensuring equitable access to education and providing personalized learning experiences to every learner, according to their needs.
Our startup team is aware we’ll not be able to achieve this goal without leveraging one of the most powerful tools of modern times – Large Language Models (LLMs).
Generative AI is expected to revolutionize the way we learn and teach today, with students having at their disposal virtual teachers 24 hours a day who provide vast amounts of information and examples, and teachers able to leverage innovative tools to assess their students and give feedback.
To start, let’s define some basic concepts and terminology we’ll be using throughout the curriculum.
How did we get Generative AI?
Despite the extraordinary hype created lately by the announcement of generative AI models, this technology has been decades in the making, with the first research efforts dating back to the 1960s. We're now at a point where AI exhibits human-like cognitive capabilities, such as conversation, as shown for example by OpenAI ChatGPT or Bing Chat, which also uses a GPT model for its conversational web search.
Backing up a bit, the very first prototypes of AI consisted of typewritten chatbots, relying on a knowledge base extracted from a group of experts and represented in a computer. Answers in the knowledge base were triggered by keywords appearing in the input text. However, it soon became clear that this approach did not scale well.
A statistical approach to AI: Machine Learning
A turning point arrived during the 90s, with the application of a statistical approach to text analysis. This led to the development of new algorithms – known as machine learning – capable of learning patterns from data without being explicitly programmed. This approach allows machines to simulate human language understanding: a statistical model is trained on text-label pairings, enabling the model to classify unknown input text with a pre-defined label representing the intention of the message.
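The text-to-label idea behind those early statistical classifiers can be sketched with a toy example. This is a deliberately naive bag-of-words scorer (the training phrases and labels are made up for illustration), not a production machine learning algorithm, but it shows the core pattern: learn word-label associations from examples, then classify unseen text.

```python
# Toy intent classifier in the spirit of 1990s statistical NLP:
# score each label by counting how many of its training words
# appear in the input, then pick the best-scoring label.
training_data = {
    "greeting": ["hello there", "hi how are you"],
    "weather":  ["will it rain today", "what is the forecast"],
}

def classify(text: str) -> str:
    """Return the pre-defined label whose training samples overlap most with the input."""
    words = set(text.lower().split())
    scores = {
        label: sum(len(words & set(sample.split())) for sample in samples)
        for label, samples in training_data.items()
    }
    return max(scores, key=scores.get)
```

A real system would use weighted features and a trained statistical model, but the workflow is the same: pair texts with labels, learn from the pairs, then assign a label to unknown input.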
Neural networks and modern virtual assistants
In recent years, the technological evolution of hardware, capable of handling larger amounts of data and more complex computations, encouraged research in AI, leading to the development of advanced machine learning algorithms known as neural networks or deep learning algorithms.
Neural networks (and in particular Recurrent Neural Networks – RNNs) significantly enhanced natural language processing, enabling the representation of the meaning of text in a more meaningful way, valuing the context of a word in a sentence.
This is the technology that powered the virtual assistants born in the first decade of the new century, very proficient in interpreting human language, identifying a need, and performing an action to satisfy it – like answering with a pre-defined script or consuming a 3rd party service.
Present day, Generative AI
So that’s how we came to Generative AI today, which can be seen as a subset of deep learning.
After decades of research in the AI field, a new model architecture – called Transformer – overcame the limits of RNNs, being able to get much longer sequences of text as input. Transformers are based on the attention mechanism, enabling the model to give different weights to the inputs it receives, ‘paying more attention’ where the most relevant information is concentrated, regardless of their order in the text sequence.
Most of the recent generative AI models – also known as Large Language Models (LLMs), since they work with textual inputs and outputs – are indeed based on this architecture. What’s interesting about these models – trained on a huge amount of unlabeled data from diverse sources like books, articles and websites – is that they can be adapted to a wide variety of tasks and generate grammatically correct text with a semblance of creativity. So, not only did they incredibly enhance the capacity of a machine to ‘understand’ an input text, but they enabled their capacity to generate an original response in human language.
How do large language models work?
In the next chapter we are going to explore different types of Generative AI models, but for now let’s have a look at how large language models work, with a focus on OpenAI GPT (Generative Pre-trained Transformer) models.
- Tokenizer, text to numbers: Large Language Models receive a text as input and generate a text as output. However, being statistical models, they work much better with numbers than text sequences. That’s why every input to the model is processed by a tokenizer, before being used by the core model. A token is a chunk of text – consisting of a variable number of characters, so the tokenizer's main task is splitting the input into an array of tokens. Then, each token is mapped with a token index, which is the integer encoding of the original text chunk.
- Predicting output tokens: Given n tokens as input (with max n varying from one model to another), the model predicts one token as output. This token is then incorporated into the input of the next iteration, in an expanding-window pattern, enabling a better user experience of getting one (or multiple) sentences as an answer. This explains why, if you have ever played with ChatGPT, you might have noticed that it sometimes looks like it stops in the middle of a sentence.
- Selection process, probability distribution: The model chooses the output token according to its probability of occurring after the current text sequence. This is because the model predicts a probability distribution over all possible 'next tokens', calculated based on its training. However, the token with the highest probability is not always chosen. A degree of randomness is added to this choice, so that the model acts in a non-deterministic fashion: we do not get the exact same output for the same input. This randomness is added to simulate creative thinking, and it can be tuned using a model parameter called temperature.
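The tokenize-then-sample loop described above can be sketched in a few lines of Python. This is a toy illustration, not the actual GPT implementation: the vocabulary, candidate tokens and scores are invented, and real tokenizers use subword schemes like BPE rather than whitespace splitting. What it does show faithfully is how temperature reshapes the probability distribution before a token is sampled.

```python
import math
import random

# Toy tokenizer: map text chunks to integer token ids via a vocabulary.
vocab = {"the": 0, "cat": 1, "sat": 2, "mat": 3}
token_ids = [vocab[word] for word in "the cat sat".split()]

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw model scores into a probability distribution over next tokens.
    Lower temperature sharpens the distribution (more deterministic choices);
    higher temperature flattens it (more randomness, i.e. 'creativity')."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores the model assigns to each candidate next token.
candidates = ["on", "under", "banana"]
logits = [3.0, 1.5, -2.0]

probs = softmax_with_temperature(logits, temperature=0.7)
next_token = random.choices(candidates, weights=probs, k=1)[0]
```

Running the last two lines repeatedly usually yields "on" but occasionally "under", which is exactly the non-determinism described above; at a very low temperature the top-scoring token wins almost every time.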
How can our startup leverage Large Language Models?
Now that we have a better understanding of the inner workings of a large language model, let's see some practical examples of the most common tasks they can perform quite well, with an eye to our business scenario. We said that the main capability of a Large Language Model is generating text from scratch, starting from a textual input written in natural language.
But what kind of textual input and output? The input of a large language model is known as a prompt, while the output is known as a completion, a term that refers to the model mechanism of generating the next token to complete the current input. We are going to dive deep into what a prompt is and how to design it to get the most out of the model. But for now, let's just say that a prompt may include:
- An instruction specifying the type of output we expect from the model. This instruction sometimes embeds some examples or some additional data, for instance:
  - Summarization of an article, book, product reviews and more, along with extraction of insights from unstructured data.
  - Creative ideation and design of an article, an essay, an assignment and more.
- A question, asked in the form of a conversation with an agent.
- A chunk of text to complete, which implicitly is an ask for writing assistance.
- A chunk of code together with the ask of explaining and documenting it, or a comment asking to generate a piece of code performing a specific task.
The examples above are quite simple and are not intended to be an exhaustive demonstration of Large Language Models' capabilities. They are meant to show the potential of using generative AI, in particular but not limited to educational contexts.
Also, the output of a generative AI model is not perfect, and sometimes the model's creativity can work against it, producing output that a human user may interpret as a mystification of reality, or that may be offensive. Generative AI is not intelligent, at least not in the more comprehensive sense of intelligence that includes critical and creative reasoning or emotional intelligence; it is not deterministic; and it is not trustworthy, since fabrications, such as erroneous references, content, and statements, may be combined with correct information and presented in a persuasive, confident manner. In the following lessons, we'll deal with all these limitations and see what we can do to mitigate them.
Assignment
Your assignment is to read up more on generative AI and try to identify an area that doesn't have it where you would add generative AI today. How would the impact be different from doing it the "old way"? Can you do something you couldn't do before, or are you faster? Write a 300-word summary of what your dream AI startup would look like, and include headers like "Problem", "How I would use AI", "Impact" and, optionally, a business plan.
If you did this task, you might even be ready to apply to Microsoft's incubator, Microsoft for Startups Founders Hub, which offers credits for Azure and OpenAI, mentoring and much more. Check it out!
Knowledge check
What's true about large language models?
1. You get the exact same response every time.
2. It does things perfectly: great at adding numbers, producing working code, etc.
3. The response may vary despite using the same prompt. It's also great at giving you a first draft of something, be it text or code, but you need to improve on the results.
A: 3. An LLM is non-deterministic: the response varies, although you can control its variance via a temperature setting. You also shouldn't expect it to do things perfectly; it's here to do the heavy lifting for you, which often means you get a good first attempt at something that you need to gradually improve.
Great Work! Continue the Journey
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!
Head over to Lesson 2 where we will look at how to explore and compare different LLM types!
Exploring and comparing different LLMs
With the previous lesson, we have seen how Generative AI is changing the technology landscape, how Large Language Models (LLMs) work and how a business - like our startup - can apply them to their use cases and grow! In this chapter, we're looking to compare and contrast different types of large language models (LLMs) to understand their pros and cons.
The next step in our startup's journey is exploring the current landscape of LLMs and understanding which are suitable for our use case.
Introduction
This lesson will cover:
- Different types of LLMs in the current landscape.
- Testing, iterating, and comparing different models for your use case in Azure.
- How to deploy an LLM.
Learning Goals
After completing this lesson, you will be able to:
- Select the right model for your use case.
- Understand how to test, iterate, and improve the performance of your model.
- Know how businesses deploy models.
Understand different types of LLMs
LLMs can have multiple categorizations based on their architecture, training data, and use case. Understanding these differences will help our startup select the right model for the scenario, and understand how to test, iterate, and improve performance.
There are many different types of LLMs; your choice of model depends on what you aim to use them for, your data, how much you're ready to pay and more.
Depending on if you aim to use the models for text, audio, video, image generation and so on, you might opt for a different type of model.
Audio and speech recognition. For this purpose, Whisper-type models are a great choice as they're general-purpose and aimed at speech recognition. They're trained on diverse audio and can perform multilingual speech recognition. Learn more about Whisper type models here.
Image generation. For image generation, DALL-E and Midjourney are two very well-known choices. DALL-E is offered by Azure OpenAI. Read more about DALL-E here and also in Chapter 9 of this curriculum.
Text generation. Most models are trained on text generation and you have a large variety of choices from GPT-3.5 to GPT-4. They come at different costs with GPT-4 being the most expensive. It's worth looking into the Azure OpenAI playground to evaluate which models best fit your needs in terms of capability and cost.
Multi-modality. If you're looking to handle multiple types of data in input and output, you might want to look into models like gpt-4 turbo with vision or gpt-4o - the latest releases of OpenAI models - which are capable of combining natural language processing with visual understanding, enabling interactions through multi-modal interfaces.
Selecting a model means you get some basic capabilities, that might not be enough however. Often you have company specific data that you somehow need to tell the LLM about. There are a few different choices on how to approach that, more on that in the upcoming sections.
Foundation Models versus LLMs
The term Foundation Model was coined by Stanford researchers and defined as an AI model that follows some criteria, such as:
- They are trained using unsupervised learning or self-supervised learning, meaning they are trained on unlabeled multi-modal data, and they do not require human annotation or labeling of data for their training process.
- They are very large models, based on very deep neural networks with billions of parameters.
- They are normally intended to serve as a ‘foundation’ for other models, meaning they can be used as a starting point for other models to be built on top of, which can be done by fine-tuning.
Image source: Essential Guide to Foundation Models and Large Language Models | by Babar M Bhatti | Medium
To further clarify this distinction, let’s take ChatGPT as an example. To build the first version of ChatGPT, a model called GPT-3.5 served as the foundation model. This means that OpenAI used some chat-specific data to create a tuned version of GPT-3.5 that was specialized in performing well in conversational scenarios, such as chatbots.
Image source: 2108.07258.pdf (arxiv.org)
Open Source versus Proprietary Models
Another way to categorize LLMs is whether they are open source or proprietary.
Open-source models are models that are made available to the public and can be used by anyone. They are often made available by the company that created them, or by the research community. These models are allowed to be inspected, modified, and customized for the various use cases in LLMs. However, they are not always optimized for production use, and may not be as performant as proprietary models. Plus, funding for open-source models can be limited, and they may not be maintained long term or may not be updated with the latest research. Examples of popular open source models include Alpaca, Bloom and LLaMA.
Proprietary models are models that are owned by a company and are not made available to the public. These models are often optimized for production use. However, they are not allowed to be inspected, modified, or customized for different use cases. Plus, they are not always available for free, and may require a subscription or payment to use. Also, users do not have control over the data that is used to train the model, which means they should entrust the model owner with ensuring commitment to data privacy and responsible use of AI. Examples of popular proprietary models include OpenAI models, Google Bard or Claude 2.
Embedding versus Image generation versus Text and Code generation
LLMs can also be categorized by the output they generate.
Embeddings are a set of models that can convert text into a numerical form, called embedding, which is a numerical representation of the input text. Embeddings make it easier for machines to understand the relationships between words or sentences and can be consumed as inputs by other models, such as classification models, or clustering models that have better performance on numerical data. Embedding models are often used for transfer learning, where a model is built for a surrogate task for which there’s an abundance of data, and then the model weights (embeddings) are re-used for other downstream tasks. An example of this category is OpenAI embeddings.
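The "relationships between words or sentences" that embeddings capture are usually measured with cosine similarity between the vectors. The following is a minimal sketch with made-up 4-dimensional vectors; real embedding models (such as OpenAI's) return vectors with hundreds or thousands of dimensions, but the comparison works the same way.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: close to 1.0 means similar meaning,
    close to 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: "dog" and "puppy" point in similar directions,
# "invoice" points elsewhere in the vector space.
emb_dog = [0.80, 0.10, 0.05, 0.05]
emb_puppy = [0.75, 0.15, 0.05, 0.05]
emb_invoice = [0.05, 0.05, 0.10, 0.80]
```

With these vectors, `cosine_similarity(emb_dog, emb_puppy)` is much higher than `cosine_similarity(emb_dog, emb_invoice)`, which is exactly what a downstream classifier or clustering model exploits.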
Image generation models are models that generate images. These models are often used for image editing, image synthesis, and image translation. Image generation models are often trained on large datasets of images, such as LAION-5B, and can be used to generate new images or to edit existing images with inpainting, super-resolution, and colorization techniques. Examples include DALL-E-3 and Stable Diffusion models.
Text and code generation models are models that generate text or code. These models are often used for text summarization, translation, and question answering. Text generation models are often trained on large datasets of text, such as BookCorpus, and can be used to generate new text, or to answer questions. Code generation models, like CodeParrot, are often trained on large datasets of code, such as GitHub, and can be used to generate new code, or to fix bugs in existing code.
Encoder-Decoder versus Decoder-only
To talk about the different types of architectures of LLMs, let's use an analogy.
Imagine your manager gave you the task of writing a quiz for the students. You have two colleagues: one oversees creating the content and the other oversees reviewing it.
The content creator is like a decoder-only model: they can look at the topic, see what you already wrote, and then write a course based on that. They are very good at writing engaging and informative content, but not very good at understanding the topic and the learning objectives. Some examples of decoder-only models are the GPT family of models, such as GPT-3.
The reviewer is like an encoder-only model: they look at the written course and the answers, noticing the relationships between them and understanding the context, but they are not good at generating content. An example of an encoder-only model is BERT.
Now imagine someone who could both create and review the quiz; this is an encoder-decoder model. Some examples are BART and T5.
Service versus Model
Now, let's talk about the difference between a service and a model. A service is a product that is offered by a Cloud Service Provider, and is often a combination of models, data, and other components. A model is the core component of a service, and is often a foundation model, such as an LLM.
Services are often optimized for production use and are often easier to use than models, via a graphical user interface. However, services are not always available for free, and may require a subscription or payment to use, in exchange for leveraging the service owner's equipment and resources, optimizing expenses and scaling easily. An example of a service is Azure OpenAI Service, which offers a pay-as-you-go rate plan, meaning users are charged proportionally to how much they use the service. Also, Azure OpenAI Service offers enterprise-grade security and a responsible AI framework on top of the models' capabilities.
Models are just the neural network, with its parameters, weights, and so on. A company could run a model locally, but it would need to buy the equipment, build the infrastructure to scale, and either buy a license or use an open-source model. A model like LLaMA is available to use, but requires computational power to run.
How to test and iterate with different models to understand performance on Azure
Once our team has explored the current LLMs landscape and identified some good candidates for their scenarios, the next step is testing them on their data and on their workload. This is an iterative process, done by experiments and measures. Most of the models we mentioned in previous paragraphs (OpenAI models, open source models like Llama2, and Hugging Face transformers) are available in the Model Catalog in Azure AI Studio.
Azure AI Studio is a cloud platform designed for developers to build generative AI applications and manage the whole development lifecycle - from experimentation to evaluation - by combining all Azure AI services into a single hub with a handy GUI. The Model Catalog in Azure AI Studio enables the user to:
- Find the Foundation Model of interest in the catalog - either proprietary or open source, filtering by task, license, or name. To improve searchability, the models are organized into collections, like Azure OpenAI collection, Hugging Face collection, and more.
- Review the model card, including a detailed description of intended use and training data, code samples and evaluation results on the internal evaluations library.
- Compare benchmarks across models and datasets available in the industry to assess which one meets the business scenario, through the Model Benchmarks pane.
- Fine-tune the model on custom training data to improve model performance in a specific workload, leveraging the experimentation and tracking capabilities of Azure AI Studio.
- Deploy the original pre-trained model or the fine-tuned version to a remote real time inference - managed compute - or serverless api endpoint - pay-as-you-go - to enable applications to consume it.
[!NOTE] Not all models in the catalog are currently available for fine-tuning and/or pay-as-you-go deployment. Check the model card for details on the model's capabilities and limitations.
Improving LLM results
We’ve explored with our startup team different kinds of LLMs and a cloud platform (Azure AI Studio) enabling us to compare different models, evaluate them on test data, improve performance and deploy them on inference endpoints.
But when shall they consider fine-tuning a model rather than using a pre-trained one? Are there other approaches to improve model performance on specific workloads?
There are several approaches a business can use to get the results they need from an LLM. You can select different types of models with different degrees of training when deploying an LLM in production, with different levels of complexity, cost, and quality. Here are some different approaches:
- Prompt engineering with context. The idea is to provide enough context when you prompt to ensure you get the responses you need.
- Retrieval Augmented Generation, RAG. Your data might exist in a database or web endpoint, for example. To ensure this data, or a subset of it, is included at the time of prompting, you can fetch the relevant data and make it part of the user's prompt.
- Fine-tuned model. Here, you train the model further on your own data, which leads to the model being more exact and responsive to your needs, but might be costly.
Img source: Four Ways that Enterprises Deploy LLMs | Fiddler AI Blog
Prompt Engineering with Context
Pre-trained LLMs work very well on generalized natural language tasks, even when called with a short prompt, like a sentence to complete or a question: so-called "zero-shot" learning.
However, the more the user can frame their query with a detailed request and examples (the context), the more accurate and closer to the user's expectations the answer will be. In this case, we talk about "one-shot" learning if the prompt includes only one example and "few-shot" learning if it includes multiple examples. Prompt engineering with context is the most cost-effective approach to kick off with.
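The difference between zero-shot and few-shot prompting is purely in how the prompt string is assembled. Here is a minimal sketch; the Q/A layout and the arithmetic examples are just one common convention, not a required format.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: an instruction, some worked examples
    (the context), and finally the query the model should complete."""
    lines = [instruction, ""]
    for question, answer in examples:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
        lines.append("")
    lines.append(f"Q: {query}")
    lines.append("A:")  # the completion continues from here
    return "\n".join(lines)

examples = [
    ("What is 7 + 5?", "12"),
    ("What is 20 - 4?", "16"),
]
prompt = build_few_shot_prompt(
    "Answer the arithmetic question with a single number.",
    examples,
    "What is 9 + 8?",
)
```

With an empty `examples` list this degenerates to a zero-shot prompt; with one pair it is one-shot. The worked examples show the model the expected output format, which often matters as much as the instruction itself.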
Retrieval Augmented Generation (RAG)
LLMs have the limitation that they can only use the data seen during their training to generate an answer. This means that they don't know anything about facts that happened after their training process, and they cannot access non-public information (like company data). This can be overcome through RAG, a technique that augments the prompt with external data in the form of chunks of documents, while respecting prompt length limits. This is supported by vector database tools (like Azure Vector Search) that retrieve the useful chunks from varied pre-defined data sources and add them to the prompt context.
This technique is very helpful when a business doesn’t have enough data, enough time, or resources to fine-tune an LLM, but still wishes to improve performance on a specific workload and reduce risks of fabrications, i.e., mystification of reality or harmful content.
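The retrieve-then-augment loop of RAG can be sketched in a few lines. In this toy version the document chunks, their embeddings and the query embedding are all made up; in a real system an embedding model produces the vectors and a vector store (such as Azure AI Search) handles storage and retrieval at scale.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy document chunks with pre-computed (invented) embeddings.
chunks = [
    ("Our refund policy allows returns within 30 days.", [0.9, 0.1, 0.0]),
    ("The office is closed on public holidays.",         [0.1, 0.9, 0.0]),
    ("Shipping is free for orders over $50.",            [0.6, 0.0, 0.4]),
]

def retrieve(query_embedding, k=2):
    """Return the text of the k chunks most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_embedding, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Hypothetical embedding of the user question "Can I return my purchase?"
query_embedding = [0.85, 0.05, 0.10]
context = "\n".join(retrieve(query_embedding))
prompt = (
    "Answer using only this context:\n"
    f"{context}\n\n"
    "Question: Can I return my purchase?"
)
```

The augmented `prompt` is then sent to the LLM as usual, grounding its answer in the retrieved chunks rather than in whatever its training data happened to contain.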
Fine-tuned model
Fine-tuning is a process that leverages transfer learning to ‘adapt’ the model to a downstream task or to solve a specific problem. Differently from few-shot learning and RAG, it results in a new model being generated, with updated weights and biases. It requires a set of training examples consisting of a single input (the prompt) and its associated output (the completion). This would be the preferred approach if:
- Using less capable models. A business would like to use fine-tuned, less capable models (like embedding models) rather than high-performance models, resulting in a more cost-effective and faster solution.
- Considering latency. Latency is important for a specific use case, so it's not possible to use very long prompts, or the number of examples the model should learn from doesn't fit the prompt length limit.
- Staying up to date. A business has a lot of high-quality data and ground-truth labels, plus the resources required to keep this data up to date over time.
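The "training examples consisting of a single input (the prompt) and its associated output (the completion)" are typically supplied as a JSONL file, one JSON object per line. The field names and example content below are illustrative; the exact schema varies by provider (OpenAI's chat fine-tuning, for instance, uses a "messages" list instead of prompt/completion pairs), so check your service's documentation.

```python
import json

# Hypothetical fine-tuning examples in prompt/completion form.
training_examples = [
    {"prompt": "Summarize: Photosynthesis converts light into chemical energy.",
     "completion": "Plants turn sunlight into usable energy."},
    {"prompt": "Summarize: The water cycle moves water between sea, air and land.",
     "completion": "Water circulates between oceans, atmosphere and land."},
]

# Fine-tuning services typically ingest one JSON object per line (JSONL).
jsonl = "\n".join(json.dumps(example) for example in training_examples)
```

A real fine-tuning job needs far more than two examples; the point here is only the shape of the data: each line pairs an input the model will see with the output you want it to learn to produce.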
Trained model
Training an LLM from scratch is without a doubt the most difficult and the most complex approach to adopt, requiring massive amounts of data, skilled resources, and appropriate computational power. This option should be considered only in a scenario where a business has a domain-specific use case and a large amount of domain-centric data.
Knowledge check
What could be a good approach to improve LLM completion results?
1. Prompt engineering with context
2. RAG
3. Fine-tuned model
A: 3. If you have the time, the resources and high-quality data, fine-tuning is the better option for staying up to date. However, if you're looking to improve things and you're short on time, it's worth considering RAG first.
🚀 Challenge
Read up more on how you can use RAG for your business.
Great Work, Continue Your Learning
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!
Head over to Lesson 3 where we will look at how to build with Generative AI Responsibly!
Using Generative AI Responsibly
It's easy to be fascinated with AI and generative AI in particular, but you need to consider how you would use it responsibly. You need to consider things like how to ensure the output is fair, non-harmful and more. This chapter aims to provide you with the mentioned context, what to consider, and how to take active steps to improve your AI usage.
Introduction
This lesson will cover:
- Why you should prioritize Responsible AI when building Generative AI applications.
- Core principles of Responsible AI and how they relate to Generative AI.
- How to put these Responsible AI principles into practice through strategy and tooling.
Learning Goals
After completing this lesson you will know:
- The importance of Responsible AI when building Generative AI applications.
- When to think and apply the core principles of Responsible AI when building Generative AI applications.
- What tools and strategies are available to you to put the concept of Responsible AI into practice.
Responsible AI Principles
The excitement of Generative AI has never been higher. This excitement has brought a lot of new developers, attention, and funding to this space. While this is very positive for anyone looking to build products and companies using Generative AI, it is also important we proceed responsibly.
Throughout this course, we are focusing on building our startup and our AI education product. We’ll use the principles of Responsible AI: Fairness, Inclusiveness, Reliability/Safety, Security & Privacy, Transparency and Accountability. With these principles, we will explore how they relate to our use of Generative AI in our products.
Why Should You Prioritize Responsible AI
When building a product, taking a human-centric approach by keeping your user's best interest in mind leads to the best results.
The uniqueness of Generative AI is its power to create helpful answers, information, guidance, and content for users. This can be done without many manual steps which can lead to very impressive results. Without proper planning and strategies, it can also unfortunately lead to some harmful results for your users, your product, and society as a whole.
Let's look at some (but not all) of these potentially harmful results:
Hallucinations
Hallucination is the term used to describe when an LLM produces content that is either completely nonsensical or that we know is factually wrong based on other sources of information.
Let's say, for example, that we build a feature for our startup that allows students to ask historical questions to a model. A student asks the question "Who was the sole survivor of the Titanic?"
The model produces a response such as the one below:
(Image: the model's confident but incorrect response. Source: Flying bisons)
This is a very confident and thorough answer. Unfortunately, it is incorrect. Even with a minimal amount of research, one would discover there was more than one survivor of the Titanic disaster. For a student who is just starting to research this topic, this answer can be persuasive enough to not be questioned and treated as fact. The consequences of this can lead to the AI system being unreliable and negatively impact the reputation of our startup.
With each iteration of any given LLM, we have seen performance improvements around minimizing hallucinations. Even with this improvement, we as application builders and users still need to remain aware of these limitations.
Harmful Content
The earlier section covered the risk of an LLM producing incorrect or nonsensical responses. Another risk we need to be aware of is a model responding with harmful content.
Harmful content can be defined as:
- Providing instructions or encouraging self-harm or harm to certain groups.
- Hateful or demeaning content.
- Guiding the planning of any type of attack or violent acts.
- Providing instructions on how to find illegal content or commit illegal acts.
- Displaying sexually explicit content.
For our startup, we want to make sure we have the right tools and strategies in place to prevent this type of content from being seen by students.
Lack of Fairness
Fairness is defined as “ensuring that an AI system is free from bias and discrimination and that they treat everyone fairly and equally.” In the world of Generative AI, we want to ensure that exclusionary worldviews of marginalized groups are not reinforced by the model’s output.
These types of outputs are not only destructive to building positive product experiences for our users, but they also cause further societal harm. As application builders, we should always keep a wide and diverse user base in mind when building solutions with Generative AI.
How to Use Generative AI Responsibly
Now that we have identified the importance of Responsible Generative AI, let's look at the steps we can take to build our AI solutions responsibly:
Measure Potential Harms
In software testing, we test the expected actions of a user on an application. Similarly, testing a diverse set of prompts users are most likely going to use is a good way to measure potential harm.
Since our startup is building an education product, it would be good to prepare a list of education-related prompts. This could be to cover a certain subject, historical facts, and prompts about student life.
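A harm-measurement pass like this can be sketched as a small harness that runs a curated prompt list through the model and flags suspect responses. Everything here is illustrative: `ask_model` is a stand-in for a real LLM call, and the blocked-phrase screen is deliberately naive.

```python
# Sketch of a harm-measurement harness for an education product.
# `ask_model` is a hypothetical placeholder for a real LLM API call.

def ask_model(prompt: str) -> str:
    # Stand-in for a real API call; returns a canned response here.
    return f"Here is a safe, factual answer about: {prompt}"

# Education-related prompts we expect students to try, including risky ones.
test_prompts = [
    "Explain photosynthesis for a 5th grader",
    "Who was the sole survivor of the Titanic?",
    "Write my history essay for me",
]

# Very naive screen: flag responses containing known-misleading phrases.
BLOCKED_PHRASES = ["sole survivor", "guaranteed fact"]

def screen_response(text: str) -> bool:
    """Return True if the response looks potentially harmful or misleading."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)

flagged = [p for p in test_prompts if screen_response(ask_model(p))]
print(flagged)
```

In practice, the screening step would be replaced by a proper safety classifier or human review; the point is only that measurement means running representative prompts systematically, not ad hoc.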
Mitigate Potential Harms
It is now time to find ways where we can prevent or limit the potential harm caused by the model and its responses. We can look at this in 4 different layers:
- Model. Choose the right model for the right use case. Larger and more complex models like GPT-4 can pose more risk of harmful content when applied to smaller and more specific use cases. Fine-tuning with your own training data also reduces the risk of harmful content.
- Safety System. A safety system is a set of tools and configurations on the platform serving the model that help mitigate harm. An example is the content filtering system on the Azure OpenAI Service. Such systems should also detect jailbreak attacks and unwanted activity like requests from bots.
- Metaprompt. Metaprompts and grounding are ways we can direct or limit the model based on certain behaviors and information. This could mean using system inputs to define certain limits of the model, and providing outputs that are more relevant to the scope or domain of the system. It can also mean using techniques like Retrieval Augmented Generation (RAG) to have the model pull information only from a selection of trusted sources. There is a lesson later in this course on building search applications.
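The metaprompt-plus-grounding pattern can be sketched in a few lines: retrieve a passage from trusted sources and build a system message that restricts the model to it. The source store, retriever, and wording here are all hypothetical; a real RAG pipeline would use a vector search over indexed documents.

```python
# Sketch of a "metaprompt + grounding" (RAG-style) pattern: the system
# message limits the model to trusted passages retrieved for the question.

TRUSTED_SOURCES = {
    "titanic": "Around 706 people survived the sinking of the Titanic in 1912.",
}

def retrieve(question: str) -> str:
    # Toy retriever: keyword lookup against a curated source store.
    for key, passage in TRUSTED_SOURCES.items():
        if key in question.lower():
            return passage
    return ""

def build_messages(question: str) -> list:
    context = retrieve(question)
    system = (
        "You are a history tutor. Answer ONLY from the sources below. "
        "If the sources do not contain the answer, say you don't know.\n"
        f"Sources: {context}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

messages = build_messages("Who survived the Titanic?")
print(messages[0]["content"])
```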
- User Experience. The final layer is where the user interacts directly with the model through our application’s interface in some way. In this way we can design the UI/UX to limit the user on the types of inputs they can send to the model as well as text or images displayed to the user. When deploying the AI application, we also must be transparent about what our Generative AI application can and can’t do.
We have an entire lesson dedicated to Designing UX for AI Applications
- Evaluate model. Working with LLMs can be challenging because we don’t always have control over the data the model was trained on. Regardless, we should always evaluate the model’s performance and outputs. It’s still important to measure the model’s accuracy, similarity, groundedness, and relevance of the output. This helps provide transparency and trust to stakeholders and users.
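One of the evaluation signals above, groundedness, can be approximated very crudely as the fraction of answer words that appear in the source context. Real evaluation frameworks use much richer metrics; this sketch only illustrates the idea of scoring outputs rather than trusting them.

```python
# Naive sketch of one evaluation signal: "groundedness" measured as the
# fraction of answer words that also appear in the source context.
# Real evaluations use semantic similarity, LLM judges, etc.

def groundedness(answer: str, context: str) -> float:
    answer_words = set(answer.lower().split())
    context_words = set(context.lower().split())
    if not answer_words:
        return 0.0
    return len(answer_words & context_words) / len(answer_words)

context = "jupiter is the fifth planet from the sun"
score = groundedness("jupiter is the fifth planet", context)
print(round(score, 2))  # 1.0: every answer word appears in the context
```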
Operate a Responsible Generative AI solution
Building an operational practice around your AI applications is the final stage. This includes partnering with other parts of our startup like Legal and Security to ensure we are compliant with all regulatory policies. Before launching, we also want to build plans around delivery, handling incidents, and rollback to prevent any harm to our users from growing.
Tools
While the work of developing Responsible AI solutions may seem like a lot, it is well worth the effort. As the area of Generative AI grows, tooling to help developers efficiently integrate responsibility into their workflows will mature. For example, Azure AI Content Safety can help detect harmful text and images via an API request.
Knowledge check
What are some things you need to care about to ensure responsible AI usage?
- That the answer is correct.
- Harmful usage, that AI isn't used for criminal purposes.
- Ensuring the AI is free from bias and discrimination.
A: 2 and 3 are correct. Responsible AI helps you consider how to mitigate harmful effects and biases and more.
🚀 Challenge
Read up on Azure AI Content Safety and see what you can adopt for your usage.
Great Work, Continue Your Learning
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!
Head over to Lesson 4 where we will look at Prompt Engineering Fundamentals!
Prompt Engineering Fundamentals
Introduction
This module covers essential concepts and techniques for creating effective prompts in generative AI models. The way you write your prompt to an LLM also matters. A carefully-crafted prompt can achieve a better quality of response. But what exactly do terms like prompt and prompt engineering mean? And how do I improve the prompt input that I send to the LLM? These are the questions we'll try to answer within this chapter and the next.
Generative AI is capable of creating new content (e.g., text, images, audio, code etc.) in response to user requests. It achieves this using Large Language Models like OpenAI's GPT ("Generative Pre-trained Transformer") series that are trained for using natural language and code.
Users can now interact with these models using familiar paradigms like chat, without needing any technical expertise or training. The models are prompt-based - users send a text input (prompt) and get back the AI response (completion). They can then "chat with the AI" iteratively, in multi-turn conversations, refining their prompt until the response matches their expectations.
"Prompts" now become the primary programming interface for generative AI apps, telling the models what to do and influencing the quality of returned responses. "Prompt Engineering" is a fast-growing field of study that focuses on the design and optimization of prompts to deliver consistent and quality responses at scale.
Learning Goals
In this lesson, we learn what Prompt Engineering is, why it matters, and how we can craft more effective prompts for a given model and application objective. We'll understand core concepts and best practices for prompt engineering - and learn about an interactive Jupyter Notebooks "sandbox" environment where we can see these concepts applied to real examples.
By the end of this lesson we will be able to:
- Explain what prompt engineering is and why it matters.
- Describe the components of a prompt and how they are used.
- Learn best practices and techniques for prompt engineering.
- Apply learned techniques to real examples, using an OpenAI endpoint.
Key Terms
- Prompt Engineering: The practice of designing and refining inputs to guide AI models toward producing desired outputs.
- Tokenization: The process of converting text into smaller units, called tokens, that a model can understand and process.
- Instruction-Tuned LLMs: Large Language Models (LLMs) that have been fine-tuned with specific instructions to improve their response accuracy and relevance.
Learning Sandbox
Prompt engineering is currently more art than science. The best way to improve our intuition for it is to practice more and adopt a trial-and-error approach that combines application domain expertise with recommended techniques and model-specific optimizations.
The Jupyter Notebook accompanying this lesson provides a sandbox environment where you can try out what you learn - as you go or as part of the code challenge at the end. To execute the exercises, you will need:
- An Azure OpenAI API key - the service endpoint for a deployed LLM.
- A Python Runtime - in which the Notebook can be executed.
- Local Env Variables - complete the SETUP steps now to get ready.
The notebook comes with starter exercises - but you are encouraged to add your own Markdown (description) and Code (prompt requests) sections to try out more examples or ideas - and build your intuition for prompt design.
Illustrated Guide
Want to get the big picture of what this lesson covers before you dive in? Check out this illustrated guide, which gives you a sense of the main topics covered and the key takeaways for you to think about in each one. The lesson roadmap takes you from understanding the core concepts and challenges to addressing them with relevant prompt engineering techniques and best practices. Note that the "Advanced Techniques" section in this guide refers to content covered in the next chapter of this curriculum.
Our Startup
Now, let's talk about how this topic relates to our startup mission to bring AI innovation to education. We want to build AI-powered applications of personalized learning - so let's think about how different users of our application might "design" prompts:
- Administrators might ask the AI to analyze curriculum data to identify gaps in coverage. The AI can summarize results or visualize them with code.
- Educators might ask the AI to generate a lesson plan for a target audience and topic. The AI can build the personalized plan in a specified format.
- Students might ask the AI to tutor them in a difficult subject. The AI can now guide students with lessons, hints & examples tailored to their level.
That's just the tip of the iceberg. Check out Prompts For Education - an open-source prompts library curated by education experts - to get a broader sense of the possibilities! Try running some of those prompts in the sandbox or using the OpenAI Playground to see what happens!
What is Prompt Engineering?
We started this lesson by defining Prompt Engineering as the process of designing and optimizing text inputs (prompts) to deliver consistent and quality responses (completions) for a given application objective and model. We can think of this as a 2-step process:
- designing the initial prompt for a given model and objective
- refining the prompt iteratively to improve the quality of the response
This is necessarily a trial-and-error process that requires user intuition and effort to get optimal results. So why is it important? To answer that question, we first need to understand three concepts:
- Tokenization = how the model "sees" the prompt
- Base LLMs = how the foundation model "processes" a prompt
- Instruction-Tuned LLMs = how the model can now see "tasks"
Tokenization
An LLM sees prompts as a sequence of tokens where different models (or versions of a model) can tokenize the same prompt in different ways. Since LLMs are trained on tokens (and not on raw text), the way prompts get tokenized has a direct impact on the quality of the generated response.
To get an intuition for how tokenization works, try tools like the OpenAI Tokenizer shown below. Copy in your prompt - and see how that gets converted into tokens, paying attention to how whitespace characters and punctuation marks are handled. Note that this example shows an older LLM (GPT-3) - so trying this with a newer model may produce a different result.
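Real LLM tokenizers (like OpenAI's open-source `tiktoken` library) use byte-pair encoding, but even a toy regex split illustrates the key point: whitespace and punctuation change what units the model actually "sees". This is only an illustration, not how any production tokenizer works.

```python
# Toy illustration of tokenization: split text into word pieces and
# standalone punctuation marks, the way a tokenizer produces discrete
# units. Production tokenizers use byte-pair encoding instead.
import re

def toy_tokenize(text: str) -> list:
    # \w+ matches word characters; [^\w\s] matches single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

print(toy_tokenize("Oh say, can you see?"))
# ['Oh', 'say', ',', 'can', 'you', 'see', '?']
```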
Concept: Foundation Models
Once a prompt is tokenized, the primary function of the "Base LLM" (or Foundation model) is to predict the next token in that sequence. Since LLMs are trained on massive text datasets, they have a good sense of the statistical relationships between tokens and can make that prediction with some confidence. Note that they don't understand the meaning of the words in the prompt or token; they just see a pattern they can "complete" with their next prediction. They can continue predicting the sequence until terminated by user intervention or some pre-established condition.
Want to see how prompt-based completion works? Enter a prompt into the Azure OpenAI Studio Chat Playground with the default settings. The system is configured to treat prompts as requests for information - so you should see a completion that satisfies this context.
But what if the user wanted to see something specific that met some criteria or task objective? This is where instruction-tuned LLMs come into the picture.
Concept: Instruction Tuned LLMs
An Instruction Tuned LLM starts with the foundation model and fine-tunes it with examples or input/output pairs (e.g., multi-turn "messages") that contain clear instructions - where the response from the AI attempts to follow that instruction.
This uses techniques like Reinforcement Learning from Human Feedback (RLHF) that can train the model to follow instructions and learn from feedback so that it produces responses that are better-suited to practical applications and more relevant to user objectives.
Let's try it out - revisit the prompt above, but now change the system message to provide the following instruction as context:
Summarize content you are provided with for a second-grade student. Keep the result to one paragraph with 3-5 bullet points.
See how the result is now tuned to reflect the desired goal and format? An educator can now directly use this response in their slides for that class.
Why do we need Prompt Engineering?
Now that we know how prompts are processed by LLMs, let's talk about why we need prompt engineering. The answer lies in the fact that current LLMs pose a number of challenges that make reliable and consistent completions more challenging to achieve without putting effort into prompt construction and optimization. For instance:
Model responses are stochastic. The same prompt will likely produce different responses with different models or model versions. And it may even produce different results with the same model at different times. Prompt engineering techniques can help us minimize these variations by providing better guardrails.
Models can fabricate responses. Models are pre-trained with large but finite datasets, meaning they lack knowledge about concepts outside that training scope. As a result, they can produce completions that are inaccurate, imaginary, or directly contradictory to known facts. Prompt engineering techniques help users identify and mitigate such fabrications e.g., by asking AI for citations or reasoning.
Models capabilities will vary. Newer models or model generations will have richer capabilities but also bring unique quirks and tradeoffs in cost & complexity. Prompt engineering can help us develop best practices and workflows that abstract away differences and adapt to model-specific requirements in scalable, seamless ways.
Let's see this in action in the OpenAI or Azure OpenAI Playground:
- Use the same prompt with different LLM deployments (e.g., OpenAI, Azure OpenAI, Hugging Face) - did you see the variations?
- Use the same prompt repeatedly with the same LLM deployment (e.g., Azure OpenAI playground) - how did these variations differ?
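The stochastic behavior you observe in those exercises comes from sampling: the model draws the next token from a probability distribution, and settings like `temperature` reshape that distribution. A minimal sketch, with made-up next-token scores, shows the effect:

```python
# Sketch of why completions vary: the next token is sampled from a
# softmax distribution over scores (logits), and the temperature
# setting reshapes it. Lower temperature sharpens the distribution
# (more deterministic); higher temperature flattens it (more varied).
import math

def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)
hot = softmax_with_temperature(logits, 2.0)
print(round(cold[0], 3), round(hot[0], 3))
# The top token dominates at low temperature; probabilities even out at high.
```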
Fabrications Example
In this course, we use the term "fabrication" to reference the phenomenon where LLMs sometimes generate factually incorrect information due to limitations in their training or other constraints. You may also have heard this referred to as "hallucinations" in popular articles or research papers. However, we strongly recommend using "fabrication" as the term so we don't accidentally anthropomorphize the behavior by attributing a human-like trait to a machine-driven outcome. This also reinforces Responsible AI guidelines from a terminology perspective, removing terms that may also be considered offensive or non-inclusive in some contexts.
Want to get a sense of how fabrications work? Think of a prompt that instructs the AI to generate content for a non-existent topic (to ensure it is not found in the training dataset). For example - I tried this prompt:
Prompt: generate a lesson plan on the Martian War of 2076.
A web search showed me that there were fictional accounts (e.g., television series or books) on Martian wars - but none in 2076. Commonsense also tells us that 2076 is in the future and thus, cannot be associated with a real event.
So what happens when we run this prompt with different LLM providers?
Response 1: OpenAI Playground (GPT-3.5)
Response 2: Azure OpenAI Playground (GPT-3.5)
Response 3: Hugging Face Chat Playground (Llama-2)
As expected, each model (or model version) produces slightly different responses thanks to stochastic behavior and model capability variations. For instance, one model targets an 8th grade audience while the other assumes a high-school student. But all three models did generate responses that could convince an uninformed user that the event was real.
Prompt engineering techniques like metaprompting and temperature configuration may reduce model fabrications to some extent. New prompt engineering architectures also incorporate new tools and techniques seamlessly into the prompt flow, to mitigate or reduce some of these effects.
Case Study: GitHub Copilot
Let's wrap this section by getting a sense of how prompt engineering is used in real-world solutions by looking at one Case Study: GitHub Copilot.
GitHub Copilot is your "AI Pair Programmer" - it converts text prompts into code completions and is integrated into your development environment (e.g., Visual Studio Code) for a seamless user experience. As documented in the series of blogs below, the earliest version was based on the OpenAI Codex model - with engineers quickly realizing the need to fine-tune the model and develop better prompt engineering techniques to improve code quality. In July 2023, they debuted an improved AI model that goes beyond Codex for even faster suggestions.
Read the posts in order, to follow their learning journey.
- May 2023 | GitHub Copilot is Getting Better at Understanding Your Code
- May 2023 | Inside GitHub: Working with the LLMs behind GitHub Copilot.
- Jun 2023 | How to write better prompts for GitHub Copilot.
- Jul 2023 | .. GitHub Copilot goes beyond Codex with improved AI model
- Jul 2023 | A Developer's Guide to Prompt Engineering and LLMs
- Sep 2023 | How to build an enterprise LLM app: Lessons from GitHub Copilot
You can also browse their Engineering blog for more posts like this one that shows how these models and techniques are applied for driving real-world applications.
Prompt Construction
We've seen why prompt engineering is important - now let's understand how prompts are constructed so we can evaluate different techniques for more effective prompt design.
Basic Prompt
Let's start with the basic prompt: a text input sent to the model with no other context. Here's an example - when we send the first few words of the US national anthem to the OpenAI Completion API it instantly completes the response with the next few lines, illustrating the basic prediction behavior.
| Prompt (Input) | Completion (Output) |
|---|---|
| Oh say can you see | It sounds like you're starting the lyrics to "The Star-Spangled Banner," the national anthem of the United States. The full lyrics are ... |
Complex Prompt
Now let's add context and instructions to that basic prompt. The Chat Completion API lets us construct a complex prompt as a collection of messages with:
- Input/output pairs reflecting user input and assistant response.
- System message setting the context for assistant behavior or personality.
The request is now in the form below, where the tokenization effectively captures relevant information from context and conversation. Now, changing the system context can be as impactful on the quality of completions as the user inputs provided.
```python
# Requires the openai Python package (v1+); assumes an API key is configured.
import openai

response = openai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)
```
Instruction Prompt
In the above examples, the user prompt was a simple text query that can be interpreted as a request for information. With instruction prompts, we can use that text to specify a task in more detail, providing better guidance to the AI. Here's an example:
| Prompt (Input) | Completion (Output) | Instruction Type |
|---|---|---|
| Write a description of the Civil War | returned a simple paragraph | Simple |
| Write a description of the Civil War. Provide key dates and events and describe their significance | returned a paragraph followed by a list of key event dates with descriptions | Complex |
| Write a description of the Civil War in 1 paragraph. Provide 3 bullet points with key dates and their significance. Provide 3 more bullet points with key historical figures and their contributions. Return the output as a JSON file | returns more extensive details in a text box, formatted as JSON that you can copy-paste to a file and validate as needed | Complex. Formatted. |
Primary Content
In the above examples, the prompt was still fairly open-ended, allowing the LLM to decide what part of its pre-trained dataset was relevant. With the primary content design pattern, the input text is divided into two parts:
- an instruction (action)
- relevant content (that influences action)
Here's an example where the instruction is to "summarize this in 2 sentences".
| Prompt (Input) | Completion (Output) |
|---|---|
| Jupiter is the fifth planet from the Sun and the largest in the Solar System. It is a gas giant with a mass one-thousandth that of the Sun, but two-and-a-half times that of all the other planets in the Solar System combined. Jupiter is one of the brightest objects visible to the naked eye in the night sky, and has been known to ancient civilizations since before recorded history. It is named after the Roman god Jupiter.[19] When viewed from Earth, Jupiter can be bright enough for its reflected light to cast visible shadows,[20] and is on average the third-brightest natural object in the night sky after the Moon and Venus. Summarize this in 2 short sentences | Jupiter, the fifth planet from the Sun, is the largest in the Solar System and is known for being one of the brightest objects in the night sky. Named after the Roman god Jupiter, it's a gas giant whose mass is two-and-a-half times that of all other planets in the Solar System combined. |
The primary content segment can be used in various ways to drive more effective instructions:
- Examples - instead of telling the model what to do with an explicit instruction, give it examples of what to do and let it infer the pattern.
- Cues - follow the instruction with a "cue" that primes the completion, guiding the model towards more relevant responses.
- Templates - these are repeatable 'recipes' for prompts with placeholders (variables) that can be customized with data for specific use cases.
Let's explore these in action.
Using Examples
This is an approach where you use the primary content to "feed the model" some examples of the desired output for a given instruction, and let it infer the pattern for the desired output. Based on the number of examples provided, we can have zero-shot prompting, one-shot prompting, few-shot prompting etc.
The prompt now consists of three components:
- A task description
- A few examples of the desired output
- The start of a new example (which becomes an implicit task description)
| Learning Type | Prompt (Input) | Completion (Output) |
|---|---|---|
| Zero-shot | "The Sun is Shining". Translate to Spanish | "El Sol está brillando". |
| One-shot | "The Sun is Shining" => "El Sol está brillando". "It's a Cold and Windy Day" => | "Es un día frío y ventoso". |
| Few-shot | The player ran the bases => Baseball. The player hit an ace => Tennis. The player hit a six => Cricket. The player made a slam-dunk => | Basketball |
Note how we had to provide explicit instruction ("Translate to Spanish") in zero-shot prompting, but it gets inferred in the one-shot prompting example. The few-shot example shows how adding more examples allows models to make more accurate inferences with no added instructions.
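Assembling a few-shot prompt from the three components listed above (task description, examples, and the start of a new example) is mostly string building. A minimal sketch, with the sport-classification examples from the table:

```python
# Sketch of assembling a few-shot prompt from its three components:
# a task description, worked examples, and the start of a new example
# that the model is expected to complete.

def few_shot_prompt(task, examples, query):
    lines = [task]
    for inp, out in examples:
        lines.append(f"{inp} => {out}")
    lines.append(f"{query} =>")  # the model completes from here
    return "\n".join(lines)

prompt = few_shot_prompt(
    task="Identify the sport:",
    examples=[
        ("The player ran the bases", "Baseball"),
        ("The player hit an ace", "Tennis"),
    ],
    query="The player made a slam-dunk",
)
print(prompt)
```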
Prompt Cues
Another technique for using primary content is to provide cues rather than examples. In this case, we are giving the model a nudge in the right direction by starting it off with a snippet that reflects the desired response format. The model then "takes the cue" to continue in that vein.
| Number of Cues | Prompt (Input) | Completion (Output) |
|---|---|---|
| 0 | Jupiter is the fifth planet from the Sun and the largest in the Solar System. It is a gas giant with a mass one-thousandth that of the Sun, but two-and-a-half times that of all the other planets in the Solar System combined. Jupiter is one of the brightest objects visible to the naked eye in the night sky, and has been known to ancient civilizations since before recorded history. Summarize This | Jupiter is the largest planet in our Solar System and the fifth one from the Sun. It is a gas giant with a mass 1/1000th of the Sun's, but it is heavier than all the other planets combined. Ancient civilizations have known about Jupiter for a long time, and it is easily visible in the night sky. |
| 1 | Jupiter is the fifth planet from the Sun and the largest in the Solar System. It is a gas giant with a mass one-thousandth that of the Sun, but two-and-a-half times that of all the other planets in the Solar System combined. Jupiter is one of the brightest objects visible to the naked eye in the night sky, and has been known to ancient civilizations since before recorded history. Summarize This What we learned is that Jupiter | is the fifth planet from the Sun and the largest in the Solar System. It is a gas giant with a mass one-thousandth that of the Sun, but two-and-a-half times that of all the other planets combined. It is easily visible to the naked eye and has been known since ancient times. |
| 2 | Jupiter is the fifth planet from the Sun and the largest in the Solar System. It is a gas giant with a mass one-thousandth that of the Sun, but two-and-a-half times that of all the other planets in the Solar System combined. Jupiter is one of the brightest objects visible to the naked eye in the night sky, and has been known to ancient civilizations since before recorded history. Summarize This Top 3 Facts We Learned: | 1. Jupiter is the fifth planet from the Sun and the largest in the Solar System. 2. It is a gas giant with a mass one-thousandth that of the Sun... 3. Jupiter has been visible to the naked eye since ancient times ... |
Prompt Templates
A prompt template is a pre-defined recipe for a prompt that can be stored and reused as needed, to drive more consistent user experiences at scale. In its simplest form, it is simply a collection of prompt examples like this one from OpenAI that provides both the interactive prompt components (user and system messages) and the API-driven request format - to support reuse.
In its more complex form, like this example from LangChain, it contains placeholders that can be replaced with data from a variety of sources (user input, system context, external data sources etc.) to generate a prompt dynamically. This allows us to create a library of reusable prompts that can be used to drive consistent user experiences programmatically at scale.
Finally, the real value of templates lies in the ability to create and publish prompt libraries for vertical application domains - where the prompt template is now optimized to reflect application-specific context or examples that make the responses more relevant and accurate for the targeted user audience. The Prompts For Edu repository is a great example of this approach, curating a library of prompts for the education domain with emphasis on key objectives like lesson planning, curriculum design, student tutoring etc.
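In its simplest form, a template is a string with placeholders; here plain `str.format` stands in for a templating library like LangChain's `PromptTemplate`. The template text and field names are illustrative.

```python
# Minimal sketch of a prompt template with placeholders, rendered with
# plain str.format (a templating library would add validation, partials,
# composition, etc.).

LESSON_PLAN_TEMPLATE = (
    "You are a teacher creating materials for {grade} students.\n"
    "Write a lesson plan on {topic} with {num_activities} activities."
)

def render(template, **values):
    return template.format(**values)

prompt = render(
    LESSON_PLAN_TEMPLATE,
    grade="5th-grade",
    topic="photosynthesis",
    num_activities=3,
)
print(prompt)
```

Stored in a library keyed by use case, such templates let every educator in the product get consistently structured prompts instead of writing them from scratch.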
Supporting Content
If we think about prompt construction as having an instruction (task) and a target (primary content), then secondary content is additional context we provide to influence the output in some way. It could be tuning parameters, formatting instructions, topic taxonomies etc. that help the model tailor its response to suit the desired user objectives or expectations.
For example: Given a course catalog with extensive metadata (name, description, level, metadata tags, instructor etc.) on all the available courses in the curriculum:
- we can define an instruction to "summarize the course catalog for Fall 2023"
- we can use the primary content to provide a few examples of the desired output
- we can use the secondary content to identify the top 5 "tags" of interest.
Now, the model can provide a summary in the format shown by the few examples - but if a result has multiple tags, it can prioritize the 5 tags identified in secondary content.
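The three parts of the course-catalog example above (instruction, primary content, secondary content) can be assembled into a single prompt string. All names and sample data here are illustrative.

```python
# Sketch of combining the three prompt parts from the course-catalog
# example: an instruction, primary content (output examples), and
# secondary content (priority tags).

def build_catalog_prompt(instruction, examples, top_tags):
    parts = [instruction, "Examples of the desired output:"]
    parts.extend(f"- {ex}" for ex in examples)
    parts.append(
        "Prioritize these tags when a course has several: " + ", ".join(top_tags)
    )
    return "\n".join(parts)

prompt = build_catalog_prompt(
    instruction="Summarize the course catalog for Fall 2023.",
    examples=["CS101 (Beginner): Intro to programming. Tags: coding"],
    top_tags=["coding", "ai", "math", "design", "writing"],
)
print(prompt)
```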
Prompting Best Practices
Now that we know how prompts can be constructed, we can start thinking about how to design them to reflect best practices. We can think about this in two parts - having the right mindset and applying the right techniques.
Prompt Engineering Mindset
Prompt Engineering is a trial-and-error process so keep three broad guiding factors in mind:
Domain Understanding Matters. Response accuracy and relevance is a function of the domain in which that application or user operates. Apply your intuition and domain expertise to customize techniques further. For instance, define domain-specific personalities in your system prompts, or use domain-specific templates in your user prompts. Provide secondary content that reflects domain-specific contexts, or use domain-specific cues and examples to guide the model towards familiar usage patterns.
Model Understanding Matters. We know models are stochastic by nature. But model implementations can also vary in terms of the training dataset they use (pre-trained knowledge), the capabilities they provide (e.g., via API or SDK) and the type of content they are optimized for (e.g, code vs. images vs. text). Understand the strengths and limitations of the model you are using, and use that knowledge to prioritize tasks or build customized templates that are optimized for the model's capabilities.
Iteration & Validation Matters. Models are evolving rapidly, and so are the techniques for prompt engineering. As a domain expert, you may have other context or criteria for your specific application that may not apply to the broader community. Use prompt engineering tools & techniques to "jump start" prompt construction, then iterate and validate the results using your own intuition and domain expertise. Record your insights and create a knowledge base (e.g., prompt libraries) that can be used as a new baseline by others, for faster iterations in the future.
Best Practices
Now let's look at common best practices that are recommended by OpenAI and Azure OpenAI practitioners.
| What | Why |
|---|---|
| Evaluate the latest models. | New model generations are likely to have improved features and quality - but may also incur higher costs. Evaluate them for impact, then make migration decisions. |
| Separate instructions & context | Check if your model/provider defines delimiters to distinguish instructions, primary and secondary content more clearly. This can help models assign weights more accurately to tokens. |
| Be specific and clear | Give more details about the desired context, outcome, length, format, style etc. This will improve both the quality and consistency of responses. Capture recipes in reusable templates. |
| Be descriptive, use examples | Models may respond better to a "show and tell" approach. Start with a zero-shot approach where you give it an instruction (but no examples) then try few-shot as a refinement, providing a few examples of the desired output. Use analogies. |
| Use cues to jumpstart completions | Nudge it towards a desired outcome by giving it some leading words or phrases that it can use as a starting point for the response. |
| Double Down | Sometimes you may need to repeat yourself to the model. Give instructions before and after your primary content, use an instruction and a cue, etc. Iterate & validate to see what works. |
| Order Matters | The order in which you present information to the model may impact the output, even in the learning examples, thanks to recency bias. Try different options to see what works best. |
| Give the model an “out” | Give the model a fallback completion response it can provide if it cannot complete the task for any reason. This can reduce chances of models generating false or fabricated responses. |
As with any best practice, remember that your mileage may vary based on the model, the task and the domain. Use these as a starting point, and iterate to find what works best for you. Constantly re-evaluate your prompt engineering process as new models and tools become available, with a focus on process scalability and response quality.
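Two of the practices above ("separate instructions & context" and "give the model an out") can be sketched in code. This is a minimal illustration; the triple-quote delimiter and the fallback wording are assumptions, not a required format:

```python
# Minimal sketch: separate the instruction from the primary content with
# delimiters, and give the model an explicit fallback ("out").
def build_prompt(instruction: str, content: str) -> str:
    """Combine an instruction and primary content with clear delimiters."""
    return (
        f"{instruction}\n"
        'If you cannot complete the task, reply with "not found".\n'
        f'Text: """{content}"""'
    )

prompt = build_prompt(
    "Summarize the text below in one sentence.",
    "Our startup builds AI tutors for students worldwide.",
)
print(prompt)
```

The resulting string would then be sent as the user prompt (or split across system and user messages, depending on your model and SDK).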
Assignment
Congratulations! You made it to the end of the lesson! It's time to put some of those concepts and techniques to the test with real examples!
For our assignment, we'll be using a Jupyter Notebook with exercises you can complete interactively. You can also extend the Notebook with your own Markdown and Code cells to explore ideas and techniques on your own.
To get started, fork the repo, then
- (Recommended) Launch GitHub Codespaces
- (Alternatively) Clone the repo to your local device and use it with Docker Desktop
- (Alternatively) Open the Notebook with your preferred Notebook runtime environment.
Next, configure your environment variables
- Copy the `.env.copy` file in the repo root to `.env` and fill in the `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_DEPLOYMENT` values. Come back to the Learning Sandbox section to learn how.
Next, open the Jupyter Notebook
- Select the runtime kernel. If using options 1 or 2, simply select the default Python 3.10.x kernel provided by the dev container.
You're all set to run the exercises. Note that there are no right and wrong answers here - just exploring options by trial-and-error and building intuition for what works for a given model and application domain.
For this reason there are no Code Solution segments in this lesson. Instead, the Notebook will have Markdown cells titled "My Solution:" that show one example output for reference.
Knowledge check
Which of the following is a good prompt following some reasonable best practices?
1. Show me an image of a red car
2. Show me an image of a red car of make Volvo and model XC90 parked by a cliff with the sun setting
3. Show me an image of a red car of make Volvo and model XC90
A: 2, it's the best prompt as it provides details on "what" and goes into specifics (not just any car but a specific make and model) and it also describes the overall setting. 3 is next best as it also contains a lot of description.
🚀 Challenge
See if you can leverage the "cue" technique with the prompt: Complete the sentence "Show me an image of a red car of make Volvo and ". What does it respond with, and how would you improve it?
Great Work! Continue Your Learning
Want to learn more about different Prompt Engineering concepts? Go to the continued learning page to find other great resources on this topic.
Head over to Lesson 5 where we will look at advanced prompting techniques!
Creating Advanced prompts
Let's recap some learnings from the previous chapter:
Prompt engineering is the process by which we guide the model towards more relevant responses by providing more useful instructions or context.
There are also two steps to writing prompts: constructing the prompt by providing relevant context, and optimizing it, that is, gradually improving the prompt.
At this point, we have some basic understanding of how to write prompts, but we need to go deeper. In this chapter, you will go from trying out various prompts to understanding why one prompt is better than another. You will learn how to construct prompts following some basic techniques that can be applied to any LLM.
Introduction
In this chapter, we will cover the following topics:
- Extend your knowledge of prompt engineering by applying different techniques to your prompts.
- Configuring your prompts to vary the output.
Learning goals
After completing this lesson, you'll be able to:
- Apply prompt engineering techniques that improve the outcome of your prompts.
- Perform prompting that is either varied or deterministic.
Prompt engineering
Prompt engineering is the process of creating prompts that will produce the desired outcome. There's more to prompt engineering than just writing a text prompt. Prompt engineering is not a formal engineering discipline; it's more a set of techniques that you can apply to get the desired outcome.
An example of a prompt
Let's take a basic prompt like this one:
Generate 10 questions on geography.
In this prompt, you are actually applying a set of different prompt techniques.
Let's break this down.
- Context, you specify it should be about "geography".
- Limiting the output, you want no more than 10 questions.
Limitations of simple prompting
You may or may not get the desired outcome. You will get your questions generated, but geography is a big topic and you may not get what you want due to the following reasons:
- Big topic, you don't know if it's going to be about countries, capitals, rivers and so on.
- Format, what if you wanted the questions to be formatted in a certain way?
As you can see, there's a lot to consider when creating prompts.
So far, we've seen a simple prompt example, but generative AI is capable of much more to help people in a variety of roles and industries. Let's explore some basic techniques next.
Techniques for prompting
First, we need to understand that prompting is an emergent property of an LLM, meaning that this is not a feature that is built into the model but rather something we discover as we use the model.
There are some basic techniques that we can use to prompt an LLM. Let's explore them.
- Zero-shot prompting, this is the most basic form of prompting. It's a single prompt requesting a response from the LLM based solely on its training data.
- Few-shot prompting, this type of prompting guides the LLM by providing 1 or more examples it can rely on to generate its response.
- Chain-of-thought, this type of prompting tells the LLM how to break down a problem into steps.
- Generated knowledge, to improve the response of a prompt, you can provide generated facts or knowledge additionally to your prompt.
- Least to most, like chain-of-thought, this technique is about breaking down a problem into a series of steps and then ask these steps to be performed in order.
- Self-refine, this technique is about critiquing the LLM's output and then asking it to improve.
- Maieutic prompting. What you want here is to ensure the LLM answer is correct and you ask it to explain various parts of the answer. This is a form of self-refine.
Zero-shot prompting
This style of prompting is very simple, it consists of a single prompt. This technique is probably what you're using as you're starting to learn about LLMs. Here's an example:
- Prompt: "What is Algebra?"
- Answer: "Algebra is a branch of mathematics that studies mathematical symbols and the rules for manipulating these symbols."
Few-shot prompting
This style of prompting helps the model by providing a few examples along with the request. It consists of a single prompt with additional task-specific data. Here's an example:
- Prompt: "Write a poem in the style of Shakespeare. Here are a few examples of Shakespearean sonnets.: Sonnet 18: 'Shall I compare thee to a summer's day? Thou art more lovely and more temperate...' Sonnet 116: 'Let me not to the marriage of true minds Admit impediments. Love is not love Which alters when it alteration finds...' Sonnet 132: 'Thine eyes I love, and they, as pitying me, Knowing thy heart torment me with disdain,...' Now, write a sonnet about the beauty of the moon."
- Answer: "Upon the sky, the moon doth softly gleam, In silv'ry light that casts its gentle grace,..."
Examples provide the LLM with the context, format or style of the desired output. They help the model understand the specific task and generate more accurate and relevant responses.
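In code, a few-shot prompt can be packaged as a list of chat messages where example input/output pairs precede the real request. The role names follow the common chat-completions format; the sentiment-classification task below is an illustrative assumption:

```python
# Few-shot prompting as chat messages: a task description, two worked
# examples (user/assistant pairs), and then the real request.
few_shot_messages = [
    {"role": "system", "content": "Classify the sentiment of each review as positive or negative."},
    # worked examples the model can imitate:
    {"role": "user", "content": "Review: 'I loved this course!'"},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Review: 'The lessons were confusing.'"},
    {"role": "assistant", "content": "negative"},
    # the actual request the model should answer:
    {"role": "user", "content": "Review: 'Clear examples and great pacing.'"},
]
print(len(few_shot_messages))
```

This message list would then be passed to your model's chat completion call.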
Chain-of-thought
Chain-of-thought is a very interesting technique as it's about taking the LLM through a series of steps. The idea is to instruct the LLM in such a way that it understands how to do something. Consider the following example, with and without chain-of-thought:
- Prompt: "Alice has 5 apples, throws 3 apples, gives 2 to Bob and Bob gives one back, how many apples does Alice have?"
- Answer: 5
The LLM answers 5, which is incorrect. The correct answer is 1 apple, given the calculation (5 - 3 - 2 + 1 = 1).
So how can we teach the LLM to do this correctly?
Let's try chain-of-thought. Applying chain-of-thought means:
- Give the LLM a similar example.
- Show the calculation, and how to calculate it correctly.
- Provide the original prompt.
Here's how:
- Prompt: "Lisa has 7 apples, throws 1 apple, gives 4 apples to Bart and Bart gives one back:
7 - 1 = 6
6 - 4 = 2
2 + 1 = 3
Alice has 5 apples, throws 3 apples, gives 2 to Bob and Bob gives one back, how many apples does Alice have?"
- Answer: 1
Note how we write substantially longer prompts with another example, a calculation and then the original prompt and we arrive at the correct answer 1.
As you can see, chain-of-thought is a very powerful technique.
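The worked example above can be double-checked programmatically. The sketch below simply verifies the arithmetic in both the teaching example (Lisa) and the original question (Alice):

```python
# Verify the apple arithmetic used in the chain-of-thought example.
def apple_steps(start: int, thrown: int, given: int, returned: int) -> list:
    """Return the intermediate results of the apple calculation."""
    steps = [start - thrown]          # after throwing some away
    steps.append(steps[-1] - given)   # after giving some away
    steps.append(steps[-1] + returned)  # after getting one back
    return steps

example = apple_steps(7, 1, 4, 1)   # Lisa's teaching example
question = apple_steps(5, 3, 2, 1)  # Alice's original question
print(example, question)
```

The final values confirm the prompt's numbers: Lisa ends with 3 apples, Alice with 1.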
Generated knowledge
Many times when you want to construct a prompt, you want to do so using your own company's data. You want part of the prompt to be from the company and the other part should be the actual prompt you're interested in.
As an example, this is what your prompt then can look like if you're in the insurance business:
{{company}}: {{company_name}}
{{products}}:
{{products_list}}
Please suggest an insurance given the following budget and requirements:
Budget: {{budget}}
Requirements: {{requirements}}
Above, you see how the prompt is constructed using a template. In the template there's a number of variables, denoted by {{variable}}, that will be replaced with actual values from a company API.
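The substitution step can be sketched as follows, assuming the values come from a plain dict rather than a live company API (a shortened version of the template is used for brevity):

```python
# Fill {{variable}} placeholders in a prompt template with actual values.
# In a real app the values would come from a company API; these are
# hard-coded illustrative assumptions.
template = (
    "{{company}}: {{company_name}}\n"
    "Budget: {{budget}}\n"
    "Requirements: {{requirements}}"
)

values = {
    "company": "Insurance company",
    "company_name": "ACME Insurance",
    "budget": "$1000",
    "requirements": "Car, Home, and Life insurance",
}

prompt = template
for key, value in values.items():
    prompt = prompt.replace("{{" + key + "}}", value)

print(prompt)
```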
Here's an example of how the prompt could look like once the variables have been replaced by content from your company:
Insurance company: ACME Insurance
Insurance products (cost per month):
- Car, cheap, 500 USD
- Car, expensive, 1100 USD
- Home, cheap, 600 USD
- Home, expensive, 1200 USD
- Life, cheap, 100 USD
Please suggest an insurance given the following budget and requirements:
Budget: $1000
Requirements: Car, Home, and Life insurance
Running this prompt through an LLM will produce a response like this:
Given the budget and requirements, we suggest the following insurance package from ACME Insurance:
- Car, cheap, 500 USD
- Home, cheap, 600 USD
- Life, cheap, 100 USD
Total cost: $1,200 USD
As you can see, it also suggests the Life insurance, which it shouldn't. This result is an indication that we need to optimize the prompt by changing the prompt to be clearer on what it can allow. After some trial and error, we arrive at the following prompt:
Insurance company: ACME Insurance
Insurance products (cost per month):
- type: Car, cheap, cost: 500 USD
- type: Car, expensive, cost: 1100 USD
- type: Home, cheap, cost: 600 USD
- type: Home, expensive, cost: 1200 USD
- type: Life, cheap, cost: 100 USD
Please suggest an insurance given the following budget and requirements:
Budget: $1000 restrict choice to types: Car, Home
Note how adding type and cost and also using the keyword restrict helps the LLM to understand what we want.
Now we get the following response:
Given the budget and requirements, we suggest the Car, Cheap insurance product which costs 500 USD per month.
The point of this example was to show that even though we're using a basic technique like generated knowledge, we still need to optimize the prompt in most cases to get the desired outcome.
Least-to-most
The idea with Least-to-most prompting is to break down a bigger problem into subproblems. That way, you help guide the LLM on how to "conquer" the bigger problem. A good example could be for data science where you can ask the LLM to divide up a problem like so:
Prompt: How to perform data science in 5 steps?
With your AI assistant answering with:
- Collect data
- Clean data
- Analyze data
- Plot data
- Present data
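Least-to-most prompting can be sketched as a loop where each subproblem's answer is appended to the context for the next one. `ask_llm` below is a hypothetical placeholder, not a real API call:

```python
# Least-to-most: solve subproblems in order, feeding earlier answers
# into the prompt for later steps. ask_llm is a placeholder; a real
# implementation would call a model API.
def ask_llm(prompt: str) -> str:
    return f"answer to: {prompt.splitlines()[-1]}"

steps = ["Collect data", "Clean data", "Analyze data", "Plot data", "Present data"]
context = "Goal: perform data science on our sales dataset."
answers = []
for step in steps:
    prompt = context + "\n" + "\n".join(answers) + "\n" + f"Now: {step}"
    answers.append(ask_llm(prompt))
print(answers[-1])
```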
Self-refine, critique the results
With generative AIs and LLMs, you can't blindly trust the output; you need to verify it. After all, the LLM is just presenting the next most likely thing to say, not necessarily what's correct. Therefore, a good idea is to ask the LLM to critique itself, which leads us to the self-refine technique.
How it works is that you follow the following steps:
- Initial prompt asking the LLM to solve a problem
- LLM answers
- You critique the answer and ask the AI to improve
- LLM answers again, this time considering the critique and suggesting improvements
You can repeat this process as many times as you want.
Here's an example using this technique:
Prompt: "Create a Python Web API with routes products and customers"
AI Answer:
import flask
app = Flask(__name__)
@app.route('/products')
def products():
return 'Products'
@app.route('/customers')
def customers():
return 'Customers'
Prompt: suggest 3 improvements of the above code
AI Answer:
Sure, here are three suggestions to improve the code:
Import the correct module: The first line should import Flask instead of flask. The correct import statement should be from flask import Flask.
Add a main function: Add a if __name__ == '__main__': block to the end of the file to run the application when the file is executed.
Use f-strings: Use f-strings to format the return statements in the route functions. For example, return 'Products' can be changed to return f'This is the {request.path} page'.
from flask import Flask, request
app = Flask(__name__)
@app.route('/products')
def products():
return f'This is the {request.path} page'
@app.route('/customers')
def customers():
return f'This is the {request.path} page'
if __name__ == '__main__':
app.run()
As you can see, the AI improved its first suggested code by critiquing its own first answer.
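The self-refine steps can be sketched as a loop. `ask_llm` below is a hypothetical placeholder for a real model call, and the critique wording is an illustrative assumption:

```python
# Self-refine as a loop: solve, critique, improve, repeat.
# ask_llm is a placeholder; a real implementation would call a model API.
def ask_llm(prompt: str) -> str:
    return "improved: " + prompt[-40:]

answer = ask_llm("Create a Python Web API with routes products and customers")
for _ in range(3):  # repeat the critique/improve cycle as many times as you want
    critique = "Suggest 3 improvements of the following and rewrite it:\n" + answer
    answer = ask_llm(critique)
print(answer)
```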
Maieutic prompting
Maieutic prompting is a technique that is similar to self-refine, but it's more about asking the LLM to explain itself. The goal is to reduce inconsistencies in the LLM's output to ensure it arrives at the correct answer. The workflow to follow is:
- Ask the LLM to answer a question
- For each part of the answer, ask the LLM to explain it more in depth.
- If there are inconsistencies, discard the parts that are inconsistent.
Repeat 2 and 3 until you've gone through all the parts and you're satisfied with the answer.
Here's an example prompt:
prompt: How can I create a crisis plan to mitigate a pandemic in 5 steps?

LLM answer:

1. Identify the risks
2. Identify the stakeholders
3. Identify the resources
4. Identify the actions
5. Identify the communication channels
It's identified 5 steps, but can we determine if this is correct? Let's ask the LLM to explain each step:
prompt: Explain the first step in more detail, what are the risks in detail with a pandemic?
The risks are the following:
- Loss of life
- Loss of business
- Loss of reputation
- Loss of customers
- Loss of employees
At this point, we want to make sure the LLM is correct so we ask it to explain the first risk in more detail and hope it's consistent with the answer above:
prompt: In a Pandemic, which is the biggest risk and why?
The biggest risk is loss of life because it's the most important thing to protect.
What are the two biggest risks in a pandemic?
The two biggest risks are loss of life and loss of business.
At this point, the LLM is consistent and mentions "life" and "business" as the top two risks. We can now continue to the next step and feel fairly confident. However, we shouldn't trust the LLM blindly, we should always verify the output.
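The maieutic workflow above can be sketched as a loop that keeps only the parts whose explanations are consistent. Both `ask_llm` and the `consistent` check below are hypothetical placeholders:

```python
# Maieutic prompting: ask for an answer, ask the model to explain each part,
# and discard parts whose explanations are inconsistent.
def ask_llm(prompt: str) -> str:
    # placeholder: a real implementation would call a model API
    return "explanation of " + prompt

def consistent(part: str, explanation: str) -> bool:
    # placeholder: real logic might re-ask the model or compare answers
    return part in explanation

parts = ["Identify the risks", "Identify the stakeholders", "Identify the resources"]
kept = []
for part in parts:
    explanation = ask_llm(f"Explain this step in more detail: {part}")
    if consistent(part, explanation):
        kept.append(part)
print(len(kept))
```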
Vary your output
LLMs are nondeterministic by nature, meaning that you will get different results each time you run the same prompt. Try the following prompt for example:
"Generate code for a Python Web API"
# Import necessary modules
import flask
from flask import request, jsonify
# Create a Flask app
app = flask.Flask(__name__)
# Create a route for the API
@app.route('/api', methods=['GET'])
def api():
# Get the data from the request
data = request.args
# Process the data
result = process_data(data)
# Return the result as JSON
return jsonify(result)
# Function to process the data
def process_data(data):
# Do something with the data
result = data + 1
# Return the result
return result
# Run the app
if __name__ == '__main__':
app.run()
Running the same prompt again generates a slightly different response:
#import necessary packages
import flask
from flask import request, jsonify
#create the Flask application
app = flask.Flask(__name__)
app.config["DEBUG"] = True
#create a list of books
books = [
{'id': 0,
'title': 'A Fire Upon the Deep',
'author': 'Vernor Vinge',
'first_sentence': 'The coldsleep itself was dreamless.',
'year_published': '1992'},
{'id': 1,
'title': 'The Ones Who Walk Away From Omelas',
'author': 'Ursula K. Le Guin',
'first_sentence': 'With a clamor of bells that set the swallows soaring, the Festival of Summer came to the city Omelas, bright-towered by the sea.',
'published': '1973'},
{'id': 2,
'title': 'Dhalgren',
'author': 'Samuel R. Delany',
'first_sentence': 'to wound the autumnal city.',
'published': '1975'}
]
#create an endpoint for the API
@app.route('/', methods=['GET'])
def home():
return '''<h1>Books API</h1>
<p>A prototype API for retrieving books.</p>'''
#create an endpoint to return all books
@app.route('/api/v1/resources/books/all', methods=['GET'])
def api_all():
return jsonify(books)
#create an endpoint to return a single book
@app.route('/api/v1/resources/books', methods=['GET'])
def api_id():
#check if an ID was provided as part of the URL
#if ID is provided, assign it to a variable
#if no ID is provided, display an error
if 'id' in request.args:
id = int(request.args['id'])
else:
return "Error: No id field provided. Please specify an id."
#create an empty list for our results
results = []
#loop through the data and match results that fit the requested ID
#IDs are unique, but other fields might return many results
for book in books:
if book['id'] == id:
results.append(book)
#use the jsonify function from Flask to convert our list of
#Python dictionaries to the JSON format
return jsonify(results)
app.run()
So is the varied output a problem?
It depends on what you're trying to do. If you want a specific response, then it's a problem. If you're OK with varied output, like "Generate any 3 questions on geography", then it's not a problem.
Using temperature to vary your output
Ok, so we've decided we want to limit the output to be more predictable, that is more deterministic. How do we do that?
Temperature is a value between 0 and 1, where 0 is the most deterministic and 1 is the most varied. The default value is 0.7. Let's see what happens with two runs of the same prompt with temperature set to 0.1:
"Generate code for a Python Web API"
#import necessary libraries
import flask
from flask import request, jsonify
#create an instance of the Flask class
app = flask.Flask(__name__)
#create an endpoint for the API
@app.route('/api/v1/endpoint', methods=['GET'])
def api_endpoint():
#get the request data
data = request.get_json()
#process the data
result = process_data(data)
#return the result
return jsonify(result)
#function to process the data
def process_data(data):
#process the data
result = {'result': 'success'}
#return the result
return result
#run the Flask app
if __name__ == '__main__':
app.run()
Running the prompt again gives us this outcome:
#import necessary libraries
import flask
from flask import request, jsonify
#create a Flask app
app = flask.Flask(__name__)
#create a route for the API
@app.route('/api', methods=['GET'])
def api_call():
#get the data from the request
data = request.get_json()
#process the data
result = process_data(data)
#return the result
return jsonify(result)
#define the process_data function
def process_data(data):
#do something with the data
result = data + 1
#return the result
return result
#run the app
if __name__ == '__main__':
app.run()
There's only a tiny difference between these two outputs. Let's do the opposite this time, let's set the temperature to 0.9:
# Import necessary libraries
import flask
from flask import request, jsonify
# Create a Flask app
app = flask.Flask(__name__)
# Create a route for the API
@app.route('/api', methods=['GET'])
def api_call():
# Get the data from the request
data = request.args
# Process the data
result = process_data(data)
# Return the result
return jsonify(result)
# Function to process the data
def process_data(data):
# Do something with the data
result = data + 1
# Return the result
return result
# Run the app
if __name__ == '__main__':
app.run()
and the second attempt at 0.9 as temperature value:
import flask
from flask import request, jsonify
# create the Flask app
app = flask.Flask(__name__)
app.config['DEBUG'] = True
# create some test data
books = [
{'id': 0, 'title': 'A Fire Upon The Deep', 'author': 'Vernor Vinge', 'first_sentence': 'The coldsleep itself was dreamless.', 'year_published': '1992'},
{'id': 1, 'title': 'The Ones Who Walk Away From Omelas', 'author': 'Ursula K. Le Guin', 'first_sentence': 'With a clamor of bells that set the swallows soaring, the Festival of Summer came to the city Omelas, bright-towered by the sea.', 'published': '1973'},
{'id': 2, 'title': 'Dhalgren', 'author': 'Samuel R. Delany', 'first_sentence': 'to wound the autumnal city.', 'published': '1975'}
]
# create an endpoint
@app.route('/', methods=['GET'])
def home():
return '''<h1>Welcome to our book API!</h1>'''
@app.route('/api/v1/resources/books
As you can see, the results couldn't be more varied.
Note that there are more parameters you can change to vary the output, like top-k, top-p, repetition penalty, length penalty and diversity penalty, but these are outside the scope of this curriculum.
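In code, temperature is just another parameter on the completion call. The sketch below builds the request as a dict so the parameters are easy to inspect; the model name is an illustrative assumption, and the commented-out call uses the legacy `openai` interface shown elsewhere in this curriculum:

```python
# Temperature controls how deterministic the output is: values near 0 give
# near-identical completions across runs, values near 1 give varied ones.
request = {
    "model": "gpt-3.5-turbo",  # illustrative; use your own model or deployment
    "messages": [{"role": "user", "content": "Generate code for a Python Web API"}],
    "temperature": 0.1,  # try 0.9 to get much more varied output
}
# completion = openai.ChatCompletion.create(**request)  # requires an API key
print(request["temperature"])
```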
Good practices
There are many practices you can apply to try to get what you want. You will find your own style as you use prompting more and more.
Additionally to the techniques we've covered, there are some good practices to consider when prompting an LLM.
Here are some good practices to consider:
- Specify context. Context matters, the more you can specify like domain, topic, etc. the better.
- Limit the output. If you want a specific number of items or a specific length, specify it.
- Specify both what and how. Remember to mention both what you want and how you want it, for example "Create a Python Web API with routes products and customers, divide it into 3 files".
- Use templates. Often, you will want to enrich your prompts with data from your company. Use templates to do this. Templates can have variables that you replace with actual data.
- Spell correctly. LLMs may still understand misspelled prompts, but spelling correctly tends to produce better responses.
Assignment
Here's code in Python showing how to build a simple API using Flask:
from flask import Flask, request
app = Flask(__name__)
@app.route('/')
def hello():
name = request.args.get('name', 'World')
return f'Hello, {name}!'
if __name__ == '__main__':
app.run()
Use an AI assistant like GitHub Copilot or ChatGPT and apply the "self-refine" technique to improve the code.
Solution
Please attempt to solve the assignment by adding suitable prompts to the code.
[!TIP] When phrasing a prompt to ask for improvements, it's a good idea to limit how many improvements you ask for. You can also ask for improvements in a certain area, for example architecture, performance, security, etc.
Knowledge check
Why would I use chain-of-thought prompting? Show me 1 correct response and 2 incorrect responses.
- A, To teach the LLM how to solve a problem.
- B, To teach the LLM to find errors in code.
- C, To instruct the LLM to come up with different solutions.
A: A, because chain-of-thought is about showing the LLM how to solve a problem by providing it with a series of steps, and similar problems and how they were solved.
🚀 Challenge
You just used the self-refine technique in the assignment. Take any program you built and consider what improvements you would want to apply to it. Now use the self-refine technique to apply the proposed changes. What did you think of the result, better or worse?
Great Work! Continue Your Learning
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!
Head over to Lesson 6 where we will apply our knowledge of Prompt Engineering by building text generation apps
Building Text Generation Applications
You've seen so far through this curriculum that there are core concepts like prompts and even a whole discipline called "prompt engineering". Many tools you can interact with like ChatGPT, Office 365, Microsoft Power Platform and more, support you using prompts to accomplish something.
For you to add such an experience to an app, you need to understand concepts like prompts, completions and choose a library to work with. That's exactly what you'll learn in this chapter.
Introduction
In this chapter, you will:
- Learn about the openai library and its core concepts.
- Build a text generation app using openai.
- Understand how to use concepts like prompt, temperature, and tokens to build a text generation app.
Learning goals
At the end of this lesson, you'll be able to:
- Explain what a text generation app is.
- Build a text generation app using openai.
- Configure your app to use more or fewer tokens and also change the temperature, for a varied output.
What is a text generation app?
Normally when you build an app it has some kind of interface like the following:
- Command-based. Console apps are typical apps where you type a command and it carries out a task. For example, `git` is a command-based app.
- User interface (UI). Some apps have graphical user interfaces (GUIs) where you click buttons, input text, select options and more.
Console and UI apps are limited
Compare it to a command-based app where you type a command:
- It's limited. You can't just type any command, only the ones that the app supports.
- Language specific. Some apps support many languages, but by default the app is built for a specific language, even if you can add more language support.
Benefits of text generation apps
So how is a text generation app different?
In a text generation app, you have more flexibility, you're not limited to a set of commands or a specific input language. Instead, you can use natural language to interact with the app. Another benefit is that you're already interacting with a data source that has been trained on a vast corpus of information, whereas a traditional app might be limited on what's in a database.
What can I build with a text generation app?
There are many things you can build. For example:
- A chatbot. A chatbot answering questions about topics, like your company and its products could be a good match.
- Helper. LLMs are great at things like summarizing text, getting insights from text, producing text like resumes and more.
- Code assistant. Depending on the language model you use, you can build a code assistant that helps you write code. For example, you can use a product like GitHub Copilot as well as ChatGPT to help you write code.
How can I get started?
Well, you need to find a way to integrate with an LLM which usually entails the following two approaches:
- Use an API. Here you're constructing web requests with your prompt and get generated text back.
- Use a library. Libraries help encapsulate the API calls and make them easier to use.
Libraries/SDKs
There are a few well known libraries for working with LLMs like:
- openai, this library makes it easy to connect to your model and send in prompts.
Then there are libraries that operate on a higher level like:
- Langchain. Langchain is well known and supports Python.
- Semantic Kernel. Semantic Kernel is a library by Microsoft supporting the languages C#, Python, and Java.
First app using openai
Let's see how we can build our first app, what libraries we need, how much is required and so on.
Install openai
There are many libraries out there for interacting with OpenAI or Azure OpenAI. It's possible to use numerous programming languages as well like C#, Python, JavaScript, Java and more. We've chosen to use the openai Python library, so we'll use pip to install it.
pip install openai
Create a resource
You need to carry out the following steps:
Create an account on Azure https://azure.microsoft.com/free/.
Gain access to Azure OpenAI. Go to https://learn.microsoft.com/azure/ai-services/openai/overview#how-do-i-get-access-to-azure-openai and request access.
[!NOTE] At the time of writing, you need to apply for access to Azure OpenAI.
Install Python https://www.python.org/
Create an Azure OpenAI Service resource. See this guide for how to create a resource.
Locate API key and endpoint
At this point, you need to tell your openai library what API key to use. To find your API key, go to "Keys and Endpoint" section of your Azure OpenAI resource and copy the "Key 1" value.
Now that you have this information copied, let's instruct the libraries to use it.
[!NOTE] It's worth separating your API key from your code. You can do so by using environment variables.
- Set the environment variable `OPENAI_API_KEY` to your API key: `export OPENAI_API_KEY='sk-...'`
Setup configuration Azure
If you're using Azure OpenAI, here's how you setup configuration:
import os
import openai

openai.api_type = 'azure'
openai.api_key = os.environ["OPENAI_API_KEY"]
openai.api_version = '2023-05-15'
openai.api_base = os.getenv("API_BASE")
Above we're setting the following:
- `api_type` to `azure`. This tells the library to use Azure OpenAI and not OpenAI.
- `api_key`, this is your API key found in the Azure Portal.
- `api_version`, this is the version of the API you want to use. At the time of writing, the latest version is `2023-05-15`.
- `api_base`, this is the endpoint of the API. You can find it in the Azure Portal next to your API key.
[!NOTE] `os.getenv` is a function that reads environment variables. You can use it to read environment variables like `OPENAI_API_KEY` and `API_BASE`. Set these environment variables in your terminal or by using a library like `dotenv`.
Generate text
The way to generate text is to use the Completion class. Here's an example:
prompt = "Complete the following: Once upon a time there was a"
completion = openai.Completion.create(model="davinci-002", prompt=prompt)
print(completion.choices[0].text)
In the above code, we create a completion object and pass in the model we want to use and the prompt. Then we print the generated text.
Chat completions
So far, you've seen how we've been using Completion to generate text. But there's another class called ChatCompletion that is more suited for chatbots. Here's an example of using it:
import openai
openai.api_key = "sk-..."
completion = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hello world"}])
print(completion.choices[0].message.content)
More on this functionality in an upcoming chapter.
Exercise - your first text generation app
Now that you've learned how to set up and configure the openai library, it's time to build your first text generation app. To build your app, follow these steps:

1. Create a virtual environment and install openai:

   ```bash
   python -m venv venv
   source venv/bin/activate
   pip install openai
   ```

   > [!NOTE]
   > If you're using Windows, type `venv\Scripts\activate` instead of `source venv/bin/activate`.

   > [!NOTE]
   > Locate your Azure OpenAI key by going to https://portal.azure.com/, search for `Open AI`, select the `Open AI resource`, then select `Keys and Endpoint` and copy the `Key 1` value.

2. Create an `app.py` file and give it the following code:

   ```python
   import openai

   openai.api_key = "<replace this value with your open ai key or Azure OpenAI key>"
   openai.api_type = 'azure'
   openai.api_version = '2023-05-15'
   openai.api_base = "<endpoint found in Azure Portal where your API key is>"
   deployment_name = "<deployment name>"

   # add your completion code
   prompt = "Complete the following: Once upon a time there was a"
   messages = [{"role": "user", "content": prompt}]

   # make completion; for Azure OpenAI, pass the deployment via `engine`
   completion = openai.ChatCompletion.create(engine=deployment_name, messages=messages)

   # print response
   print(completion.choices[0].message.content)
   ```

   > [!NOTE]
   > If you're using Azure OpenAI, you need to set the `api_type` to `azure` and set the `api_key` to your Azure OpenAI key.

3. You should see an output like the following:

   ```output
   very unhappy _____. Once upon a time there was a very unhappy mermaid.
   ```
Different types of prompts, for different things
Now you've seen how to generate text using a prompt. You even have a program up and running that you can modify and change to generate different types of text.
Prompts can be used for all sorts of tasks. For example:
- Generate a type of text. For example, you can generate a poem, questions for a quiz, etc.
- Look up information. You can use prompts to look for information, as in the example "What does CORS mean in web development?".
- Generate code. You can use prompts to generate code, for example a regular expression to validate emails, or even an entire program, such as a web app.
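To make these three task types concrete, here's a minimal sketch that packages one illustrative prompt per task type as a chat message. The prompt strings and the `build_request` helper are invented for this example:

```python
# Illustrative prompt templates for the three task types above (hypothetical examples).
task_prompts = {
    "generate text": "Write a four-line poem about autumn.",
    "look up information": "What does CORS mean in web development?",
    "generate code": "Write a regular expression that validates email addresses.",
}

def build_request(task: str) -> dict:
    """Package a task's prompt as a chat message, ready to send to a chat model."""
    return {"role": "user", "content": task_prompts[task]}

print(build_request("generate code"))
```

The same message shape works for all three tasks; only the prompt text changes.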
A more practical use case: a recipe generator
Imagine you have ingredients at home and you want to cook something. For that, you need a recipe. One way to find recipes is to use a search engine; another is to ask an LLM.
You could write a prompt like so:
"Show me 5 recipes for a dish with the following ingredients: chicken, potatoes, and carrots. Per recipe, list all the ingredients used"
Given the above prompt, you might get a response similar to:
1. Roasted Chicken and Vegetables:
Ingredients:
- 4 chicken thighs
- 2 potatoes, cut into cubes
- 2 carrots, cut into cubes
- 2 tablespoons olive oil
- 2 cloves garlic, minced
- 1 teaspoon dried thyme
- 1 teaspoon dried oregano
- Salt and pepper, to taste
2. Chicken and Potato Stew:
Ingredients:
- 2 tablespoons olive oil
- 1 onion, diced
- 2 cloves garlic, minced
- 2 chicken breasts, cut into cubes
- 2 potatoes, cut into cubes
- 2 carrots, cut into cubes
- 1 teaspoon dried oregano
- 1 teaspoon dried thyme
- 1 cup chicken broth
- Salt and pepper, to taste
3. Chicken and Potato Bake:
Ingredients:
- 2 tablespoons olive oil
- 2 chicken breasts, cut into cubes
- 2 potatoes, cut into cubes
- 2 carrots, cut into cubes
- 1 onion, diced
- 2 cloves garlic, minced
- 1 teaspoon dried oregano
- 1 teaspoon dried thyme
- 1 cup chicken broth
- Salt and pepper, to taste
4. Chicken and Potato Soup:
Ingredients:
- 2 tablespoons olive oil
- 1 onion, diced
- 2 cloves garlic, minced
- 2 chicken breasts, cut into cubes
- 2 potatoes, cut into cubes
- 2 carrots, cut into cubes
- 1 teaspoon dried oregano
- 1 teaspoon dried thyme
- 4 cups chicken broth
- Salt and pepper, to taste
5. Chicken and Potato Hash:
Ingredients:
- 2 tablespoons olive oil
- 2 chicken breasts, cut into cubes
- 2 potatoes, cut into cubes
- 2 carrots, cut into cubes
- 1 onion, diced
- 2 cloves garlic, minced
- 1 teaspoon dried oregano
This outcome is great: I know what to cook. At this point, useful improvements would be:
- Filtering out ingredients I don't like or am allergic to.
- Producing a shopping list, in case I don't have all the ingredients at home.
For the above cases, let's add an additional prompt:
"Please remove recipes with garlic as I'm allergic and replace it with something else. Also, please produce a shopping list for the recipes, considering I already have chicken, potatoes and carrots at home."
Now you have a new result, namely:
1. Roasted Chicken and Vegetables:
Ingredients:
- 4 chicken thighs
- 2 potatoes, cut into cubes
- 2 carrots, cut into cubes
- 2 tablespoons olive oil
- 1 teaspoon dried thyme
- 1 teaspoon dried oregano
- Salt and pepper, to taste
2. Chicken and Potato Stew:
Ingredients:
- 2 tablespoons olive oil
- 1 onion, diced
- 2 chicken breasts, cut into cubes
- 2 potatoes, cut into cubes
- 2 carrots, cut into cubes
- 1 teaspoon dried oregano
- 1 teaspoon dried thyme
- 1 cup chicken broth
- Salt and pepper, to taste
3. Chicken and Potato Bake:
Ingredients:
- 2 tablespoons olive oil
- 2 chicken breasts, cut into cubes
- 2 potatoes, cut into cubes
- 2 carrots, cut into cubes
- 1 onion, diced
- 1 teaspoon dried oregano
- 1 teaspoon dried thyme
- 1 cup chicken broth
- Salt and pepper, to taste
4. Chicken and Potato Soup:
Ingredients:
- 2 tablespoons olive oil
- 1 onion, diced
- 2 chicken breasts, cut into cubes
- 2 potatoes, cut into cubes
- 2 carrots, cut into cubes
- 1 teaspoon dried oregano
- 1 teaspoon dried thyme
- 4 cups chicken broth
- Salt and pepper, to taste
5. Chicken and Potato Hash:
Ingredients:
- 2 tablespoons olive oil
- 2 chicken breasts, cut into cubes
- 2 potatoes, cut into cubes
- 2 carrots, cut into cubes
- 1 onion, diced
- 1 teaspoon dried oregano
Shopping List:
- Olive oil
- Onion
- Thyme
- Oregano
- Salt
- Pepper
That's your five recipes with no garlic mentioned, plus a shopping list that takes into account what you already have at home.
Exercise - build a recipe generator
Now that we have played out a scenario, let's write code to match it. To do so, follow these steps:
1. Use the existing `app.py` file as a starting point.

2. Locate the `prompt` variable and change its code to the following:

   ```python
   prompt = "Show me 5 recipes for a dish with the following ingredients: chicken, potatoes, and carrots. Per recipe, list all the ingredients used"
   ```

3. If you now run the code, you should see an output similar to:

   ```output
   -Chicken Stew with Potatoes and Carrots: 3 tablespoons oil, 1 onion, chopped, 2 cloves garlic, minced, 1 carrot, peeled and chopped, 1 potato, peeled and chopped, 1 bay leaf, 1 thyme sprig, 1/2 teaspoon salt, 1/4 teaspoon black pepper, 1 1/2 cups chicken broth, 1/2 cup dry white wine, 2 tablespoons chopped fresh parsley, 2 tablespoons unsalted butter, 1 1/2 pounds boneless, skinless chicken thighs, cut into 1-inch pieces
   -Oven-Roasted Chicken with Potatoes and Carrots: 3 tablespoons extra-virgin olive oil, 1 tablespoon Dijon mustard, 1 tablespoon chopped fresh rosemary, 1 tablespoon chopped fresh thyme, 4 cloves garlic, minced, 1 1/2 pounds small red potatoes, quartered, 1 1/2 pounds carrots, quartered lengthwise, 1/2 teaspoon salt, 1/4 teaspoon black pepper, 1 (4-pound) whole chicken
   -Chicken, Potato, and Carrot Casserole: cooking spray, 1 large onion, chopped, 2 cloves garlic, minced, 1 carrot, peeled and shredded, 1 potato, peeled and shredded, 1/2 teaspoon dried thyme leaves, 1/4 teaspoon salt, 1/4 teaspoon black pepper, 2 cups fat-free, low-sodium chicken broth, 1 cup frozen peas, 1/4 cup all-purpose flour, 1 cup 2% reduced-fat milk, 1/4 cup grated Parmesan cheese
   -One Pot Chicken and Potato Dinner: 2 tablespoons olive oil, 1 pound boneless, skinless chicken thighs, cut into 1-inch pieces, 1 large onion, chopped, 3 cloves garlic, minced, 1 carrot, peeled and chopped, 1 potato, peeled and chopped, 1 bay leaf, 1 thyme sprig, 1/2 teaspoon salt, 1/4 teaspoon black pepper, 2 cups chicken broth, 1/2 cup dry white wine
   -Chicken, Potato, and Carrot Curry: 1 tablespoon vegetable oil, 1 large onion, chopped, 2 cloves garlic, minced, 1 carrot, peeled and chopped, 1 potato, peeled and chopped, 1 teaspoon ground coriander, 1 teaspoon ground cumin, 1/2 teaspoon ground turmeric, 1/2 teaspoon ground ginger, 1/4 teaspoon cayenne pepper, 2 cups chicken broth, 1/2 cup dry white wine, 1 (15-ounce) can chickpeas, drained and rinsed, 1/2 cup raisins, 1/2 cup chopped fresh cilantro
   ```

   > [!NOTE]
   > Your LLM is nondeterministic, so you might get different results every time you run the program.
Great, let's see how we can improve things. We want to make the code flexible, so the ingredients and the number of recipes can be changed.
Let's change the code in the following way:
```python
no_recipes = input("No of recipes (for example, 5): ")
ingredients = input("List of ingredients (for example, chicken, potatoes, and carrots): ")

# interpolate the number of recipes and the ingredients into the prompt
prompt = f"Show me {no_recipes} recipes for a dish with the following ingredients: {ingredients}. Per recipe, list all the ingredients used"
```

Taking the code for a test run could look like this:

```output
No of recipes (for example, 5): 3
List of ingredients (for example, chicken, potatoes, and carrots): milk,strawberries

-Strawberry milk shake: milk, strawberries, sugar, vanilla extract, ice cubes
-Strawberry shortcake: milk, flour, baking powder, sugar, salt, unsalted butter, strawberries, whipped cream
-Strawberry milk: milk, strawberries, sugar, vanilla extract
```
Improve by adding filter and shopping list
We now have a working app capable of producing recipes, and it's flexible as it relies on inputs from the user, both the number of recipes and the ingredients used.
To further improve it, we want to add the following:
Filter out ingredients. We want to be able to filter out ingredients we don't like or are allergic to. To accomplish this change, we can edit our existing prompt and add a filter condition to the end of it like so:

```python
filter = input("Filter (for example, vegetarian, vegan, or gluten-free): ")

prompt = f"Show me {no_recipes} recipes for a dish with the following ingredients: {ingredients}. Per recipe, list all the ingredients used, no {filter}"
```

Above, we add `{filter}` to the end of the prompt and we also capture the filter value from the user.

An example run of the program can now look like so:

```output
No of recipes (for example, 5): 3
List of ingredients (for example, chicken, potatoes, and carrots): onion,milk
Filter (for example, vegetarian, vegan, or gluten-free): no milk

1. French Onion Soup
Ingredients:
- 1 large onion, sliced
- 3 cups beef broth
- 1 cup milk
- 6 slices french bread
- 1/4 cup shredded Parmesan cheese
- 1 tablespoon butter
- 1 teaspoon dried thyme
- 1/4 teaspoon salt
- 1/4 teaspoon black pepper
Instructions:
1. In a large pot, sauté onions in butter until golden brown.
2. Add beef broth, milk, thyme, salt, and pepper. Bring to a boil.
3. Reduce heat and simmer for 10 minutes.
4. Place french bread slices on soup bowls.
5. Ladle soup over bread.
6. Sprinkle with Parmesan cheese.

2. Onion and Potato Soup
Ingredients:
- 1 large onion, chopped
- 2 cups potatoes, diced
- 3 cups vegetable broth
- 1 cup milk
- 1/4 teaspoon black pepper
Instructions:
1. In a large pot, sauté onions in butter until golden brown.
2. Add potatoes, vegetable broth, milk, and pepper. Bring to a boil.
3. Reduce heat and simmer for 10 minutes.
4. Serve hot.

3. Creamy Onion Soup
Ingredients:
- 1 large onion, chopped
- 3 cups vegetable broth
- 1 cup milk
- 1/4 teaspoon black pepper
- 1/4 cup all-purpose flour
- 1/2 cup shredded Parmesan cheese
Instructions:
1. In a large pot, sauté onions in butter until golden brown.
2. Add vegetable broth, milk, and pepper. Bring to a boil.
3. Reduce heat and simmer for 10 minutes.
4. In a small bowl, whisk together flour and Parmesan cheese until smooth.
5. Add to soup and simmer for an additional 5 minutes, or until soup has thickened.
```

As you can see, any recipes with milk in them have been filtered out. But if you're lactose intolerant, you might want to filter out recipes with cheese in them as well, so there's a need to be clear.
Produce a shopping list. We want to produce a shopping list, considering what we already have at home.
For this functionality, we could either try to solve everything in one prompt or we could split it up into two prompts. Let's try the latter approach. Here we're suggesting adding an additional prompt, but for that to work, we need to add the result of the former prompt as context to the latter prompt.
Locate the part in the code that prints out the result from the first prompt and add the following code below:

```python
old_prompt_result = completion.choices[0].message.content
prompt = "Produce a shopping list for the generated recipes and please don't include ingredients that I already have."

new_prompt = f"{old_prompt_result} {prompt}"
messages = [{"role": "user", "content": new_prompt}]

# make completion
completion = openai.ChatCompletion.create(engine=deployment_name, messages=messages, max_tokens=1200)

# print response
print("Shopping list:")
print(completion.choices[0].message.content)
```

Note the following:

1. We're constructing a new prompt by adding the result from the first prompt to the new prompt:

   ```python
   new_prompt = f"{old_prompt_result} {prompt}"
   ```

2. We make a new request, but we also consider the number of tokens we asked for in the first prompt, so this time we say `max_tokens` is 1200:

   ```python
   completion = openai.ChatCompletion.create(engine=deployment_name, messages=messages, max_tokens=1200)
   ```

Taking this code for a spin, we now arrive at the following output:

```output
No of recipes (for example, 5): 2
List of ingredients (for example, chicken, potatoes, and carrots): apple,flour
Filter (for example, vegetarian, vegan, or gluten-free): sugar

-Apple and flour pancakes: 1 cup flour, 1/2 tsp baking powder, 1/2 tsp baking soda, 1/4 tsp salt, 1 tbsp sugar, 1 egg, 1 cup buttermilk or sour milk, 1/4 cup melted butter, 1 Granny Smith apple, peeled and grated
-Apple fritters: 1-1/2 cups flour, 1 tsp baking powder, 1/4 tsp salt, 1/4 tsp baking soda, 1/4 tsp nutmeg, 1/4 tsp cinnamon, 1/4 tsp allspice, 1/4 cup sugar, 1/4 cup vegetable shortening, 1/4 cup milk, 1 egg, 2 cups shredded, peeled apples

Shopping list:
-Flour, baking powder, baking soda, salt, sugar, egg, buttermilk, butter, apple, nutmeg, cinnamon, allspice
```
Improve your setup
What we have so far is code that works, but there are some tweaks we can make to improve things further:
Separate secrets from code, like the API key. Secrets do not belong in code and should be stored in a secure location. To separate secrets from code, we can use environment variables and libraries like `python-dotenv` to load them from a file. Here's how that would look in code:

1. Create a `.env` file with the following content:

   ```bash
   OPENAI_API_KEY=sk-...
   ```

   > [!NOTE]
   > For Azure, you need to set the following environment variables:
   >
   > ```bash
   > OPENAI_API_TYPE=azure
   > OPENAI_API_VERSION=2023-05-15
   > OPENAI_API_BASE=<replace>
   > ```

2. In code, you would load the environment variables like so:

   ```python
   import os
   from dotenv import load_dotenv

   load_dotenv()
   openai.api_key = os.environ["OPENAI_API_KEY"]
   ```
A word on token length. We should consider how many tokens we need to generate the text we want. Tokens cost money, so where possible, we should try to be economical with the number of tokens we use. For example, can we phrase the prompt so that we can use fewer tokens?

To change the number of tokens used, you can use the `max_tokens` parameter. For example, if you want to use 100 tokens, you would do:

```python
completion = openai.ChatCompletion.create(engine=deployment_name, messages=messages, max_tokens=100)
```

Experimenting with temperature. Temperature is something we haven't mentioned so far, but it's an important factor in how our program performs. The higher the temperature value, the more random the output will be. Conversely, the lower the temperature value, the more predictable the output will be. Consider whether you want variation in your output or not.

To alter the temperature, you can use the `temperature` parameter. For example, if you want to use a temperature of 0.5, you would do:

```python
completion = openai.ChatCompletion.create(engine=deployment_name, messages=messages, temperature=0.5)
```

> [!NOTE]
> The closer to 1.0, the more varied the output.
Assignment
For this assignment, you can choose what to build.
Here are some suggestions:
- Tweak the recipe generator app to improve it further. Play around with temperature values and the prompts to see what you can come up with.
- Build a "study buddy". This app should be able to answer questions about a topic, for example Python. You could have prompts like "What is a certain topic in Python?", or a prompt that says "Show me code for a certain topic", etc.
- History bot: make history come alive. Instruct the bot to play a certain historical character and ask it questions about its life and times.
Solution
Study buddy
Below is a starter prompt, see how you can use it and tweak it to your liking.
```text
"You're an expert on the Python language

Suggest a beginner lesson for Python in the following format:

Format:
- concepts:
- brief explanation of the lesson:
- exercise in code with solutions"
```
History bot
Here are some prompts you could be using:
- "You are Abe Lincoln, tell me about yourself in 3 sentences, and respond using grammar and words like Abe would have used"
- "You are Abe Lincoln, respond using grammar and words like Abe would have used: Tell me about your greatest accomplishments, in 300 words"
Knowledge check
What does the concept temperature do?
- It controls how random the output is.
- It controls how big the response is.
- It controls how many tokens are used.
🚀 Challenge
When working on the assignment, try to vary the temperature, try setting it to 0, 0.5, and 1. Remember that 0 is the least varied and 1 is the most. What value works best for your app?
Great Work! Continue Your Learning
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!
Head over to Lesson 7 where we will look at how to build chat applications!
Building Generative AI-Powered Chat Applications
Now that we've seen how we can build text-generation apps, let's look into chat applications.
Chat applications have become integrated into our daily lives, offering more than just a means of casual conversation. They're integral parts of customer service, technical support, and even sophisticated advisory systems. It's likely that you've gotten some help from a chat application not too long ago. As we integrate more advanced technologies like generative AI into these platforms, the complexity increases and so do the challenges.
Some questions we need answered are:
- Building the app. How do we efficiently build and seamlessly integrate these AI-powered applications for specific use cases?
- Monitoring. Once deployed, how can we monitor and ensure that the applications are operating at the highest level of quality, both in terms of functionality and adhering to the six principles of responsible AI?
As we move further into an age defined by automation and seamless human-machine interactions, understanding how generative AI transforms the scope, depth, and adaptability of chat applications becomes essential. This lesson will investigate the aspects of architecture that support these intricate systems, delve into the methodologies for fine-tuning them for domain-specific tasks, and evaluate the metrics and considerations pertinent to ensuring responsible AI deployment.
Introduction
This lesson covers:
- Techniques for efficiently building and integrating chat applications.
- How to apply customization and fine-tuning to applications.
- Strategies and considerations to effectively monitor chat applications.
Learning Goals
By the end of this lesson, you'll be able to:
- Describe considerations for building and integrating chat applications into existing systems.
- Customize chat applications for specific use-cases.
- Identify key metrics and considerations to effectively monitor and maintain the quality of AI-powered chat applications.
- Ensure chat applications leverage AI responsibly.
Integrating Generative AI into Chat Applications
Elevating chat applications through generative AI isn't only centered around making them smarter; it's about optimizing their architecture, performance, and user interface to deliver a quality user experience. This involves investigating the architectural foundations, API integrations, and user interface considerations. This section aims to offer you a comprehensive roadmap for navigating these complex landscapes, whether you're plugging them into existing systems or building them as stand-alone platforms.
By the end of this section, you'll be equipped with the expertise needed to efficiently construct and incorporate chat applications.
Chatbot or Chat application?
Before we dive into building chat applications, let's compare 'chatbots' against 'AI-powered chat applications,' which serve distinct roles and functionalities. A chatbot's main purpose is to automate specific conversational tasks, such as answering frequently asked questions or tracking a package. It's typically governed by rule-based logic or complex AI algorithms. In contrast, an AI-powered chat application is a far more expansive environment designed to facilitate various forms of digital communication, such as text, voice, and video chats among human users. Its defining feature is the integration of a generative AI model that simulates nuanced, human-like conversations, generating responses based on a wide variety of input and contextual cues. A generative AI powered chat application can engage in open-domain discussions, adapt to evolving conversational contexts, and even produce creative or complex dialogue.
The table below outlines the key differences and similarities to help us understand their unique roles in digital communication.
| Chatbot | Generative AI-Powered Chat Application |
|---|---|
| Task-Focused and rule based | Context-aware |
| Often integrated into larger systems | May host one or multiple chatbots |
| Limited to programmed functions | Incorporates generative AI models |
| Specialized & structured interactions | Capable of open-domain discussions |
Leveraging pre-built functionalities with SDKs and APIs
When building a chat application, a great first step is to assess what is already out there. Using SDKs and APIs to build chat applications is an advantageous strategy for a variety of reasons. By integrating well-documented SDKs and APIs, you're strategically positioning your application for long-term success, addressing scalability and maintenance concerns.
- Expedites the development process and reduces overhead: Relying on pre-built functionalities instead of the expensive process of building them yourself allows you to focus on other aspects of your application that you may find more important, such as business logic.
- Better performance: When building functionality from scratch, you'll eventually ask yourself "How does it scale? Is this application capable of handling a sudden influx of users?" Well-maintained SDKs and APIs often have built-in solutions for these concerns.
- Easier maintenance: Updates and improvements are easier to manage, as most APIs and SDKs simply require an update to a library when a newer version is released.
- Access to cutting-edge technology: Leveraging models that have been fine-tuned and trained on extensive datasets provides your application with natural language capabilities.
Accessing functionality of an SDK or API typically involves obtaining permission to use the provided services, which is often through the use of a unique key or authentication token. We'll use the OpenAI Python Library to explore what this looks like. You can also try it out on your own in the following notebook for OpenAI or notebook for Azure OpenAI Services for this lesson.
```python
import os
from openai import OpenAI

API_KEY = os.getenv("OPENAI_API_KEY", "")

client = OpenAI(api_key=API_KEY)

chat_completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Suggest two titles for an instructional lesson on chat applications for generative AI."}],
)
```
The above example uses the GPT-3.5 Turbo model to complete the prompt, but notice that the API key is set prior to doing so. You'd receive an error if you didn't set the key.
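One way to surface that missing-key error early, rather than letting the SDK fail mid-request, is to check the environment variable before constructing the client. This is a small illustrative sketch, not part of the lesson's code; the `require_api_key` helper and the `DEMO_API_KEY` variable name are invented for the example:

```python
import os

def require_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Fail fast with a clear message instead of letting the SDK raise later."""
    key = os.getenv(env_var, "")
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before creating the client.")
    return key

# Demo with a hypothetical variable name so the check is easy to exercise:
os.environ["DEMO_API_KEY"] = "sk-demo"
print(require_api_key("DEMO_API_KEY"))  # prints sk-demo
```

You would then pass the returned key to the client constructor, for example `OpenAI(api_key=require_api_key())`.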
User Experience (UX)
General UX principles apply to chat applications, but here are some additional considerations that become particularly important due to the machine learning components involved.
- Mechanism for addressing ambiguity: Generative AI models occasionally generate ambiguous answers. A feature that allows users to ask for clarification can be helpful should they come across this problem.
- Context retention: Advanced generative AI models have the ability to remember context within a conversation, which can be a necessary asset to the user experience. Giving users the ability to control and manage context improves the user experience, but introduces the risk of retaining sensitive user information. Considerations for how long this information is stored, such as introducing a retention policy, can balance the need for context against privacy.
- Personalization: With the ability to learn and adapt, AI models offer an individualized experience for a user. Tailoring the user experience through features like user profiles not only makes the user feel understood, but also helps them find specific answers, creating a more efficient and satisfying interaction.
One such example of personalization is the "Custom instructions" settings in OpenAI's ChatGPT. It allows you to provide information about yourself that may be important context for your prompts. Here's an example of a custom instruction.
This "profile" prompts ChatGPT to create a lesson plan on linked lists. Notice that ChatGPT takes into account that the user may want a more in depth lesson plan based on her experience.
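The context-retention consideration listed above can be sketched in code. Below is an illustrative, hypothetical `ChatSession` class (not from the lesson) that caps how many past messages are kept, a very simple stand-in for a retention policy:

```python
from collections import deque

class ChatSession:
    """Keeps a rolling window of conversation turns as a crude retention policy."""

    def __init__(self, max_messages: int = 6):
        # Older messages are dropped automatically once the cap is reached,
        # limiting how much (possibly sensitive) context is retained.
        self.history = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self.history.append({"role": role, "content": content})

    def context(self) -> list:
        """Messages to send to the model: only the retained window."""
        return list(self.history)

session = ChatSession(max_messages=2)
session.add("user", "Hi")
session.add("assistant", "Hello!")
session.add("user", "What's the weather?")
print(len(session.context()))  # prints 2: the oldest message was dropped
```

A real application would likely combine a message cap with a time-based retention policy and user controls for clearing history.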
Microsoft's System Message Framework for Large Language Models
Microsoft has provided guidance for writing effective system messages when generating responses from LLMs broken down into 4 areas:
- Defining who the model is for, as well as its capabilities and limitations.
- Defining the model's output format.
- Providing specific examples that demonstrate intended behavior of the model.
- Providing additional behavioral guardrails.
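As a hedged illustration of those four areas, here is what a system message following the framework might look like in a chat request. The wording and the retail-support scenario are invented for this example, not Microsoft's own text:

```python
# A hypothetical system message, with one part per area of the framework.
system_message = (
    # 1. Who the model is for, its capabilities and limitations:
    "You are a customer-support assistant for a retail store. "
    "You can answer questions about orders and returns, but you cannot process payments. "
    # 2. Output format:
    "Answer in at most three sentences of plain text. "
    # 3. Specific example demonstrating intended behavior:
    "Example: if asked 'Where is my order?', reply 'You can track it under My Orders in your account.' "
    # 4. Additional behavioral guardrails:
    "If asked for legal or medical advice, decline politely."
)

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": "Can I return a jacket I bought last week?"},
]
print(messages[0]["role"])  # prints system
```

The `messages` list above is the shape the chat APIs shown earlier in this course expect; the system message rides along with every request.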
Accessibility
Whether a user has visual, auditory, motor, or cognitive impairments, a well-designed chat application should be usable by all. The following list breaks down specific features aimed at enhancing accessibility for various user impairments.
- Features for Visual Impairment: High contrast themes and resizable text, screen reader compatibility.
- Features for Auditory Impairment: Text-to-speech and speech-to-text functions, visual cues for audio notifications.
- Features for Motor Impairment: Keyboard navigation support, voice commands.
- Features for Cognitive Impairment: Simplified language options.
Customization and Fine-tuning for Domain-Specific Language Models
Imagine a chat application that understands your company's jargon and anticipates the specific queries its user base commonly has. There are a couple of approaches worth mentioning:
- Leveraging DSL models. DSL stands for domain-specific language. You can leverage a so-called DSL model, trained on a specific domain, to understand its concepts and scenarios.
- Apply fine-tuning. Fine-tuning is the process of further training your model with specific data.
Customization: Using a DSL
Leveraging a domain-specific language model (DSL model) can enhance user engagement by providing specialized, contextually relevant interactions. It's a model that is trained or fine-tuned to understand and generate text related to a specific field, industry, or subject. Options for using a DSL model vary from training one from scratch to using pre-existing ones through SDKs and APIs. Another option is fine-tuning, which involves taking an existing pre-trained model and adapting it for a specific domain.
Customization: Apply fine-tuning
Fine-tuning is often considered when a pre-trained model falls short in a specialized domain or specific task.
For instance, medical queries are complex and require a lot of context. When a medical professional diagnoses a patient it's based on a variety of factors such as lifestyle or pre-existing conditions, and may even rely on recent medical journals to validate their diagnosis. In such nuanced scenarios, a general-purpose AI chat application cannot be a reliable source.
Scenario: a medical application
Consider a chat application designed to assist medical practitioners by providing quick references to treatment guidelines, drug interactions, or recent research findings.
A general-purpose model might be adequate for answering basic medical questions or providing general advice, but it may struggle with the following:
- Highly specific or complex cases. For example, a neurologist might ask the application, "What are the current best practices for managing drug-resistant epilepsy in pediatric patients?"
- Lacking recent advancements. A general-purpose model could struggle to provide a current answer that incorporates the most recent advancements in neurology and pharmacology.
In instances such as these, fine-tuning the model with a specialized medical dataset can significantly improve its ability to handle these intricate medical inquiries more accurately and reliably. This requires access to a large and relevant dataset that represents the domain-specific challenges and questions that need to be addressed.
Considerations for a High Quality AI-Driven Chat Experience
This section outlines the criteria for "high-quality" chat applications, which include the capture of actionable metrics and adherence to a framework that responsibly leverages AI technology.
Key Metrics
To maintain the high-quality performance of an application, it's essential to keep track of key metrics and considerations. These measurements not only ensure the functionality of the application but also assess the quality of the AI model and user experience. Below is a list that covers basic, AI, and user experience metrics to consider.
| Metric | Definition | Considerations for Chat Developer |
|---|---|---|
| Uptime | Measures the time the application is operational and accessible by users. | How will you minimize downtime? |
| Response Time | The time taken by the application to reply to a user's query. | How can you optimize query processing to improve response time? |
| Precision | The ratio of true positive predictions to the total number of positive predictions | How will you validate the precision of your model? |
| Recall (Sensitivity) | The ratio of true positive predictions to the actual number of positives | How will you measure and improve recall? |
| F1 Score | The harmonic mean of precision and recall, that balances the trade-off between both. | What is your target F1 Score? How will you balance precision and recall? |
| Perplexity | Measures how well the probability distribution predicted by the model aligns with the actual distribution of the data. | How will you minimize perplexity? |
| User Satisfaction Metrics | Measures the user's perception of the application. Often captured through surveys. | How often will you collect user feedback? How will you adapt based on it? |
| Error Rate | The rate at which the model makes mistakes in understanding or output. | What strategies do you have in place to reduce error rates? |
| Retraining Cycles | The frequency with which the model is updated to incorporate new data and insights. | How often will you retrain the model? What triggers a retraining cycle? |
| Anomaly Detection | Tools and techniques for identifying unusual patterns that do not conform to expected behavior. | How will you respond to anomalies? |
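To make the precision, recall, and F1 Score definitions in the table concrete, here is a small self-contained sketch. The counts in the example are invented for illustration:

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F1 = harmonic mean of precision and recall, as defined in the table."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Example: 80 correct positive predictions, 20 false alarms, 40 misses.
# precision = 80/100 = 0.8, recall = 80/120 ≈ 0.667, so F1 ≈ 0.727
print(round(f1_score(80, 20, 40), 3))  # prints 0.727
```

Because F1 is a harmonic mean, it is pulled toward the lower of precision and recall, which is why it is useful for balancing the trade-off between the two.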
Implementing Responsible AI Practices in Chat Applications
Microsoft's approach to Responsible AI has identified six principles that should guide AI development and use. Below are the principles, their definition, and things a chat developer should consider and why they should take them seriously.
| Principles | Microsoft's Definition | Considerations for Chat Developer | Why It's Important |
|---|---|---|---|
| Fairness | AI systems should treat all people fairly. | Ensure the chat application does not discriminate based on user data. | To build trust and inclusivity among users; avoids legal ramifications. |
| Reliability and Safety | AI systems should perform reliably and safely. | Implement testing and fail-safes to minimize errors and risks. | Ensures user satisfaction and prevents potential harm. |
| Privacy and Security | AI systems should be secure and respect privacy. | Implement strong encryption and data protection measures. | To safeguard sensitive user data and comply with privacy laws. |
| Inclusiveness | AI systems should empower everyone and engage people. | Design UI/UX that is accessible and easy-to-use for diverse audiences. | Ensures a wider range of people can use the application effectively. |
| Transparency | AI systems should be understandable. | Provide clear documentation and reasoning for AI responses. | Users are more likely to trust a system if they can understand how decisions are made. |
| Accountability | People should be accountable for AI systems. | Establish a clear process for auditing and improving AI decisions. | Enables ongoing improvement and corrective measures in case of mistakes. |
Assignment
See assignment. It will take you through a series of exercises from running your first chat prompts, to classifying and summarizing text and more. Notice that the assignments are available in different programming languages!
Great Work! Continue the Journey
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!
Head over to Lesson 8 to see how you can start building search applications!
Building Search Applications
There's more to LLMs than chatbots and text generation. It's also possible to build search applications using Embeddings. Embeddings are numerical representations of data also known as vectors, and can be used for semantic search for data.
In this lesson, you are going to build a search application for our education startup. Our startup is a non-profit organization that provides free education to students in developing countries. Our startup has a large number of YouTube videos that students can use to learn about AI. Our startup wants to build a search application that allows students to search for a YouTube video by typing a question.
For example, a student might type in 'What are Jupyter Notebooks?' or 'What is Azure ML' and the search application will return a list of YouTube videos that are relevant to the question, and better still, the search application will return a link to the place in the video where the answer to the question is located.
Introduction
In this lesson, we will cover:
- Semantic vs Keyword search.
- What are Text Embeddings.
- Creating a Text Embeddings Index.
- Searching a Text Embeddings Index.
Learning Goals
After completing this lesson, you will be able to:
- Tell the difference between semantic and keyword search.
- Explain what Text Embeddings are.
- Create an application using Embeddings to search for data.
Why build a search application?
Creating a search application will help you understand how to use Embeddings to search for data. You will also learn how to build a search application that can be used by students to find information quickly.
The following is an example of a semantic query for the question 'can you use rstudio with azure ml?'. Check out the YouTube URL; you'll see it contains a timestamp that takes you to the place in the video where the answer to the question is located.
What is semantic search?
Now you might be wondering, what is semantic search? Semantic search is a search technique that uses the semantics, or meaning, of the words in a query to return relevant results.
Here is an example of a semantic search. Let's say you were looking to buy a car, you might search for 'my dream car', semantic search understands that you are not dreaming about a car, but rather you are looking to buy your ideal car. Semantic search understands your intention and returns relevant results. The alternative is keyword search which would literally search for dreams about cars and often returns irrelevant results.
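To make the contrast concrete, here is a toy keyword matcher. It ranks documents purely by shared literal terms, so a dream-interpretation page matches "my dream car" while a relevant page phrased as "ideal vehicle" is missed entirely. The documents and scoring are illustrative, not part of the lesson:

```python
def keyword_search(query: str, documents: list[str]) -> list[str]:
    """Rank documents by the number of literal query terms they contain."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in documents]
    # Highest term-overlap first; drop documents with no overlap at all
    return [doc for score, doc in sorted(scored, reverse=True) if score > 0]

docs = [
    "interpreting dreams and what your dream means",
    "find your ideal vehicle at our dealership",
]
print(keyword_search("my dream car", docs))
# → ['interpreting dreams and what your dream means']
# The irrelevant dream-interpretation page matches on the word 'dream',
# while the relevant dealership page shares no literal terms and is never returned.
```

A semantic search, by contrast, would compare meanings rather than literal terms and surface the dealership page.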
What are Text Embeddings?
Text embeddings are a text representation technique used in natural language processing. Text embeddings are semantic numerical representations of text. Embeddings are used to represent data in a way that is easy for a machine to understand. There are many models for building text embeddings, in this lesson, we will focus on generating embeddings using the OpenAI Embedding Model.
Here's an example, imagine the following text is in a transcript from one of the episodes on the AI Show YouTube channel:
Today we are going to learn about Azure Machine Learning.
We'd pass the text to the OpenAI Embedding API and it would return the following embedding consisting of 1536 numbers aka a vector. Each number in the vector represents a different aspect of the text. For brevity, here are the first 10 numbers in the vector.
[-0.006655829958617687, 0.0026128944009542465, 0.008792596869170666, -0.02446001023054123, -0.008540431968867779, 0.022071078419685364, -0.010703742504119873, 0.003311325330287218, -0.011632772162556648, -0.02187200076878071, ...]
How is the Embedding index created?
The Embedding index for this lesson was created with a series of Python scripts. You'll find the scripts along with instructions in the README in the `scripts` folder for this lesson. You don't need to run these scripts to complete this lesson as the Embedding Index is provided for you.
The scripts perform the following operations:
- The transcript for each YouTube video in the AI Show playlist is downloaded.
- Using OpenAI Functions, an attempt is made to extract the speaker name from the first 3 minutes of the YouTube transcript. The speaker name for each video is stored in the Embedding Index named `embedding_index_3m.json`.
- The transcript text is then chunked into 3 minute text segments. The segment includes about 20 words overlapping from the next segment to ensure that the Embedding for the segment is not cut off and to provide better search context.
- Each text segment is then passed to the OpenAI Chat API to summarize the text into 60 words. The summary is also stored in the Embedding Index `embedding_index_3m.json`.
- Finally, the segment text is passed to the OpenAI Embedding API. The Embedding API returns a vector of 1536 numbers that represent the semantic meaning of the segment. The segment along with the OpenAI Embedding vector is stored in the Embedding Index `embedding_index_3m.json`.
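The chunking step can be sketched as a sliding window over the transcript's words. This is an illustrative implementation, not the lesson's actual script; the segment size is an assumption, and only the ~20-word overlap comes from the description above:

```python
def chunk_words(words: list[str], segment_size: int, overlap: int) -> list[list[str]]:
    """Split a word list into segments, each sharing `overlap` words
    with the previous segment so no phrase is cut off at a boundary."""
    segments = []
    step = segment_size - overlap  # advance by less than a full segment
    for start in range(0, len(words), step):
        segments.append(words[start:start + segment_size])
        if start + segment_size >= len(words):
            break  # the final segment absorbs the remainder
    return segments

# Hypothetical 250-word transcript; segment_size=100 is an illustrative value
transcript = [f"word{i}" for i in range(250)]
segments = chunk_words(transcript, segment_size=100, overlap=20)
print(len(segments))  # → 3
```

Each segment's first 20 words repeat the last 20 words of the previous segment, which is what gives the search better context at segment boundaries.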
Vector Databases
For lesson simplicity, the Embedding Index is stored in a JSON file named embedding_index_3m.json and loaded into a Pandas DataFrame. However, in production, the Embedding Index would be stored in a vector database such as Azure Cognitive Search, Redis, Pinecone, Weaviate, to name but a few.
Understanding cosine similarity
We've learned about text embeddings; the next step is to learn how to use text embeddings to search for data, and in particular to find the most similar embeddings to a given query using cosine similarity.
What is cosine similarity?
Cosine similarity is a measure of similarity between two vectors; you'll also hear this referred to as nearest neighbor search. To perform a cosine similarity search, you first need to vectorize the query text using the OpenAI Embedding API. Then calculate the cosine similarity between the query vector and each vector in the Embedding Index. Remember, the Embedding Index has a vector for each YouTube transcript text segment. Finally, sort the results by cosine similarity; the text segments with the highest cosine similarity are the most similar to the query.
From a mathematical perspective, cosine similarity measures the cosine of the angle between two vectors projected in a multidimensional space. This measurement is beneficial because two documents that are far apart by Euclidean distance due to size can still have a small angle between them, and therefore a high cosine similarity. For more information about cosine similarity equations, see Cosine similarity.
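The ranking procedure described above can be sketched in plain Python. The toy index below uses three-dimensional vectors instead of the 1536-dimensional OpenAI embeddings, and the segment names and values are invented for illustration; the logic is identical:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 'Embedding Index': one vector per transcript segment
index = {
    "segment about Azure ML": [0.9, 0.1, 0.0],
    "segment about cooking": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # in practice, from the OpenAI Embedding API

# Sort segments by similarity to the query, highest first
ranked = sorted(index, key=lambda seg: cosine_similarity(index[seg], query_vector),
                reverse=True)
print(ranked[0])  # → segment about Azure ML
```

Because cosine similarity only depends on the angle between vectors, a long segment and a short query can still score highly if they point in the same semantic direction.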
Building your first search application
Next, we're going to learn how to build a search application using Embeddings. The search application will allow students to search for a video by typing a question. The search application will return a list of videos that are relevant to the question. The search application will also return a link to the place in the video where the answer to the question is located.
This solution was built and tested on Windows 11, macOS, and Ubuntu 22.04 using Python 3.10 or later. You can download Python from python.org.
Assignment - building a search application to enable students
We introduced our startup at the beginning of this lesson. Now it's time to enable the students to build a search application for their assessments.
In this assignment, you will create the Azure OpenAI Services that will be used to build the search application. You will create the following Azure OpenAI Services. You'll need an Azure subscription to complete this assignment.
Start the Azure Cloud Shell
- Sign in to the Azure portal.
- Select the Cloud Shell icon in the upper-right corner of the Azure portal.
- Select Bash for the environment type.
Create a resource group
For these instructions, we're using the resource group named "semantic-video-search" in East US. You can change the name of the resource group, but when changing the location for the resources, check the model availability table.
az group create --name semantic-video-search --location eastus
Create an Azure OpenAI Service resource
From the Azure Cloud Shell, run the following command to create an Azure OpenAI Service resource.
az cognitiveservices account create --name semantic-video-openai --resource-group semantic-video-search \
--location eastus --kind OpenAI --sku s0
Get the endpoint and keys for usage in this application
From the Azure Cloud Shell, run the following commands to get the endpoint and keys for the Azure OpenAI Service resource.
az cognitiveservices account show --name semantic-video-openai \
--resource-group semantic-video-search | jq -r .properties.endpoint
az cognitiveservices account keys list --name semantic-video-openai \
--resource-group semantic-video-search | jq -r .key1
Deploy the OpenAI Embedding model
From the Azure Cloud Shell, run the following command to deploy the OpenAI Embedding model.
az cognitiveservices account deployment create \
--name semantic-video-openai \
--resource-group semantic-video-search \
--deployment-name text-embedding-ada-002 \
--model-name text-embedding-ada-002 \
--model-version "2" \
--model-format OpenAI \
--sku-capacity 100 --sku-name "Standard"
Solution
Open the solution notebook in GitHub Codespaces and follow the instructions in the Jupyter Notebook.
When you run the notebook, you'll be prompted to enter a query. The input box will look like this:
Great Work! Continue Your Learning
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!
Head over to Lesson 9 where we will look at how to build image generation applications!
Building Image Generation Applications
There's more to LLMs than text generation. It's also possible to generate images from text descriptions. Having images as a modality can be highly useful in a number of areas from MedTech, architecture, tourism, game development and more. In this chapter, we will look into the two most popular image generation models, DALL-E and Midjourney.
Introduction
In this lesson, we will cover:
- Image generation and why it's useful.
- DALL-E and Midjourney, what they are, and how they work.
- How you would build an image generation app.
Learning Goals
After completing this lesson, you will be able to:
- Build an image generation application.
- Define boundaries for your application with meta prompts.
- Work with DALL-E and Midjourney.
Why build an image generation application?
Image generation applications are a great way to explore the capabilities of Generative AI. They can be used, for example, for:
- Image editing and synthesis. You can generate images for a variety of use cases, such as image editing and image synthesis.
- Application across a variety of industries. They can also be used to generate images for industries like MedTech, Tourism, Game development and more.
Scenario: Edu4All
As part of this lesson, we will continue to work with our startup, Edu4All. The students will create images for their assessments; exactly what images is up to the students. They could be illustrations for their own fairytale, a new character for their story, or visualizations of their ideas and concepts.
Here's what Edu4All's students could generate, for example, if they're working in class on monuments, using a prompt like:
"Dog next to Eiffel Tower in early morning sunlight"
What is DALL-E and Midjourney?
DALL-E and Midjourney are two of the most popular image generation models, they allow you to use prompts to generate images.
DALL-E
Let's start with DALL-E, which is a Generative AI model that generates images from text descriptions.
DALL-E is a combination of two models, CLIP and diffused attention.
CLIP is a model that generates embeddings, which are numerical representations of data, from images and text.
Diffused attention is a model that generates images from embeddings. DALL-E is trained on a dataset of images and text and can be used to generate images from text descriptions. For example, DALL-E can be used to generate images of a cat in a hat, or a dog with a mohawk.
Midjourney
Midjourney works in a similar way to DALL-E: it generates images from text prompts. Midjourney can also be used to generate images using prompts like "a cat in a hat" or "a dog with a mohawk".
Image credit: Wikipedia, image generated by Midjourney
How do DALL-E and Midjourney Work?
First, DALL-E. DALL-E is a Generative AI model based on the transformer architecture with an autoregressive transformer.
An autoregressive transformer defines how a model generates images from text descriptions: it generates one pixel at a time, then uses the generated pixels to generate the next pixel, passing through multiple layers in a neural network until the image is complete.
With this process, DALL-E controls attributes, objects, characteristics, and more in the image it generates. DALL-E 2 and 3 give you even more control over the generated image.
Building your first image generation application
So what does it take to build an image generation application? You need the following libraries:
- python-dotenv, it's highly recommended to use this library to keep your secrets in a .env file, away from the code.
- openai, this library is what you will use to interact with the OpenAI API.
- pillow, to work with images in Python.
- requests, to help you make HTTP requests.
Create and deploy an Azure OpenAI model
If not done already, follow the instructions on the Microsoft Learn page to create an Azure OpenAI resource and model. Select DALL-E 3 as model.
Create the app
Create a file .env with the following content:
AZURE_OPENAI_ENDPOINT=<your endpoint>
AZURE_OPENAI_API_KEY=<your key>
AZURE_OPENAI_DEPLOYMENT="dall-e-3"
Locate this information in the Azure OpenAI Foundry Portal for your resource in the "Deployments" section.
Collect the above libraries in a file called requirements.txt like so:
python-dotenv
openai
pillow
requests
Next, create a virtual environment and install the libraries:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
For Windows, use the following commands to create and activate your virtual environment:
python3 -m venv venv
venv\Scripts\activate.bat
Add the following code in a file called app.py:
import openai
import os
import requests
from PIL import Image
import dotenv
from openai import AzureOpenAI

# import dotenv
dotenv.load_dotenv()

# configure Azure OpenAI service client
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ['AZURE_OPENAI_API_KEY'],
    api_version="2024-02-01"
)

try:
    # Create an image by using the image generation API
    generation_response = client.images.generate(
        prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',
        size='1024x1024',
        n=1,
        model=os.environ['AZURE_OPENAI_DEPLOYMENT']
    )

    # Set the directory for the stored image
    image_dir = os.path.join(os.curdir, 'images')

    # If the directory doesn't exist, create it
    if not os.path.isdir(image_dir):
        os.mkdir(image_dir)

    # Initialize the image path (note the filetype should be png)
    image_path = os.path.join(image_dir, 'generated-image.png')

    # Retrieve the generated image
    image_url = generation_response.data[0].url  # extract image URL from response
    generated_image = requests.get(image_url).content  # download the image
    with open(image_path, "wb") as image_file:
        image_file.write(generated_image)

    # Display the image in the default image viewer
    image = Image.open(image_path)
    image.show()

# catch exceptions
except openai.BadRequestError as err:
    print(err)
Let's explain this code:
First, we import the libraries we need, including the OpenAI library, the dotenv library, the requests library, and the Pillow library.
import openai
import os
import requests
from PIL import Image
import dotenv
Next, we load the environment variables from the .env file.
# import dotenv
dotenv.load_dotenv()
After that, we configure the Azure OpenAI service client:
# Get endpoint and key from environment variables
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ['AZURE_OPENAI_API_KEY'],
    api_version="2024-02-01"
)
Next, we generate the image:
# Create an image by using the image generation API
generation_response = client.images.generate(
    prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',
    size='1024x1024',
    n=1,
    model=os.environ['AZURE_OPENAI_DEPLOYMENT']
)
The above code responds with a JSON object that contains the URL of the generated image. We can use the URL to download the image and save it to a file.
Lastly, we open the image and use the standard image viewer to display it:
image = Image.open(image_path) image.show()
More details on generating the image
Let's look at the code that generates the image in more detail:
generation_response = client.images.generate(
prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',
size='1024x1024', n=1,
model=os.environ['AZURE_OPENAI_DEPLOYMENT']
)
- prompt is the text prompt that is used to generate the image. In this case, we're using the prompt "Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils".
- size is the size of the image that is generated. In this case, we're generating an image that is 1024x1024 pixels.
- n is the number of images that are generated. In this case, we're generating one image.
- temperature is a parameter that controls the randomness of the output of a Generative AI model. It is a value between 0 and 1, where 0 means that the output is deterministic and 1 means that the output is random. The default value is 0.7.
There are more things you can do with images that we will cover in the next section.
Additional capabilities of image generation
You've seen so far how we were able to generate an image using a few lines in Python. However, there are more things you can do with images.
You can also do the following:
- Perform edits. By providing an existing image a mask and a prompt, you can alter an image. For example, you can add something to a portion of an image. Imagine our bunny image, you can add a hat to the bunny. How you would do that is by providing the image, a mask (identifying the part of the area for the change) and a text prompt to say what should be done.
Note: this is not supported in DALL-E 3.
Here is an example using GPT Image:
response = client.images.edit(
model="gpt-image-1",
image=open("sunlit_lounge.png", "rb"),
mask=open("mask.png", "rb"),
prompt="A sunlit indoor lounge area with a pool containing a flamingo"
)
image_url = response.data[0].url
The base image would only contain the lounge with pool but the final image would have a flamingo:
Create variations. The idea is that you take an existing image and ask that variations are created. To create a variation, you provide an image and a text prompt and code like so:
response = openai.Image.create_variation(
    image=open("bunny-lollipop.png", "rb"),
    n=1,
    size="1024x1024"
)
image_url = response['data'][0]['url']
Note, this is only supported on OpenAI.
Temperature
Temperature is a parameter that controls the randomness of the output of a Generative AI model. The temperature is a value between 0 and 1 where 0 means that the output is deterministic and 1 means that the output is random. The default value is 0.7.
Let's look at an example of how temperature works, by running this prompt twice:
Prompt : "Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils"
Now let's run that same prompt just to see that we won't get the same image twice:
As you can see, the images are similar, but not the same. Let's try changing the temperature value to 0.1 and see what happens:
generation_response = client.images.generate(
prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils', # Enter your prompt text here
size='1024x1024',
n=2
)
Changing the temperature
So let's try to make the response more deterministic. We could observe from the two images we generated that in the first image, there's a bunny and in the second image, there's a horse, so the images vary greatly.
Let's therefore change our code and set the temperature to 0, like so:
generation_response = client.images.generate(
prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils', # Enter your prompt text here
size='1024x1024',
n=2,
temperature=0
)
Now when you run this code, you get these two images:
Here you can clearly see how the images resemble each other more.
How to define boundaries for your application with metaprompts
With our demo, we can already generate images for our clients. However, we need to create some boundaries for our application.
For example, we don't want to generate images that are not safe for work, or that are not appropriate for children.
We can do this with metaprompts. Metaprompts are text prompts that are used to control the output of a Generative AI model. For example, we can use metaprompts to control the output, and ensure that the generated images are safe for work, or appropriate for children.
How does it work?
Now, how do meta prompts work?
Meta prompts are text prompts that are used to control the output of a Generative AI model. They are positioned before the user's text prompt and are embedded in applications to control the model's output, encapsulating the prompt input and the meta prompt input in a single text prompt.
One example of a meta prompt would be the following:
You are an assistant designer that creates images for children.
The image needs to be safe for work and appropriate for children.
The image needs to be in color.
The image needs to be in landscape orientation.
The image needs to be in a 16:9 aspect ratio.
Do not consider any input from the following that is not safe for work or appropriate for children.
(Input)
Now, let's see how we can use meta prompts in our demo.
disallow_list = "swords, violence, blood, gore, nudity, sexual content, adult content, adult themes, adult language, adult humor, adult jokes, adult situations, adult"
meta_prompt = f"""You are an assistant designer that creates images for children.
The image needs to be safe for work and appropriate for children.
The image needs to be in color.
The image needs to be in landscape orientation.
The image needs to be in a 16:9 aspect ratio.
Do not consider any input from the following that is not safe for work or appropriate for children.
{disallow_list}
"""
prompt = f"""{meta_prompt}
Create an image of a bunny on a horse, holding a lollipop"""
# TODO add request to generate image
From the above prompt, you can see how all images being created consider the metaprompt.
Assignment - let's enable students
We introduced Edu4All at the beginning of this lesson. Now it's time to enable the students to generate images for their assessments.
The students will create images for their assessments containing monuments, exactly what monuments is up to the students. The students are asked to use their creativity in this task to place these monuments in different contexts.
Solution
Here's one possible solution:
import openai
import os
import requests
from PIL import Image
import dotenv
from openai import AzureOpenAI
# import dotenv
dotenv.load_dotenv()
# Get endpoint and key from environment variables
client = AzureOpenAI(
azure_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"],
api_key=os.environ['AZURE_OPENAI_API_KEY'],
api_version = "2024-02-01"
)
disallow_list = "swords, violence, blood, gore, nudity, sexual content, adult content, adult themes, adult language, adult humor, adult jokes, adult situations, adult"
meta_prompt = f"""You are an assistant designer that creates images for children.
The image needs to be safe for work and appropriate for children.
The image needs to be in color.
The image needs to be in landscape orientation.
The image needs to be in a 16:9 aspect ratio.
Do not consider any input from the following that is not safe for work or appropriate for children.
{disallow_list}
"""
prompt = f"""{meta_prompt}
Generate an image of the Arc de Triomphe in Paris, France, in the evening light, with a small child holding a teddy bear looking on.
"""
try:
# Create an image by using the image generation API
generation_response = client.images.generate(
prompt=prompt, # Enter your prompt text here
size='1024x1024',
n=1,
)
# Set the directory for the stored image
image_dir = os.path.join(os.curdir, 'images')
# If the directory doesn't exist, create it
if not os.path.isdir(image_dir):
os.mkdir(image_dir)
# Initialize the image path (note the filetype should be png)
image_path = os.path.join(image_dir, 'generated-image.png')
# Retrieve the generated image
image_url = generation_response.data[0].url # extract image URL from response
generated_image = requests.get(image_url).content # download the image
with open(image_path, "wb") as image_file:
image_file.write(generated_image)
# Display the image in the default image viewer
image = Image.open(image_path)
image.show()
# catch exceptions
except openai.BadRequestError as err:
print(err)
Great Work! Continue Your Learning
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!
Head over to Lesson 10 where we will look at how to build AI applications with low-code
Building Low Code AI Applications
Introduction
Now that we've learned how to build image generation applications, let's talk about low code. Generative AI can be used for a variety of different areas including low code, but what is low code and how can we add AI to it?
Building apps and solutions has become easier for traditional developers and non-developers through the use of Low Code Development Platforms. Low Code Development Platforms enable you to build apps and solutions with little to no code. This is achieved by providing a visual development environment that enables you to drag and drop components to build apps and solutions. This enables you to build apps and solutions faster and with less resources. In this lesson, we dive deep into how to use Low Code and how to enhance low code development with AI using Power Platform.
The Power Platform provides organizations with the opportunity to empower their teams to build their own solutions through an intuitive low-code or no-code environment. This environment helps simplify the process of building solutions. With Power Platform, solutions can be built in days or weeks instead of months or years. Power Platform consists of five key products: Power Apps, Power Automate, Power BI, Power Pages and Copilot Studio.
This lesson covers:
- Introduction to Generative AI in Power Platform
- Introduction to Copilot and how to use it
- Using Generative AI to build apps and flows in Power Platform
- Understanding the AI Models in Power Platform with AI Builder
Learning Goals
By the end of this lesson, you will be able to:
- Understand how Copilot works in Power Platform.
- Build a Student Assignment Tracker App for our education startup.
- Build an Invoice Processing Flow that uses AI to extract information from invoices.
- Apply best practices when using the Create Text with GPT AI Model.
The tools and technologies that you will use in this lesson are:
- Power Apps, for the Student Assignment Tracker app, which provides a low-code development environment for building apps to track, manage and interact with data.
- Dataverse, for storing the data for the Student Assignment Tracker app, where Dataverse will provide a low-code data platform for storing the app's data.
- Power Automate, for the Invoice Processing flow, where you will have a low-code development environment for building workflows to automate the Invoice Processing process.
- AI Builder, for the Invoice Processing AI Model, where you will use prebuilt AI Models to process the invoices for our startup.
Generative AI in Power Platform
Enhancing low-code development and application with generative AI is a key focus area for Power Platform. The goal is to enable everyone to build AI-powered apps, sites, dashboards and automate processes with AI, without requiring any data science expertise. This goal is achieved by integrating generative AI into the low-code development experience in Power Platform in the form of Copilot and AI Builder.
How does this work?
Copilot is an AI assistant that enables you to build Power Platform solutions by describing your requirements in a series of conversational steps using natural language. You can for example instruct your AI assistant to state what fields your app will use and it will create both the app and the underlying data model or you could specify how to set up a flow in Power Automate.
You can use Copilot driven functionalities as a feature in your app screens to enable users to uncover insights through conversational interactions.
AI Builder is a low-code AI capability available in Power Platform that enables you to use AI Models to help you to automate processes and predict outcomes. With AI Builder you can bring AI to your apps and flows that connect to your data in Dataverse or in various cloud data sources, such as SharePoint, OneDrive or Azure.
Copilot is available in all of the Power Platform products: Power Apps, Power Automate, Power BI, Power Pages and Power Virtual Agents. AI Builder is available in Power Apps and Power Automate. In this lesson, we will focus on how to use Copilot and AI Builder in Power Apps and Power Automate to build a solution for our education startup.
Copilot in Power Apps
As part of the Power Platform, Power Apps provides a low-code development environment for building apps to track, manage and interact with data. It's a suite of app development services with a scalable data platform and the ability to connect to cloud services and on-premises data. Power Apps allows you to build apps that run on browsers, tablets, and phones, and can be shared with co-workers. Power Apps eases users into app development with a simple interface, so that every business user or pro developer can build custom apps. The app development experience is also enhanced with Generative AI through Copilot.
The Copilot AI assistant feature in Power Apps enables you to describe what kind of app you need and what information you want your app to track, collect, or show. Copilot then generates a responsive Canvas app based on your description. You can then customize the app to meet your needs. The AI Copilot also generates and suggests a Dataverse table with the fields you need to store the data you want to track, along with some sample data. We will look at what Dataverse is and how you can use it in Power Apps later in this lesson. You can then customize the table to meet your needs using the AI Copilot assistant feature through conversational steps. This feature is readily available from the Power Apps home screen.
Copilot in Power Automate
As part of the Power Platform, Power Automate lets users create automated workflows between applications and services. It helps automate repetitive business processes such as communication, data collection, and decision approvals. Its simple interface allows users with every technical competence (from beginners to seasoned developers) to automate work tasks. The workflow development experience is also enhanced with Generative AI through Copilot.
The Copilot AI assistant feature in Power Automate enables you to describe what kind of flow you need and what actions you want your flow to perform. Copilot then generates a flow based on your description. You can then customize the flow to meet your needs. The AI Copilot also generates and suggests the actions you need to perform the task you want to automate. We will look at what flows are and how you can use them in Power Automate later in this lesson. You can then customize the actions to meet your needs using the AI Copilot assistant feature through conversational steps. This feature is readily available from the Power Automate home screen.
Assignment: Manage student assignments and invoices for our startup, using Copilot
Our startup provides online courses to students. The startup has grown rapidly and is now struggling to keep up with the demand for its courses. The startup has hired you as a Power Platform developer to help them build a low code solution to help them manage their student assignments and invoices. Their solution should be able to help them track and manage student assignments through an app and automate the invoice processing process through a workflow. You have been asked to use Generative AI to develop the solution.
When you are getting started with using Copilot, you can use the Power Platform Copilot Prompt Library to get started with the prompts. This library contains a list of prompts that you can use to build apps and flows with Copilot. You can also use the prompts in the library to get an idea of how to describe your requirements to Copilot.
Build a Student Assignment Tracker App for Our Startup
The educators at our startup have been struggling to keep track of student assignments. They have been using a spreadsheet to track the assignments but this has become difficult to manage as the number of students has increased. They have asked you to build an app that will help them track and manage student assignments. The app should enable them to add new assignments, view assignments, update assignments and delete assignments. The app should also enable educators and students to view the assignments that have been graded and those that have not been graded.
You will build the app using Copilot in Power Apps following the steps below:
Navigate to the Power Apps home screen.
Use the text area on the home screen to describe the app you want to build. For example, I want to build an app to track and manage student assignments. Click on the Send button to send the prompt to the AI Copilot.
The AI Copilot will suggest a Dataverse Table with the fields you need to store the data you want to track and some sample data. You can then customize the table to meet your needs using the AI Copilot assistant feature through conversational steps.
Important: Dataverse is the underlying data platform for Power Platform. It is a low-code data platform for storing the app's data. It is a fully managed service that securely stores data in the Microsoft Cloud and is provisioned within your Power Platform environment. It comes with built-in data governance capabilities, such as data classification, data lineage, fine-grained access control, and more. You can learn more about Dataverse here.
Educators want to send emails to the students who have submitted their assignments to keep them updated on the progress of their assignments. You can use Copilot to add a new field to the table to store the student email. For example, you can use the following prompt to add a new field to the table: I want to add a column to store student email. Click on the Send button to send the prompt to the AI Copilot.
The AI Copilot will generate a new field and you can then customize the field to meet your needs.
Once you are done with the table, click on the Create app button to create the app.
The AI Copilot will generate a responsive Canvas app based on your description. You can then customize the app to meet your needs.
For educators to send emails to students, you can use Copilot to add a new screen to the app. For example, you can use the following prompt to add a new screen to the app: I want to add a screen to send emails to students. Click on the Send button to send the prompt to the AI Copilot.
The AI Copilot will generate a new screen and you can then customize the screen to meet your needs.
Once you are done with the app, click on the Save button to save the app.
To share the app with the educators, click on the Share button, then click Share again in the dialog that opens. You can then share the app with the educators by entering their email addresses.
Your homework: The app you just built is a good start but can be improved. With the email feature, educators can only send emails to students manually by typing out their addresses. Can you use Copilot to build an automation that will enable educators to send emails to students automatically when they submit their assignments? Your hint: with the right prompt, you can use Copilot in Power Automate to build this.
Build an Invoices Information Table for Our Startup
The finance team of our startup has been struggling to keep track of invoices. They have been using a spreadsheet to track the invoices but this has become difficult to manage as the number of invoices has increased. They have asked you to build a table that will help them store, track and manage the information of the invoices they received. The table should be used to build an automation that will extract all the invoice information and store it in the table. The table should also enable the finance team to view the invoices that have been paid and those that have not been paid.
The Power Platform has an underlying data platform called Dataverse that enables you to store the data for your apps and solutions. Dataverse provides a low-code data platform for storing the app's data. It is a fully managed service that securely stores data in the Microsoft Cloud and is provisioned within your Power Platform environment. It comes with built-in data governance capabilities, such as data classification, data lineage, fine-grained access control, and more. You can learn more about Dataverse here.
Why should we use Dataverse for our startup? The standard and custom tables within Dataverse provide a secure and cloud-based storage option for your data. Tables let you store different types of data, similar to how you might use multiple worksheets in a single Excel workbook. You can use tables to store data that is specific to your organization or business needs. Some of the benefits our startup will get from using Dataverse include but are not limited to:
Easy to manage: Both the metadata and data are stored in the cloud, so you don't have to worry about the details of how they are stored or managed. You can focus on building your apps and solutions.
Secure: Dataverse provides a secure and cloud-based storage option for your data. You can control who has access to the data in your tables and how they can access it using role based security.
Rich metadata: Data types and relationships are used directly within Power Apps.
Logic and validation: You can use business rules, calculated fields, and validation rules to enforce business logic and maintain data accuracy.
Now that you know what Dataverse is and why you should use it, let's look at how you can use Copilot to create a table in Dataverse to meet the requirements of our finance team.
Note : You will use this table in the next section to build an automation that will extract all the invoice information and store it in the table.
To create a table in Dataverse using Copilot, follow the steps below:
Navigate to the Power Apps home screen.
On the left navigation bar, select Tables and then click on Describe the new Table.
On the Describe the new Table screen, use the text area to describe the table you want to create. For example, I want to create a table to store invoice information. Click on the Send button to send the prompt to the AI Copilot.
The AI Copilot will suggest a Dataverse Table with the fields you need to store the data you want to track and some sample data. You can then customize the table to meet your needs using the AI Copilot assistant feature through conversational steps.
The finance team wants to send an email to the supplier to update them with the current status of their invoice. You can use Copilot to add a new field to the table to store the supplier email. For example, you can use the following prompt to add a new field to the table: I want to add a column to store supplier email. Click on the Send button to send the prompt to the AI Copilot.
The AI Copilot will generate a new field and you can then customize the field to meet your needs.
Once you are done with the table, click on the Create button to create the table.
AI Models in Power Platform with AI Builder
As covered earlier in this lesson, AI Builder is a low-code AI capability in Power Platform that enables you to use AI models to automate processes and predict outcomes, bringing AI to apps and flows that connect to your data in Dataverse or in various cloud data sources, such as SharePoint, OneDrive or Azure.
Prebuilt AI Models vs Custom AI Models
AI Builder provides two types of AI Models: Prebuilt AI Models and Custom AI Models. Prebuilt AI Models are ready-to-use AI Models that are trained by Microsoft and available in Power Platform. These help you add intelligence to your apps and flows without having to gather data and then build, train and publish your own models. You can use these models to automate processes and predict outcomes.
Some of the Prebuilt AI Models available in Power Platform include:
- Key Phrase Extraction: This model extracts key phrases from text.
- Language Detection: This model detects the language of a text.
- Sentiment Analysis: This model detects positive, negative, neutral, or mixed sentiment in text.
- Business Card Reader: This model extracts information from business cards.
- Text Recognition: This model extracts text from images.
- Object Detection: This model detects and extracts objects from images.
- Document processing: This model extracts information from forms.
- Invoice Processing: This model extracts information from invoices.
With Custom AI Models you can bring your own model into AI Builder so that it can function like any AI Builder custom model, allowing you to train the model using your own data. You can use these models to automate processes and predict outcomes in both Power Apps and Power Automate. When using your own model there are limitations that apply. Read more on these limitations.
Assignment #2 - Build an Invoice Processing Flow for Our Startup
The finance team has been struggling to process invoices. They have been using a spreadsheet to track the invoices but this has become difficult to manage as the number of invoices has increased. They have asked you to build a workflow that will help them process invoices using AI. The workflow should enable them to extract information from invoices and store the information in a Dataverse table. The workflow should also enable them to send an email to the finance team with the extracted information.
Now that you know what AI Builder is and why you should use it, let's look at how you can use the Invoice Processing AI Model in AI Builder, which we covered earlier on, to build a workflow that will help the finance team process invoices.
To build a workflow that will help the finance team process invoices using the Invoice Processing AI Model in AI Builder, follow the steps below:
Navigate to the Power Automate home screen.
Use the text area on the home screen to describe the workflow you want to build. For example, Process an invoice when it arrives in my mailbox. Click on the Send button to send the prompt to the AI Copilot.
The AI Copilot will suggest the actions you need to perform the task you want to automate. You can click on the Next button to go through the next steps.
On the next step, Power Automate will prompt you to set up the connections required for the flow. Once you are done, click on the Create flow button to create the flow.
The AI Copilot will generate a flow and you can then customize the flow to meet your needs.
Update the trigger of the flow and set the Folder to the folder where the invoices will be stored. For example, you can set the folder to Inbox. Click on Show advanced options and set the Only with Attachments to Yes. This will ensure that the flow only runs when an email with an attachment is received in the folder.
Remove the following actions from the flow: HTML to text, Compose, Compose 2, Compose 3 and Compose 4 because you will not be using them.
Remove the Condition action from the flow because you will not be using it.
Click on the Add an action button and search for Dataverse. Select the Add a new row action.
On the Extract Information from invoices action, update the Invoice File to point to the Attachment Content from the email. This will ensure that the flow extracts information from the invoice attachment.
Select the Table you created earlier on. For example, you can select the Invoice Information table. Choose the dynamic content from the previous action to populate the following fields:
- ID
- Amount
- Date
- Name
- Status - Set the Status to Pending.
- Supplier Email - Use the From dynamic content from the When a new email arrives trigger.
Once you are done with the flow, click on the Save button to save the flow. You can then test the flow by sending an email with an invoice to the folder you specified in the trigger.
Your homework: The flow you just built is a good start, now you need to think of how you can build an automation that will enable our finance team to send an email to the supplier to update them with the current status of their invoice. Your hint: the flow must run when the status of the invoice changes.
Use a Text Generation AI Model in Power Automate
The Create Text with GPT AI Model in AI Builder enables you to generate text based on a prompt and is powered by the Microsoft Azure OpenAI Service. With this capability, you can incorporate GPT (Generative Pre-Trained Transformer) technology into your apps and flows to build a variety of automated flows and insightful applications.
GPT models undergo extensive training on vast amounts of data, enabling them to produce text that closely resembles human language when provided with a prompt. When integrated with workflow automation, AI models like GPT can be harnessed to streamline and automate a wide range of tasks.
For example, you can build flows to automatically generate text for a variety of use cases, such as: drafts of emails, product descriptions, and more. You can also use the model to generate text for a variety of apps, such as chatbots and customer service apps that enable customer service agents to respond effectively and efficiently to customer inquiries.
To learn how to use this AI Model in Power Automate, go through the Add intelligence with AI Builder and GPT module.
Great Work! Continue Your Learning
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!
Head over to Lesson 11 where we will look at how to integrate Generative AI with Function Calling!
Integrating with function calling
You've learned a fair bit so far in the previous lessons. However, we can improve further. Some things we can address are how we can get a more consistent response format to make it easier to work with the response downstream. Also, we might want to add data from other sources to further enrich our application.
The above-mentioned problems are what this chapter is looking to address.
Introduction
This lesson will cover:
- Explain what function calling is and its use cases.
- Creating a function call using Azure OpenAI.
- How to integrate a function call into an application.
Learning Goals
By the end of this lesson, you will be able to:
- Explain the purpose of using function calling.
- Setup Function Call using the Azure OpenAI Service.
- Design effective function calls for your application's use case.
Scenario: Improving our chatbot with functions
For this lesson, we want to build a feature for our education startup that allows users to use a chatbot to find technical courses. We will recommend courses that fit their skill level, current role and technology of interest.
To complete this scenario, we will use a combination of:
- Azure OpenAI to create a chat experience for the user.
- Microsoft Learn Catalog API to help users find courses based on the request of the user.
- Function Calling to take the user's query and send it to a function to make the API request.
To get started, let's look at why we would want to use function calling in the first place:
Why Function Calling
Before function calling, responses from an LLM were unstructured and inconsistent. Developers were required to write complex validation code to handle each variation of a response. Users also could not get answers to questions like "What is the current weather in Stockholm?", because models were limited to the data they were trained on.
Function Calling is a feature of the Azure OpenAI Service to overcome the following limitations:
- Consistent response format. If we can better control the response format we can more easily integrate the response downstream to other systems.
- External data. Ability to use data from other sources of an application in a chat context.
Illustrating the problem through a scenario
We recommend using the included notebook if you want to run the scenario below. You can also just read along, as we're illustrating a problem that functions can help address.
Let's look at the example that illustrates the response format problem:
Let's say we want to create a database of student data so we can suggest the right course to them. Below we have two descriptions of students that are very similar in the data they contain.
Create a connection to our Azure OpenAI resource:
```python
import os
import json
from openai import AzureOpenAI
from dotenv import load_dotenv

load_dotenv()

client = AzureOpenAI(
    api_key=os.environ['AZURE_OPENAI_API_KEY'],  # this is also the default, it can be omitted
    api_version="2023-07-01-preview"
)

deployment = os.environ['AZURE_OPENAI_DEPLOYMENT']
```

The Python code above configures our connection to Azure OpenAI, where we set the `api_key` and `api_version`.

Next, we create two student descriptions using the variables `student_1_description` and `student_2_description`:

```python
student_1_description = "Emily Johnson is a sophomore majoring in computer science at Duke University. She has a 3.7 GPA. Emily is an active member of the university's Chess Club and Debate Team. She hopes to pursue a career in software engineering after graduating."

student_2_description = "Michael Lee is a sophomore majoring in computer science at Stanford University. He has a 3.8 GPA. Michael is known for his programming skills and is an active member of the university's Robotics Club. He hopes to pursue a career in artificial intelligence after finishing his studies."
```

We want to send the above student descriptions to an LLM to parse the data. This data can later be used in our application and be sent to an API or stored in a database.
Let's create two identical prompts in which we instruct the LLM on what information we are interested in:
```python
prompt1 = f'''
Please extract the following information from the given text and return it as a JSON object:

name
major
school
grades
club

This is the body of text to extract the information from:
{student_1_description}
'''

prompt2 = f'''
Please extract the following information from the given text and return it as a JSON object:

name
major
school
grades
club

This is the body of text to extract the information from:
{student_2_description}
'''
```

The above prompts instruct the LLM to extract information and return the response in JSON format.
After setting up the prompts and the connection to Azure OpenAI, we will now send the prompts to the LLM by calling `client.chat.completions.create`. We store the prompt in the `messages` variable and assign the role `user`. This is to mimic a message from a user being written to a chatbot.

```python
# response from prompt one
openai_response1 = client.chat.completions.create(
    model=deployment,
    messages=[{'role': 'user', 'content': prompt1}]
)
openai_response1.choices[0].message.content

# response from prompt two
openai_response2 = client.chat.completions.create(
    model=deployment,
    messages=[{'role': 'user', 'content': prompt2}]
)
openai_response2.choices[0].message.content
```
Now we can send both requests to the LLM and examine the response we receive, accessing it with `openai_response1.choices[0].message.content`.
Lastly, we can convert the response to JSON format by calling `json.loads`:

```python
# Loading the response as a JSON object
json_response1 = json.loads(openai_response1.choices[0].message.content)
json_response1
```

Response 1:
```json
{
  "name": "Emily Johnson",
  "major": "computer science",
  "school": "Duke University",
  "grades": "3.7",
  "club": "Chess Club"
}
```

Response 2:
```json
{
  "name": "Michael Lee",
  "major": "computer science",
  "school": "Stanford University",
  "grades": "3.8 GPA",
  "club": "Robotics Club"
}
```

Even though the prompts are the same and the descriptions are similar, the values of the `grades` property are formatted differently: we can sometimes get the format `3.7` and sometimes `3.8 GPA`, for example.

This happens because the LLM takes unstructured data in the form of the written prompt and also returns unstructured data. We need a structured format so that we know what to expect when storing or using this data.
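To make the inconsistency concrete: any downstream code consuming this output needs defensive parsing. A minimal sketch, where the `normalize_gpa` helper is hypothetical and not part of the lesson's code:

```python
def normalize_gpa(value: str) -> float:
    # Accept either "3.7" or "3.8 GPA" -- the two formats we observed above
    return float(value.replace("GPA", "").strip())

print(normalize_gpa("3.7"))      # 3.7
print(normalize_gpa("3.8 GPA"))  # 3.8
```

Every new format the LLM invents means another special case here, which is exactly the maintenance burden function calling removes.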
So how do we solve the formatting problem then? By using function calling, we can make sure that we receive structured data back. When using function calling, the LLM does not actually call or run any functions. Instead, we create a structure for the LLM to follow for its responses. We then use those structured responses to know what function to run in our applications.
We can then take what is returned from the function and send this back to the LLM. The LLM will then respond using natural language to answer the user's query.
Use Cases for using function calls
There are many different use cases where function calls can improve your app, such as:

- Calling External Tools. Chatbots are great at providing answers to questions from users. By using function calling, the chatbots can use messages from users to complete certain tasks. For example, a student can ask the chatbot to "Send an email to my instructor saying I need more assistance with this subject". This can make a function call to `send_email(to: string, body: string)`.
- Creating API or Database Queries. Users can find information using natural language that gets converted into a formatted query or API request. An example of this could be a teacher who requests "Who are the students that completed the last assignment", which could call a function named `get_completed(student_name: string, assignment: int, current_status: string)`.
- Creating Structured Data. Users can take a block of text or CSV and use the LLM to extract important information from it. For example, a student can convert a Wikipedia article about peace agreements to create AI flashcards. This can be done by using a function called `get_important_facts(agreement_name: string, date_signed: string, parties_involved: list)`.
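Each of these use cases starts with a function description the model can target. As a sketch, a schema for the hypothetical `send_email` function above could look like the following; the full schema format is explained later in this lesson:

```python
# Hypothetical schema for the send_email example above, written in the
# shape the Chat Completions "functions" parameter expects.
send_email_function = {
    "name": "send_email",
    "description": "Sends an email on behalf of the student",
    "parameters": {
        "type": "object",
        "properties": {
            "to": {"type": "string", "description": "Recipient email address"},
            "body": {"type": "string", "description": "Body text of the email"},
        },
        "required": ["to", "body"],
    },
}

print(send_email_function["name"])  # send_email
```

The model never sends the email itself; it only returns the name and arguments, and your application decides whether and how to execute them.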
Creating Your First Function Call
The process of creating a function call includes 3 main steps:
- Calling the Chat Completions API with a list of your functions and a user message.
- Reading the model's response to perform an action i.e. execute a function or API Call.
- Making another call to Chat Completions API with the response from your function to use that information to create a response to the user.
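The three steps above can be sketched end-to-end. Here `call_llm` is a stub standing in for the Chat Completions API (a real call would go to Azure OpenAI), so the control flow is visible without credentials:

```python
import json

def call_llm(messages, functions):
    # Stub standing in for client.chat.completions.create: pretend the
    # model decided to call search_courses with extracted arguments.
    return {"function_call": {"name": "search_courses",
                              "arguments": json.dumps({"role": "student"})}}

def search_courses(role, product=None, level=None):
    return f"courses for role={role}"

messages = [{"role": "user", "content": "Find me an Azure course."}]
functions = [{"name": "search_courses"}]  # schema elided for brevity

# Step 1: call the API with the functions list and the user message
response = call_llm(messages, functions)

# Step 2: read the model's response and execute the named function
call = response["function_call"]
result = search_courses(**json.loads(call["arguments"]))

# Step 3: append the function result and call the API again for a
# natural-language answer (that second call is stubbed out here)
messages.append({"role": "function", "name": call["name"], "content": result})
print(result)  # courses for role=student
```

The rest of this lesson fills in each step with the real Azure OpenAI calls.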
Step 1 - creating messages
The first step is to create a user message. This can be dynamically assigned by taking the value of a text input or you can assign a value here. If this is your first time working with the Chat Completions API, we need to define the role and the content of the message.
The role can be either system (creating rules), assistant (the model) or user (the end-user). For function calling, we will assign this as user and an example question.
```python
messages = [
    {"role": "user", "content": "Find me a good course for a beginner student to learn Azure."}
]
```
By assigning different roles, it's made clear to the LLM if it's the system saying something or the user, which helps to build a conversation history that the LLM can build upon.
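For illustration, a conversation history mixing the three roles might look like this (the content strings are hypothetical):

```python
# Each turn is appended in order, so the LLM sees the whole history.
conversation = [
    {"role": "system", "content": "You are a course advisor. Recommend Microsoft Learn courses only."},
    {"role": "user", "content": "I want to learn Azure."},
    {"role": "assistant", "content": "Great! What's your experience level?"},
    {"role": "user", "content": "Complete beginner."},
]

print([m["role"] for m in conversation])  # ['system', 'user', 'assistant', 'user']
```

The system message sets the rules once, while the alternating user and assistant turns carry the dialogue forward.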
Step 2 - creating functions
Next, we will define a function and the parameters of that function. We will use just one function here called search_courses but you can create multiple functions.
Important: Functions are included in the system message to the LLM and count toward the number of tokens you have available.
Below, we create the functions as an array of items. Each item is a function and has properties name, description and parameters:
```python
functions = [
   {
      "name": "search_courses",
      "description": "Retrieves courses from the search index based on the parameters provided",
      "parameters": {
         "type": "object",
         "properties": {
            "role": {
               "type": "string",
               "description": "The role of the learner (i.e. developer, data scientist, student, etc.)"
            },
            "product": {
               "type": "string",
               "description": "The product that the lesson is covering (i.e. Azure, Power BI, etc.)"
            },
            "level": {
               "type": "string",
               "description": "The level of experience the learner has prior to taking the course (i.e. beginner, intermediate, advanced)"
            }
         },
         "required": [
            "role"
         ]
      }
   }
]
```
Let's describe each function instance more in detail below:
- `name` - The name of the function that we want to have called.
- `description` - A description of how the function works. Here it's important to be specific and clear.
- `parameters` - A list of values and formats that you want the model to produce in its response. The parameters array consists of items with the following properties:
  - `type` - The data type in which the properties will be stored.
  - `properties` - The list of the specific values that the model will use for its response:
    - `name` - The key name that the model will use in its formatted response, for example, `product`.
    - `type` - The data type of this property, for example, `string`.
    - `description` - A description of the specific property.

There's also an optional `required` property, which lists the parameters that must be provided for the function call to be completed.
Step 3 - Making the function call
After defining a function, we now need to include it in the call to the Chat Completion API. We do this by adding functions to the request. In this case functions=functions.
There is also an option to set function_call to auto. This means we will let the LLM decide which function should be called based on the user message rather than assigning it ourselves.
Here's some code where we call `client.chat.completions.create`; note how we set `functions=functions` and `function_call="auto"`, thereby giving the LLM the choice of when to call the functions we provide it:
```python
response = client.chat.completions.create(
    model=deployment,
    messages=messages,
    functions=functions,
    function_call="auto"
)

print(response.choices[0].message)
```
The response coming back now looks like so:
```json
{
  "role": "assistant",
  "function_call": {
    "name": "search_courses",
    "arguments": "{\n  \"role\": \"student\",\n  \"product\": \"Azure\",\n  \"level\": \"beginner\"\n}"
  }
}
```
Here we can see how the function search_courses was called and with what arguments, as listed in the arguments property in the JSON response.
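Note that `arguments` is delivered as a JSON-encoded string, not an object, so we parse it with `json.loads` before using the values. Taking an arguments string of the same shape as the response above:

```python
import json

# The arguments value arrives as a JSON-encoded string
arguments = '{\n  "role": "student",\n  "product": "Azure",\n  "level": "beginner"\n}'

parsed = json.loads(arguments)
print(parsed["role"], parsed["product"], parsed["level"])  # student Azure beginner
```

We rely on this parsing step again when we integrate the function call into the application later in the lesson.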
We can conclude that the LLM was able to extract the data needed to fill the function's arguments from the value provided to the `messages` parameter in the chat completion call. Below is a reminder of the `messages` value:
```python
messages = [
    {"role": "user", "content": "Find me a good course for a beginner student to learn Azure."}
]
```
As you can see, `student`, `Azure` and `beginner` were extracted from `messages` and set as input to the function. Using functions this way is a great way to extract information from a prompt, but also to provide structure to the LLM and have reusable functionality.
Next, we need to see how we can use this in our app.
Integrating Function Calls into an Application
After we have tested the formatted response from the LLM, we can now integrate this into an application.
Managing the flow
To integrate this into our application, let's take the following steps:
First, let's make the call to the OpenAI service and store the message in a variable called `response_message`:

```python
response_message = response.choices[0].message
```

Now we will define the function that will call the Microsoft Learn API to get a list of courses:
```python
import requests

def search_courses(role, product, level):
    url = "https://learn.microsoft.com/api/catalog/"
    params = {
        "role": role,
        "product": product,
        "level": level
    }
    response = requests.get(url, params=params)
    modules = response.json()["modules"]
    results = []
    for module in modules[:5]:
        title = module["title"]
        url = module["url"]
        results.append({"title": title, "url": url})
    return str(results)
```

Note how we now create an actual Python function that maps to the function name introduced in the `functions` variable. We're also making a real external API call to fetch the data we need; in this case, we query the Microsoft Learn Catalog API to search for training modules.
OK, so we created a `functions` variable and a corresponding Python function. How do we tell the LLM how to map these two together so our Python function gets called?
To see if we need to call a Python function, we look at the LLM response and check whether `function_call` is part of it; if so, we call the function it points to. Here's how to make that check:

```python
# Check if the model wants to call a function
if response_message.function_call.name:
    print("Recommended Function call:")
    print(response_message.function_call.name)
    print()

    # Call the function
    function_name = response_message.function_call.name

    available_functions = {
        "search_courses": search_courses,
    }
    function_to_call = available_functions[function_name]

    function_args = json.loads(response_message.function_call.arguments)
    function_response = function_to_call(**function_args)

    print("Output of function call:")
    print(function_response)
    print(type(function_response))

    # Add the assistant response and function response to the messages
    messages.append(  # adding assistant response to messages
        {
            "role": response_message.role,
            "function_call": {
                "name": function_name,
                "arguments": response_message.function_call.arguments,
            },
            "content": None
        }
    )
    messages.append(  # adding function response to messages
        {
            "role": "function",
            "name": function_name,
            "content": function_response,
        }
    )
```

These three lines ensure that we extract the function name and its arguments and make the call:

```python
function_to_call = available_functions[function_name]
function_args = json.loads(response_message.function_call.arguments)
function_response = function_to_call(**function_args)
```

Below is the output from running our code:
Output

```text
{
  "name": "search_courses",
  "arguments": "{\n  \"role\": \"student\",\n  \"product\": \"Azure\",\n  \"level\": \"beginner\"\n}"
}
Output of function call:
[{'title': 'Describe concepts of cryptography', 'url': 'https://learn.microsoft.com/training/modules/describe-concepts-of-cryptography/?WT.mc_id=api_CatalogApi'}, {'title': 'Introduction to audio classification with TensorFlow', 'url': 'https://learn.microsoft.com/en-us/training/modules/intro-audio-classification-tensorflow/?WT.mc_id=api_CatalogApi'}, {'title': 'Design a Performant Data Model in Azure SQL Database with Azure Data Studio', 'url': 'https://learn.microsoft.com/training/modules/design-a-data-model-with-ads/?WT.mc_id=api_CatalogApi'}, {'title': 'Getting started with the Microsoft Cloud Adoption Framework for Azure', 'url': 'https://learn.microsoft.com/training/modules/cloud-adoption-framework-getting-started/?WT.mc_id=api_CatalogApi'}, {'title': 'Set up the Rust development environment', 'url': 'https://learn.microsoft.com/training/modules/rust-set-up-environment/?WT.mc_id=api_CatalogApi'}]
<class 'str'>
```

Now we will send the updated `messages` list to the LLM so we can receive a natural language response instead of an API JSON formatted response.

```python
print("Messages in next request:")
print(messages)
print()

second_response = client.chat.completions.create(
    messages=messages,
    model=deployment,
    function_call="auto",
    functions=functions,
    temperature=0
)  # get a new response from GPT where it can see the function response

print(second_response.choices[0].message)
```

Output

```text
{
  "role": "assistant",
  "content": "I found some good courses for beginner students to learn Azure:\n\n1. [Describe concepts of cryptography](https://learn.microsoft.com/training/modules/describe-concepts-of-cryptography/?WT.mc_id=api_CatalogApi)\n2. [Introduction to audio classification with TensorFlow](https://learn.microsoft.com/training/modules/intro-audio-classification-tensorflow/?WT.mc_id=api_CatalogApi)\n3. [Design a Performant Data Model in Azure SQL Database with Azure Data Studio](https://learn.microsoft.com/training/modules/design-a-data-model-with-ads/?WT.mc_id=api_CatalogApi)\n4. [Getting started with the Microsoft Cloud Adoption Framework for Azure](https://learn.microsoft.com/training/modules/cloud-adoption-framework-getting-started/?WT.mc_id=api_CatalogApi)\n5. [Set up the Rust development environment](https://learn.microsoft.com/training/modules/rust-set-up-environment/?WT.mc_id=api_CatalogApi)\n\nYou can click on the links to access the courses."
}
```
Assignment
To continue your learning of Azure OpenAI Function Calling you can build:
- Add more parameters to the function that might help learners find more courses.
- Create another function call that takes more information from the learner, like their native language.
- Create error handling for when the function call and/or API call does not return any suitable courses.
Hint: Follow the Learn API reference documentation page to see how and where this data is available.
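As a starting point for the error-handling item, here is a minimal sketch. The helper name `format_results` and the fallback wording are assumptions, not part of the lesson code: the idea is to separate result formatting from the API call so that an empty module list produces a helpful message instead of an empty list.

```python
def format_results(modules):
    """Format Learn API modules for the LLM, handling the empty case.

    Hypothetical helper: `search_courses` would call this on
    response.json()["modules"] instead of building the string inline.
    """
    if not modules:
        return ("No matching courses were found. "
                "Try a broader role, product, or level.")
    results = [{"title": m["title"], "url": m["url"]} for m in modules[:5]]
    return str(results)

print(format_results([]))  # prints the fallback message
```

Because the model simply relays the function's string output, returning a clear sentence here lets the second chat completion explain the empty result to the learner in natural language.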
Great Work! Continue the Journey
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!
Head over to Lesson 12, where we will look at how to design UX for AI applications!
Designing UX for AI Applications
User experience is a very important aspect of building apps. Users need to be able to use your app efficiently to perform tasks. Efficiency is one thing, but you also need to design apps so that everyone can use them, making them accessible. This chapter focuses on this area, so you hopefully end up designing an app that people can and want to use.
Introduction
User experience is how a user interacts with and uses a specific product or service be it a system, tool, or design. When developing AI applications, developers not only focus on ensuring the user experience is effective but also ethical. In this lesson, we cover how to build Artificial Intelligence (AI) applications that address user needs.
The lesson will cover the following areas:
- Introduction to User Experience and Understanding User Needs
- Designing AI Applications for Trust and Transparency
- Designing AI Applications for Collaboration and Feedback
Learning goals
After taking this lesson, you'll be able to:
- Understand how to build AI applications that meet the user needs.
- Design AI applications that promote trust and collaboration.
Prerequisite
Take some time and read more about user experience and design thinking.
Introduction to User Experience and Understanding User Needs
In our fictitious education startup, we have two primary users: teachers and students. Each of the two users has unique needs. A user-centered design prioritizes the user, ensuring the products are relevant and beneficial for those they are intended for.
The application should be useful, reliable, accessible and pleasant to provide a good user experience.
Usability
Being useful means that the application has functionality that matches its intended purpose, such as automating the grading process or generating flashcards for revision. An application that automates the grading process should be able to accurately and efficiently assign scores to students' work based on predefined criteria. Similarly, an application that generates revision flashcards should be able to create relevant and diverse questions based on its data.
Reliability
Being reliable means that the application can perform its task consistently and without errors. However, AI just like humans is not perfect and may be prone to errors. The applications may encounter errors or unexpected situations that require human intervention or correction. How do you handle errors? In the last section of this lesson, we will cover how AI systems and applications are designed for collaboration and feedback.
Accessibility
Being accessible means extending the user experience to users with various abilities, including those with disabilities, ensuring no one is left out. By following accessibility guidelines and principles, AI solutions become more inclusive, usable, and beneficial for all users.
Pleasant
Being pleasant means that the application is enjoyable to use. An appealing user experience can have a positive impact on the user encouraging them to return to the application and increasing business revenue.
Not every challenge can be solved with AI. AI comes in to augment your user experience, be it automating manual tasks, or personalizing user experiences.
Designing AI Applications for Trust and Transparency
Building trust is critical when designing AI applications. Trust ensures a user is confident that the application will get the work done, deliver results consistently, and that the results are what the user needs. A risk in this area is mistrust and overtrust. Mistrust occurs when a user has little or no trust in an AI system, which leads the user to reject your application. Overtrust occurs when a user overestimates the capability of an AI system, leading to users trusting the AI system too much. For example, with an automated grading system, overtrust might lead the teacher not to proofread some of the papers to verify that the grading system works well. This could result in unfair or inaccurate grades for the students, or missed opportunities for feedback and improvement.
Two ways to ensure that trust is put right at the center of design are explainability and control.
Explainability
When AI helps inform decisions such as imparting knowledge to future generations, it is critical for teachers and parents to understand how AI decisions are made. This is explainability - understanding how AI applications make decisions. Designing for explainability includes adding details that highlight how AI arrived at the output. The audience must be aware that the output is generated by AI and not a human. For example, instead of saying "Start chatting with your tutor now", say "Use the AI tutor that adapts to your needs and helps you learn at your pace."
Another example is how AI uses user and personal data. For example, a user with the persona student may have limitations based on their persona. The AI may not be able to reveal answers to questions but may help guide the user to think through how they can solve a problem.
One last key part of explainability is the simplification of explanations. Students and teachers may not be AI experts, therefore explanations of what the application can or cannot do should be simplified and easy to understand.
Control
Generative AI creates a collaboration between AI and the user, where for instance a user can modify prompts for different results. Additionally, once an output is generated, users should be able to modify the results giving them a sense of control. For example, when using Bing, you can tailor your prompt based on format, tone and length. Additionally, you can add changes to your output and modify the output as shown below:
Another feature in Bing that allows a user to have control over the application is the ability to opt in and opt out of the data AI uses. For a school application, a student might want to use their notes as well as the teachers' resources as revision material.
When designing AI applications, intentionality is key to ensuring users do not overtrust the application by setting unrealistic expectations of its capabilities. One way to do this is by creating friction between the prompts and the results, reminding the user that this is AI and not a fellow human being.
Designing AI Applications for Collaboration and Feedback
As earlier mentioned, generative AI creates a collaboration between the user and AI. Most engagements are with a user inputting a prompt and the AI generating an output. What if the output is incorrect? How does the application handle errors if they occur? Does the AI blame the user or take time to explain the error?
AI applications should be built to receive and give feedback. This not only helps the AI system improve but also builds trust with users. A feedback loop should be included in the design; an example is a simple thumbs up or down on the output.
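A feedback loop like the thumbs up/down described above can be sketched in a few lines. This is a minimal illustration, and all names (`record_feedback`, `feedback_summary`) are assumptions rather than part of any framework:

```python
from collections import Counter

# Minimal thumbs-up/down feedback loop (all names are assumptions):
# every AI response gets an id, and votes are tallied so low-rated
# outputs can be reviewed and used to improve the system.
feedback_log = []

def record_feedback(response_id, thumbs_up):
    feedback_log.append({"response_id": response_id, "thumbs_up": thumbs_up})

def feedback_summary():
    votes = Counter("up" if f["thumbs_up"] else "down" for f in feedback_log)
    return {"up": votes["up"], "down": votes["down"]}

record_feedback("resp-1", True)
record_feedback("resp-2", False)
print(feedback_summary())  # {'up': 1, 'down': 1}
```

In a real application, the log would also keep the prompt and output alongside each vote, so that the team can inspect exactly which responses users disliked.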
Another way to handle this is to clearly communicate the capabilities and limitations of the system. When a user makes an error requesting something beyond the AI capabilities, there should also be a way to handle this, as shown below.
System errors are common with applications where the user might need assistance with information outside the scope of the AI, or where the application has a limit on how many questions or subjects a user can generate summaries for. For example, an AI application trained with data on a limited set of subjects, say History and Math, may not be able to handle questions about Geography. To mitigate this, the AI system can give a response like: "Sorry, our product has been trained with data in the following subjects....., so I cannot respond to the question you asked."
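One way to sketch that mitigation, assuming a hypothetical app whose training data covers only History and Math (the subject set, function names, and reply wording here are all invented for illustration), is to check the requested subject before calling the model and return the limitation message otherwise:

```python
SUPPORTED_SUBJECTS = {"history", "math"}  # assumed scope for this example

OUT_OF_SCOPE_REPLY = (
    "Sorry, our product has been trained with data on History and Math, "
    "so I cannot respond to the question you asked."
)

def answer_or_decline(subject, question):
    """Route a question to the model only if its subject is in scope."""
    if subject.lower() not in SUPPORTED_SUBJECTS:
        return OUT_OF_SCOPE_REPLY
    # Placeholder for the real LLM call in an actual application.
    return f"[model answer about {subject}: {question}]"

print(answer_or_decline("Geography", "Name the longest river."))
```

The key design point is that the out-of-scope case returns an explanatory sentence rather than an error, keeping the failure simple and easily explainable to the user.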
AI applications are not perfect, therefore, they are bound to make mistakes. When designing your applications, you should ensure you create room for feedback from users and error handling in a way that is simple and easily explainable.
Assignment
Take any AI apps you've built so far, consider implementing the below steps in your app:
Pleasant: Consider how you can make your app more pleasant. Are you adding explanations everywhere? Are you encouraging the user to explore? How are you wording your error messages?
Usability: If you're building a web app, make sure your app is navigable by both mouse and keyboard.
Trust and transparency: Don't trust the AI completely and its output, consider how you would add a human to the process to verify the output. Also, consider and implement other ways to achieve trust and transparency.
Control: Give the user control of the data they provide to the application. Implement a way a user can opt-in and opt-out of data collection in the AI application.
Continue Your Learning!
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!
Head over to Lesson 13, where we will look at how to secure AI applications!
Securing Your Generative AI Applications
Introduction
This lesson will cover:
- Security within the context of AI systems.
- Common risks and threats to AI systems.
- Methods and considerations for securing AI systems.
Learning Goals
After completing this lesson, you will have an understanding of:
- The threats and risks to AI systems.
- Common methods and practices for securing AI systems.
- How implementing security testing can prevent unexpected results and erosion of user trust.
What does security mean within the context of generative AI?
As Artificial Intelligence (AI) and Machine Learning (ML) technologies increasingly shape our lives, it’s crucial to protect not only customer data but also the AI systems themselves. AI/ML is increasingly used in support of high-value decision-making processes in industries where the wrong decision may result in serious consequences.
Here are key points to consider:
- Impact of AI/ML: AI/ML have significant impacts on daily life and as such safeguarding them has become essential.
- Security Challenges: The impact of AI/ML demands proper attention to the need to protect AI-based products from sophisticated attacks, whether by trolls or organized groups.
- Strategic Problems: The tech industry must proactively address strategic challenges to ensure long-term customer safety and data security.
Additionally, Machine Learning models are largely unable to discern between malicious input and benign anomalous data. A significant source of training data is derived from uncurated, unmoderated, public datasets, which are open to 3rd-party contributions. Attackers don’t need to compromise datasets when they're free to contribute to them. Over time, low-confidence malicious data becomes high-confidence trusted data, if the data structure/formatting remains correct.
This is why it is critical to ensure the integrity and protection of the data stores your models use to make decisions with.
Understanding the threats and risks of AI
In terms of AI and related systems, data poisoning stands out as the most significant security threat today. Data poisoning is when someone intentionally changes the information used to train an AI, causing it to make mistakes. This is due to the absence of standardized detection and mitigation methods, coupled with our reliance on untrusted or uncurated public datasets for training. To maintain data integrity and prevent a flawed training process, it is crucial to track the origin and lineage of your data. Otherwise, the old adage “garbage in, garbage out” holds true, leading to compromised model performance.
Here are examples of how data poisoning can affect your models:
- Label Flipping: In a binary classification task, an adversary intentionally flips the labels of a small subset of training data. For instance, benign samples are labeled as malicious, leading the model to learn incorrect associations. Example: A spam filter misclassifying legitimate emails as spam due to manipulated labels.
- Feature Poisoning: An attacker subtly modifies features in the training data to introduce bias or mislead the model. Example: Adding irrelevant keywords to product descriptions to manipulate recommendation systems.
- Data Injection: Injecting malicious data into the training set to influence the model’s behavior. Example: Introducing fake user reviews to skew sentiment analysis results.
- Backdoor Attacks: An adversary inserts a hidden pattern (backdoor) into the training data. The model learns to recognize this pattern and behaves maliciously when triggered. Example: A face recognition system trained with backdoored images that misidentifies a specific person.
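To make label flipping concrete, here is a toy demonstration with invented data (a nearest-centroid stand-in for a spam filter, not a real one): flipping just two labels moves the class centroids enough for an obviously spammy input to be misclassified.

```python
import statistics

# Toy illustration of label flipping (data invented for this example):
# a nearest-centroid "spam filter" trained on clean vs. poisoned labels.
def train_centroids(samples):
    spam_scores = [x for x, label in samples if label == "spam"]
    ham_scores = [x for x, label in samples if label == "ham"]
    return statistics.mean(spam_scores), statistics.mean(ham_scores)

def classify(x, spam_centroid, ham_centroid):
    return "spam" if abs(x - spam_centroid) < abs(x - ham_centroid) else "ham"

# Each sample is (spamminess score, label).
clean = [(9.0, "spam"), (8.5, "spam"), (1.0, "ham"), (1.5, "ham")]
# The adversary flips two labels without touching the features.
poisoned = [(9.0, "ham"), (1.0, "spam"), (8.5, "spam"), (1.5, "ham")]

print(classify(9.2, *train_centroids(clean)))     # spam: correct
print(classify(9.2, *train_centroids(poisoned)))  # ham: the flipped labels
                                                  # moved the centroids
```

Nothing about the features changed between the two training sets; the corrupted labels alone are enough to flip the decision, which is why tracking data lineage matters.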
The MITRE Corporation has created ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems), a knowledgebase of tactics and techniques employed by adversaries in real-world attacks on AI systems.
There are a growing number of vulnerabilities in AI-enabled systems, as the incorporation of AI increases the attack surface of existing systems beyond those of traditional cyber-attacks. We developed ATLAS to raise awareness of these unique and evolving vulnerabilities, as the global community increasingly incorporates AI into various systems. ATLAS is modeled after the MITRE ATT&CK® framework and its tactics, techniques, and procedures (TTPs) are complementary to those in ATT&CK.
Much like the MITRE ATT&CK® framework, which is extensively used in traditional cybersecurity for planning advanced threat emulation scenarios, ATLAS provides an easily searchable set of TTPs that can help you better understand and prepare to defend against emerging attacks.
Additionally, the Open Web Application Security Project (OWASP) has created a "Top 10 list" of the most critical vulnerabilities found in applications utilizing LLMs. The list highlights the risks of threats such as the aforementioned data poisoning along with others such as:
- Prompt Injection: a technique where attackers manipulate a Large Language Model (LLM) through carefully crafted inputs, causing it to behave outside of its intended behavior.
- Supply Chain Vulnerabilities: The components and software that make up the applications used by an LLM, such as Python modules or external datasets, can themselves be compromised leading to unexpected results, introduced biases and even vulnerabilities in the underlying infrastructure.
- Overreliance: LLMs are fallible and have been prone to hallucinate, providing inaccurate or unsafe results. In several documented circumstances, people have taken the results at face value leading to unintended real-world negative consequences.
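To make prompt injection more concrete, here is a deliberately simple illustrative sketch. The patterns and function name are assumptions, and keyword matching alone is nowhere near a real defense; layered mitigations are required. It merely shows the shape of an input screen that runs before text reaches the model:

```python
import re

# Naive illustrative check only; real prompt-injection defenses require
# layered mitigations, not keyword lists.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|any|previous) instructions", re.IGNORECASE),
    re.compile(r"reveal (the|your) system prompt", re.IGNORECASE),
]

def looks_like_injection(user_input):
    return any(p.search(user_input) for p in INJECTION_PATTERNS)

print(looks_like_injection("Please ignore previous instructions and ..."))  # True
print(looks_like_injection("Summarize chapter 3 for me"))                   # False
```

In practice such heuristics are easy to evade, which is exactly why the OWASP list treats prompt injection as an architectural risk rather than a filtering problem.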
Microsoft Cloud Advocate Rod Trent has written a free ebook, Must Learn AI Security, that dives deeply into these and other emerging AI threats and provides extensive guidance on how to best tackle these scenarios.
Security Testing for AI Systems and LLMs
Artificial intelligence (AI) is transforming various domains and industries, offering new possibilities and benefits for society. However, AI also poses significant challenges and risks, such as data privacy, bias, lack of explainability, and potential misuse. Therefore, it is crucial to ensure that AI systems are secure and responsible, meaning that they adhere to ethical and legal standards and can be trusted by users and stakeholders.
Security testing is the process of evaluating the security of an AI system or LLM, by identifying and exploiting their vulnerabilities. This can be performed by developers, users, or third-party auditors, depending on the purpose and scope of the testing. Some of the most common security testing methods for AI systems and LLMs are:
- Data sanitization: This is the process of removing or anonymizing sensitive or private information from the training data or the input of an AI system or LLM. Data sanitization can help prevent data leakage and malicious manipulation by reducing the exposure of confidential or personal data.
- Adversarial testing: This is the process of generating and applying adversarial examples to the input or output of an AI system or LLM to evaluate its robustness and resilience against adversarial attacks. Adversarial testing can help identify and mitigate the vulnerabilities and weaknesses of an AI system or LLM that may be exploited by attackers.
- Model verification: This is the process of verifying the correctness and completeness of the model parameters or architecture of an AI system or LLM. Model verification can help detect and prevent model stealing by ensuring that the model is protected and authenticated.
- Output validation: This is the process of validating the quality and reliability of the output of an AI system or LLM. Output validation can help detect and correct malicious manipulation by ensuring that the output is consistent and accurate.
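The data-sanitization step above can be sketched with simple pattern masking. The regular expressions below are simplified examples for illustration, not production-grade PII detection:

```python
import re

# Minimal data-sanitization sketch: mask email addresses and
# card-like numbers before text reaches an LLM or a training set.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def sanitize(text):
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    text = CARD_PATTERN.sub("[CARD]", text)
    return text

print(sanitize("Contact jane@contoso.com, card 4111 1111 1111 1111."))
# Contact [EMAIL], card [CARD].
```

A real pipeline would combine patterns like these with named-entity recognition and allowlists, and would log what was redacted so the sanitization itself can be audited.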
OpenAI, a leader in AI systems, has set up a series of safety evaluations as part of their red teaming network initiative, aimed at testing the output of AI systems in the hopes of contributing to AI safety.
Evaluations can range from simple Q&A tests to more-complex simulations. As concrete examples, here are sample evaluations developed by OpenAI for evaluating AI behaviors from a number of angles:
Persuasion
- MakeMeSay: How well can an AI system trick another AI system into saying a secret word?
- MakeMePay: How well can an AI system convince another AI system to donate money?
- Ballot Proposal: How well can an AI system influence another AI system’s support of a political proposition?
Steganography (hidden messaging)
- Steganography: How well can an AI system pass secret messages without being caught by another AI system?
- Text Compression: How well can an AI system compress and decompress messages, to enable hiding secret messages?
- Schelling Point: How well can an AI system coordinate with another AI system, without direct communication?
AI Security
It's imperative that we aim to protect AI systems from malicious attacks, misuse, or unintended consequences. This includes taking steps to ensure the safety, reliability, and trustworthiness of AI systems, such as:
- Securing the data and algorithms that are used to train and run AI models
- Preventing unauthorized access, manipulation, or sabotage of AI systems
- Detecting and mitigating bias, discrimination, or ethical issues in AI systems
- Ensuring the accountability, transparency, and explainability of AI decisions and actions
- Aligning the goals and values of AI systems with those of humans and society
AI security is important for ensuring the integrity, availability, and confidentiality of AI systems and data. Some of the challenges and opportunities of AI security are:
- Opportunity: Incorporating AI in cybersecurity strategies since it can play a crucial role in identifying threats and improving response times. AI can help automate and augment the detection and mitigation of cyberattacks, such as phishing, malware, or ransomware.
- Challenge: AI can also be used by adversaries to launch sophisticated attacks, such as generating fake or misleading content, impersonating users, or exploiting vulnerabilities in AI systems. Therefore, AI developers have a unique responsibility to design systems that are robust and resilient against misuse.
Data Protection
LLMs can pose risks to the privacy and security of the data that they use. For example, LLMs can potentially memorize and leak sensitive information from their training data, such as personal names, addresses, passwords, or credit card numbers. They can also be manipulated or attacked by malicious actors who want to exploit their vulnerabilities or biases. Therefore, it is important to be aware of these risks and take appropriate measures to protect the data used with LLMs. There are several steps that you can take to protect the data that is used with LLMs. These steps include:
- Limiting the amount and type of data that they share with LLMs: Only share the data that is necessary and relevant for the intended purposes, and avoid sharing any data that is sensitive, confidential, or personal. Users should also anonymize or encrypt the data that they share with LLMs, such as by removing or masking any identifying information, or using secure communication channels.
- Verifying the data that LLMs generate: Always check the accuracy and quality of the output generated by LLMs to ensure they don't contain any unwanted or inappropriate information.
- Reporting and alerting any data breaches or incidents: Be vigilant of any suspicious or abnormal activities or behaviors from LLMs, such as generating texts that are irrelevant, inaccurate, offensive, or harmful. This could be an indication of a data breach or security incident.
Data security, governance, and compliance are critical for any organization that wants to leverage the power of data and AI in a multi-cloud environment. Securing and governing all your data is a complex and multifaceted undertaking. You need to secure and govern different types of data (structured, unstructured, and data generated by AI) in different locations across multiple clouds, and you need to account for existing and future data security, governance, and AI regulations. To protect your data, you need to adopt some best practices and precautions, such as:
- Use cloud services or platforms that offer data protection and privacy features.
- Use data quality and validation tools to check your data for errors, inconsistencies, or anomalies.
- Use data governance and ethics frameworks to ensure your data is used in a responsible and transparent manner.
Emulating real-world threats - AI red teaming
Emulating real-world threats is now considered a standard practice in building resilient AI systems: by employing similar tools, tactics, and procedures, you can identify the risks to systems and test the response of defenders.
The practice of AI red teaming has evolved to take on a more expanded meaning: it not only covers probing for security vulnerabilities, but also includes probing for other system failures, such as the generation of potentially harmful content. AI systems come with new risks, and red teaming is core to understanding those novel risks, such as prompt injection and producing ungrounded content. - Microsoft AI Red Team building future of safer AI
Below are key insights that have shaped Microsoft’s AI Red Team program.
- Expansive Scope of AI Red Teaming: AI red teaming now encompasses both security and Responsible AI (RAI) outcomes. Traditionally, red teaming focused on security aspects, treating the model as a vector (e.g., stealing the underlying model). However, AI systems introduce novel security vulnerabilities (e.g., prompt injection, poisoning), necessitating special attention. Beyond security, AI red teaming also probes fairness issues (e.g., stereotyping) and harmful content (e.g., glorification of violence). Early identification of these issues allows prioritization of defense investments.
- Malicious and Benign Failures: AI red teaming considers failures from both malicious and benign perspectives. For example, when red teaming the new Bing, we explore not only how malicious adversaries can subvert the system but also how regular users may encounter problematic or harmful content. Unlike traditional security red teaming, which focuses mainly on malicious actors, AI red teaming accounts for a broader range of personas and potential failures.
- Dynamic Nature of AI Systems: AI applications constantly evolve. In large language model applications, developers adapt to changing requirements. Continuous red teaming ensures ongoing vigilance and adaptation to evolving risks.
AI red teaming is not all encompassing and should be considered a complementary motion to additional controls such as role-based access control (RBAC) and comprehensive data management solutions. It's meant to supplement a security strategy that focuses on employing safe and responsible AI solutions that account for privacy and security while aspiring to minimize biases, harmful content and misinformation that can erode user confidence.
Here's a list of additional reading that can help you better understand how red teaming can help identify and mitigate risks in your AI systems:
- Planning red teaming for large language models (LLMs) and their applications
- What is the OpenAI Red Teaming Network?
- AI Red Teaming - A Key Practice for Building Safer and More Responsible AI Solutions
- MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems), a knowledgebase of tactics and techniques employed by adversaries in real-world attacks on AI systems.
Knowledge check
What could be a good approach to maintaining data integrity and preventing misuse?
- Have strong role-based controls for data access and data management
- Implement and audit data labeling to prevent data misrepresentation or misuse
- Ensure your AI infrastructure supports content filtering
Answer: 1. While all three are great recommendations, ensuring that you're assigning the proper data access privileges to users will go a long way toward preventing manipulation and misrepresentation of the data used by LLMs.
🚀 Challenge
Read up more on how you can govern and protect sensitive information in the age of AI.
Great Work, Continue Your Learning
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!
Head over to Lesson 14 where we will look at the Generative AI Application Lifecycle!
The Generative AI Application Lifecycle
An important question for all AI applications is the relevance of their AI features. Because AI is a fast-evolving field, ensuring that your application remains relevant, reliable, and robust requires you to monitor, evaluate, and improve it continuously. This is where the generative AI lifecycle comes in.
The generative AI lifecycle is a framework that guides you through the stages of developing, deploying, and maintaining a generative AI application. It helps you to define your goals, measure your performance, identify your challenges, and implement your solutions. It also helps you to align your application with the ethical and legal standards of your domain and your stakeholders. By following the generative AI lifecycle, you can ensure that your application is always delivering value and satisfying your users.
Introduction
In this chapter, you will:
- Understand the Paradigm Shift from MLOps to LLMOps
- The LLM Lifecycle
- Lifecycle Tooling
- Lifecycle Metrification and Evaluation
Understand the Paradigm Shift from MLOps to LLMOps
LLMs are a new tool in the artificial intelligence arsenal. They are incredibly powerful in analysis and generation tasks for applications; however, this power has consequences for how we streamline AI and classic machine learning tasks.

Because of this, we need a new paradigm to adopt this tool in a dynamic way, with the correct incentives. We can categorize older AI apps as "ML Apps" and newer AI apps as "GenAI Apps" or just "AI Apps", reflecting the mainstream technology and techniques used at the time. This shifts our narrative in multiple ways; look at the following comparison.

Notice that in LLMOps we focus more on app developers, with integrations as a key point, using "Models-as-a-Service" and considering the following metrics:
- Quality: Response quality
- Harm: Responsible AI
- Honesty: Response groundedness (Does it make sense? Is it correct?)
- Cost: Solution Budget
- Latency: Avg. time for token response
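The latency metric in the list above can be sketched as average time per generated token. All names here are assumptions, and `mock_generate` stands in for a real model call:

```python
import time

# Sketch of the latency metric above: average time per generated token.
def avg_time_per_token(generate, prompt):
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return elapsed / max(len(tokens), 1)

def mock_generate(prompt):
    # A real implementation would call an LLM and return its tokens.
    return prompt.split()

per_token = avg_time_per_token(mock_generate, "hello world from the model")
print(f"average: {per_token:.6f} seconds per token")
```

Tracked over time, a metric like this flags regressions after model or prompt changes, which is exactly the kind of continuous measurement the lifecycle below formalizes.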
The LLM Lifecycle
First, to understand the lifecycle and its modifications, let's look at the following infographic.

As you may note, this is different from the usual MLOps lifecycle. LLMs bring many new requirements, such as prompting, different techniques to improve quality (fine-tuning, RAG, meta-prompts), different assessments and responsibilities around responsible AI, and, lastly, new evaluation metrics (quality, harm, honesty, cost and latency).

For instance, take a look at how we ideate: we use prompt engineering to experiment with various LLMs, exploring possibilities to test whether our hypothesis could be correct.
Note that this is not linear, but integrated loops, iterative and with an overarching cycle.
How could we explore those steps? Let's look in detail at how we could build a lifecycle.

This may look a bit complicated; let's focus on the three big steps first.

- Ideating/Exploring: Exploration. Here we can explore according to our business needs, prototyping by creating a PromptFlow and testing whether it is efficient enough for our hypothesis.
- Building/Augmenting: Implementation. Now we start to evaluate our solution on bigger datasets and implement techniques like fine-tuning and RAG to check its robustness. If it is not robust enough, re-implementing it, adding new steps to our flow, or restructuring the data might help. After testing our flow at scale, if it works and meets our metrics, it is ready for the next step.
- Operationalizing: Integration. Now we add monitoring and alert systems, deploy, and integrate the solution with our application.
Then, we have the overarching cycle of Management, focusing on security, compliance and governance.
Congratulations, your AI app is now operational and ready to go. For a hands-on experience, take a look at the Contoso Chat Demo.
Now, what tools could we use?
Lifecycle Tooling
For tooling, Microsoft provides the Azure AI Platform and PromptFlow, which make your cycle easy to implement and ready to go.
The Azure AI Platform gives you access to AI Studio, a web portal that lets you explore models, samples and tools, manage your resources, develop flows in the UI, and use SDK/CLI options for code-first development.
Azure AI lets you use multiple resources to manage your operations, services, projects, and vector search and database needs.
With PromptFlow, you can build anything from a proof of concept (POC) to a large-scale application:
- Design and Build apps from VS Code, with visual and functional tools
- Test and fine-tune your apps for quality AI with ease.
- Use Azure AI Studio to integrate and iterate with the cloud, then push and deploy for quick integration.
Great! Continue your Learning!
Amazing! Next, learn how we structure an application to apply these concepts by exploring the Contoso Chat App, which shows how Cloud Advocacy uses them in demonstrations. For more content, check our Ignite breakout session!
Now, check out Lesson 15 to understand how Retrieval Augmented Generation and vector databases impact generative AI and let you build more engaging applications!
Retrieval Augmented Generation (RAG) and Vector Databases
In the search applications lesson, we briefly learned how to integrate your own data into Large Language Models (LLMs). In this lesson, we will delve further into the concepts of grounding your data in your LLM application, the mechanics of the process and the methods for storing data, including both embeddings and text.
Video Coming Soon
Introduction
In this lesson we will cover the following:
- An introduction to RAG, what it is and why it is used in AI.
- Understanding what vector databases are and creating one for our application.
- A practical example of how to integrate RAG into an application.
Learning Goals
After completing this lesson, you will be able to:
- Explain the significance of RAG in data retrieval and processing.
- Set up a RAG application and ground your data in an LLM.
- Effectively integrate RAG and vector databases in LLM applications.
Our Scenario: enhancing our LLMs with our own data
For this lesson, we want to add our own notes to the education startup's chatbot, which allows it to get more information on the different subjects. Using the notes we have, learners will be able to study better and understand the different topics, making it easier to revise for their examinations. To create our scenario, we will use:
- Azure OpenAI: the LLM we will use to create our chatbot
- AI for Beginners' lesson on Neural Networks: the data we will ground our LLM on
- Azure AI Search and Azure Cosmos DB: vector databases to store our data and create a search index
Users will be able to create practice quizzes from their notes, build revision flash cards, and summarize them into concise overviews. To get started, let's look at what RAG is and how it works:
Retrieval Augmented Generation (RAG)
An LLM-powered chatbot processes user prompts to generate responses. It is designed to be interactive and engages with users on a wide array of topics. However, its responses are limited to the context provided and its foundational training data. For instance, GPT-4's knowledge cutoff is September 2021, meaning it lacks knowledge of events that occurred after that date. In addition, the data used to train LLMs excludes confidential information such as personal notes or a company's product manual.
How RAGs (Retrieval Augmented Generation) work
Suppose you want to deploy a chatbot that creates quizzes from your notes; you will need a connection to a knowledge base. This is where RAG comes to the rescue. RAG operates as follows:
- Knowledge base: before retrieval, documents need to be ingested and preprocessed, typically by breaking large documents into smaller chunks, transforming the chunks into text embeddings and storing them in a database.
- User query: the user asks a question.
- Retrieval: when a user asks a question, the embedding model retrieves relevant information from the knowledge base to provide more context, which is incorporated into the prompt.
- Augmented generation: the LLM enhances its response based on the retrieved data, so the generated response draws not only on its pre-trained data but also on relevant information from the added context. The LLM then returns an answer to the user's question.
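The flow above can be sketched in a few lines. This toy, self-contained example stands in for real embedding-based retrieval with simple word overlap; `kb`, `retrieve` and `augment_prompt` are invented names for illustration:

```python
# Toy version of the RAG steps above. tokenize()/retrieve() are word-overlap
# stand-ins for a real embedding model and vector search.
def tokenize(text):
    return set(text.lower().replace("?", "").replace(".", "").split())

def retrieve(question, knowledge_base, top_k=2):
    # Retrieval: rank chunks by similarity to the question (here: word overlap)
    q = tokenize(question)
    ranked = sorted(knowledge_base, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:top_k]

def augment_prompt(question, chunks):
    # Augmented generation: the retrieved chunks become context for the LLM prompt
    return "\n\n".join(chunks) + "\n\nQuestion: " + question

kb = [
    "A perceptron is a single-layer neural network.",
    "Backpropagation updates weights using gradients.",
    "Paris is the capital of France.",
]
question = "What is a perceptron?"
prompt = augment_prompt(question, retrieve(question, kb))
print(prompt)
```

A real application would replace the word-overlap scoring with vector similarity over embeddings, and would send the augmented prompt to an LLM.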
The architecture for RAG is implemented using transformers consisting of two parts: an encoder and a decoder. When a user asks a question, the input text is 'encoded' into vectors capturing the meaning of the words; those vectors are matched against our document index, and new text is then 'decoded' based on the user query. The LLM uses both the encoder and the decoder to generate the output.
The paper that proposed RAG, Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, describes two approaches to implementing it:
- RAG-Sequence: uses the same retrieved documents to predict the best possible answer to a user query.
- RAG-Token: can draw on different retrieved documents when generating each next token of the answer.
Why would you use RAG?
- Information richness: it ensures responses are current and up to date, and enhances performance on domain-specific tasks by accessing your internal knowledge base.
- Reduced fabrication: it grounds responses in verifiable data from the knowledge base, which provides context for user queries.
- Cost effectiveness: it is more economical than fine-tuning an LLM.
Creating a knowledge base
Our application is based on our personal data, i.e., the Neural Networks lesson from the AI For Beginners curriculum.
Vector Databases
A vector database, unlike traditional databases, is a specialized database designed to store, manage and search embedded vectors. It stores numerical representations of documents; breaking data down into numerical embeddings makes it easier for our AI system to understand and process.
We store our embeddings in vector databases because LLMs limit the number of tokens they accept as input. Since you cannot pass an entire document to an LLM, you break it into chunks; when a user asks a question, the embeddings most similar to the question are returned together with the prompt. Chunking also reduces the cost of the tokens passed through the LLM.
Some popular vector databases include Azure Cosmos DB, Clarifai, Pinecone, Chroma, ScaNN, Qdrant and DeepLake. You can create an Azure Cosmos DB instance using the Azure CLI with the following commands:
```bash
az login
az group create -n <resource-group-name> -l <location>
az cosmosdb create -n <cosmos-db-name> -g <resource-group-name>
az cosmosdb list-keys -n <cosmos-db-name> -g <resource-group-name>
```
From text to embeddings
Before we store our data, we need to convert it into vector embeddings. If you are working with large documents or long texts, you can chunk them based on the queries you expect. Chunking can be done at the sentence or paragraph level. Since chunks derive meaning from the words around them, you can add extra context to a chunk, for example by adding the document title or including some text before or after the chunk. You can chunk the data as follows:
```python
def split_text(text, max_length, min_length):
    words = text.split()
    chunks = []
    current_chunk = []
    for word in words:
        current_chunk.append(word)
        # Emit the chunk once its length falls between min_length and max_length
        if min_length < len(' '.join(current_chunk)) < max_length:
            chunks.append(' '.join(current_chunk))
            current_chunk = []
    # If the last chunk didn't reach the minimum length, add it anyway
    if current_chunk:
        chunks.append(' '.join(current_chunk))
    return chunks
```
Once chunked, we can embed our text using different embedding models, such as word2vec, OpenAI's ada-002, Azure Computer Vision and many more. Which model to use depends on the languages involved, the type of content encoded (text/images/audio), the size of input it can encode, and the length of the embedding output.
An example of embedded text using OpenAI's text-embedding-ada-002 model is:
Retrieval and Vector Search
When a user asks a question, the retriever transforms it into a vector using the query encoder, then searches our document index for vectors relevant to the input. The matching document chunks, together with the user's question, are then passed to the LLM.
Retrieval
Retrieval happens when the system tries to quickly find the documents from the index that satisfy the search criteria. The goal of the retriever is to get documents that will be used to provide context and ground the LLM on your data.
There are several ways to perform search within our database such as:
- Keyword search: used for full-text searches.
- Vector search: converts documents from text into vector representations using embedding models, permitting semantic search based on the meaning of words. Retrieval is done by querying for the documents whose vector representations are closest to the user's question.
- Hybrid: a combination of keyword and vector search.
A challenge with retrieval arises when there is no similar response to the query in the database; the system will then return the best information it can find. You can mitigate this with tactics such as setting a maximum distance for relevance, or using hybrid search, which combines keyword and vector search. In this lesson we will use hybrid search. We will store our data in a dataframe with columns containing the chunks as well as their embeddings.
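To make the hybrid idea concrete, here is a toy sketch that blends a keyword-match score with a vector-similarity score; the document names, scores and the 0.5 weighting are invented for illustration:

```python
# Toy illustration of hybrid search scoring: blend keyword and vector relevance.
def hybrid_score(keyword_score, vector_score, alpha=0.5):
    # alpha balances keyword relevance against semantic (vector) relevance
    return alpha * keyword_score + (1 - alpha) * vector_score

scores = {
    "doc_a": hybrid_score(keyword_score=0.9, vector_score=0.2),
    "doc_b": hybrid_score(keyword_score=0.4, vector_score=0.9),
}
best = max(scores, key=scores.get)
print(best)  # doc_b — the strong semantic match outweighs the weaker keyword match
```

Real engines (such as those using reciprocal rank fusion) combine rankings rather than raw scores, but the intuition is the same: neither signal alone decides relevance.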
Vector Similarity
The retriever searches the knowledge base for embeddings that are close together, the nearest neighbours, since those correspond to similar texts. When a user asks a question, it is first embedded and then matched with similar embeddings. The most common measure of how similar two vectors are is cosine similarity, which is based on the angle between them.
We can also measure similarity with alternatives such as Euclidean distance, the straight-line distance between the vector endpoints, and the dot product, the sum of the products of the corresponding elements of two vectors.
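These three measures are easy to compute directly. A minimal sketch in plain Python, with made-up three-dimensional vectors:

```python
import math

def dot(a, b):
    # Sum of the products of corresponding elements
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Based on the angle between the vectors: 1.0 means same direction
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    # Straight-line distance between the vector endpoints
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(cosine_similarity(a, b))   # 1.0 — b points in the same direction as a
print(euclidean_distance(a, b))  # ~3.742
print(dot(a, b))                 # 28.0
```

Note that cosine similarity ignores magnitude (here `b` is just `a` scaled by 2) while Euclidean distance does not; that difference matters when embedding norms vary.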
Search index
When doing retrieval, we need to build a search index for our knowledge base before performing a search. An index stores our embeddings and can quickly retrieve the most similar chunks even in a large database. We can create our index locally using:
```python
from sklearn.neighbors import NearestNeighbors

embeddings = flattened_df['embeddings'].to_list()

# Create the search index
nbrs = NearestNeighbors(n_neighbors=5, algorithm='ball_tree').fit(embeddings)

# To query the index, you can use the kneighbors method
distances, indices = nbrs.kneighbors(embeddings)
```
Re-ranking
Once you have queried the database, you might need to sort the results from most to least relevant. A reranking LLM uses machine learning to improve the relevance of search results by ordering them by relevance. With Azure AI Search, reranking is done for you automatically using a semantic reranker. Here is an example of how reranking works using nearest neighbours:
```python
# Find the most similar documents
distances, indices = nbrs.kneighbors([query_vector])

# Print the three most similar documents, closest first
for index in indices[0][:3]:
    if index < len(flattened_df):
        print(flattened_df['chunks'].iloc[index])
        print(flattened_df['path'].iloc[index])
        print(flattened_df['distances'].iloc[index])
    else:
        print(f"Index {index} not found in DataFrame")
```
Bringing it all together
The last step is adding our LLM into the mix so we can get responses grounded in our data. We can implement it as follows:
```python
import openai

user_input = "what is a perceptron?"

def chatbot(user_input):
    # Convert the question to a query vector
    query_vector = create_embeddings(user_input)

    # Find the most similar documents
    distances, indices = nbrs.kneighbors([query_vector])

    # Add documents to the query to provide context
    history = []
    for index in indices[0]:
        history.append(flattened_df['chunks'].iloc[index])

    # Combine the history and the user input
    history.append(user_input)

    # Create a message object
    messages = [
        {"role": "system", "content": "You are an AI assistant that helps with AI questions."},
        {"role": "user", "content": "\n\n".join(history)}
    ]

    # Use chat completion to generate a response
    response = openai.chat.completions.create(
        model="gpt-4",
        temperature=0.7,
        max_tokens=800,
        messages=messages
    )

    return response.choices[0].message

chatbot(user_input)
```
Evaluating our application
Evaluation Metrics
- Quality: ensuring the response sounds natural, fluent and human-like.
- Groundedness: evaluating whether the response is based on the supplied documents.
- Relevance: evaluating whether the response matches and relates to the question asked.
- Fluency: whether the response makes sense grammatically.
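As a rough illustration of the groundedness metric, here is a naive word-overlap check, invented for this lesson; production evaluators typically use an LLM judge instead:

```python
# Naive groundedness: what fraction of the response's words appear in the sources?
def naive_groundedness(response, source_docs):
    source_words = set(" ".join(source_docs).lower().split())
    response_words = response.lower().split()
    if not response_words:
        return 0.0
    supported = sum(1 for word in response_words if word in source_words)
    return supported / len(response_words)

score = naive_groundedness(
    "a perceptron is a neural network",
    ["A perceptron is the simplest neural network unit."],
)
print(score)  # 1.0 — every response word appears in the sources
```

Word overlap cannot detect paraphrased or logically unsupported claims, which is exactly why LLM-based graders are preferred for this metric.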
Use Cases for RAG (Retrieval Augmented Generation) and vector databases
There are many different use cases where RAG and vector databases can improve your app, such as:
- Question answering: grounding your company data in a chat that employees can use to ask questions.
- Recommendation systems: creating a system that matches the most similar values, e.g. movies, restaurants and many more.
- Chatbot services: storing chat history and personalizing the conversation based on user data.
- Image search based on vector embeddings, useful for image recognition and anomaly detection.
Summary
We have covered the fundamental areas of RAG, from adding our data to the application to handling the user query and output. To simplify the creation of RAG applications, you can use frameworks such as Semantic Kernel, LangChain or AutoGen.
Assignment
To continue your learning of Retrieval Augmented Generation (RAG), you can:
- Build a front-end for the application using the framework of your choice.
- Recreate your application using a framework such as LangChain or Semantic Kernel.
Congratulations for completing the lesson 👏.
Learning does not stop here, continue the Journey
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!
Introduction
The world of open-source LLMs is exciting and constantly evolving. This lesson aims to provide an in-depth look at open source models. If you are looking for information on how proprietary models compare to open source models, go to the "Exploring and Comparing Different LLMs" lesson. This lesson will also cover the topic of fine-tuning but a more detailed explanation can be found in the "Fine-Tuning LLMs" lesson.
Learning goals
- Gain an understanding of open source models.
- Understand the benefits of working with open models.
- Explore the open models available on Hugging Face and in the Azure AI Studio.
What are Open Source Models?
Open source software has played a crucial role in the growth of technology across various fields. The Open Source Initiative (OSI) has defined 10 criteria for software to be classified as open source. The source code must be openly shared under a license approved by the OSI.
While the development of LLMs has similar elements to developing software, the process is not exactly the same. This has brought much discussion in the community on the definition of open source in the context of LLMs. For a model to be aligned with the traditional definition of open source the following information should be publicly available:
- Datasets used to train the model.
- Full model weights as a part of the training.
- The evaluation code.
- The fine-tuning code.
- Full model weights and training metrics.
For this lesson, we will refer to the models as "open models" going forward as they may not match the criteria above at the time of writing.
Benefits of Open Models
Highly Customizable - Since open models are released with detailed training information, researchers and developers can modify the model's internals. This enables the creation of highly specialized models that are fine-tuned for a specific task or area of study. Some examples of this are code generation, mathematical operations and biology.
Cost - The cost per token for using and deploying these models is lower than that of proprietary models. When building generative AI applications, you should weigh performance against price for your use case when working with these models.
Source: Artificial Analysis
Flexibility - Working with open models enables you to be flexible in terms of using different models or combining them. An example of this is the HuggingChat Assistants where a user can select the model being used directly in the user interface:
Exploring Different Open Models
Llama 2
Mistral
Falcon
Falcon is an LLM created by the Technology Innovation Institute (TII). Falcon-40B is a 40-billion-parameter model that has been shown to perform better than GPT-3 with less compute budget. This is due to its use of the FlashAttention algorithm and multiquery attention, which cut down its memory requirements at inference time. With this reduced inference time, Falcon-40B is well suited for chat applications.
How to Choose
There is no one answer for choosing an open model. A good place to start is by using the Azure AI Studio's filter by task feature. This will help you understand what types of tasks the model has been trained for. Hugging Face also maintains an LLM Leaderboard which shows you the best performing models based on certain metrics.
When looking to compare LLMs across the different types, Artificial Analysis is another great resource:
Source: Artificial Analysis
If you are working on a specific use case, searching for fine-tuned versions focused on the same area can be effective. Experimenting with multiple open models to see how they perform according to your and your users' expectations is another good practice.
Next Steps
The best part about open models is that you can get started working with them pretty quickly. Check out the Azure AI Foundry Model Catalog, which features a specific Hugging Face collection with these models we discussed here.
Learning does not stop here, continue the Journey
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!
Introduction
AI Agents represent an exciting development in Generative AI, enabling Large Language Models (LLMs) to evolve from assistants into agents capable of taking actions. AI Agent frameworks enable developers to create applications that give LLMs access to tools and state management. These frameworks also enhance visibility, allowing users and developers to monitor the actions planned by LLMs, thereby improving experience management.
The lesson will cover the following areas:
- Understanding what an AI Agent is - What exactly is an AI Agent?
- Exploring four different AI Agent Frameworks - What makes them unique?
- Applying these AI Agents to different use cases - When should we use AI Agents?
Learning goals
After taking this lesson, you'll be able to:
- Explain what AI Agents are and how they can be used.
- Understand the differences between some of the popular AI Agent frameworks.
- Understand how AI Agents function in order to build applications with them.
What Are AI Agents?
AI Agents are a very exciting field in the world of Generative AI. With this excitement comes sometimes a confusion of terms and their application. To keep things simple and inclusive of most of the tools that refer to AI Agents, we are going to use this definition:
AI Agents allow Large Language Models (LLMs) to perform tasks by giving them access to a state and tools.
Let's define these terms:
- Large Language Models: the models referred to throughout this course, such as GPT-3.5, GPT-4, Llama-2, etc.
- State: the context the LLM is working in. The LLM uses the context of its past actions and the current situation to guide its decision-making for subsequent actions. AI Agent frameworks allow developers to maintain this context more easily.
- Tools: to complete the task that the user has requested and that the LLM has planned out, the LLM needs access to tools. Examples of tools include a database, an API, an external application or even another LLM!
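A toy sketch of the "tools" idea: the agent resolves a planned action name to a callable. Both tools here are invented stand-ins for real integrations:

```python
# Minimal tool-dispatch sketch. Real frameworks also validate arguments
# and feed the tool's result back into the LLM's context (state).
tools = {
    "get_weather": lambda city: f"Sunny in {city}",
    "search_notes": lambda topic: f"Top note about {topic}",
}

def run_tool(action, argument):
    # The LLM plans an action; the framework dispatches it to the matching tool
    if action not in tools:
        return f"Unknown tool: {action}"
    return tools[action](argument)

print(run_tool("get_weather", "NYC"))  # Sunny in NYC
```

The frameworks below differ mainly in how they wrap this loop: how tools are declared, how results re-enter the conversation, and how much the developer can observe.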
These definitions will hopefully give you a good grounding going forward as we look at how they are implemented. Let's explore a few different AI Agent frameworks:
LangChain Agents
LangChain Agents is an implementation of the definitions we provided above.
To manage the state, it uses a built-in component called the AgentExecutor. This accepts the defined agent and the tools that are available to it.
The Agent Executor also stores the chat history to provide the context of the chat.
LangChain offers a catalog of tools that can be imported into your application and that the LLM can access. These are made by the community and by the LangChain team.
You can then define these tools and pass them to the Agent Executor.
Visibility is another important aspect of AI Agents. It is important for application developers to understand which tool the LLM is using and why. For that, the team at LangChain developed LangSmith.
AutoGen
The next AI Agent framework we will discuss is AutoGen. The main focus of AutoGen is conversations. Agents are both conversable and customizable.
Conversable - LLMs can start and continue a conversation with another LLM in order to complete a task. This is done by creating AssistantAgents and giving them a specific system message.
```python
coder = autogen.AssistantAgent(
    name="Coder",
    llm_config=llm_config,
)
pm = autogen.AssistantAgent(
    name="Product_manager",
    system_message="Creative in software product ideas.",
    llm_config=llm_config,
)
```
Customizable - Agents can be defined not only as LLMs but be a user or a tool. As a developer, you can define a UserProxyAgent which is responsible for interacting with the user for feedback in completing a task. This feedback can either continue the execution of the task or stop it.
```python
user_proxy = UserProxyAgent(name="user_proxy")
```
State and Tools
To change and manage state, an AssistantAgent generates Python code to complete the task.
Here is an example of the process:
LLM Defined with a System Message
```python
system_message = "For weather related tasks, only use the functions you have been provided with. Reply TERMINATE when the task is done."
```
This system message directs this specific LLM to the functions relevant for its task. Remember, with AutoGen you can have multiple AssistantAgents defined with different system messages.
Chat is Initiated by User
```python
user_proxy.initiate_chat(
    chatbot,
    message="I am planning a trip to NYC next week, can you help me pick out what to wear?",
)
```
This message from the user_proxy (the human) is what starts the process of the Agent exploring the possible functions it should execute.
Function is Executed
```
chatbot (to user_proxy):
***** Suggested tool call: get_weather *****
Arguments:
{"location":"New York City, NY","time_period":"7","temperature_unit":"Celsius"}
********************************************
--------------------------------------------------------------------------------
>>>>>>>> EXECUTING FUNCTION get_weather...
user_proxy (to chatbot):
***** Response from calling function "get_weather" *****
112.22727272727272 EUR
****************************************************************
```
Once the initial chat is processed, the Agent sends the suggested tool call. In this case, it is a function called get_weather. Depending on your configuration, this function can be executed automatically and its result read by the Agent, or it can be executed based on user input.
You can find a list of AutoGen code samples to further explore how to get started building.
Taskweaver
The next agent framework we will explore is TaskWeaver. It is known as a "code-first" agent because, instead of working strictly with strings, it can work with DataFrames in Python. This becomes extremely useful for data analysis and generation tasks, such as creating graphs and charts or generating random numbers.
State and Tools
To manage the state of the conversation, TaskWeaver uses the concept of a Planner. The Planner is an LLM that takes the request from the user and maps out the tasks that need to be completed to fulfill it.
To complete the tasks, the Planner has access to a collection of tools called Plugins. These can be Python classes or a general code interpreter. The plugins are stored as embeddings so that the LLM can better search for the correct plugin.
Here is an example of a plugin to handle anomaly detection:
```python
class AnomalyDetectionPlugin(Plugin):
    def __call__(self, df: pd.DataFrame, time_col_name: str, value_col_name: str):
        ...
```
The code is verified before executing. Another feature for managing context in TaskWeaver is experience. Experience allows the context of a conversation to be stored long-term in a YAML file. This can be configured so that the LLM improves over time on certain tasks, given that it is exposed to prior conversations.
JARVIS
The last agent framework we will explore is JARVIS. What makes JARVIS unique is that it uses an LLM to manage the state of the conversation, and the tools are other AI models. Each of these AI models is a specialized model that performs a certain task, such as object detection, transcription or image captioning.
The LLM, being a general purpose model, receives the request from the user and identifies the specific task and any arguments/data that is needed to complete the task.
```json
[{"task": "object-detection", "id": 0, "dep": [-1], "args": {"image": "e1.jpg"}}]
```
The LLM then formats the request in a manner that the specialized AI model can interpret, such as JSON. Once the AI model has returned its prediction based on the task, the LLM receives the response.
If multiple models are required to complete the task, the LLM also interprets the responses from those models before bringing them together to generate the response to the user.
The example below shows how this would work when a user is requesting a description and count of the objects in a picture:
Assignment
To continue your learning of AI Agents you can build with AutoGen:
- An application that simulates a business meeting with different departments of an education startup.
- Create system messages that guide LLMs in understanding different personas and priorities, and enable the user to pitch a new product idea.
- The LLM should then generate follow-up questions from each department to refine and improve the pitch and the product idea.
Learning does not stop here, continue the Journey
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!
Fine-Tuning Your LLM
Using large language models to build generative AI applications comes with new challenges. A key issue is ensuring response quality (accuracy and relevance) in content generated by the model for a given user request. In previous lessons, we discussed techniques like prompt engineering and retrieval-augmented generation that try to solve the problem by modifying the prompt input to the existing model.
In today's lesson, we discuss a third technique, fine-tuning, which tries to address the challenge by retraining the model itself with additional data. Let's dive into the details.
Learning Objectives
This lesson introduces the concept of fine-tuning for pre-trained language models, explores the benefits and challenges of this approach, and provides guidance on when and how to use fine tuning to improve the performance of your generative AI models.
By the end of this lesson, you should be able to answer the following questions:
- What is fine tuning for language models?
- When, and why, is fine tuning useful?
- How can I fine-tune a pre-trained model?
- What are the limitations of fine-tuning?
Ready? Let's get started.
Illustrated Guide
Want to get the big picture of what we'll cover before we dive in? Check out this illustrated guide that describes the learning journey for this lesson - from learning the core concepts and motivation for fine-tuning, to understanding the process and best practices for executing the fine-tuning task. This is a fascinating topic for exploration, so don't forget to check out the Resources page for additional links to support your self-guided learning journey!

What is fine-tuning for language models?
By definition, large language models are pre-trained on large quantities of text sourced from diverse sources including the internet. As we've learned in previous lessons, we need techniques like prompt engineering and retrieval-augmented generation to improve the quality of the model's responses to the user's questions ("prompts").
A popular prompt-engineering technique involves giving the model more guidance on what is expected in the response either by providing instructions (explicit guidance) or giving it a few examples (implicit guidance). This is referred to as few-shot learning but it has two limitations:
- Model token limits can restrict the number of examples you can give, and limit the effectiveness.
- Model token costs can make it expensive to add examples to every prompt, and limit flexibility.
Fine-tuning is a common practice in machine learning systems where we take a pre-trained model and retrain it with new data to improve its performance on a specific task. In the context of language models, we can fine-tune the pre-trained model with a curated set of examples for a given task or application domain to create a custom model that may be more accurate and relevant for that specific task or domain. A side-benefit of fine-tuning is that it can also reduce the number of examples needed for few-shot learning - reducing token usage and related costs.
When and why should we fine-tune models?
In this context, when we talk about fine-tuning, we are referring to supervised fine-tuning where the retraining is done by adding new data that was not part of the original training dataset. This is different from an unsupervised fine-tuning approach where the model is retrained on the original data, but with different hyperparameters.
The key thing to remember is that fine-tuning is an advanced technique that requires a certain level of expertise to get the desired results. If done incorrectly, it may not provide the expected improvements, and may even degrade the performance of the model for your targeted domain.
So, before you learn "how" to fine-tune language models, you need to know "why" you should take this route, and "when" to start the process of fine-tuning. Start by asking yourself these questions:
- Use Case: What is your use case for fine-tuning? What aspect of the current pre-trained model do you want to improve upon?
- Alternatives: Have you tried other techniques to achieve the desired outcomes? Use them to create a baseline for comparison.
  - Prompt engineering: Try techniques like few-shot prompting with examples of relevant prompt responses. Evaluate the quality of responses.
  - Retrieval Augmented Generation: Try augmenting prompts with query results retrieved by searching your data. Evaluate the quality of responses.
- Costs: Have you identified the costs of fine-tuning?
  - Tunability: Is the pre-trained model available for fine-tuning?
  - Effort: What does it take to prepare training data and to evaluate and refine the model?
  - Compute: What does it take to run fine-tuning jobs and deploy the fine-tuned model?
  - Data: Do you have access to enough quality examples for fine-tuning to have impact?
- Benefits: Have you confirmed the benefits of fine-tuning?
  - Quality: Did the fine-tuned model outperform the baseline?
  - Cost: Does it reduce token usage by simplifying prompts?
  - Extensibility: Can you repurpose the base model for new domains?
By answering these questions, you should be able to decide if fine-tuning is the right approach for your use case. Ideally, the approach is valid only if the benefits outweigh the costs. Once you decide to proceed, it's time to think about how you can fine-tune the pre-trained model.
How can we fine-tune a pre-trained model?
To fine-tune a pre-trained model, you need to have:
- a pre-trained model to fine-tune
- a dataset to use for fine-tuning
- a training environment to run the fine-tuning job
- a hosting environment to deploy the fine-tuned model
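As a concrete sketch of the dataset step, the snippet below writes two invented question-answer pairs into the chat-format JSONL used by OpenAI-style fine-tuning jobs. The field names follow that format, but the file name and the example pairs are hypothetical:

```python
# Sketch: preparing a fine-tuning dataset in OpenAI-style chat JSONL format.
# The recipe examples and the file name are made up for illustration.
import json

raw_pairs = [
    ("How long should I roast carrots?", "About 25-30 minutes at 200C, tossed in oil."),
    ("Substitute for buttermilk?", "Mix 1 cup milk with 1 tbsp lemon juice; rest 5 minutes."),
]

records = [
    {
        "messages": [
            {"role": "system", "content": "You are a recipe assistant."},
            {"role": "user", "content": q},
            {"role": "assistant", "content": a},
        ]
    }
    for q, a in raw_pairs
]

# One JSON object per line is the expected JSONL layout.
with open("training_data.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```

Providers typically validate the role order and require a minimum number of examples, so check the specific tutorial below for exact requirements.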
Fine-Tuning In Action
The following resources provide step-by-step tutorials to walk you through a real example using a selected model with a curated dataset. To work through these tutorials, you need an account on the specific provider, along with access to the relevant model and datasets.
| Provider | Tutorial | Description |
|---|---|---|
| OpenAI | How to fine-tune chat models | Learn to fine-tune a gpt-35-turbo for a specific domain ("recipe assistant") by preparing training data, running the fine-tuning job, and using the fine-tuned model for inference. |
| Azure OpenAI | GPT 3.5 Turbo fine-tuning tutorial | Learn to fine-tune a gpt-35-turbo-0613 model on Azure by creating and uploading training data, running the fine-tuning job, and deploying and using the new model. |
| Hugging Face | Fine-tuning LLMs with Hugging Face | This blog post walks you through fine-tuning an open LLM (e.g., CodeLlama 7B) using the transformers library & Transformer Reinforcement Learning (TRL) with open datasets on Hugging Face. |
| 🤗 AutoTrain | Fine-tuning LLMs with AutoTrain | AutoTrain (or AutoTrain Advanced) is a Python library developed by Hugging Face that allows fine-tuning for many different tasks, including LLM fine-tuning. AutoTrain is a no-code solution, and fine-tuning can be done in your own cloud, on Hugging Face Spaces, or locally. It supports a web-based GUI, a CLI, and training via YAML config files. |
| 🦥 Unsloth | Fine-tuning LLMs with Unsloth | Unsloth is an open-source framework that supports LLM fine-tuning and reinforcement learning (RL). Unsloth streamlines local training, evaluation, and deployment with ready-to-use notebooks. It also supports text-to-speech (TTS), BERT, and multimodal models. To get started, read their step-by-step Fine-tuning LLMs Guide. |
Assignment
Select one of the tutorials above and walk through it. We may replicate a version of these tutorials in Jupyter Notebooks in this repo for reference only. Please use the original sources directly to get the latest versions.
Great Work! Continue Your Learning.
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!
Congratulations! You have completed the final lesson from the v2 series for this course! Don't stop learning and building. Check out the RESOURCES page for a list of additional suggestions for just this topic.
Our v1 series of lessons has also been updated with more assignments and concepts. So take a minute to refresh your knowledge - and please share your questions and feedback to help us improve these lessons for the community.
Introduction to Small Language Models for Generative AI for Beginners
Generative AI is a fascinating field of artificial intelligence that focuses on creating systems capable of generating new content. This content can range from text and images to music and even entire virtual environments. One of the most exciting applications of generative AI is in the realm of language models.
What Are Small Language Models?
A Small Language Model (SLM) represents a scaled-down variant of a large language model (LLM), leveraging many of the architectural principles and techniques of LLMs, while exhibiting a significantly reduced computational footprint.
SLMs are a subset of language models designed to generate human-like text. Unlike their larger counterparts, such as GPT-4, SLMs are more compact and efficient, making them ideal for applications where computational resources are limited. Despite their smaller size, they can still perform a variety of tasks. Typically, SLMs are constructed by compressing or distilling LLMs, aiming to retain a substantial portion of the original model's functionality and linguistic capabilities. This reduction in model size decreases the overall complexity, making SLMs more efficient in terms of both memory usage and computational requirements. Despite these optimizations, SLMs can still perform a wide range of natural language processing (NLP) tasks:
- Text Generation: Creating coherent and contextually relevant sentences or paragraphs.
- Text Completion: Predicting and completing sentences based on a given prompt.
- Translation: Converting text from one language to another.
- Summarization: Condensing long pieces of text into shorter, more digestible summaries.
They handle these tasks, albeit with some trade-offs in performance or depth of understanding compared to their larger counterparts.
How Do Small Language Models Work?
SLMs are trained on vast amounts of text data. During training, they learn the patterns and structures of language, enabling them to generate text that is both grammatically correct and contextually appropriate. The training process involves:
- Data Collection: Gathering large datasets of text from various sources.
- Preprocessing: Cleaning and organizing the data to make it suitable for training.
- Training: Using machine learning algorithms to teach the model how to understand and generate text.
- Fine-Tuning: Adjusting the model to improve its performance on specific tasks.
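The preprocessing step above can be sketched in a few lines. The cleaning rules here (whitespace normalization and case-insensitive de-duplication) are illustrative only; real pipelines do much more:

```python
# Sketch: a minimal text-preprocessing pass (cleaning + de-duplication) of the
# kind done before training. The rules are illustrative, not a real pipeline.
import re

def preprocess(docs):
    cleaned, seen = [], set()
    for doc in docs:
        text = re.sub(r"\s+", " ", doc).strip()  # collapse runs of whitespace
        if text and text.lower() not in seen:    # drop empties and duplicates
            seen.add(text.lower())
            cleaned.append(text)
    return cleaned

docs = ["Hello   world\n", "hello world", "", "Small models are efficient."]
print(preprocess(docs))
# → ['Hello world', 'Small models are efficient.']
```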
The development of SLMs aligns with the increasing need for models that can be deployed in resource-constrained environments, such as mobile devices or edge computing platforms, where full-scale LLMs may be impractical due to their heavy resource demands. By focusing on efficiency, SLMs balance performance with accessibility, enabling broader application across various domains.

Learning Objectives
In this lesson, we introduce SLMs and use the Microsoft Phi-3 family to explore scenarios across text, vision, and Mixture of Experts (MoE) models.
By the end of this lesson, you should be able to answer the following questions:
- What is SLM?
- What is the difference between SLM and LLM?
- What is the Microsoft Phi-3/3.5 Family?
- How to run inference with the Microsoft Phi-3/3.5 Family?
Ready? Let's get started.
The Distinctions between Large Language Models (LLMs) and Small Language Models (SLMs)
Both LLMs and SLMs are built upon foundational principles of probabilistic machine learning, following similar approaches in their architectural design, training methodologies, data generation processes, and model evaluation techniques. However, several key factors differentiate these two types of models.
Applications of Small Language Models
SLMs have a wide range of applications, including:
- Chatbots: Providing customer support and engaging with users in a conversational manner.
- Content Creation: Assisting writers by generating ideas or even drafting entire articles.
- Education: Helping students with writing assignments or learning new languages.
- Accessibility: Creating tools for individuals with disabilities, such as text-to-speech systems.
Size
A primary distinction between LLMs and SLMs lies in the scale of the models. LLMs, such as ChatGPT (GPT-4), can comprise an estimated 1.76 trillion parameters, while open-source SLMs like Mistral 7B are designed with significantly fewer parameters—approximately 7 billion. This disparity is primarily due to differences in model architecture and training processes. For instance, ChatGPT employs a self-attention mechanism within an encoder-decoder framework, whereas Mistral 7B uses sliding window attention, which enables more efficient training within a decoder-only model. This architectural variance has profound implications for the complexity and performance of these models.
Comprehension
SLMs are typically optimized for performance within specific domains, making them highly specialized but potentially limited in their ability to provide broad contextual understanding across multiple fields of knowledge. In contrast, LLMs aim to simulate human-like intelligence on a more comprehensive level. Trained on vast, diverse datasets, LLMs are designed to perform well across a variety of domains, offering greater versatility and adaptability. Consequently, LLMs are more suitable for a wider range of downstream tasks, such as natural language processing and programming.
Computing
The training and deployment of LLMs are resource-intensive processes, often requiring significant computational infrastructure, including large-scale GPU clusters. For example, training a model like ChatGPT from scratch may necessitate thousands of GPUs over extended periods. In contrast, SLMs, with their smaller parameter counts, are more accessible in terms of computational resources. Models like Mistral 7B can be trained and run on local machines equipped with moderate GPU capabilities, although training still demands several hours across multiple GPUs.
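A quick back-of-envelope calculation makes this resource gap concrete. The sketch below estimates the memory needed just to hold model weights in fp16, using the parameter counts cited above; actual deployments also need memory for activations and KV caches:

```python
# Back-of-envelope memory to hold model weights in fp16 (2 bytes per parameter).
# Parameter counts are those cited in the text; real deployments need more memory.
def weight_memory_gb(params, bytes_per_param=2):
    return params * bytes_per_param / 1024**3

mistral_7b = weight_memory_gb(7e9)      # ~13 GB: fits on one high-end GPU
gpt4_est = weight_memory_gb(1.76e12)    # ~3.2 TB: needs a multi-GPU cluster
print(round(mistral_7b), round(gpt4_est / 1024, 1))
```

The roughly 250x gap in weight memory alone explains why SLMs can run on local machines while the largest LLMs require clusters.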
Bias
Bias is a known issue in LLMs, primarily due to the nature of the training data. These models often rely on raw, openly available data from the internet, which may underrepresent or misrepresent certain groups, introduce erroneous labeling, or reflect linguistic biases influenced by dialect, geographic variations, and grammatical rules. Additionally, the complexity of LLM architectures can inadvertently exacerbate bias, which may go unnoticed without careful fine-tuning. On the other hand, SLMs, being trained on more constrained, domain-specific datasets, are inherently less susceptible to such biases, though they are not immune to them.
Inference
The reduced size of SLMs affords them a significant advantage in terms of inference speed, allowing them to generate outputs efficiently on local hardware without the need for extensive parallel processing. In contrast, LLMs, due to their size and complexity, often require substantial parallel computational resources to achieve acceptable inference times. The presence of multiple concurrent users further slows down LLMs' response times, especially when deployed at scale.
In summary, while both LLMs and SLMs share a foundational basis in machine learning, they differ significantly in terms of model size, resource requirements, contextual understanding, susceptibility to bias, and inference speed. These distinctions reflect their respective suitability for different use cases, with LLMs being more versatile but resource-heavy, and SLMs offering more domain-specific efficiency with reduced computational demands.
Note: In this lesson, we will introduce SLM using Microsoft Phi-3 / 3.5 as an example.
Introduce Phi-3 / Phi-3.5 Family
Phi-3 / 3.5 Family mainly targets text, vision, and Agent (MoE) application scenarios:
Phi-3 / 3.5 Instruct
Mainly for text generation, chat completion, and content information extraction, etc.
Phi-3-mini
The 3.8B language model is available on Microsoft Azure AI Studio, Hugging Face, and Ollama. Phi-3 models significantly outperform language models of equal and larger sizes on key benchmarks (see benchmark numbers below, higher numbers are better). Phi-3-mini outperforms models twice its size, while Phi-3-small and Phi-3-medium outperform larger models, including GPT-3.5.
Phi-3-small & medium
With just 7B parameters, Phi-3-small beats GPT-3.5T on a variety of language, reasoning, coding, and math benchmarks.
The Phi-3-medium with 14B parameters continues this trend and outperforms the Gemini 1.0 Pro.
Phi-3.5-mini
We can think of it as an upgrade of Phi-3-mini. While the parameters remain unchanged, it improves the ability to support multiple languages (support 20+ languages: Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian) and adds stronger support for long context.
Phi-3.5-mini with 3.8B parameters outperforms language models of the same size and is on par with models twice its size.
Phi-3 / 3.5 Vision
We can think of the Instruct model of Phi-3/3.5 as Phi's ability to understand, and Vision is what gives Phi eyes to understand the world.
Phi-3-Vision
Phi-3-vision, with only 4.2B parameters, continues this trend and outperforms larger models such as Claude-3 Haiku and Gemini 1.0 Pro V on general visual reasoning tasks, OCR, and table and diagram comprehension tasks.
Phi-3.5-Vision
Phi-3.5-Vision is an upgrade of Phi-3-Vision that adds support for multiple images. You can think of it as an improvement in vision: it can reason not only over single pictures, but over multi-frame input and video as well.
Phi-3.5-Vision outperforms larger models such as Claude-3.5 Sonnet and Gemini 1.5 Flash on OCR and table and chart understanding tasks, and is on par with them on general visual knowledge reasoning tasks. It supports multi-frame input, i.e., reasoning over multiple input images.
Phi-3.5-MoE
Mixture of Experts(MoE) enables models to be pretrained with far less compute, which means you can dramatically scale up the model or dataset size with the same compute budget as a dense model. In particular, a MoE model should achieve the same quality as its dense counterpart much faster during pretraining.
Phi-3.5-MoE comprises 16 expert modules of 3.8B parameters each. With only 6.6B active parameters, it achieves a level of reasoning, language understanding, and math similar to much larger models.
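To make the "active parameters" idea concrete, here is a toy numpy sketch of top-k expert routing: a gate scores all 16 experts, but only 2 run per input, so most parameters sit idle on any given forward pass. This illustrates the MoE pattern only; it is not the actual Phi-3.5-MoE implementation:

```python
# Toy sketch of Mixture-of-Experts routing: the gate scores every expert but
# only the top-k actually compute, keeping active parameters small.
# Illustrative only; not the real Phi-3.5-MoE architecture.
import numpy as np

rng = np.random.default_rng(0)
n_experts, d, k = 16, 8, 2                  # 16 experts, route to 2 (like Phi-3.5-MoE)

experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # per-expert weights
gate = rng.normal(size=(d, n_experts))                          # router weights

def moe_forward(x):
    scores = x @ gate                               # score every expert
    top = np.argsort(scores)[-k:]                   # indices of the top-k experts
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over top-k
    # Only the chosen experts actually run a forward pass:
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
    return out, top

x = rng.normal(size=d)
out, chosen = moe_forward(x)
print(sorted(chosen.tolist()))
```

Only `k` of the `n_experts` weight matrices are multiplied per input, which is why a 16x3.8B model can run with just 6.6B active parameters.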
We can use the Phi-3/3.5 family of models based on different scenarios. Unlike LLMs, you can deploy Phi-3/3.5-mini or Phi-3/3.5-Vision on edge devices.
How to use Phi-3/3.5 Family models
Next, we will walk through using Phi-3/3.5 in several of these scenarios.

Inference via Cloud APIs
GitHub Models
GitHub Models is the most direct way to get started. You can quickly access the Phi-3/3.5-Instruct model through GitHub Models and, combined with the Azure AI Inference SDK or OpenAI SDK, call Phi-3/3.5-Instruct from code. You can also test different prompts in the Playground.
- Demo: Comparison of the effects of Phi-3-mini and Phi-3.5-mini in Chinese scenarios


Azure AI Studio
If we want to use the vision and MoE models, we can use Azure AI Studio to make the call. If you are interested, read the Phi-3 Cookbook to learn how to call Phi-3/3.5 Instruct, Vision, and MoE through Azure AI Studio.
NVIDIA NIM
In addition to the cloud-based Model Catalog solutions provided by Azure and GitHub, you can also use NVIDIA NIM to complete related calls. You can visit NVIDIA NIM to complete the API calls of the Phi-3/3.5 Family. NVIDIA NIM (NVIDIA Inference Microservices) is a set of accelerated inference microservices designed to help developers deploy AI models efficiently across various environments, including clouds, data centers, and workstations.
Here are some key features of NVIDIA NIM:
- Ease of Deployment: NIM allows for the deployment of AI models with a single command, making it straightforward to integrate into existing workflows.
- Optimized Performance: It leverages NVIDIA’s pre-optimized inference engines, such as TensorRT and TensorRT-LLM, to ensure low latency and high throughput.
- Scalability: NIM supports autoscaling on Kubernetes, enabling it to handle varying workloads effectively.
- Security and Control: Organizations can maintain control over their data and applications by self-hosting NIM microservices on their own managed infrastructure.
- Standard APIs: NIM provides industry-standard APIs, making it easy to build and integrate AI applications like chatbots, AI assistants, and more.
NIM is part of NVIDIA AI Enterprise, which aims to simplify the deployment and operationalization of AI models, ensuring they run efficiently on NVIDIA GPUs.
- Demo: Using NVIDIA NIM to call Phi-3.5-Vision-API [Click this link]
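Because NIM exposes industry-standard (OpenAI-compatible) APIs, a chat request body looks like any OpenAI-style call. The sketch below only builds the JSON body; the model id shown is an assumption (check NVIDIA's catalog for exact values), and in practice you would POST this to your NIM endpoint with an API key:

```python
# Sketch: an OpenAI-style chat request body of the kind NIM's standard APIs
# accept. The model id is hypothetical; the request is built but not sent.
import json

payload = {
    "model": "microsoft/phi-3.5-vision-instruct",   # hypothetical model id
    "messages": [
        {"role": "user", "content": "Describe this chart in one sentence."}
    ],
    "max_tokens": 256,
    "temperature": 0.2,
}
body = json.dumps(payload)
print(len(json.loads(body)["messages"]))
```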
Running Phi-3/3.5 Locally
Inference in relation to Phi-3, or any language model like GPT-3, refers to the process of generating responses or predictions based on the input it receives. When you provide a prompt or question to Phi-3, it uses its trained neural network to infer the most likely and relevant response by analyzing patterns and relationships in the data it was trained on.
Hugging Face Transformers
Hugging Face Transformers is a powerful library designed for natural language processing (NLP) and other machine learning tasks. Here are some key points about it:
- Pretrained Models: It provides thousands of pretrained models that can be used for various tasks such as text classification, named entity recognition, question answering, summarization, translation, and text generation.
- Framework Interoperability: The library supports multiple deep learning frameworks, including PyTorch, TensorFlow, and JAX. This allows you to train a model in one framework and use it in another.
- Multimodal Capabilities: Besides NLP, Hugging Face Transformers also supports tasks in computer vision (e.g., image classification, object detection) and audio processing (e.g., speech recognition, audio classification).
- Ease of Use: The library offers APIs and tools to easily download and fine-tune models, making it accessible for both beginners and experts.
- Community and Resources: Hugging Face has a vibrant community and extensive documentation, tutorials, and guides to help users get started and make the most of the library. See the official documentation or their GitHub repository.
This is the most commonly used method, but it also requires GPU acceleration. After all, scenarios such as Vision and MoE require a lot of computation, which will be very slow on a CPU unless the models are quantized.
Demo: Using Transformer to call Phi-3.5-Instruct Click this link
Demo: Using Transformer to call Phi-3.5-Vision Click this link
Demo: Using Transformer to call Phi-3.5-MoE Click this link
Ollama
Ollama is a platform designed to make it easier to run large language models (LLMs) locally on your machine. It supports various models like Llama 3.1, Phi 3, Mistral, and Gemma 2, among others. The platform simplifies the process by bundling model weights, configuration, and data into a single package, making it more accessible for users to customize and create their own models. Ollama is available for macOS, Linux, and Windows. It's a great tool if you're looking to experiment with or deploy LLMs without relying on cloud services. Ollama is the most direct route: just execute the following command.
```shell
ollama run phi3.5
```
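Once the model is pulled, Ollama also serves a local REST API (by default on port 11434). The sketch below only constructs the request body, since actually sending it assumes a running local server:

```python
# Sketch: a request body for Ollama's local generate API. Built but not sent;
# sending it requires a running Ollama server at http://localhost:11434.
import json

request_body = {
    "model": "phi3.5",
    "prompt": "Explain what a small language model is in one sentence.",
    "stream": False,   # return one complete response instead of a token stream
}
# POST this to http://localhost:11434/api/generate with e.g. urllib or requests.
print(json.dumps(request_body)[:30])
```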
ONNX Runtime for GenAI
ONNX Runtime is a cross-platform inference and training machine-learning accelerator. ONNX Runtime for Generative AI (GENAI) is a powerful tool that helps you run generative AI models efficiently across various platforms.
What is ONNX Runtime?
ONNX Runtime is an open-source project that enables high-performance inference of machine learning models. It supports models in the Open Neural Network Exchange (ONNX) format, which is a standard for representing machine learning models. ONNX Runtime inference can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, and XGBoost. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable, alongside graph optimizations and transforms.
What is Generative AI?
Generative AI refers to AI systems that can generate new content, such as text, images, or music, based on the data they have been trained on. Examples include language models like GPT-3 and image generation models like Stable Diffusion. ONNX Runtime for GenAI library provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management.
ONNX Runtime for GENAI
ONNX Runtime for GENAI extends the capabilities of ONNX Runtime to support generative AI models. Here are some key features:
- Broad Platform Support: It works on various platforms, including Windows, Linux, macOS, Android, and iOS.
- Model Support: It supports many popular generative AI models, such as LLaMA, GPT-Neo, BLOOM, and more.
- Performance Optimization: It includes optimizations for different hardware accelerators like NVIDIA GPUs, AMD GPUs, and more.
- Ease of Use: It provides APIs for easy integration into applications, allowing you to generate text, images, and other content with minimal code.
- Users can call a high-level generate() method, or run each iteration of the model in a loop, generating one token at a time, and optionally updating generation parameters inside the loop.
- ONNX Runtime also supports greedy/beam search and top-p/top-k sampling to generate token sequences, and built-in logits processing like repetition penalties. You can also easily add custom scoring.
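To see what this generation loop does under the hood, here is a toy pure-Python version with greedy search and a simple repetition penalty. The "model" is a fake logits function, so this only illustrates the control flow, not the library's API:

```python
# Toy sketch of the token-by-token generation loop that a generate() method
# manages: greedy search plus a simple repetition penalty. The fake_logits
# function stands in for a real model forward pass.
import numpy as np

VOCAB = ["<eos>", "the", "cat", "sat", "mat"]

def fake_logits(tokens):
    # Stand-in for a model: favors the next vocab id cyclically.
    logits = np.full(len(VOCAB), -1.0)
    logits[(tokens[-1] + 1) % len(VOCAB)] = 2.0
    return logits

def greedy_generate(prompt_tokens, max_new=4, rep_penalty=1.5):
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        logits = fake_logits(tokens)
        for t in set(tokens):                 # penalize already-seen tokens
            logits[t] = logits[t] / rep_penalty if logits[t] > 0 else logits[t] * rep_penalty
        next_id = int(np.argmax(logits))      # greedy: pick the highest-scoring token
        tokens.append(next_id)
        if next_id == 0:                      # stop at <eos>
            break
    return tokens

print([VOCAB[t] for t in greedy_generate([1])])
```

Swapping argmax for sampling over the top-p/top-k candidates turns this same loop into the sampling strategies mentioned above.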
More
We have covered the basics of the Phi-3/3.5 family, but there is much more to learn about SLMs. To go deeper, visit the Phi-3 Cookbook.
Building with Mistral Models
Introduction
This lesson will cover:
- Exploring the different Mistral Models
- Understanding the use-cases and scenarios for each model
- Exploring code samples that show the unique features of each model.
The Mistral Models
In this lesson, we will explore 3 different Mistral models: Mistral Large, Mistral Small and Mistral Nemo.
Each of these models is available for free on the GitHub Model marketplace, and the code in this lesson uses them. Here are more details on using GitHub Models to prototype with AI models.
Mistral Large 2 (2407)
Mistral Large 2 is currently the flagship model from Mistral and is designed for enterprise use.
The model is an upgrade to the original Mistral Large, offering:
- Larger Context Window - 128k vs 32k
- Better performance on Math and Coding Tasks - 76.9% average accuracy vs 60.4%
- Increased multilingual performance - languages include: English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, and Hindi.
With these features, Mistral Large excels at:
- Retrieval Augmented Generation (RAG) - due to the larger context window
- Function Calling - this model has native function calling which allows integration with external tools and APIs. These calls can be made both in parallel or one after another in a sequential order.
- Code Generation - this model excels on Python, Java, TypeScript and C++ generation.
RAG Example using Mistral Large 2
In this example, we are using Mistral Large 2 to run a RAG pattern over a text document. The question is written in Korean and asks about the author's activities before college.
It uses Cohere Embeddings Model to create embeddings of the text document as well as the question. For this sample, it uses the faiss Python package as a vector store.
The prompt sent to the Mistral model includes both the questions and the retrieved chunks that are similar to the question. The Model then provides a natural language response.
```shell
pip install faiss-cpu
```
```python
import os

import faiss
import numpy as np
import requests
from azure.ai.inference import ChatCompletionsClient, EmbeddingsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

endpoint = "https://models.inference.ai.azure.com"
model_name = "Mistral-large"
token = os.environ["GITHUB_TOKEN"]

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

# Download the source document and split it into fixed-size chunks
response = requests.get('https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt')
text = response.text

chunk_size = 2048
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
len(chunks)

# Embed every chunk with the Cohere embeddings model
embed_model_name = "cohere-embed-v3-multilingual"

embed_client = EmbeddingsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token)
)

embed_response = embed_client.embed(
    input=chunks,
    model=embed_model_name
)

# faiss expects float32 vectors
text_embeddings = np.array(
    [item.embedding for item in embed_response.data], dtype="float32"
)

# Build an in-memory vector index over the chunk embeddings
d = text_embeddings.shape[1]
index = faiss.IndexFlatL2(d)
index.add(text_embeddings)

# The question, in Korean: "What were the two main things the author
# worked on before coming to college?"
question = "저자가 대학에 오기 전에 주로 했던 두 가지 일은 무엇이었나요?"

question_embedding = embed_client.embed(
    input=[question],
    model=embed_model_name
)
question_embeddings = np.array(question_embedding.data[0].embedding, dtype="float32")

# Retrieve the two chunks closest to the question
D, I = index.search(question_embeddings.reshape(1, -1), k=2)  # distances, indices
retrieved_chunks = [chunks[i] for i in I.tolist()[0]]

# Send the question plus the retrieved context to Mistral Large
prompt = f"""
Context information is below.
---------------------
{retrieved_chunks}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {question}
Answer:
"""

chat_response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content=prompt),
    ],
    temperature=1.0,
    top_p=1.0,
    max_tokens=1000,
    model=model_name
)

print(chat_response.choices[0].message.content)
```
Mistral Small
Mistral Small is another model in the Mistral family of models under the premier/enterprise category. As the name implies, this model is a Small Language Model (SLM). The advantages of using Mistral Small are that it is:
- Cost Saving compared to Mistral LLMs like Mistral Large and NeMo - 80% price drop
- Low latency - faster response compared to Mistral's LLMs
- Flexible - can be deployed across different environments with less restrictions on required resources.
Mistral Small is great for:
- Text based tasks such as summarization, sentiment analysis and translation.
- Applications where frequent requests are made due to its cost effectiveness
- Low latency code tasks like review and code suggestions
Comparing Mistral Small and Mistral Large
To see the difference in latency between Mistral Small and Mistral Large, run the cells below.
You should see a difference in response times of 3-5 seconds. Also note the response lengths and style for the same prompt.
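A minimal sketch of how you could measure this yourself; `timed` is a hypothetical helper, shown here wrapping a stand-in function instead of a real model call:

```python
# Sketch: a tiny timing helper you could wrap around each client.complete(...)
# call to compare latency. Demonstrated with a stand-in computation.
import time

def timed(fn, *args, **kwargs):
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Usage with a placeholder; swap in lambda: client.complete(...) in the notebook.
result, seconds = timed(lambda: sum(range(1_000_000)))
print(result, seconds >= 0)
```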
```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

endpoint = "https://models.inference.ai.azure.com"
model_name = "Mistral-small"
token = os.environ["GITHUB_TOKEN"]

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful coding assistant."),
        UserMessage(content="Can you write a Python function to the fizz buzz test?"),
    ],
    temperature=1.0,
    top_p=1.0,
    max_tokens=1000,
    model=model_name
)

print(response.choices[0].message.content)
```
```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

endpoint = "https://models.inference.ai.azure.com"
model_name = "Mistral-large"
token = os.environ["GITHUB_TOKEN"]

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful coding assistant."),
        UserMessage(content="Can you write a Python function to the fizz buzz test?"),
    ],
    temperature=1.0,
    top_p=1.0,
    max_tokens=1000,
    model=model_name
)

print(response.choices[0].message.content)
```
Mistral NeMo
Compared to the other two models discussed in this lesson, Mistral NeMo is the only free model, released under the Apache 2.0 license.
It is viewed as an upgrade to the earlier open source LLM from Mistral, Mistral 7B.
Some other features of the NeMo model are:
- More efficient tokenization: This model uses the Tekken tokenizer instead of the more commonly used tiktoken. This allows for better performance across more languages and code.
- Finetuning: The base model is available for finetuning, giving more flexibility for use cases where finetuning may be needed.
- Native Function Calling: Like Mistral Large, this model has been trained on function calling, making it one of the first open source models to do so.
Comparing Tokenizers
In this sample, we will look at how Mistral NeMo handles tokenization compared to Mistral Large.
Both samples take the same prompt but you should see that NeMo returns fewer tokens than Mistral Large.
```shell
pip install mistral-common
```
```python
# Import needed packages:
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load Mistral tokenizer
model_name = "open-mistral-nemo"
tokenizer = MistralTokenizer.from_model(model_name)

# Tokenize a list of messages
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        tools=[
            Tool(
                function=Function(
                    name="get_current_weather",
                    description="Get the current weather",
                    parameters={
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the user's location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                )
            )
        ],
        messages=[
            UserMessage(content="What's the weather like today in Paris"),
        ],
        model=model_name,
    )
)
tokens, text = tokenized.tokens, tokenized.text

# Count the number of tokens
print(len(tokens))
```
```python
# Import needed packages:
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load Mistral tokenizer
model_name = "mistral-large-latest"
tokenizer = MistralTokenizer.from_model(model_name)

# Tokenize a list of messages
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        tools=[
            Tool(
                function=Function(
                    name="get_current_weather",
                    description="Get the current weather",
                    parameters={
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the user's location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                )
            )
        ],
        messages=[
            UserMessage(content="What's the weather like today in Paris"),
        ],
        model=model_name,
    )
)
tokens, text = tokenized.tokens, tokenized.text

# Count the number of tokens
print(len(tokens))
```
Learning does not stop here, continue the journey
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!
Building With the Meta Family Models
Introduction
This lesson will cover:
- Exploring the two main Meta family models - Llama 3.1 and Llama 3.2
- Understanding the use-cases and scenarios for each model
- Code sample to show the unique features of each model
The Meta Family of Models
In this lesson, we will explore two models from the Meta family, or "Llama Herd": Llama 3.1 and Llama 3.2.
These models come in different variants and are available on the GitHub Model marketplace. Here are more details on using GitHub Models to prototype with AI models.
Model Variants:
- Llama 3.1 - 70B Instruct
- Llama 3.1 - 405B Instruct
- Llama 3.2 - 11B Vision Instruct
- Llama 3.2 - 90B Vision Instruct
Note: Llama 3 is also available on GitHub Models but won't be covered in this lesson
Llama 3.1
With up to 405 billion parameters, Llama 3.1 is one of the largest models in the open source LLM category.
The model is an upgrade over the earlier Llama 3 release, offering:
- Larger context window - 128k tokens vs 8k tokens
- Larger Max Output Tokens - 4096 vs 2048
- Better Multilingual Support - due to increase in training tokens
These enable Llama 3.1 to handle more complex use cases when building GenAI applications, including:
- Native Function Calling - the ability to call external tools and functions outside of the LLM workflow
- Better RAG Performance - due to the higher context window
- Synthetic Data Generation - the ability to create effective data for tasks such as fine-tuning
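In practice, the larger context window is a budget: prompt tokens plus requested output tokens must fit inside it. The sketch below illustrates that budgeting with a crude 4-characters-per-token heuristic; both the heuristic and the helper functions are illustrative assumptions, not part of Llama 3.1's API (for exact counts, use the model's own tokenizer).

```python
# Rough token-budget check for Llama 3.1's 128k context window.
# Assumption: ~4 characters per token, a common heuristic for English text.

CONTEXT_WINDOW = 128_000   # Llama 3.1 context window (tokens)
MAX_OUTPUT = 4_096         # Llama 3.1 max output tokens

def estimate_tokens(text: str) -> int:
    """Crude estimate: about one token per 4 characters."""
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, max_output: int = MAX_OUTPUT) -> bool:
    """True if the prompt plus the requested output fits in the window."""
    return estimate_tokens(prompt) + max_output <= CONTEXT_WINDOW

print(fits_in_context("Summarize this report."))  # short prompt fits
print(fits_in_context("word " * 200_000))         # ~1M characters does not
```

A check like this is useful when deciding whether a long document can be sent in one request (feasible at 128k tokens) or must be chunked, as it would be under Llama 3's 8k window.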
Native Function Calling
Llama 3.1 has been fine-tuned to be more effective at making function or tool calls. It also has two built-in tools that the model can identify as needing to be used based on the prompt from the user. These tools are:
- Brave Search - Can be used to get up-to-date information like the weather by performing a web search
- Wolfram Alpha - Can be used for more complex mathematical calculations, so writing your own functions is not required.
You can also create your own custom tools that the LLM can call.
In the code example below:
- We define the available tools (brave_search, wolfram_alpha) in the system prompt.
- We send a user prompt asking about the weather in a certain city.
- The LLM responds with a call to the Brave Search tool, which will look like this:
<|python_tag|>brave_search.call(query="Stockholm weather")
Note: This example only makes the tool call. If you would like to get the results, you will need to create a free account on the Brave API page and define the function itself.
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import AssistantMessage, SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential
token = os.environ["GITHUB_TOKEN"]
endpoint = "https://models.inference.ai.azure.com"
model_name = "meta-llama-3.1-405b-instruct"
client = ChatCompletionsClient(
endpoint=endpoint,
credential=AzureKeyCredential(token),
)
tool_prompt = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Environment: ipython
Tools: brave_search, wolfram_alpha
Cutting Knowledge Date: December 2023
Today Date: 23 July 2024
You are a helpful assistant<|eot_id|>
"""
messages = [
SystemMessage(content=tool_prompt),
UserMessage(content="What is the weather in Stockholm?"),
]
response = client.complete(messages=messages, model=model_name)
print(response.choices[0].message.content)
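Once the model replies with something like `<|python_tag|>brave_search.call(query="Stockholm weather")`, your application still has to parse that string before it can invoke the real search API. A minimal parsing sketch is shown below; the regex and the return format are my own assumptions about the response shape, not part of any Llama 3.1 SDK.

```python
import re

# Parse a Llama 3.1 built-in tool call of the form:
#   <|python_tag|>brave_search.call(query="Stockholm weather")
TOOL_CALL_RE = re.compile(
    r'<\|python_tag\|>(?P<tool>\w+)\.call\(query="(?P<query>[^"]*)"\)'
)

def parse_tool_call(response_text: str):
    """Return (tool_name, query) if the response is a tool call, else None."""
    match = TOOL_CALL_RE.search(response_text)
    if match is None:
        return None
    return match.group("tool"), match.group("query")

print(parse_tool_call('<|python_tag|>brave_search.call(query="Stockholm weather")'))
# → ('brave_search', 'Stockholm weather')
print(parse_tool_call("It is sunny in Stockholm."))  # plain answer → None
```

With the tool name and query in hand, you would dispatch to your own implementation (e.g. a function wrapping the Brave Search API) and feed the result back to the model as a follow-up message.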
Llama 3.2
One limitation of Llama 3.1 is its lack of multimodality, that is, the ability to accept different types of input, such as images, as prompts and respond to them. This ability is one of the main features of Llama 3.2, whose key features include:
- Multimodality - has the ability to evaluate both text and image prompts
- Small to Medium size variations (11B and 90B) - this provides flexible deployment options
- Text-only variations (1B and 3B) - this allows the model to be deployed on edge / mobile devices and provides low latency
The multimodal support represents a big step in the world of open source models. The code example below takes both an image and text prompt to get an analysis of the image from Llama 3.2 90B.
Multimodal Support with Llama 3.2
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
SystemMessage,
UserMessage,
TextContentItem,
ImageContentItem,
ImageUrl,
ImageDetailLevel,
)
from azure.core.credentials import AzureKeyCredential
token = os.environ["GITHUB_TOKEN"]
endpoint = "https://models.inference.ai.azure.com"
model_name = "Llama-3.2-90B-Vision-Instruct"
client = ChatCompletionsClient(
endpoint=endpoint,
credential=AzureKeyCredential(token),
)
response = client.complete(
messages=[
SystemMessage(
content="You are a helpful assistant that describes images in detail."
),
UserMessage(
content=[
TextContentItem(text="What's in this image?"),
ImageContentItem(
image_url=ImageUrl.load(
image_file="sample.jpg",
image_format="jpg",
detail=ImageDetailLevel.LOW)
),
],
),
],
model=model_name,
)
print(response.choices[0].message.content)
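`ImageUrl.load` reads the image from disk. If your image is already in memory (for example, downloaded or generated at runtime), one alternative, an assumption on my part rather than something shown in this lesson, is to build a base64 data URL yourself and pass it wherever an image URL is expected. The data-URL construction itself is plain Python:

```python
import base64

def to_data_url(image_bytes: bytes, image_format: str = "jpeg") -> str:
    """Encode raw image bytes as a data URL usable wherever an image URL is expected."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:image/{image_format};base64,{encoded}"

# Tiny fake payload for demonstration; in practice, pass real JPEG bytes.
url = to_data_url(b"\xff\xd8\xff\xe0fake-jpeg-bytes")
print(url[:24])  # → data:image/jpeg;base64,...
```

This is essentially what `ImageUrl.load` does under the hood: the image travels inline with the request rather than as a fetchable link.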
Learning does not stop here, continue the journey
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!