In this post, we’ll look at integrating multiple large language models (LLMs) with a vector database to build a custom AI chatbot that answers questions about recycling in the local area. The corpus comes from the Winneshiek County Waste Reduction website.
We’ll use the Retrieval-Augmented Generation technique, or RAG for short. RAG lets us take advantage of powerful generative LLMs like ChatGPT for interpretation and human-like conversation while keeping the answers focused on our own relevant data. We achieve this by sending ChatGPT a question together with the relevant data and instructions to answer only from the data provided.
Without RAG, ChatGPT is too general for our specific use case. For example, if we ask if plastic with resin identification number 5 can be recycled, ChatGPT might say yes. While in general #5 plastics can be recycled, in Winneshiek County they cannot. Only #1 and #2 plastics can be recycled in the local area.
There are many other use cases for RAG. LLMs are trained on vast amounts of static data that usually ends at a cutoff date. RAG is a cost-effective way to introduce new and proprietary data. It’s important to consider security when using your company’s proprietary data. Consider using enterprise-level LLM services that offer industry-standard security protocols, or host your own LLM inside your internal network. Both are great options for using proprietary data.
Since the data in our example is public, we’ll use the standard edition of ChatGPT. RAG doesn’t limit what else we integrate, either. In our example, we’ll add a multimodal image-to-text LLM to the mix so our chatbot can answer questions about items found in images. Let’s have a look.
Meet Re-cyclebot, an AI chatbot that answers questions about recycling in Winneshiek County.
Click here to try me!
How it works:
If an image is uploaded, it’s sent to a multimodal LLM and the image is converted to text (an image caption). The image caption is sent to ChatGPT with instructions to make a question about recycling from the text.
Given the image, the multimodal LLM returns:
A plastic container with strawberries in it.
ChatGPT returns: How to recycle a plastic container?
As the example above shows, our multimodal LLM isn’t perfect. It reports strawberries that clearly aren’t there. To increase accuracy, this LLM could be fine-tuned on images with correctly labeled captions specific to our use case. We’ll cover training and fine-tuning in a later post. For this example, the LLMs perform well enough. Plus, we give the user a chance to confirm that the image caption is correct. If it isn’t, the user updates the description. If it is, the RAG process begins.
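The image path described above can be sketched roughly as follows. This is a minimal illustration, not the chatbot’s actual code: the prompt wording, the function name `question_from_image`, and the three callables (`caption_image`, `ask_llm`, `confirm`) are all hypothetical stand-ins, and it assumes the user confirms the caption before the question is generated.

```python
from typing import Callable, Optional

# Hypothetical instruction text; the real prompt is not shown in this post.
CAPTION_TO_QUESTION = (
    "Rewrite the following image caption as a single question about "
    "recycling the main item.\n\nCAPTION:\n{caption}"
)


def question_from_image(
    image_bytes: bytes,
    caption_image: Callable[[bytes], str],    # stand-in for the multimodal LLM
    ask_llm: Callable[[str], str],            # stand-in for ChatGPT
    confirm: Callable[[str], Optional[str]],  # returns a corrected caption, or None if it is fine
) -> str:
    """Turn an uploaded image into a recycling question."""
    caption = caption_image(image_bytes)

    # Give the user a chance to fix a wrong caption before it is used.
    corrected = confirm(caption)
    if corrected is not None:
        caption = corrected

    return ask_llm(CAPTION_TO_QUESTION.format(caption=caption))
```

Passing the models in as plain callables keeps the flow easy to test with stubs and makes it simple to swap in a fine-tuned captioning model later.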
If text is entered initially instead of an image, it’s sent to ChatGPT with instructions to classify the text in the following way: Greetings are responded to with a greeting. Comments are converted to a question if applicable or handled as a greeting. Questions begin the RAG process.
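Here is one way the text routing above might look in code. Treat it as a sketch under assumptions: the label names, both prompt strings, the default greeting, and the `route`, `ask_llm` and `run_rag` names are invented for illustration and are not taken from the live chatbot.

```python
from typing import Callable

# Hypothetical prompts; adjust the wording for your own model.
CLASSIFY_PROMPT = (
    "Classify the user's text as exactly one of: GREETING, COMMENT, QUESTION. "
    "Reply with the label only.\n\nTEXT:\n{text}"
)
REWRITE_PROMPT = (
    "If the following comment can be turned into a question about recycling, "
    "reply with that question. Otherwise reply with NONE.\n\nCOMMENT:\n{text}"
)


def route(
    text: str,
    ask_llm: Callable[[str], str],   # stand-in for ChatGPT
    run_rag: Callable[[str], str],   # stand-in for the RAG pipeline
    greeting: str = "Hello! Ask me about recycling in Winneshiek County.",
) -> str:
    """Classify the input, then greet the user or start the RAG process."""
    label = ask_llm(CLASSIFY_PROMPT.format(text=text)).strip().upper()

    if label == "QUESTION":
        return run_rag(text)

    if label == "COMMENT":
        # Try to convert the comment to a question; fall through to a greeting if not applicable.
        question = ask_llm(REWRITE_PROMPT.format(text=text)).strip()
        if question.upper() != "NONE":
            return run_rag(question)

    # Greetings, non-convertible comments, and unexpected labels all get a greeting.
    return greeting
```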
Retrieval
Once we have a vetted question, we need to retrieve relevant domain-specific information to provide an answer. We’ll use semantic search, which is great for AI chatbots and other apps that use natural language. Semantic search leverages contextual meaning, so a question phrased in almost any way can be used as a search string to retrieve data with a similar meaning. A vector database makes this possible.
Information, in this case about recycling in Winneshiek County, is collected and grouped by category. Data about plastics is grouped together, as is data about metal, aluminum and so on. These groups of data are converted to embeddings, which are vector representations of their contextual meaning. The embeddings are stored in the vector database in a way that indexes the relative distance between vectors for fast retrieval of nearest neighbors. The distance between embeddings in vector space corresponds to the similarity of their contextual meaning.
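To make the idea of distance in vector space concrete, here is a toy sketch. The three-dimensional vectors are made up for illustration (real embeddings come from a model and have hundreds of dimensions or more), and cosine similarity is just one common measure, assumed here rather than taken from the chatbot’s setup.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings for two groups of data and one question.
plastics = [0.9, 0.1, 0.0]
metal = [0.1, 0.9, 0.1]
query = [0.8, 0.2, 0.1]  # pretend embedding of a question about a plastic tub

# The question points in nearly the same direction as the plastics group,
# so it scores higher against plastics than against metal.
plastics_score = cosine_similarity(query, plastics)
metal_score = cosine_similarity(query, metal)
```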
Questions are likewise converted to embeddings on the fly and passed to the vector database. Vector math is performed to find the embeddings closest to the query vector, which retrieves the most relevant data. The original text can be stored with the vector in the vector database, or an ID can be stored that references the original text kept elsewhere. Once the relevant data is retrieved, it’s combined with the question and sent to the generative LLM.
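The retrieval step can be imitated with a few lines of plain Python. This is only a sketch: `TinyVectorStore` and `Record` are hypothetical names, the search is brute force over Euclidean distance, and the vectors are hand-written. A real vector database builds an index so it doesn’t have to compare the query against every stored vector.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    vector: List[float]
    text: str  # original text stored alongside its vector


class TinyVectorStore:
    """Brute-force nearest-neighbor lookup, for illustration only."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def add(self, vector: List[float], text: str) -> None:
        self.records.append(Record(vector, text))

    def query(self, vector: List[float], top_k: int = 1) -> List[str]:
        # Sort every record by its distance to the query vector, closest first.
        ranked = sorted(self.records, key=lambda r: math.dist(r.vector, vector))
        return [r.text for r in ranked[:top_k]]


store = TinyVectorStore()
store.add([0.9, 0.1, 0.0], "Plastics: only #1 and #2 plastics can be recycled locally.")
store.add([0.1, 0.9, 0.1], "Metal: placeholder text for the metal group.")

# A made-up query embedding that sits close to the plastics vector.
nearest = store.query([0.8, 0.2, 0.1], top_k=1)
```

Storing the text next to the vector, as done here, is the simpler of the two options mentioned above. Storing only an ID keeps the database smaller but adds a second lookup.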
Augmentation
Prompt engineering is used to compose an augmented prompt. You might have to play around with your prompt to get it just right. Additionally, you may want to request a JSON response format so results can be processed in a consistent manner. A simple augmentation prompt might resemble something like this:
prompt = """
DOCUMENT:\n
insert relevant text from semantic search results here\n\n
QUESTION:\n
insert user question here\n\n
INSTRUCTIONS:\n
Answer the question from information grounded in the document precisely.
If the document does not contain information about the question, respond with N/A.
"""
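If you want to fill in the template programmatically and ask for JSON back, one possible sketch looks like this. The `build_prompt` and `parse_answer` helpers and the `{"answer": ...}` response shape are assumptions made for illustration. They are not part of any ChatGPT API, and the model may still return something that doesn’t parse, which is why the parser is defensive.

```python
import json
from typing import Optional

PROMPT_TEMPLATE = """DOCUMENT:
{document}

QUESTION:
{question}

INSTRUCTIONS:
Answer the question from information grounded in the document precisely.
If the document does not contain information about the question, respond with N/A.
Respond as JSON in the form {{"answer": "<text or N/A>"}}.
"""


def build_prompt(document: str, question: str) -> str:
    """Insert the retrieved text and the user's question into the template."""
    return PROMPT_TEMPLATE.format(document=document.strip(), question=question.strip())


def parse_answer(raw: str) -> Optional[str]:
    """Return the answer text, or None for N/A, invalid JSON, or an unexpected shape."""
    try:
        answer = json.loads(raw)["answer"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return None
    if not isinstance(answer, str) or answer.strip().upper() == "N/A":
        return None
    return answer.strip()
```

Returning `None` for every failure case gives the calling code a single condition to check before showing a fallback message.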
Generation
The final step is generation. ChatGPT produces an answer by following the augmented prompt.
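Putting retrieval, augmentation and generation together might look like the sketch below. The `retrieve` and `generate` callables are hypothetical stand-ins for the vector database query and the ChatGPT call, and the fallback message is invented. The point is only to show where the N/A instruction pays off.

```python
from typing import Callable, List

FALLBACK = "Sorry, I don't have information about that."  # hypothetical wording


def answer_question(
    question: str,
    retrieve: Callable[[str], List[str]],  # stand-in for the semantic search
    generate: Callable[[str], str],        # stand-in for the generative LLM
) -> str:
    """Run one retrieve, augment, generate pass for a vetted question."""
    documents = retrieve(question)

    prompt = (
        "DOCUMENT:\n" + "\n".join(documents) + "\n\n"
        "QUESTION:\n" + question + "\n\n"
        "INSTRUCTIONS:\n"
        "Answer the question from information grounded in the document precisely.\n"
        "If the document does not contain information about the question, respond with N/A."
    )

    reply = generate(prompt).strip()
    # The prompt tells the model to say N/A when the document has no answer.
    return FALLBACK if reply.upper() == "N/A" else reply
```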
Improvements
If you tried out the chatbot, you probably discovered there’s lots of room for improvement. One area for improvement is regrouping and organizing the data. For this example, I clumped together large chunks of data. The data could be separated into smaller chunks with greater similarity. Additionally, we could add logic for the chatbot to ask more questions about the item to narrow the scope of the response. All of the LLMs could be fine-tuned as mentioned earlier.
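As a starting point for smaller chunks, here is a simple sketch that splits text into overlapping windows of words. Note that this is a generic fixed-size technique swapped in for illustration. It doesn’t group by topic the way the improvement above suggests, and the `chunk_text` name and its default sizes are arbitrary assumptions.

```python
from typing import List


def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> List[str]:
    """Split text into chunks of up to max_words words, sharing `overlap` words between neighbors."""
    if max_words <= 0 or overlap < 0 or overlap >= max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []

    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window has reached the end of the text.
        if start + max_words >= len(words):
            break

    return chunks
```

The overlap keeps a sentence that straddles a boundary from being cut off in both chunks. Each chunk would then be embedded and stored on its own.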
Another area for improvement is in the implementation of the semantic search. I used dense vector embeddings. In my next post, I’ll show an example using sparse vector embeddings and we’ll cover the difference between the two.
Other methods for improvement include user feedback loops. This example includes a couple of yes/no buttons that capture feedback useful for insight. Check out the live statistics below.
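Capturing yes/no feedback can be as simple as keeping a tally. The sketch below is hypothetical (the `FeedbackLog` class is not the code behind the live statistics) and keeps counts in memory, where a real deployment would persist them.

```python
from collections import Counter
from typing import Optional


class FeedbackLog:
    """In-memory tally of yes/no button clicks."""

    def __init__(self) -> None:
        self.votes: Counter = Counter()

    def record(self, helpful: bool) -> None:
        self.votes["yes" if helpful else "no"] += 1

    def helpful_rate(self) -> Optional[float]:
        """Share of 'yes' votes, or None when nothing has been recorded yet."""
        total = self.votes["yes"] + self.votes["no"]
        return self.votes["yes"] / total if total else None
```

Logging the question alongside each vote would also show which topics the chatbot handles poorly.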
If you liked this example and would like more information, stay tuned. I’ll be adding more details about the following aspects of this project:
- Data collection and preprocessing
- Setting up the vector database
- Embeddings in vector space explained
Chatbot statistics
[Live statistics widget: yes/no feedback counts, comments converted to questions, and unique items asked about]