Hello, I'm Peter, a server-side engineer at Unifa.
Generative AI is one of the hottest topics today, and there are plenty of services that help you build a generative AI solution. However, as a software engineer, wouldn't it be more fun to build your own, so you can customize it exactly the way you want? Today, I am going to share a few tools that can help you make your own generative AI, without relying on ChatGPT or other third-party API services.
Before we explain how to build one, let me explain what a large language model is.
What is a large language model (LLM)?
A large language model (LLM) is a type of artificial intelligence (AI) model that is trained on a large corpus of text data to generate language outputs that are coherent and natural-sounding. The goal of an LLM is to be able to understand and generate language as well as a human, and in some cases even surpass human-level performance.
An LLM is typically built on a neural network, a machine learning algorithm whose parameters are adjusted by training on data. Neural networks are the preferred approach because they can be trained on very large amounts of data in a relatively short time, especially on parallel hardware such as GPUs, compared to many other machine learning algorithms.
You can loosely imagine an LLM as a database: you feed it a lot of input/output data during training, so that when a user sends an input, it returns a related output.
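To stretch that database analogy a little, here is a toy sketch (the question/answer store below is purely illustrative): an exact-match lookup can only answer questions it has seen verbatim, whereas an LLM generalizes to new phrasings of the same question.

```python
# A toy "database" view of input -> output pairs.
qa_store = {
    "what is an llm?": "A model trained on large text corpora.",
    "who created llama 2?": "Meta.",
}

def lookup(question: str) -> str:
    # An exact-match store only answers questions it has seen verbatim.
    return qa_store.get(question.lower(), "(no match)")

print(lookup("What is an LLM?"))  # found: matches a stored key exactly
print(lookup("Explain LLMs"))     # "(no match)" -- an LLM would generalize instead
```

This limitation of exact matching is also why similarity search, discussed later in this article, matters so much.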
Keep in mind that the results are only as good as the training data, so you should choose a model suited to your questions. Most models available today are trained on vast amounts of high-quality internet data, to support a broad range of scenarios.
Run your local large language model
In a previous article (https://tech.unifa-e.com/entry/2023/12/24/090000) we talked about using Hugging Face to set up a local LLM. An alternative, and arguably simpler, way to get started with a local LLM is Ollama.
Ollama is a tool that lets you run many of the latest large language models on your local machine. You can find more details at https://github.com/ollama/ollama.
Once you have downloaded it, you can run the command “ollama run llama2” to use the Llama 2 model from Meta!
You can try to prompt the model with questions such as “What is a large language model?” and see what it tells you!
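If you prefer calling the model from code rather than the interactive prompt, Ollama also exposes a local REST API, by default at http://localhost:11434. Below is a minimal sketch: it builds the request payload, with the actual network call left commented out so it only runs when an Ollama server is up on your machine.

```python
import json

# Request payload for Ollama's /api/generate endpoint.
payload = {
    "model": "llama2",
    "prompt": "What is a large language model?",
    "stream": False,  # return the full response as one JSON object
}

# Uncomment to actually call a running Ollama server:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])

print(json.dumps(payload))
```

Setting "stream" to False keeps the example simple; by default Ollama streams the answer token by token as newline-delimited JSON.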
What is generative AI software?
Generative AI software refers to a type of artificial intelligence that can generate new, original content, such as images, videos, music, text, or even entire conversations. These models are trained on large amounts of data and use complex algorithms to learn patterns and relationships within the data, allowing them to create new content that resembles the original data.
Below, we will focus on text generation through LangChain.
What is LangChain?
LangChain is a popular open-source library that provides a variety of tools for natural language processing tasks, which you will need in order to build your customized generative AI software. Instead of doing the tedious plumbing yourself, it is certainly nice to have tools do it for you!
LangChain has both Node.js and Python versions, so you can choose whichever fits your stack.
How can I use my own data for my generative AI software?
In our first example, we used a generic model. However, if you have your own data, a generic model is not going to know about it! And you certainly do not want to retrain the model yourself, right? There is a simple trick to get around this.
Below we will talk about embeddings and vector DBs.
Use embeddings and a vector DB to import your data into the LLM
In natural language processing, sentences are transformed into vectors that capture their meaning, so that searching for sentences by similarity becomes more accurate. An embedding model is the tool that performs this transformation from sentences to vectors.
Therefore, by setting up a vector DB and importing your data into it, the LLM gains a good source of context to help answer questions.
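To make the idea of similarity search concrete, here is a minimal sketch using word-count vectors and cosine similarity. This is not a real embedding model (real embeddings, such as those LangChain wraps, are dense vectors that capture meaning, not just word overlap), but the mechanism of "the closest vector wins" is the same one a vector DB uses.

```python
import math
from collections import Counter

def embed(sentence: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector.
    return Counter(sentence.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

query = embed("how do I run a local model")
doc1 = embed("run a large language model on your local machine")
doc2 = embed("childcare photo service pricing")

# The document sharing more vocabulary with the query scores higher;
# a vector DB applies the same ranking idea over real embeddings.
print(cosine(query, doc1) > cosine(query, doc2))  # True
```

In production you would not implement this yourself: vector DBs such as FAISS (used below) index millions of embedding vectors and retrieve nearest neighbors efficiently.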
A real example of an LLM utilizing your data
Below is an example I borrowed from the LangChain tutorial (https://python.langchain.com/docs/get_started/quickstart), showing how to import your data from a website into a vector DB through embeddings, and utilize the LLM to get a high-quality answer to your question.
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

# The local model and its embeddings (assumes Ollama is running with llama2)
llm = Ollama(model="llama2")
embeddings = OllamaEmbeddings()

# Import custom data from a website
loader = WebBaseLoader("https://docs.smith.langchain.com/user_guide")
docs = loader.load()

# Transform the data from the web into chunks the vector DB can store
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
vector = FAISS.from_documents(documents, embeddings)

# Build a chain that answers based only on the retrieved context
prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")
document_chain = create_stuff_documents_chain(llm, prompt)

# Use the vector DB as the source of content for the LLM
retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)

response = retrieval_chain.invoke({"input": "how can langsmith help with testing?"})
print(response["answer"])
Conclusion
Hopefully this article has helped you understand how easy it is to run your own local LLM, as well as write a generative AI service tailored to your specific use case in no time. Certainly exciting times!