llama-cpp-chat-memory

This project is intended as an example and a basic framework for a locally run chatbot with document memory. The target audience is developers with some understanding of Python and LLM frameworks. If you want to learn about LLMs and AI, take a look at my llm resources for beginners or PygWiki. This project is mainly intended to serve as a more fleshed-out tutorial and a basic frame for testing things like document embeddings. For this reason, the chatbot itself is intentionally lightweight and simple. You can also use this chatbot to test models and prompts. Document fetching can be disabled by setting the collection to "" in the config files, which leaves you with just a basic character chatbot (see the sketch below).
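As a rough illustration, assuming an `.env`-style config file, disabling document fetching might look like this (the key names here are illustrative; check the actual config files in the repository):

```
# .env (illustrative; key names may differ in your setup)
MODEL=./models/your-model.gguf
COLLECTION=""
```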

Everything is designed to run locally. The model runs on llama.cpp via its Python bindings (llama-cpp-python), the UI is Chainlit, the vector database is Chroma, and everything is glued together with LangChain. Document processing uses spaCy, Sentence Transformers, and Playwright. There are no dependencies on external APIs. llama.cpp supports GPU acceleration with CUDA and BLAS; see the llama-cpp-python documentation for build details.
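As a minimal sketch of what running the model locally looks like, here is how a GGUF model can be loaded with llama-cpp-python and offloaded to the GPU (the model path is illustrative; GPU offload requires a CUDA-enabled build of llama-cpp-python):

```python
# Minimal sketch: loading a local GGUF model with llama-cpp-python.
# The model path is illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/your-model.gguf",  # hypothetical path
    n_gpu_layers=-1,  # offload all layers to the GPU; set to 0 for CPU-only
    n_ctx=4096,       # context window size
)

output = llm("Q: What is a vector database? A:", max_tokens=64)
print(output["choices"][0]["text"])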

The chatbot uses character cards as prompts. Tavern and V2 cards are supported; internal lorebooks are not supported yet. There are several scripts for parsing JSON lorebooks, PDFs, and text files, and for scraping web pages for the memory content. Scripts for automatically parsing metadata from documents are also included.
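For a sense of what a card-based prompt involves, here is a hedged sketch of pulling prompt fields out of a V2 character card in JSON form (field names follow the community chara_card_v2 spec; the file path is illustrative, and this is not the project's actual parsing code):

```python
# Minimal sketch: reading prompt fields from a V2 character card (JSON form).
# Field names follow the community chara_card_v2 spec; the path is illustrative.
import json

with open("./cards/character.json", encoding="utf-8") as f:
    card = json.load(f)

# V2 cards nest their fields under a "data" key; older Tavern-style cards
# keep them at the top level, so fall back to the root if "data" is absent.
data = card.get("data", card)
prompt = "\n".join(
    data.get(key, "") for key in ("name", "description", "personality", "scenario")
)
print(prompt)
```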