Andrej Baranovskij Blog
Blog about Oracle, Full Stack, Machine Learning and Cloud
Monday, May 13, 2024
Invoice Data Preprocessing for LLM
Data preprocessing is an important step in an LLM pipeline. I show various approaches to preprocessing invoice data before feeding it to an LLM. This step is quite challenging, especially when preprocessing tables.
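One common way to make a table easier for an LLM to read is to flatten the extracted rows into clean Markdown. The sketch below assumes the rows have already been extracted from the invoice as lists of strings. The helpers `clean_cell` and `table_to_markdown` are hypothetical and are not necessarily among the approaches shown in the post.

```python
from __future__ import annotations


def clean_cell(cell: str | None) -> str:
    """Collapse whitespace/newlines inside a cell and escape pipe characters."""
    if cell is None:
        return ""
    return " ".join(str(cell).split()).replace("|", "\\|")


def table_to_markdown(rows: list[list[str | None]]) -> str:
    """Render extracted rows as a Markdown table; the first row is the header."""
    if not rows:
        return ""
    width = max(len(r) for r in rows)
    # Pad short rows so every line has the same number of columns.
    padded = [[clean_cell(c) for c in r] + [""] * (width - len(r)) for r in rows]
    header, *body = padded
    lines = ["| " + " | ".join(header) + " |",
             "|" + "|".join(["---"] * width) + "|"]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)


# Example rows with typical extraction noise: a line break inside a cell
# and a row with a missing column.
rows = [
    ["Description", "Qty", "Unit price"],
    ["Web  hosting\n(annual)", "1", "120.00"],
    ["Domain", "2"],
]
print(table_to_markdown(rows))
```

Padding short rows keeps the column count consistent, which helps the model line up values with their headers.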
Monday, May 6, 2024
You Don't Need RAG to Extract Invoice Data
Documents like invoices or receipts can be processed by an LLM directly, without RAG. I explain how to do this locally with Ollama and Instructor. Thanks to Instructor, structured output from the LLM can be validated against your own Pydantic class.
Monday, April 29, 2024
LLM JSON Output with Instructor RAG and WizardLM-2
With the Instructor library, you can implement a simple RAG without a vector DB or dependencies on other LLM libraries. The key RAG components are good data preprocessing and cleaning, a powerful local LLM (such as WizardLM-2, Nous Hermes 2 PRO, or Llama3), and an Ollama or MLX backend.
Monday, April 22, 2024
Local RAG Explained with Unstructured and LangChain
In this tutorial, I do a code walkthrough and demonstrate how to implement the RAG pipeline using Unstructured, LangChain, and Pydantic for processing invoice data and extracting structured JSON data.
Monday, April 15, 2024
Local LLM RAG with Unstructured and LangChain [Structured JSON]
The Unstructured library preprocesses PDF document content into a cleaner format, which helps the LLM produce a more accurate response. The JSON response is generated by the Nous Hermes 2 PRO LLM without any additional post-processing. A dynamically created Pydantic class validates the response to make sure it matches the request.
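The idea of a class built at runtime from the requested fields can be sketched with the standard library alone. This is a stand-in, not the Pydantic code from the post: `dataclasses.make_dataclass` replaces Pydantic's dynamic model creation, and it only checks that the keys match the request (it does not enforce types the way Pydantic does). The names `build_response_class`, `validate_response`, and `InvoiceResponse` are hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import asdict, make_dataclass


def build_response_class(field_names: list[str]):
    """Create a class whose fields are exactly the requested keys."""
    # Every field is annotated as str for simplicity; dataclasses do not
    # check types at runtime (Pydantic would).
    return make_dataclass("InvoiceResponse", [(name, str) for name in field_names])


def validate_response(raw_json: str, field_names: list[str]) -> dict:
    """Parse the LLM's JSON and fail if a key is missing or unexpected."""
    data = json.loads(raw_json)
    response_cls = build_response_class(field_names)
    # The generated __init__ raises TypeError on missing or extra keys.
    instance = response_cls(**data)
    return asdict(instance)


requested = ["invoice_number", "total"]
print(validate_response('{"invoice_number": "INV-1", "total": "99.50"}', requested))

try:
    validate_response('{"invoice_number": "INV-1"}', requested)
except TypeError as err:
    print("Rejected:", err)
```

Because the class is generated from the same field list that was sent in the request, a response with a missing or extra key is rejected before it reaches downstream code.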
Sunday, March 31, 2024
LlamaIndex Upgrade to 0.10.x Experience
I explain key points you should keep in mind when upgrading to LlamaIndex 0.10.x.
Labels: LlamaIndex, LLM, RAG
Monday, March 25, 2024
LLM Structured Output for Function Calling with Ollama
I explain how function calling works with an LLM. This concept is often misunderstood: the LLM doesn't call a function. Instead, it returns a JSON response with the values your own environment uses to make the function call. In this example, I'm using the Sparrow agent to call a function.
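The split of responsibilities can be shown in a few lines: the model only produces JSON naming a function and its arguments, and your code decides whether and how to run it. This is a minimal sketch, not Sparrow's implementation. The JSON shape (`name` plus `arguments`), the function `get_invoice_total`, and its sample totals are hypothetical.

```python
import json


def get_invoice_total(invoice_id: str, currency: str = "EUR") -> str:
    """Hypothetical local function; the LLM never executes it directly."""
    totals = {"INV-001": 120.0, "INV-002": 42.5}  # illustrative sample values
    return f"{totals.get(invoice_id, 0.0):.2f} {currency}"


# Only functions in this registry can be triggered by model output.
AVAILABLE_FUNCTIONS = {"get_invoice_total": get_invoice_total}


def dispatch(llm_output: str) -> str:
    """Parse the model's JSON and call the named function with its arguments."""
    call = json.loads(llm_output)
    name = call["name"]
    if name not in AVAILABLE_FUNCTIONS:
        raise ValueError(f"Model requested unknown function: {name}")
    return AVAILABLE_FUNCTIONS[name](**call.get("arguments", {}))


# JSON that a model might return for "What is the total of invoice INV-001?"
llm_output = '{"name": "get_invoice_total", "arguments": {"invoice_id": "INV-001"}}'
print(dispatch(llm_output))  # prints "120.00 EUR"
```

Keeping an explicit registry means the model can only select from functions you have approved, while the actual execution always happens in your environment.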