Andrej Baranovskij Blog

Monday, May 13, 2024

Invoice Data Preprocessing for LLM

Data preprocessing is an important step in an LLM pipeline. I show various approaches to preprocessing invoice data before feeding it to an LLM. This is quite a challenging step, especially when preprocessing tables.

Monday, May 6, 2024

You Don't Need RAG to Extract Invoice Data

Documents like invoices or receipts can be processed by an LLM directly, without RAG. I explain how you can do this locally with Ollama and Instructor. Thanks to Instructor, structured output from the LLM can be validated with your own Pydantic class.
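To give a rough idea of the validation step, here is a minimal sketch that uses Pydantic on its own. It does not call Ollama or Instructor: hard-coded strings stand in for LLM responses, and the `Invoice` class, its fields, and the `parse_invoice` helper are hypothetical names chosen for illustration, not taken from the post.

```python
import json

from pydantic import BaseModel, ValidationError


class Invoice(BaseModel):
    # Hypothetical fields, chosen only for illustration.
    invoice_number: str
    total: float


def parse_invoice(raw_json: str):
    """Return an Invoice if the JSON matches the schema, otherwise None."""
    try:
        return Invoice(**json.loads(raw_json))
    except (ValidationError, json.JSONDecodeError):
        # Malformed JSON or missing/invalid fields are both rejected.
        return None


# Hard-coded strings stand in for LLM responses (assumption: no model is called).
good = parse_invoice('{"invoice_number": "INV-001", "total": 125.5}')
bad = parse_invoice('{"invoice_number": "INV-002"}')  # "total" is missing
```

In this sketch a response that fails validation simply yields `None`; a real pipeline would more likely retry the request or report the error.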

Monday, April 29, 2024

LLM JSON Output with Instructor RAG and WizardLM-2

With the Instructor library you can implement simple RAG without a vector DB or dependencies on other LLM libraries. The key RAG components are good data pre-processing and cleaning, a powerful local LLM (such as WizardLM-2, Nous Hermes 2 PRO or Llama3), and an Ollama or MLX backend.

Monday, April 22, 2024

Local RAG Explained with Unstructured and LangChain

In this tutorial, I do a code walkthrough and demonstrate how to implement the RAG pipeline using Unstructured, LangChain, and Pydantic for processing invoice data and extracting structured JSON data.

Monday, April 15, 2024

Local LLM RAG with Unstructured and LangChain [Structured JSON]

I use the Unstructured library to pre-process PDF document content into a cleaner format, which helps the LLM produce a more accurate response. The JSON response is generated by the Nous Hermes 2 PRO LLM, without any additional post-processing. A dynamic Pydantic class validates the response to make sure it matches the request.
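One way a dynamic Pydantic class could be built is with `pydantic.create_model`, sketched below. This is an assumption about the general technique, not the code from the post: the `build_model` helper, the model name `DynamicResponse`, and the requested fields are all hypothetical.

```python
from pydantic import ValidationError, create_model


def build_model(fields: dict):
    """Build a Pydantic model with one required field per requested name."""
    # (type, ...) marks each field as required.
    definitions = {name: (typ, ...) for name, typ in fields.items()}
    return create_model("DynamicResponse", **definitions)


# Hypothetical request: the caller asks for these two fields.
requested = {"invoice_number": str, "total": float}
DynamicResponse = build_model(requested)

# A response that contains every requested field validates.
response = DynamicResponse(invoice_number="INV-001", total=99.0)

# A response that omits a requested field is rejected.
try:
    DynamicResponse(invoice_number="INV-001")
    missing_field_rejected = False
except ValidationError:
    missing_field_rejected = True
```

Because the model is generated from the request, the same validation code works for any set of requested fields.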

Sunday, March 31, 2024

LlamaIndex Upgrade to 0.10.x Experience

I explain key points you should keep in mind when upgrading to LlamaIndex 0.10.x.

Monday, March 25, 2024

LLM Structured Output for Function Calling with Ollama

I explain how function calling works with an LLM. This concept is often misunderstood: the LLM doesn't call a function. Instead, it returns a JSON response with values to be used for a function call from your environment. In this example, I'm using the Sparrow agent to call a function.
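The idea described above can be sketched in plain Python. This is a simplified illustration, not Sparrow's implementation: the JSON shape (`name` and `arguments`), the `get_invoice_total` function, and its hard-coded data are all assumptions, and a fixed string stands in for the LLM response.

```python
import json


def get_invoice_total(invoice_number: str) -> float:
    # Hypothetical function with hard-coded data, for illustration only.
    totals = {"INV-001": 125.5}
    return totals[invoice_number]


# Functions that your environment is willing to call on the LLM's behalf.
AVAILABLE_FUNCTIONS = {"get_invoice_total": get_invoice_total}


def dispatch(llm_response: str):
    """Parse the LLM's JSON and call the named function in our own code."""
    call = json.loads(llm_response)
    func = AVAILABLE_FUNCTIONS[call["name"]]
    return func(**call["arguments"])


# Stand-in for an LLM response: the model only produces this JSON,
# it never executes anything itself.
llm_response = (
    '{"name": "get_invoice_total", '
    '"arguments": {"invoice_number": "INV-001"}}'
)
result = dispatch(llm_response)
```

Looking functions up in an explicit dictionary keeps the set of callable functions under your control, whatever name the LLM returns.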