LangChain vs Building Your Own: When Frameworks Help and When They Hurt

Author: David Park

Published on: November 10, 2023

LangChain has become the default framework for building LLM applications. It has 65,000+ GitHub stars, extensive documentation, and integration with every LLM provider and vector database on the market. It is also one of the most controversial tools in the AI engineering community, with vocal critics arguing it adds more complexity than it removes. Both the enthusiasts and the critics are partially right.

HARBOR SOFTWARE · Engineering Insights

We have used LangChain in production at Harbor Software. We have also ripped it out of production and replaced it with custom code. Both decisions were correct for their respective contexts. Here is a practical, experience-based analysis of when LangChain helps, when it hurts, and how to extract value from the ecosystem without taking on unnecessary complexity.

What LangChain Actually Provides

Strip away the marketing and LangChain provides five categories of functionality. Understanding these categories is important because most teams do not need all five, and adopting the entire framework when you only need one or two categories is the root of most LangChain frustration.

Provider abstraction. A unified interface for calling different LLM providers (OpenAI, Anthropic, Cohere, HuggingFace, Google, etc.). Switch providers by changing an import and a model name string.
Prompt management. Template strings with variable substitution, few-shot example selectors, and prompt versioning utilities. Essentially a prompt templating engine.
Chain composition. A way to compose multiple LLM calls and processing steps into a pipeline using LCEL (LangChain Expression Language), their pipe-based composition syntax.
Memory and state. Conversation history management, summarization of long conversations, sliding window memory, and various memory strategies for chat applications.
Retrieval (RAG). Document loaders (PDF, HTML, Notion, Confluence, etc.), text splitters, vector store integrations (Pinecone, Chroma, Weaviate, FAISS, etc.), and retrieval strategies for building RAG applications.

Each category has a different value proposition and a different cost of adoption. The mistake most teams make is installing LangChain for one category and then gradually using more of it because “it is already there,” eventually becoming deeply coupled to abstractions they did not need.

Where LangChain Genuinely Helps

Rapid Prototyping and Proof of Concepts

LangChain is unbeatable for prototyping. If you need to demo a RAG application to a stakeholder by Friday, LangChain gets you there faster than any alternative. The pre-built document loaders, text splitters, and vector store integrations mean you can go from “I have a folder of PDFs” to “I have a working Q&A chatbot” in about 100 lines of code and 2 hours of work.

from langchain.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Load and split documents - 3 lines
loader = DirectoryLoader('./docs', glob='**/*.pdf', loader_cls=PyPDFLoader)
docs = loader.load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(docs)

# Create vector store - 1 line
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())

# Create QA chain - 4 lines
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True
)

# Query
result = qa('What is the refund policy?')
print(result['result'])
print('Sources:', [doc.metadata['source'] for doc in result['source_documents']])

This code works. It is not production-ready (no error handling, no caching, no monitoring, hardcoded configuration), but it validates the concept quickly. The equivalent custom code with direct API calls and manual vector store management would be 400-500 lines and a full day of work. For a proof of concept that might be thrown away, LangChain’s speed-to-demo is a genuine advantage.

Multi-Provider Evaluation

If you are evaluating multiple LLM providers against each other (which you should be), LangChain’s provider abstraction is genuinely useful. You write your evaluation harness once and swap providers by changing the model initialization. This is the one LangChain abstraction that we keep even in production applications, because it reduces the engineering effort for the provider comparison we described in our previous post.
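A minimal sketch of what "write the harness once" can look like, assuming each provider is wrapped as a plain function from prompt to text. The names `Provider`, `evaluate`, `stub_a`, and `stub_b` are hypothetical; in real use each stub would be replaced by a thin wrapper around a LangChain model or a provider SDK, and the substring check by whatever scoring your evaluation needs.

```python
from typing import Callable, Dict, List, Tuple

# A provider is any function that maps a prompt string to a completion string.
Provider = Callable[[str], str]


def evaluate(providers: Dict[str, Provider],
             cases: List[Tuple[str, str]]) -> Dict[str, float]:
    """Return, per provider, the fraction of cases whose output
    contains the expected substring (case-insensitive)."""
    scores: Dict[str, float] = {}
    for name, complete in providers.items():
        passed = 0
        for prompt, expected in cases:
            output = complete(prompt)
            if expected.lower() in output.lower():
                passed += 1
        scores[name] = passed / len(cases) if cases else 0.0
    return scores


# Stub providers stand in for real model calls so the sketch runs offline.
def stub_a(prompt: str) -> str:
    return "Refunds are available within 30 days."


def stub_b(prompt: str) -> str:
    return "I am not sure."


CASES = [("What is the refund window?", "30 days")]
SCORES = evaluate({"stub_a": stub_a, "stub_b": stub_b}, CASES)
```

Because the harness only depends on the `Provider` signature, adding a provider means adding one entry to the dictionary; the test cases and scoring stay untouched.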

Document Loading and Text Splitting

LangChain has loaders for PDFs, Word docs, HTML, Markdown, Notion exports, Confluence spaces, Google Drive, Slack threads, GitHub repos, and dozens of other sources. Building these from scratch is tedious engineering work that does not differentiate your product. The loaders are the most stable and least controversial part of the framework, and they work as standalone utilities without requiring the rest of LangChain.

The RecursiveCharacterTextSplitter is particularly well-implemented. It splits text by attempting separators in order, from coarse to fine (by default paragraph breaks, then line breaks, then spaces between words, then individual characters; a sentence separator such as '. ' can be added to the list), which produces much better chunks than naive character splitting. This is worth using even if you use nothing else from LangChain.
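To make the idea concrete, here is a simplified sketch of recursive separator splitting. It is not LangChain’s implementation: it ignores chunk overlap and drops the separators at chunk boundaries, and the function name `recursive_split` is our own. It only illustrates the "try the coarsest separator first, fall back to finer ones" strategy.

```python
from typing import List, Sequence


def recursive_split(text: str, chunk_size: int,
                    separators: Sequence[str] = ("\n\n", "\n", " ", "")) -> List[str]:
    """Split text into pieces of at most chunk_size characters,
    trying coarse separators before fine ones."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, chunk_size, finer)

    chunks: List[str] = []
    current = ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small neighbouring parts
            continue
        if current:
            chunks.append(current)
        if len(part) <= chunk_size:
            current = part
        else:
            # This part alone is too large: retry it with finer separators.
            chunks.extend(recursive_split(part, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


SAMPLE = "aaaa bbbb\n\ncccc dddd eeee"
PIECES = recursive_split(SAMPLE, chunk_size=10)
```

In the sample, the paragraph break is honoured first, and only the second paragraph (which is too long on its own) is split further at word boundaries.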

Where LangChain Creates Problems

Abstraction Layers That Obscure Understanding

LangChain’s core problem is that it abstracts away the things you need to understand to debug production issues. When your RAG application returns bad results, you need to diagnose: Was the chunking wrong? Were the embeddings poor? Was the retrieval finding irrelevant documents? Was the prompt constructed incorrectly? Was the LLM hallucinating despite having correct context?

With LangChain, the answer to each of these questions is buried under multiple layers of abstraction. The RetrievalQA chain hides the prompt construction. The retriever hides the similarity search parameters. The text splitter hides the chunking logic. When something goes wrong in production at 3am, you end up reading LangChain’s source code on GitHub to understand what your own application is doing. This is not a theoretical concern; we have spent hours doing exactly this.

Compare this to custom code where every step is explicit and inspectable:

// Custom RAG pipeline - every step visible, logged, and debuggable
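async function answerQuestion(question: string): Promise<{
  answer: string | null;
  sources: { source: string; score: number }[];
  tokensUsed: number;
  latencyMs: number;
}> {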
  const startTime = Date.now();

  // 1. Embed the question
  const queryEmbedding = await openai.embeddings.create({
    model: 'text-embedding-ada-002',
    input: question
  });
  logger.debug('Embedding generated', { latencyMs: Date.now() - startTime });

  // 2. Retrieve relevant chunks
  const chunks = await vectorDb.query({
    vector: queryEmbedding.data[0].embedding,
    topK: 5,
    includeMetadata: true
  });
  logger.debug('Retrieved chunks', {
    count: chunks.length,
    scores: chunks.map(c => c.score),
    sources: chunks.map(c => c.metadata.source)
  });

  // 3. Build the prompt (fully visible, no hidden templates)
  const context = chunks.map(c => c.text).join('\n\n---\n\n');
  const prompt = `Based on the following context, answer the question. If the context does not contain the answer, say "I don't have enough information to answer that."

Context:
${context}

Question: ${question}

Answer:`;
  logger.debug('Prompt assembled', { promptLength: prompt.length, contextChunks: chunks.length });

  // 4. Generate response
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
    temperature: 0,
    max_tokens: 500
  });
  logger.debug('Response generated', {
    tokensUsed: response.usage.total_tokens,
    latencyMs: Date.now() - startTime
  });

  return {
    answer: response.choices[0].message.content,
    sources: chunks.map(c => ({ source: c.metadata.source, score: c.score })),
    tokensUsed: response.usage.total_tokens,
    latencyMs: Date.now() - startTime
  };
}

This is more code, but every step is visible, logged, and debuggable. When the answer is wrong, you can inspect the retrieved chunks (are they relevant?), read the exact prompt (does it make sense?), check the scores (was the best document ranked correctly?), and understand exactly what happened. This debuggability is worth more than any abstraction, especially at 3am during a production incident.

LCEL Complexity

LangChain Expression Language (LCEL) is the pipe-based composition syntax for building chains. In theory, it enables elegant pipeline composition. In practice, it creates code that is harder to read, harder to debug, and harder to modify than equivalent procedural code. It introduces its own concepts (Runnables, RunnablePassthrough, RunnableLambda, RunnableParallel) that developers must learn on top of the domain concepts.

# LCEL syntax - requires knowledge of LCEL-specific concepts
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
result = chain.invoke("What is the refund policy?")

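# Equivalent procedural code - uses only standard Python
# (assumes llm is a chat model, so the result has a .content attribute)
async def answer(question):
    # aget_relevant_documents / ainvoke are the awaitable variants;
    # get_relevant_documents / invoke are synchronous and cannot be awaited
    docs = await retriever.aget_relevant_documents(question)
    context = format_docs(docs)
    prompt_text = template.format(context=context, question=question)
    response = await llm.ainvoke(prompt_text)
    return response.content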

The procedural version is about the same length and is immediately understandable to any Python developer. The LCEL version requires knowing what RunnablePassthrough does, how the pipe operator composes runnables, how the dict syntax maps inputs to chain components, and what StrOutputParser does. This is accidental complexity that does not serve the application’s goals. New team members need LangChain training, not just Python training.

API Instability and Breaking Changes

LangChain’s API has changed significantly between versions throughout 2023. Import paths change (the OpenAI wrapper moved from langchain.llms to langchain_community.llms and then to the separate langchain_openai package). Class signatures change. Method names change. Code written six months ago often requires substantial refactoring to work with the current version.

For a production application that needs stability and predictability, this is a serious liability. Pinning versions helps but means you miss bug fixes, security patches, and new provider integrations. For a framework that is supposed to reduce engineering effort, the maintenance burden of keeping up with API changes is significant.
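One way to limit the blast radius of import-path churn, alongside pinning, is to resolve framework symbols in a single compatibility module so that a moved import is a one-file fix. The sketch below is an assumption about how such a module could look, not something LangChain provides; `import_first` is a hypothetical helper built on the standard library’s `importlib`.

```python
import importlib
from typing import Any, Sequence, Tuple


def import_first(candidates: Sequence[Tuple[str, str]]) -> Any:
    """Return the first attribute that can be imported from a list of
    (module_name, attribute_name) pairs, tried in order."""
    errors = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}.{attr}: {exc}")
    raise ImportError("none of the candidates could be imported:\n" + "\n".join(errors))


# Intended use inside one compat module (left commented out so the
# sketch runs without LangChain installed):
# OpenAI = import_first([
#     ("langchain_openai", "OpenAI"),
#     ("langchain_community.llms", "OpenAI"),
#     ("langchain.llms", "OpenAI"),
# ])

# Offline demonstration with the standard library: the first candidate
# does not exist, so the helper falls through to json.dumps.
DUMPS = import_first([("not_a_real_module_xyz", "dumps"), ("json", "dumps")])
```

This only addresses moved imports. Changed class signatures and renamed methods still need real migration work, which is why the rest of the application should depend on your own thin wrappers rather than on framework classes directly.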

Performance Overhead

LangChain adds latency. Each abstraction layer is a function call with its own overhead. The chain composition adds routing logic. The callback system (used for logging and tracing) adds processing time at each step. For a single LLM call, this is negligible (5-10ms). For an agent that makes 10+ tool calls in a loop, the cumulative overhead can add 100-300ms to the total request time. In latency-sensitive applications, this matters.
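Overhead figures like these are worth measuring in your own stack rather than taking on faith. A minimal sketch, assuming you can wrap both the whole request and the raw provider call in timing blocks; `StepTimer` and the step names are hypothetical, and the clock is injectable so the demo is reproducible.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class StepTimer:
    """Accumulates wall-clock time per named step, in milliseconds."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self.totals_ms: Dict[str, float] = {}

    @contextmanager
    def step(self, name: str) -> Iterator[None]:
        start = self._clock()
        try:
            yield
        finally:
            elapsed_ms = (self._clock() - start) * 1000.0
            self.totals_ms[name] = self.totals_ms.get(name, 0.0) + elapsed_ms

    def overhead_ms(self, total_step: str, inner_step: str) -> float:
        """Time spent in total_step that is not accounted for by inner_step."""
        return self.totals_ms[total_step] - self.totals_ms[inner_step]


# Deterministic demo with a fake clock (values in seconds).
_ticks = iter([0.0, 0.010, 0.110, 0.125])
demo = StepTimer(clock=lambda: next(_ticks))
with demo.step("request"):        # whole request, framework layers included
    with demo.step("llm_call"):   # the provider call itself
        pass
OVERHEAD_MS = demo.overhead_ms("request", "llm_call")
```

With the fake clock above, the request takes 125 ms, the provider call 100 ms, and the remaining overhead is about 25 ms. In an agent loop, the same timer accumulates across iterations, which is where per-step overhead becomes visible.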

Our Decision Framework

Based on building and maintaining multiple LLM-powered products, here is when we use LangChain and when we go custom:

Use LangChain when:

You are prototyping or building a proof of concept that may be thrown away
You need to load documents from many different source formats (PDF, HTML, Notion, etc.)
You are evaluating multiple LLM providers and want a unified interface for comparison
Your team is new to LLM development and benefits from the opinionated structure as a learning tool
The application is internal, low-traffic, and does not have strict latency requirements
You need something working by the end of the week

Build custom when:

You are building a production application that must be reliable and maintainable for years
You need fine-grained control over the RAG pipeline (custom chunking, retrieval strategies, reranking)
Your use case does not fit LangChain’s abstractions (most real-world use cases diverge from the standard patterns)
You need stability and cannot tolerate monthly breaking changes in a core dependency
Your team has enough LLM experience to build the 200-300 lines of core code
You need to debug production issues quickly and cannot afford to trace through framework internals

The Middle Path: Use the Parts, Skip the Framework

The best approach for many teams is to use LangChain’s utility components without adopting the framework’s composition model. Cherry-pick the valuable parts and build the rest yourself.

# Use LangChain utilities only, custom code for the critical path
from langchain.document_loaders import PyPDFLoader, UnstructuredWordDocumentLoader
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

# === LangChain utilities for commodity tasks ===

def load_documents(paths: list[str]) -> list[Document]:
    """Use LangChain loaders because building PDF parsing is tedious"""
    docs = []
    for path in paths:
        if path.endswith('.pdf'):
            docs.extend(PyPDFLoader(path).load())
        elif path.endswith('.docx'):
            docs.extend(UnstructuredWordDocumentLoader(path).load())
        # Other extensions are skipped silently; add loaders or raise as needed
    return docs

def chunk_documents(docs: list[Document], chunk_size: int = 1000) -> list[Document]:
    """LangChain's recursive splitter is genuinely good"""
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=200,
        # Coarse to fine: paragraphs, lines, sentences, words, characters
        separators=['\n\n', '\n', '. ', ' ', '']
    )
    return splitter.split_documents(docs)

# === Custom code for the application-critical path ===
# - Embedding generation: direct OpenAI API call (full control over model, batching)
# - Vector storage: direct Pinecone client (full control over indexing, metadata)
# - Retrieval: custom logic with reranking and filtering
# - Prompt construction: simple string templates (fully visible)
# - LLM call: direct provider SDK (full control over parameters)
# - Response parsing: Zod validation (type safety)
# - Monitoring: custom metrics (full visibility)

This gives you the best of both worlds: LangChain’s breadth of document integrations for commodity tasks where you do not need fine-grained control, and full ownership over the application-critical path where debuggability, performance, and reliability matter.
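For the custom half, the critical path can stay small. The sketch below is one possible shape, assuming the embedding call, vector search, and LLM call are passed in as plain functions so they can be swapped or stubbed; `Chunk`, `select_chunks`, `build_prompt`, `answer_question`, and the default thresholds are all hypothetical names and values, not part of any SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Chunk:
    text: str
    source: str
    score: float


PROMPT_TEMPLATE = (
    "Based on the following context, answer the question. "
    "If the context does not contain the answer, say "
    "\"I don't have enough information to answer that.\"\n\n"
    "Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"
)


def select_chunks(chunks: Sequence[Chunk], min_score: float, top_k: int) -> List[Chunk]:
    """Drop weak matches, then keep the top_k highest-scoring chunks."""
    kept = [c for c in chunks if c.score >= min_score]
    return sorted(kept, key=lambda c: c.score, reverse=True)[:top_k]


def build_prompt(question: str, chunks: Sequence[Chunk]) -> str:
    """Plain string template: the exact prompt is always inspectable."""
    context = "\n\n---\n\n".join(c.text for c in chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


def answer_question(question: str,
                    embed: Callable[[str], List[float]],
                    search: Callable[[List[float]], List[Chunk]],
                    generate: Callable[[str], str],
                    min_score: float = 0.75,
                    top_k: int = 3) -> Dict[str, object]:
    vector = embed(question)
    chunks = select_chunks(search(vector), min_score, top_k)
    prompt = build_prompt(question, chunks)
    return {
        "answer": generate(prompt),
        "sources": [c.source for c in chunks],
        "prompt": prompt,  # returned so it can be logged or inspected
    }


# Offline demo with stubs in place of the embedding, vector DB, and LLM calls.
RESULT = answer_question(
    "What is the refund policy?",
    embed=lambda q: [0.0],
    search=lambda v: [Chunk("Refunds within 30 days.", "policy.pdf", 0.9),
                      Chunk("Office hours.", "faq.pdf", 0.4)],
    generate=lambda p: "30 days",
)
```

In production, the three callables would wrap the provider SDK and vector database client directly, with logging around each step.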

Alternatives Worth Considering

If you decide against LangChain for your core application logic, several lighter-weight alternatives address specific needs without the full framework overhead:

LlamaIndex – Focused specifically on data indexing and retrieval for RAG. Less ambitious than LangChain, which makes it more focused and stable for that specific use case. If your primary need is document Q&A or knowledge base search, LlamaIndex is often a better fit. It has better default retrieval strategies and clearer documentation.
Haystack by deepset – A production-oriented NLP framework with less hype but more battle-testing. Strong focus on evaluation, pipeline clarity, and production deployment. The API is more stable between versions.
Provider SDKs directly – OpenAI’s Python and Node.js SDKs, Anthropic’s SDK, etc. For simple use cases (single LLM call, structured output, embeddings), the provider SDK is literally all you need. No framework required.
Vercel AI SDK – For TypeScript/Next.js applications specifically. Handles streaming, provider abstraction, and tool use with minimal abstraction overhead. Excellent developer experience for web applications.
Instructor – A lightweight library specifically for structured output extraction. If your main need is getting reliable JSON from LLMs, Instructor does this one thing very well with minimal abstraction.

Conclusion

LangChain is a good prototyping tool, a useful utility library, and a mediocre production framework. Its document loaders and text splitters are genuinely useful standalone components. Its chain composition (LCEL) and memory abstractions add complexity that hinders debugging and maintenance in production applications. Its rapid API evolution makes long-term dependency management painful.

The core logic of most LLM applications is not complex: an embedding call, a vector search, a prompt assembly, and an LLM call. That is 50-100 lines of application code that you can read, understand, debug, and maintain. Adding a framework on top of those 50-100 lines makes sense during prototyping when you value speed over clarity. In production, those lines of explicit, debuggable, stable code are worth more than any framework’s abstractions.

Use the parts of LangChain that save you genuine engineering effort on commodity tasks. Build the critical path yourself. You will end up with a system that is more understandable, more debuggable, and more maintainable than either extreme of “all LangChain” or “nothing from LangChain.”

One final thought: the LLM application framework landscape is maturing rapidly. What is true about LangChain today may not be true in six months. The principles, however, are timeless: prefer explicit over implicit, debuggable over abstract, and simple over clever. When evaluating any framework, ask yourself: does this make my system easier to understand and debug, or harder? If the answer is harder, the framework is costing you more than it saves, regardless of how many GitHub stars it has or how many tutorials recommend it.

The best LLM applications we have seen in production share a common trait: their core logic is boring. No fancy framework abstractions. No novel composition patterns. Just straightforward code that makes API calls, validates responses, logs everything, and handles errors gracefully. The sophistication lives in the prompt engineering, the evaluation infrastructure, and the production monitoring, not in the application architecture. Keep the architecture simple and invest your complexity budget where it actually improves the product.
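As an illustration of how boring that core logic can be, here is a minimal sketch of a call wrapper that validates the response, logs failures, and retries with exponential backoff. `call_with_retries` and its parameters are hypothetical; the broad `except Exception` is a simplification, and real code should catch only the provider’s retryable error types.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger("llm")


def call_with_retries(call: Callable[[], T],
                      validate: Callable[[T], bool],
                      max_attempts: int = 3,
                      base_delay_s: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call() until validate(result) is true or attempts run out."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = call()
            if validate(result):
                return result
            last_error = ValueError(f"validation failed: {result!r}")
        except Exception as exc:  # simplification: narrow this in real code
            last_error = exc
        logger.warning("attempt %d/%d failed: %s", attempt, max_attempts, last_error)
        if attempt < max_attempts:
            sleep(base_delay_s * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


# Offline demo: the first response is empty (invalid), the second is accepted.
_responses = iter(["", "ok"])
DELAYS: list = []
RESULT = call_with_retries(lambda: next(_responses),
                           validate=lambda r: bool(r),
                           sleep=DELAYS.append)
```

The `sleep` parameter is injectable only so the demo and tests do not actually wait; in an application you would leave it at the default.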

If you are starting a new LLM project today, here is our concrete recommendation: use the OpenAI or Anthropic SDK directly for your first version. Add LangChain’s document loaders if you need to ingest PDFs or other document formats. Write the rest yourself. You will have a working, debuggable, maintainable system in less time than it takes to learn LangChain’s abstraction model, and you will understand every line of code in your production pipeline. That understanding is worth more than any framework’s convenience when you are debugging a production issue at 2am.