Why We Open Source Our Internal Tools
In the past 18 months, Harbor Software has open-sourced seven internal tools. Not as a marketing exercise or a developer relations strategy, but because open-sourcing them made the tools better, made our engineering team more effective, and — counterintuitively — gave us a competitive advantage. The decision to open source internal tools is usually framed as altruism or marketing. For us, it is an engineering strategy with measurable returns. Here is the honest accounting of why we do it, what it costs, and what we get back.
The Seven Tools We Open Sourced
To ground this discussion in specifics, here is what we released and why each one existed internally before we open-sourced it:
- inference-bench — A benchmarking harness for ML model inference that measures latency, throughput, and memory usage across different hardware and batch sizes. We built it because we were comparing model performance across GPU types and needed reproducible benchmarks. 1,200 GitHub stars.
- schema-drift — Detects schema differences between database migrations and the actual database state. We built it after a production incident where a manually applied schema change was never captured in a migration file. 680 stars.
- cost-guard — Token cost tracking and budgeting for LLM applications. We built it after the $47 runaway agent incident described in our observability post. 2,100 stars.
- prompt-cache-analyzer — Analyzes OpenAI API usage logs to identify prompt caching optimization opportunities by finding common prompt prefixes across requests. Saved us $3,400/month in API costs. 450 stars.
- env-validate — Validates environment variable configurations against a schema at application startup. Prevents the “it works on my machine” class of deployment bugs where a missing or malformed env var causes runtime errors minutes into operation. 890 stars.
- migrate-safe — Database migration runner with automatic rollback, pre-migration snapshots, and drift detection. Wraps any migration tool (Prisma, Drizzle, raw SQL) and adds safety guardrails. 520 stars.
- otel-ai — OpenTelemetry instrumentation for AI/LLM applications. Structured spans for model calls, retrieval steps, and agent decisions with semantic conventions specific to AI workloads. 1,600 stars.
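The core idea behind prompt-cache-analyzer is finding prompt prefixes that many requests share. The short sketch below illustrates that idea. It is not the tool's actual implementation. The function names, the use of character counts instead of token counts, and the greedy grouping of sorted prompts are all assumptions made for illustration.

```typescript
// Illustrative sketch: group prompts that share a long common prefix.
// Names and thresholds are hypothetical; lengths are measured in characters, not tokens.
interface PrefixGroup {
  prefix: string;   // longest prefix shared by every prompt in the group
  requests: number; // how many prompts share it
}

function commonPrefixLength(a: string, b: string): number {
  const max = Math.min(a.length, b.length);
  let i = 0;
  while (i < max && a[i] === b[i]) i++;
  return i;
}

function findSharedPrefixes(prompts: string[], minLength: number): PrefixGroup[] {
  // Sorting puts prompts with the same prefix next to each other.
  const sorted = [...prompts].sort();
  const groups: PrefixGroup[] = [];
  let start = 0;
  while (start < sorted.length) {
    let shared = sorted[start].length;
    let end = start + 1;
    // Greedily extend the run while the shared prefix stays at least minLength long.
    while (end < sorted.length) {
      const lcp = Math.min(shared, commonPrefixLength(sorted[start], sorted[end]));
      if (lcp < minLength) break;
      shared = lcp;
      end++;
    }
    if (end - start >= 2) {
      groups.push({ prefix: sorted[start].slice(0, shared), requests: end - start });
    }
    start = end;
  }
  // Most frequently repeated prefixes first: these are the best caching candidates.
  return groups.sort((a, b) => b.requests - a.requests);
}

const groups = findSharedPrefixes(
  [
    "SYSTEM: You are a support assistant. Q: reset password",
    "SYSTEM: You are a support assistant. Q: change email",
    "Summarize this article",
  ],
  20,
);
console.log(groups);
```

Sorting is what keeps this to a single pass: any set of prompts sharing a prefix ends up contiguous. A real analyzer would measure prefixes in tokens and work from the exact request payloads sent to the API.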
Total engineering investment to open-source these: approximately 340 hours over 18 months. That includes cleaning up internal code, writing documentation, setting up CI/CD for open-source repos, responding to issues, and reviewing community pull requests. That works out to about 19 hours/month across all seven tools, or roughly 2.7 hours per tool per month.
The Quality Forcing Function
The most valuable benefit of open-sourcing internal tools is that it forces you to build them properly. When a tool is internal-only, it accumulates shortcuts: hardcoded paths, undocumented configuration options, assumptions about your specific infrastructure, implicit dependencies on other internal systems. These shortcuts create maintenance debt that compounds until the tool becomes fragile and hard to modify even for the team that built it.
When you decide to open-source a tool, you are forced to confront every shortcut:
- Hardcoded paths become configurable options with sensible defaults
- Implicit dependencies become explicit (declared in package.json, referenced in the README)
- Undocumented behaviors get documented or removed
- Internal jargon in variable names and comments gets replaced with domain-standard terminology
- Tests that depend on internal infrastructure get rewritten to be self-contained
- Error messages that assume internal context (“Check Confluence page X”) get rewritten to be self-explanatory
This cleanup is not busywork. Every one of these changes makes the tool more maintainable for your own team. New hires can understand the tool without tribal knowledge. On-call engineers can debug it without consulting the original author. We have tracked defect rates on our tools before and after open-sourcing: the average defect rate dropped by 38% in the 6 months after open-sourcing, compared to the 6 months before. The act of making the code public forces a level of rigor that internal tools rarely receive.
```typescript
// Before open-sourcing: internal shortcuts everywhere
// Undocumented assumption: requires Harbor's internal auth library
import { getToken } from '@harbor/auth';

const DB_HOST = 'harbor-prod-db.internal';
const configPath = '/opt/harbor/schema-drift.yaml';
```

```typescript
// After open-sourcing: configurable, documented, free of internal dependencies
import { SchemaDrift } from 'schema-drift';

const drift = new SchemaDrift({
  connectionString: process.env.DATABASE_URL,
  migrationsDir: './migrations',
  // Every option documented with JSDoc, TypeScript types, and README examples
  strictMode: true,
  ignoreExtensions: ['pg_trgm', 'uuid-ossp'],
  reporter: 'json', // 'json' | 'table' | 'github-annotation'
});

const report = await drift.check();
if (report.hasDrift) {
  console.error(report.format());
  process.exit(1);
}
```
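Two items from the checklist above can be sketched as a small option resolver: hardcoded paths becoming configurable options with sensible defaults, and error messages that explain themselves. `DriftOptions` and `resolveOptions` are hypothetical names, not schema-drift's published API. The environment is passed in as a plain object instead of being read from `process.env`, which keeps the function easy to test.

```typescript
// Illustrative sketch: hardcoded values become documented options with defaults,
// and a missing setting produces an error that explains the fix.
// DriftOptions and resolveOptions are hypothetical names, not schema-drift's API.
interface DriftOptions {
  /** Database connection string. Falls back to the DATABASE_URL environment variable. */
  connectionString?: string;
  /** Directory containing migration files. Default: "./migrations". */
  migrationsDir?: string;
  /** Treat every difference as drift. Default: false. */
  strictMode?: boolean;
  /** Extensions to skip when comparing. Default: []. */
  ignoreExtensions?: string[];
}

type ResolvedOptions = Required<DriftOptions>;

function resolveOptions(options: DriftOptions, env: Record<string, string | undefined>): ResolvedOptions {
  const connectionString = options.connectionString ?? env.DATABASE_URL;
  if (!connectionString) {
    // Self-explanatory: names the missing setting and both ways to provide it.
    throw new Error(
      "No database connection string found. Pass `connectionString` in the options " +
        "or set DATABASE_URL (for example postgres://user:password@localhost:5432/app).",
    );
  }
  return {
    connectionString,
    migrationsDir: options.migrationsDir ?? "./migrations",
    strictMode: options.strictMode ?? false,
    ignoreExtensions: options.ignoreExtensions ?? [],
  };
}

console.log(resolveOptions({ strictMode: true }, { DATABASE_URL: "postgres://localhost:5432/app" }));
```

Each default is applied field by field with `??` rather than by spreading the options object. This way, an explicit `undefined` from a caller still falls back to the documented default.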
Community Contributions We Did Not Expect
The contributions that matter most are not code patches — they are use cases we did not anticipate. When external users try to use your tool in environments and workflows you never designed for, they surface assumptions and limitations you would not have found internally. This is essentially free QA from highly motivated testers (they are trying to solve their own problems, which is the strongest motivation for thorough testing).
Three examples from our repos that directly improved our own usage:
cost-guard received a PR for Azure OpenAI pricing. We built it for OpenAI’s API exclusively. An enterprise user at a Fortune 500 company submitted a PR adding Azure OpenAI pricing models, which differ from OpenAI’s direct pricing in subtle ways (different token counting for some models, commitment-based pricing tiers, different rate limit headers). We would never have built this ourselves because we do not use Azure OpenAI, but several of our consulting clients do. That PR saved us 20+ hours of work we did not know we needed. We now offer cost-guard to consulting clients on Azure without any additional development effort.
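One design that makes a contribution like this easy is a rate table keyed by provider and model. With that structure, supporting a new provider is mostly new data. The sketch below assumes that design and is not cost-guard's actual code. The model name is a placeholder, and the rates are made-up numbers chosen only to keep the arithmetic readable, not real prices.

```typescript
// Illustrative sketch: cost estimation driven by a per-provider rate table.
// The model name and rates are placeholders, not real prices.
type Provider = "openai" | "azure-openai";

interface Rate {
  inputPerMillion: number;  // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

const RATES: Record<Provider, Record<string, Rate>> = {
  openai: { "example-model": { inputPerMillion: 1.0, outputPerMillion: 4.0 } },
  // A contributor adding a provider mostly adds rows like this one.
  "azure-openai": { "example-model": { inputPerMillion: 1.1, outputPerMillion: 4.4 } },
};

function estimateCostUsd(provider: Provider, model: string, inputTokens: number, outputTokens: number): number {
  const rate = RATES[provider][model];
  if (!rate) {
    throw new Error(`No pricing for ${provider}/${model}. Add an entry to the rate table.`);
  }
  return (inputTokens * rate.inputPerMillion + outputTokens * rate.outputPerMillion) / 1_000_000;
}

console.log(estimateCostUsd("openai", "example-model", 1_000_000, 500_000)); // 3
```

Subtler differences, such as commitment-based tiers or provider-specific token counting, need more than a flat per-token rate. The table only shows where that logic would plug in.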
schema-drift received a bug report about CockroachDB compatibility. We tested against PostgreSQL and MySQL. A user discovered that CockroachDB’s information_schema returns column metadata in a different format (specifically, the column_default field wraps default values in additional type annotations), causing false-positive drift reports. The fix was three lines of code, but the bug would have bitten us if we ever migrated to CockroachDB (which was on our 2026 roadmap). We got that compatibility for free, 8 months ahead of when we would have needed it.
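False positives like these come from comparing `column_default` strings literally. A minimal normalization step strips trailing type annotations before the comparison. The sketch below is not the actual three-line fix. It assumes defaults look like `'active':::STRING` (CockroachDB's `:::` type-annotation syntax) or `'active'::character varying` (a PostgreSQL cast), and it deliberately ignores harder cases such as casts nested inside function calls.

```typescript
// Illustrative sketch: normalize column defaults before comparing them.
// Assumed formats: 'active':::STRING (CockroachDB annotation), 'active'::character varying (PostgreSQL cast).
const TRAILING_TYPE = /\s*:{2,3}\s*[A-Za-z_][A-Za-z0-9_ ]*(\(\d+(,\s*\d+)?\))?(\[\])?\s*$/;

function normalizeColumnDefault(raw: string | null): string | null {
  if (raw === null) return null;
  let value = raw.trim();
  // Remove stacked annotations one at a time, e.g. 'x'::text::varchar -> 'x'::text -> 'x'.
  while (TRAILING_TYPE.test(value)) {
    value = value.replace(TRAILING_TYPE, "");
  }
  return value;
}

function defaultsMatch(expected: string | null, actual: string | null): boolean {
  return normalizeColumnDefault(expected) === normalizeColumnDefault(actual);
}

console.log(defaultsMatch("'active'::character varying", "'active':::STRING")); // true
console.log(defaultsMatch("0", "0:::INT8")); // true
```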
otel-ai received a feature request for Anthropic Claude instrumentation. We built it for OpenAI initially. The feature request came with a detailed spec of Claude’s API differences (different token counting methodology, different streaming event format, different tool-calling structure). The requester implemented it themselves and submitted a well-tested PR. We now support both providers without having done any of the research or implementation work for the second provider. This pattern repeated with Google’s Gemini — another community member added Gemini support 3 months later.
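Much of the work in supporting a second and third provider is mapping differently shaped responses onto one set of span attributes. The sketch below shows that mapping for token usage only, and it is not otel-ai's code. Each response type is trimmed to the usage fields that the provider's public REST API returns:

- OpenAI Chat Completions: `usage.prompt_tokens` and `usage.completion_tokens`
- Anthropic Messages: `usage.input_tokens` and `usage.output_tokens`
- Gemini: `usageMetadata.promptTokenCount` and `usageMetadata.candidatesTokenCount`

The attribute keys follow OpenTelemetry's GenAI semantic conventions, which are still evolving and may change.

```typescript
// Illustrative sketch: normalize token usage from three provider response shapes
// into one set of span attributes. Types include only the fields used here.
type ProviderResponse =
  | { provider: "openai"; body: { usage: { prompt_tokens: number; completion_tokens: number } } }
  | { provider: "anthropic"; body: { usage: { input_tokens: number; output_tokens: number } } }
  | { provider: "gemini"; body: { usageMetadata: { promptTokenCount: number; candidatesTokenCount: number } } };

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

function normalizeUsage(res: ProviderResponse): TokenUsage {
  switch (res.provider) {
    case "openai":
      return { inputTokens: res.body.usage.prompt_tokens, outputTokens: res.body.usage.completion_tokens };
    case "anthropic":
      return { inputTokens: res.body.usage.input_tokens, outputTokens: res.body.usage.output_tokens };
    case "gemini":
      return {
        inputTokens: res.body.usageMetadata.promptTokenCount,
        outputTokens: res.body.usageMetadata.candidatesTokenCount,
      };
  }
}

// Attribute keys as used in OpenTelemetry's (still-evolving) GenAI semantic conventions.
function usageAttributes(res: ProviderResponse): Record<string, number> {
  const usage = normalizeUsage(res);
  return {
    "gen_ai.usage.input_tokens": usage.inputTokens,
    "gen_ai.usage.output_tokens": usage.outputTokens,
  };
}

console.log(usageAttributes({ provider: "anthropic", body: { usage: { input_tokens: 120, output_tokens: 45 } } }));
```

Because `ProviderResponse` is a discriminated union, TypeScript rejects `normalizeUsage` if a provider is added to the type but not handled in the `switch`. This keeps each new adapter from silently returning nothing.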
The Hiring Signal
Open-source repositories are the most effective hiring signal we have found. When a candidate has used one of our tools, starred our repos, or submitted an issue or PR, we know three things before the interview starts: they work on problems similar to ours, they are engaged enough to evaluate tools rather than just using whatever is popular, and they have demonstrated technical judgment by choosing a tool that solves a real problem.
In the past 18 months, 4 of our 7 engineering hires first encountered Harbor Software through our open-source tools. Two of them had submitted issues or PRs before applying. The quality of these hires, measured by 6-month performance reviews, is higher on average than hires from other channels. This is not because open-source contributors are inherently better engineers — it is because the self-selection filter is more precise than any screening process we could design. Someone who found our schema-drift tool, evaluated it against alternatives, and used it in their own project has already demonstrated the problem-solving approach and technical taste we look for.
We also include links to our open-source repos in job postings. Candidates can read our actual code before applying. This filters out candidates who would not be a style or quality fit, saving interview time for both sides. One candidate told us in their interview: “I read your schema-drift codebase and I liked the error handling patterns and the way you structured the plugin system. That is how I like to write code too.” That signal is worth more than any take-home assignment.
The reverse is also true: candidates who visit our repos and decide not to apply have self-selected out, which is valuable. If someone reads our code and thinks “this is not how I work,” they are probably right, and both sides have been spared an interview process that would not have ended in a successful hire.
What We Do Not Open Source
Not everything should be open-sourced. We have clear criteria for what stays internal:
- Business logic. Our model serving router, which implements our proprietary multi-model routing algorithm, stays internal. This is our competitive differentiation — the logic that decides which model handles which request based on cost, latency, and quality predictions. The routing algorithm’s effectiveness comes from our specific evaluation data and tuning, which we cannot share.
- Infrastructure glue code. Scripts that are tightly coupled to our specific AWS setup, deployment pipeline, or monitoring configuration. These are not generalizable enough to be useful to others and would require significant abstraction work to open-source.
- Tools with fewer than 3 potential external users. If we cannot imagine at least 3 companies that would use a tool, the maintenance burden of open-sourcing it (documentation, issue triage, PR reviews) exceeds the benefits. We have a 10-minute exercise where we try to list 3 specific companies or team types that would use the tool. If we cannot, it stays internal.
- Anything touching customer data schemas. Even if the tool itself is generic, it stays internal if its tests or examples would reveal the structure of our customer data. For the tools we do release, we scrub all test fixtures and example configurations before publishing.
The Operational Cost Is Real but Manageable
Open-sourcing is not free. Here is the honest cost breakdown across our 7 repos over 18 months:
Activity                        | Hours/month (all 7 repos combined, 18-month average)
--------------------------------|-----------------------------------------------------
Initial cleanup + docs          | ~0.7 (12 hours one-time, amortized over 18 months)
Issue triage and response       | 3
PR review and feedback          | 4
Security vulnerability review   | 1
Release management (versioning) | 2
Documentation updates           | 1.5
--------------------------------|-----------------------------------------------------
Total ongoing                   | ~11.5
Total with amortized init       | ~12.2
Roughly 12 hours per month across 7 repos. That is about 1.7 hours per tool per month, or under 25 minutes per tool per week. The return on this investment includes the quality improvements (38% defect reduction), community contributions (Azure pricing, CockroachDB compatibility, Claude instrumentation, Gemini support), and hiring signal (4 of 7 hires sourced through open-source).
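The table's totals can be reproduced directly. Only one reading of the cleanup row yields the ~12.2 figure: 12 one-time hours spread over 18 months, or about 0.7 hours/month. The sketch below assumes that reading and also converts the per-tool figure to minutes per week.

```typescript
// Reproduces the cost table's arithmetic. Assumes the cleanup row is 12 hours total, one-time.
const ongoingHoursPerMonth: Record<string, number> = {
  "Issue triage and response": 3,
  "PR review and feedback": 4,
  "Security vulnerability review": 1,
  "Release management (versioning)": 2,
  "Documentation updates": 1.5,
};
const oneTimeCleanupHours = 12;
const months = 18;
const repos = 7;

const ongoing = Object.values(ongoingHoursPerMonth).reduce((sum, h) => sum + h, 0); // 11.5
const withInit = ongoing + oneTimeCleanupHours / months; // about 12.17
const perToolPerMonth = withInit / repos; // about 1.74
const minutesPerToolPerWeek = (perToolPerMonth * 60 * 12) / 52; // about 24

console.log({ ongoing, withInit, perToolPerMonth, minutesPerToolPerWeek });
```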
The key to keeping the cost manageable is setting expectations upfront. Our repos have clear CONTRIBUTING.md files, issue templates that require reproduction steps and environment details, and a stated response time SLA (we aim to respond to issues within 5 business days and PRs within 10 business days). We do not promise immediate responses, and we explicitly state that the tools are “actively maintained but not our full-time job.” This sets realistic expectations and filters out users who need enterprise-grade support (we direct them to our consulting services, which generate revenue).
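A response-time SLA expressed in business days needs a little date arithmetic if a bot or dashboard is to flag overdue items. The sketch below is a hypothetical helper with illustrative names. It counts Monday through Friday in UTC and ignores public holidays.

```typescript
// Illustrative sketch: due dates for response SLAs counted in business days.
// Counts Monday-Friday in UTC; public holidays and local time zones are ignored.
const SLA_BUSINESS_DAYS = { issue: 5, pullRequest: 10 } as const;

function addBusinessDays(start: Date, days: number): Date {
  const result = new Date(start.getTime());
  let remaining = days;
  while (remaining > 0) {
    result.setUTCDate(result.getUTCDate() + 1);
    const weekday = result.getUTCDay(); // 0 = Sunday, 6 = Saturday
    if (weekday !== 0 && weekday !== 6) remaining--;
  }
  return result;
}

function responseDueDate(kind: keyof typeof SLA_BUSINESS_DAYS, openedAt: Date): Date {
  return addBusinessDays(openedAt, SLA_BUSINESS_DAYS[kind]);
}

function isOverdue(kind: keyof typeof SLA_BUSINESS_DAYS, openedAt: Date, now: Date): boolean {
  return now.getTime() > responseDueDate(kind, openedAt).getTime();
}

// An issue opened on Friday 2024-01-05 is due by Friday 2024-01-12.
console.log(responseDueDate("issue", new Date("2024-01-05T12:00:00Z")).toISOString());
```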
We also automate what we can. Dependabot handles dependency updates. GitHub Actions runs tests on every PR. Release Please automates version bumping and changelog generation. CodeQL scans for security vulnerabilities. These automations reduce the per-PR review burden from roughly 20 minutes to 5 minutes because the CI pipeline catches most issues before a human looks at the code.
License Choice: MIT vs Apache 2.0 vs AGPL
License selection matters more than most teams realize, especially when your tools might be used by enterprise customers who have strict compliance requirements around open-source license compatibility. We standardized on MIT for all our open-source tools after evaluating the tradeoffs:
MIT is the most permissive and enterprise-friendly of the three. It allows anyone to use, modify, and distribute the code with no restrictions beyond preserving the copyright and permission notice. Enterprise legal departments approve MIT-licensed dependencies without review in most organizations. This matters because if an enterprise developer cannot get legal approval to use your tool, they will not use it, regardless of how good it is. We want our tools to have the widest possible adoption, so MIT is the default.
Apache 2.0 includes an explicit patent grant that MIT lacks. If your tool implements novel algorithms or techniques that you have patented (or might patent), Apache 2.0 ensures users are not exposed to patent infringement claims from you. We considered Apache 2.0 for inference-bench because it implements specific benchmarking methodologies, but decided the patent risk was negligible and stayed with MIT for consistency.
AGPL is a “copyleft” or “share-alike” license. Anyone who distributes a modified version must release its source under the same license. Unlike the GPL, the obligation also applies when the modified software only runs as a network service: users interacting with that service must be offered the corresponding source code. AGPL is effective at preventing competitors from forking your tool, adding proprietary features, and offering it as a commercial service without contributing back. However, AGPL is actively blocked by many enterprise legal policies (Google famously prohibits AGPL dependencies), which severely limits adoption. We do not use AGPL for any of our tools.
One additional consideration: make sure everything that enters your codebase, whether a dependency or contributed code, has a license compatible with your project's. We had a contributor submit a PR to schema-drift that included code adapted from a GPL-licensed database tool. We had to decline the PR because incorporating that code would have required distributing the combined project under the GPL rather than MIT. This was an awkward conversation that clearer contribution guidelines could have avoided. Our CONTRIBUTING.md now explicitly states that all contributions must be compatible with the MIT license.
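Declared dependencies can be checked automatically. That frees human review to focus on the harder case: code copied into a contribution, as in the schema-drift PR. The sketch assumes each dependency's `license` field holds a single SPDX identifier. The allowlist is illustrative, and compound expressions such as `(MIT OR Apache-2.0)` are simply flagged for manual review.

```typescript
// Illustrative pre-merge check: flag dependencies whose declared license is not on an allowlist.
// Allowlist contents are an assumption for this sketch; adjust them to your own policy.
const ALLOWED_LICENSES = new Set(["MIT", "ISC", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0", "0BSD"]);

interface Dependency {
  name: string;
  license?: string; // SPDX identifier as declared by the package, if any
}

function flagLicenses(deps: Dependency[]): string[] {
  return deps
    .filter((dep) => dep.license === undefined || !ALLOWED_LICENSES.has(dep.license))
    .map((dep) => `${dep.name}: ${dep.license ?? "no license declared"}`);
}

const flagged = flagLicenses([
  { name: "pkg-a", license: "MIT" },
  { name: "pkg-b", license: "GPL-3.0-only" },
  { name: "pkg-c" },
]);
console.log(flagged);
```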
Documentation That Drives Adoption
The documentation quality of an open-source tool determines its adoption more than the code quality. A well-documented mediocre tool will get more users than a poorly documented excellent tool. We learned this the hard way: our first open-source release (inference-bench) had excellent code but a minimal README that assumed the reader already understood ML benchmarking methodology. It sat at 50 stars for 3 months until we rewrote the documentation. After the rewrite, it hit 500 stars in 6 weeks.
Our documentation template for every open-source tool:
- One-sentence description. What does this tool do? If you cannot explain it in one sentence, the tool’s scope is too broad or the naming is unclear.
- Why this exists. What problem does it solve? What was the pain point that motivated building it? Users need to recognize their own pain in your description to decide the tool is relevant to them.
- Quick start. Install command + minimal code example that demonstrates the core value proposition. This must work with copy-paste in under 2 minutes. We test our quick start examples on every release by running them in a fresh environment.
- API reference. Every public function and configuration option, with types, defaults, and examples. Generated from JSDoc/TSDoc comments using TypeDoc so it stays in sync with the code.
- Recipes. Common use cases with complete, working code examples. These are the pages that drive the most traffic from search engines because people search for “how to detect database schema drift with Prisma” not “schema-drift npm package.”
- Troubleshooting. Common errors and their solutions, written as the user would experience them (“Error: Connection refused” not “TCP connection failure”). This section grows organically from GitHub issues — every issue that gets asked more than twice becomes a troubleshooting entry.
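Testing quick start examples on every release implies a step that pulls those examples out of the README. A minimal sketch of that extraction step follows. `extractCodeBlocks` is a hypothetical name, and the function only handles simple, non-nested fences. Running the extracted code in a clean environment is left to the CI job.

```typescript
// Illustrative sketch: extract fenced code blocks tagged with a language from a README,
// so a release job can run the quick start in a clean environment.
// Handles simple, non-nested fences only.
const FENCE = "`".repeat(3); // built at runtime so this snippet can live inside Markdown

function extractCodeBlocks(markdown: string, lang: string): string[] {
  const blocks: string[] = [];
  let current: string[] | null = null;
  for (const line of markdown.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (current === null) {
      // Outside a block: start collecting only at an opening fence for the requested language.
      if (trimmed === FENCE + lang) current = [];
    } else if (trimmed === FENCE) {
      blocks.push(current.join("\n"));
      current = null;
    } else {
      current.push(line);
    }
  }
  return blocks;
}

const readme = [
  "# my-tool",
  FENCE + "bash",
  "npm install my-tool",
  FENCE,
  FENCE + "ts",
  "import { run } from 'my-tool';",
  "run();",
  FENCE,
].join("\n");
console.log(extractCodeBlocks(readme, "ts"));
```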
We host documentation on the repository’s GitHub Wiki for simple tools and on a dedicated docs site (built with Docusaurus) for tools with more extensive documentation. The decision threshold is roughly 10 pages of documentation: below that, a GitHub Wiki is fine; above that, a dedicated site with search, versioning, and navigation is worth the setup cost.
How to Decide: A Framework
If you are considering open-sourcing an internal tool, evaluate it against these four criteria:
- Does it take a novel approach to a well-known problem? Tools that solve well-known problems (database migrations, environment validation, cost tracking) in a better way get adopted. Tools that solve problems nobody else has get ignored regardless of how good they are.
- Is it decoupled from your business logic? If extracting the tool requires abstracting away business-specific details, the cleanup effort is justified because your internal code gets cleaner. If it requires exposing business-specific details to be useful, it should stay internal.
- Can you maintain it with less than 2 hours/month? If the tool is complex enough to require significant ongoing maintenance, the cost may exceed the benefits unless it drives material hiring or revenue value.
- Would you use it if someone else published it? If yes, other teams will too, and the virtuous cycle of contributions and improvements will justify your investment. If you are not sure, it probably is not solving a common enough problem to attract contributors.
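The four questions can also be written down as a checklist, so decisions are recorded the same way every time. The sketch below is a hypothetical encoding with illustrative field names and inputs. The 2-hours/month maintenance budget is a default parameter.

```typescript
// Illustrative sketch: the four open-sourcing questions as a recorded checklist.
interface ToolAssessment {
  name: string;
  novelApproachToKnownProblem: boolean;
  decoupledFromBusinessLogic: boolean;
  estimatedMaintenanceHoursPerMonth: number;
  wouldUseIfSomeoneElsePublished: boolean;
}

interface Decision {
  openSource: boolean;
  failedCriteria: string[];
}

function assessTool(tool: ToolAssessment, maxHoursPerMonth = 2): Decision {
  const failedCriteria: string[] = [];
  if (!tool.novelApproachToKnownProblem) failedCriteria.push("no novel approach to a well-known problem");
  if (!tool.decoupledFromBusinessLogic) failedCriteria.push("coupled to business logic");
  if (tool.estimatedMaintenanceHoursPerMonth >= maxHoursPerMonth) {
    failedCriteria.push(`maintenance of ${tool.estimatedMaintenanceHoursPerMonth}h/month is not under ${maxHoursPerMonth}h`);
  }
  if (!tool.wouldUseIfSomeoneElsePublished) failedCriteria.push("we would not use it ourselves");
  return { openSource: failedCriteria.length === 0, failedCriteria };
}

console.log(
  assessTool({
    name: "example-tool",
    novelApproachToKnownProblem: true,
    decoupledFromBusinessLogic: true,
    estimatedMaintenanceHoursPerMonth: 1.5,
    wouldUseIfSomeoneElsePublished: true,
  }),
);
```

The sketch treats the maintenance budget as a hard limit. The text, by contrast, allows a higher cost when a tool drives material hiring or revenue value. A fuller version might record that as an explicit override with a stated reason.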
Open-sourcing internal tools is not charity. It is an engineering strategy that improves code quality, attracts contributions you did not know you needed, and creates hiring advantages that are difficult to replicate through other channels. The cost is real but modest compared to the returns. Start with one tool — your most generic, most polished internal utility — and measure the results over 6 months. If the experience is anything like ours, you will be open-sourcing the next one within a quarter.