Evaluating Open Source Third-Party Dependencies
It’s wise to be selective when choosing which third-party dependencies to use. Learn how we evaluate and identify a quality dependency when using external code.
Modern software rarely gets built in a pristine, idyllic context wherein engineers have the luxury to construct every piece of every system by hand, character by character, line by line, program by program, until a flawless Da Vincian masterpiece emerges.
Instead, we have time and budget constraints that mean we often need to reach for pre-existing software to which we can outsource a subset of responsibility to deliver the necessary features on a reasonable timeline. Developers are expensive, and though we may dream of building everything from scratch, as pragmatic individuals we recognize the value in building on top of quality work produced by other people.
However, every piece of external code we pull in and depend on is a potential liability. It’s wise to be selective when choosing which third-party dependencies to use.
Guidance on this topic often focuses on security concerns, which are important risks to consider, especially from a business perspective. But there are many other qualities to evaluate when selecting which libraries to depend on, and they will substantially impact whether or not that decision comes back to haunt you later.
Are people using the thing?
A project with a high number of stars on GitHub, or a large number of weekly downloads or installs on npm, signals that people in the community recognize the project’s need, value, and quality.
Sometimes these metrics can be misleading. For example, there are old, outdated projects on GitHub with many stars that were once relevant but have been replaced by newer, better ideas and libraries. Make sure you check to see whether something is actively maintained in addition to its popularity.
Are there plenty of articles, blog posts & other learning resources being published about the project?
A good signal to gauge popularity is whether people write about using a thing. This helps you get a sense of which real problems the thing solves and gives you anecdotal evidence of how well it solves those problems.
These resources are an invaluable source of information about potential pitfalls, shortcomings, or edge cases that are tricky to deal with.
Does the project have good documentation?
Good documentation, simple as it sounds, actually encapsulates a wide variety of different considerations. A quality dependency will have detailed documentation that’s well-organized, easy to navigate, has good sample code, discusses trade-offs and comparisons with competing libraries, helps guide developers with how best to use the tool, and teaches you how to avoid the footguns.
Does the project have a variety of non-trivial examples showing how to use it effectively?
Having good documentation is crucial for effectively using a library, but having great examples can significantly accelerate adoption speed. It also demonstrates that the authors have an awareness and consideration for how developers actually use the library and signals which use cases the maintainers are actively thinking about.
Software Design
Software design is a complex topic. It encompasses everything from data structures to algorithms to architecture and APIs. Here are a few questions you can ask yourself to aid in your evaluation of how well-designed a project is:
Is it going to be easy to work with and flexible to my expected use cases?
Naturally, a good starting point is to understand what steps you will need to take to stand up a basic integration utilizing a particular dependency for the underlying implementation. Which steps are required to install the dependency? Which code needs to be imported? Which functions do I need to call, and with what arguments or which objects do I need to construct? How do I achieve the specific user-facing behavior within my application by leveraging the primitives and features of this dependency?
If it doesn’t directly support my need, does it have sensible escape hatches?
Much, but not all, open-source software skews towards designs that attempt to satisfy a wide variety of use cases. When a tool is designed this way, it often contains a lot of extra code that’s irrelevant to what you need it to do. That code, and the associated configuration, documentation, and API surface area, is a distraction for you as you attempt to express your specific feature within its API.
If your use case falls outside of that scope, then you are at the mercy of the authors’ design decisions when it comes to tweaking or extending the built-in behavior. A well-designed dependency accounts for this needed flexibility through in-built escape hatches, giving you the right hooks to bend it to your will.
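As a rough illustration of an escape hatch, here is a minimal sketch of a library that ships a sensible default but lets callers override it through an optional hook. Everything in it (`createPriceFormatter`, `PriceFormatterOptions`, `formatNumber`) is a hypothetical name invented for this example, not the API of any real package.

```typescript
// Hypothetical library API: every name here is invented for illustration.
type Formatter = (value: number) => string;

interface PriceFormatterOptions {
  currency: string;
  // Escape hatch: callers may replace the built-in number formatting entirely.
  formatNumber?: Formatter;
}

function createPriceFormatter(options: PriceFormatterOptions): Formatter {
  // Built-in behavior: always two decimal places.
  const defaultFormat: Formatter = (value) => value.toFixed(2);
  const formatNumber = options.formatNumber ?? defaultFormat;
  return (value) => `${formatNumber(value)} ${options.currency}`;
}

// The default covers the common case.
const standard = createPriceFormatter({ currency: "USD" });

// The hook covers a case the authors did not build in: whole units only.
const rounded = createPriceFormatter({
  currency: "USD",
  formatNumber: (value) => Math.round(value).toString(),
});

console.log(standard(19.5)); // "19.50 USD"
console.log(rounded(19.5)); // "20 USD"
```

When evaluating a real dependency, the question is whether hooks like this exist at the points where your requirements are likely to diverge from the defaults.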
Some tools are made much sharper, targeting a narrower set of use cases. If those use cases include your specific need, such a tool might be a better choice than an all-purpose library.
Does the API compose well with other parts of my program?
Well-designed dependencies are composable. They’re broken down into logical pieces that can be combined in interesting and flexible ways to serve a given purpose, often augmented by other parts of your system or other dependencies.
Many facets contribute to how composable a piece of software is. One example is the programming paradigm. Functional and object-oriented code typically don’t compose well together. Another good example is synchronous and asynchronous code. Synchronous code can be called from inside asynchronous code, but not vice versa: a synchronous function cannot wait for an asynchronous result without becoming asynchronous itself.
Did the authors choose the right tools to author the library in a way that facilitates easy integration?
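To make the synchronous versus asynchronous point concrete, here is a small sketch. The function names are hypothetical, and `loadPortSetting` simply stands in for any asynchronous read such as a file, network, or database call.

```typescript
// Hypothetical stand-in for an asynchronous read (file, network, database).
async function loadPortSetting(): Promise<string> {
  return "8080";
}

// Plain synchronous helper.
function parsePort(raw: string): number {
  return Number.parseInt(raw, 10);
}

// Sync inside async: works, because the async function can await first
// and then call the synchronous helper on the resolved value.
async function resolvePort(): Promise<number> {
  const raw = await loadPortSetting();
  return parsePort(raw);
}

// Async inside sync: the synchronous caller cannot wait for the result.
// All it can do is hand a pending Promise back to its own caller.
function resolvePortFromSyncCode(): Promise<number> {
  return loadPortSetting().then(parsePort);
}

resolvePort().then((port) => console.log(port)); // logs 8080
```

The practical consequence is that adopting a Promise-based dependency tends to push asynchrony up through every caller, which is worth knowing before you integrate it into a largely synchronous codebase.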
Is the project marked as deprecated or in maintenance mode?
It’s common in the open-source software world for maintainers to move on from a particular project. They lose interest, a better tool comes along, or adoption wanes, and the motivation to improve the code disappears.
These projects should be avoided unless there’s no other option. Don’t expect any bug or security fixes, new features, curation of reported issues, review and merging of pull requests, nor support from the authors. You’ll be left to handle all of this, likely in a fork of the original code.
When was the most recent commit pushed?
How recently the code was updated can give you an idea of how actively maintained the project is and how outdated it might be. If the last commit was several years ago, then it’s fair to assume that new features are unlikely to be on the horizon, and the approach of the library authors may not reflect the latest idioms and best practices standard in the ecosystem today.
Whether this signal is relevant is contextual. A library implementing a stable specification may not need updating and the last commit date is largely irrelevant.
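If you want to apply this check consistently across several candidate libraries, a small helper can encode it. The sketch below is only one possible heuristic: the 180-day and 730-day thresholds, the `Freshness` labels, and the `implementsStableSpec` flag are all assumptions chosen for illustration, not established cutoffs.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days elapsed between the last commit and the reference date.
function daysSince(lastCommit: Date, now: Date): number {
  return Math.floor((now.getTime() - lastCommit.getTime()) / MS_PER_DAY);
}

type Freshness = "active" | "quiet" | "stale";

// Assumed thresholds: under ~6 months is "active", under ~2 years is "quiet".
// A library implementing a stable specification is never downgraded to
// "stale" on commit age alone, reflecting the caveat in the text above.
function classifyFreshness(
  lastCommit: Date,
  now: Date,
  implementsStableSpec = false,
): Freshness {
  const days = daysSince(lastCommit, now);
  if (days <= 180) return "active";
  if (days <= 730 || implementsStableSpec) return "quiet";
  return "stale";
}

const now = new Date("2024-01-01T00:00:00Z");
console.log(classifyFreshness(new Date("2023-12-01T00:00:00Z"), now)); // "active"
console.log(classifyFreshness(new Date("2020-01-01T00:00:00Z"), now)); // "stale"
console.log(classifyFreshness(new Date("2020-01-01T00:00:00Z"), now, true)); // "quiet"
```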
Does the project have a lot of open issues or pull requests?
A project with many open issues can signal that there’s a very large community of engineers using it, surfacing problems, and making feature requests; that triaging and addressing issues is not a priority for the maintainers; or both.
In general, it’s not a great sign. But more telling, in my experience, is a project with many open pull requests. This is a signal that the maintainers aren’t keeping up with feature development or community contributions, or that they are working on too many things concurrently. Once a backlog of pull requests piles up, many cycles are needed to review, approve, rebase, resolve conflicts, and then get everything merged in.
Is the project backed by a community of individuals versus a company or organization?
Both possibilities have some risks and benefits to consider.
Projects backed by companies or organizations typically have far more resources consistently available to maintain projects over time and can rotate individual contributors on and off tasks more efficiently, which can help preserve a high degree of engagement and forward progress.
These projects typically have more robust governance processes and ownership, which can help mitigate the risks of lengthy debates about priorities and direction and protect users and contributors from potentially malicious actors within the larger community.
However, these projects can sometimes suffer from being too narrowly guided by the organization’s needs rather than a broader community-driven set of use cases.
Projects backed by a community of individuals rely far more heavily on the motivation and dedication of the people who choose to maintain and contribute to the codebase. This can lead to unpredictable swings in how actively the project is supported and in the quality and pace of bug fixes and new feature development.
These projects often require one or more highly motivated champions to lead governance efforts and guide the community toward healthy debate, prioritization, and conflict resolution processes. It’s less clear in these projects whose vision should guide the adoption or rejection of proposed features. Sometimes it’s left to a more democratic process, and other times a strong owner takes responsibility for saying “this, not that,” and so forth.
Does the project use proper semantic versioning?
Semantic versioning can be tricky to get right, and there are many ways to get it wrong. A package that does not adhere to the specification can break unexpectedly when you install minor or patch updates. If semver is followed properly, you can install the latest minor or patch version without fear of breaking changes. Keeping dependencies up to date is important for security, but also for compatibility and bug fixes.
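The expectation semver sets can be written down in a few lines. The sketch below is a simplified illustration, not a replacement for a real semver library: it handles only plain MAJOR.MINOR.PATCH strings (no pre-release or build metadata), assumes the second version is the newer one, and uses the hypothetical helper names `parseVersion`, `classifyBump`, and `isExpectedToBeSafe`.

```typescript
type Bump = "major" | "minor" | "patch" | "none";

// Parses a plain MAJOR.MINOR.PATCH string; pre-release tags are not handled.
function parseVersion(version: string): [number, number, number] {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Reports which part of the version changed first, assuming `to` is newer.
function classifyBump(from: string, to: string): Bump {
  const [fromMajor, fromMinor, fromPatch] = parseVersion(from);
  const [toMajor, toMinor, toPatch] = parseVersion(to);
  if (toMajor !== fromMajor) return "major";
  if (toMinor !== fromMinor) return "minor";
  if (toPatch !== fromPatch) return "patch";
  return "none";
}

// Under semver, minor and patch releases should be backwards compatible.
// The specification treats 0.y.z as initial development where anything
// may change, so those versions are reported as not safe here.
function isExpectedToBeSafe(from: string, to: string): boolean {
  const [fromMajor] = parseVersion(from);
  if (fromMajor === 0) return false;
  return classifyBump(from, to) !== "major";
}

console.log(classifyBump("1.4.2", "1.5.0")); // "minor"
console.log(isExpectedToBeSafe("1.4.2", "1.5.0")); // true
console.log(isExpectedToBeSafe("1.4.2", "2.0.0")); // false
console.log(isExpectedToBeSafe("0.3.0", "0.4.0")); // false
```

A project that follows semver properly makes the result of `isExpectedToBeSafe` trustworthy; one that does not can ship breaking changes in releases this check would wave through.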
Open-source projects you depend on may affect the performance characteristics of your software. Short of fully integrating them and doing your own testing, there are a few things you can look out for ahead of time to anticipate potential issues.
Does the library support partial compilation?
Many open-source libraries offer a breadth of functionality that includes features unnecessary for your specific use case. Some languages and frameworks will optimize imports and eliminate the unneeded code so that you end up shipping to users only the code necessary to run your application.
What is the installed size of the library?
On the web, frontend package size is correlated with browser performance. The larger the library, the more bytes need to be sent over the network and then parsed and evaluated by browsers, costing more time and data. For system dependencies, the primary constraints are memory and CPU usage, which are more difficult to infer from the size of the dependency’s code.
Do the library authors provide any performance benchmark information?
Measurements of library performance can be a helpful point of comparison with other popular options. Unfortunately, the recording of metrics is not a standardized process and is prone to bias and cherry-picking. Which packages are offered as comparisons and the methodology for capturing benchmarks are both up to the library authors. I tend to be skeptical when looking at author-provided benchmarks, but they can be a useful signal when combined with the others outlined in this article.
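Because author-provided numbers deserve some skepticism, it can help to run a quick measurement of your own on the operations you actually care about. The following is a minimal sketch of such a harness, assuming a runtime that provides the global `performance.now()` (modern browsers and recent Node.js versions). The names `measure` and `median`, the run counts, and the two candidate functions are hypothetical and chosen only for illustration.

```typescript
// Median is less sensitive than the mean to one-off pauses such as GC.
function median(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Returns the median time in milliseconds for `iterationsPerRun` calls.
function measure(task: () => void, runs = 15, iterationsPerRun = 1000): number {
  // Warm-up pass so early, unoptimized executions are not sampled.
  for (let i = 0; i < iterationsPerRun; i++) task();

  const samples: number[] = [];
  for (let run = 0; run < runs; run++) {
    const start = performance.now();
    for (let i = 0; i < iterationsPerRun; i++) task();
    samples.push(performance.now() - start);
  }
  return median(samples);
}

// Two hypothetical candidates standing in for competing libraries.
const words = Array.from({ length: 100 }, (_, i) => `word${i}`);
const joinWithBuiltin = () => { words.join(","); };
const joinWithLoop = () => {
  let out = "";
  for (const word of words) out += word + ",";
};

console.log("builtin join (ms):", measure(joinWithBuiltin));
console.log("manual loop (ms):", measure(joinWithLoop));
```

Timings from a harness like this vary by machine and runtime, so treat them as a relative comparison under your own workload, not as absolute figures.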
None of these metrics alone is enough to determine whether a project is worth depending on. An overall evaluation should come from weighing as many of these signals as possible before making a final determination.
It’s wise to be discerning when choosing to include third-party code in your application. And if you want more high-quality libraries to choose from, don’t neglect to support open-source software projects so their talented contributors can continue to create value for us all.