At WePay, we have a well-defined process and structure for documenting software design. This post describes our design process and software design template.
Every early stage software company has one goal: to ship software ASAP. This attitude is essential for the business to survive. But it promotes the idea that “Code is the documentation”. This gets out of hand quickly.
WePay started off developing software using a LAMP stack. In the beginning, the code was not complicated. As the company rolled out new features and grew the team, the lack of documentation slowed developer productivity.
Teams were not communicating well, either. For example, we evaluated the open source version of Couchbase, but decided not to support it. This decision was not communicated or documented. Separately, another team built an application using Couchbase. This led to wasted developer effort.
These are just a few of the problems we were facing. With this in mind, we set out to create a lightweight design document process.
Let’s Do It
It was clear that we needed a process to define:
- When to write a design document?
- What the document should contain?
- Who should write it?
We decided that any new software must have a design document, and that the owner of the feature must write it. A “feature” might be a standalone microservice, an internal tool, a piece of infrastructure, or a reusable library. Most new feature development happened in microservices, so this is where we focused most.
A template was created for developers to document their design.
- Design document template for documenting Microservices.
- Infrastructure design does not follow a standard template. A good infrastructure design should include, among other sections, an introduction, high-level architecture, terminology, setup instructions, and testing. Examples of infrastructure design can be found here.
All principal engineers were added to a “design review approval” group. New design documents were submitted to this group. We also decided not to attempt to document all existing systems–just new work.
Houston, we have a problem
Some themes emerged as developers began to use the new process.
Every engineer would submit a design for review and immediately schedule a meeting with the “design review approval” group. One meeting was usually not enough, and multiple follow-ups were required. The review panel always had similar requests:
- “What are the hard dependencies for your service?”
- “What are the access patterns of your service?”
- “What are the APIs it exposes?”
- “How are you securing the sensitive data?”
- “How will you ensure HA?”
- “Are the operations idempotent?”
The “design review approval” group also became a bottleneck. Reviews took longer than a month. Developers began adding the new features to existing services so that they could avoid design reviews.
Software is not static. It has to be maintained, bug fixed, and patched. Within a year, design documents grew out of sync with reality.
Finally, work was being done on development tools, infrastructure code, and reusable libraries–not just services. Our design template was not suitable for these types of systems. Developers began using their own templates, or skipping the documentation altogether.
Iterate and Improve
We began to fine tune the process to address the shortcomings. Ultimately, our goal was to make it easy for the developers to write design documents. We added many of the common questions coming out of design meetings to the template.
We also decentralized reviews for standard microservices. Rather than a centralized committee, we began requiring “sponsors”. A sponsor could be any senior engineer, provided they were from another team. The sponsor’s job was to provide early feedback. A site reliability engineer (SRE) was also required as a reviewer, so that infrastructure-related issues could be uncovered at an early stage.
The “design review approval” group no longer needed to approve standard microservice designs. These designs were reviewed and approved by the sponsor, the embedded SRE, and the team’s manager. The “design review approval” group was notified by email, and the group was opened so anyone could participate.
We also introduced documentation into our SDLC. When code lands with a major version bump, the CI system requires the developer to link the updated design document to the commit using a pre-commit hook. The template was generalized as well. It now works for development tools, infrastructure code, and reusable libraries.
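To make the version-bump rule concrete, here is a minimal sketch of the kind of check a hook or CI step could run. The post does not say how the link is recorded, so the `Design-Doc:` commit trailer, the function names, and the example URL are all assumptions made for illustration, not WePay's actual tooling.

```python
import re

# Hypothetical trailer format: a line such as "Design-Doc: https://...".
# The real mechanism for linking the document is not described in the post.
DESIGN_DOC_TRAILER = re.compile(r"^Design-Doc:\s*(https?://\S+)\s*$", re.MULTILINE)


def major(version: str) -> int:
    """Return the major component of a version string such as '2.4.1' or 'v2.4.1'."""
    return int(version.lstrip("v").split(".")[0])


def is_major_bump(old_version: str, new_version: str) -> bool:
    """True when the major version number increases."""
    return major(new_version) > major(old_version)


def check_commit(old_version: str, new_version: str, message: str) -> bool:
    """Return True if the commit may land under the assumed rule."""
    if not is_major_bump(old_version, new_version):
        return True  # minor and patch changes need no design-doc link
    # A major bump must carry a link to the updated design document.
    return DESIGN_DOC_TRAILER.search(message) is not None
```

For example, `check_commit("1.4.2", "2.0.0", "Rewrite API")` is rejected, while the same commit with a `Design-Doc:` trailer in its message passes.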
Required approvals remained for non-standard designs, but the approval group was expanded to include all the Staff engineers. We also enforced a strict SLA of 2 weeks: unless someone objected, a design review request was considered approved after 2 weeks.
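The SLA rule amounts to a small state check. The sketch below is one possible reading of it; the status names, the `review_status` function, and the choice to count any objection as blocking are assumptions rather than a description of our tooling.

```python
from datetime import date, timedelta

REVIEW_SLA = timedelta(weeks=2)  # the 2-week SLA described above


def review_status(submitted: date, today: date, objections: int, approved: bool = False) -> str:
    """Classify a design review request under the assumed SLA rule."""
    if approved:
        return "approved"
    if objections > 0:
        # An objection stops the clock; the design needs discussion.
        return "needs discussion"
    if today - submitted >= REVIEW_SLA:
        # Nobody objected within the SLA window, so the design passes by default.
        return "approved by default"
    return "pending"
```

Under this reading, a request submitted on 1 January with no objections would be pending on 10 January and approved by default on 15 January.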
After almost 3 years, we have close to 100 well-written and reviewed design documents that reflect the current state of the system. One measure of success is how quickly our new hires can onboard and become productive. Even non-engineers like product managers, account managers, and sales reps use the documentation.