In my last post, I consumed your valuable screen and brain space discussing what our interviewing process looks like - what we do, and what we are trying to achieve. This post (taking up more of your valuable screen real estate) will focus on particular decisions that we made, and delve more into why we made them.

Everyone interviews

We’ve decided that we want everyone to be able to be part of the engineering interview process. Part of that is selfish - the more people we have interviewing, the fewer interviews each individual person has to do. But the larger part is that we want as broad a cross-section of the company as we can get interviewing candidates, and the module system makes it easy to slot people into an interview even if they aren’t engineers - we have people on the product team who are qualified to run background and design interviews, for example. It means that our candidates are exposed to the full diversity of employees at WePay, that they are evaluated from a wide variety of perspectives, and that we can maintain that diversity as we hire more people. Having everyone interview also means that we can better avoid echo chamber effects, whether a particular team diverging from the rest of the company in its standards, or “The Interviewers” losing touch with the skills and qualities they should actually be evaluating.

Teaching interviewing

Of course, you can’t say that everyone interviews and then just throw them into a room with a candidate. Interviewers need training, examples, and experience to be able to interview well, so we established a tiered training system:

  • The lowest level is Newbies. This is where people start out and learn to interview. They may observe interviews, or run them with someone else observing who is ready to step in if needed.
  • The second level is Users: people who have been certified as able to conduct interviews well enough to handle them on their own. They have the skills to evaluate candidates fairly, handle unexpected situations gracefully, and generally do the right thing in interviews. But they may not yet be able to effectively teach someone else to interview at their level, either because they aren’t fully aware of why they do certain things or because they haven’t fully developed their teaching and mentorship skills.
  • The third level is Admins: the people who have enough experience and awareness around interviewing to teach someone how to handle interviewing situations, and (more importantly) why they should be handled that way. They mentor Newbies and Users and help them develop their interviewing skills.

We settled on this system as a way to get the most that we could out of our limited live training time. We also recognize that training interviewers with real candidates is inherently riskier than the alternatives - this is someone whom we may want to hire, and on whom we need to be making a good impression. However, as much as we tried to design an interviewing system that didn’t involve a lot of time spent in rooms with candidates, there were too many situations that we couldn’t reliably reproduce outside an actual interview - novel approaches, belligerent candidates, unexpected difficulties, technical glitches, and more. Those are all things that Users need to be able to handle smoothly, and we haven’t found a training tool, short of putting them in a variety of real interviews, that adequately prepares them. Making sure that there is always someone in the room who can step in and handle any issues that might arise serves as a safety net, and means that nobody is left with a situation that they don’t have the confidence to handle.

Everyone discusses their evaluations

In addition to written feedback (“I asked this, the candidate did that well, etc.”), we have all of the interviewers join a post-interview discussion to talk about what they saw and what they thought, and to come to a consensus on whether we want to proceed. We opted for this model over the other models that I have seen (a centralized hiring committee, a decision from the hiring manager, or “no more than one no”) for several reasons:

  • Getting all of the interviewers in the same room allows for greater discussion of the specifics and tradeoffs involved with a particular candidate. Are they good enough at coding that it makes up for their weakness in design? That might not be something the coding interviewer was addressing in their feedback, but it’s a valid and significant question to ask, and the answer is specific enough to a particular interview that establishing a general guideline is difficult.
  • Tying in with the point above, it gives people new to interviewing more chances to see how other people interview and how they evaluate. It also provides an excellent place to establish a consistent bar across the company - by discussing among ourselves, it is easier to spread and internalize where the bar is, and what a fair, appropriate, and consistent evaluation looks like across a variety of modules.
  • Relatedly, it disconnects the bar from an individual team’s current hiring need. A hiring manager may be tempted (in a way that a panel of engineers from across the company likely won’t be) to compromise on a certain candidate because they have a project for them right now, and hiring below the bar that we’ve set undermines our goal of maintaining a world-class engineering team.
  • It has been shown that biases in decision making can be reduced when the decision maker will have to justify their decision to someone else. By using a discussion and consensus model, rather than one where the interviewers don’t have to engage with the hiring decision, we hope to make our interview process more objective and ensure that we’re making our decisions for the right reasons.
  • The discussion allows concerns that came up in other modules to be confirmed or assuaged. For example, a coding interviewer who picked up hints of communication issues but didn’t dig deeper into them can raise those and get a second data point from the people who ran the communication and background module. Addressing those concerns in a discussion means both that the coding interviewer doesn’t feel like they have to dig into that area at the expense of coding time, and that the other interviewers don’t feel like a potential flag was missed because it didn’t come up in the right interview.
  • By having the interviewers make the decision themselves, we are able to keep our feedback loop (both to interviewers and to candidates) short, and we’re able to keep interviewers engaged in the process. We don’t want interviewers to feel like they are submitting their feedback to the void, particularly if it’s a candidate that they are excited about, and if there is feedback for the interviewer we want to be able to get it to them while the interview is still fresh in their mind.

Two criticisms that I’ve consistently seen directed at the discussion model are also at least somewhat mitigated by the rest of our interviewing system. The first, that a persuasive interviewer who liked the candidate can sway the entire panel, is lessened by the nature of the module structure - because each interviewer is evaluating a discrete area, the discussion becomes less about which interview gave a better picture of a candidate’s true abilities and more about whether a strength in one area makes up for a weakness in another. While individual persuasiveness can still play a role there, it becomes much harder to simply overrule another interviewer when each evaluated a different area. The second, that hiring managers may not have a chance to review people who are placed on their team, we avoid by creating a separate module for hiring managers to meet with the candidate. This gives the hiring manager a chance to get a feel for the candidate without putting them in a position to overrule one of the technical modules. It also serves the important function of letting the candidate ask questions that would be less appropriate to ask a fellow engineer (compensation and promotion, or time-off standards) - we know that candidates are evaluating us just as much as we are evaluating them.

Questions: To standardize or not to standardize?

We decided that, despite our focus on interview consistency, we didn’t want a pool of ‘approved’ questions. The argument for standardized questions was that they would ensure that everyone was being evaluated against the same standard, and that there wasn’t much variation in the difficulty of the questions themselves. It would also lower the requirements for our interviewers - they would only have to thoroughly learn the official questions, and nothing more. However, there were two overwhelming negatives to having an approved question list:

  • By necessity, such a list would be relatively short, and that would create a vulnerability to interview prep sites. While we think that our questions are at least reasonably resistant to memorized solutions, a candidate who knows they will only be asked one of five questions can significantly cut down on their preparation. That focus means that it becomes much more difficult to separate the candidate’s abilities from their preparation. While we could come up with new questions, the economics of us vs. the internet are not in our favor.
  • One of the things that we were trying to create was the sense that individual interviewers were responsible for the interview system. By defining what a ‘good’ question looks like, but allowing people to come up with their own questions that fit those guidelines, interviewers feel like it’s still their interview rather than a script.

Of course, that doesn’t mean that there isn’t a question library - but it’s a resource, rather than a mandate. It also doesn’t mean that we don’t have standards for what a good question is - we absolutely want to make sure that our interviewers are asking questions that are fair and are evaluating the right areas, but that’s a subject for another blog post.

Whiteboard vs. computer-based coding

This is likely the most controversial of the decisions here: at WePay, all of our coding questions are designed to be answered at whiteboard quality. That’s not to say that we would deny candidates the use of tools: during phone interviews we have the ability to execute code, and we certainly wouldn’t prevent a candidate from using a laptop if they wanted to. But we only ever expect “whiteboard-quality” code, meaning that the code should be broadly correct but can have syntax or other minor errors as long as they don’t demonstrate difficulties with the underlying language or principles. There were two important factors that led us to build our interview experience around whiteboard coding: flexibility and focus.

  • Flexibility. A candidate can code any problem they are asked, in any language they want, on a whiteboard. We can interview people in Java, Python, C, Swift, Javascript, PHP, whatever. Providing and maintaining an environment for each of our questions in each of those languages is an engineering investment that we don’t think is worth the benefit that it offers.
  • Focus. On a whiteboard, it’s easy to hand-wave things that would be needed but aren’t relevant to the question at hand - a simple implementation of Java’s Comparator interface, or code to read and write to a file. But on a laptop, not only are those things required for the code to work (eating valuable interview time), but there is also a tendency to focus on errors or issues within those portions rather than on the meat of the problem. The sketch after this list makes that distinction concrete.
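
To illustrate the distinction, here is a minimal, hypothetical sketch (not drawn from an actual WePay question) of a “sort people by age” task. On a whiteboard, a candidate could write the sort call and wave the comparator away; on a laptop, the Comparator boilerplate below has to be spelled out before anything compiles or runs.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class WhiteboardExample {

    // Hypothetical record used only for illustration.
    static class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Ada", 36),
                new Person("Grace", 45),
                new Person("Alan", 41));

        // On a whiteboard, "sort by age" is enough; on a laptop, the full
        // Comparator implementation below is required before the code runs.
        people.sort(new Comparator<Person>() {
            @Override
            public int compare(Person a, Person b) {
                return Integer.compare(a.age, b.age);
            }
        });

        people.forEach(p -> System.out.println(p.name + " " + p.age));
    }
}
```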

So, those are some more of the decisions that we made when shaping WePay’s interview process, and why we made them the way that we did. Each of these decisions was made as part of a deliberate effort to create a scalable, consistent, efficient process to interview and select good engineers. Being able to build and refine that process is what is going to allow us to grow our engineering team while maintaining the same high standards we have now. As we scale this out, we’ll also be looking at how we can apply this system and its principles beyond our engineering team - I’m sure there’s another blog post in there as we see what translates well and what doesn’t.