How Guerrilla User Testing Uncovered an a/b Test Bias

An example of how low-effort user experience research can yield high value.

Nov 22, 2024

Often we learn more from a/b tests that fail. Back in 2019 at Quizlet, one such test gave me the opportunity to try guerrilla user testing: asking people on the street for quick feedback. I paired with team member Stephanie DeFranzo and we took to the streets to try to understand why the test was failing.

Now, five years later, Stephanie and I caught up to reflect on that experience and guerrilla user testing in general.

What is guerrilla user testing?

Guerrilla user testing is brief, informal user testing with random individuals to gather opinions, feedback or reactions. It’s a low-cost, quick way to uncover usability issues for small flows or single pages.

What isn't it?

It’s not meant to provide in-depth insights or replace formal user testing, and it’s definitely not a true scientific survey.

Why did we use guerrilla user testing?

We had a failing a/b test - a redesign of Quizlet’s iOS upgrade screen - and limited time and resources to figure out why. We needed to fix the design to meet our goal of growing Quizlet’s subscriptions.

At first, the bias of our PM (Alex), designer, and iOS developer had convinced the team that the problem was technical rather than a design issue - we believed there was a bug in the implementation. How could this beautiful, animated grid of cute icon features fail? The team cut the data every which way, exploring how the test performed across user groups, iOS devices, iOS versions, geographies, etc. No matter how we sliced the data, the new design’s performance was down.
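For readers who want to try this kind of slicing themselves, here is a minimal sketch in Python. The record format, variant names, segment labels, and numbers are all made up for illustration; this is not Quizlet's data or tooling, just one way to compare conversion per segment.

```python
from collections import defaultdict

# Hypothetical exposure records: (variant, segment, upgraded?).
# Every value below is invented for illustration only.
records = [
    ("control", "iOS 12", True), ("control", "iOS 12", True),
    ("control", "iOS 12", False), ("control", "iOS 12", False),
    ("new_design", "iOS 12", True), ("new_design", "iOS 12", False),
    ("new_design", "iOS 12", False), ("new_design", "iOS 12", False),
    ("control", "iOS 13", True), ("control", "iOS 13", False),
    ("new_design", "iOS 13", False), ("new_design", "iOS 13", False),
]


def conversion_by_segment(records):
    """Return {segment: {variant: conversion rate}}."""
    # counts[segment][variant] holds [upgrades, exposures]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for variant, segment, upgraded in records:
        cell = counts[segment][variant]
        cell[0] += int(upgraded)
        cell[1] += 1
    return {
        segment: {v: ups / seen for v, (ups, seen) in variants.items()}
        for segment, variants in counts.items()
    }


def segments_where_new_design_loses(rates, control="control", treatment="new_design"):
    """List segments where the treatment converts worse than control."""
    return sorted(
        segment
        for segment, r in rates.items()
        if control in r and treatment in r and r[treatment] < r[control]
    )


rates = conversion_by_segment(records)
losing = segments_where_new_design_loses(rates)
```

If every segment shows up in the losing list, as it did for us, the problem is probably not confined to one device or OS version.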

With the test quantitatively down, we could only guess at the qualitative 'why'. We needed to gather user feedback quickly without the support of a UX Research team (which Quizlet didn’t have at that time) so we opted for guerrilla user testing - ultimately showing how low-effort UXR can have high-yield value.

How did we go about this?

We grabbed two test devices, loaded one with the new upgrade screen and the other with the old design, and headed outside for an IRL A/B test. Our plan was to present people with both upgrade screens and ask for their reactions (not telling them which screen was new). Our goal was to search for a trend in user behavior or verbal feedback that might help us understand why our new design was failing.

What were the upsides to this approach?

Guerrilla user testing is quick, affordable, and doesn't require a full research team. It offers a unique opportunity for team members who don’t regularly engage with users to observe user interactions firsthand, fostering empathy. The informal setting can also help users feel more relaxed than in a clinical UXR setting, which might lead to more honest responses.

What were the drawbacks to this approach?

Our location in San Francisco's tech hub meant the people we could approach didn’t match Quizlet's core demographic (high school, college, and grad students). However, since the experience we were testing (an upgrade screen) wasn't specific to students, guerrilla testing could work. The biggest drawback of guerrilla testing is the inherent awkwardness of approaching strangers and asking for their time. Many people will decline.

What did you learn?

While we ultimately solved the problem with the a/b test (more on that below), our biggest learning was about the bias our product-engineering-design team had toward its own design. Until the people we interviewed made the flaw obvious, the team had a difficult time believing the design was hard to use. Users weren’t consulted during the development of this design, and the experience taught us to user-test our designs before committing any code (essentially testing them cheaply before a/b testing them in production). Stephanie summed up these learnings well:

"As a product specialist at that time, I was often trying to convey synthesized feedback, some of it at odds with proposed designs, and it was very difficult to get that inconvenient feedback incorporated. Part of the problem was my own biased dataset (people writing into Quizlet with problems), but the bigger challenge was one that's persisted throughout my career in UXR and design: humans are much more likely to believe what they themselves can observe over what their colleagues tell them. When product teams have the opportunity to experience user pain firsthand - through compelling quotes, videos, and IRL conversations with users - they're more likely to believe feedback that’s at odds with their assumptions."

As for the a/b test, over the course of ~90 minutes we were able to talk with 5-10 people and discovered a trend in their feedback. The beautiful graphical grid design of the new screen was confusing to people - they found it much easier to scan vertically down the list of features on the old screen (see comparison below). A big part of this learning came through observing the testers in person. As they used the new screen, we’d see their gaze move horizontally and vertically as they tried to understand the subscription features, whereas they could take in the old screen at a single glance.

Tips for guerrilla user testing:

Develop your elevator pitch. Have a very discrete ask for the person, and keep the interaction under 5 minutes. Remember that you're interrupting their day, and the clearer your ask, the more likely you are to get quality answers.
Location, location, location. Keep in mind that geography biases your sample, so make sure that what you are trying to understand won't be skewed by that geographic bias. Location will also help you recruit testers - we had the most success approaching people waiting in line at a food truck, since they were stuck there anyway.
Have a visible incentive in your hand to soften the ask. We brought Quizlet t-shirts and some snacks from the office pantry.
Distance yourself from the thing you are testing so that people don’t bias their answers out of social fear of making you feel a certain way. When Stephanie and I interviewed people we prefaced our ask by saying that this was a screen built by our colleagues who had asked us to get feedback (all true). We assured people that they wouldn’t offend us with their answers and we urged them to be direct.
Document your findings. Since guerrilla testing is so informal, it probably won’t get a full research report, but it’s still important to write up what you learned and share it with your team to bring everyone on board.

A guest post by

Stephanie DeFranzo

Product designer & researcher, worked @ Google, Quizlet, startups

Shipping on Fridays
