Using YARP as a split testing tool

This post assumes you have some knowledge of, or experience with, YARP.

There are many ways, and many tools, for testing changes to a website. These range from per-page frontend tooling (think Optimizely, or the sunsetting Google Optimize) to hard-coding experiments in your backend pages using something like Scientist. Tests are classically run at the page/URL level, or perhaps against a component on the page.

What if you have developed an entirely new website? What are your options for launching? Do you go big bang and hope and pray your user studies are correct and it won't entirely tank your engagement or conversion rate? What happens if Google sees those pages and you take a negative hit in the SERPs?

An approach to this challenge could be to release the new version of your website to a specific audience, or to a specific percentage of that audience. But doing that at the website level generally means you need to rethink the front door, and this is where a proxy is a good fit.

Beyond the classical benefits of defining security boundaries, or amalgamating various backend services into a single point of presence, a proxy can also split traffic between distinct versions of your website.

If you're in the .NET space like me, you have a couple of options - perhaps IIS' Application Request Routing (ARR) feature paired with some server farms, or there is another candidate in the mix, and that is Yet Another Reverse Proxy (YARP).

The Test Scenario

A customer has a legacy website that has reached the limits of its technology, and has invested in a new website, perhaps on a completely different technology stack. In my example, a customer with a legacy .NET website built on a custom CMS has invested in a new Gatsby-powered, statically built website.

The new website is completely different: the HTML, the content, the meta tags, the usability, etc. have all been changed and (hopefully) refined. Naturally, the customer has concerns about doing a big-bang changeover.

YARP Configuration

So, before we begin, we need to define two different clusters representing our two websites (note - in our tests, our legacy website is known as baseline and the new website is known as candidate):

{
  "ReverseProxy": {
    "Clusters": {
      "baseline": {
        "Destinations": {
          "baseline-http": {
            "Address": "https://<baseline-url>"
          }
        }
      },
      "candidate": {
        "Destinations": {
          "candidate-http": {
            "Address": "https://<candidate-url>"
          }
        }
      }
    }
  }
}

Cluster configuration for YARP

Now we've defined our two clusters, we need to handle routing. Because this is a complete website split test, we need just a single catch-all route.

{
  "ReverseProxy": {
    "Routes": {
      "root": {
        "ClusterId": "baseline",
        "Match": {
          "Path": "{**catch-all}"
        }
      }
    }
  }
}

Route configuration for YARP

I believe it would be possible, with some tweaking, to handle this at the route level, giving you more granularity for running tests against different sections of the website too.
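As a rough sketch of that idea, and assuming a hypothetical /blog section that you wanted to test on its own, you could add a second, more specific route alongside the catch-all. The blog route name and its path are made up for illustration, and the test code shown later in this post would need to match on that route's ID instead of (or as well as) root.

```json
{
  "ReverseProxy": {
    "Routes": {
      "blog": {
        "ClusterId": "baseline",
        "Match": {
          "Path": "/blog/{**catch-all}"
        }
      },
      "root": {
        "ClusterId": "baseline",
        "Match": {
          "Path": "{**catch-all}"
        }
      }
    }
  }
}
```

ASP.NET Core routing should favour the more specific /blog template over the catch-all. YARP routes also accept an Order value if you need to make the precedence explicit.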

Test Configuration

OK, now we're going to jump into code and start defining some test configuration classes. These will utilize the Options library so we can hydrate them from any configuration source.

public class TestOptions
{
  public string CookieName { get; set; }
  public string CookieDomain { get; set; }
  public TestBaselineOptions Baseline { get; set; }
  public TestCandidateOptions Candidate { get; set; }
}

public class TestBaselineOptions
{
  public string ClusterId { get; set; }
  public string CookieValue { get; set; }
}

public class TestCandidateOptions : TestBaselineOptions
{
  public double Weighting { get; set; }
  public bool AllowBots { get; set; }
  public bool AllowChannels { get; set; }
}

Test options for controlling the test

These options allow us to configure our test scenario:

{
  "Test": {
    "CookieName": "website_ver",
    "CookieDomain": "www.mydomain.com",
    "Baseline": {
      "ClusterId": "baseline",
      "CookieValue": "v1"
    },
    "Candidate": {
      "ClusterId": "candidate",
      "CookieValue": "v2",
      "Weighting": 0.1,
      "AllowBots": false,
      "AllowChannels": false
    }
  }
}

Initial test configuration

Let's talk through these settings so we get familiar with what they all do:

Test:CookieName - this represents the name of a cookie used to control the traffic. Having a cookie means we do not need to remember state, we can simply forward the request to the right cluster. This also provides the ability to force a specific version of a test, which is great for debugging.
Test:CookieDomain - The domain property of our generated cookie. We specify this to limit the test to the right host. We don't want our test cookie potentially bleeding out into other subdomains.
Test:Baseline:ClusterId - The ID of the cluster representing our baseline - the legacy website.
Test:Baseline:CookieValue - We use v1 to represent the baseline for the cookie value, and also the query string parameter value (more on that later).
Test:Candidate:ClusterId - Same as Test:Baseline:ClusterId but for the candidate.
Test:Candidate:CookieValue - Same as Test:Baseline:CookieValue, but for the candidate, so v2.
Test:Candidate:Weighting - This is how we bucket between the two versions, expressed as a double between 0.0 and 1.0. In our example, our initial test is for 0.1, or 10% of our traffic.
Test:Candidate:AllowBots - Whether search engine bots (like Googlebot) may be sent to the candidate. We'll restrict them to the baseline for now.
Test:Candidate:AllowChannels - Whether paid and social channels may be sent to the candidate. We'll restrict them to the baseline for now too.

Implementing split testing with YARP

Now we've defined everything, let's get going with YARP. The first thing you'll need to do is add a package reference to your project:

<PackageReference Include="Yarp.ReverseProxy" Version="2.0.0" />

And then let's create our ASP.NET application:

using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.Options;
using Yarp.ReverseProxy;

var builder = WebApplication.CreateBuilder();

// Add our configuration
builder.Configuration.AddJsonFile("appsettings.json");

// Register our services
builder.Services
  .Configure<TestOptions>(builder.Configuration.GetSection("Test"))
  .AddReverseProxy()
  .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));
  
// Build our app
var app = builder.Build();

// Map our proxy
app.MapReverseProxy(b =>
{
  // Test code here
});

// Run our app
app.Run();

Minimal code for running our YARP proxy

Running this alone will just route all traffic to our baseline, because the default cluster for the {**catch-all} route is baseline.

We now need to implement some test code that redirects traffic. Let's define some utility methods first:

// Returns true when the request carries the test cookie with one of the accepted values
bool HasTestCookie(
  HttpRequest request, string cookieName,
  string[] acceptedValues, out string cookieValue)
{
  if (request.Cookies.ContainsKey(cookieName))
  {
    cookieValue = request.Cookies[cookieName];
    for (int i = 0; i < acceptedValues.Length; i++)
    {
      if (string.Equals(acceptedValues[i], cookieValue, StringComparison.OrdinalIgnoreCase))
      {
        return true;
      }
    }
  }

  cookieValue = null;
  return false;
}

// Returns true when the query string carries the parameter with one of the accepted values
bool HasTestQueryParameter(
  HttpRequest request, string parameterName,
  string[] acceptedValues, out string parameterValue)
{
  if (request.Query[parameterName].Count > 0)
  {
    parameterValue = request.Query[parameterName][0];
    for (int i = 0; i < acceptedValues.Length; i++)
    {
      if (string.Equals(acceptedValues[i], parameterValue, StringComparison.OrdinalIgnoreCase))
      {
        return true;
      }
    }
  }

  parameterValue = null;
  return false;
}

// Paid search clicks usually carry a click ID or utm_medium=cpc
bool IsPaidSearchReferral(HttpRequest request)
  => request.Query.ContainsKey("gclid")
  || request.Query.ContainsKey("msclkid")
  || string.Equals(request.Query["utm_medium"], "cpc", StringComparison.OrdinalIgnoreCase);

bool IsSearchEngineBot(HttpRequest request)
{
  // Headers.UserAgent is a StringValues, so flatten it to a string before matching
  var userAgent = request.Headers.UserAgent.ToString();

  return userAgent.Contains("googlebot", StringComparison.OrdinalIgnoreCase)
    || userAgent.Contains("bingbot", StringComparison.OrdinalIgnoreCase)
    || userAgent.Contains("slurp", StringComparison.OrdinalIgnoreCase)
    || userAgent.Contains("duckduckbot", StringComparison.OrdinalIgnoreCase)
    || userAgent.Contains("facebot", StringComparison.OrdinalIgnoreCase)
    || userAgent.Contains("ia_archiver", StringComparison.OrdinalIgnoreCase);
}

Helper methods for running the test

These helpers are pretty naïve, particularly around matching bots and channels, but as a starting point it will serve us just fine.
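If you want the bot check to be a little easier to maintain, one option is to drive it from a list of tokens rather than a chain of conditions. The sketch below is only an illustration: LooksLikeSearchEngineBot and botTokens are hypothetical names, it works on a plain user-agent string rather than an HttpRequest, and substring matching like this is still easy to spoof.

```csharp
using System;
using System.Linq;

// Same tokens as the helper above; in a real proxy these could come from configuration
string[] botTokens =
{
  "googlebot", "bingbot", "slurp", "duckduckbot", "facebot", "ia_archiver"
};

// Hypothetical helper: true when the user agent contains any known bot token
bool LooksLikeSearchEngineBot(string userAgent)
  => !string.IsNullOrEmpty(userAgent)
  && botTokens.Any(token => userAgent.Contains(token, StringComparison.OrdinalIgnoreCase));

// A crawler user agent, followed by an ordinary browser user agent
Console.WriteLine(LooksLikeSearchEngineBot(
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"));
Console.WriteLine(LooksLikeSearchEngineBot(
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"));
```

Inside the proxy you would pass in request.Headers.UserAgent.ToString(), and adding a new crawler becomes a one-line change to the list.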

OK, let's build our test proxy! We're going to be taking advantage of some YARP bits that allow us to dynamically switch the cluster as part of our test.

app.MapReverseProxy(b =>
{
  var comparison = StringComparison.OrdinalIgnoreCase;
  var options = b.ApplicationServices
    .GetRequiredService<IOptions<TestOptions>>().Value;
    
  b.Use(async (context, next) =>
  {
    var route = context.GetRouteModel();
    
    if (string.Equals("root", route.Config.RouteId, comparison))
    {
      // We only want to run our test when we 
      // match the "root" route. This helps if you need 
      // to specify other routes which should ignore the test
      
      // 1. We grab the cluster state for each of the versions
      var lookup = context.RequestServices
        .GetRequiredService<IProxyStateLookup>();
      lookup.TryGetCluster(
        options.Baseline.ClusterId, out var baseline);
      lookup.TryGetCluster(
        options.Candidate.ClusterId, out var candidate);
      
      // 2. Let's grab the cookie and query parameter 
      //    value if it is present.
      bool hasCookie = HasTestCookie(
        context.Request, options.CookieName,
        new[] { 
          options.Baseline.CookieValue, 
          options.Candidate.CookieValue }, 
        out var cookieValue);
        
      bool hasQueryParameter = HasTestQueryParameter(
        context.Request, options.CookieName,
        new[] { 
          options.Baseline.CookieValue, 
          options.Candidate.CookieValue }, 
        out var parameterValue);
        
      bool useCandidate = false;
      bool setCookie = true;
        
      // 3. If there is a specific query parameter, 
      //    e.g. ?website_ver=v2, that overrides the test
      if (hasQueryParameter)
      {
        useCandidate = string.Equals(
          parameterValue, options.Candidate.CookieValue, comparison);
      }
      
      // 4. If the visitor already has the cookie, 
      //    use that to ensure they get the same version 
      //    for subsequent requests
      else if (hasCookie)
      {
        useCandidate = string.Equals(
          cookieValue, options.Candidate.CookieValue, comparison);
      }
      
      // 5. Filter out non-organic channels (e.g. Paid and social)
      else if (!options.Candidate.AllowChannels 
        && IsPaidSearchReferral(context.Request))
      {
        useCandidate = false;
        setCookie = false;
      }
      
      // 6. Filter out bots
      else if (!options.Candidate.AllowBots 
        && IsSearchEngineBot(context.Request))
      {
        useCandidate = false;
        setCookie = false;
      }
      
      // 7. Split traffic into buckets
      else
      {
        var weighting = options.Candidate.Weighting;
        useCandidate = (weighting == 0D
          ? false
          : (weighting == 1D)
            ? true
            : Random.Shared.NextDouble() <= weighting);
      }
      
      // 8. Set our test cookie if we need to (mostly yes)
      if (setCookie)
      {
        context.Response.Cookies.Append(
          options.CookieName,
          useCandidate 
            ? options.Candidate.CookieValue 
            : options.Baseline.CookieValue,
          new CookieOptions
          {
            Domain = options.CookieDomain,
            HttpOnly = false,
            SameSite = SameSiteMode.None,
            Secure = true,
            Expires = DateTimeOffset.Now.AddYears(1)
          });
      }
      
      // 9. Lastly, reassign the cluster!
      context.ReassignProxyRequest(
        useCandidate ? candidate : baseline);
    }

    // Always continue the pipeline so that YARP actually forwards the request
    await next();
  });
});

Test-specific code

There is a lot to unpack here, so let's break it down:

We grab our defined options from the ApplicationServices scope. These are singletons, so it is fine to hold onto them. We're not dynamically reloading these settings in this example.
We use the IProxyStateLookup service to get the ClusterState items we need for later reassignment.
We grab both the cookie and parameter values representing our test cases.
We first check if we have a matching query parameter. E.g. https://www.mydomain.com?website_ver=v2. This will allow us to force an override of the specific version if we need to.
We then check if we have a matching cookie. This ensures repeat visitors (and subsequent requests for other types of resources, e.g. CSS and JS) will get the same version.
If applicable, we filter out audiences like paid and social, and also bots like Googlebot, when we want to exclude them from the test.
If we do not have an existing cookie value or parameter override, we need to split the traffic. We specifically check if the weighting is 0%, which will always result in the baseline, or 100%, which will always result in the candidate. For anything in between, we use Random.Shared.NextDouble() to return a random value representing our placement in the test.
We then set a cookie if we need to. We skip this for excluded channels (and bots) because, if you later allowed them into the test, a visitor already holding the baseline cookie would never see the new version. We make sure the cookie is HttpOnly = false so that it can be read on the page and reported through Google Analytics for segmenting and analysis.
Lastly, we use context.ReassignProxyRequest to tell YARP to use whichever cluster we need for testing.
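To get a feel for the bucketing step, here is a small standalone sketch that pulls the decision out into a plain function and simulates a batch of first-time visitors. ChooseCandidate, the fixed seed and the visitor count are all assumptions for illustration; the proxy itself makes this decision inline with Random.Shared.

```csharp
using System;

// Hypothetical helper mirroring the bucketing step: `sample` is expected to be
// in the range [0, 1), which is what Random.NextDouble() returns
bool ChooseCandidate(double weighting, double sample)
{
  if (weighting <= 0d) return false; // 0% always stays on the baseline
  if (weighting >= 1d) return true;  // 100% always goes to the candidate
  return sample < weighting;
}

const double candidateWeighting = 0.1; // 10%, as in the example configuration
const int visitors = 100_000;          // arbitrary sample size for the simulation
var random = new Random(12345);        // fixed seed so the run is repeatable

int candidateCount = 0;
for (int i = 0; i < visitors; i++)
{
  if (ChooseCandidate(candidateWeighting, random.NextDouble()))
  {
    candidateCount++;
  }
}

double share = (double)candidateCount / visitors;
Console.WriteLine($"Candidate share: {share:P1}");
```

The printed share should land close to 10%. Remember this only models visitors with no cookie, no override and no channel or bot exclusion, so the real split you see in analytics will differ a little. The sketch uses a strict < comparison, which only differs from the <= in the proxy code when the sample exactly equals the weighting.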

Over time, as you gain confidence in your test candidate, you can incrementally increase its share of traffic by adjusting the weighting.