Learn Span<T> by Implementing a High-Performance CSV Parser

Ever since I first heard about Span<T>, I’ve been wanting to play around with it. It’s a ref struct, so the semantics and restrictions that go along with this type are best understood by actually trying to use it. So I decided to build a simple CSV parser and see how its performance compared to other mainstream CSV parsers.

You can find the full code on GitHub and reference the package from NuGet.

As the main goal of this project was to exercise Span<T> and, as we’ll see later, Memory<T>, for the initial version I decided to ignore some of the details of CSV parsing, such as handling quotes, trimming values, and handling comments.

Initial Implementation

For my initial implementation of a parser, I simply created a class which took a ReadOnlyMemory<char> as the content and two ReadOnlySpan<char>s as the value and record delimiters (for example, a comma and a newline, respectively, for a CSV). Then in the constructor I would walk the entire content, saving the index and length of each value in each record as a List<List<long>>. Finally, there are members on the class for getting the number of records, the number of values in a particular record, and a particular value. Already you may have some questions, so let’s dig into the details.

Firstly, I needed to use a ReadOnlyMemory<char> to store the content, as ref structs like Span<T> are stack-only and so can only be used in method parameters and local variables, while Memory<T> is a “regular” struct which can be stored on the heap. ReadOnlyMemory<T> can be sliced just as Span<T> can, without allocating or copying buffers around.
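Just to illustrate the mechanics (a simplified sketch, not the library’s actual code), a class can hold the ReadOnlyMemory<char> as a field and hand out spans over slices of it on demand:

public class CsvContent
{
    // Memory<T> is heap-friendly, so it can live in a field; Span<T> could not.
    private readonly ReadOnlyMemory<char> content;

    public CsvContent(ReadOnlyMemory<char> content) => this.content = content;

    public ReadOnlySpan<char> GetValue(int index, int length)
    {
        // Slicing is just a view over the same underlying buffer: no allocation, no copy.
        return this.content.Span.Slice(index, length);
    }
}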

I wanted the parser to allow random access into the values of the CSV, so I needed to store the index where each value started in the content, as well as the length. To save on memory, I decided to store this data in a List<List<long>>, where the outer list represents the records, the inner list represents the values within a record, and the long (Int64) stored for each value is constructed by fusing the index and length, both ints (Int32), using bit shifting.
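To make the bit fusing concrete, here’s a minimal sketch of packing and unpacking (hypothetical helper names, not necessarily the library’s actual code):

private static long Pack(int index, int length)
{
    // Index in the high 32 bits, length in the low 32 bits.
    // The (uint) cast prevents sign extension from clobbering the high bits.
    return ((long)index << 32) | (uint)length;
}

private static (int Index, int Length) Unpack(long packed)
{
    return ((int)(packed >> 32), (int)packed);
}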

After running some benchmarks, I found that this implementation was already better than other libraries, both in terms of speed and allocations.

Benchmark results after iteration 1:

| Method             | Data Size | Mean           | Scaled | Allocated    |
|--------------------|-----------|----------------|--------|--------------|
| My Parser          | Large     | 1,221.2610 us  | 0.59   | 16111.39 KB  |
| StringSplit        | Large     | 93,149.630 us  | 1.00   | 85937.55 KB  |
| TinyCsvParser      | Large     | 169,713.472 us | 1.82   | 25072.83 KB  |
| FastCsvParser      | Large     | 63,043.752 us  | 0.68   | 47227.2 KB   |
| CsvTextFieldParser | Large     | 109,165.716 us | 1.17   | 182031.39 KB |
| CsvHelper          | Large     | 206,209.245 us | 2.21   | 173448.95 KB |
| FileHelpers        | Large     | 278,066.185 us | 2.99   | 83321.9 KB   |
| My Parser          | Medium    | 307.973 us     | 0.76   | 157.07 KB    |
| StringSplit        | Medium    | 404.962 us     | 1.00   | 859.36 KB    |
| TinyCsvParser      | Medium    | 2,406.140 us   | 5.94   | 322.31 KB    |
| FastCsvParser      | Medium    | 717.625 us     | 1.77   | 796.22 KB    |
| CsvTextFieldParser | Medium    | 1,066.085 us   | 2.63   | 1820.45 KB   |
| CsvHelper          | Medium    | 2,050.169 us   | 5.06   | 1747.09 KB   |
| FileHelpers        | Medium    | 2,069.213 us   | 5.11   | 851.4 KB     |
| My Parser          | Small     | 3.269 us       | 0.84   | 1.96 KB      |
| StringSplit        | Small     | 3.890 us       | 1.00   | 8.58 KB      |
| TinyCsvParser      | Small     | 1,059.621 us   | 272.45 | 74.74 KB     |
| FastCsvParser      | Small     | 22.811 us      | 5.87   | 203.89 KB    |
| CsvTextFieldParser | Small     | 10.769 us      | 2.77   | 18.34 KB     |
| CsvHelper          | Small     | 22.715 us      | 5.84   | 28.7 KB      |
| FileHelpers        | Small     | 687.064 us     | 176.66 | 31.07 KB     |

Splitting Out a No-allocation Reader

Next I decided to extract the reader from the parser, so that scenarios which didn’t need random access to the data didn’t have to pay for the allocations of the index data. Basically, the reader advances through the memory as the consumer reads it, much like an IEnumerator<T> or a TextReader. Additionally, the parser was simplified: it could use the reader to enumerate the values and simply worry about building up the index and length data for random access.
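To give an idea of the shape of such an API, a consumer’s read loop might look something like this (the member names are illustrative, not necessarily the library’s actual API):

var reader = new CsvReader(content); // content is a ReadOnlyMemory<char>
while (reader.MoveNextRecord())
{
    while (reader.MoveNextValue())
    {
        // A view into the original buffer; no string is allocated.
        ReadOnlySpan<char> value = reader.CurrentValue.Span;
        Process(value);
    }
}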

After categorizing the benchmarks into readers vs parsers, the results showed that the reader implementation was far and away better than other libraries which expose reader-like APIs. The closest contender was FastCsvParser on the large data set, which only has a couple hundred downloads on NuGet.org as of this writing and still took 3.5x longer to read the data. My reader requires zero allocations, as it simply stores and advances a span along with a few other scalar fields.

Benchmark results after iteration 2:

| Method             | Data Size | Mean           | Scaled | Allocated   |
|--------------------|-----------|----------------|--------|-------------|
| My Reader          | Large     | 16,349.969 us  | 1.00   | 0 B         |
| FastCsvParser      | Large     | 57,649.310 us  | 3.53   | 48360656 B  |
| CsvTextFieldParser | Large     | 103,398.356 us | 6.32   | 186400144 B |
| CsvHelper          | Large     | 194,618.607 us | 11.90  | 177611728 B |
| My Parser          | Large     | 53,862.981 us  | 1.00   | 16498940 B  |
| StringSplit        | Large     | 87,535.705 us  | 1.63   | 88000046 B  |
| TinyCsvParser      | Large     | 157,429.493 us | 2.92   | 25674400 B  |
| FileHelpers        | Large     | 259,745.145 us | 4.82   | 85322480 B  |
| My Reader          | Medium    | 161.729 us     | 1.00   | 0 B         |
| FastCsvParser      | Medium    | 640.133 us     | 3.96   | 815328 B    |
| CsvTextFieldParser | Medium    | 1,024.653 us   | 6.34   | 1864144 B   |
| CsvHelper          | Medium    | 1,953.924 us   | 12.08  | 1789016 B   |
| My Parser          | Medium    | 321.522 us     | 1.00   | 160840 B    |
| StringSplit        | Medium    | 382.312 us     | 1.19   | 879984 B    |
| TinyCsvParser      | Medium    | 2,249.190 us   | 7.00   | 330045 B    |
| FileHelpers        | Medium    | 1,901.365 us   | 5.91   | 871837 B    |
| My Reader          | Small     | 1.647 us       | 1.00   | 0 B         |
| FastCsvParser      | Small     | 22.443 us      | 13.62  | 208784 B    |
| CsvTextFieldParser | Small     | 10.202 us      | 6.19   | 18784 B     |
| CsvHelper          | Small     | 21.441 us      | 13.02  | 29392 B     |
| My Parser          | Small     | 3.385 us       | 1.00   | 2008 B      |
| StringSplit        | Small     | 3.703 us       | 1.09   | 8784 B      |
| TinyCsvParser      | Small     | 1,018.881 us   | 301.01 | 76532 B     |
| FileHelpers        | Small     | 664.814 us     | 196.41 | 31812 B     |

To ref struct or not to ref struct

I found that using the reader API was fairly difficult, and I was unable to write convenience subclasses (e.g. a CSV reader or TSV reader) because the reader was defined as a ref struct so that it could hold other ref structs like ReadOnlySpan<char>. Because of this, the reader cannot be stored as a field in a class, used in async APIs, or live on the heap in any other way. I decided that instead it could hold a ReadOnlyMemory<char> and become a class for easier usage.
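For example, the compiler flat-out rejects storing a ref struct in a class field:

public class MyReaderWrapper
{
    // error CS8345: Field or auto-implemented property cannot be of type
    // 'ReadOnlySpan<char>' unless it is an instance member of a ref struct.
    private ReadOnlySpan<char> remaining;
}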

Unfortunately, based on the benchmark results, this affected performance non-trivially, slowing the reader down by ~50%. The reader was still faster than the other implementations I tested against, but it was interesting to see just how optimized Span<T> and ref structs in general are.

Because of this, I may consider bringing back the ref struct implementation for usages which aren’t limited by it.

Benchmark results after iteration 3:

| Method             | Data Size | Mean           | Scaled | Allocated   |
|--------------------|-----------|----------------|--------|-------------|
| My Reader          | Large     | 24,929.372 us  | 1.00   | 0 B         |
| FastCsvParser      | Large     | 58,659.486 us  | 2.35   | 48360656 B  |
| CsvTextFieldParser | Large     | 105,256.008 us | 4.22   | 186400144 B |
| CsvHelper          | Large     | 196,541.906 us | 7.89   | 177611728 B |
| My Parser          | Large     | 59,266.792 us  | 1.00   | 16499337 B  |
| StringSplit        | Large     | 89,254.220 us  | 1.51   | 88000057 B  |
| TinyCsvParser      | Large     | 160,947.603 us | 2.72   | 25674926 B  |
| FileHelpers        | Large     | 265,223.622 us | 4.48   | 85322864 B  |
| My Reader          | Medium    | 244.256 us     | 1.00   | 96 B        |
| FastCsvParser      | Medium    | 686.830 us     | 2.81   | 815328 B    |
| CsvTextFieldParser | Medium    | 1,050.241 us   | 4.30   | 1864144 B   |
| CsvHelper          | Medium    | 1,957.324 us   | 8.01   | 1789016 B   |
| My Parser          | Medium    | 375.222 us     | 1.00   | 160936 B    |
| StringSplit        | Medium    | 394.238 us     | 1.05   | 879984 B    |
| TinyCsvParser      | Medium    | 2,311.955 us   | 6.16   | 330044 B    |
| FileHelpers        | Medium    | 2,014.732 us   | 5.37   | 871837 B    |
| My Reader          | Small     | 2.494 us       | 1.00   | 96 B        |
| FastCsvParser      | Small     | 22.896 us      | 9.18   | 208784 B    |
| CsvTextFieldParser | Small     | 10.506 us      | 4.21   | 18784 B     |
| CsvHelper          | Small     | 21.624 us      | 8.67   | 29392 B     |
| My Parser          | Small     | 3.985 us       | 1.00   | 2104 B      |
| StringSplit        | Small     | 3.825 us       | 0.96   | 8784 B      |
| TinyCsvParser      | Small     | 1,050.156 us   | 263.60 | 76532 B     |
| FileHelpers        | Small     | 680.903 us     | 170.91 | 31812 B     |

Next Steps

While analyzing these benchmarks, I was surprised to find that the simple and obvious string.Split implementation was better than most of the other libraries I tested against. One explanation could be that those libraries are optimized for usability rather than performance, for example mapping the records to a class. However, at least in some cases this can be explained by “streamability”. In my benchmarks, I loaded the entire content into memory as one big string prior to running each benchmark. I suspect that under real-world conditions, like parsing the content from the disk or network, some of these libraries would perform much better. In fact, my implementation suffers from the same problem, since it requires the entire buffer up front.

So the next step is to continue this exploration using System.IO.Pipelines. Stay tuned for a follow up post about that!

A Tale of Performance and Bad SQL Usage

The other day I was looking at telemetry for one of my websites to finally figure out why I was running into my telemetry cap, something I had been putting off for a while. I’m on the free Application Insights tier, so I only get 1 GB a month, and I was having to sample down to only 4% of traffic to stay under the cap, which didn’t seem right considering the traffic that site gets.

What I found is that the dependency telemetry was just being spammed with similar SQL calls:

And based on the behavior of the website, I knew that all of those calls listed were part of a single web request and SQL transaction. When I originally implemented this, it seems that I was lazy and just put a bunch of separate INSERT INTO statements in one transaction. Note that I manually craft the SQL queries in my app (I know, the horror!) because Entity Framework tends to have pretty bad performance despite being a pretty good programming interface, and stored procedures felt a bit awkward to use when the inputs are tables. Judgment of this practice may still be warranted, but it’s what I have right now.

In any case, I ended up combining the INSERT INTO statements for similar tables, which cut the number of SQL calls from 40 to 7 for this sort of request. The result was a drastic decrease in the number of SQL calls and thus telemetry entries:

This actually was enough of a cut for me to increase my sampling to about 25%. That’s still lower than I’d like, but it’s a great improvement. The majority of the remaining excess is the SQL calls made by Entity Framework since I’m using it for Identity and Authentication.

One very interesting and initially unexpected side effect of this change is that each of the individual SQL calls seems to take approximately the same amount of time as before, about 40 or so milliseconds per call (with some outliers where I need to use HOLDLOCK), so this change had the additional benefit of cutting the end-to-end latency of the request to about one third of what it previously was.

I’m considering taking this even further and composing one massive SQL query, including the transaction statements, to have just one single SQL call per request. The moral of the story, though, is to combine calls when it’s practical to do so. In my case, it was pretty straightforward and made good sense to combine a loop of INSERT INTO statements into a single statement, as INSERT INTO easily allows for that.
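As an illustrative sketch of the idea (the table, column, and variable names are made up), the loop of single-row inserts becomes one parameterized multi-row statement:

// Build one multi-row INSERT instead of executing N single-row ones.
var sql = new StringBuilder("INSERT INTO dbo.UploadValues (UploadId, Name, Value) VALUES ");
using (var command = new SqlCommand { Connection = connection, Transaction = transaction })
{
    command.Parameters.AddWithValue("@uploadId", uploadId);
    for (int i = 0; i < items.Count; i++)
    {
        if (i > 0)
        {
            sql.Append(", ");
        }

        sql.Append($"(@uploadId, @name{i}, @value{i})");
        command.Parameters.AddWithValue($"@name{i}", items[i].Name);
        command.Parameters.AddWithValue($"@value{i}", items[i].Value);
    }

    command.CommandText = sql.ToString();
    command.ExecuteNonQuery();
}

One caveat worth noting: SQL Server caps a single INSERT ... VALUES statement at 1,000 row value expressions, so very large batches still need chunking.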

Stored procedures are the obvious solution to this problem, and this is one of the reasons they’re highly recommended by pretty much anyone who knows anything about SQL, including some readers of this article who I’m sure are livid at my avoidance of them. But I personally find the awkwardness of use, combined with the fact that my business logic ends up being split between my app and my database, unattractive enough to pass them up. This investigation showed me, however, that when choosing that path one needs to be very careful when crafting SQL queries and stay cognizant of the number of individual calls being made to avoid performance pitfalls.

Trimming Unnecessary Dependencies from Projects

When you’re working in large repositories with hundreds of MsBuild projects, you’re bound to have fairly complex build graphs. Over time, these can devolve, and you may end up with lots of dependencies between projects which are no longer needed. This can cause builds to slow down as they become less parallelizable, and the developer experience can suffer as you unnecessarily rebuild libraries which falsely depend on libraries you changed.

Luckily, the Roslyn compiler is smart enough to only emit references which are actually used into an assembly’s metadata. So you can compare the references which are passed to the compiler against the references that actually make it into the compiled assembly.

You can figure out which references make it into the assembly using a tool like ILSpy or dotPeek, or even use the reflection method Assembly.GetReferencedAssemblies.
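For example, here’s a quick way to dump an assembly’s actual references via reflection (a minimal sketch; the path is illustrative):

// Print the references baked into the compiled assembly's metadata.
var assembly = System.Reflection.Assembly.LoadFrom(@"bin\Debug\MyLibrary.dll");
foreach (var reference in assembly.GetReferencedAssemblies())
{
    Console.WriteLine(reference.FullName);
}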

There are three different ways that references can be included in your project: Reference, ProjectReference, and PackageReference.

References and Project References

References and Project References are both fairly straightforward. Each reference gets resolved, usually by its HintPath if it isn’t a framework assembly, and passed to the compiler. Similarly for project references, MsBuild does an inner build to determine each project reference’s target assembly and passes that to the compiler for the outer build. For both of these, it’s pretty straightforward to just compare each reference passed to the compiler and check whether it made it into the assembly’s metadata.

Transitivity

But wait, what if an indirect dependency is required at run time, but not at compile time? This can happen if you have a project with a dependency which itself has a dependency which isn’t directly required by the original project. The indirect dependency may not be needed to build the project; however, if the project produces something runnable like an exe or a unit test assembly, certain code paths may require the indirect dependency to get loaded into the App Domain.

So for projects which produce something runnable, we have to consider all transitive references, not just the references the main assembly has.

Package References

Package references are where things get a little complicated. In the new Sdk-style projects, MsBuild and NuGet work together to form the entire dependency graph for the package references you specify, collect all the assemblies for all of those packages, and pass every single one of them to the compiler. With the packaging of the framework assemblies themselves (i.e. netstandard2.0 and netcoreapp2.0), this can get fairly huge. As an example, for a fairly simple web application I have, a whopping 366 /reference parameters are passed to csc.exe. As if things weren’t complicated enough, some packages like Microsoft.AspNetCore.All are really just meta-packages which themselves don’t contain any assemblies but instead just have a number of dependencies which do contain assemblies to be referenced. And then packages like Microsoft.Net.Compilers don’t add any references at all but instead provide additional build tooling by way of MsBuild props and targets.

So for package references, we have to account for any assembly in the package or any packages it depends on, and also for cases where they don’t provide references at all.

Introducing ReferenceTrimmer

If all of this seems overwhelming enough to make you not want to bother cleaning up your projects, fear not. I’ve created a little tool called ReferenceTrimmer which does all the work for you.

When you run it, it accounts for each of the things discussed above and prints out what it believes are unnecessary references, project references, and package references.

I’m sure it doesn’t cover all cases yet, and it likely reports false positives and false negatives, but it was even able to find some issues in some of my smaller projects. Contributions are always welcome if you find that it could do better!

Setting up a Roaming Developer Console

Have you ever wanted to have your personal scripts and aliases just always available to you in any console session? Well, it’s possible!

In my setup I’ll be using OneDrive, but the concept applies to any cloud storage that syncs to your disk like Google Drive or Dropbox.

Setting it up

The Command Prompt has a little-known feature called Autorun, which allows for running a command every time cmd.exe starts. This is configured via a registry key, but to make setup easy, you can write a script that sets it for you and keep that script in your cloud storage. You can take this a step further and have the Autorun script itself live in your cloud storage as well. This is the basis on which we’ll be creating the roaming dev console.

First, create a setup.cmd in your cloud storage. This script will need to be run once per machine you log into. In my example, I’m putting it under my OneDrive folder under Code\Scripts\setup.cmd.

@echo off

setlocal

REM The script to run each time cmd starts.
set initScript=%HOMEDRIVE%%HOMEPATH%\OneDrive\Code\Scripts\init.cmd

if not exist "%initScript%" (
    echo "%initScript%" does not exist!
    exit /B 1
)

REM Configure the current user's autorun
echo reg add "HKCU\Software\Microsoft\Command Processor" /v "Autorun" /d "\"%initScript%\"" /t REG_EXPAND_SZ /f
call reg add "HKCU\Software\Microsoft\Command Processor" /v "Autorun" /d "\"%initScript%\"" /t REG_EXPAND_SZ /f

echo Autorun configured!

REM init now to avoid the need to restart the console
echo Running "%initScript%" now
"%initScript%"

endlocal

This setup script simply adds the registry key which controls Command Prompt’s Autorun and points it to an init script under your cloud storage.

For those who prefer PowerShell, there is an equivalent concept to Autorun called Profiles. Instead of setting a registry key, add a PowerShell file at %UserProfile%\My Documents\WindowsPowerShell\profile.ps1. This will run at the start of any PowerShell session you open. To make the roaming part work, have this profile.ps1 file call an init.ps1 under your cloud storage.

You can actually take this even further and have your setup script also configure other settings (e.g. user-wide git config), install various applications (git, nodejs, etc.), and do anything else you may want to happen once per computer you use.

The init script

Now that the init script is configured to run in each Command Prompt or PowerShell window, what kinds of things should you add to it?

One warning: Autorun/Profile runs before every invocation of cmd or powershell, so it’s not recommended to do long-running work there or, perhaps more importantly, to emit any output. Any application which spawns child processes using cmd or powershell will incur the cost of your init script, and if it parses the output of the process it launched, it will see any output from your init script. So just keep it quiet, short, and sweet.

Here’s an example of what part of my init.cmd file looks like. In my example, I’m putting it under my OneDrive folder under Code\Scripts\init.cmd.

@echo off

SET PATH=%PATH%;%~dp0

DOSKEY ..=cd ..
DOSKEY n="%ProgramFiles(x86)%/Notepad++/notepad++.exe" $*

Again, for PowerShell users, you can just as easily put similar things in your init.ps1, like global functions.

While this doesn’t look like much, it sets up a few aliases for me and adds my Code\Scripts OneDrive folder to my PATH. I keep other convenience scripts and tools which don’t require an install under this path as well. To give an idea, here are some of the things I have in mine:

  • handle.exe – Helps identify which stubborn process has an open handle on a file.
  • kill.exe – Renamed from PsKill.exe. Easily kills processes.
  • nuget.cmd – Calls Nuget\nuget.exe, the NuGet CLI.
  • procexp.exe – A better task manager.
  • procmon.exe – Shows real-time file and registry accesses, process spawns, etc.

The beauty of this is that if you add a new tool or alias, your cloud storage will automatically sync it to your other machines, so it will just be available there!

Testing asynchronous code: async vs fake async

In the last post I explored implementing a mock which tested asynchronous code in a “fake” asynchronous way, and I promised to dive a little deeper into that concept and compare it with testing in a truly asynchronous way.

I say “fake” here because it’s still using async/await, but the way of testing is more of a step by step approach where the unit test ends up effectively waiting on each task or promise to finish before moving on.

Angular has really clear examples of each pattern, so let’s see the differences and compare the pros and cons for each.

Async

This is the pattern I think C# developers are more familiar with, although I’ve seen it used in TypeScript as well. It generally involves the three traditional sections: Arrange, Act, and Assert. Here’s an example of a test using this pattern:

it("should display some links when the user is not logged in", async(() => {
    // Arrange (some arranging was already done in beforeEach as well)
    let authenticationService = TestBed.get(AuthenticationService) as AuthenticationService;
    spyOn(authenticationService, "userInfo").and.returnValue(new BehaviorSubject(notLoggedInUser));

    // Act (detectChanges ends up triggering ngOnInit which actually does the work we're testing)
    fixture.detectChanges();
    fixture.whenStable().then(() => {
        fixture.detectChanges();

        // Assert
        let expectedLinks: { text: string, url: string }[] = [ /* Omitted for brevity */ ];
        let navItems = fixture.debugElement.queryAll(By.css(".nav-item"));
        expect(navItems).not.toBeNull();
        expect(navItems.length).toEqual(expectedLinks.length);

        for (let i = 0; i < navItems.length; i++) {
            let link = navItems[i].query(By.css(".nav-link"));
            expect(link).not.toBeNull();

            let expectations = expectedLinks[i];
            expect(link.nativeElement.innerText).toEqual(expectations.text);
            expect(link.attributes.href).toEqual(expectations.url);
        }

        expect(authenticationService.userInfo).toHaveBeenCalled();
    });
}));

I won’t go into detail about the test as it was pulled from a real example, but the pattern should look familiar. You start in the Arrange section mocking calls and setting up the test, then you trigger the code that’s actually being tested, then finally you assert your expectations on any results or side-effects.

A minor note for those unfamiliar with the use of async here: it’s equivalent to using the done construct and calling done after the last expectation. Or, for any C# developers, it’s equivalent to returning a Task from your test. The test framework just waits for all async work to finish before the test ends.
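For illustration, the C# equivalent is simply an async test method (a minimal sketch with made-up names):

[TestMethod]
public async Task ShowsLinksWhenNotLoggedIn()
{
    // The framework awaits the returned Task before declaring the test done.
    var links = await this.navBar.GetLinksAsync();
    Assert.IsTrue(links.Count > 0);
}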

Fake Async

This pattern is used pretty widely in Angular apps, especially when mocking http calls. It’s also how I implemented my Testing.HttpClient C# library. It differs from the usual “Arrange, Act, and Assert” structure and instead interleaves and combines all three. Here’s an example:

it("should return some data", fakeAsync(() => {
    // Act
    let response: IUser;
    let error: HttpErrorResponse;
    userService.getUser(userName)
        .then((r: IUser) => response = r)
        .catch((e: HttpErrorResponse) => error = e);

    // Arrange, with some implicit assertions
    let expectedResponse: IUser = { name: "someName" };
    let request = httpMock.expectOne({ method: "get", url: `/api/users/${userName}` });
    request.flush(expectedResponse);
    tick();

    // Assert
    expect(response).toEqual(expectedResponse, "should return the expected response");
    expect(error).toBeUndefined();
    expect(httpErrorHandlerService.logError).not.toHaveBeenCalled();
    expect(httpErrorHandlerService.getValidationErrors).not.toHaveBeenCalled();
}));

In this example, the call being tested is made immediately (although there is some setup in a beforeEach not shown). It returns a promise, but the promise is unresolved at that point. Next, there is an implicit assertion about what the request made by userService.getUser looked like, and a mock response is provided. Then the promises are flushed, which is why these are fake async tests. Finally, assertions are made.

Comparing the two

Ironically, the Async pattern more closely follows how you should test synchronous code. Both tend to follow the “Arrange, Act, and Assert” structure. This pattern sets up a bunch of mocks and rules beforehand to “wind up” the test, then lets the code under test run to completion, and then the expectations are checked. This pattern is good for testing how code responds to inputs and the results it produces.

The fake async pattern is a lot more granular, as you essentially interleave the test code with the code being tested. It’s like stepping through a debugger and providing mocks and asserting as you go, line by line. This pattern is better at testing code that creates side effects, or code that needs to run in a specific order. Anecdotally though, the granularity can come at a cost of brittleness. Refactoring can easily break tests even though the code remains functionally correct, so you can end up wasting time fixing tests.

One downside of the fake async pattern is that it tends to require extra code to get the async parts of the code flushed. In Angular tests, the tick function does this magic for you, as Angular is able to wrap all Promises and so tick can wait for their completion. In C#, this is fairly cumbersome, as there isn’t a way to provide a replacement for the default TaskFactory, so you end up having to expose TaskCompletionSource objects from all your mocks to provide the flushing functionality.
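For illustration, here’s a hedged sketch of what that looks like in C# (the service and type names are made up):

public class MockUserService : IUserService
{
    // The test controls exactly when the "async" work completes.
    public TaskCompletionSource<User> GetUserSource { get; } = new TaskCompletionSource<User>();

    public Task<User> GetUserAsync(string userName) => this.GetUserSource.Task;
}

// In the test:
var mock = new MockUserService();
var resultTask = viewModel.LoadProfileAsync();      // starts the work, not awaited yet
mock.GetUserSource.SetResult(new User("someName")); // the "flush", like tick() in Angular
var profile = await resultTask;                     // now completes immediately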

Additionally, given the right tools, any fake async test could be made into an async test. In Angular, instead of using the HttpTestingController, you could provide your own mock HttpClient object and use spies to both match specific request patterns and provide responses.

Verdict

When writing up these examples, I actually attempted to write the same test in each pattern, but one version would always feel a little awkward. There is definitely something to be said for using the right tool for the job, so in Angular tests, if you find yourself testing code that makes http calls or uses timers, feel free to use the fake async pattern. The magic is provided for you, so you might as well use it.

However, I also feel that usage of fake async is fairly niche. It doesn’t work well in all scenarios, and it’s not even a very common pattern in some languages like C#. The async pattern, on the other hand, works well in almost all cases, and so it’s the pattern I’d suggest any well-rounded software engineer master. It’s a commonly used, widely applicable, well-known, and well-understood pattern.

tl;dr if I was stuck on a desert island with only one unit testing pattern, I’d pick the async pattern over the fake async one.

HttpClient and Unit Testing

If you’ve written C# which uses HttpClient and tried to unit test it, you probably noticed that it’s not the easiest thing in the world to mock out. If this GitHub issue is any indication, many other developers don’t find it particularly easy either. It’s certainly not impossible, but it requires learning about some of the internals of HttpClient, such as HttpMessageHandler, as HttpClient is designed as just a wrapper around it.

Using Moq

Over time I’ve personally solved this problem again and again for different projects by implementing a mock HttpMessageHandler, either explicitly or using a framework like Moq to set up the method calls and return values. The problem with these solutions is that I found them either too loose in their ability to validate the code, or too complicated to get the level of strictness I wanted.

Here’s an example from the GitHub issue above of how to mock a request using Moq:

var requestUri = new Uri("http://google.com");
var expectedResponse = "Response text";

var mockResponse = new HttpResponseMessage(HttpStatusCode.OK) { Content = new StringContent(expectedResponse) };
// Mock the abstract HttpMessageHandler rather than a concrete handler.
// Note that protected setups must use ItExpr (not It) for argument matchers.
var mockHandler = new Mock<HttpMessageHandler>();
mockHandler
    .Protected()
    .Setup<Task<HttpResponseMessage>>(
        "SendAsync",
        ItExpr.Is<HttpRequestMessage>(message => message.RequestUri == requestUri),
        ItExpr.IsAny<CancellationToken>())
    .Returns(Task.FromResult(mockResponse));

var httpClient = new HttpClient(mockHandler.Object);
var result = await httpClient.GetStringAsync(requestUri).ConfigureAwait(false);
Assert.AreEqual(expectedResponse, result);

That sets up a mock which matches a call to a specific url and returns a mock response. It works well enough, but surely there’s a better way.

Patterns in other languages/frameworks

While working on some tests for an Angular project, I realized how much nicer their http testing library is and how easy it was to expect specific requests, do additional validations on the details of the request, and respond. Here’s an example:

let logInSuccessful = false;
let error: Error;

let httpMock = TestBed.get(HttpTestingController) as HttpTestingController;
let authenticationService = TestBed.get(AuthenticationService) as AuthenticationService;
authenticationService.logInWithPassword("someUsername", "somePassword")
  .then(() => logInSuccessful = true)
  .catch(e => error = e);

let request = httpMock.expectOne({ method: "post", url: "/api/auth/token" });
expect(request.request.body).toEqual("grant_type=password&username=someUsername&password=somePassword&scope=openid%20offline_access");

let response = createMockResponse();
request.flush(response);
tick();

expect(logInSuccessful).toEqual(true);
expect(error).toBeUndefined();

httpMock.verify();

The Solution

So with that, I decided to build a .NET library implementing a similar pattern to what I was using in the Angular tests. It’s on NuGet.org as Testing.HttpClient (GitHub).

Here’s an example of a unit test using the library:

[TestMethod]
public async Task ExampleTest()
{
    using (var http = new HttpClientTestingFactory())
    {
        var worker = new Worker(http.HttpClient);

        // Make the call, but don't await the task
        var resultTask = worker.FetchDataAsync();

        // Expect the request and respond to it
        var request = http.Expect("http://some-website.com/some-path");
        request.Respond(HttpStatusCode.OK, "123");

        // Await the result and assert on it
        var result = await resultTask;
        Assert.AreEqual(123, result);
    }
}

Design tradeoffs

Because I based this library heavily on Angular’s http testing, the Expect call must be made after the request is made, or else it won’t match a request and the test will fail. This means that your test needs to separate the call that starts the http request from the one that awaits it. This design decision makes it easy to inspect the actual full request and simplifies the mocking logic.

However, this admittedly may be a little awkward for C# developers who haven’t seen this pattern, which is heavily used in Angular testing. I describe this pattern as “synchronously testing asynchronous code”. For those familiar with Angular, this is the difference between using async and fakeAsync.

In a future post I plan on diving into the pros and cons of both testing approaches in more depth.