Learn Span<T> by Implementing a high-performance CSV Parser

Ever since I first heard about Span<T>, I’ve been wanting play around with using it. It’s a ref struct, so the semantics of using this type and the restrictions that go along with it are best understood by actually trying to use it. So I decided to build a simple CSV parser and see how the performance compared to other mainstream CSV parsers.

You can find the full code on GitHub and reference the package from NuGet.

As the main goal of this project was to exercise Span<T> and, as we’ll later see, Memory<T>, so for the initial version I decided to ignore some of the details around CSV parsing such as handling quotes, trimming values, and handling comments.

Initial Implementation

For my initial implementation of a parser, I simply created a class which took a ReadOnlyMemory<char> as the content and two ReadOnlySpan<char>s as the value and record delimiters for example a comma and newline respectively for a CSV. Then in the constructor I would walk the entire content saving the index and length for each value in each row as a List<List<long>>. Finally, There are members on the class to allow getting the number of records, the number of values in a particular record, and a particular value. Already you may have some questions, so let’s dig into the details.

Firstly, I needed to use a ReadOnlyMemory<char> to store the content, as ref structs like Span<T> are stack-only and so can only be used in method parameters and local variables, while Memory<T> is a “regular” struct which can be stored on the heap. ReadOnlyMemory<T> can be sliced just as Span<T> can be without allocating or copying buffers around.

I wanted the parser to allow random-access into the values of the CSV, so I needed to store the index where each value started in the content, as well the length. To save on memory, I decided to store this data in List<List<long>> where the outer list represented a list of the records, the inner list represented the list of the values within a record, and the long (int64) store for each value was constructed by fusing the index and length data, both ints (int32), using bit shifting.

After running some benchmarks, I found that already this implementation was better than other libraries, both in terms of speed and allocations.

Benchmark results after iteration 1:

Method Data Size Mean Scaled Allocated
My Parser Large 1,221.2610 us 0.59 16111.39 KB
StringSplit Large 93,149.630 us 1.00 85937.55 KB
TinyCsvParser Large 169,713.472 us 1.82 25072.83 KB
FastCsvParser Large 63,043.752 us 0.68 47227.2 KB
CsvTextFieldParser Large 109,165.716 us 1.17 182031.39 KB
CsvHelper Large 206,209.245 us 2.21 173448.95 KB
FileHelpers Large 278,066.185 us 2.99 83321.9 KB
My Parser Medium 307.973 us 0.76 157.07 KB
StringSplit Medium 404.962 us 1.00 859.36 KB
TinyCsvParser Medium 2,406.140 us 5.94 322.31 KB
FastCsvParser Medium 717.625 us 1.77 796.22 KB
CsvTextFieldParser Medium 1,066.085 us 2.63 1820.45 KB
CsvHelper Medium 2,050.169 us 5.06 1747.09 KB
FileHelpers Medium 2,069.213 us 5.11 851.4 KB
My Parser Small 3.269 us 0.84 1.96 KB
StringSplit Small 3.890 us 1.00 8.58 KB
TinyCsvParser Small 1,059.621 us 272.45 74.74 KB
FastCsvParser Small 22.811 us 5.87 203.89 KB
CsvTextFieldParser Small 10.769 us 2.77 18.34 KB
CsvHelper Small 22.715 us 5.84 28.7 KB
FileHelpers Small 687.064 us 176.66 31.07 KB

Splitting Out a No-allocation Reader

Next I decided to extract the reader from the parser so that scenarios which didn’t need random-access to the data didn’t need the allocations from the index data. Basically the reader would advance the memory as the consumer read it, much like an IEnumerator<T> or TextReader. Additionally, the parser was simplified as it could use the reader to enumerate the values and simply worry about building up the index and length data for random-access.

After categorizing the benchmarks into Readers vs Parsers, the results showed that the reader implementation was far and away better than other libraries which exposed reader-like APIs. The closest contender was for a large data set using FastCsvParser, which only has a couple hundred downloads on NuGet.org as of this writing and still was took 3.5x longer to read the data. It requires zero allocations as it simply just stores and advances a span as well as a few other scalar fields.

Benchmark results after iteration 2:

Method Data Size Mean Scaled Allocated
My Reader Large 16,349.969 us 1.00 0 B
FastCsvParser Large 57,649.310 us 3.53 48360656 B
CsvTextFieldParser Large 103,398.356 us 6.32 186400144 B
CsvHelper Large 194,618.607 us 11.90 177611728 B
My Parser Large 53,862.981 us 1.00 16498940 B
StringSplit Large 87,535.705 us 1.63 88000046 B
TinyCsvParser Large 157,429.493 us 2.92 25674400 B
FileHelpers Large 259,745.145 us 4.82 85322480 B
My Reader Medium 161.729 us 1.00 0 B
FastCsvParser Medium 640.133 us 3.96 815328 B
CsvTextFieldParser Medium 1,024.653 us 6.34 1864144 B
CsvHelper Medium 1,953.924 us 12.08 1789016 B
My Parser Medium 321.522 us 1.00 160840 B
StringSplit Medium 382.312 us 1.19 879984 B
TinyCsvParser Medium 2,249.190 us 7.00 330045 B
FileHelpers Medium 1,901.365 us 5.91 871837 B
My Reader Small 1.647 us 1.00 0 B
FastCsvParser Small 22.443 us 13.62 208784 B
CsvTextFieldParser Small 10.202 us 6.19 18784 B
CsvHelper Small 21.441 us 13.02 29392 B
My Parser Small 3.385 us 1.00 2008 B
StringSplit Small 3.703 us 1.09 8784 B
TinyCsvParser Small 1,018.881 us 301.01 76532 B
FileHelpers Small 664.814 us 196.41 31812 B

To ref struct or not to ref struct

I found that using the reader API was fairly difficult, and I was unable to write convenience subclasses (eg. CSV reader, TSV reader) due to the fact that the reader was defined as a ref struct so that it could hold other ref structs like ReadOnlySpan<char>. Because of this, the reader is unable to be stored as a field in a class, used in async APIs, or be on the heap in any other way. I decided that instead it could just hold a ReadOnlyMemory<char> and become a class for easier usage.

Unfortunately, based on the benchmark results, this affected performance non-trivially and slowed down the reader by ~50%. The reader was still faster than the other implementations I tested against, but it was interesting to see just how optimized Span<T> and ref struct in general are.

Because of this, I may consider bringing back the ref struct implementation for usages which aren’t limited by it.

Benchmark results after iteration 3:

Method Data Size Mean Scaled Allocated
My Reader Large 24,929.372 us 1.00 0 B
FastCsvParser Large 58,659.486 us 2.35 48360656 B
CsvTextFieldParser Large 105,256.008 us 4.22 186400144 B
CsvHelper Large 196,541.906 us 7.89 177611728 B
My Parser Large 59,266.792 us 1.00 16499337 B
StringSplit Large 89,254.220 us 1.51 88000057 B
TinyCsvParser Large 160,947.603 us 2.72 25674926 B
FileHelpers Large 265,223.622 us 4.48 85322864 B
My Reader Medium 244.256 us 1.00 96 B
FastCsvParser Medium 686.830 us 2.81 815328 B
CsvTextFieldParser Medium 1,050.241 us 4.30 1864144 B
CsvHelper Medium 1,957.324 us 8.01 1789016 B
My Parser Medium 375.222 us 1.00 160936 B
StringSplit Medium 394.238 us 1.05 879984 B
TinyCsvParser Medium 2,311.955 us 6.16 330044 B
FileHelpers Medium 2,014.732 us 5.37 871837 B
My Reader Small 2.494 us 1.00 96 B
FastCsvParser Small 22.896 us 9.18 208784 B
CsvTextFieldParser Small 10.506 us 4.21 18784 B
CsvHelper Small 21.624 us 8.67 29392 B
My Parser Small 3.985 us 1.00 2104 B
StringSplit Small 3.825 us 0.96 8784 B
TinyCsvParser Small 1,050.156 us 263.60 76532 B
FileHelpers Small 680.903 us 170.91 31812 B

Next Steps

While analyzing these benchmarks, I was surprised to find that, the simple and obvious string.Split implementation was better than most the other libraries I tested against. One explanation could be that those libraries are optimized for usability rather than performance, for example mapping the records to a class. However, at least in some cases this case be explained by “streamability”. In my benchmarks, I loaded the entire content into memory as one big string prior to running each benchmark. I suspect that in real-world conditions like parsing the content from the disk or network, some of these libraries would perform much better. In fact, my implementation suffers from the same problem since it requires the entire buffer at once.

So the next step is to continue this exploration using System.IO.Pipelines. Stay tuned for a follow up post about that!