Learn Span by Implementing a high-performance CSV Parser

Ever since I first heard about Span<T>, I’ve been wanting play around with using it. It’s a ref struct, so the semantics of using this type and the restrictions that go along with it are best understood by actually trying to use it. So I decided to build a simple CSV parser and see how the performance compared to other mainstream CSV parsers.

You can find the full code on GitHub and reference the package from NuGet.

As the main goal of this project was to exercise Span<T> and, as we’ll later see, Memory<T>, so for the initial version I decided to ignore some of the details around CSV parsing such as handling quotes, trimming values, and handling comments.

Initial Implementation

For my initial implementation of a parser, I simply created a class which took a ReadOnlyMemory<char> as the content and two ReadOnlySpan<char>s as the value and record delimiters for example a comma and newline respectively for a CSV. Then in the constructor I would walk the entire content saving the index and length for each value in each row as a List<List<long>>. Finally, There are members on the class to allow getting the number of records, the number of values in a particular record, and a particular value. Already you may have some questions, so let’s dig into the details.

Firstly, I needed to use a ReadOnlyMemory<char> to store the content, as ref structs like Span<T> are stack-only and so can only be used in method parameters and local variables, while Memory<T> is a “regular” struct which can be stored on the heap. ReadOnlyMemory<T> can be sliced just as Span<T> can be without allocating or copying buffers around.

I wanted the parser to allow random-access into the values of the CSV, so I needed to store the index where each value started in the content, as well the length. To save on memory, I decided to store this data in List<List<long>> where the outer list represented a list of the records, the inner list represented the list of the values within a record, and the long (int64) store for each value was constructed by fusing the index and length data, both ints (int32), using bit shifting.

After running some benchmarks, I found that already this implementation was better than other libraries, both in terms of speed and allocations.

Benchmark results after iteration 1:

Method	Data Size	Mean	Scaled	Allocated
My Parser	Large	1,221.2610 us	0.59	16111.39 KB
StringSplit	Large	93,149.630 us	1.00	85937.55 KB
TinyCsvParser	Large	169,713.472 us	1.82	25072.83 KB
FastCsvParser	Large	63,043.752 us	0.68	47227.2 KB
CsvTextFieldParser	Large	109,165.716 us	1.17	182031.39 KB
CsvHelper	Large	206,209.245 us	2.21	173448.95 KB
FileHelpers	Large	278,066.185 us	2.99	83321.9 KB

My Parser	Medium	307.973 us	0.76	157.07 KB
StringSplit	Medium	404.962 us	1.00	859.36 KB
TinyCsvParser	Medium	2,406.140 us	5.94	322.31 KB
FastCsvParser	Medium	717.625 us	1.77	796.22 KB
CsvTextFieldParser	Medium	1,066.085 us	2.63	1820.45 KB
CsvHelper	Medium	2,050.169 us	5.06	1747.09 KB
FileHelpers	Medium	2,069.213 us	5.11	851.4 KB

My Parser	Small	3.269 us	0.84	1.96 KB
StringSplit	Small	3.890 us	1.00	8.58 KB
TinyCsvParser	Small	1,059.621 us	272.45	74.74 KB
FastCsvParser	Small	22.811 us	5.87	203.89 KB
CsvTextFieldParser	Small	10.769 us	2.77	18.34 KB
CsvHelper	Small	22.715 us	5.84	28.7 KB
FileHelpers	Small	687.064 us	176.66	31.07 KB

Splitting Out a No-allocation Reader

Next I decided to extract the reader from the parser so that scenarios which didn’t need random-access to the data didn’t need the allocations from the index data. Basically the reader would advance the memory as the consumer read it, much like an IEnumerator<T> or TextReader. Additionally, the parser was simplified as it could use the reader to enumerate the values and simply worry about building up the index and length data for random-access.

After categorizing the benchmarks into Readers vs Parsers, the results showed that the reader implementation was far and away better than other libraries which exposed reader-like APIs. The closest contender was for a large data set using FastCsvParser, which only has a couple hundred downloads on NuGet.org as of this writing and still was took 3.5x longer to read the data. It requires zero allocations as it simply just stores and advances a span as well as a few other scalar fields.

Benchmark results after iteration 2:

Method	Data Size	Mean	Scaled	Allocated
My Reader	Large	16,349.969 us	1.00	0 B
FastCsvParser	Large	57,649.310 us	3.53	48360656 B
CsvTextFieldParser	Large	103,398.356 us	6.32	186400144 B
CsvHelper	Large	194,618.607 us	11.90	177611728 B

My Parser	Large	53,862.981 us	1.00	16498940 B
StringSplit	Large	87,535.705 us	1.63	88000046 B
TinyCsvParser	Large	157,429.493 us	2.92	25674400 B
FileHelpers	Large	259,745.145 us	4.82	85322480 B

My Reader	Medium	161.729 us	1.00	0 B
FastCsvParser	Medium	640.133 us	3.96	815328 B
CsvTextFieldParser	Medium	1,024.653 us	6.34	1864144 B
CsvHelper	Medium	1,953.924 us	12.08	1789016 B

My Parser	Medium	321.522 us	1.00	160840 B
StringSplit	Medium	382.312 us	1.19	879984 B
TinyCsvParser	Medium	2,249.190 us	7.00	330045 B
FileHelpers	Medium	1,901.365 us	5.91	871837 B

My Reader	Small	1.647 us	1.00	0 B
FastCsvParser	Small	22.443 us	13.62	208784 B
CsvTextFieldParser	Small	10.202 us	6.19	18784 B
CsvHelper	Small	21.441 us	13.02	29392 B

My Parser	Small	3.385 us	1.00	2008 B
StringSplit	Small	3.703 us	1.09	8784 B
TinyCsvParser	Small	1,018.881 us	301.01	76532 B
FileHelpers	Small	664.814 us	196.41	31812 B

To `ref struct` or not to `ref struct`

I found that using the reader API was fairly difficult, and I was unable to write convenience subclasses (eg. CSV reader, TSV reader) due to the fact that the reader was defined as a ref struct so that it could hold other ref structs like ReadOnlySpan<char>. Because of this, the reader is unable to be stored as a field in a class, used in async APIs, or be on the heap in any other way. I decided that instead it could just hold a ReadOnlyMemory<char> and become a class for easier usage.

Unfortunately, based on the benchmark results, this affected performance non-trivially and slowed down the reader by ~50%. The reader was still faster than the other implementations I tested against, but it was interesting to see just how optimized Span<T> and ref struct in general are.

Because of this, I may consider bringing back the ref struct implementation for usages which aren’t limited by it.

Benchmark results after iteration 3:

Method	Data Size	Mean	Scaled	Allocated
My Reader	Large	24,929.372 us	1.00	0 B
FastCsvParser	Large	58,659.486 us	2.35	48360656 B
CsvTextFieldParser	Large	105,256.008 us	4.22	186400144 B
CsvHelper	Large	196,541.906 us	7.89	177611728 B

My Parser	Large	59,266.792 us	1.00	16499337 B
StringSplit	Large	89,254.220 us	1.51	88000057 B
TinyCsvParser	Large	160,947.603 us	2.72	25674926 B
FileHelpers	Large	265,223.622 us	4.48	85322864 B

My Reader	Medium	244.256 us	1.00	96 B
FastCsvParser	Medium	686.830 us	2.81	815328 B
CsvTextFieldParser	Medium	1,050.241 us	4.30	1864144 B
CsvHelper	Medium	1,957.324 us	8.01	1789016 B

My Parser	Medium	375.222 us	1.00	160936 B
StringSplit	Medium	394.238 us	1.05	879984 B
TinyCsvParser	Medium	2,311.955 us	6.16	330044 B
FileHelpers	Medium	2,014.732 us	5.37	871837 B

My Reader	Small	2.494 us	1.00	96 B
FastCsvParser	Small	22.896 us	9.18	208784 B
CsvTextFieldParser	Small	10.506 us	4.21	18784 B
CsvHelper	Small	21.624 us	8.67	29392 B

My Parser	Small	3.985 us	1.00	2104 B
StringSplit	Small	3.825 us	0.96	8784 B
TinyCsvParser	Small	1,050.156 us	263.60	76532 B
FileHelpers	Small	680.903 us	170.91	31812 B

Next Steps

While analyzing these benchmarks, I was surprised to find that, the simple and obvious string.Split implementation was better than most the other libraries I tested against. One explanation could be that those libraries are optimized for usability rather than performance, for example mapping the records to a class. However, at least in some cases this case be explained by “streamability”. In my benchmarks, I loaded the entire content into memory as one big string prior to running each benchmark. I suspect that in real-world conditions like parsing the content from the disk or network, some of these libraries would perform much better. In fact, my implementation suffers from the same problem since it requires the entire buffer at once.

So the next step is to continue this exploration using System.IO.Pipelines. Stay tuned for a follow up post about that!

Initial Implementation

Splitting Out a No-allocation Reader

To ref struct or not to ref struct

Next Steps

To `ref struct` or not to `ref struct`