Configuration
Overview
Aside from attributes, configuration is mainly done through the CsvOptions<T> class. Similar to System.Text.Json, options instances should be configured once and reused for the application lifetime. After an options instance has been used to read or write CSV, it can no longer be modified (see IsReadOnly). Options instances are thread-safe to use, but not to configure. You can call CsvOptions<T>.MakeReadOnly() to make an instance immutable, and therefore safe to share between threads.
For convenience, a copy-constructor CsvOptions(CsvOptions<T>) is available, for example if you need slightly different configuration for reading and writing. This copies over all the configurable properties.
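A minimal configure-once sketch, using only members mentioned on this page (Delimiter, FieldQuoting, the copy-constructor, MakeReadOnly() and IsReadOnly) and assuming they live in the FlameCsv namespace:

```csharp
using System;
using FlameCsv; // assumed namespace for CsvOptions<T> and CsvFieldQuoting

// Configure once, typically at startup, and reuse for the lifetime of the application.
CsvOptions<char> readOptions = new() { Delimiter = ';' };

// The copy-constructor copies all configurable properties;
// only the writing-specific setting differs.
CsvOptions<char> writeOptions = new(readOptions) { FieldQuoting = CsvFieldQuoting.Always };

// Freeze both instances before sharing them between threads.
readOptions.MakeReadOnly();
writeOptions.MakeReadOnly();

Console.WriteLine(readOptions.IsReadOnly);
```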
Default Options
The static CsvOptions<T>.Default property provides access to the default configuration. It is used when null options are passed to CsvReader or CsvWriter. The default options are read-only and have a configuration identical to a new instance created with new().
Default options are only available for char (UTF-16) and byte (UTF-8).
CsvConverter<byte, int> intConverter = CsvOptions<byte>.Default.GetConverter<int>();
Dialect
Delimiter
The field separator is configured with CsvOptions<T>.Delimiter. The default value is ',' (comma). Other common values include '\t' (tab) and ';' (semicolon).
Quote
The string delimiter is configured with CsvOptions<T>.Quote. The default value is '"' (double quote). CSV fields wrapped in quotes (also referred to as strings) can contain otherwise special characters such as delimiters. A quote inside a string is escaped with another quote, e.g. "James ""007"" Bond".
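To make the quote-doubling rule concrete, here is a small standalone sketch of reading a quoted field per RFC 4180. The Unquote helper is hypothetical and illustrative only, not FlameCsv's parser:

```csharp
using System;
using System.Text;

// Illustrative RFC 4180 unquoting; a hypothetical helper, not FlameCsv's implementation.
// Removes the surrounding quotes and collapses each doubled quote ("") into one.
static string Unquote(string field, char quote = '"')
{
    if (field.Length < 2 || field[0] != quote || field[^1] != quote)
        return field; // not a quoted field, returned as-is

    var sb = new StringBuilder(field.Length - 2);
    for (int i = 1; i < field.Length - 1; i++)
    {
        sb.Append(field[i]);
        // a quote inside a string is followed by a second quote; skip that one
        if (field[i] == quote && i + 1 < field.Length - 1 && field[i + 1] == quote)
            i++;
    }
    return sb.ToString();
}

Console.WriteLine(Unquote("\"James \"\"007\"\" Bond\"")); // James "007" Bond
```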
Newline
The record separator is configured with CsvOptions<T>.Newline. The default value is "\r\n". FlameCsv is lenient when parsing newlines: a reader configured with "\r\n" can also read data that uses only "\n" or only "\r". The value is used as-is when writing. If you know the data always uses a specific newline, you can set the value to "\n" or "\r" to squeeze out an extra 1-2% of performance. You can also use any custom newline, as long as it is 1 or 2 characters long and does not contain the same character twice (such as "\r\r" or "\n\n").
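The two newline rules above (lenient parsing, and the constraints on custom newlines) can be sketched as standalone helpers. Both functions are hypothetical illustrations, not FlameCsv code, and the splitter deliberately ignores quoted fields to stay short:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical check of the stated rule for custom newlines: 1 or 2 characters,
// and not the same character twice (e.g. "\r\r" or "\n\n" are rejected).
static bool IsValidNewline(string newline) =>
    newline.Length == 1 || (newline.Length == 2 && newline[0] != newline[1]);

// Hypothetical lenient record splitter: "\r\n", "\n" and "\r" all end a record.
// Quoted fields are ignored here to keep the example short.
static List<string> SplitRecords(string data)
{
    var records = new List<string>();
    int start = 0;
    for (int i = 0; i < data.Length; i++)
    {
        if (data[i] != '\r' && data[i] != '\n') continue;
        records.Add(data[start..i]);
        // treat "\r\n" as a single newline
        if (data[i] == '\r' && i + 1 < data.Length && data[i + 1] == '\n') i++;
        start = i + 1;
    }
    if (start < data.Length) records.Add(data[start..]);
    return records;
}

Console.WriteLine(string.Join("|", SplitRecords("a,1\r\nb,2\nc,3\rd,4"))); // a,1|b,2|c,3|d,4
```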
Whitespace
Significant whitespace is configured with CsvOptions<T>.Whitespace, and determines how fields are trimmed when reading, or if a field needs quoting when writing. Characters present in the whitespace string are trimmed from each field, unless they are in a quoted field (whitespace is not trimmed inside strings). When writing, values containing leading or trailing whitespace are wrapped in quotes if automatic field quoting is enabled. The default value is null/empty, which means whitespace is not considered significant.
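As an illustration of the trimming rule when reading, the following sketch trims the configured whitespace characters from a raw field but leaves the contents of a quoted field untouched. TrimField is a hypothetical helper, not FlameCsv's implementation, and it does not unescape doubled quotes:

```csharp
using System;

// Hypothetical helper illustrating the rule above; not FlameCsv's implementation.
// Characters in `whitespace` are trimmed from both ends of the raw field,
// but whitespace inside a quoted field is preserved.
static string TrimField(string rawField, string? whitespace, char quote = '"')
{
    if (string.IsNullOrEmpty(whitespace))
        return rawField; // null/empty: whitespace is not significant

    string trimmed = rawField.Trim(whitespace.ToCharArray());

    // a quoted field keeps everything between its quotes
    // (doubled quotes are not unescaped here, for brevity)
    if (trimmed.Length >= 2 && trimmed[0] == quote && trimmed[^1] == quote)
        return trimmed[1..^1];

    return trimmed;
}

Console.WriteLine("[" + TrimField("  Bob  ", " ") + "]");       // [Bob]
Console.WriteLine("[" + TrimField("  \" Bob \"  ", " ") + "]"); // [ Bob ]
```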
Important
The concept of fully configurable whitespace will likely be removed in 0.4.0, after which only ' ' (space) will be considered whitespace. This change aims to simplify configuration and improve conformity with other widely used CSV libraries.
Escape
Setting CsvOptions<T>.Escape to a non-null value enables an explicit escape character: any character following the escape character is escaped. The default value is null, which follows the RFC 4180 spec: values are wrapped in quotes, and quotes inside them are escaped with another quote.
Any field containing the escape character must be wrapped in quotes. The escape character itself is escaped by doubling it, e.g., "\\".
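A minimal sketch of reading a quoted field with an explicit escape character, assuming '\\' as the escape and '"' as the quote. UnescapeField is a hypothetical helper, not FlameCsv's parser; note how the "\\" example above yields a single backslash:

```csharp
using System;
using System.Text;

// Hypothetical helper: reads a quoted field that uses an explicit escape
// character instead of RFC 4180 quote doubling. Not FlameCsv's implementation.
static string UnescapeField(string field, char quote = '"', char escape = '\\')
{
    // strip the surrounding quotes, if present
    if (field.Length >= 2 && field[0] == quote && field[^1] == quote)
        field = field[1..^1];

    var sb = new StringBuilder(field.Length);
    for (int i = 0; i < field.Length; i++)
    {
        // the escape character makes the next character literal, whatever it is
        if (field[i] == escape && i + 1 < field.Length)
            i++;
        sb.Append(field[i]);
    }
    return sb.ToString();
}

Console.WriteLine(UnescapeField("\"say \\\"hi\\\"\"")); // say "hi"
Console.WriteLine(UnescapeField("\"\\\\\""));            // \
```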
Tip
Due to the rarity of this non-standard format, SIMD accelerated parsing is not supported when using an escape character.
Additional info
Internally, FlameCsv uses the CsvDialect<T> struct to handle the configured dialect. It is constructed from the options when they are used (which is what makes the options immutable), and contains the configured values along with other parsing-related data, such as the SearchValues<T> instances used internally. The CsvDialect<T>.IsAscii property can be used to verify that SIMD-accelerated parsing is used.
CsvOptions<byte> options = new()
{
Delimiter = '\t',
Quote = '"',
Newline = "\r\n",
Whitespace = " ",
Escape = '\\',
};
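SearchValues<T> is a .NET 8 type for fast "find any of these values" searches. As an illustration only (assuming .NET 8 or later and a comma/double-quote/CRLF dialect; this is not FlameCsv's code), this is the kind of lookup it enables when scanning for the next special character:

```csharp
using System;
using System.Buffers;

// Illustration only, not FlameCsv code. Requires .NET 8 or later.
// Assumed dialect: ',' delimiter, '"' quote, "\r\n" newline.
SearchValues<char> special = SearchValues.Create(",\"\r\n");

// Returns the index of the next delimiter, quote or newline character, or -1.
static int NextSpecial(ReadOnlySpan<char> data, SearchValues<char> values) =>
    data.IndexOfAny(values);

Console.WriteLine(NextSpecial("Bob,Alice", special));   // 3
Console.WriteLine(NextSpecial("no specials", special)); // -1
```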
Header
The CsvOptions<T>.HasHeader property is true by default, which means the first record is expected to be a header. Header names are matched using the Comparer property, which defaults to StringComparer.OrdinalIgnoreCase.
For more information on which methods transcode the data into strings, see Transcoding.
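// User is the example record type used throughout this page
const string csv = "id,name\n1,Bob\n2,Alice\n";
List<User> users = CsvReader.Read<User>(csv, new CsvOptions<char> { HasHeader = true }).ToList(); // ToList() requires System.Linq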
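The effect of the default comparer can be shown with plain .NET collections. This sketch builds a hypothetical name-to-index map for a header record. It only illustrates StringComparer.OrdinalIgnoreCase matching and is not FlameCsv's binding code:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps header names to column indexes using the given comparer.
// Illustrates case-insensitive matching; not FlameCsv's binding code.
static Dictionary<string, int> IndexHeaders(string[] header, StringComparer comparer)
{
    var indexes = new Dictionary<string, int>(comparer);
    for (int i = 0; i < header.Length; i++)
        indexes[header[i]] = i;
    return indexes;
}

var byName = IndexHeaders(new[] { "id", "name" }, StringComparer.OrdinalIgnoreCase);

// a member named "Name" finds the "name" column regardless of casing
Console.WriteLine(byName["Name"]); // 1
Console.WriteLine(byName["ID"]);   // 0
```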
Parsing and formatting fields
See Converters for an overview on converter configuration, implementation, and what converters are supported by default.
Quoting fields when writing
The CsvFieldQuoting enumeration and the CsvOptions<T>.FieldQuoting property configure how fields are quoted when writing CSV. The default, CsvFieldQuoting.Auto, only quotes fields that contain special characters or whitespace.
// quote all fields, e.g., for noncompliant 3rd party libraries
StringBuilder result = CsvWriter.WriteToString(
[new User(1, "Bob", true)],
new CsvOptions<char>() { FieldQuoting = CsvFieldQuoting.Always });
// "id","name","isadmin"
// "1","Bob","true"
If you are 100% sure your data does not contain any special characters, you can set it to CsvFieldQuoting.Never to squeeze out a little more performance by skipping the check of whether each written field needs to be quoted.
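The check that Auto performs, and that Never skips, can be sketched as follows. This is a hypothetical illustration assuming the default "\r\n" newline and the rules described on this page (quote when the value contains the delimiter, the quote or a newline character, or leading/trailing configured whitespace). It is not FlameCsv's writer:

```csharp
using System;

// Hypothetical sketch of an automatic quoting decision; not FlameCsv's implementation.
// Assumes the default "\r\n" newline, so '\r' and '\n' are treated as special.
static bool NeedsQuoting(string value, char delimiter, char quote, string? whitespace)
{
    foreach (char c in value)
    {
        if (c == delimiter || c == quote || c == '\r' || c == '\n')
            return true;
    }
    // leading or trailing whitespace only matters if whitespace is configured
    return !string.IsNullOrEmpty(whitespace) && value.Length > 0 &&
           (whitespace.Contains(value[0]) || whitespace.Contains(value[^1]));
}

// RFC 4180 style: wrap in quotes and double any quote inside the value.
static string FormatField(string value, char delimiter = ',', char quote = '"', string? whitespace = null)
{
    if (!NeedsQuoting(value, delimiter, quote, whitespace))
        return value;
    string q = quote.ToString();
    return q + value.Replace(q, q + q) + q;
}

Console.WriteLine(FormatField("Bob"));                   // Bob
Console.WriteLine(FormatField("Bond, James"));           // "Bond, James"
Console.WriteLine(FormatField("James \"007\" Bond"));    // "James ""007"" Bond"
Console.WriteLine(FormatField(" Bob", whitespace: " ")); // " Bob"
```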
Skipping records or resetting headers
The CsvOptions<T>.RecordCallback property is used to configure a custom callback. The argument contains metadata about the current record, and can be used to skip records or reset the header record. This can be used to read multiple different "documents" out of the same data stream.
Below is an example of a callback that resets the headers and bindings on empty lines, and skips records that start with '#'.
CsvOptions<char> options = new()
{
RecordCallback = (ref readonly CsvRecordCallbackArgs<char> args) =>
{
if (args.IsEmpty)
{
// reset the current headers and bindings on empty lines
args.HeaderRead = false;
}
else if (args.Record[0] == '#')
{
// skip records that start with #
args.SkipRecord = true;
}
}
};
Warning
Comments are not yet fully supported by FlameCsv (see issue).
For example, even if you configure the callback to skip rows that start with '#', those rows are still parsed and must be properly structured CSV (e.g., no unbalanced quotes).
Field count validation
CsvOptions<T>.ValidateFieldCount can be used to validate the field count both when reading and writing.
When reading CsvValueRecord<T>, setting the property to true ensures that all records have the same field count as the first record.
The expected field count is reset if you reset the headers with a callback.
This property also ensures that all records written with CsvWriter<T> have the same field count.
Alternatively, you can use the ExpectedFieldCount property. It can also be used to reset the expected count by setting it to null, for example when writing multiple CSV documents into one output.
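The validation rule can be sketched in a few lines: the first record defines the expected count, later records must match it, and resetting the expectation to null lets the next record define a new count. ValidateFieldCount is a hypothetical helper, not FlameCsv's implementation:

```csharp
using System;

// Hypothetical helper illustrating the rule above; not FlameCsv's implementation.
// `expected` is null until the first record sets it; setting it back to null
// (e.g. when a new document starts) lets the next record define a new count.
static void ValidateFieldCount(ref int? expected, int fieldCount)
{
    if (expected is null)
    {
        expected = fieldCount; // first record defines the expected count
        return;
    }
    if (fieldCount != expected)
        throw new InvalidOperationException(
            $"Expected {expected} fields, but the record had {fieldCount}.");
}

int? expectedCount = null;
ValidateFieldCount(ref expectedCount, 3); // first record: expected count becomes 3
ValidateFieldCount(ref expectedCount, 3); // ok
expectedCount = null;                     // reset, e.g. for the next document
ValidateFieldCount(ref expectedCount, 2); // expected count becomes 2
Console.WriteLine(expectedCount);         // 2
```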
Advanced topics
NativeAOT
Since any implementation of CsvConverterFactory<T> (including built-in nullable and enum factories) can potentially require unreferenced types or dynamic code, the default CsvOptions<T>.GetConverter<TResult>() method is not AOT-compatible.
Use CsvOptions<T>.Aot to retrieve a wrapper around the configured converters, which provides convenience methods to safely retrieve converters for types known at compile time. See the documentation on the methods of CsvOptions<T>.AotSafeConverters for more info. This property is used by the source generator.
// aot-safe default nullable and enum converters if not configured by user
CsvConverter<char, int?> c1 = options.Aot.GetOrCreateNullable(static o => o.Aot.GetConverter<int>());
CsvConverter<char, DayOfWeek> c2 = options.Aot.GetOrCreateEnum<DayOfWeek>();
Parsing performance and read-ahead
FlameCsv uses SIMD operations to read ahead multiple records. The performance benefits are substantial, but you can turn read-ahead off with the CsvOptions<T>.NoReadAhead property. One possible justification is to halt reading as soon as an unparsable field is encountered, or to avoid reading ahead large amounts of data if the CSV is broken: flaky data, e.g., with unpaired quotes, can cause an excessive amount of data to be read before the error materializes. However, since FlameCsv tokenizes CSV very quickly and uses limited read-ahead buffers, this shouldn't be a problem in practice.
For SIMD operations and read-ahead to work, the configured dialect must consist only of ASCII characters (value 127 or lower). You can ensure the fast path will be used with CsvDialect<T>.IsAscii.
// ensure SIMD accelerated parsing routines are used
CsvOptions<byte> options = GetConfiguredOptions();
Debug.Assert(options.Dialect.IsAscii);
return CsvReader.Read<User>(csv, options);
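The ASCII requirement itself is simple to state in code. This hypothetical sketch checks every configured dialect character against the "value 127 or lower" rule; it mirrors the documented rule and is not the actual CsvDialect<T>.IsAscii implementation:

```csharp
using System;

// Hypothetical sketch of the documented rule: the SIMD fast path requires every
// configured dialect character to be ASCII (value 127 or lower).
// Not FlameCsv's CsvDialect<T> implementation.
static bool IsAsciiDialect(char delimiter, char quote, string newline, string? whitespace, char? escape)
{
    static bool IsAscii(char c) => c <= 127;

    if (!IsAscii(delimiter) || !IsAscii(quote))
        return false;
    if (escape is char e && !IsAscii(e))
        return false;
    foreach (char c in newline + (whitespace ?? ""))
    {
        if (!IsAscii(c))
            return false;
    }
    return true;
}

Console.WriteLine(IsAsciiDialect(',', '"', "\r\n", null, null));      // True
Console.WriteLine(IsAsciiDialect('\u00a7', '"', "\r\n", null, null)); // False ('§' is 167)
```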
Further reading: Architecture.
Transcoding
The following methods are used by the library to convert T values to char and back:
- TryGetChars(ReadOnlySpan<T>, Span<char>, out int) used to convert the header fields to strings
- GetAsString(ReadOnlySpan<T>) used in error messages, and to convert long header fields to strings
- TryWriteChars(ReadOnlySpan<char>, Span<T>, out int) used when writing text values, and initializing Newline and Whitespace
- GetFromString(string?) used in some converters, and while initializing the dialect
Note
The library maintains a small pool of string-instances of previously encountered headers, so unless your data is exceptionally varied, the allocation cost is paid only once.
Warning
While you can inherit the options type and override these methods, the library expects the string-based and ReadOnlySpan<T>-based methods to return the same sequences for the same inputs. Make sure you override both transcoding methods in either direction, and keep the implementations in sync.
Most likely you can achieve the same goals more easily by using Comparer and custom converters.
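One way to keep both directions in sync is to have each pair delegate to the same encoder. The helpers below are standalone UTF-8 sketches named after the methods listed above; they are hypothetical and are not overrides of CsvOptions<byte>. They require .NET 8 or later for Encoding.TryGetChars and Encoding.TryGetBytes:

```csharp
using System;
using System.Text;

// Hypothetical standalone helpers named after the transcoding methods above
// (not overrides of CsvOptions<byte>). Each pair delegates to the same Encoding,
// so the span-based and string-based directions always agree. Requires .NET 8+.
static bool TryGetChars(ReadOnlySpan<byte> value, Span<char> destination, out int charsWritten) =>
    Encoding.UTF8.TryGetChars(value, destination, out charsWritten);

static string GetAsString(ReadOnlySpan<byte> value) => Encoding.UTF8.GetString(value);

static bool TryWriteChars(ReadOnlySpan<char> value, Span<byte> destination, out int bytesWritten) =>
    Encoding.UTF8.TryGetBytes(value, destination, out bytesWritten);

static byte[] GetFromString(string? value) => Encoding.UTF8.GetBytes(value ?? "");

byte[] header = GetFromString("n\u00e4me"); // "näme": 4 chars, 5 UTF-8 bytes
Span<char> buffer = stackalloc char[16];

// the span-based and string-based conversions must produce the same text
bool ok = TryGetChars(header, buffer, out int written);
Console.WriteLine(ok && new string(buffer[..written]) == GetAsString(header)); // True
```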
Memory pooling
You can configure the MemoryPool<T> instance used internally with the CsvOptions<T>.MemoryPool property. Pooled memory is used to handle escaping, unescaping, and records split across multiple sequence segments. The default value is MemoryPool<T>.Shared.
If set to null, no pooled memory is used and all temporary buffers are heap allocated.
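The difference between a configured pool and null can be sketched with the standard System.Buffers types. GetBuffer is a hypothetical helper showing one way a nullable MemoryPool<T> setting could map to an allocation strategy; it does not describe FlameCsv's internals:

```csharp
using System;
using System.Buffers;

// Hypothetical helper: rent from the pool when one is configured, otherwise
// allocate a plain array. Not FlameCsv's internals.
static Memory<char> GetBuffer(MemoryPool<char>? pool, int minimumLength, out IMemoryOwner<char>? owner)
{
    if (pool is null)
    {
        owner = null; // plain heap allocation; nothing to return, the GC reclaims it
        return new char[minimumLength];
    }

    owner = pool.Rent(minimumLength); // must be disposed to return the memory to the pool
    return owner.Memory;
}

Memory<char> pooled = GetBuffer(MemoryPool<char>.Shared, 256, out IMemoryOwner<char>? rentedOwner);
Memory<char> heap = GetBuffer(null, 256, out IMemoryOwner<char>? noOwner);

Console.WriteLine(pooled.Length >= 256); // True: rented buffers can be larger than requested
Console.WriteLine(heap.Length);          // 256

rentedOwner?.Dispose(); // return the rented buffer when done
```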
Further reading: Architecture.
Custom binding
If you don't want to use the built-in CsvReflectionBinder<T> (attribute configuration), set the CsvOptions<T>.TypeBinder property to your own implementation of ICsvTypeBinder<T>.