How to make strings non-Unicode in Entity Framework Code First

By default Entity Framework will convert string properties to nvarchar(MAX) or nvarchar(123) where 123 is the length we have set with .HasMaxLength(123) in the Fluent API.

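nvarchar columns are Unicode safe, while varchar columns are limited to the single-byte code page of their collation, which in practice means ASCII plus a small set of extended characters.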

Since a varchar takes up half the size of an nvarchar for an equivalent string, if we know for sure we don’t need Unicode columns we can tell EF Core to use varchars by calling IsUnicode(false) in the Fluent API …
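As a minimal sketch of the Fluent API approach, the snippet below configures a 13 character ISBN column as non-Unicode. The Book entity, its Isbn property and the BookContext class are hypothetical names chosen for illustration, and provider setup (connection string etc.) is left out.

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical entity, used for illustration only.
public class Book
{
    public int Id { get; set; }
    public string Isbn { get; set; } = string.Empty;
}

// Hypothetical context; provider configuration is omitted.
public class BookContext : DbContext
{
    public DbSet<Book> Books => Set<Book>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Book>()
            .Property(b => b.Isbn)
            .HasMaxLength(13)
            // false = non-Unicode, so SQL Server should get varchar(13)
            // instead of nvarchar(13).
            .IsUnicode(false);
    }
}
```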

We can also set it via the data annotations approach with the [Unicode(false)] attribute.
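A rough equivalent using data annotations might look like this. Again, Book and Isbn are hypothetical names. The Unicode attribute lives in the Microsoft.EntityFrameworkCore namespace, while MaxLength comes from System.ComponentModel.DataAnnotations.

```csharp
using System.ComponentModel.DataAnnotations;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity, used for illustration only.
public class Book
{
    public int Id { get; set; }

    [MaxLength(13)]
    [Unicode(false)] // non-Unicode: varchar rather than nvarchar on SQL Server
    public string Isbn { get; set; } = string.Empty;
}
```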

In the above examples we are dealing with ISBN numbers, which cannot contain non-ASCII characters.

Note, for the data attribute approach we could also use:

[Column(TypeName = "varchar(13)")]

which sets the column to non-Unicode varchar(13) directly, but it does so by referencing a platform-specific (SQL Server) column type.

The Unicode attribute, which is new in Entity Framework Core 6, is preferred as it’s an abstraction over any specific DB provider, so we can switch providers more easily.

Setting Unicode preference globally

Since Entity Framework Core 6 we can also set a global preference for Unicode or non-Unicode columns in the ConfigureConventions DbContext method …
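A minimal sketch of a global default, assuming we want every string property to be non-Unicode unless told otherwise. BookContext is a hypothetical context name and provider setup is omitted.

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical context; provider configuration is omitted.
public class BookContext : DbContext
{
    protected override void ConfigureConventions(
        ModelConfigurationBuilder configurationBuilder)
    {
        // Applies to every string property in the model by default.
        configurationBuilder
            .Properties<string>()
            .AreUnicode(false);
    }
}
```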

We can then override this default on a property by property basis via the Fluent API or Data Annotation approach shown above.

Why not just keep everything Unicode ‘just in case’?

I have seen recommendations to make everything Unicode (e.g. nvarchar in SQL Server) ‘just in case’, but I personally don’t like this design approach.

Remember, a varchar takes up half the size of an nvarchar for an equivalent string. If you know for sure a column is ASCII safe, using nvarchar is just wasteful. But space is cheap, right? That’s true (in many but not all cases), but there can be performance implications too.
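To make the ‘half the size’ claim concrete, here is a small sketch that compares the encoded byte counts for an ASCII-only string. It only approximates the character data (one byte per character for varchar, two bytes per UTF-16 code unit for nvarchar) and ignores any per-row or per-column overhead in SQL Server. ColumnSize is a hypothetical helper name.

```csharp
using System.Text;

// Hypothetical helper, used for illustration only.
public static class ColumnSize
{
    // Approximate character data size in a varchar column (1 byte per ASCII char).
    public static int VarcharBytes(string value) =>
        Encoding.ASCII.GetByteCount(value);

    // Approximate character data size in an nvarchar column (UTF-16, 2 bytes per char here).
    public static int NvarcharBytes(string value) =>
        Encoding.Unicode.GetByteCount(value);
}
```

For a 13 digit ISBN such as "9780131103627", VarcharBytes gives 13 and NvarcharBytes gives 26.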

I’m not a DBA, but the basic idea is that SQL Server stores data in 8 KB pages, which it caches in memory. The wider the columns are, the less data fits into one page, so the server has to go to disk more often. Reading from disk will always be slower than reading from memory. How much slower? Who knows; it all depends on the I/O cost for the particular server, disk, etc.

Another benefit of making columns non-Unicode is that it better communicates intent to other developers.
