Varchar vs NVarchar

So let’s talk about this to better understand some differences and benefits of using each of these data types.

NVarchar – the ‘N’ in varchar means uNicodeThe column can store any Unicode data.  The column is a varchar that supports two-byte characters.  The most common use for this sort of thing is to store character data that is a mixture of English and non-English symbols.

Varchar – is an abbreviation for variable-length character string.  It’s a string of text characters that can be as large as the page size for the database table holding the column.  The size for a table page is 8,196 bytes, and no one row in a table can be more than 8060 characters.  This in turn limits the maximum size of a varchar to 8000 bytes.

So we used the term unicode data.  What does this mean?  Computers store letters and other characters by assigning a number for each one.  Before unicode was invented there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English no single encoding was adequate for all the letters, punctuation, and technical symbols in common use. These encoding systems also conflict with another, for instance two encodings could use the same number for two different characters. Unicode to the rescue.  Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.  The unicode standard has widespread adoption across systems, platforms, languages etc.  It solves most of the encoding issues.

The primary difference between these two column types is how they are stored.  Varchar is stored as 8-bit data.  Nvarchar strings are stored in the database as UTF-16, 16 bits or two bytes per character, and converted on output (typically UTF-8).  That said, Nvarchar strings have the same length restrictions as varchar 8000 bytes.  Since Nvarchar use two bytes instead of one nvarchar can only hold 4000 characters maximum.

What is UTF-8 and UTF-16?   UTF stands for Unicode Transformation Format.  It is a family of standards for encoding the Unicode character set into its equivalent binary value.  UTF was developed so that users have a standardized means of encoding the characters with the minimal amount of space.UTF-8 and UTF 16 are only two of the established standards for encoding.  The main advantage of UTF-8 is that it is backwards compatible with ASCII. The ASCII character set is fixed width and only uses one byte.  When encoding a file that uses only ASCII characters with UTF-8, the resulting file would be identical to a file encoded with ASCII. This is not possible when using UTF-16 as each character would be two bytes long.

Okay, we have covered a few technical topics here.  So in the end you can store more string data in a varchar than nvarchar fields, however if you are working with multiple cultures, languages etc. you will NEED to store those character sets in fields that are of type NVARCHAR.  Due to the widespread usage of web applications around the globe I believe you will rarely go wrong by selecting/using nvarchar over varchar. If you are concerned about storage space you can use a combination of field types (using nvarchar for user-entered data and varchar for system generated) but this may just add complexity and one day just be a limitation/hurdle you will need to overcome in the future when globalizing your application.

Just use nvarchar and you will be safe.




comments powered by Disqus