Tuesday, May 20, 2008

ANSI_PADDING and Variable Length Character Columns in SQL Server

I have been involved in a very interesting thread on SQLServerCentral.com about trailing spaces in strings. It has changed the way I will be looking as strings going forward. I had always assumed that when a character column was defined as variable length (varchar, nvarchar) any trailing spaces inserted into the column would be trimmed. Well this WAS how SQL Server handled it because SQL Server defaulted the ANSI PADDING setting to OFF. Now SQL Server in SSMS (SQL Server Management Studio) the connection settings set ANSI PADDING to ON. This means that any table created using SSMS will use the ANSI PADDING ON setting, regardless of the default database setting. ANSI PADDING ON means that 'a ' will store the 'a' and the space while ANSI PADDING OFF only stores the 'a'. This becomes an issue when upgrading from an earlier version of SQL Server or, as in the thread, you are not aware of this change and are not coding appropriately for it.

Why is this a problem? Well. Here is a table showing how SQL Server acts with the 2 ANSI_PADDING options (courtesy of Matt Miller in the thread mentioned above and some testing):

SQL Function

Result (ANSI Padding ON)

Result (ANSI Padding OFF)

Primary Key

Trim it. Duplicate key error

Trim it

Unique Index

Trim it. Duplicate key error

Trim it

Comparison ("=", "!=")

Trim it. 'a' = 'a '

Trim it

Comparison ("like using _")

Don't trim it

Trim it

Concatenation

Don't trim it. 'a ' + 'a' = 'a a'

Trim it 'a ' + 'a' = 'aa'

Storage

Don't trim it

Trim it

Group By

Trim it

Trim it

Order By

Trim it

Trim it


The RED rows show where there are differences in behavior. Here is what Microsoft has to say from SQL Server 2005 BOL:

Important:

In a future version of SQL Server ANSI_PADDING will always be ON and any applications that explicitly set the option to OFF will produce an error. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.


Note:

We recommend that ANSI_PADDING always be set to ON


Here is some code that proves the behavior:

SET Ansi_Padding OFF

CREATE TABLE
test.categories
(
CategoryID INT PRIMARY KEY CLUSTERED IDENTITY(1,1),
CategoryName VARCHAR(15),
Name2 VARCHAR(15),
NCategoryName NVARCHAR(15),
NName2 NVARCHAR(15)
)
--Create Unique Index UX_varchar_test On test.categories(CategoryName)
--Create Unique Index UX_nvarchar_test On test.categories(NCategoryName)

INSERT INTO test.categories
(
CategoryName,
Name2,
NCategoryName,
NName2
)
SELECT 'a', 'b', 'a', 'b' UNION ALL
SELECT 'a ', 'b','a ', 'b' UNION ALL
SELECT 'a ', 'b', 'a ', 'b' UNION ALL
SELECT 'a ', 'b', 'a ', 'b' UNION ALL
SELECT 'a ', 'b','a ', 'b' UNION ALL
SELECT 'a ', 'b','a ', 'b' UNION ALL
SELECT 'a ', 'b','a ', 'b'

-- length, storage, concatenation
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
CategoryID,
CategoryName,
CategoryName + Name2 AS concat_names,
LEN(CategoryName) AS len_name,
DATALENGTH(CategoryName) AS Datalength_Name,
NCategoryName,
NCategoryName + NName2 AS concat_unicode_names,
LEN(NCategoryName) AS len__unicode_name,
DATALENGTH(NCategoryName) AS Datalength_Unicode_Name
FROM
test.categories

-- varchar group by
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'varchar' AS data_type,
'group by' AS test_function,
COUNT(categoryid) AS num_recs,
categoryName
FROM
test.categories
GROUP BY
categoryName

-- nvarchar group by
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'nvarchar' AS data_type,
'group by' AS test_function,
COUNT(categoryid) AS num_recs,
NcategoryName
FROM
test.categories
GROUP BY
NcategoryName

-- varchar order by
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'varchar' AS data_type,
'order by' AS test_function,
categoryid,
categoryName
FROM
test.categories
ORDER BY
categoryName DESC

-- nvarchar order by
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'nvarchar' AS data_type,
'order by' AS test_function,
categoryid,
ncategoryName
FROM
test.categories
ORDER BY
ncategoryName DESC

-- varchar =
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'varchar' AS data_type,
'=' AS test_function,
categoryid,
categoryName
FROM
test.categories
WHERE
categoryName = 'a'

-- nvarchar =
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'nvarchar' AS data_type,
'=' AS test_function,
categoryid,
ncategoryName
FROM
test.categories
WHERE
NcategoryName = 'a'

-- varchar like %
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'varchar' AS data_type,
'Like %' AS test_function,
categoryid,
categoryName
FROM
test.categories
WHERE
categoryName LIKE 'a%'

-- nvarchar like %
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'nvarchar' AS data_type,
'Like %' AS test_function,
categoryid,
ncategoryName
FROM
test.categories
WHERE
ncategoryName LIKE 'a%'

-- varchar like _
SELECT
CASE
WHEN SESSIONPROPEERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'varchar' AS data_type,
'Like _' AS test_function,
categoryid,
categoryName
FROM
test.categories
WHERE
categoryName LIKE 'a_'

-- nvarchar like _
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'nvarchar' AS data_type,
'Like _' AS test_function,
categoryid,
ncategoryName
FROM
test.categories
WHERE
ncategoryName LIKE 'a_'
GO

SET Ansi_Padding ON

CREATE TABLE
test.categories2
(
CategoryID INT PRIMARY KEY CLUSTERED IDENTITY(1,1),
CategoryName VARCHAR(15),
Name2 VARCHAR(15),
NCategoryName NVARCHAR(15),
NName2 NVARCHAR(15)
)
--Create Unique Index UX_varchar_test On test.categories2(CategoryName)
--Create Unique Index UX_nvarchar_test On test.categories2(NCategoryName)

INSERT INTO test.categories2
(
CategoryName,
Name2,
NCategoryName,
NName2
)
SELECT 'a', 'b', 'a', 'b' UNION ALL
SELECT 'a ', 'b','a ', 'b' UNION ALL
SELECT 'a ', 'b', 'a ', 'b' UNION ALL
SELECT 'a ', 'b', 'a ', 'b' UNION ALL
SELECT 'a ', 'b','a ', 'b' UNION ALL
SELECT 'a ', 'b','a ', 'b' UNION ALL
SELECT 'a ', 'b','a ', 'b'

-- length, storage, concatenation
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
CategoryID,
CategoryName,
CategoryName + Name2 AS concat_names,
LEN(CategoryName) AS len_name,
DATALENGTH(CategoryName) AS Datalength_Name,
NCategoryName,
NCategoryName + NName2 AS concat_unicode_names,
LEN(NCategoryName) AS len__unicode_name,
DATALENGTH(NCategoryName) AS Datalength_Unicode_Name
FROM
test.categories2

-- varchar group by
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'varchar' AS data_type,
'group by' AS test_function,
COUNT(categoryid) AS num_recs,
categoryName
FROM
test.categories2
GROUP BY
categoryName

-- nvarchar group by
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'nvarchar' AS data_type,
'group by' AS test_function,
COUNT(categoryid) AS num_recs,
NcategoryName
FROM
test.categories2
GROUP BY
NcategoryName

-- varchar order by
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'varchar' AS data_type,
'order by' AS test_function,
categoryid,
categoryName
FROM
test.categories2
ORDER BY
categoryName DESC

-- nvarchar order by
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'nvarchar' AS data_type,
'order by' AS test_function,
categoryid,
ncategoryName
FROM
test.categories2
ORDER BY
ncategoryName DESC

-- varchar =
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'varchar' AS data_type,
'=' AS test_function,
categoryid,
categoryName
FROM
test.categories2
WHERE
categoryName = 'a'

-- nvarchar =
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'nvarchar' AS data_type,
'=' AS test_function,
categoryid,
ncategoryName
FROM
test.categories2
WHERE
NcategoryName = 'a'

-- varchar like %
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'varchar' AS data_type,
'Like %' AS test_function,
categoryid,
categoryName
FROM
test.categories2
WHERE
categoryName LIKE 'a%'

-- nvarchar like %
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'nvarchar' AS data_type,
'Like %' AS test_function,
categoryid,
ncategoryName
FROM
test.categories2
WHERE
ncategoryName LIKE 'a%'

-- varchar like _
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'varchar' AS data_type,
'Like _' AS test_function,
categoryid,
categoryName
FROM
test.categories2
WHERE
categoryName LIKE 'a_'

-- nvarchar like _
SELECT
CASE
WHEN SESSIONPROPERTY('ANSI_PADDING') = 0 THEN 'Off'
ELSE 'ON'
END AS ansi_padding_setting,
'nvarchar' AS data_type,
'Like _' AS test_function,
categoryid,
ncategoryName
FROM
test.categories2
WHERE
ncategoryName LIKE 'a_'


DROP TABLE test.categories2
DROP TABLE test.categories


The key, as always, is to understand the system you are working with and design and code against it appropriately. As noted by the Original Poster in the SSC thread, "Trim your strings before inserting".

Friday, May 2, 2008

SSIS SCript Component Rant

I certainly don't consider myself an SSIS expert, but I have done some work with it. It is a very powerful tool, but some things bother me about it. The first thing is the inability to debug Script Components in a data flow. I can set breakpoints in a Script Task, but in a Script Component I am left with message boxes and logging. This is a very painful way to try to debug code. Anyone with classic ASP experience can relate (response.write, anyone?). Microsoft went to the trouble of including debugging capabilities in ASP.NET and I certainly hope it is added to Script Components in SSIS in 2008.

Another annoyance is that I would like to be able to "save" a Script Component script as a file and then re-use it in other Script Components. Why can't I do this? I realize I could write a custom component (currently beyond my abilities), but shouldn't I be able to load code from a .vb file?

Lastly, error messages. Why doesn't an unhandled error tell me the Script Component's name instead of the useless, Script_Component_GUID? Just give me something I can easily decipher.

Anyway, I don't want to sound like I am down on SSIS, Script Components particularly, but I just see areas it could be improved and they seem like reasonable things to ask.