Showing posts with label Programming. Show all posts

Thursday, December 17, 2009

Where’d the Data Go?

The Reported Problem

A few weeks ago I was contacted by a former employer because an SSRS report I had written several years ago was no longer returning the expected data.  The report returned the Manufacturing Instructions attached to an Order or Order Item and was used to ensure that everything was done as the customer requested.  There are two groups on the report:

  1. Runs – a Run consists of orders that contain the same product.
  2. Orders

The detail section consists of the specific Manufacturing Instructions for each order in the run.  Well, the report was showing the Runs and the Orders, but no Instructions.  Obviously this is bad, since the whole point of the report is to show the instructions.

Troubleshooting the Problem

I VPN'ed into the network and opened up the report to find the SQL that was generating it.  I knew I had authored the report, but could not remember what the data source was.  There were two possible data sources:  SQL Server or DB2 on the iSeries.  I was hoping it would be SQL Server.  It wasn't.  The report contained a query similar to this (note the two different instruction aliases) that connected to DB2 using the IBMDA400 OLE DB provider:

SELECT
    S.run_no,
    O.order_no,
    OI.instruction AS order_instruction,
    S.start_date,
    S.end_date
FROM
    @orders O JOIN
    @order_instructions OI ON O.order_no = OI.order_no JOIN
    @schedule S ON O.order_no = S.order_no
WHERE
    (S.start_date >= '12/17/2009' AND S.end_date < '12/21/2009') OR
    (S.start_date < '12/21/2009' AND S.end_date > '12/14/2009')
UNION ALL
SELECT
    S.run_no,
    O.order_no,
    OII.instruction AS order_item_instruction,
    S.start_date,
    S.end_date
FROM
    @orders O JOIN
    @order_items OI ON O.order_no = OI.order_no JOIN
    @order_item_instructions OII ON OI.order_item_no = OII.order_item_no JOIN
    @schedule S ON O.order_no = S.order_no
WHERE
    (S.start_date >= '12/17/2009' AND S.end_date < '12/21/2009') OR
    (S.start_date < '12/21/2009' AND S.end_date > '12/14/2009')
ORDER BY
    run_no,
    O.order_no


I ran the query in the dataset designer in Visual Studio 2003 (yup, SSRS 2000) and got data back, including instructions.  I couldn't figure out why I wasn't seeing instructions on the report, especially since this report had been running without problems.  Then I ran the query in a custom query tool, and that's when I noticed the instruction column was not being returned as an instruction column, but as column 00003!  What?  I don't have that anywhere!  Look at the instruction aliases in the query above and notice that I aliased the instruction columns differently in each half of the UNION.  It turns out THIS was the problem.  Apparently when an update (a PTF, or Program Temporary Fix) was applied to the iSeries, the behavior of column aliases in a UNION changed.  If you run a query like that in SQL Server, the alias used in the FIRST query is returned as the column name.  This was also how it USED to work in DB2 on the iSeries, but that changed: instead of getting EITHER of the aliases, it returns the column number, so in this case you'd get 00003 for the aliased column.
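
You can verify the SQL Server behavior with a quick test like this (a minimal sketch of my own, not from the report):

/*
In SQL Server, the column name of a UNION comes from the FIRST branch.
The single result column here is named order_instruction; the alias in
the second branch is ignored.
*/
SELECT 1 AS order_instruction
UNION ALL
SELECT 2 AS order_item_instruction;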



The Solution



Well, simple solution: I removed the ALIAS in the second part of the UNION query and, VOILA!, the report worked again.



Why did I have the separate ALIASes in the first place?  I think it was because when I was testing the queries for correct data, I wanted to know which instructions were at the order level and which were at the order item level, and I ran the queries separately.  Then, once I was convinced the data was correct, I UNION'ed the queries and stuck them in the report without removing the ALIASes.



So in reality the query was returning the correct data, but the SSRS report was looking for a column named order_instruction, and since that column no longer existed, it didn't show anything!



Next time, I’ll know better!

Monday, December 7, 2009

Good Programmer = Lazy & Dumb?

Alternative title for programmers using C-derivatives and Java:

Good Programmer == Lazy & Dumb?

I read this blog post, Why Good Programmers Are Lazy and Dumb, and it struck me as being an excellent perspective, especially these two quotes:

Lazy, because only lazy programmers will want to write the kind of tools that might replace them in the end.

My goal when working on a project is to produce a product that takes me out of the loop on the project.  By that I mean I want to provide the end users all the tools they need to manage the application once it is released to them.  No, I don't want to have to manage in-application security; I want that tool provided to the appropriate person.  I want to be a DBA\Developer, NOT an application administrator.  I want to be able to work on new and exciting projects, not stay stuck on one.

But there’s a more crucial point why a good programmer must be dumb. That’s because for him to find the best solutions to problems, he must keep a fresh mindset and manage to think out of the box (or rather, know its actual shape).

Sometimes the worst situation is the one where you are re-writing an existing application.  Why?  Because you, or someone else on the team, knows too much about it and can't see new ways to do it.  A few years ago I moved to a new position and was put on a project with one of the existing developers, who had been in the organization for several years and had written much of the current application.  Let's just say this was a blessing and a curse.  They knew the processes well and knew what hadn't worked, but this also meant that they weren't able to see other ways to do things that might be better.  I was the "dumb" programmer in this situation and tended to ask "Why?" a lot and suggest alternate ways of doing things.

So the questions are:

  1. Are you trying to write software that replaces you?  Are you automating the routine tasks of being a DBA so that, with proper documentation, someone can come behind you and not miss a beat?

    I know I have a long way to go in regards to automation and documentation, but I’m working on it.

  2. Are you content with, “I’ve always done it that way”, or are you looking for new or better ways to do things?

    I'd like to think that this is an area I do well in.  I'm always looking for a better way to do something, which is why I attend user group meetings, read books and blogs, and participate in the online SQL community.

How about you?

Monday, November 30, 2009

Maintaining Security and Performance Using Stored Procedures Part II – Signing

Well, it has been a couple of weeks since my last blog post and over a month since Maintaining Security and Performance Using Stored Procedures Part I – Using EXECUTE AS was posted.  Part of the delay is that I spent the time I would have used for this post working on the originally unplanned follow-up to Part I.  I know you all have been anxiously awaiting this post.

EXECUTE AS vs. Signing

Like EXECUTE AS, the ability to sign a module (stored procedure, DML trigger, function, or assembly) with a certificate was added in SQL Server 2005.  In addition to allowing access to objects within the current database context, signing also allows you to access resources in another database or resources that require server-level permissions.  With EXECUTE AS on functions, stored procedures, and DML triggers you are limited by database context, so if you need access to objects in another database you are out of luck, even if the EXECUTE AS user has the proper rights in that database.  You can download code that demonstrates this behavior here.
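
Here is a minimal sketch of that limitation (the object names are made up for illustration; the downloadable code is the complete demonstration):

CREATE PROCEDURE dbo.cross_db_select
WITH EXECUTE AS 'proc_user' -- a database user in the current database
AS
SELECT * FROM OtherDB.dbo.some_table; -- hypothetical database and table
GO

/*
Run by a caller with only EXECUTE rights on the procedure, the SELECT
fails with an error like "The server principal ... is not able to access
the database ... under the current security context", because the
impersonated database user's identity does not extend beyond the current
database (unless you take extra steps such as marking the database
TRUSTWORTHY).
*/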

Demonstrations of Signing

Laurentiu Cristofor has an excellent blog post that demonstrates signing a stored procedure to grant server-level permissions here, so I am not going to duplicate his work in this post.  I will demonstrate how to sign a procedure for use within a database and when accessing another database.

Using Signing to enable Dynamic SQL within the database
USE AdventureWorks;
GO

/*
Create a restricted_login
*/
CREATE LOGIN restricted_login WITH Password = '$tr0ngPassword';

GO

/*
Create a Certificate first
*/    
CREATE CERTIFICATE cert_dynamic_sql
ENCRYPTION BY PASSWORD = 'c3rtificatePa$$w0rd'
WITH SUBJECT = 'Dynamic SQL Security';
GO

/* 
Create user based on the certificate
*/ 
CREATE USER certificate_user FROM CERTIFICATE cert_dynamic_sql;

/*
Give the certificate_user SELECT on all objects in the Person schema
*/
GRANT SELECT ON SCHEMA::Person TO certificate_user;
GO

/*
Create a restricted rights database user
*/    
CREATE USER restricted_user FROM LOGIN restricted_login;

GO

/*
Create the procedure
*/
CREATE PROCEDURE dbo.FindPhoneByName
(
    @LastName nvarchar(50) = NULL,
    @FirstName nvarchar(50) = NULL
)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @sql_cmd nvarchar(2000),
            @select nvarchar(1000),
            @where nvarchar(1000),
            @parameters nvarchar(1000);

    SET @parameters = N'@FirstName nvarchar(50), @LastName nvarchar(50)';

    SET @select = N'SELECT
        Title,
        FirstName,
        MiddleName,
        LastName,
        Suffix,
        Phone
    FROM
        Person.Contact';

    SET @where = N' WHERE 1=1 ';

    IF @LastName IS NOT NULL
    BEGIN
        SET @where = @where + N' AND LastName LIKE @LastName + N''%'' ';
    END;

    IF @FirstName IS NOT NULL
    BEGIN
        SET @where = @where + N' AND FirstName LIKE @FirstName + N''%''';
    END;

    SET @sql_cmd = @select + @where;

    EXEC sys.sp_executesql @sql_cmd, @parameters, @LastName = @LastName, @FirstName = @FirstName;

    RETURN;
END

GO

/*
Give the restricted user exec on the procedure
*/
GRANT EXEC ON dbo.FindPhoneByName TO restricted_user; 

GO
/*
Change context to the restricted rights user
*/
EXECUTE AS LOGIN = 'restricted_login';

GO

/*
Exec the procedure - will fail on the dynamic portion
*/
EXEC [dbo].FindPhoneByName;

GO

/* 
Return to sysadmin rights
*/
revert;

GO

/*
Sign the procedure
*/
ADD SIGNATURE TO [dbo].[FindPhoneByName]
BY CERTIFICATE cert_dynamic_sql
WITH PASSWORD = 'c3rtificatePa$$w0rd';
GO

/*
Change context to the restricted rights user
*/
EXECUTE AS LOGIN = 'restricted_login';

GO

/*
Exec the procedure - will work now
*/
EXEC [dbo].FindPhoneByName;

GO

/* 
Return to sysadmin rights
*/
revert;

GO

Using Signing to Access an Object in Another Database
In this example I'll use a signed procedure to access a table in the Northwind database (download here) from the AdventureWorks database (I use a 2005 copy with extra data; the unmodified version is available here).  One thing I found in my testing is that you have to back up the certificate with a private key file in this case.  If anyone knows how to do it without the file, please let me know.  Here is the code:

USE MASTER;
GO

/*
Create the restricted_login
*/
CREATE LOGIN restricted_login WITH Password = '$tr0ngPassword';

GO

USE Northwind;

GO

/*
Create the Certificate
*/    
CREATE CERTIFICATE cert_access_other_db
ENCRYPTION BY PASSWORD = 'c3rtPa$$word'
WITH SUBJECT = 'Access Other DB';
GO

/* 
Backup the certificate being sure to use a Private Key
*/
BACKUP CERTIFICATE cert_access_other_db TO FILE = 'C:\Certificates\cert_access_other_db.cer'
WITH PRIVATE KEY (FILE = 'C:\Certificates\cert_access_other_db.pvk' ,
ENCRYPTION BY PASSWORD = '3ncRyptKeyPa$$word',
DECRYPTION BY PASSWORD = 'c3rtPa$$word');
GO

/*
Create the certificate user in Northwind and grant the needed permissions
*/
CREATE USER certificate_user FROM CERTIFICATE cert_access_other_db;

GO

GRANT SELECT ON dbo.Categories TO certificate_user;

GO 

USE AdventureWorks;

GO

/*
Create a restricted rights database user
*/    
CREATE USER restricted_user FROM LOGIN restricted_login;

GO

/*
Create the procedure
*/
CREATE PROCEDURE [dbo].access_other_db
AS
SET NOCOUNT ON

SELECT 
SYSTEM_USER AS USERName, 
*
FROM
[Northwind].dbo.[Categories] AS C;

RETURN;    

GO

/*
Give the restricted_user execute rights on the sp
*/
GRANT EXEC ON dbo.access_other_db TO restricted_user;

GO

/*
Create the certificate in this database from the file
*/
CREATE CERTIFICATE cert_access_other_db FROM FILE = 'C:\Certificates\cert_access_other_db.cer'
WITH PRIVATE KEY (FILE = 'C:\Certificates\cert_access_other_db.pvk',
DECRYPTION BY PASSWORD = '3ncRyptKeyPa$$word', /*The password used to create the private key */
ENCRYPTION BY PASSWORD = 'D3cryptKeyPa$$word');
GO

/* 
Execute as a sysadmin - works - this is my user
*/
EXEC [dbo].[access_other_db];

GO


/*
Now execute as the restricted user 
*/
EXECUTE AS LOGIN = 'restricted_login';

GO

/*
This will fail.
*/
EXEC [dbo].[access_other_db];

GO

/*
Back to the sysadmin level
*/
Revert;

GO

/*
Sign the procedure 
*/
ADD SIGNATURE TO dbo.access_other_db
BY CERTIFICATE cert_access_other_db WITH Password = 'D3cryptKeyPa$$word'
GO

/*
Now execute as the restricted user 
*/
EXECUTE AS LOGIN = 'restricted_login';

GO

/*
This will now work.
*/
EXEC [dbo].[access_other_db];

GO

/*
Back to the sysadmin level
*/
Revert;

GO

/*
Be sure to delete the certificate and
private key when done.
*/
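
A cleanup sketch along these lines should work (my own sketch; adjust object names and file paths to your environment):

USE AdventureWorks;
GO
DROP SIGNATURE FROM dbo.access_other_db BY CERTIFICATE cert_access_other_db;
DROP PROCEDURE dbo.access_other_db;
DROP USER restricted_user;
DROP CERTIFICATE cert_access_other_db;
GO

USE Northwind;
GO
DROP USER certificate_user;
DROP CERTIFICATE cert_access_other_db;
GO

DROP LOGIN restricted_login;
GO
/*
Also delete cert_access_other_db.cer and cert_access_other_db.pvk
from C:\Certificates on disk.
*/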

Summary

As you can see from this post and the previous post (or two), the SQL Server team has given you some good options when it comes to using stored procedures for data access and manipulation while still maintaining security.  You can use EXECUTE AS to allow cross-schema or within-database access, and you can use module signing to allow access to system objects or cross-database queries without specifically granting users access to the objects.

Resources

I originally planned and started this post because I had not found anything on the subject outside of Books Online.  I have since found a few resources that cover the material as well, and I used each of them to help me write this post.
  • I linked to Laurentiu Cristofor’s post earlier
  • Erland Sommarskog has an excellent write-up, Giving Permissions through Stored Procedures, which handles this subject very thoroughly, as you would expect from Erland.  Erland's post helped me get past the fact that I needed to use a private key in order to get the cross-database piece working, as none of my other resources did this.
  • Jonathan Kehayias also answered this forum post with an example.
In reality I could have just posted these links, as these other folks covered the subject of signing thoroughly, but I decided that since I did the research and wanted to try the code myself, I'd share my experiences as well, crediting these guys for their work and hopefully sending them some traffic, however limited, from this blog.

Finally, all of MY code from this post can be downloaded from here.

Monday, November 2, 2009

Follow Up to Maintaining Security and Performance Using Stored Procedures Part I – Using Execute As

This is a follow-up to Maintaining Security and Performance using Stored Procedures Part I – Using EXECUTE AS, prompted by a comment on that post by Ken Lee on the SQLServerCentral syndicated feed.  He commented:

Like the technique, not the example. ISNULL() is a function, it forces the where clause to look at every record in the table.
SQL is smart enough to identify constants and know the variable name is true or not and will or will not evaluate the second OR statement used below.
First SET @LastName=@LastName+N'%', if it's null it remains null. To make sure it always works declare a nvarchar(51) field and assign it instead. If the field in the table is max 50 characters then this isn't needed.
Replace the function in the first example with "(@LastName IS NULL OR LastName Like @LastName) And" logic and you should get the performance without the dynamic SQL.
Kind of curious about the stats.
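
For reference, here is a rough sketch of the pattern Ken describes, applied to the procedure from Part I (my reconstruction, not his exact code):

SET @LastName = @LastName + N'%';   -- NULL stays NULL
SET @FirstName = @FirstName + N'%';

SELECT
Title, FirstName, MiddleName, LastName, Suffix, Phone
FROM
Person.Contact
WHERE
(@LastName IS NULL OR LastName LIKE @LastName) AND
(@FirstName IS NULL OR FirstName LIKE @FirstName);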


Since the purpose of that post was to show how to get better performance using Dynamic SQL without compromising security, I decided I needed to do some testing of his method.  Here are the queries I ran (DBCC FREEPROCCACHE was run because the Dynamic SQL query plans were not removed from the cache when the stored procedure was altered):

DBCC FREEPROCCACHE();
GO

EXEC dbo.FindPhoneByName @FirstName = 'J', @LastName = 'A';
GO

EXEC dbo.FindPhoneByName @FirstName = 'J';
GO

EXEC dbo.FindPhoneByName @LastName = 'A';
GO

I don't know why, but his method DOES generate a different execution plan than my method using ISNULL() when you leave Person.Contact in what I believe is its original state.  In the original state the optimizer chooses to do a scan of the index IX_Contact_MiddleName and a bookmark lookup on the clustered index.  Here's the index definition:

CREATE NONCLUSTERED INDEX [IX_Contact_MiddleName] ON [Person].[Contact]
(
    [MiddleName] ASC
)
INCLUDE ([FirstName], [LastName])

The optimizer chose a clustered index scan for Ken's query.  This access path required more reads when both parameters were provided, but fewer when only one was provided.

The Dynamic SQL, however, provided the best of both, using the index when both parameters were provided and using the clustered index when only one parameter was provided.

Intrigued, I decided to see what happened if a covering index was made available, so I created this index:

CREATE NONCLUSTERED INDEX [IX_Contact_LastName_FirstName] ON [Person].[Contact]
(
    [LastName] ASC,
    [FirstName] ASC,
    [MiddleName] ASC
)
INCLUDE ([Title], [Suffix], [Phone])


With a covering index in place, both non-dynamic solutions produced the same execution plan: scans of the newly created covering index.  The Dynamic SQL, though, had seeks on the covering index for queries 1 and 3, and a scan for query 2.


Lastly, I decided to see what happened when there was a non-covering index available; here's that index:


CREATE NONCLUSTERED INDEX [IX_Contact_LastName_FirstName] ON [Person].[Contact] 
(
[LastName] ASC,
[FirstName] ASC,
[MiddleName] ASC
)






This case was similar to the first test: my solution did a scan on the index with a bookmark lookup, and Ken's solution did a clustered index scan.  The dynamic SQL did an index seek/bookmark lookup when both parameters were provided and a clustered index scan when only one parameter was provided.





Here are the results from STATISTICS IO:

            IsNull() Solution    Ken's Solution     Dynamic SQL
            Scans    Reads       Scans    Reads     Scans    Reads
Query 1
  Original  1        593         1        1116      1        593
  Covered   1        201         1        201       1        13
  Uncovered 1        457         1        1116      1        359
Query 2
  Original  1        7783        1        1116      1        1116
  Covered   1        201         1        201       1        201
  Uncovered 1        7647        1        1116      1        1116
Query 3
  Original  1        3031        1        1116      1        1116
  Covered   1        201         1        201       1        13
  Uncovered 1        2895        1        1116      1        1116

Conclusion

As you can see from the results above, Ken's solution provides more consistent results than my original solution, while the Dynamic SQL still provides the best overall performance.  I should note that there is a tradeoff with the dynamic SQL solution: you get better plans because you get a plan for each parameter combination that is run, so depending on the number of parameter combinations you could get procedure cache bloat.  In most 64-bit implementations this shouldn't be an issue, but you may see more compiles with a dynamic solution.  As always, test your options, monitor, and change as needed to keep your systems running as well as possible.
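
If you want a rough check on whether dynamic SQL is bloating your plan cache, a query along these lines against the plan-cache DMVs can help (a sketch of my own, not from the original post):

SELECT COUNT(*) AS cached_plans
FROM sys.dm_exec_cached_plans AS CP
CROSS APPLY sys.dm_exec_sql_text(CP.plan_handle) AS ST
WHERE ST.text LIKE '%Person.Contact%'          -- fragment from the dynamic SQL
AND ST.text NOT LIKE '%dm_exec_cached_plans%'; -- exclude this query itself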

All the files (except the AdventureWorks database, which is on CodePlex) can be found here.  Included are the queries, the results from STATISTICS IO, and a trace file showing the activity and query plans.

Monday, October 26, 2009

Book Review: Murach’s JavaScript and DOM Scripting by Ray Harris

In early September I received a complimentary copy of Murach's JavaScript and DOM Scripting by Ray Harris (Amazon) to review.  I got the book because my friend Andy Warren passed my name along to the publisher when they asked him to review it.  He knew I was attempting to learn JavaScript, so he passed this opportunity along to me.

This is the first Murach book I have read, and it definitely has an interesting format: the left page is text, and the right page is code, examples, and a summary of key points.  It took a couple of chapters to get used to the format, but once I did I found it to be very helpful.  For some of the early chapters I was able to just skim the right page to pick up the concepts, as the content was already familiar to me.

The book is broken down into four sections:

  1. Introduction to JavaScript Programming
  2. JavaScript Essentials
  3. DOM Scripting
  4. Other JavaScript Skills

In the first section, Introduction to JavaScript Programming, you get the basics of web development, including XHTML, CSS, and beginning JavaScript.  As an inexperienced web programmer I found this information invaluable.  I've always been confused by CSS; this book helped me gain a basic understanding, so I can now read, understand, and intelligently edit CSS.  An experienced web developer will be able to skim or skip much of this section, but as a relative newbie, I ate it all up.

In section two, JavaScript Essentials, you delve deeper into JavaScript functionality: getting input and displaying output, working with native objects, control statements, arrays, functions, objects, regular expressions, exception handling, and data validation.  Some of the topics in this section, like control statements, are common to other languages, so I was able to skim parts of this section and just use the examples and summary on the right-hand page.

In section three, DOM Scripting, you get into the deeper topics.  This is where you really get to hone your skills and take advantage of the power of the DOM and scripting.  You learn to manipulate the DOM and CSS, and to build reusable libraries for doing so.  You get understandable examples and exercises that lead you through the concepts and help you build working examples of a slide show, table manipulation, and animations.  From here to the end of the book, I would imagine that even experienced JavaScript developers would learn something.

In section four, Other JavaScript Skills, you learn how to manipulate the browser and leverage existing JavaScript libraries like jQuery to extend your applications.

The book was easy to read, and the examples and exercises were good.  Being a Microsoft developer, I was a little disappointed that the book did not really talk about using JavaScript with Visual Studio, but I guess that is to be expected.  I also didn't think enough time was spent on showing how to deal with the situation where a user has disabled scripting in their browser.  The last thing was that I found the instructions in some of the exercises too vague, and while I finished them in a manner that worked, I wasn't sure if I had done them correctly.  The finished scripts were included in the download, but I would have liked to see them in the book as an appendix.

So I did have a couple of issues, but overall it is a really good book.  I think it works for everyone from beginners to experienced web developers; the more you know, the more you can "mine" the parts of the book that address your weak areas.  I'd recommend the book and would buy other Murach books based on my experience with this one.

Tuesday, October 6, 2009

Index on a Foreign Key?

About two weeks ago, I had a discussion on Twitter and via email with Jeremiah Peschka (@peschkaj) about placing indexes on foreign key columns.  I made the comment that while indexing a foreign key is not required, I thought foreign keys made good index candidates.  I also shared this blog post by Greg Low with him.

Jeremiah mentioned that he had never had SQL Server recommend an index on a foreign key, which I thought was strange, so I decided to run some tests.  I ran the tests against my copy of AdventureWorks, to which I have added data.  Here is the query I ran:

SELECT
SOD.SalesOrderDetailID,
SOD.OrderQty,
SOD.ProductID,
SOD.UnitPrice,
SOD.UnitPriceDiscount,
SOD.LineTotal,
SOD.UnitPrice2,
SOH.SalesOrderID,
SOH.RevisionNumber,
SOH.OrderDate,
SOH.DueDate,
SOH.ShipDate,
SOH.Status,
SOH.SalesOrderNumber,
SOH.PurchaseOrderNumber,
SOH.AccountNumber,
SOH.CustomerID,
SOH.SubTotal,
SOH.TaxAmt,
SOH.Freight,
SOH.TotalDue,
SOH.Comment
FROM
Sales.SalesOrderHeader AS SOH JOIN
Sales.SalesOrderDetail AS SOD ON
SOH.SalesOrderID = SOD.SalesOrderID
WHERE
SOH.OrderDate >= '12/1/2007'


In my copy of AdventureWorks I have 131,465 rows in SalesOrderHeader and 1,121,317 rows in SalesOrderDetail.  The MAX(SalesOrderHeader.OrderDate) is 12/30/2007, so I just queried for the last month (1095 rows).  In this query SalesOrderDetail.SalesOrderID is a foreign key referencing SalesOrderHeader.SalesOrderID, and before modification the clustered primary key on SalesOrderDetail is on (SalesOrderID, SalesOrderDetailID).  Because SalesOrderID is the lead column in the clustering key, the optimizer uses a clustered index seek to access SalesOrderDetail, as shown here:



[execution plan screenshot]

Now, to see if the optimizer would recommend an index on SalesOrderDetail.SalesOrderID, I altered the clustered primary key on SalesOrderDetail, removing SalesOrderID, with the following code:



USE [AdventureWorks]
GO

IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[Sales].[SalesOrderDetail]') AND name = N'PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID')
ALTER TABLE [Sales].[SalesOrderDetail] DROP CONSTRAINT [PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID]
GO

USE [AdventureWorks]
GO

ALTER TABLE [Sales].[SalesOrderDetail] ADD CONSTRAINT [PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID] PRIMARY KEY CLUSTERED
(
[SalesOrderDetailID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO



I then re-ran my first query, and now you can see that there is a clustered index scan, a hash match join, and, in green at the top, an index recommended on SalesOrderDetail.SalesOrderID:



[execution plan screenshot with missing index recommendation]

Next I added the suggested index (remember that you should not just add indexes to production without checking other queries as well):



USE [AdventureWorks]
GO
CREATE NONCLUSTERED INDEX
SalesOrderDetails_SalesOrderId
ON [Sales].[SalesOrderDetail] ([SalesOrderID])
INCLUDE ([SalesOrderDetailID],[OrderQty],
[ProductID],[UnitPrice],
[UnitPriceDiscount],[LineTotal],
[UnitPrice2])
GO



Then I re-ran my query, and here’s the execution plan:



[execution plan screenshot]



Here are the Statistics IO results from the 3 queries as well:



Results with SalesOrderID as lead column in Primary Key (Original State)



Table 'SalesOrderDetail'. Scan count 147, logical reads 516, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'SalesOrderHeader'. Scan count 1, logical reads 461, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.



Results with SalesOrderID removed from Primary Key and not added as an index.



Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'SalesOrderDetail'. Scan count 1, logical reads 12584, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


Table 'SalesOrderHeader'. Scan count 1, logical reads 461, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.



Results with the suggested index on SalesOrderID.



Table 'SalesOrderDetail'. Scan count 147, logical reads 504, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'SalesOrderHeader'. Scan count 1, logical reads 461, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.



Conclusion



In this case you can see that adding an index on the foreign key, whether as the lead column in the clustered primary key or as a non-clustered index, gives much improved performance, particularly regarding reads, and that the non-clustered index actually provides a slightly better access path than being part of the clustered primary key.  Certainly you need to test for your individual situations, but I would say that the majority of the time a foreign key will benefit from an index, since it is used as a filter in the JOIN criteria.
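
If you want to find foreign key columns in a database that are not the lead column of any index, a query along these lines against the catalog views can serve as a starting point (a sketch of my own; it considers each foreign key column individually):

SELECT
OBJECT_NAME(FKC.parent_object_id) AS table_name,
COL_NAME(FKC.parent_object_id, FKC.parent_column_id) AS fk_column
FROM
sys.foreign_key_columns AS FKC
WHERE NOT EXISTS
(SELECT 1
FROM sys.index_columns AS IC
WHERE IC.object_id = FKC.parent_object_id
AND IC.column_id = FKC.parent_column_id
AND IC.key_ordinal = 1); -- the column leads at least one index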



AdventureWorks can be downloaded from CodePlex.  Execution plans and T-SQL can be downloaded from here.

Wednesday, September 30, 2009

On vs. Where

Does it matter if you put your criteria in the ON clause or the WHERE clause?  Well, as with most things SQL, the answer is, "It depends".  If you are dealing with INNER JOINs then it really doesn't matter, because the query optimizer is smart enough to come up with the same execution plan for both.  For example, these two queries evaluate to the same execution plan:

SELECT
    SOD.SalesOrderDetailID,
    SOH.SalesOrderID
FROM
    Sales.SalesOrderHeader AS SOH JOIN
    Sales.SalesOrderDetail AS SOD ON
        SOH.SalesOrderID = SOD.SalesOrderID
WHERE
    SOH.OrderDate >= '7/1/2004' AND
    SOH.OrderDate < '8/1/2004'

SELECT
    SOD.SalesOrderDetailID,
    SOH.SalesOrderID
FROM
    Sales.SalesOrderHeader AS SOH,
    Sales.SalesOrderDetail AS SOD
WHERE
    SOH.SalesOrderID = SOD.SalesOrderID AND
    SOH.OrderDate >= '7/1/2004' AND
    SOH.OrderDate < '8/1/2004'



[execution plan screenshot]



The old SQL Server 6.5 OUTER JOIN syntax (*= and =*) has been discontinued beginning with SQL Server 2005, so for OUTER JOINs you have to put the join criteria in the ON clause, as demonstrated in this code:



SELECT
    SOD.SalesOrderDetailID,
    SOH.SalesOrderID
FROM
    Sales.SalesOrderHeader AS SOH LEFT JOIN
    Sales.SalesOrderDetail AS SOD ON
        SOH.SalesOrderID = SOD.SalesOrderID
WHERE
    SOH.OrderDate >= '7/1/2004' AND
    SOH.OrderDate < '8/1/2004'





Now let’s create a sandbox to play in.



IF OBJECT_ID('sales.orders', 'U') IS NOT NULL
BEGIN
    DROP TABLE sales.orders;
END;

IF OBJECT_ID('sales.order_items', 'U') IS NOT NULL
BEGIN
    DROP TABLE sales.order_items;
END;

IF SCHEMA_ID('sales') IS NOT NULL
BEGIN
    DROP SCHEMA sales;
END;

GO

CREATE SCHEMA sales;

GO
/*
Tables to hold sample data
*/
CREATE TABLE sales.orders
(
    order_id INT IDENTITY(1,1) PRIMARY KEY,
    customer_id INT
);

CREATE TABLE sales.order_items
(
    order_detail_id INT IDENTITY(1,1) PRIMARY KEY,
    order_id INT,
    product_id INT,
    quantity INT
);

/*
Load sample data
*/
INSERT INTO sales.orders (customer_id)
SELECT TOP 5
    AO.[object_id]
FROM
    sys.all_objects AS AO;

INSERT INTO sales.order_items
(
    order_id,
    product_id,
    quantity
)
SELECT 1, 1, 7  UNION ALL
SELECT 2, 1, 4  UNION ALL
SELECT 3, 2, 6  UNION ALL
SELECT 4, 2, 11 UNION ALL
SELECT 5, 3, 1;



Now we want to return all the customers who have placed an order, but only the order items where the quantity is greater than 5.  Here is method 1:


Select
O.customer_id,
OI.order_id,
OI.product_id,
OI.quantity
From
sales.orders AS O LEFT OUTER JOIN
sales.order_items AS OI ON
O.order_id = OI.order_id
Where
OI.quantity > 5;



This returns:



customer_id order_id    product_id  quantity
----------- ----------- ----------- -----------
3           1           1           7
7           3           2           6
8           4           2           11


Hmmm, we know we have orders from five customers, but this only returns three rows.  Let's look at the execution plan:



[execution plan screenshot]



What's that Nested Loops (Inner Join) operator?  Well, by putting the criteria for the RIGHT (second) table in the WHERE clause, we essentially converted our LEFT OUTER JOIN to an INNER JOIN.  The correct way to get the data we want is this:



SELECT
    O.customer_id,
    OI.order_id,
    OI.product_id,
    OI.quantity
FROM
    sales.orders AS O LEFT OUTER JOIN
    sales.order_items AS OI ON
        O.order_id = OI.order_id AND
        OI.quantity > 5;



This returns what we would expect to see:





customer_id order_id    product_id  quantity
----------- ----------- ----------- -----------
3           1           1           7
5           NULL        NULL        NULL
7           3           2           6
8           4           2           11
17          NULL        NULL        NULL



And here is the execution plan:




[execution plan screenshot]



Here you can see the Nested Loops (Left Outer Join) operator.




So yes, it does matter where you put your criteria when dealing with OUTER JOINs.

Wednesday, September 23, 2009

Venturing into Powershell

As you probably know if you read my blog regularly, I recently put together a presentation called Dive into the Default Trace, and as part of that presentation I wanted to provide examples of different ways to archive the Default Trace files.  Here are the options I considered:

  1. fn_trace_gettable to load the data into a table.
  2. SSIS to archive the files.
  3. PowerShell to archive the files.
  4. xp_cmdshell and the copy command.

I immediately eliminated #4 because I don't have xp_cmdshell enabled, and because Chad Miller (@cmille19) had said in his PowerShell talk that anything that requires xp_cmdshell is a good candidate for PowerShell.  So I sent Chad an email asking him where to start, and he was kind enough to help out.  I got a "working" script together and sent it off to Chad for review; he kindly pointed out my errors (there were several) and included a modified script that cleaned up my code.  I did make a couple of minor modifications after receiving the script from Chad.  The idea and starting point were mine, even if much of the finished product is Chad's.  So here's the code (please comment if you have any improvements):

# Written with much assistance from Chad Miller
# (http://chadwickmiller.spaces.live.com/Blog)
param ($serverName)

#######################
function Get-SqlData
{
    param([string]$serverName=$(throw 'serverName is required.'),
          [string]$databaseName=$(throw 'databaseName is required.'),
          [string]$query=$(throw 'query is required.'))

    Write-Verbose "Get-SqlData serverName:$serverName databaseName:$databaseName query:$query"

    $connString = "Server=$serverName;Database=$databaseName;Integrated Security=SSPI;"
    $da = New-Object "System.Data.SqlClient.SqlDataAdapter" ($query,$connString)
    $dt = New-Object "System.Data.DataTable"
    [void]$da.fill($dt)
    $dt
} #Get-SqlData

#######################
function Get-TraceInfo
{
    param($serverName)

    $qry = @"
SELECT
    RIGHT(T.path, CHARINDEX('\', REVERSE(T.path)) - 1) AS current_file,
    SUBSTRING(T.path, 1, LEN(T.path) - CHARINDEX('\', REVERSE(T.path)) + 1) AS trace_path
FROM
    sys.traces T
WHERE
    T.is_default = 1
"@

    Get-SqlData $serverName 'master' $qry
} #Get-TraceInfo

#######################
# MAIN
#######################

if ($serverName -eq $null)
{
    $serverName = $env:COMPUTERNAME
}

$result = Get-TraceInfo $serverName
$copy_path = $result.trace_path + 'Archive'

if ((Test-Path -Path $copy_path) -eq $false)
{
    New-Item -Path $result.trace_path -Name 'Archive' -ItemType directory
}

Get-ChildItem -Path $result.trace_path -Exclude $result.current_file |
    where {$_.name -like "*.trc"} | Move-Item -Destination $copy_path

Once I knew what cmdlets I needed, PowerShell was probably the second easiest option to implement.  I know SSIS fairly well, and the SSIS version got a bit complex because of having to exclude the file currently in use.  Using fn_trace_gettable is simple because it lets me use T-SQL, in which I have many years of experience.
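
For comparison, here is a minimal sketch of the fn_trace_gettable option (the archive table name and trace file path are made up for illustration):

/*
Load the default trace files into a new archive table.
When DEFAULT is passed for number_files, fn_trace_gettable
reads the named file plus its rollover files.
*/
SELECT TGT.*
INTO dbo.default_trace_archive
FROM sys.fn_trace_gettable(
'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\log.trc',
DEFAULT) AS TGT;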



Using PowerShell is really a matter of knowing what cmdlets you need; once you have that, the built-in help (Get-Help) is very good, so you can do just about anything.  PowerShell is very powerful and flexible.  It can actually do so much that I think you need to be careful.  For example, the New-Item cmdlet can create just about any item depending on the ItemType you use.



So while I definitely recommend that any DBA learn some PowerShell, I also recommend being very careful.  Test, Test, Test!

Friday, July 24, 2009

Learning some javascript

As part of my work I need to be a “Jack of all trades” meaning I not only need to be able to do SQL Server DBA and development tasks, but I also need to be capable of doing .NET and VFP development.  Yes, I did say VFP (Visual FoxPro for those who are too young to know what it is). 

The major project in which I am involved is using SQL Server and ASP.NET (C#) MVC.  Fortunately I had downloaded Scott Guthrie’s NerdDinner tutorial when it first came out and have just finished going through it.  In the tutorial I came across some code like this:

if (item.Description !== undefined) {
description = item.Description;
}

Well, I had never seen the !== operator before and assumed that it was a typo and was supposed to be:

if (item.Description != undefined) {
description = item.Description;
}



since I know that == is the equality comparison operator and != is the inequality comparison operator.  Well, then I found the same operator later in the code, so I thought, "Hmmm, maybe this isn't a typo, but a real operator?", so off to www.google-vs-bing.com to see what I could find.  Well, I didn't find anything helpful on page 1 of either search engine's results.  Next, a quick question on Twitter.  Ahhh, several replies that, "Yes, !== and === are operators, and they compare type AND value".  Having always programmed in strongly typed languages, this is new and interesting to me.  So basically in JavaScript 1 == "1" is true, but 1 === "1" is false.



I still would rather be in SQL, but at least now I am starting to understand more about JavaScript.

Saturday, May 9, 2009

An Explanation of Why to Use Stored Procedures

I wish I was as smart as the guys I interact with on sites like SQLServerCentral and Twitter.  Paul Nielsen (@PaulNielsen on Twitter) has an excellent blog post today on Why Use Stored Procedures that I think is a must-read for everyone involved with SQL Server.  I am a huge advocate of stored procedures, and Paul does a much better job explaining the whys than I ever could.

So thanks, Paul; now I have someplace to point the naysayers.

Wednesday, May 6, 2009

Software Developers, Remember that People Need to Use Your Software

As a hybrid DBA\Developer I had the message of the title reinforced through a negative experience with a web site this past week.  Please bear with me as I tell a fairly long story/rant to illustrate my point.

The web site I had an issue with is www.sunpass.com.  SunPass is Florida's prepaid toll system, and they provide a web site to manage your account.  There are two devices you can purchase: a portable transponder, and a SunPass Mini Sticker transponder, which is applied to your windshield and is only good for one vehicle.  My wife and I purchased a SunPass Mini, and I went to the web site, activated it, and set up my account so that it would auto-replenish using one of my credit cards when the balance dipped below a certain threshold.

Well, last week we traded in the vehicle that had the Mini in it, so we bought a new Mini for our new vehicle.  This is when the trouble started.  I went to the SunPass web site to activate the new Mini.  I logged into my account, and nowhere on my account home page was there a link to activate a new transponder or remove the old one.  I thought, "Well, the package says I can activate it online, and I activated the first one online, so I know it has to be here somewhere."  So I double-checked, and nope, not there.  I went back to the SunPass home page and, lo and behold, the "Existing Customers activate a New Transponder" link was on that page instead of my account home page.  Alright, that's not the most intuitive interface, and I spent about 10 minutes finding the link, but at least I found it.  So I clicked on the link, and here is what happened:

[SunPass error message screenshot]

Wait a minute, I don't have a commercial account; what's going on?  So I looked up the customer agreement, and here is the definition of a commercial account:

5.2 Commercial Accounts: Primarily for companies or businesses with corporate owned, leased or rented vehicles and/or trailers. A current Federal Employer Identification Number (FEIN) must be provided to open an account of this type. FEIN information must be kept current and may be requested by a SunPass® representative at any time. The minimum opening balance for a Commercial Account is dependent on the User’s estimated monthly usage, but must be a minimum of $50.00. User shall be notified of any changes to the required replenishment amount on their monthly Account Summary Statement. For Commercial accounts to qualify for tax exempt status, User must provide a valid tax exemption certificate at the time of account activation.

Nope, I didn’t provide any of this information when I setup my account so, why am I getting this message?  I found the contact us link and sent them a message asking why I was getting the message.  As expected I got a non-response response:

Please call customer service

The next day I got a letter from SunPass telling me that I had a negative balance and needed to call them to pay up.  Now I'm not only confused, but I'm getting a bit ticked off; I mean, I did set up my account to auto-replenish, why isn't it happening?  Back to the web site to make sure auto-replenish is set up; yup, it is.  So I called customer service to deal with the issues.  Here are the answers I got:

  1. Can’t add transponder because I have a commercial account.
      The reality is that when they added the new Minis they could not treat them like regular transponders (why, I don't know), so if you open your account with a Mini your account is treated like a commercial account.  So you have to call to activate any new Minis on your account.  I asked, "How was I supposed to know that?" and the answer is you have to call.  So I suggested that they fix the message to tell me the REAL reason the web site doesn't do what I asked.
  2. Can’t delete the old transponder.
      Again, because it is a Mini it is treated differently, and they assume you will disable it by taking it off the windshield, so you never need to disable the transponder on the account.  I asked, "How was I supposed to know that?" and the answer is you have to call.
  3. Auto-replenish isn’t working.
      Apparently the system had a "glitch," so while the web site is showing that auto-replenish is set up, it really never was.  I asked, "How was I supposed to know that?" and the answer is you have to call.

So now to the point of the post: what could the developers/designers have done differently to provide a better user experience?

  1. Put account-related activities on the account home page.  An "add transponder" link on my account home page, in addition to the main home page, would have made life much simpler and more intuitive.
  2. Provide a message that conveys the real reason functionality is not available.  Tell me that you cannot add a Mini to an existing account, not that I have a Commercial account.  In my opinion, the real answer is to not provide the functionality at all: if account type = Commercial, do not show the Add Transponder link.
  3. Catch errors and tell the user that their efforts were unsuccessful.  If something goes wrong in setting up auto-replenishment tell me!  If it happens in a batch at the end of the day, send me an email telling me it failed.  Don’t wait to send me a letter when I’m already in the hole.

Please keep the end user in mind when designing and developing your applications.  Don’t frustrate your customers, whether internal or external, with poor design.

Tuesday, April 21, 2009

JumpstartTV Featured Content

My video, Using fn_trace_gettable to Import Trace Results, is the featured content on JumpstartTV today.  Check it out.  JumpstartTV is a site that features short (< 5 min) how-to videos focused on SQL Server.

Friday, April 3, 2009

Error Handling in SSIS re-featured on SQLServerCentral

I am honored that my first ever published article, Error Handling in SSIS, was re-featured on SQLServerCentral today.  I don't know how Steve selects articles to feature a second time, but I assume it is because they were popular.  Here are a couple of earlier blog posts about this article/subject:

Error Handling in SSIS Published

SSIS Error Logging Custom Component

Wednesday, March 25, 2009

What’s ownership chaining?

Edit: 04-27-2009 K. Brian Kelley correctly pointed out in a comment that objects can still be owned by users in 2005 and later.  See his blog post to see how.

Edit:  While the code section is not as nice as I would like, it is better.  Thanks for sticking with me.

I recently answered a question on SSC about how granting execute rights on a stored procedure allows you to access tables without having any permissions granted on the tables.  The way SQL Server implements this is through ownership chaining.  Here is the BOL (SQL Server 2005) definition of Ownership Chains:
When multiple database objects access each other sequentially, the sequence is known as a chain. Although such chains do not independently exist, when SQL Server 2005 traverses the links in a chain, SQL Server evaluates permissions on the constituent objects differently than it would if it were accessing the objects separately. These differences have important implications for managing security.
Ownership chaining enables managing access to multiple objects, such as multiple tables, by setting permissions on one object, such as a view. Ownership chaining also offers a slight performance advantage in scenarios that allow for skipping permission checks.

How Permissions Are Checked in a Chain

When an object is accessed through a chain, SQL Server first compares the owner of the object to the owner of the calling object. This is the previous link in the chain. If both objects have the same owner, permissions on the referenced object are not evaluated.
Prior to SQL Server 2005, objects were owned by users, and, in most cases, designers and developers used the dbo user to own common objects.  In my experience it was very rare for objects to be owned by a "real" user.  Thus if the customers table and orders table are both owned by dbo, you can create a customer_orders_get stored procedure owned by dbo, grant user Joe execute permissions on the stored procedure, and Joe will be able to successfully use the stored procedure to query the customers and orders tables without having select permissions on the tables.  This is pretty easy to understand: as long as all the objects are dbo.object_name, I can access them using ownership chaining.
SQL Server 2005 introduced schemas, which removed the use of users as object owners.  Now objects belong to schemas, not users.  From BOL:
Beginning in SQL Server 2005, each object belongs to a database schema. A database schema is a distinct namespace that is separate from a database user. You can think of a schema as a container of objects. Schemas can be created and altered in a database, and users can be granted access to a schema. A schema can be owned by any user, and schema ownership is transferable.
The question now is, how does ownership chaining work when using schemas?  As the last line of the BOL quote states, a schema is owned as well, and ownership chaining in SQL Server 2005 and greater is based on schema ownership.  So if schema A is owned by dbo then you would be able to use ownership chaining to access objects in dbo from schema A and vice versa.  I think the best way to explain is with an example:
USE [master]
GO

-- drop the DB if it exists
IF EXISTS ( SELECT
name
FROM
sys.databases
WHERE
name = N'oc_test' ) 
DROP DATABASE [oc_test]
GO

USE [master]
GO

-- create the test database
CREATE DATABASE [oc_test] ON PRIMARY 
( 
NAME = N'oc_test', 
FILENAME = N'C:\Data\oc_test.mdf' , 
SIZE = 194560KB, 
MAXSIZE = 1024000KB, 
FILEGROWTH = 10240KB 
)
LOG ON 
( 
NAME = N'oc_test_log', 
FILENAME = N'C:\Logs\oc_test_log.ldf', 
SIZE = 204800KB, 
MAXSIZE = 204800KB, 
FILEGROWTH = 10240KB 
)
GO

-- switch to the new database
USE oc_test ; 

GO

-- drop the user if it exists
IF EXISTS ( SELECT
1
FROM
sys.database_principals AS DP
WHERE
DP.[name] = 'test_user' ) 
BEGIN
DROP USER test_user ;
END

-- create our test user    
CREATE USER test_user WITHOUT LOGIN 
WITH DEFAULT_SCHEMA = dbo ;
GO

-- drop proc and schema
IF OBJECT_ID('ownership_chaining.table_one_ins') 
IS NOT NULL 
BEGIN
DROP PROCEDURE ownership_chaining.table_one_ins ;
END

GO

IF SCHEMA_ID('ownership_chaining') IS NOT NULL 
BEGIN
DROP SCHEMA ownership_chaining ;
END
GO

-- create our test schema, notice it is "owned" by dbo
CREATE SCHEMA ownership_chaining AUTHORIZATION dbo ;

GO

-- drop our test table if it exists
IF OBJECT_ID('dbo.table_one') IS NOT NULL 
BEGIN
DROP TABLE dbo.table_one ;
END

/*
create a test table NOTICE it is in the dbo 
schema which is owned by dbo
*/
CREATE TABLE dbo.table_one
(
column_one INT IDENTITY(1, 1),
column_two VARCHAR(10) NOT NULL
) ;

GO

/* 
Create the test procedure in the 
ownership_chaining schema which 
is owned by dbo
*/    
CREATE PROCEDURE ownership_chaining.table_one_ins
@column_two VARCHAR(10)
AS 
INSERT INTO
dbo.table_one (column_two)
VALUES
(@column_two) ;

RETURN ;

GO

/*
give our test user execute permissions on the
procedure
*/
GRANT EXECUTE ON ownership_chaining.table_one_ins 
TO test_user ;

GO

-- change user context
EXECUTE AS USER = 'test_user' ;

-- execute the procedure
EXEC ownership_chaining.table_one_ins 
@column_two = 'Test 1' ;

-- go back to my login
Revert ;

-- verify that it worked which it should
SELECT
*
FROM
dbo.table_one AS _TO ;

/*
Now change the owner of the ownership_chaining 
schema so that it is no longer owned by dbo 
thus breaking the chain
*/
ALTER AUTHORIZATION ON SCHEMA::ownership_chaining 
TO db_owner ;

-- change user context
EXECUTE AS USER = 'test_user' ;

/* 
using try - catch so I get the revert as 
I expect failure
*/
BEGIN TRY 
EXEC ownership_chaining.table_one_ins 
@column_two = 'Test 1' ;
END TRY 
BEGIN CATCH
-- show the error message
SELECT
ERROR_MESSAGE() AS error ;

-- return to my user
revert ;
END CATCH 

-- verify data
SELECT
*
FROM
dbo.table_one AS _TO ;

I hope this helps clear up what I found to be a somewhat confusing topic.