Thursday, May 13, 2010

Let’s Play, “Why did it do that?”

On my current project I decided to be flexible and allow the use of Linq to SQL to try to speed up development.  This will not be a large, heavily used database, so squeezing every drop of performance out of the database is not a big issue.  One of the reasons I agreed to it was so that I could see Linq to SQL in a real-world project and the queries generated.  Well, I saw my first “Why did it do that?” query.

Here’s a similar, simplified schema:

CREATE TABLE dbo.students
(
student_id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
last_name VARCHAR(20),
first_name VARCHAR(30),
birth_date smalldatetime
)
CREATE INDEX IX_last_name_first_name ON dbo.Students(last_name, first_name);

CREATE TABLE dbo.classes
(
class_id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
class_name VARCHAR(25)
)
CREATE INDEX IX_class_name ON dbo.classes(class_name);

CREATE TABLE dbo.student_classes
(
student_class_id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
student_id INT REFERENCES dbo.students(student_id),
class_id INT REFERENCES dbo.classes(class_id)
)
CREATE UNIQUE INDEX UX_student_id_class_id ON dbo.student_classes(student_id, class_id)



Here’s what Linq generated (modified to fit the new schema):



exec sp_executesql 
N'SELECT
[t0].[student_ID] AS [Id],
[t0].[First_Name] AS [FirstName],
[t0].[Last_Name] AS [LastName],
[t1].[student_class_ID] AS [Id2],
[t1].[class_id] AS [ClassId],
[t1].[class_name] as [ClassName]
(
SELECT
COUNT(*)
FROM
[dbo].[student_classes] AS [t3] INNER JOIN
[dbo].[classes] AS [t4] ON
[t3].[class_id] = [t4].[class_id]
WHERE
[t3].[Student_ID] = [t0].[Student_ID]
) AS [value]
FROM
[dbo].[students] AS [t0] LEFT OUTER JOIN
(
[dbo].[student_classes] AS [t1] INNER JOIN
[dbo].[classes] AS [t2] ON
[t1].[class_id] = [t2].[class_id]
) ON
[t1].[student_ID] = [t0].[student_ID]
WHERE
([t0].[Last_Name] LIKE @p0) AND
([t0].[First_Name] LIKE @p1)
ORDER BY
[t0].[student_ID],
[t1].[student_class_ID],
[t2].[class_id]'
,
N'@p0 varchar(8000),@p1 varchar(8000)',
@p0 = 'Abb%', @p1 = 'Abb%'

Note the correlated sub-query to return the count of classes by student. This is not in the Linq query in .NET, but is added by the framework because of the one-to-many relationship between students and student_classes. Basically the student object contains an EntitySet of student_class objects and it looks like L2S needs to know the number of objects in the list. If this is needed you can return it by replacing the correlated sub-query with a CTE or derived table. Here’s an example of what I’d write:

SELECT
S.student_id,
S.last_name,
S.first_name,
SC.student_class_id,
SC.class_id,
C.class_id,
C.class_name,
CC.class_count
FROM
dbo.students AS S LEFT JOIN
(
dbo.student_classes AS SC JOIN
dbo.classes AS C
ON SC.class_id = C.class_id
)
ON S.student_id = SC.student_id LEFT JOIN
(
SELECT
SC2.student_id,
COUNT(SC2.student_class_id) AS class_count
FROM
dbo.student_classes AS SC2
GROUP BY
SC2.student_id) CC
ON S.student_id = CC.student_id
WHERE
S.last_name LIKE 'Abb%' AND
S.first_name LIKE '%'
ORDER BY
S.student_id,
SC.student_class_id,
C.class_id

This eliminates the correlated sub-query and eliminates a second scan of the classes table.  I’m still trying to understand why Linq needs the count.  Any other suggestions?

No comments:

Post a Comment

So what do you think I am?