Up to now I haven't really bothered with how EF is translating my LINQ to SQL queries to raw SQL, because I really haven't had an issue with them. But recently I had one method that returns the data from the database execute for almost 2 minutes.
So here's the query:
var result = (from u in context.Users join a in context.Applications on u.UserID equals a.UserID join w in context.Windows on a.ApplicationID equals w.ApplicationID join l in context.Logs on w.WindowID equals l.WindowID where u.UserID == userID && l.DateCreated >= dateFrom && l.DateCreated <= dateTo group l by new { year = l.DateCreated.Year, month = l.DateCreated.Month, day = l.DateCreated.Day, name = a.Name } into g select g).ToList();
On the first look, there's nothing special about it, join 4 tables, filter the data and group by days. It's worth noting that Logs table had around 40k rows, other tables less than 10 each.
And here's the SQL query that EF produced :
SELECT [Project2].[C5] AS [C1], [Project2].[C1] AS [C2], [Project2].[C2] AS [C3], [Project2].[C3] AS [C4], [Project2].[C4] AS [C5], [Project2].[Name] AS [Name], [Project2].[C6] AS [C6], [Project2].[LogID] AS [LogID], [Project2].[WindowID] AS [WindowID], [Project2].[DateCreated] AS [DateCreated], [Project2].[Keystrokes] AS [Keystrokes], [Project2].[KeystrokesRaw] AS [KeystrokesRaw], [Project2].[DateEnded] AS [DateEnded], [Project2].[UsageID] AS [UsageID] FROM ( SELECT [Distinct1].[Name] AS [Name], [Distinct1].[C1] AS [C1], [Distinct1].[C2] AS [C2], [Distinct1].[C3] AS [C3], [Distinct1].[C4] AS [C4], 1 AS [C5], [Join4].[LogID] AS [LogID], [Join4].[WindowID1] AS [WindowID], [Join4].[DateCreated] AS [DateCreated], [Join4].[Keystrokes] AS [Keystrokes], [Join4].[KeystrokesRaw] AS [KeystrokesRaw], [Join4].[DateEnded] AS [DateEnded], [Join4].[UsageID] AS [UsageID], CASEWHEN ([Join4].[UserID] ISNULL) THENCAST(NULLASint) ELSE 1 ENDAS [C6] FROM (SELECTDISTINCT [Extent1].[Name] AS [Name], 1 AS [C1], DATEPART (year, [Extent3].[DateCreated]) AS [C2], DATEPART (month, [Extent3].[DateCreated]) AS [C3], DATEPART (day, [Extent3].[DateCreated]) AS [C4] FROM [Aplications] AS [Extent1] INNERJOIN [Windows] AS [Extent2] ON [Extent1].[ApplicationID] = [Extent2].[ApplicationID] INNERJOIN [Logs] AS [Extent3] ON [Extent2].[WindowID] = [Extent3].[WindowID] WHERE ([Extent1].[UserID] = @p__linq__0) AND ([Extent3].[DateCreated] >= @p__linq__1) AND ([Extent3].[DateCreated] <= @p__linq__2) ) AS [Distinct1] LEFTOUTERJOIN (SELECT [Extent4].[ApplicationID] AS [ApplicationID1], [Extent4].[Name] AS [Name], [Extent4].[FileName] AS [FileName], [Extent4].[Version] AS [Version], [Extent4].[Description] AS [Description], [Extent4].[Company] AS [Company], [Extent4].[UserID] AS [UserID], [Extent4].[WinName] AS [WinName], [Extent5].[WindowID] AS [WindowID2], [Extent5].[ApplicationID] AS [ApplicationID2], [Extent5].[Title] AS [Title], [Extent6].[LogID] AS [LogID], [Extent6].[WindowID] AS [WindowID1], [Extent6].[DateCreated] AS [DateCreated], [Extent6].[Keystrokes] AS [Keystrokes], [Extent6].[KeystrokesRaw] AS [KeystrokesRaw], [Extent6].[DateEnded] AS [DateEnded], [Extent6].[UsageID] AS [UsageID] FROM [Aplications] AS [Extent4] INNERJOIN [Windows] AS [Extent5] ON [Extent4].[ApplicationID] = [Extent5].[ApplicationID] INNERJOIN [Logs] AS [Extent6] ON [Extent5].[WindowID] = [Extent6].[WindowID] ) AS [Join4] ON ([Distinct1].[Name] = [Join4].[Name]) AND ([Join4].[UserID] = @p__linq__0) AND ([Join4].[DateCreated] >= @p__linq__1) AND ([Join4].[DateCreated] <= @p__linq__2) AND (([Distinct1].[C2] = (DATEPART (year, [Join4].[DateCreated]))) OR (([Distinct1].[C2] ISNULL) AND (DATEPART (year, [Join4].[DateCreated]) ISNULL))) AND (([Distinct1].[C3] = (DATEPART (month, [Join4].[DateCreated]))) OR (([Distinct1].[C3] ISNULL) AND (DATEPART (month, [Join4].[DateCreated]) ISNULL))) AND (([Distinct1].[C4] = (DATEPART (day, [Join4].[DateCreated]))) OR (([Distinct1].[C4] ISNULL) AND (DATEPART (day, [Join4].[DateCreated]) ISNULL))) ) AS [Project2] ORDERBY [Project2].[C1] ASC, [Project2].[C2] ASC, [Project2].[C3] ASC, [Project2].[C4] ASC, [Project2].[Name] ASC, [Project2].[C6] ASC
Whoa, no wonder it took almost 2 minutes to execute this , select distinct is not very efficient and when I'm writing raw SQL, I almost never use it.
So I fired up the LinqPad and tried to get the results I needed without the distincts:
select a.Name, l.DateCreated, l.DateEnded, filter.ayear, filter.amonth, filter.aday from Uzers u join Aplications a on u.UserID = a.UserID join Windows w on a.ApplicationID = w.ApplicationID join Logs l on w.WindowID = l.WindowID join (select a.Name as AppName , DATEPART (year, l.DateCreated) as AYear , DATEPART (month, l.DateCreated) as AMonth , DATEPART (day, l.DateCreated) as ADay from Uzers u join Aplications a on u.UserID = a.UserID join Windows w on a.ApplicationID = w.ApplicationID join Logs l on w.WindowID = l.WindowID where u.UserID = 1 and l.DateCreated > '01-01-2014'and l.DateCreated < '01-01-2015'groupby a.Name , DATEPART (year, l.DateCreated) , DATEPART (month, l.DateCreated) , DATEPART (day, l.DateCreated)) as filter on filter.AppName = a.name and filter.Ayear = DATEPART (year, l.DateCreated) and filter.amonth =DATEPART (month, l.DateCreated) and filter.aday = DATEPART (day, l.DateCreated ) where u.UserID = 1 orderby filter.ayear, filter.amonth, filter.aday
This took ~40secs, that is almost 3 times faster, but still not good enough.
I guess it could be optimized even further, but I decided to drop the grouping in the SQL and use it to only do initial filtering and group in code after.
var result = (from u in context.Users join a in context.Applications on u.UserID equals a.UserID join w in context.Windows on a.ApplicationID equals w.ApplicationID join l in context.Logs on w.WindowID equals l.WindowID where u.UserID == userID && l.DateCreated >= dateFrom && l.DateCreated <= dateTo orderby l.DateCreated select l).Include(l => l.Window.Application).ToList(); var grouped = (from l in result group l by new { year = l.DateCreated.Year, month = l.DateCreated.Month, day = l.DateCreated.Day, name = l.Window.Application.Name } into g select g).ToList();
And now this method executes for ~5 seconds. Not bad, from 2 minutes down to 5 seconds is good enough reason to drop the group by in the queries.
Also, this made me pay more attention to what the EF is doing, so now I have the following code in my Context constructor.
#if DEBUG Database.Log = FlushSql; #endifprivatevoid FlushSql(string s) { Console.WriteLine(s); }
From EF 6.0 onwards the Database.Log property accepts an Action