NuGet Performance Problems: Part I
It appears that NuGet has some performance problems and Jeff Handley posted the problematic queries as well as the new, hand-optimized queries.
You can see the original problematic queries and the optimized code (still tentative) here.
Here is the hand-optimized query to load the pages for the packages page:
SELECT TOP (30)
Paged.PackageRegistrationKey
, Paged.Id
, Paged.Version
, Packages.FlattenedAuthors
, Packages.Copyright
, Packages.Created
, Packages.FlattenedDependencies
, Packages.Description
, PackageRegistrations.DownloadCount
, Packages.ExternalPackageUrl
, N'packages/' + PackageRegistrations.Id + N'/' + Packages.Version AS C1
, Packages.IconUrl
, Packages.IsLatestStable
, Packages.Language
, Packages.LastUpdated
, Packages.LicenseUrl
, Packages.Hash
, Packages.HashAlgorithm
, Packages.PackageFileSize
, Packages.ProjectUrl
, CASE Packages.Listed WHEN 1 THEN Packages.Published ELSE NULL END AS C2
, Packages.ReleaseNotes
, N'package/ReportAbuse/' + PackageRegistrations.Id + N'/' + Packages.Version AS C3
, Packages.RequiresLicenseAcceptance
, Packages.Summary
, CASE WHEN Packages.Tags IS NULL THEN CAST(NULL as varchar(1)) ELSE N' ' + LTRIM(RTRIM(Packages.Tags)) + N' ' END AS C4
, ISNULL(Packages.Title, PackageRegistrations.Id) AS C5
, Packages.DownloadCount AS DownloadCount1
, cast(0 as float(53)) AS C6
FROM (
SELECT Filtered.Id
, Filtered.PackageRegistrationKey
, Filtered.Version
, Filtered.DownloadCount
, row_number() OVER (ORDER BY Filtered.DownloadCount DESC, Filtered.Id ASC) AS [row_number]
FROM (
SELECT PackageRegistrations.Id
, Packages.PackageRegistrationKey
, Packages.Version
, PackageRegistrations.DownloadCount
FROM Packages
INNER JOIN PackageRegistrations ON PackageRegistrations.[Key] = Packages.PackageRegistrationKey
WHERE Packages.IsPrerelease <> cast(1 as bit)
) Filtered
) Paged
INNER JOIN PackageRegistrations ON PackageRegistrations.[Key] = Paged.PackageRegistrationKey
INNER JOIN Packages ON Packages.PackageRegistrationKey = Paged.PackageRegistrationKey AND Packages.Version = Paged.Version
WHERE Paged.[row_number] > 30
ORDER BY PackageRegistrations.DownloadCount DESC
, Paged.Id
This monster query is actually translated to something like:
Give me the top 30 packages which are not pre released, ordered by the download count and then by their id.
It takes a great deal of complexity to deal with that for one major
reason, the data is split up across multiple tables in a way that makes
it hard get all of it easily. The minor reason is that there is really
no good way to do paging in SQL Server (shocking, I know). One would
assume that such a basic feature would have a bit more attention.
What is worse is the optimized version of the search feature:
SELECT TOP (30)
Paged.PackageRegistrationKey
, Paged.Id
, Paged.Version
, Packages.FlattenedAuthors
, Packages.Copyright
, Packages.Created
, Packages.FlattenedDependencies
, Packages.Description
, PackageRegistrations.DownloadCount
, Packages.ExternalPackageUrl
, N'packages/' + PackageRegistrations.Id + N'/' + Packages.Version AS C1
, Packages.IconUrl
, Packages.IsLatestStable
, Packages.Language
, Packages.LastUpdated
, Packages.LicenseUrl
, Packages.Hash
, Packages.HashAlgorithm
, Packages.PackageFileSize
, Packages.ProjectUrl
, CASE Packages.Listed WHEN 1 THEN Packages.Published ELSE NULL END AS C2
, Packages.ReleaseNotes
, N'package/ReportAbuse/' + PackageRegistrations.Id + N'/' + Packages.Version AS C3
, Packages.RequiresLicenseAcceptance
, Packages.Summary
, CASE WHEN Packages.Tags IS NULL THEN CAST(NULL as varchar(1)) ELSE N' ' + LTRIM(RTRIM(Packages.Tags)) + N' ' END AS C4
, ISNULL(Packages.Title, PackageRegistrations.Id) AS C5
, Packages.DownloadCount AS DownloadCount1
, cast(0 as float(53)) AS C6
FROM (
SELECT Filtered.Id
, Filtered.PackageRegistrationKey
, Filtered.Version
, Filtered.DownloadCount
, row_number() OVER (ORDER BY Filtered.DownloadCount DESC, Filtered.Id ASC) AS [row_number]
FROM (
SELECT PackageRegistrations.Id
, Packages.PackageRegistrationKey
, Packages.Version
, PackageRegistrations.DownloadCount
FROM Packages
INNER JOIN PackageRegistrations ON PackageRegistrations.[Key] = Packages.PackageRegistrationKey
WHERE ((((Packages.IsPrerelease <> cast(1 as bit)))))
((((AND Packages.IsLatestStable = 1))))
((((AND Packages.IsLatest = 1))))
AND (
PackageRegistrations.Id LIKE '%jquery%' ESCAPE N'~'
OR PackageRegistrations.Id LIKE '%ui%' ESCAPE N'~'
OR Packages.Title LIKE '%jquery%' ESCAPE N'~'
OR Packages.Title LIKE '%ui%' ESCAPE N'~'
OR Packages.Tags LIKE '%jquery%' ESCAPE N'~'
OR Packages.Tags LIKE '%ui%' ESCAPE N'~'
)
) Filtered
) Paged
INNER JOIN PackageRegistrations ON PackageRegistrations.[Key] = Paged.PackageRegistrationKey
INNER JOIN Packages ON Packages.PackageRegistrationKey = Paged.PackageRegistrationKey AND Packages.Version = Paged.Version
WHERE Paged.[row_number] > 30
ORDER BY PackageRegistrations.DownloadCount DESC
, Paged.Id
One thing that immediately popped up to me was the use of queries such as "’%jquery%’. This is a flat out killer for performance in relational databases, since they cannot do any indexes on this and are forced to do a table scan.
I decided to take a stab at moving the NuGet data to RavenDB, which is a much better fit (in my own obviously utterly unbiased opinion). In the next post, we will start with the actual import process, then we get to actual queries.
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)





