This blog is moved to
http://amalhashim.wordpress.com

Sunday, April 11, 2010

LINQ Performance Benchmark

Yesterday I was reading “LINQ To Objects Using C# 4.0” by Troy Magennis and find a good LINQ usage example. I got curious about how the LINQ implementation will perform and thought of writing this article.

The example is related to an entity having 3 fields State, LastName and FirstName. The aim is to Group the objects by state and sort by LastName. For calculating the execution time I am using Stopwatch class in System.Diagnosis namespace.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.IO;

namespace ConsoleApplication2
{
public class Customer
{
public string State { get; set; }
public string LastName { get; set; }
public string FirstName { get; set; }
}

class Program
{
static void Main(string[] args)
{
List<Customer> custList = FillListAndGetList();

Stopwatch sw = new Stopwatch();
sw.Start();
#region C# 2.0 Approach

///Sorting by LastName
custList.Sort(
delegate(Customer c1, Customer c2)
{
if (c1 != null && c2 != null)
return string.Compare(c1.LastName, c2.FirstName);
return 0;
});

///Sort and Groupby State [SortedDictionary]
SortedDictionary<string, List<Customer>>
sortedCust = new SortedDictionary<string, List<Customer>>();

foreach (Customer c in custList)
{
if (!sortedCust.ContainsKey(c.State))
sortedCust.Add(c.State, new List<Customer>());

sortedCust[c.State].Add(c);
}

foreach (KeyValuePair<string, List<Customer>>
pair in sortedCust)
{
Console.WriteLine("State : " + pair.Key);

foreach (Customer c in pair.Value)
Console.WriteLine("Last Name : {0}" +
"FirstName : {1} ", c.LastName,
c.FirstName);
}

#endregion

sw.Stop();
File.WriteAllLines("C20", new string[] { sw.ElapsedTicks.ToString() });

custList = FillListAndGetList();
sw.Reset();
sw.Start();

#region LINQ Approach

var query = from c in custList
orderby c.State, c.LastName
group c by c.State;

foreach (var group in query)
{
Console.WriteLine("State : " + group.Key);

foreach (Customer c in group)
Console.WriteLine("Last Name : {0}" +
"FirstName : {1} ", c.LastName,
c.FirstName);
}

#endregion

sw.Stop();
File.WriteAllLines("C20", new string[] { sw.ElapsedTicks.ToString() });
}

private static List<Customer> FillListAndGetList()
{
List<Customer> custList = new List<Customer>();
for (int i = 0; i < 1000000; i++)
{
Customer c = new Customer();
if (i % 2 == 0)
{
c.State = "State2";
c. FirstName = "FirstName2" + i.ToString();
c.LastName = "LastName2" + i.ToString();
}
else if (i % 3 == 0)
{
c.State = "State3";
c.FirstName = "FirstName3" + i.ToString();
c.LastName = "LastName3" + i.ToString();
}
else if(i % 7 == 0)
{
c.State = "State7";
c.FirstName = "FirstName7" + i.ToString();
c.LastName = "LastName7" + i.ToString();
}
else if (i % 11 == 0)
{
c.State = "State11";
c.FirstName = "FirstName11" + i.ToString();
c.LastName = "LastName11" + i.ToString();
}
else
{
c.State = "OtherState";
c.FirstName = "FirstName" + i.ToString();
c.LastName = "LastName" + i.ToString();
}

custList.Add(c);
}

return custList;
}
}
}

Once I ran the application, I got the following result.
C# 2.0 version : 627373008
LINQ version : 692116906
LINQ version has a slight performance problem, but compared to the chunk of code that has gone in 2.0 I think LINQ is far better.

2 comments:

Unknown said...

In the for each version, your sorting delegate function is comparing lastname of c1 with firstname of c2. Please fix this and consider redoing the benchmark.

return string.Compare(c1.LastName, c2.LastName);

Unknown said...

Also, to stay fair in for each you are first sorting by lastname and then my state (sorted dictionary)
you should do the same in linq to.
The correct linq should be

foreach (var group in custList.OrderBy(c=>c.LastName).ThenBy(c=>c.State).GroupBy(c=>c.State))

(you may write it in your equivalent from... format, I am more used to this 1)

Interestingly now linq takes slightly less time then for each.

(FYI while doing benchmarks, console/io should be ignored)

Sometimes it just depends on how you code, though your output would be the same, but implementation could be inefficient