How to select distinct values based on a specific predicate using the LINQ Distinct method

Selecting distinct values in a LINQ query over non-trivial data needs more than a simple call to the Distinct() extension method.

Say I have the following XML data, and I want to select only distinct data elements based on their date attribute.

    <SOMETING>
        <HISTORY>
            <data date="19/01/10 14:34:00" >1963</data>
            <data date="19/01/10 13:34:00" >1960</data>
            <data date="19/01/10 14:34:00" >1960</data>
            <data date="17/01/10 21:34:00" >1911</data>
            <data date="17/01/10 21:34:00" >1911</data>
            <data date="17/01/10 11:34:00" >1911</data>
            <data date="17/01/10 18:34:00" >1911</data>
            <data date="17/01/10 17:34:00" >1911</data>
        </HISTORY>
    </SOMETING>

As you can see, several elements share the same date attribute. The following LINQ query returns all the data elements, duplicates included, as an IEnumerable of HistoryDataElement.

    // Requires: using System.Linq; using System.Xml.Linq;
    class HistoryDataElement
    {
        public string Date { get; set; }
        public int Value { get; set; }
    }

    // xml holds the document shown above as a string.
    XDocument doc = XDocument.Parse(xml);

    // Project every <data> element into a HistoryDataElement.
    var dataElements = (from data in doc.Descendants("data")
                        select new HistoryDataElement
                        {
                            Date = data.Attribute("date").Value,
                            Value = int.Parse(data.Value)
                        });
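Before going further, here is a small sketch of why a bare Distinct() call does not help with this query. HistoryDataElement is a class that does not override Equals or GetHashCode, so the default comparer falls back to reference equality and every projected object counts as unique. The DefaultDistinctDemo class and its CountAfterPlainDistinct method are hypothetical names used only for this illustration, and the XML is shortened to three elements.

```csharp
using System.Linq;
using System.Xml.Linq;

class HistoryDataElement
{
    public string Date { get; set; }
    public int Value { get; set; }
}

// Hypothetical helper, not part of the original post.
static class DefaultDistinctDemo
{
    // Shortened sample: the first and third elements share a date.
    const string Xml = @"<SOMETING><HISTORY>
        <data date=""19/01/10 14:34:00"">1963</data>
        <data date=""19/01/10 13:34:00"">1960</data>
        <data date=""19/01/10 14:34:00"">1960</data>
    </HISTORY></SOMETING>";

    public static int CountAfterPlainDistinct()
    {
        XDocument doc = XDocument.Parse(Xml);

        // Distinct() without a comparer compares object references here,
        // so no element is filtered out.
        var dataElements = (from data in doc.Descendants("data")
                            select new HistoryDataElement
                            {
                                Date = data.Attribute("date").Value,
                                Value = int.Parse(data.Value)
                            }).Distinct();

        return dataElements.Count();
    }
}
```

Calling DefaultDistinctDemo.CountAfterPlainDistinct() from a Main method returns 3: all three elements survive even though two of them share a date.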

If we want distinct results based on the date attribute, for example, we have to tell Distinct how to compare two elements by creating a comparison class that implements the IEqualityComparer<T> interface.

Here is a very simple implementation:

    // Requires: using System.Collections.Generic;
    class DataExtractorElementComparer : IEqualityComparer<HistoryDataElement>
    {
        // Two elements are considered equal when their dates match.
        public bool Equals(HistoryDataElement x, HistoryDataElement y)
        {
            return x.Date == y.Date;
        }

        // Must agree with Equals: equal dates yield equal hash codes.
        public int GetHashCode(HistoryDataElement obj)
        {
            return obj.Date.GetHashCode();
        }
    }
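Both methods matter. As far as I know, the LINQ to Objects Distinct is hash-based, so Equals is only consulted for elements whose hash codes match, and two elements that Equals treats as the same must therefore return the same hash code. The sketch below checks that contract for two elements that share a date but differ in value. ComparerContractDemo and SameDateIsEqual are hypothetical names introduced for this illustration; the two classes are repeated from above so the block compiles on its own.

```csharp
using System.Collections.Generic;

class HistoryDataElement
{
    public string Date { get; set; }
    public int Value { get; set; }
}

class DataExtractorElementComparer : IEqualityComparer<HistoryDataElement>
{
    public bool Equals(HistoryDataElement x, HistoryDataElement y)
    {
        return x.Date == y.Date;
    }

    public int GetHashCode(HistoryDataElement obj)
    {
        return obj.Date.GetHashCode();
    }
}

// Hypothetical helper, not part of the original post.
static class ComparerContractDemo
{
    public static bool SameDateIsEqual()
    {
        var comparer = new DataExtractorElementComparer();
        var first = new HistoryDataElement { Date = "19/01/10 14:34:00", Value = 1963 };
        var second = new HistoryDataElement { Date = "19/01/10 14:34:00", Value = 1960 };

        // Same date, different value: the comparer must report them as equal
        // and must also give them the same hash code.
        return comparer.Equals(first, second)
            && comparer.GetHashCode(first) == comparer.GetHashCode(second);
    }
}
```

If GetHashCode were based on Value instead of Date, these two elements would likely land in different hash buckets and Equals would never get the chance to match them.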

With this class in place, we can pass a new instance of it to the Distinct method to get the desired results:

    var dataElements = (from data in doc.Descendants("data")
                        select new HistoryDataElement
                        {
                            Date = data.Attribute("date").Value,
                            Value = int.Parse(data.Value)
                        }).Distinct(new DataExtractorElementComparer());

Notice that this approach eliminates every element that shares a date with an earlier one: only the first element of each group of matches is retrieved, even if its other properties differ. This behavior is determined by the code in the Equals method of the IEqualityComparer<T> implementation.
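To make that concrete, here is a sketch that runs the whole thing against the sample data from this post. DistinctWithComparerDemo and its Run method are hypothetical names for this illustration. One caveat: keeping the first occurrence is what the LINQ to Objects implementation does in practice, but the documentation describes the result of Distinct as unordered, so treat it as observed behavior and not a guarantee.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class HistoryDataElement
{
    public string Date { get; set; }
    public int Value { get; set; }
}

class DataExtractorElementComparer : IEqualityComparer<HistoryDataElement>
{
    public bool Equals(HistoryDataElement x, HistoryDataElement y)
    {
        return x.Date == y.Date;
    }

    public int GetHashCode(HistoryDataElement obj)
    {
        return obj.Date.GetHashCode();
    }
}

// Hypothetical helper, not part of the original post.
static class DistinctWithComparerDemo
{
    // The eight <data> elements from the sample document.
    const string Xml = @"<SOMETING><HISTORY>
        <data date=""19/01/10 14:34:00"">1963</data>
        <data date=""19/01/10 13:34:00"">1960</data>
        <data date=""19/01/10 14:34:00"">1960</data>
        <data date=""17/01/10 21:34:00"">1911</data>
        <data date=""17/01/10 21:34:00"">1911</data>
        <data date=""17/01/10 11:34:00"">1911</data>
        <data date=""17/01/10 18:34:00"">1911</data>
        <data date=""17/01/10 17:34:00"">1911</data>
    </HISTORY></SOMETING>";

    public static List<HistoryDataElement> Run()
    {
        XDocument doc = XDocument.Parse(Xml);

        // Same query as above, materialized so the result can be inspected.
        return (from data in doc.Descendants("data")
                select new HistoryDataElement
                {
                    Date = data.Attribute("date").Value,
                    Value = int.Parse(data.Value)
                })
               .Distinct(new DataExtractorElementComparer())
               .ToList();
    }
}
```

Calling DistinctWithComparerDemo.Run() from a Main method yields six elements, one per distinct date. For the date "19/01/10 14:34:00" the element with value 1963 is kept and the later one with value 1960 is dropped.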
