about Grape Solutions
and solutions
references, partners
open positions
get to know us

LINQ Union operator bugging

Dercsár Péter2009.12.23.

I recently run into a slight LINQ to SQL bug with the UNION operator that I managed to work around. This seemed to be a good occasion to start a little blog abouth such things.

Let's start with a simple demonstration scenario. I have two LINQ entities: Child and Parent. Parent has an ID and a Title, Child also has and ID, a Title and a foreign key, pointing to a Parent, always exactly one parent. As simple as that.

Now my task is to list the children of 'Parent1', whose name is longer than 10 characters, ordered by that length, descending.
The query is simple:
(from C in Dc.Childs
 where C.Parent.ParentTitle == "Parent1234"
   && C.ChildTitle.Length > 10
 select C).OrderByDescending((c) => (c.ChildTitle.Length));
Now, for some reason, I want at least 5 items to be listed. If there is not enough long-titled item, I want the shorter ones to be selected along, still in order of descending title length.
Using TOP (or LIMIT in MySql) is wrong, becuase we don't want to limit the results to 5 or anything. If I have hundred long-titled items, I want them listed all. I just always want to display at least 5 items.

Now my task is to list the children of 'Parent1', whose name is longer than 10 characters, ordered by that length, descending.

The query is simple:

(from C in Dc.Childs

 where C.Parent.ParentTitle == "Parent1"

   && C.ChildTitle.Length > 10

 select C).OrderByDescending((c) => (c.ChildTitle.Length));

Now, for some reason, I always want at least 5 items to be listed. If there is not enough long-titled item, I want the shorter ones to be selected along, still in order of descending title length.

Using TOP (or LIMIT in MySql) is wrong, becuase we don't want to limit the results to 5 or anything. If I have hundred long-titled items, I want them listed all. I just always want to display at least 5 items. (To make the idea more real-life, imagine a news page: you want to display company news from the last two weeks. But we always want some news to be displayed, so that the page doesn't look ugly.)

So what do you do? In SQL terms, we need every row, that belongs to Parent1 and where at least one of the following is true:

  • the lenght of the current title is longer than ten (in the news portal story: not older than some days)
  • the current row is not the fifth or later row in the result list

If you are in pure SQL, you can have different options. In SQL2005, for example, you could use the row_count() function to count the result rows. But I had to solve it using LINQ. No native queries, sorry.

The only solution I could figure out was to use Union() - I build two queries, one for the first condition and one for the second, then union them. Other LINQ-based solutions are welcome! This is the other topic of this post.

So, here is my solution:

(from C in Dc.Childs // first query: children, whose name is long enough

 where C.ChildTitle.Length > 10

   && C.Parent.ParentTitle == "Parent1"

 select C).OrderByDescending((c)=>(c.ChildTitle.Length))

 .Union(

   (from C in Dc.Childs //second query: 5 children

    where C.Parent.ParentTitle == "Parent1"

    select C).OrderByDescending((c)=>(c.ChildTitle.Length)).Take(5)

 ).OrderByDescending((c)=>(c.ChildTitle.Length))

(I'm not sure, if the outer OrderBy is really needed - anyone there?)
And now comes the real topic. This results in the following error:
"All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists."
Clearly, it's a SQL-error, not a LINQ-related exception. Soon I figured out, what it means: the SELECT-ed columns are not the same in the two queries. Wtf? Let's see the actual generated SQL!
You can't use SQL Profiler here, because it only tracks successful queries. So I wired Kris Vandermotten's nice DebuggerWriter into my app and here is the real SQL that happens in the black box:
SELECT [t4].[ChildID] AS [ID], [t4].[ParentID], [t4].[ChildTitle], [t4].[ParentTitle]
FROM (
    SELECT [t0].[ChildID], [t0].[ParentID], [t0].[ChildTitle], [t1].[ParentTitle]
    FROM [dbo].[Child] AS [t0]
    INNER JOIN [dbo].[Parent] AS [t1] ON [t1].[ParentID] = [t0].[ParentID]
    WHERE (LEN([t0].[ChildTitle]) > @p0) AND ([t1].[ParentTitle] = @p1)
    UNION
    SELECT TOP (5) [t2].[ChildID], [t2].[ParentID], [t2].[ChildTitle]
    FROM [dbo].[Child] AS [t2]
    INNER JOIN [dbo].[Parent] AS [t3] ON [t3].[ParentID] = [t2].[ParentID]
    WHERE [t3].[ParentTitle] = @p2
    ORDER BY LEN([t2].[ChildTitle]) DESC
    ) AS [t4]

You can clearly see the two SELECTs combined with a UNION, and a wrapping SELECT, for some internal reason, whatever. If you run it (providing the two parameters @p0 and @p1), you get the same error, of course. With some fiddling around, you can see, that the second inner SELECT is missing something:
...
SELECT         [t0].[ChildID], [t0].[ParentID], [t0].[ChildTitle], [t1].[ParentTitle]
...
SELECT TOP (5) [t2].[ChildID], [t2].[ParentID], [t2].[ChildTitle]
...
Oops! ParentTitle is missing. There is the point. They select different shaped result sets, therefore they cannot be combined. (Try the query with appending [t3].[ParentTitle] to the 2nd SELECT and all goes well.) LINQ to SQL makes a glitch here, not appending ParentTitle to the second select clause, don't ask me, why.
Luckily, you can force LINQ to use the correct target list by explicitly JOINing the Parent entity:
(from C in Dc.Childs // first query: children, whose name is long enogh
 join P in Dc.Parents on C.ParentID equals P.ID // explicit join to avoid the above bug
 where C.ChildTitle.Length > 10
   && P.ParentTitle == "Parent1"
 select C).OrderByDescending((c)=>(c.ChildTitle.Length))
 .Union(
   (from C in Dc.Childs //second query: 5 children
    join P in Dc.Parents on C.ParentID equals P.ID  // here too
    where P.ParentTitle == "Parent1"
    select C).OrderByDescending((c)=>(c.ChildTitle.Length)).Take(5)
 ).OrderByDescending((c)=>(c.ChildTitle.Length))
And now it works.
Note, that simply joining the other table is not enough, you have to use the joined table for condition (P.ParentTitle instead of C.Parent.ParentTitle) in order to work.
Comments are very welcome: how do you solve it a better/simpler/faster way?
There are no comments.

Leave a comment!

Name: *
Url:
E-mail:
Tag cloud
audio (1) blog (1) browser (1) div (1) html (1) join (1) linq (1) performance optimization (1) Pluto (1) services (1) shortcuts (1) Silverlight (2) svn (1) synthesizer (1) t-sql (1) union (1) video (1) windows (2) windows 7 (1) workstation (1) write (1)
Archive
Dercsár Peti blogja: