Linq and Expressions

Olav Nybø

Recently I was writing some Entity Framework linq queries at a client. Nothing unusual about that, but the compiler was complaining: "An expression tree lambda may not contain a null propagating operator". Sure enough I had included a '?' aka "null propagating operator in my query. I thought I needed it to make sure the code wouldn't throw an error at me.

The data I was querying was structurally equivalent to the following example:

It's a list of departments and employees (people) working in those departments. Each department may or may not have a boss which is a person with the IsDepartmentHead property true. Ok, this is certainly not the best way to model this, but for the sake of the example bear with me.

So I wanted a query that gives a list of department names and the name of the department head if there is one.

In SQL this could be written as something like this:

SELECT dep.Name as DepartmentName,  
    (SELECT top 1 p.Name as Head FROM People p where p.Department_Id = dep.Id AND p.IsDepartmentHead = 1)
FROM [Departments] dep  

My first shot at this in Linq was like this:

The compiler is complaining and tells me that I cannot have a null propagating operator.
Why is that?
And if I remove it, will it not crash and burn if there should be a department with no department head? I mean, I am dereferencing the Name of the first employee that has the IsDepartmentHead flag set to true and there could be none.

One way of "fixing" the query is to do add a ToList() before the projection (the Select()):

But what we have done now is querying for all departments and then turned the result into an in-memory list that we then project in code. And the nested query for the department head is done in code, not in the database.
This could work for small datasets, but why can we do it in memory and not as a query?

What is the difference?

The difference is in the types and the signatures of the two different Select functions of these two types.
In case of the pure entity framework linq query, the type is an IQueryable, when we put a ToList() before the Select() we get an IEnumerable instead.

When we have an IEnumerable the Select is an function that takes a Func<TSource, TResult> as argument, when we have an IQueryable then the Select expects a Expression<Func<TSource, TResult>> argument in stead.

In other words:
the argument that we pass to the Select statement when doing a Entity Framework Linq query is not code that is actually executed. It is just an expression object that is used to describe to Entity Framework the query that we want it to generate.

This is both why the compiler complains if we have the null propagating operator in the argument we pass to Select and also why we don't need it.
The generated SQL has no issues with some departments not having a department head.

To see the generated SQL we can set the Log delegate on the Entity Framework Context.
If we simply set it to the TraceInformation function from the System.Diagnostics.Trace then SQL will be conveniently be output to the output pane of Visual studio when debugging the app.

The code looks like this:

And sure enough the generated SQL is equivalent to the SQL I wanted (I have adjusted it a bit for readability, but essentially this is what is generated):

SELECT  
    Id, 
    dep.Name, 
    DepartmentHead
    FROM  Departments as dep
    OUTER APPLY  (SELECT TOP (1) 
        Name as DepartmentHead
        FROM People
        WHERE (dep.Id = People.Department_Id) AND (IsDepartmentHead = 1) ) as Employees