Adam Woroniec - Data Engineer

After years of optimizing SQL Server queries, I've learned that some patterns work consistently across different scenarios. If you've worked with SQL Server for any amount of time, you know how hard it is to pin down universal performance rules. However, today I'm happy to share two near-universal query practices that have transformed our reporting performance:

Rule 1: Use OUTER APPLY for "Top N Per Group" Queries

Instead of complex window functions or correlated subqueries, OUTER APPLY is often the cleanest solution for "top N per group" scenarios.

Before (LEFT JOIN with subquery):

-- A sample query where we want to preserve the entire result set, even if a customer has not been invoiced yet.

SELECT c.Name, c.CreationDate, invDate.InvoiceDate AS LastInvoiceDate
FROM Customer c
  LEFT JOIN (
    SELECT 
      MAX(InvoiceDate) AS InvoiceDate, 
      CustomerId
    FROM OrderInvoices 
    GROUP BY CustomerId
  ) invDate ON invDate.CustomerId = c.CustomerId

After (simple OUTER APPLY):

-- Same result, but cleaner and often faster

SELECT c.Name, c.CreationDate, invDate.InvoiceDate AS LastInvoiceDate
FROM Customer c
OUTER APPLY (
  SELECT TOP 1 InvoiceDate
  FROM OrderInvoices oi
  WHERE oi.CustomerId = c.CustomerId
  ORDER BY oi.InvoiceDate DESC
) invDate

The OUTER APPLY approach is often faster and more readable. There are important caveats to consider, including what indexes may already exist. The important thing to remember is that the LEFT JOIN approach will materialize the entire subquery first, before performing hash join operations, which makes the upfront cost very high. The OUTER APPLY execution will process row-by-row, which can be slow; however, it is more memory efficient. Depending on the size of your Customer table and WHERE filters, this will very often win out. I like to think of CROSS APPLY and OUTER APPLY as custom functions that we are applying to each row of our result set. If that sounds like a bad idea for your particular query, then either a LEFT JOIN or refactoring another way may prove better.

Rule 2: Start with #TempTables, Not @TableVariables

This was a game-changer for us. @TableVariables have limitations that #TempTables don't:

No statistics (SQL Server assumes 1 row)
No parallel execution plans
Limited indexing options

The Performance Impact:

@TableVariables: Often results in nested loop joins
#TempTables: Can use hash joins, merge joins, and parallel execution

Here's something I learned the hard way: I used to think I could start with @TableVariables for small datasets and switch to #TempTables later if needed. But that's backwards thinking.

On big datasets, I already knew to use #TempTables. But here's the key insight—because results could often grow as the report evolved, it's generally better to just start with #TempTables from the beginning.

You never know when: • Your "small" dataset will grow • Your report will be copied to use a base for another report • You'll need to add more complex joins

Starting with #TempTables is best practice, even if you don't need the performance benefits immediately. Memorize or add a code snippet to drop the table at the beginning of the query, and you will save time fussing with them during development.

The Bottom Line

These two changes alone transformed our reporting performance. The OUTER APPLY pattern especially was a game-changer for our "top N per group" queries, and switching from @TableVariables to #TempTables with proper indexing eliminated most of our performance bottlenecks.

At any layer, I'm always looking for fundamental design patterns that work consistently across different scenarios.