“ZLinq”, a Zero-Allocation LINQ Library for .NET
The special syntax is really just syntactic sugar on top of all this. It makes complex queries a little more readable because, for example, you don't have to repeat computed variables after binding them once in the chain. Consider:
  from x in xs
  where x.IsFoo
  let y = Frob(x)
  where y.IsBar
  let z = Frob(y)
  where z.IsBaz
  orderby x, y descending, z
  select z;
If you were to rewrite this with explicit method calls and lambdas, it becomes something like:
  xs.Where(x => x.IsFoo)
    .Select(x => (x: x, y: Frob(x)))
    .Where(xy => xy.y.IsBar)
    .Select(xy => (x: xy.x, y: xy.y, z: Frob(xy.y)))
    .Where(xyz => xyz.z.IsBaz)
    .OrderBy(xyz => xyz.x)
    .ThenByDescending(xyz => xyz.y)
    .ThenBy(xyz => xyz.z)
    .Select(xyz => xyz.z)
Note how it needs to weave `x` and `y` through all the Select/Where calls so that they can be used for ordering at the end, whereas with the syntactic sugar the scope of `let` extends to the remainder of the expression (although under the hood it still does roughly the same thing as the handwritten code).
It would be even better if it supported exposing pattern matching variables and null safety annotations from where clauses to the following operations, but I guess it's hard to translate that to methods.
Something like this:
from x in xs
where x is { Y: { } y }
select y.z
Another feature I'd like to see is standalone `where` without needing to add `select` after it, like in VB.NET.
One feature I'd like to see is integration with foreach so that you don't have to repeat the variable and come up with a different name to work around shadowing rules. I.e. instead of:
foreach (var x in from x0 ...)
it would be nice to be able to write simply:
foreach (from x in ...)
and have it "just work", including more complicated cases with multiple nested from-clauses, let, etc. (effectively extending the scope of all of those into the body of foreach).
And Ruby doesn't even enter this conversation if we're talking about these kinds of optimizations. It's an order of magnitude away from what you're aiming for if you're unrolling sequence operations in C#.
LINQ syntax is just sugar and probably a design mistake
I find that the term "LINQ" these days tends to mean the extensions on IEnumerable/IQueryable, and not the special query syntax. Whatever the term meant when it was launched is now forgotten. Almost no one uses the special query syntax, but everyone uses the enumerable/queryable extension methods like Select() etc., and calls it "Linq".
Almost no one uses the special query syntax
Source? I'm in multiple current, active development projects with companies, they are all using the LINQ query syntax.
Not to mention all of the legacy code out there that is under active maintenance.
Saying "almost no one" feels very much antithetical to my (albeit anecdotal) experience. I can't imagine I'm the only C# consultant that has multiple clients that use LINQ queries extensively throughout their applications.
[1] https://github.com/search?q=%2F%28%3F-i%29%5Cs%2Bselect%5Cs%2B%5Cw%2B%3B%2F+language%3AC%23&type=code&ref=advsearch
[2] https://github.com/search?q=%2F%28%3F-i%29%5Cs%2B%5C.%28Select%7CWhere%29%5C%28%2F+language%3AC%23&type=code&ref=advsearch
But I would also say that the companies using query syntax are also vastly underrepresented on GitHub public repositories.
It's likely a lot more common in fields where there are databases than where there are not.
I think beyond that, it's probably more common with developers that came from doing SQL queries but "needed" type safety.
I've worked with developers that didn't even know the extension methods existed. They went from SqlCommand to LINQ to EF or LINQ to SQL.
The only time I think query syntax is better than the extension methods is when dealing with table joins.
The only time I think query syntax is better than the extension methods is when dealing with table joins.
Fairly niche, but query syntax is a great approximation of Haskell's do notation:
https://github.com/louthy/language-ext/wiki/Thinking-Functio...
EDIT: updated URL
"Whatever the term meant when it was launched is now forgotten"
Language INtegrated Query. The SQL query isn't written inside a string, opaque and uncheckable, it's part of the C# language which means the tooling can autosuggest and sanity check database table names and field names against the live database connection, it means the compiler is aware of the SQL data types without manually building a separate ORM/layer.
That people who don't use C# think it's just a Microsoft way to write a lambda filter on an in-memory list is sad.
IIRC it was to implement a constraint solver, which I couched in monadic terms somehow, don't remember the details. Not sure if I'd do it the same way again, but I did get it to work.
Few people even knew how to use it or what monads were, and it was a huge issue when onboarding people. When the initial masochist that inflicted this on the codebase left and stopped enforcing the madness, half of the codebase dropped it, half kept it, and new people kept onboarding and squinting through it. This created huge pieces of shit glue code that was isolating the monadic crap everyone was too afraid to touch. Worst part was that even if you knew monads and were comfortable with them in other languages, they just didn't fit, and it made writing the code super awkward.
Not to mention debugging that shit was a nightmare with the Result + Exceptions - worst of both worlds.
It's basically writing your own DSL by repurposing LINQ syntax - DSLs are almost always a bad idea, abusing language constructs to hack it in makes it even worse.
What makes people love Linq is that it handles 2 different cases (identical syntax but different backing objects).
1: The in-memory variant does things lazily. Operators such as Select and Where produce enumeration objects, so after the chain you add ToArray, ToList, ToDictionary, etc. and the final object is built lazily, with most of the chain executed on as few objects as possible (thus if you have an effective Where at the start, the rest of the pipeline will do very little work and very few allocations).
2: The compiler also helps the libraries by providing syntax trees, so database Linq providers just translate the Linq to SQL and send it off to the server, letting the server do the heavy lifting _with indexes_. That way we can query tables of arbitrary size very quickly with regular C# Linq syntax, without most of the data ever going over the network.
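To make the lazy part of point 1 concrete, here is a minimal sketch using the standard System.Linq operators. The LazyDemo and CountPulled names are made up for illustration; the counter just records how many source elements the pipeline actually pulls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Returns how many source elements were pulled to produce `take` results.
    public static int CountPulled(int take)
    {
        int pulled = 0;

        // Local iterator: the body only runs while someone enumerates it.
        IEnumerable<int> Source()
        {
            for (int i = 1; i <= 1000; i++)
            {
                pulled++;
                yield return i;
            }
        }

        // Nothing executes here; Where/Select/Take only build enumerator objects.
        IEnumerable<int> query = Source()
            .Where(n => n % 2 == 0)
            .Select(n => n * n)
            .Take(take);

        // ToList drives the chain and stops once Take has enough elements.
        List<int> result = query.ToList();
        return pulled;
    }

    public static void Main()
    {
        // Only a handful of the 1000 source elements are touched.
        Console.WriteLine(LazyDemo.CountPulled(3));
    }
}
```

The point is only that the terminal call (ToList here) decides how much work happens, not the chain itself.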
I always liked how the C# team took inspiration from other language ecosystems. Usually they do it with a lot more taste than the C++ committee. I suppose the declarative Linq syntax gives the compiler some freedom of optimization, but I feel Ruby's do syntax makes higher order functions shine in a way that's only surpassed by functional languages like Haskell or Lisp.
Of course, if it's really a hot path like matrix multiplication then it makes total sense, but avoiding LINQ has an unpleasant side effect: loss of code soundness and quality.
In languages that don't have expression inspection capabilities you have to replace the `(p) => p.Price < 100` part with something that is possible for the language to inspect.
Normally it's strings or something using a builder pattern.
For example, in TypeORM:
queryBuilder.where("product.price < :price", { price: 100 })
And in Mongoose:
Product.find({ price: { $lt: 100 } });
The LINQ-ish version would be:
Product.find((p) => p.price < 100);
Similarly, for Ruby on Rails:
Product.where("price < ?", 100)
Ruby's Sequel overloads operators to have a more natural syntax:
DB[:products].where { price < 100 }
But the "lambda" syntax would be:
Product.where { |p| p.price < 100 }
JavaScript ORMs would be revolutionized if they had this ability.
Is this possible in JavaScript?
Iterators: LINQ works on any type that supports iterators. In most languages, this is any type that you can write a for (foreach) loop on and perform an operation on each item in a collection / array / list. (In C#, the collection must implement the IEnumerable<T> interface.)
Lambda functions: LINQ then relies heavily on Lambda functions, which are used as filters or to transform / narrow down data. Most languages also have something similar to these.
Generics: C# allows for "list of foo objects" instead of "list of objects that I have to typecast to foo." Although not explicitly required to implement something LINQ-like in other languages, the compiler verifying type helps with autocomplete and in-IDE suggestions; and helps avoid silly typing bugs.
Generic inference: C# can infer the return type from a lambda function, and infer the argument type in a lambda function. This means you don't need to decorate LINQ syntax with type information; except in some rare corner cases.
This is why, for example, there are LINQ-like libraries in JavaScript and Rust. Java supports something that is LINQ-like, although in my limited Java experience, I didn't use it enough to really "get the hang" of it.
---
Note that LINQ has a very serious pitfall: It's easy to accidentally build a filter, and then have a lot of overhead re-running an expensive operation to re-load the source collection. The simplest way to avoid this is to call .ToArray() or .ToList() at the end of the chain to ensure that you store the result in a collection once.
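A small sketch of that pitfall, using only standard System.Linq calls. LoadExpensive is a hypothetical stand-in for a slow source (a database call, a file read), and the counter shows how often it gets re-run.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReenumerationDemo
{
    public static (int lazyLoads, int cachedLoads) Run()
    {
        int loads = 0;

        // Hypothetical expensive source; the body re-runs on every enumeration.
        IEnumerable<int> LoadExpensive()
        {
            loads++;
            for (int i = 0; i < 5; i++) yield return i;
        }

        // Pitfall: `filtered` is a recipe, not a result.
        IEnumerable<int> filtered = LoadExpensive().Where(n => n > 1);
        int count = filtered.Count(); // enumerates the source once
        int sum = filtered.Sum();     // enumerates the source a second time
        int lazyLoads = loads;

        // Fix: materialize once, then reuse the stored collection.
        loads = 0;
        List<int> cached = LoadExpensive().Where(n => n > 1).ToList();
        int cachedCount = cached.Count;
        int cachedSum = cached.Sum();

        return (lazyLoads, loads);
    }

    public static void Main()
    {
        var (lazyLoads, cachedLoads) = Run();
        Console.WriteLine($"{lazyLoads} vs {cachedLoads}"); // prints "2 vs 1"
    }
}
```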
Extension methods allow LINQ to be implemented as library over all the existing collection types instead of needing child types or refactoring the core collections library.
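For anyone who hasn't seen how little machinery that takes, here is a rough sketch of home-grown operators. MyWhere and MySelect are made-up names (chosen so they don't clash with System.Linq); this is the general shape, not the actual BCL implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MyLinq
{
    // `this` on the first parameter makes it callable on any IEnumerable<T>,
    // including types we don't own, such as arrays and List<T>.
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item)) yield return item;
        }
    }

    public static IEnumerable<TResult> MySelect<T, TResult>(this IEnumerable<T> source, Func<T, TResult> selector)
    {
        foreach (T item in source)
        {
            yield return selector(item);
        }
    }

    public static List<string> Demo()
    {
        int[] numbers = { 1, 2, 3, 4 };
        // T and TResult are inferred from the array and the lambdas.
        return numbers.MyWhere(n => n % 2 == 0)
                      .MySelect(n => n.ToString())
                      .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Demo())); // prints "2,4"
    }
}
```

That covers the iterator, lambda, generics and inference ingredients from the list above in about twenty lines.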
1. You can extend other people's interfaces. If you care about method chaining, _something_ like that is required (alternative solutions include syntactic support for method chaining as a generic function-call syntax).
2. The language has support for "code as data." The mechanism is expression trees, but it's really powerful to be able to use method chaining to define a computation and then dispatch it to different backends, complete with optimization steps.
3. The language has a sub-language as a form of syntactic sugar, allowing certain blessed constructs to be written as basically inline SQL with full editor support.
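To illustrate point 2, here is a toy sketch of what "code as data" buys you. Product and ToSqlWhere are hypothetical and handle exactly one shape of lambda; a real provider such as EF walks far more node types and also deals with captured variables, which do not show up as plain constants.

```csharp
using System;
using System.Linq.Expressions;

public sealed class Product
{
    public int Price { get; set; }
}

public static class ExpressionDemo
{
    // Because the parameter is Expression<Func<...>> rather than Func<...>,
    // the compiler hands us a tree describing the lambda instead of compiled code.
    public static string ToSqlWhere(Expression<Func<Product, bool>> predicate)
    {
        // Toy translator: only understands `p => p.Member < constant`.
        if (predicate.Body is BinaryExpression comparison
            && comparison.NodeType == ExpressionType.LessThan
            && comparison.Left is MemberExpression member
            && comparison.Right is ConstantExpression constant)
        {
            return $"WHERE {member.Member.Name} < {constant.Value}";
        }

        throw new NotSupportedException("This sketch only handles `member < constant`.");
    }

    public static void Main()
    {
        Console.WriteLine(ToSqlWhere(p => p.Price < 100)); // prints "WHERE Price < 100"
    }
}
```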
Compare C# ORMs to JS/TS for example. In C#, it is possible to use expression trees to build queries. In TS, the only options are as strings or using structural representation of the trees.
Compare this:
  var loadedAda = await db.Runners
    .Include(r => r.RaceResults.Where(
      finish => finish.Position <= 10
        && finish.Time <= TimeSpan.FromHours(2)
        && finish.Race.Name.Contains("New")
      )
    )
    .FirstAsync(r => r.Email == "ada@example.org");
To the equivalent in Prisma (structural representation of the tree):
  const loadedAda2 = await tx.runner.findFirst({
    where: { email: 'ada@example.org' },
    include: {
      races: {
        where: {
          AND: [
            { position: { lte: 10 } },
            { time: { lte: 120 } },
            {
              race: {
                name: { contains: 'New' }
              }
            }
          ]
        }
      }
    }
  })
Yikes! Look how dicey that gets with even a basic query!
1: IEnumerable<T> that works lazily in memory (and similar to the author's improvement) can be done in any language with first-class functions; see the author's linq.js or Java's streams library (it's not entirely the same as a chain of map/reduce/filter since it's lazy, but that's mostly not a drawback since it improves performance by removing temporary storage objects).
2: IQueryable<T> is the really magical part though. By specifying certain wrapper types, the compiler is informed that the library expects a bound syntax tree (AST) of the expression you write; the library can then translate the syntax tree to SQL queries and send the query to the server to be processed.
Thus huge tables can be efficiently queried by just writing regular C#, never touching SQL. In most ORMs this is annoying or has impedance mismatches, but with EF you can write code and be fairly certain that it'll produce a good SQL query, since the entire Linq syntax is fairly close to SQL.
from x select x.name
And the other is just lambdas with anonymous types and so on.
For the lambda syntax, you can just do this: https://www.npmjs.com/package/linq
Of course, if you want to run this against a query provider, you do need compiler support to give you an expression tree instead, and a provider to process it and convert it to a language (often SQL) that the database can understand.
There seem to be some transpilers, or things like that, but I don't know what the state of the art is on this: https://github.com/sinclairzx81/linqbox
I rely heavily on LINQ calls in a .NET (Core) Web App, should I replace these with ZLinq calls?
Or is this only helpful in case you want to do LINQ operations on let's say 1000's of objects that would get garbage collected at the end of a game loop?
I guess this library will at some point end up unmaintained after the author gets bored with it.
So I would not use it in any of my production code of a web app unless I get some problem I need to fix with this library specifically. Replacing all just because “it is faster” doesn’t seem good enough.
I have been using .NET (and LINQ) for many years on a daily basis, and I've yet to run into performance problems that can't be fixed by either rewriting the LINQ statement or doing some other quick workaround.
But will I try out ZLinq? Sure, but I won't create anything that depends on it.
This is to the Queryable/Enumerable extensions what ValueTask is to Task, or ref struct to struct etc. If you are the type of developer that sees great benefit switching from Task to ValueTask then you will probably find this useful too.
Backwards compatibility, security, edge cases, downstream effects on other libraries that are reliant on LINQ, etc.
One guy with an optional library can break things. If the .NET team breaks things in LINQ, it's going to be a bad, bad time for a lot of people.
I think Evan You's approach with Vue is really interesting. Effectively, they have set up a build pipeline that includes testing major downstream projects as well for compatibility. This means that when the Vue team build something like "Vapor Mode" for 3.6, they've already run it against a large body of community projects to check for breaking changes and edge cases. You can see some of the work they do in this video: https://www.youtube.com/watch?v=zvjOT7NHl4Q
I know of two examples:
1. Fedora, in collaboration with the GCC maintainers, keeps GCC on the bleeding edge so it can be used to compile the whole Fedora corpus. This validates the compiler against a set of packages which are known to work with the previous GCC.
2. I think the rust team also builds all crates on crates.io when working on `rustc`. It seems they created a tool to achieve that: https://github.com/rust-lang/crater
I would assume the .NET guys have something similar already but maybe there’s not enough open code to do that
C# has multiple technologies built to deal with ABI (though it probably all goes unused these days with folder-based deployments, you really need the GAC for it to work).
Every release has a fairly decent amount of fixes and additions from outside contributors, and while I can see a lot of to/fro on the PRs to get them through, it's probably not quite as bad as you'd expect.
The Task library has successfully added ValueTask but it took some doing. LINQ on the other hand can be replaced with unrolled loops or libraries more easily so the pressure just hasn't been there.
I could see something happening in the future but it would take a lot of work.
The way LINQ currently works by default makes aggressive use of interfaces like IEnumerable to hide the actual types being iterated over. This has performance consequences (which is part of why ZLinq can beat it) but it has advantages - for example, the same implementation of Where<T>(seq) can be used for various T's instead of having to JIT or AOT-compile a unique body for every distinct class you iterate over.
From looking at ZLinq it seems like it would potentially have an explosion of unique generic struct types as your queries get more complex, since for it to work you potentially end up with types vaguely resembling Query3<Query2<Query1<T>>>. But it might not actually be that bad in practice.
Adding another enumerable type would be a very large change that could effectively double the API surface of the entire ecosystem. This could take some time. Some places still don't even support Span<T>. Also there were some design decisions related to Linq where the number of overloads was a consideration.
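To show what that nesting looks like, here is a hand-rolled sketch of struct-based operators. To be clear, IValueSource, ArraySource, WhereSource and SelectSource are names invented for this example and are not ZLinq's actual types or API; it only demonstrates the general technique of composing generic structs so the pipeline itself needs no heap-allocated enumerators.

```csharp
using System;

public interface IValueSource<T>
{
    bool TryGetNext(out T item);
}

public struct ArraySource<T> : IValueSource<T>
{
    private readonly T[] _items;
    private int _index;

    public ArraySource(T[] items) { _items = items; _index = 0; }

    public bool TryGetNext(out T item)
    {
        if (_index < _items.Length) { item = _items[_index++]; return true; }
        item = default;
        return false;
    }
}

// The upstream operator is a type parameter, so the whole chain is one nested struct type.
public struct WhereSource<TSource, T> : IValueSource<T>
    where TSource : struct, IValueSource<T>
{
    private TSource _source;
    private readonly Func<T, bool> _predicate;

    public WhereSource(TSource source, Func<T, bool> predicate) { _source = source; _predicate = predicate; }

    public bool TryGetNext(out T item)
    {
        while (_source.TryGetNext(out item))
        {
            if (_predicate(item)) return true;
        }
        return false;
    }
}

public struct SelectSource<TSource, TIn, TOut> : IValueSource<TOut>
    where TSource : struct, IValueSource<TIn>
{
    private TSource _source;
    private readonly Func<TIn, TOut> _selector;

    public SelectSource(TSource source, Func<TIn, TOut> selector) { _source = source; _selector = selector; }

    public bool TryGetNext(out TOut item)
    {
        if (_source.TryGetNext(out TIn input)) { item = _selector(input); return true; }
        item = default;
        return false;
    }
}

public static class StructChainDemo
{
    public static int Run()
    {
        var source = new ArraySource<int>(new[] { 1, 2, 3, 4, 5, 6 });
        var evens = new WhereSource<ArraySource<int>, int>(source, n => n % 2 == 0);
        // Two operators in, the type name is already a mouthful.
        var scaled = new SelectSource<WhereSource<ArraySource<int>, int>, int, int>(evens, n => n * 10);

        int sum = 0;
        while (scaled.TryGetNext(out int value)) sum += value;
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints 120
    }
}
```

Each distinct chain is its own closed generic type, which is where the JIT/AOT code-size concern comes from. In real code you would hide the type names behind `var` and extension methods.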
Adding this API to .NET could probably be done with that extension method that converts to ValueEnumerable. But without support for that enumerable, this would pretty much be a walled garden where you have to convert back and forth between different enumerable types. Not that great if you'd ask me, but possible I guess.
There's an official process for API change requests: https://github.com/dotnet/runtime/blob/main/docs/project/api...
Edit: what's also nice is that C# recognizes Linq as a contract. So long as this has the correct method names and signatures (it does), the Linq syntax will light up automatically. You can also use this trick for your own home-grown things (add Select, Join, Where, etc. overloads) if the Linq syntax is something you like.
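A quick sketch of that trick on a home-grown type. Maybe<T> here is a made-up example type, not something from the BCL or ZLinq; the only thing the compiler cares about is that methods named Where and Select with suitable signatures exist.

```csharp
using System;

public readonly struct Maybe<T>
{
    public bool HasValue { get; }
    public T Value { get; }

    public Maybe(T value) { HasValue = true; Value = value; }

    // Query syntax is translated into calls to these by name and shape alone.
    public Maybe<TResult> Select<TResult>(Func<T, TResult> selector)
        => HasValue ? new Maybe<TResult>(selector(Value)) : default;

    public Maybe<T> Where(Func<T, bool> predicate)
        => HasValue && predicate(Value) ? this : default;
}

public static class QueryPatternDemo
{
    public static Maybe<int> DoubleIfPositive(Maybe<int> input) =>
        // Compiles to input.Where(n => n > 0).Select(n => n * 2).
        from n in input
        where n > 0
        select n * 2;

    public static void Main()
    {
        Console.WriteLine(DoubleIfPositive(new Maybe<int>(21)).Value);    // prints 42
        Console.WriteLine(DoubleIfPositive(new Maybe<int>(-1)).HasValue); // prints False
    }
}
```

No IEnumerable involved anywhere, and the query syntax still lights up.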
[1] https://learn.microsoft.com/en-us/dotnet/core/whats-new/dotn...
Some notes on why this is so here: https://github.com/dotnet/runtime/blob/main/docs/design/core...
These types are all .NET interfaces, which are reference types, so they're allocated on the heap. .NET's escape analysis can sometimes move reference types to the stack, but this feature is currently very limited and didn't even exist until .NET 9.
ZLinq uses generic structs to prevent these allocations at the expense of some really verbose intermediate types.
Writing it was a lot of fun. Debugging compiler errors was a lot less fun because the template language of C++ has no static typing, so errors would be triggered very deep in the expression tree. The expression tree got processed and inlined at compile time, so there was little to no overhead at runtime.
I was very impressed with GCC's inlining and vectorization. Especially the messages that explained why it could not vectorize.
One of the biggest sources of allocation is lambda captures, like when you write something like this:
  var myPerson = people.First(x => x.Name == myPersonName);
In this case, a phantom object is allocated that captures the myPersonName variable, and a delegate is allocated for it, which is then passed to the First() method, making the number of allocations per call a minimum of 2. I don't see ZLinq doing anything about this.
.NET 10 takes a step in that direction[1].
[1] https://learn.microsoft.com/en-us/dotnet/core/whats-new/dotn...
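On the lambda-capture point above, here is a rough sketch of where the two allocations come from. NameClosure is a hand-written approximation of the hidden class the compiler generates (the real one has an unspeakable name), and Person is a made-up type for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaptureDemo
{
    public sealed class Person
    {
        public string Name { get; set; } = "";
    }

    // The lambda captures `name`, so each call allocates a closure object
    // plus a delegate that points at it.
    public static Person FindWithCapture(List<Person> people, string name)
        => people.First(x => x.Name == name);

    // Hand-written approximation of what the capture is lowered to.
    private sealed class NameClosure
    {
        public string Name = "";
        public bool Matches(Person x) => x.Name == Name;
    }

    public static Person FindLowered(List<Person> people, string name)
    {
        var closure = new NameClosure { Name = name };   // allocation 1: the closure
        Func<Person, bool> predicate = closure.Matches;  // allocation 2: the delegate
        return people.First(predicate);
    }

    // A plain loop needs neither; `name` is just a local being compared.
    public static Person FindWithLoop(List<Person> people, string name)
    {
        foreach (Person p in people)
        {
            if (p.Name == name) return p;
        }
        throw new InvalidOperationException("No matching person.");
    }

    public static void Main()
    {
        var people = new List<Person> { new Person { Name = "Ada" }, new Person { Name = "Grace" } };
        Console.WriteLine(FindWithCapture(people, "Grace").Name); // prints "Grace"
        Console.WriteLine(FindLowered(people, "Grace").Name);     // prints "Grace"
        Console.WriteLine(FindWithLoop(people, "Grace").Name);    // prints "Grace"
    }
}
```

A lambda that captures nothing avoids the closure object, and the compiler caches its delegate, so the cost is specific to capturing locals like `name`.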