Release notes here.
This week hasn’t exactly gone as I expected, but it’s been very productive. I had planned on working on the lobby, first of all, but then some performance-unfriendly saves came to light and I decided I’d work on that instead. The biggest hog in large battles is moving ships around in the vis layer, and last release I talked about how I was going to look into System.Numerics and DrawMeshInstanced to solve that. I also basically decided to upgrade to Unity 2018.2, even though that’s still in beta, because it has some things we need.
Well, that didn’t happen either!
Badger fixed up the “unit testing” program that we have for the sim layer, and for the first time I fired that up. It’s an area that was previously out of my domain, but that’s been expanding a bit lately just due to necessity. At any rate, I spent almost all of this week on performance improvements to the sim layer.
Badger also fixed some notable bugs, such as the Macrophage not actually doing damage when they attack. That concludes my summary of the release notes other than to talk about performance.
Enjoy!
Chris
I again wanted to mention: we have a new Steam Developer Page. If you go there and follow us, you’ll be notified about other upcoming releases (including this one, of course).
Performance Hunting
I’ve tried using three different profilers in this period: NProfiler (which is awful despite promising big things), JetBrains dotTrace (which seems fine), and RedGate ANTS (which is maybe a bit better, but it’s hard to be sure).
At first these tools were lobbing up really juicy bits for me that I was able to majorly optimize, leading to quite a bit of savings. I spent way longer than I expected just trying to optimize square root again for our use cases, and finally cut that to a tenth or less of the load it used to represent.
I thought I was going to have to create a new form of data structure for tracking lists of entities in our code, and I came up with one in my head that I haven’t implemented yet (a wrapped, pooled, linked-list structure that is super fast at adding, removing, and iterating, but has no random access possible). It turns out that the things that I thought were going to require that MAY have been a profiling artifact, but the jury is still out on that. I’m undecided on whether or not I need to make an adjustment there.
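For the curious, here is a rough sketch of what a pooled linked list along those lines could look like. This is only an illustration under assumptions: the name PooledLinkedList and everything inside it are hypothetical and not from our codebase, and the real thing (if it ever gets built) may look quite different.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a doubly linked list that recycles its nodes.
// O(1) add and remove (given the node handle), linear iteration,
// and deliberately no indexer, so no random access.
public sealed class PooledLinkedList<T>
{
    public sealed class Node
    {
        public T Value;
        internal Node Prev, Next;
    }

    private Node head, tail;
    private readonly Stack<Node> pool = new Stack<Node>();

    public int Count { get; private set; }
    public int PooledCount { get { return pool.Count; } }

    // Appends a value and returns its node so it can be removed in O(1) later.
    public Node Add(T value)
    {
        Node n = pool.Count > 0 ? pool.Pop() : new Node();
        n.Value = value;
        n.Prev = tail;
        n.Next = null;
        if (tail != null) tail.Next = n; else head = n;
        tail = n;
        Count++;
        return n;
    }

    // Unlinks the node and returns it to the pool for reuse.
    // The caller must not use the node handle after this call.
    public void Remove(Node n)
    {
        if (n.Prev != null) n.Prev.Next = n.Next; else head = n.Next;
        if (n.Next != null) n.Next.Prev = n.Prev; else tail = n.Prev;
        n.Prev = null;
        n.Next = null;
        n.Value = default(T);
        pool.Push(n);
        Count--;
    }

    // Visits every value in insertion order.
    // Removing nodes from inside the callback is not supported in this sketch.
    public void ForEach(Action<T> visit)
    {
        for (Node n = head; n != null; n = n.Next) visit(n.Value);
    }
}
```

The pooling is the point: once the list has warmed up, adding and removing entities should not allocate, which keeps the garbage collector out of the picture.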
At the moment, what I am winding up finding is a suspicious “speed limit” on the sim code that is related to the framerate in some fashion (and no, it’s not any of the obvious things; in this case it’s a virtual framerate, but that still adjusts the speed limit). At any rate, that’s the next thing I need to dig into. All the background threads are presently running below that speed limit, which makes it the limiting factor, so I don’t think any other changes I make will show a result until it’s gone. Some of the later performance improvements I made show up with no benefit in actual gameplay yet, but they show up fine in unit testing if I set the virtual framerate really low. Fun for soon.
One of the things that I’ve observed is that the background threads aren’t hitting the other processors on my CPU as much as I expected, which was suspicious to me. I’ve gone in and looked around, and my first thought was that our threads are spending too long transitioning from idle to active. I’m still not sure that isn’t the case. We’re using Thread.Sleep(1) in order for them to wait while being alive and then turn on as soon as a bool is set that says “your data is here — now go!”
The problem is… apparently Thread.Sleep doesn’t guarantee that it will only wait one ms. Instead it will apparently average 12-15 ms. That is an eternity! No wonder things are not very busy on the secondary processors. So that’s a no-go.
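If you want to check this on your own machine, a tiny probe like the following should do it. This is a minimal sketch, not something from our codebase; the SleepProbe name is made up. On Windows the result generally tracks the system timer resolution, which defaults to roughly 15.6 ms, so the number you see will depend on the OS and on whatever else is running.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: measures how long Thread.Sleep(1) really takes on average.
public static class SleepProbe
{
    public static double AverageSleepMs(int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // Asks for 1 ms; the OS scheduler decides when the thread actually wakes.
            Thread.Sleep(1);
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }
}
```

Calling SleepProbe.AverageSleepMs(100) and logging the result is enough to see how far off from 1 ms the real wait is.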
I started using SpinWait to spin the CPU instead of Thread.Sleep(1), and that does indeed peg the CPU at 100%, but there’s 88% wastage on spinning according to the profilers when that happens. That’s going to slow down the main thread and lose framerate as well as making the other threads slower to sync, too. So that’s really kind of a no-go.
I need to figure out what that mysterious “speed limit” in our code is and get rid of that, and that will solve a lot of the problem. Other than that, though, I’ve got to figure out a way for the multithreading to be a bit more snappy in when it does things and stops doing things. Right now it’s 12-15ms at best from the word go to it actually doing anything (on almost a dozen background threads, individually).
We could supposedly use the Monitor class to help with synchronization, but I’ll be honest that I don’t yet fully understand how that would best be used while not pegging the CPU. Offhand, it sounds like using objects to lock against and monitor instead of using a bool to check against — still one per thread — but I’m not positive. Any multithreading-in-C# experts in the crowd that want to help out? Either with some explanations or taking a look at our code, or even making some changes on your own? We’re pretty slammed, workload-wise.
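For reference, the usual Monitor pattern looks something like the sketch below: one lock object per worker, with Monitor.Wait blocking the worker (without burning CPU) until the main thread calls Monitor.Pulse. The WorkSignal class and its member names are hypothetical, and whether this is actually snappy enough for ms-level handoffs is something that would need to be measured rather than assumed.

```csharp
using System.Threading;

// Hypothetical sketch: replaces "poll a bool with Thread.Sleep(1)" with a blocking wait.
public sealed class WorkSignal
{
    private readonly object gate = new object();
    private bool hasWork;

    // Main thread: "your data is here, now go!"
    public void Signal()
    {
        lock (gate)
        {
            hasWork = true;
            Monitor.Pulse(gate); // wakes one thread waiting on this gate
        }
    }

    // Worker thread: blocks until Signal() has been called, then consumes the flag.
    public void WaitForWork()
    {
        lock (gate)
        {
            // The loop guards against waking up when there is nothing to do,
            // and handles the case where Signal() ran before we started waiting.
            while (!hasWork)
            {
                Monitor.Wait(gate); // releases the lock while waiting
            }
            hasWork = false;
        }
    }
}
```

The bool is still there, but it is only read and written under the lock, and the worker sleeps inside Monitor.Wait instead of waking up on a timer to check it.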
Anyway, another option that is still on the table is potentially just switching to using the ThreadPool or some other form of multithreaded job class rather than threads that we keep warmed up and running and managed on our own. That might be the simplest approach; we shall see. I’ve done this in plenty of applications before, but none with ms-level speed required. AI War Classic only had one secondary thread, and it didn’t block the sim when it was idle, so we never ran into this with it. With Stars Beyond Reach, we used a ton of threads, but it was done in such a way that a 12-15 ms lag was utterly invisible.
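To make the ThreadPool idea concrete, here is a minimal sketch of fanning a batch of jobs out to the pool and waiting for all of them. The PoolFanOut name is hypothetical, and it assumes a .NET 4.x-level runtime, since CountdownEvent is not available before that. It says nothing about how quickly the pool picks jobs up, which is the part that matters for us.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: run a batch of jobs on the ThreadPool and block until all are done.
public static class PoolFanOut
{
    public static void RunAll(Action[] jobs)
    {
        if (jobs.Length == 0) return;

        using (CountdownEvent done = new CountdownEvent(jobs.Length))
        {
            foreach (Action job in jobs)
            {
                Action captured = job; // local copy so each work item keeps its own job
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    // No catch here: a throwing job is still an unhandled exception
                    // on a pool thread. Real code would need to catch and report it.
                    try { captured(); }
                    finally { done.Signal(); }
                });
            }
            done.Wait(); // returns once every job has signaled
        }
    }
}
```

The appeal is that the pool owns the worker threads and their wake-up logic, so there is no hand-rolled idle loop to tune.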
So that’s what’s going on lately!