Lag

Have you wondered why sometimes you think you dumped the other guy, only to find that you dumped yourself? Does it ever feel like the things you do don't happen quite the same way to everyone on the grid? There's a reason for that, and the reason is called lag.

Description of Lag

Ideally, the game would respond the very instant you or someone else hit a steering key. However, the underlying hardware is not capable of truly instantaneous response, and therefore neither is Armagetron. The delay between the key press and the game's response is called lag.

When you decide to hit a key, the following stages are passed:

  1. the command is transmitted to executing muscles
  2. the muscles contract / the key is being pressed
  3. armagetron is busy doing other stuff
  4. armagetron processes the key press and dispatches the information to the game server
  5. (network games only) the network forwards the information to the game server
  6. (network games only) the server processes the key press and dispatches it to all players
  7. (network games only) the network forwards the message (as in 5)
  8. the armagetron client updates its view

Every step along the way causes a certain delay. If this delay is constant, its effect can be mitigated by hitting the key earlier. If the delay is different every time though, this is not possible.

The delay introduced by the individual stages can be approximated as follows:

  • Stages 1 and 2 take a predictable fraction of a second, less with training.
  • Stage 3 predictably takes at most 1000 ms / fps.
  • Stage 4 predictably takes a couple of microseconds unless the client's computer is very busy otherwise (such a case has never been reported)
  • The sum of the delays in stages 5 and 7 is called network lag. It is determined by the quality of the network connection between client and server. On the internet, the chief influences are the client's and the server's connections to the internet and the geographical distance between the two. The average network lag is called ping and is displayed on the score board. For a good connection, the delay is roughly constant; for a bad connection it may vary wildly (it is entirely possible for a message to take 3 times as long to deliver as the previous one). Since this is usually the primary source of lag, we describe it in more detail below.
  • Stage 6 is negligible unless the machine running the server is otherwise occupied at the time the message arrives.
  • Stage 8 predictably takes at most 1000 ms / fps.
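The list above can be turned into a rough worst-case estimate. The following is a minimal sketch in Python; the function names and the sample numbers are illustrative assumptions, not values taken from the game's code.

```python
def max_frame_delay_ms(fps: float) -> float:
    """Upper bound for stage 3 or stage 8: at most one frame, 1000 ms / fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def worst_case_lag_ms(fps: float, ping_ms: float, reaction_ms: float = 0.0) -> float:
    """Rough worst-case delay from deciding to turn to seeing the confirmed turn.

    reaction_ms covers stages 1 and 2, ping_ms covers stages 5 and 7 combined.
    Stages 4 and 6 are treated as negligible, as in the list above.
    """
    input_wait = max_frame_delay_ms(fps)    # stage 3
    display_wait = max_frame_delay_ms(fps)  # stage 8
    return reaction_ms + input_wait + ping_ms + display_wait


# Hypothetical player: 50 fps, 80 ms ping, reaction time left out
print(worst_case_lag_ms(fps=50, ping_ms=80))
```

Note that a higher frame rate only shrinks the two frame-bound terms; the ping term is untouched by anything the client's graphics settings can do.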

Network Lag

Armagetron provides two facilities to help players cope with constant network lag: prediction and the lag-o-meter. Prediction displays other cycles where they are predicted to be on the server right now, assuming they did not turn in the meantime. As a result, cycles jump on the client's display when they turn (because the client guessed wrong). If prediction is disabled, cycles are displayed at the last position confirmed by the server, i.e. they are really no longer there, but the client does not yet know where they are. The lag-o-meter highlights the area where the cycle could presently be. Prediction and the lag-o-meter can be enabled/disabled in game->network game->network setup.
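The idea behind prediction can be sketched as straight-line extrapolation from the last state the server confirmed. This is a minimal illustration in Python, not the game's actual code; the class, the field names and the way the lag-o-meter size is estimated are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CycleState:
    """Last state confirmed by the server (hypothetical structure)."""
    x: float
    y: float
    dir_x: float   # unit direction vector, e.g. (1, 0)
    dir_y: float
    speed: float   # grid units per second
    time: float    # server game time of this state, in seconds


def predict_position(state: CycleState, now: float) -> Tuple[float, float]:
    """Extrapolate in a straight line, assuming the cycle has not turned since."""
    elapsed = max(0.0, now - state.time)
    return (state.x + state.dir_x * state.speed * elapsed,
            state.y + state.dir_y * state.speed * elapsed)


def uncertainty_length(state: CycleState, lag_seconds: float) -> float:
    """Distance the cycle can cover during the lag (a stand-in for the lag-o-meter size)."""
    return state.speed * max(0.0, lag_seconds)
```

If the other player did turn during `elapsed`, the predicted position is wrong, and the correction shows up as the jump described above.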

Because it is unpredictable, non-constant network lag is something Armagetron cannot cope with well. Non-constant network delay is caused by heavy use of any link in the network connection, so a substantial proportion of each link's capacity should be idle. Non-constant network lag manifests as retroactive display updates on the client, i.e. other cycles jumping outside of their lag-o-meter or your own cycle's recent turn point being moved. It may also cause the lag-o-meter to suddenly grow or shrink.
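One way to tell constant lag from non-constant lag is to look at how much individual round-trip times scatter around their average. The sketch below uses the population standard deviation as the measure of scatter, which is only one of several possible definitions, and the sample values are made up for illustration.

```python
from statistics import mean, pstdev


def ping_stats(samples_ms):
    """Return (average, jitter) for a list of round-trip times in milliseconds.

    Jitter is taken here as the population standard deviation of the samples.
    """
    if not samples_ms:
        raise ValueError("need at least one sample")
    return mean(samples_ms), pstdev(samples_ms)


steady = [80, 82, 79, 81, 80]     # made-up samples from a quiet link
bursty = [80, 240, 75, 160, 90]   # made-up samples; one packet took 3 times as long as the previous one

print(ping_stats(steady))
print(ping_stats(bursty))
```

Both links could show a tolerable average, but only the first one can be compensated for by hitting the key earlier.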

Avoiding - 6 pigs, 7 nipples!

Buy a better internet connection and/or keep your other network usage down. When you see someone's ping fluctuating in the game, it's usually a safe bet that they have a really large download going on somewhere, with KaZaA or some other large downloading program. Web browsing while playing the game will create more lag for you, which will in turn get pushed out to other players. You have a limited amount of bandwidth available on your internet connection, and the game server has a limited amount of bandwidth available on its connection. Some server admins will limit the number of players on their server to address this issue, but that's still not a complete solution. You have to behave yourself and keep your bandwidth usage low.

A corollary to this is when the game server in particular is sharing a connection with other servers. For example, the Crack Pipe runs on a server that is also a web server, mail server, CVS server, and a few other things. On top of that, it's used as an alarm clock, and as a regular desktop computer. So it's using a fair amount of bandwidth just running, which gets pushed out to game clients. But also under the heading of limited resources vs amount of demand, the Crack Pipe only has a 900 MHz AMD Duron processor and 256 MB RAM. That's not really a lot of system resources for the amount of work it does. So occasionally spam will be sent to the machine and it will have to spend some of its precious CPU time receiving the spam. In the meantime, game clients suffer a little bit. Some players have gotten to where they can identify how much spam the mail server receives while playing the game.

Reason

Light Speed Now, Mr. Scott

As you've probably learned at one time or another, electrical signals travel at a large fraction of the speed of light. Consider that it takes approximately 8 minutes for light from the Sun to reach the Earth. If you were to stretch a copper wire from the Earth to the Sun, it would take at least those 8 minutes for an electrical pulse to travel that distance. So electrical signals don't travel instantaneously; they merely travel so fast that the conscious brain perceives it as instantaneous. So, to start with, every time you press a button on your computer, the electrical pulse associated with that button press has to travel some distance from the button itself to a location in the computer, and then be sent out over the wire to the game server.
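The arithmetic behind the 8 minutes is simply distance divided by speed. Here is a small Python sketch of it. The cable length and the velocity factor of two thirds (a commonly quoted ballpark for signals in cables and fibre) are assumptions chosen for illustration.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458      # speed of light in vacuum
EARTH_SUN_DISTANCE_KM = 149_597_870.7  # one astronomical unit


def travel_time_s(distance_km: float, velocity_factor: float = 1.0) -> float:
    """Seconds for a signal to cover distance_km at velocity_factor times c."""
    if not 0.0 < velocity_factor <= 1.0:
        raise ValueError("velocity_factor must be in (0, 1]")
    return distance_km / (SPEED_OF_LIGHT_KM_S * velocity_factor)


# Light from the Sun to the Earth, in minutes (a bit over 8)
sun_minutes = travel_time_s(EARTH_SUN_DISTANCE_KM) / 60

# One-way time over a hypothetical 6000 km cable at two thirds of c, in ms
cable_ms = travel_time_s(6000, velocity_factor=2 / 3) * 1000

print(sun_minutes, cable_ms)
```

Even before any router touches the packet, a long enough cable alone accounts for tens of milliseconds per direction.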

Hops, not beer

The signal's first hop is the game itself. Of course, that doesn't mean anything unless you know what a hop is. A hop is considered a point along the path where the signal has to be processed and understood, at least in part, to determine where it's going to go next. So the first hop is your computer. The keypress has to be sent up from the hardware to your operating system's kernel, which then has to send it to the program that wants it, in this case the game client. The game client does some processing on it (did he turn? Did he want to say something?), and then sends its action down through the kernel to the network interface.

Follow? This is not an instantaneous thing! It is very very very fast, but it's not instantaneous.

The next hop is probably your router, if you have one. If you're on a high speed network, you have one. It may be called a "dsl modem", "dsl router", "cable modem", or whatever, but it's still a router. It processes the signal to some extent, enough to figure out where to send it. Then it sends it. If you're running a local network attached to a cable ISP (a very common setup among players), then your first router sends it to the second router. So now you've had 3 hops and the signal hasn't even left the house yet!

Then your ISP's router will pick it up, process it again, and determine where to send the signal. Then their wholesale provider picks it up, and so on and so forth. The signal will reach a point we generally consider the midpoint, although that's a bit of a misnomer, called the backbone. This is the highest-speed and highest-traffic part of the internet. It is the backbone of the internet! From there the signal goes through another series of hops to get to the game server's house.

From there, it usually goes first through a high speed connection device, such as a dsl router or somesuch, and then goes to a regular router of the kind used to set up home networks, and finally arrives at the game server. Now what? Well, now the game server processes your turn, usually some fraction of a second after you originally pressed the button to indicate your turn. Is it done?

No, it still has to send a signal back to your client telling it when it marked the turn.
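Put together, the round trip is the sum of the delays at every hop on the way out, the server's processing, and the hops on the way back. A minimal Python sketch follows; the per-hop numbers are invented for illustration, and the assumption that the reply takes the same path back does not necessarily hold on the real internet.

```python
def round_trip_ms(hop_delays_ms, server_processing_ms=0.0):
    """Time for a command to reach the server and for the reply to come back.

    hop_delays_ms: one-way delay contributed by each hop, in order.
    Assumes the reply travels the same path in reverse.
    """
    one_way = sum(hop_delays_ms)
    return 2 * one_way + server_processing_ms


# Hypothetical one-way path: PC, home router, cable modem, ISP, backbone,
# server-side ISP, server-side modem, server-side router
example_path = [0.1, 0.5, 4.0, 6.0, 20.0, 6.0, 4.0, 0.5]
print(round_trip_ms(example_path, server_processing_ms=1.0))
```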


Bugs in the game?

Bugs in the game are usually blamed for lag. In fact, you can tell who's at least new and/or who doesn't know jack about computers by how much they blame the game for the inaccuracies they see on the screen. The simple fact is, Armagetron Advanced has a very solid network system that is relatively bug-free. I'm not going to pretend it's perfect, but it's definitely good enough that if you suspect a bug and you didn't write the network code yourself, then you're probably wrong. I'm not saying you are wrong, just that you're probably wrong. I'd bet money that the lag you're experiencing and blaming the game for comes from somewhere else.

An associated comment usually comes along the lines of "but there has to be something in the game that can be done to compensate for it." This statement frequently accompanies a list of other games where lag isn't as big of a problem.

Z-man's take

I'd like to add that the prime reason you feel more lag in AA than in, say, Counter Strike, is the difference in the game dynamics. In shooters, moving (thus defending by taking cover) and attacking (pointing your mouse and shooting) are independent actions. Taking a hit does not influence your movement immediately (it does later if you get wounded, or immediately if you're pushed away by an explosion). Thus, it is possible to treat attack and defense separately in the network code and hide the lag. When you shoot at someone, you can shoot at the position where you see him now, and even though your enemy really has moved away already, you'll still hit him because the server can compensate.

In AA, attacking is moving. Defending is moving. Getting hit has an immediate and strong impact on your movement. It's not possible to divide the different aspects of movement in the network code. (Well, it is, but the result would be that it would be impossible to survive close combat for both participants.) In this sense, AA is a sports game, a beat 'em up to be precise. And you don't see them played over the net a lot precisely because they're heavily affected by network latencies (and because it's that much more fun if you're playing against someone in the same room).

Z-man's take (part 2)

There are several causes of lag.

You need to understand what happens when you press the turn key. First, at the time you press it, your client doesn't have full information about the state your cycle is in. It only has an extrapolation based on what the server last sent. The first causes of lag lie right here: the extrapolation may be wrong. If any or all of (1) - (5) go wrong, you'll be lag sliding/jumping on your own screen, with possible crashes.

  • (1) Maybe the reason is a network problem, packets from the server to your client getting lost or delayed.
  • (2) Maybe it is a bug in your client that causes the extrapolation to be wrong, or it may just be a regular gameplay thing.
  • (3) Maybe you've been riding within the lagometer of another player, and that player turned and threw a wall your way.

(note) Obviously, if your client sends the turn with a wrong game state, then the server will have a different view and won't be able to execute your command exactly as you expect. The server can detect these errors, but it can't distinguish between (1) and (2). And it can't fix the problem by believing the command data from the client; that would open the door to cheating.

So let's assume your client has a reasonable game state in memory. Your turn command is then sent over the network to the server.

  • (4) Of course, it can get lost or delayed on the way. Because the commands carry game time stamps, the server can detect and correct the problem (within reasonable bounds to avoid cheating), and it does so.
  • (5) If there is a server bug, your command won't be executed as your client correctly predicted. After executing your command, the server sends a message back to you with the result of the execution, the client incorporates that into the extrapolation, which brings us back to (1).
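The correction "within reasonable bounds" mentioned in (4) can be pictured as follows. This is only a sketch of the general idea in Python, not the server's actual algorithm; the function name and the 0.2 second tolerance are hypothetical.

```python
def effective_command_time(client_stamp: float,
                           arrival_time: float,
                           max_backdate: float = 0.2) -> float:
    """Pick the game time at which a late command is treated as executed.

    client_stamp: game time the client attached to the command (seconds)
    arrival_time: server game time when the command arrived (seconds)
    max_backdate: how far into the past the server is willing to trust the
                  stamp (hypothetical value, not taken from the game)
    """
    if client_stamp >= arrival_time:
        # A stamp from the future is never trusted beyond "now".
        return arrival_time
    # Honour the stamp, but only within the allowed window.
    return max(client_stamp, arrival_time - max_backdate)
```

A command that arrives 100 ms late is executed as if it had arrived on time, while a stamp claiming to be a full second old is only honoured up to the tolerance, which limits what a cheating client could gain by forging time stamps.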

(note) The same happens to the other players/cycles. After their command is executed, updates about it are also sent to your client.

  • (6) (almost the same as 1) Those updates can again be lost or delayed, making the cycles go straight longer than they should, and when the update finally arrives, they jump or teleport around.
  • (7) (almost like 2) Or the client makes a mistake in reading the update; then you see them jump as well.
  • (8) (almost like 5) Or the server made a mistake and delayed the command execution; again the result is a jump of an enemy cycle. Those three are almost indistinguishable in their effect.
  • (9) There is "Death Lag". Death lag is the server being nice to you. Just before you would die, the server stops your clock and waits for life saving commands from your client that were possibly delayed. The same applies to the others, so if you see another player sitting in front of a wall for that strange extra bit, it's the intentional death lag. And if you experience lag and death at virtually the same time, chances are that you were not actually killed by lag, but that the lag you perceived was actually the effect of the server trying to keep you alive. Also, the client ignores all game world events that usually would kill you and does not predict your death, because false positives there would be very disturbing. That's why you sometimes happen to see your cycle pass behind the receding trail end of another cycle or over a vanishing trail, touching it a bit, going straight for half a second, then warping back and exploding. That's death lag and the intentionally missing death prediction of the client working together.
  • (10) GPU lag, when your graphics card is lagging behind the rendering commands sent to it. This should only happen if you modify the "Swap Mode" default in the performance tweaks section. [Finish is pretty much a guarantee of no trouble, flush should work, and fastest is not guaranteed to do anything. The swap function the current SDL uses on Mac OS X does a flush all by itself, so on that platform fastest is flush, flush is double flush, and finish is finish followed by flush. The performance impact of a redundant flush in this sense is negligible, however. --Jonathan]
  • (11+12) Timer lag on the server or client, when the timer is somehow behaving wrongly; since the timer is essential for everything, bad things happen then. The two worst cases not caused by code bugs seen so far were a 5% accelerated clock rate on a client because the motherboard was overclocked, and timer jumps due to occasional clock syncs on Bugfarm.
  • (13+14) CPU lag on server and client, when the processor is not strong enough to simulate the game and your commands get delayed because of that.

What can be done about lag?

Lucifer sez

Fix the internet.

As flippant as that is, it is the source of lag. We've experimented (we as in "server admins") with a number of solutions over the last year or so (and I assume it was happening earlier) and here's what we've found, near as I can tell.

If the machine is dedicated to the armagetron server, it will have less lag. Even my server has some interesting lag effects whenever it receives mail (it's a mail server too), and that's with the armagetronad process reniced to a low nice value, i.e. a high priority (-15 I think it is).

If the machine is on a slow processor with a small cache, it will have more lag. This effect is amplified if the machine is not dedicated to the game and/or if you run more than one game server.

Lag is reduced by parallel processing on the server, so if you have a fancy Xeon processor or a dual proc setup you can have less lag.

Lag is reduced by running a 2.6 Linux kernel over *any* other kernel (Windows, Linux 2.4; I don't know that any others have been tested in this way). It can probably be reduced even further by applying the low-latency patches, but since those are mostly used for audio work I don't know how well they'd work here.

The closer the server is to the backbone, the less lag. The closer the client is to the backbone, the less lag the client has. (I'm pretty far from it, hence the high pings on the Crack Pipe, but they do look better than they used to.)

After you've done all of this, the lag will be noticeably less, but there will still be numerous complaints. This is because the internet itself is the source of all lag. So all of these measures only decrease the roundtrip time for a packet that has already crossed the internet and reached the machine.

There have been a number of improvements in this area. 0.2.7.1 had simulation improvements, and also an improvement where the server can be anchored to a certain number of simulation steps per minute. So that's why 0.2.7.1 servers have more predictable lag than older servers, and when everyone is connected with newer clients, you can really see the difference. 0.2.8_beta2 has a major network refactoring that also seems to reduce lag, even though I don't know that it would be noticeable in your ping. Since there aren't yet that many servers running the new betas, the only measure I have is my own server, which has less lag than it did with the older version.

The only improvements I can think of would be to optimize code here and there, and I don't even know which code needs to be optimized. I know that periodically one or more developers profile various pieces of code and that there has been optimizing. I also know that most of the developers around here think about performance while coding. And I also know that there is code that is not well optimized (although I can't point to any specific line; if I could do that, I could optimize it, right?).

So let's assign a value, C, which represents how much work is required to thoroughly optimize the code and reduce latency to bare minimum while retaining reliability and accuracy. Now let's assign a value, T, which represents how much time the entire development team has to dedicate to exactly that. Let's assign one more variable that represents how long before really fast internet arrives in everyone's houses (faster than even broadband now) and includes faster processors, faster network cards, and generally very low latency on the hardware end. We'll call this one H, for "toy".

C / T > H

Brain Lag

These are 6 consecutive frames from a 15 FPS video. Three frames (3/15 s = 200 ms) after the balloon moved, it's clear that I responded, confirming the 200 ms figure. Actually it's probably a fraction of a frame longer, but then it was a semi-complex "untrained" reaction. Yes, the balloon blew on the third try.

It's worth pointing out that there is also lag in your body. Psychologists have even measured it. It's the same basic problem as network latency, but it's worth considering. Your eyes see what's on the screen, but it takes some milliseconds for the signal to be processed in your brain. Then it takes some milliseconds for you to determine a reaction and act on it, and then it takes some milliseconds for the signal from your brain to travel to your fingers and make something happen. Because of the visual nature of the game, you can't take advantage of certain reflexive systems, but through training your fingers you can reduce the amount of nerve bandwidth needed to command your light cycle.

When I took my Intro to Psychology class, my teacher didn't have any actual numbers; it would be awfully nice if someone were to google them up. Nevertheless, brain lag is real, and it really affects the game. It's not as noticeable as network latency, but when you add the two together you get some interesting results. --Lucifer
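To get a feeling for what adding the two together means on the grid, the sketch below converts video frames to milliseconds and then computes how far a cycle travels while brain lag and network lag add up. It is a rough Python illustration; the cycle speed and the ping are made-up numbers, not values from the game.

```python
def frames_to_ms(frames: int, fps: float) -> float:
    """Time spanned by a number of video frames, in milliseconds."""
    return frames * 1000.0 / fps


def distance_during_lag(speed: float, reaction_ms: float, ping_ms: float) -> float:
    """Distance a cycle covers while brain lag and network lag add up.

    speed is in grid units per second (the value used below is hypothetical).
    """
    return speed * (reaction_ms + ping_ms) / 1000.0


reaction = frames_to_ms(3, 15)  # three frames of a 15 FPS video
print(reaction)
print(distance_during_lag(speed=30, reaction_ms=reaction, ping_ms=100))
```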

"Trained" simple reactions supposedly need about 200 ms. But a game usually isn't just pushing a button when something flashes, so you can easily add some more. You have to identify an event first, then think up a reasonable response, and then finally press the appropriate button(s). I don't know how much additional time is needed, but I find that my effective reaction time in games hovers around 300 ms. That can include part of the response, such as actually clicking on something if it isn't tiny and far away. I do know that more intelligence tends to result in shorter complex reaction times. The reason you might not normally perceive brain lag but just its consequences is that you don't know any better. —Jonathan 05:21, 10 June 2007 (PDT)