On September 29, 2016, Day One of the 2016 World Championship, we disabled Aurelion Sol due to a serious gameplay bug. On October 2, Day Four of Worlds 2016, the Star Forger made his grand re-entrance to the Rift. This is the story of a spectacularly ill-timed bug and the teams who worked around the clock to find a fix.
San Francisco – 10:30 PM (PT), September 29
North America’s TSM and China’s Royal Never Give Up (RNG) begin the final game of Day One of Worlds 2016. Just two minutes in, TSM Bjergsen pauses the game due to a visibility issue with RNG Xiaohu’s Aurelion Sol. Aurelion’s passively orbiting stars had completely disappeared.
LUKE RINARD, Aurelion Sol’s Designer: As Aurelion Sol’s game designer, I was really really excited to see him in pro play at that level. He’d been picked before, but like… it’s Worlds! And he got picked first day! Then the game was paused, and I thought, “Oh no… I bet somehow it’s Aurelion.” I do that a lot when Bard and Aurelion are in the game, I’m like, “I bet it’s him!” and it usually isn’t. This time it was.
BRIAN BOSSE, Engineering Architect: I was at home watching the stream and saw there was a pause, and every time there’s a pause my heart skips a beat. I’m praying like, “Audio issue… audio issue… please god be an audio issue” because it often is (and usually a quick fix). This one was not.
AMELIA VON HADEN, Gameplay Programmer: We were watching the event and we saw the replay happen and we were like… “Aaaaaah, noooo!”
RAVEN KEENE, Head Referee: The immediate reaction to bugs is always the same: just try to gather as much information as possible. In this particular case, the ref chimed in saying that Bjergsen was complaining because he was unable to see Aurelion Sol’s stars. We got the video recording from stage, which gives us the vision the player in question would have. I did not see an issue on my spectator client, but pulling up Bjergsen’s video file, it was very clear he was seeing an issue.
We restarted the client and initially it was ok. We were saying “Oh, ok this might have just been an issue where the asset didn’t load in properly and we won’t have to issue a remake in this case.” But then within a minute the issue popped up again, and at that point it became clear the restart was not actually helping in this scenario. We had to move towards the next solution, and that’s when the remake was offered.
NICK TROOP, Esports Live Producer: To see a champion potentially disabled for the remainder of the event, that we didn’t know of ahead of time, in some senses feels like we failed, even though some bugs are unique and unexpected, like Aurelion Sol’s.
Los Angeles – 11 PM, September 29
With Aurelion Sol disabled, the race was on to find the cause of the bug and identify a potential fix. A team of Rioters rushed to the Los Angeles office to dig into the issue.
BRIAN BOSSE: I was on our internal chat at the time, and the channel started blowing up with “We have a problem with Aurelion Sol’s missiles.” I had some part in creating that system, so I’m like, “Ok, it’s time to get on my A-game. Here we go.” Once it was confirmed through the second pause that this wasn’t going away, I put on the bat cape, came into the office and started digging in.
AMELIA VON HADEN: I hopped into chat right away just to see what was going on with the bug. Since I’m a gameplay programmer, I’m pretty familiar with the game code, so if it was a game-related issue, I thought it was something I could help with. I was also on the champion team for a short while as Aurelion Sol was in development, so I had a little bit of insight into his passive’s implementation. I ended up in the triage channel for Worlds where they were talking about the bug. Since it was something I could potentially help with, I thought, “What the hell, I’ll drive into the studio.”
BRIAN BOSSE: I was driving to work and I was thinking, “What the hell has gone wrong in the esports environment, and is this systemic to everything?” Anything is on the table at that point. We’re outside of work hours, the normal reporting structure doesn’t exist, anyone who’s available has to go and jump, and I was available, so I jumped.
San Francisco – 12 AM, September 30
With the day’s show over, the esports team in San Francisco immediately starts trying to reproduce the bug. This is a key step in identifying an issue and looking for a fix. At this point there are three teams across the continent working on the issue: the esports team at the event in San Francisco, the development team in Los Angeles, and a dedicated external Quality Assurance team in Montréal.
RAVEN KEENE: We were there for hours trying to rule out every possible reproduction step we could get. We were just trying to mimic the scenario we were seeing on stage to the best of our abilities, and then trying to pull variables away. I don’t remember how many tests we ended up doing, but it was like 20+ instances of us restarting the game, over and over again and seeing what was causing this.
NICK TROOP: The earliest repro that happened was an engineer using our single-player tool. He managed to repro it by having like nine Aurelion Sols in one game, which obviously is not possible, but they were able to work with that from an engineering standpoint and we were able to test outcomes on the stage. It’s one thing to disable the champion for the rest of the day until we understand the problem, but it’s another thing for Raven to be faced with making the call of disabling the champion for the duration of the tournament. We don’t want to be in that position — it sucks for viewers, and it sucks for pros.
RAVEN KEENE: What made this case interesting is that it was so unclear. We began using the pro player accounts to look at it because our test accounts were not getting the repro we needed. That had us thinking maybe this is an issue related to accounts because everything we created and all our variables are constant, and the only thing we changed was the account itself.
Los Angeles – 2 AM, September 30
The team in Southern California successfully produces its first reproduction and begins working to reverse-engineer it to find the root cause.
BRIAN BOSSE: The first thing to do in a case like this is to understand the facts—what you know and what you don’t know. At that point we had a video of Bjergsen’s screen and that’s all. We tried to reproduce it by loading up the client on our local machines and following as closely as we could to exactly where Aurelion Sol went, but it didn’t repro. Fuck. That would have been easy.
AMELIA VON HADEN: We’d been sitting there for hours analysing the code, and I took our single-player tool and hacked it so it could have 20 zero-delay spectators instead of just 1. I was testing what the game was doing with multiple spectators attached. I had like 20 spectators going and my computer was chugging, while others were just trying to repro it.
BRIAN BOSSE: By about 2 AM, we had a repro of the symptom but with a different cause. That was one of our developers running 10 clients and the server on the same machine very rarely reproducing it. Like 1 in 50 or 100, sitting there until it happens, and that only happened for one of the missiles.
AMELIA VON HADEN: Finally we got a repro, and we were like “OH, SHIT, WE’VE GOT IT. CAN WE DO ANYTHING ABOUT THIS?” It was repro’d on a build that we couldn’t attach our debugger to, so immediately everyone went to programmers’ machines and started trying to repro it in the same way. After like 20 tries, I got it! It was only two missiles gone, but we had something. It was like defusing a bomb. As soon as I had the debugger attached, we had to call Brian: because he wrote the code, he knew it. If anyone was going to be able to step through this thing and figure it out, Brian would be the guy to do it.
BRIAN BOSSE: It wasn’t even the same thing happening on stage. But we were able to, with an instance of that, dig through a debugger and figure out what state that missile had been in in order to cause that symptom. And that was enough to, over the course of the next five hours, trace back through to the actual cause here with the spectators. It took… 12 hours, end to end, including a small period in there for sleep. The last member of the team left at 4:30 AM and the first member came back in at 9 AM.
Los Angeles – 9:30 AM, September 30
The root cause had been identified, and Rioters started filtering into the office for work the next day.
LUKE RINARD: The next day I went into work and found out that Aurelion was probably disabled for a day or two, but we have a fix. I was surprised, because it was before lunch, like… how do you have a fix? Brian had been there all night, and his desk was entirely full of soda cans. He looked a little haggard but he was still up and working, and he was excited to fix the problem.
AMELIA VON HADEN: Brian’s castle of soda cans was pretty hilarious.
RAVEN KEENE: A picture was sent around showing the desk full of cans. We’re talking dozens!
LUKE RINARD: I walked over to Brian’s desk, and he was explaining the issue to someone—he was doing the engineer thing where he was just gesticulating. We tried to ask him what was going on, and one of the engineers next to him was like, “Whoa, whoa, shhhh, he’s gotten to something important here. He’s had an epiphany.”
BRIAN BOSSE: We were always suspicious of the spectator issue since that is the fundamental difference between the esports environment and the live environment, because esports uses a different kind of spectator system. While this was going on, they were doing repros on stage with Bjergsen’s account, with the actual situation that was going on, trying to narrow down, “When does it reproduce? When does it not reproduce?” With the information we gleaned from that, plus the information gleaned through debugging the bizarre symptom repro that we had, eventually we were able to draw a causal line between start and finish.
San Francisco – 10 AM, September 30
With the issue discovered and a fix identified, the esports team made the call to re-enable Aurelion Sol for Day Four of Worlds.
RAVEN KEENE: Our first reaction to this was “Thank god,” and the second was praising the Bosse god for the work he did. It then became a question of looking at the fix. Was it an easy fix? In this case it was, because we were able to adjust the accounts accordingly. Then it was about communicating this to every person at Riot who could have a potential impact on this bug. In this case, because it was related to accounts, that included all the spectators and our international spectators, so it took a lot of work with our Live Production team to communicate with all of them. We had to make sure the competing teams knew ASAP when it would be available, including looking at when scrim blocks occurred because we needed to get this out before any of the scrim blocks were played.
BRIAN BOSSE: Big relief. We went from everything being on the table to this very contained mitigation and it’s all going to be fine. It was almost an unbelievably favourable end to this.
LUKE RINARD: I went and said very nice things to everybody that I could find who was involved, and told them it really mattered to me. The folks that helped out were just total bosses. Their reaction to me saying “Thank you for doing this” was… “Well yeah, of course.” Almost like saying, “What are you thanking me for? Of course I’m going to come in in the middle of the night on Thursday for hours.”
BRIAN BOSSE: I want to emphasise that nobody was called in for this—I had a couple of engineers, a handful of QA folks, and some production staff volunteering their time to do this. Nobody was forced to come in and work on this. It was amazing to see those people stay for those ridiculous hours to fix this.
AMELIA VON HADEN: I guess it was my duty to help. It was late, so most people were home, and there were new matches the next day. I knew about the issue, and they could use my help. I just kind of felt like… it was the right thing to do. To get the best Worlds experience, it was the best thing to do. I know a lot of teams had been practicing Aurelion and everybody had been hoping to see that played. To think that we would have Worlds without him, it was like… shit, no way.
San Francisco – 8:30 PM, October 2
Aurelion Sol is re-enabled on Day Four and picked by Counter Logic Gaming’s Huhi in their match against ROX Tigers. Huhi dominates the Korean champions on his favourite pick, helping CLG steamroll the Tigers.
AMELIA VON HADEN: We felt kind of similar to people who make a champion; when they see it being played, they always have a knot in their stomach thinking it’s going to have a bad game, or it’s going to break… With this Aurelion Sol thing, when we saw him being played we all had that same feeling, hoping it wouldn’t break this time.
RAVEN KEENE: When he initially got picked again, it was one of those “Hold onto your butts” moments. We held our breath for those initial couple of minutes. Once it hit that point where the game started flowing and you started seeing the plays, the entire atmosphere changed; internally, externally, players, fans, everything. It changed the story from Aurelion Sol, the disabled champion, to Aurelion Sol, the Destroyer.
LUKE RINARD: After he got picked, and CLG just completely wrecked with him… I was like, “maybe this character is too powerful, maybe he’s a little strong.” Seeing him at Worlds, and seeing him be powerful and successful, was hugely gratifying to me. After that game, on Monday, I went in and thanked everybody again! I went through and just told everybody how that made my month to see him played that Sunday. Then seeing him get played in game three of the Final, in such a pivotal game—that made the rest of my month.
Worlds is the pinnacle of international competition in League, and losing a champion as important as Aurelion Sol at Worlds would have been absolutely devastating for players and spectators. The Rioters who came together to fix this bug worked around the clock, ensuring all the champions were available to the world’s best players during the fight for the Summoner’s Cup.
If you’re interested in the technical details behind the bug, check out Brian Bosse’s breakdown of the cause behind it.