The Good, The Bad, and the League: 11/16 - 11/29
_Your semi-weekly dose of server problem-os, NA League news, and more! (Moved to Mondays for easier updates)_
Preseason is here (for better or for worse) with a bunch of dragons, executions, new items, and another set of Clash testing coming to NA! Also, there’s been a bunch of roster changes across various LCS teams, TFT changes, All-Stars incoming, and several items have been removed from League. Oh yeah, and a lot of problems, so here’s a wall of text. :(
* **Mac voice still not working properly** Bug grouped with other Mac bugs. Something to do with .plists. Haven’t seen any major progress from JIRA yet.
**Server Stuff:**
* **Delayed game stat processing(11/18, ~8 hours)** Automated alerting notifies the NOC that game stat processing is starting to fall behind due to a high volume of incoming stats. The NOC doesn’t escalate and instead watches to see if the lag reaches a critical state. It does not, and the system catches up to real time in an acceptable time frame.
* **Delayed game stat processing(11/19, ~8 hours)** See above
* **Delayed game stat processing(11/21, ~5 hours)** See further above
* **Delayed game stat processing(11/23, ~15 hours)** See further further above
* **Delayed game stat processing(11/24, ~20 hours)** See further further further above
* **Delayed game stat processing(11/25, ~31 hours)** See further further further further above
* **LoL Client doesn’t load landing page correctly on first login(11/20, ~21 minutes)** Sydney Rioters notify the NOC that Rioters are seeing login issues, but only on their first login of the day. Various testing rules out account-specific issues, and a quick workaround is found (clicking on any tab and then back again fixes it). A JIRA bug is created to track and solve the underlying issue, but the root problem doesn’t warrant a hotfix.
* **Impacted parties and game invites(11/21, ~50 minutes)** Rioters report issues with friend/chat list to the NOC. Quick checks show that relogging fixes the problem for most Rioters, but further investigation shows problems with a dedicated service. Engineers confirm that service is undergoing a deploy, and problems should disappear once the deploy has finished.
* **Impacted parties and game invites(11/25, ~21 minutes)** Rioters report issues with friend/chat list to the NOC. Investigation shows a specific network node is timing out, causing the problems. Rebooting the node solves the problem.
* **Riot API high response times(11/22, ~51 minutes)** NOC is manually alerted that the Riot API is exceeding its response time limits. Investigation shows the slow responses are caused by the League Connect app, and additional scaling is dedicated to the appropriate services, bringing the response times back within acceptable limits.
* **Outage impacting player ability to get in game(11/14, ~22 minutes)** NOC is notified by automated alerting that there’s an overload in one monitoring system. Compensation mode is enabled and messaging broadcast. LPs on call start an investigation, which points to a specific service that froze, and needed a reboot to reset the issue properly.
* **Outage impacting player ability to get in game(11/14, ~26 minutes)** See above
* **Issues with AskRiot and support site(11/25, ~128 minutes)** Reports filter in that the Ask Riot page isn’t working properly. NOC escalates to the appropriate LP and triage shows that it’s likely a DNS issue. Rioters reach out to appropriate contacts to get the proper context, and end up taking down the page since it’s not reachable anymore.
* **Store gifting tab causes errors(11/25, ~22 hours)** Reports filter in that gifting is intermittently failing after a recent store deploy. NOC escalation reaches the LP, who starts a large investigation to uncover why gifting isn’t working properly. The root cause is discovered, and rolling restarts of the core services are needed to return the store to proper working order.
* **Match History unavailable(11/27, ~7 hours)** NOC is alerted that match history is failing to load on NA. NOC starts to escalate the issue and enables tickers across the support sites. The issue spreads from NA to all Riot Regions, and engineers need to restart the core instances and disable functionality until the load decreases.
* **Error rate spike, impacting ranked borders on load-in(11/27, ~90 minutes)** Automated alerts notify the NOC that error rates are exceeding acceptable limits for a specific service. NOC escalates to the on-call team associated with the service, who decide the root service needs to undergo a full reboot. The reboot is done, but the errors don’t drop off. Further investigation shows a single node is causing problems, and once that node is killed, the errors disappear.
**Game Stuff:**
* **Critical node restart impacts game starts(11/20, ~20 minutes)** NOC receives direct reports from Rioters that they are stuck at 0 in champ select. Compensation mode is quickly enabled after verification is made that multiple game lobbies are stuck in limbo. LPs on call make the decision to push messaging and start triaging, only to have all stuck lobbies slowly filter into active games without crashing out.
* **Fiddlesticks ult invisible to players(11/20, ~3 hours)** The NOC is notified by game designers that Fiddlesticks may require a champion disable due to rising reports of invisible ultimates. 13 minutes later, Fiddlesticks is disabled across all Riot Regions. After a quick triage and a change to the back-end services, Fiddle is re-enabled across all regions and messaging is removed. Total time of Fiddlesticks problems: ~3 hours.
* **Poro King accidentally enabled(11/26, ~2 minutes)** The NOC is notified that a developer accidentally enabled the game mode and immediately turned it off after realizing the mistake. Sorry fluffy, you’ve got to wait for the right time to be enabled.