An analysis of large Internet outages affecting Iranian networks in early 2020
Authors: Ramakrishna Padmanabhan (IODA - CAIDA, UC San Diego), Anant Shah (Verizon Media Platform), Nima Fatemi (Kandoo), Alberto Dainotti (IODA - CAIDA, UC San Diego)
Overview
Internet connectivity in Iran has been known to suffer from disruptions, especially during times of crises and political upheavals. In this post, we use several complementary data sources to examine Internet connectivity in Iranian networks in a two-month period (February 17 to April 17 2020) covering events such as the legislative election (held on Feb 21 2020) and the early spread of COVID-19 cases in Iran. We analyze four Internet connectivity outages affecting Iranian networks during this time.
Our findings show:
a) Widespread Internet connectivity outages, affecting multiple networks simultaneously, continued to occur in Iran.
- On March 3 and March 11, we observed outages that affected many large Iranian networks, including cellular networks.
- These outages bore similarities to the widespread outages observed in Iran in November 2019 due to a government-mandated Internet connectivity shutdown: during all of these outages, many major Internet Service Providers (ISPs) were affected, with some ISPs suffering a near-complete loss in connectivity. However, the causes of the outages on March 3 and 11 are unclear.
- Unlike the November 2019 outages, the outages on March 3 and 11 affected non-state-owned ISPs to a much larger extent than state-owned ISPs.
- Also unlike the November 2019 outages, which began several hours apart at different ISPs, the outages on March 3 and 11 affected several ISPs at nearly the same time. This simultaneity in outage start times could be the result of a single point of failure being affected. Alternatively, some mechanism (intentional or unintentional) that caused synchronized outages may have played a role.
b) Isolated outages (non-widespread across ISPs) can significantly impact individual major ISPs but without large effects on other ISPs.
- On April 3, Information Technology Company (ITC, AS12880) experienced a 90-minute Internet outage that affected a large share of its customers.
- On February 27, Shatel (AS31549) experienced a 30-minute Internet outage.
- During both these outages, other major providers did not seem to experience substantial drops in connectivity.
c) The complementary views offered by multiple data sources increase outage detection accuracy and also allow us to uncover additional nuances.
- The range of data sources we use includes Internet routing data (IODA’s BGP signal), active probing data (IODA’s active probing signal, ZeusPing signal), and data from end-user machines (IODA’s Darknet signal, CDN data signal). These sources provide lenses into different aspects of Internet connectivity.
- Depending upon the nature of an outage and the networks it affects, the outage may be visible in some sources, but not others. For example, the outage affecting Shatel on February 27 was visible in the ZeusPing data source but not in other data sources.
- By combining inferences from multiple data sources, we are able to discover additional nuances. For example, during the outage affecting ITC on April 3, the CDN and Darknet signals show that there was a short-lived recovery from the outage that lasted a handful of minutes before connectivity dropped again to previous levels. Evidence of this recovery was not present in the other signals.
Background and motivation
Iran’s Internet connectivity has experienced several large-scale outages in the recent past. The most notorious of these occurred in November 2019, when the Iranian government mandated a week-long Internet connectivity shutdown in response to widespread protests over fuel prices. Several reports (Oracle, Netblocks), including our collaborative post with OONI, showed the unprecedented scale and complexity of this event. During this event, cellular ISPs such as IranCell (AS44244) and MCCI (AS197207) were first affected on November 16th. A few hours later, most of the other large ISPs also experienced outages, including both state-owned ISPs such as Iran Telecom Co (AS58224) and non-state-owned ISPs such as Shatel (AS31549). Recovery from the outage occurred a week later, from November 23rd onwards.
In the time since, there have been additional reports of Internet outages in Iran, but there has been uncertainty regarding the extent to which various networks were affected. A potential Internet connectivity shutdown event in December 2019 was covered by media outlets (BBC, U.S. News), but the scale of the outage was unclear. Newsweek reported on several “mild outages” in the wake of the downing of Ukraine International Airlines Flight 752 in early January 2020, but the evidence for these outages was sometimes anecdotal. There were also reports of an outage due to a cyber-attack on February 8 2020, but it is unclear which networks were affected and to what extent.
The ambiguity in assessing the extent of outages arises from the inherent challenges in detecting outages of various kinds: depending upon the nature of the outage and the networks affected, it may be apparent in some outage monitoring tools but not others. For example, even a widespread power outage that affects primarily end-users may not be visible in Internet routing data, since Internet routers are often in well-provisioned data centers with backup power. However, such an outage may be visible in a data source containing measurements from users’ machines (such as IODA’s Darknet data source).
Given the increasing susceptibility of Iranian ISPs to Internet outages on the one hand and the challenges in accurately detecting outages on the other, we studied Iranian Internet connectivity from February 17 to April 17 2020 using diverse measurement “lenses” obtained from a variety of data sources. This period includes the quadrennial Iranian legislative election and also the early spread of COVID-19 cases in Iran. In this initial report, we cover four large outages that we observed during this time.
Methodology
We used a variety of data sources to investigate Internet connectivity in Iranian networks. We first describe IODA’s data sources. Next, we describe two novel and experimental data sources: (a) ZeusPing, a prototype fine-grained active probing system under development at CAIDA and (b) a dataset from a large Content Delivery Network (CDN). These diverse data sources helped us detect outages more accurately and also discover additional nuances about these outages.
Existing IODA data sources
The Internet Outage Detection and Analysis (IODA) project of the Center for Applied Internet Data Analysis (CAIDA) at University of California San Diego measures Internet connectivity outages worldwide in near real-time.
In order to track and confirm Internet disruptions with greater confidence, IODA uses three complementary measurement and inference methods based on Internet routing (BGP) announcements, active probing, and Internet Background Radiation (IBR) traffic.
The routing announcements from BGP allow us to track reachability according to the Internet global routing system (the so-called Internet control plane). IODA uses routing data extracted from RouteViews and RIPE RIS to obtain the BGP “signal”.
IODA’s existing active probing approach uses the Trinocular methodology developed at USC’s ISI to detect outages. Our implementation of Trinocular pings a few addresses at random from /24 blocks that are likely to respond to pings. We send pings to each block in 10-minute rounds. Using Bayesian inference, the system reasons about responses from blocks and detects outages when a /24 block’s responsiveness is lower than expected.
The Darknet data source represents Internet Background Radiation (IBR) traffic that is often from actual user machines. IBR traffic is generated by millions of machines worldwide and is often a result of these machines being infected by malware or misconfigured in some other way.
These methods result in connectivity “liveness” signals, whose status (for each country) is always publicly visible in the IODA dashboard.
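To make the Bayesian reasoning behind the active probing signal more concrete, the following is a heavily simplified sketch of how a belief that a /24 block is up could be updated after each ping. It is not IODA's or Trinocular's actual implementation: the function names, the prior of 0.9, and the small probability of a response from a block that is down (`false_positive`) are all assumptions made for illustration. `availability` stands for the fraction of pings to the block that are expected to be answered when the block is up.

```python
def update_belief(belief_up, responded, availability, false_positive=0.01):
    """Apply one Bayes update to the belief that a /24 block is up.

    availability: assumed probability that a ping is answered when the block is up.
    false_positive: assumed (small) probability of an answer when the block is down.
    """
    if responded:
        like_up, like_down = availability, false_positive
    else:
        like_up, like_down = 1.0 - availability, 1.0 - false_positive
    numerator = like_up * belief_up
    return numerator / (numerator + like_down * (1.0 - belief_up))


def belief_after_probes(responses, availability, prior=0.9):
    """Fold a sequence of ping outcomes (True = answered) into a final belief."""
    belief = prior
    for responded in responses:
        belief = update_belief(belief, responded, availability)
    return belief


if __name__ == "__main__":
    # A block that usually answers 80% of pings but stays silent several times in a row.
    print(belief_after_probes([False, False, False], availability=0.8))
    # A single answered ping pushes the belief back towards "up".
    print(belief_after_probes([False, False, True], availability=0.8))
```

In this sketch, repeated silence from a block that normally answers most pings drives the belief down quickly, whereas a single answer is enough to restore it. This is also why a block with even one responsive address can mask a partial outage at the /24 granularity.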
ZeusPing
To augment IODA’s existing Trinocular-based active-probing scheme, we launched ZeusPing, a novel fine-grained active-probing-based system under development at CAIDA. IODA’s existing Trinocular-based system detects outages at the /24 granularity and may not identify an outage if even a single address in a /24 block responds to probing. Thus, it potentially misses outages that affect /24 blocks only partially, including larger outages that partially affect many /24 blocks. The ZeusPing system probes much more broadly than Trinocular and is therefore capable of detecting a superset of Trinocular outages, including those that affect many /24 blocks only partially. Thus, ZeusPing complements IODA’s existing data sources and enables fine-grained analysis of outages, allowing detailed characterization of the IP addresses affected by outages, their geographic scope (which regions are affected), and the outages’ durations.
Since mid-February 2020, ZeusPing has been sending pings from 4 globally distributed vantage points provided by the Kandoo team to 50% of the IP addresses geolocating to Iran (around 6M addresses) every 10 minutes. We call these 10-minute periods “rounds” of measurement; every measured address receives a ping from 4 different vantage points in each round.
We find evidence of potential outages by analyzing responses (or the lack thereof) to these pings per round. By finding rounds where there is a significant drop (and subsequent increase) in ping-responsive addresses, we are able to determine with fine granularity when outages begin and end.
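The per-round analysis described above can be sketched as follows. This is an illustrative outline only, not the ZeusPing code: the input format (tuples of round, address, vantage point, and whether a response arrived), the function names, and the 20% drop threshold are assumptions. An address counts as responsive in a round if it answered a ping from at least one vantage point.

```python
def responsive_per_round(ping_results):
    """Count addresses that answered at least one vantage point in each round.

    ping_results: iterable of (round_index, address, vantage_point, responded).
    """
    responsive = {}
    for round_index, address, _vantage_point, responded in ping_results:
        addresses = responsive.setdefault(round_index, set())
        if responded:
            addresses.add(address)
    return {r: len(addrs) for r, addrs in responsive.items()}


def find_drops(counts, drop_fraction=0.2):
    """Return rounds whose responsive count fell by at least drop_fraction
    relative to the previous round (the threshold is an assumed value)."""
    drops = []
    rounds = sorted(counts)
    for prev, curr in zip(rounds, rounds[1:]):
        if counts[prev] > 0 and counts[curr] <= counts[prev] * (1 - drop_fraction):
            drops.append(curr)
    return drops


if __name__ == "__main__":
    addresses = ["192.0.2.%d" % i for i in range(1, 6)]
    results = [(0, a, "vp1", True) for a in addresses]
    # In round 1 only one address answers, and only to one vantage point.
    results += [(1, a, "vp1", False) for a in addresses]
    results += [(1, addresses[0], "vp2", True)]
    results += [(2, a, "vp1", True) for a in addresses]
    counts = responsive_per_round(results)
    print(counts, find_drops(counts))
```

The round in which the count drops marks the approximate start of a potential outage, and the round in which it rises again marks the approximate end, each to within one 10-minute round.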
CDN dataset
Among IODA’s data sources, the Darknet data source offers the best view into outages affecting end-users. However, it uses the count of IP addresses that are sending traffic from a network to measure “liveness”. This address-based view can distort the liveness signal in networks that use Carrier Grade NAT (CGN) since multiple users may be using the same IP address in such networks. Cellular networks often use CGN technology; consequently, the Darknet data source can sometimes lack visibility into cellular address space.
We collaborated with a large commercial CDN vendor to obtain a complementary “liveness” signal that may be able to capture end-user outages from cellular and non-cellular networks. We term this dataset the “CDN dataset”. This dataset consists of the total number of requests per minute from Iranian ASNs that were sent to the global CDN platform. The time series of the number of requests for each ASN was scaled by a value unique to that ASN, thus only preserving the “trends”, i.e., fluctuations seen during the analysis period. Consequently, signals across ASNs cannot be compared for volume since they are scaled differently. However, for the purpose of determining if an ASN continues to have Internet connectivity, we are interested in the trend of its signal and not its volume. Thus, in the results that we present below, we further normalized these scaled values to fall between 0 and 1, to enable easy trend-comparison with signals from the other data sources.
We expect that during an outage, users that lose connectivity will not be able to reach the CDN for fetching content. This should result in a drop in the number of requests seen from the network/ASN to the CDN. Thus, a drop in the CDN request signal not only serves as ground truth for validating outages seen in other active or passive outage detection systems but also provides visibility into ASNs where existing tools might lack visibility (such as wireless providers). Additionally, like the Darknet data, the CDN data is available at fine time-granularity: once every minute for each ASN.
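As a small illustration of why the per-ASN scaling does not matter for trend analysis, the sketch below normalizes a request series to the range 0 to 1. The exact normalization used for the CDN dataset is not specified here; min-max normalization is assumed, and the function name and sample values are hypothetical.

```python
def min_max_normalize(series):
    """Rescale a series of request counts to the range [0, 1].

    Min-max normalization is an assumption; a flat series maps to all zeros.
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(value - lo) / (hi - lo) for value in series]


if __name__ == "__main__":
    requests = [100, 50, 0, 80]            # hypothetical per-minute request counts
    scaled = [4 * v for v in requests]     # the same series under an unknown per-ASN scale
    print(min_max_normalize(requests))
    print(min_max_normalize(scaled))
```

Both calls yield the same normalized series, so the shape of the signal (including any drop towards 0 during an outage) survives the per-ASN scaling even though absolute volumes do not.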
Towards a more accurate, nuanced view
By using these complementary data sources, we obtain a more complete view of outages. This complementarity is a result of (a) the different network phenomena that each data source measures and (b) the different time-granularities of measurements.
The data sources we used present lenses into different aspects of Internet connectivity. While the BGP data source measures Internet routing traffic and can therefore yield highly accurate outage inferences when there is a measured drop, outages do not always affect Internet routers (and consequently routing traffic) and can therefore be invisible in the BGP signal. Active probing has the potential to yield fine-grained outage data in networks which respond to active probes but several networks block probing traffic. While the Darknet and CDN data represent liveness traffic collected from user-machines, they can sometimes be erratic, leading to difficulties in accurately interpreting their signals for outage detection.
The different time granularities of these data sources also result in more effective outage detection. The Darknet and CDN data sources have 1-minute time granularities, the BGP data source has a 5-minute granularity, and the active probing data sources (both IODA’s Trinocular-based system and ZeusPing) have 10-minute time granularities. Consequently, the active probing data sources may not be able to detect sub-10-minute outages, but the Darknet and CDN request data may be able to detect such short-duration outages as well.
Detected outages
Here, we report on four large Internet outages in Iran during this period. We present a summary of our findings about each outage and then follow up with detailed visualizations and analyses.
Several overlapping network outages on Mar 3 2020
Summary
- The largest outage we found, in terms of both addresses and networks affected, occurred around midnight between Mar 2 2020 and Mar 3 2020.
- Several of the affected networks, including cellular networks, had outages that began nearly simultaneously. This finding is in contrast to the outages we observed during the November 2019 shutdown in Iran; the outages during that event began at different times for different ASes.
- However, the extent of the outages varied across providers. Two large state-owned providers (Iran Telecom Co (AS58224) and ITC (AS12880)) observed only small outages whereas some non-state-owned providers (such as Shatel (AS31549), Asiatech (AS43754), Mobin Net (AS50810), DATAK (AS25124)) observed outages that appeared to affect the entirety of their address-space.
- The fine-grained ZeusPing data corroborates that the extent of the outage varied across networks. In addition, ZeusPing data reveals which addresses continued to remain connected even when most others in that network did not. Addresses detected by ZeusPing as remaining connected may represent potential candidate relays for circumventing outages.
- We observe that one of the largest non-state-owned Iranian ISPs, Shatel (AS31549), lost connectivity for the entirety of its address-space in all three of IODA’s data sources.
- This complete loss of connectivity is corroborated by the ZeusPing and CDN signals, which also dropped to zero.
- The BGP, Darknet, ZeusPing, and CDN signals suggest that the outage lasted from approximately 12:40 AM to 1:30 AM.
- Another large non-state-owned ISP, Pars Online (AS16322), lost connectivity for a large part of its address-space in all three of IODA’s data sources as well as in the ZeusPing and CDN request signals.
- The timing of the outage is almost identical to that of Shatel’s, with the outage lasting from approximately 12:40 AM to 1:30 AM.
- In contrast to the outage that affected Shatel, a small number of addresses from Pars Online continued to have Internet connectivity, as evidenced by IODA’s active probing and Darknet signals and by ZeusPing’s signal; these signals did not drop to 0.
- The ZeusPing data allows us to determine exactly which addresses had an outage, and which continued to be ping-responsive throughout the outage. Addresses which remain connected may serve as potential relays for other addresses to circumvent the outage (if local connectivity between addresses exists).
- Iran Telecom Co (AS58224), a state-owned ISP, experienced multiple outages between 11:45 PM on March 02 and 1:30 AM UTC on March 03.
- However, the extent of each outage is small relative to non-state-owned providers such as Shatel (AS31549) and Pars Online (AS16322).
- Like the other state-owned ISP (Iran Telecom Co), ITC (AS12880) also observed several relatively small outages between 11:45 PM on March 02 and 1:30 AM UTC on March 03.
- Each individual outage appears to begin and end at the same time in both ITC (AS12880) and Iran Telecom Co (AS58224).
- Although IODA can sometimes have limited visibility into cellular outages, IODA’s BGP signal for Iran Cell (AS44244) observed a significant drop at around 12:40 AM. The timing of this drop aligns with the times at which drops were observed for several other ISPs, including Shatel (AS31549) and Pars Online (AS16322), suggesting that these outages are related.
- The outage is also visible in IODA’s active probing signal and is visible particularly clearly in the ZeusPing signal. We see from this signal that there was a relatively small outage that occurred at 12:40 AM and a larger one at 1:30 AM, followed by recovery of most addresses affected by both outages at 1:40 AM.
- We also observe an apparent reduction in the CDN request signal between 12:40 and 01:40 AM, providing additional corroboration.
- MCCI (AS197207) is another major cellular ISP in Iran that observed drops in Internet connectivity in the hours before and after midnight on March 03.
- IODA’s BGP and active probing signals suggest that AS197207 observed an outage at 10 PM on March 02 that lasted for twenty minutes. The ZeusPing signal offers additional corroboration for this outage; however, there does not seem to have been a significant drop in the CDN request signal at this time.
- Like other networks, AS197207 also experienced a drop in the BGP signal at 12:40 AM on March 03 but there is no significant drop in IODA’s Active Probing and ZeusPing signals. However, the CDN request signal experienced a drop during the same time that may be consistent with an outage.
Several overlapping network outages on Mar 11 2020
Summary
- This outage affected multiple networks at approximately 14:00 UTC on Mar 11 2020 and lasted less than 10 minutes.
- Similar to the outage that occurred on Mar 3 2020, the outages affected multiple ASes, including cellular ASes, at roughly the same times.
- Like the Mar 3 2020 outage, some large non-state-owned ASes (Shatel, Pars Online) appear to have experienced a more severe outage compared to state-owned ASes.
- Shatel (AS31549), a large non-state-owned ISP, appears to have suffered a near-complete loss in Internet connectivity according to IODA’s Darknet data source, where the signal drops to 0 from 14:00 to 14:08 UTC.
- Although the ZeusPing data corroborates that an outage occurred at 14:00, more than 90,000 addresses remained ping-responsive. ZeusPing detects outages at the 10-minute granularity using pings from distributed vantage points; thus, if addresses respond to pings at least once within a 10-minute round, they will not drop out.
- Iran Telecom Co (AS58224), a large state-owned ISP, also experienced a significant outage at 14:00, though the outage appears to affect a smaller part of its address-space compared to outages that affected non-state-owned ISPs such as Shatel.
- The IODA and ZeusPing signals indicate that recovery from the outage occurred within the next 20 minutes.
- We also observe a second potential outage for AS58224 in the ZeusPing signal at 16:20 UTC.
- The IODA signals indicate an outage for Iran Cell, a major cellular ISP, at 14:00. The start-time of the outage for Iran Cell is identical to the start-times of the outages for the other networks above (and several others).
- The ZeusPing signal does not show a significant drop. We examined the ZeusPing data during this time in more detail and found that, during this round (14:00 to 14:10), many addresses in AS44244 did not respond to pings from some of the vantage points but did respond to pings from others. Since this outage only lasted 8 minutes, we suspect that the majority of addresses responded to pings from at least one of the vantage points during the other 2 minutes.
- The CDN signal observed an increase during this time. Recall that the CDN signal represents a normalized view of the number of requests from users in the AS. Consider the scenario where some customers in Iran Cell experience an outage but other customers retain cellular connectivity. If the customers who continued to have cellular connectivity from Iran Cell had experienced an outage for their residential provider (Shatel customers, for example), and had switched to Iran Cell for hotspot services, there would have been an increase in requests from these users. The signal we observe is consistent with this scenario.
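The masking effect suspected for Iran Cell in the ZeusPing signal, where an 8-minute outage leaves little trace in a 10-minute round, can be illustrated with a toy model. The sketch below assumes that each address is probed at some minute offset within the round by each of the 4 vantage points and that it counts as responsive if any probe falls outside the outage. The function names and probe schedules are hypothetical and do not reflect ZeusPing's actual probe timing.

```python
def responsive_in_round(outage_start, outage_end, probe_minutes):
    """Return True if at least one probe lands outside the outage window.

    outage_start, outage_end: minute offsets within the round, end exclusive.
    probe_minutes: minute offsets at which the vantage points probe the address.
    """
    return any(not (outage_start <= m < outage_end) for m in probe_minutes)


def fraction_responsive(outage_start, outage_end, probe_schedules):
    """Fraction of addresses that still count as responsive in the round."""
    hits = sum(responsive_in_round(outage_start, outage_end, schedule)
               for schedule in probe_schedules)
    return hits / len(probe_schedules)


if __name__ == "__main__":
    # Hypothetical schedules: one list of 4 probe offsets per address.
    schedules = [[1, 3, 6, 9], [0, 2, 5, 8], [1, 3, 6, 7], [2, 4, 7, 9]]
    # An outage covering minutes 0-7 of the round (8 minutes).
    print(fraction_responsive(0, 8, schedules))
    # An outage covering the whole 10-minute round.
    print(fraction_responsive(0, 10, schedules))
```

Under these assumptions, only addresses whose probes all happen to fall inside the outage window drop out, so a short outage can look much smaller in a round-based signal than it does in a 1-minute signal such as Darknet or CDN requests.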
An outage affecting ITC (AS12880) on Apr 3 2020
Summary
- ITC (AS12880) experienced a significant outage on Apr 3 2020 that began at approximately 8:50 AM UTC and lasted for around 90 minutes.
- The outage is visible in both IODA’s current active probing data source and the ZeusPing signal, although the finer-grained ZeusPing signal shows that the outage affected the majority of AS12880’s address-space (as opposed to only ~40% as indicated by IODA’s active probing data source). ZeusPing’s estimate of the extent of the outage is consistent with the massive drop observed in the Darknet signal.
- The CDN signal further corroborates that the outage was particularly severe, as evidenced by the steep drop in the signal at the same time as the drops in the ZeusPing and Darknet signals.
- Since the Darknet and CDN data sources are collected with 1-minute time-granularity (compared to IODA’s active probing and ZeusPing sources, which are collected with 10-minute time-granularity), we observe in these signals a small recovery at around 9:30 AM that is immediately followed by another massive outage. Since this recovery appears to have lasted for only 3 minutes, it is not clearly visible in the other sources, but the timing of the recovery in the CDN and Darknet signals aligns well. Further, just before the outage ended at 10:00 AM, the CDN and Darknet signals plummeted to nearly 0, indicating that there was a brief outage that affected nearly all of ITC’s address-space just before recovery.
An outage affecting Shatel (AS31549) on Feb 27 2020
Summary
- The ZeusPing signal suggests that Shatel (AS31549) experienced an outage on Feb 27 2020 that began at around 12:20 AM and lasted for around 30 minutes. The other data sources do not seem to observe drops during this time.
- We investigated why this outage had not been detected by IODA’s current active probing data source. We determined that the likely cause was that this outage affected most /24 address blocks only partially. More than 15,000 addresses that had been responding to pings in the previous round (12:10 to 12:20 AM) stopped responding to pings in this round. 13,000 of these addresses were from 749 /24 blocks that each had at least 10 addresses that stopped responding to pings. However, each of these 749 /24 blocks had at least a few other addresses that continued to be ping-responsive through this outage. Since the current active probing data source in IODA is based upon the Trinocular methodology, which detects outages at the /24 granularity, the presence of some responsive addresses in each /24 prevented the detection of the outage.
- This event shows the value in using multiple data sources for outage detection since some outages may only be visible in a subset of data sources.
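The per-/24 breakdown used in the investigation above can be sketched as follows. This is an illustrative outline rather than the analysis code used for this post: the function names and the input format (sets of addresses that answered pings in the previous and current rounds) are assumptions, while the threshold of 10 addresses mirrors the figure quoted above. The sample addresses come from ranges reserved for documentation.

```python
import ipaddress
from collections import Counter


def block_of(address):
    """Return the /24 network that contains an IPv4 address."""
    return ipaddress.ip_network(f"{address}/24", strict=False)


def partially_affected_blocks(previous_round, current_round, min_dropped=10):
    """Find /24 blocks that lost at least min_dropped responsive addresses
    between two rounds but still contain at least one responsive address."""
    dropped = set(previous_round) - set(current_round)
    dropped_per_block = Counter(block_of(a) for a in dropped)
    still_responsive = {block_of(a) for a in current_round}
    return {block: count for block, count in dropped_per_block.items()
            if count >= min_dropped and block in still_responsive}


if __name__ == "__main__":
    previous = {f"192.0.2.{i}" for i in range(1, 21)}       # 20 responsive addresses
    previous |= {f"198.51.100.{i}" for i in range(1, 6)}    # 5 responsive addresses
    previous |= {f"203.0.113.{i}" for i in range(1, 16)}    # 15 responsive addresses
    current = {f"192.0.2.{i}" for i in range(1, 9)}         # 12 addresses dropped
    current |= {f"198.51.100.{i}" for i in range(1, 3)}     # 3 addresses dropped
    # No address from 203.0.113.0/24 answers: that block is fully down.
    print(partially_affected_blocks(previous, current))
```

Blocks returned by this function are the ones a /24-granularity detector is most likely to miss: they lost many addresses, yet some addresses in the block kept responding.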
Conclusion
In this post, we used diverse data sources to study Internet connectivity in Iran between February 17 and April 17 2020. We presented analyses of four significant outages from this period that were visible in at least one of the data sources and highlighted our findings about the outages (how many networks were affected, duration, etc.) and about their visibility in different data sources. We found that the lenses offered by these data sources allowed us to detect outages more accurately and also discover additional nuances about these outages, thereby reinforcing the need for multiple data sources to study Internet connectivity.
While this post analyzed some of the largest outages, other outages occurred during this period as well. With the exception of the CDN data source, the data collected from the other sources are publicly available. The data from IODA’s data sources for all these outages can be accessed through the IODA platform. The data collected from the prototype ZeusPing data source (which is currently under development) and used in this blog post will eventually be released publicly; for now, it is available upon request.
Acknowledgments
We are deeply grateful to the Open Technology Fund for supporting this research. We would also like to thank David Belson for his helpful feedback.