Introduction

Tier1 and aspiring Tier2 providers interconnect only in large metropolitan areas, due to commercial incentives and politics. They won’t often peer with smaller providers, because why peer with a potential customer? Due to this, it’s entirely likely that traffic between two parties in Thessaloniki is sent to Frankfurt or Milan and back.

One possible antidote to this is to connect to a local Internet Exchange point. Not all ISPs have access to large metropolitan datacenters where larger internet exchanges have a point of presence, and it doesn’t help that the datacenter operator is happy to charge a substantial amount of money each month, just for the privilege of having a passive fiber cross connect to the exchange. Many Internet Exchanges these days ask for per-month port costs and meter the traffic with policers and rate limiters, such that the total cost of peering starts to exceed what one might pay for transit, especially at low volumes, which further exacerbates the problem. Bah.

This is an unfortunate market effect (the race to the bottom), where transit providers are continuously lowering their prices to compete. And while transit providers can make up for this to some extent through economies of scale, at some point they are mostly all of equal size, and thus the only thing that can flex is quality of service.

The benefit of using an Internet Exchange is to reduce the portion of an ISP’s (and CDN’s) traffic that must be delivered via their upstream transit providers, thereby reducing the average per-bit delivery cost as well as the end to end latency as seen by their users or customers. Furthermore, the increased number of paths available through the IXP improves routing efficiency and fault-tolerance, and it avoids traffic going the scenic route to a large hub like Frankfurt, London, Amsterdam, Paris or Rome, if it could very well remain local.

IPng Networks really believes in an open and affordable Internet, and I would like to do my part in ensuring the internet stays accessible for smaller parties.

Smöl IXPs

One notable problem with small exchanges, like for example [FNC-IX] in the Paris metro, or [CHIX-CH], [Community IX] and [Free-IX] in the Zurich metropolitan area, is that they are, well, small. They may be cheaper to connect to, in some cases even free, but they don’t have a sizable membership, which means that there is inherently less traffic flowing, which in turn makes it less appealing for prospective members to connect.

At IPng, I have partnered with a few super cool ISPs and carriers to offer a Free Internet Exchange platform. Just to head the main question off at the pass: Free here actually does mean “Free as in beer” or [Gratis], a gift to the community that does not cost money. It also more philosophically wants to be “Free as in open, and transparent” or [Libre].

Two examples are:

.. but there are actually quite a few out there once you start looking :)

Growing Smöl IXPs

Some internet exchanges break through the magical 1Tbps barrier (and get a courtesy callout on Twitter from Dr. King), but many remain smöl. Perhaps it’s time to break the chicken-and-egg problem. What if there was a way to interconnect these exchanges?

Let’s take for example the Free IX in Greece that was announced at GRNOG16 in Athens on April 19th. This exchange initially targets Athens and Thessaloniki, with 2x100G between the two cities. Members can connect to either site for the cost of only a cross connect. The 1G/10G/25G ports will be Gratis. But I will be connecting one very special member to Free IX Greece, AS50869:

Free IX: Remote

Here’s what I am going to build. The Free IX Remote project offers an outreach infrastructure which connects to internet exchange points and PNIs, and allows members to benefit from that in the following way:

  1. FreeIX uses AS50869 to peer with any network operator who is available at public internet exchanges or using private interconnects. It looks like a normal service provider in this regard. It will connect to internet exchanges, and learn a bunch of routes.
  2. FreeIX members can join the program, after which they are granted certain propagation permissions by FreeIX at the point where they have a BGP session with AS50869. The prefixes learned on these member sessions are marked as such, and will be allowed to propagate. Members will receive some or all learned prefixes from AS50869.
  3. FreeIX members can set fine grained BGP communities to determine which of their prefixes are propagated and at which locations.

Members at smaller internet exchanges greatly benefit from this type of outreach, by receiving large portions of the public internet directly at their preferred peering location. Similarly, the Free IX Remote routers will carry their traffic to these remote internet exchanges.

Detailed Design

Peer types

There are two types of BGP neighbor adjacency:

  1. Members: these are {ip-address,AS}-tuples which FreeIX has explicitly configured. Learned prefixes are added to as-set AS50869:AS-MEMBERS. Members receive all prefixes from FreeIX, each annotated with BGP informational communities, and members can drive certain behavior with BGP action communities.

  2. Peers: these are all other entities with whom FreeIX has an adjacency at public internet exchanges or private network interconnects. Peers receive some (or all) member prefixes from FreeIX and cannot drive any behavior with communities. With respect to internet exchanges and peers, AS50869 looks like a completely normal ISP, advertising subsets of the customer AS cone from AS50869:AS-MEMBERS at each exchange point.

BGP sessions with members use strict ingress filtering by means of bgpq4, and accepted prefixes will be tagged with a set of informational BGP communities, such as where the prefix was learned, and which propagation permissions it received (eg. at which internet exchanges it will be allowed to be announced). Of course, prefixes that are RPKI invalid will be dropped, while valid and unknown prefixes will be accepted. Members are granted permissions by FreeIX, which determine where their prefixes will be announced by AS50869. Further, members can perform optional actions by means of BGP communities at their ingress point, to inhibit announcements to a certain peer or at a given exchange point.

Peers, on the other hand, are not granted any permissions, and all action BGP communities will be stripped from the prefixes learned from them. Informational communities will still be tagged on these prefixes. Two things then happen on the member side. Firstly, members will be offered only those prefixes for which they have permission: in other words, I will create a configuration file that says member AS8298 may receive prefixes learned from Frys-IX. Secondly, even for those prefixes that are advertised, the member AS8298 can use the informational communities to further filter what they accept from Free IX Remote AS50869.
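The member ingress checks described above could be sketched roughly like this. It is only an illustration and not the actual Bird2 policy: the function name `accept_member_prefix` and the RPKI state labels are hypothetical, the `allowed` list stands in for whatever bgpq4 expands from the member's as-set, and exact prefix matching is an assumption.

```python
import ipaddress

def accept_member_prefix(prefix, allowed, rpki_state):
    """Decide whether a prefix announced by a member passes ingress filtering.

    prefix:     the announced prefix, e.g. "192.0.2.0/24"
    allowed:    prefixes expanded from the member's as-set
                (a stand-in for bgpq4 output)
    rpki_state: "valid", "invalid" or "unknown" (hypothetical labels)
    """
    if rpki_state == "invalid":
        # RPKI invalid prefixes are dropped; valid and unknown continue.
        return False
    net = ipaddress.ip_network(prefix)
    # Assumption: exact matches only, more-specifics are not accepted here.
    return any(net == ipaddress.ip_network(a) for a in allowed)
```

For example, `accept_member_prefix("192.0.2.0/24", ["192.0.2.0/24"], "unknown")` is accepted, while the same prefix with state `"invalid"` is dropped.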

BGP Classic Communities

Members are allowed to set the following legacy action BGP communities for coarse grained distribution of their prefixes through the FreeIX network.

  • (50869,0) do not announce anywhere
  • (50869,666) or (0,666) blackhole everywhere (can be on any more specific from the member’s AS-SET)
  • (50869,3041) prepend once everywhere
  • (50869,3042) prepend twice everywhere
  • (50869,3043) prepend three times everywhere

Peers, on the other hand, are not allowed to set any communities, so all classic BGP communities from them are stripped on ingress.
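To make the classic communities a bit more concrete, here is a small sketch of how a member's set could be interpreted. The function name `classic_actions` and the result dictionary are hypothetical, and picking the largest prepend when several prepend communities are present is my assumption, not something the design prescribes.

```python
FREEIX_ASN = 50869

def classic_actions(communities):
    """Interpret classic (asn, value) action communities set by a member."""
    result = {"announce": True, "blackhole": False, "prepend": 0}
    for asn, value in communities:
        if (asn, value) == (FREEIX_ASN, 0):
            result["announce"] = False        # do not announce anywhere
        elif value == 666 and asn in (FREEIX_ASN, 0):
            result["blackhole"] = True        # blackhole everywhere
        elif asn == FREEIX_ASN and value in (3041, 3042, 3043):
            # 3041/3042/3043 prepend once/twice/three times everywhere.
            # Assumption: if several are set, the largest one wins.
            result["prepend"] = max(result["prepend"], value - 3040)
    return result
```

A member sending `(50869,3042)` would get `prepend` set to 2, while communities with any other ASN in the first field are ignored by this sketch.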

BGP Large Communities

Free IX Remote will use three types of BGP Large Communities, which each serve a distinct purpose:

  1. Informational: These communities are set by the FreeIX router when learning a prefix. They cannot be set by peers or members, and will be stripped on ingress. They will be sent to both members and peers, allowing operators to choose which prefixes to learn based on their origin details, like which country or internet exchange they were learned at.

  2. Permission: These communities are also set by FreeIX operators when learning a prefix (eg. on the ingress router). They cannot be set by peers or members, and will be stripped on ingress. The permission communities determine where FreeIX will allow the prefix to propagate. They will be stripped on egress.

  3. Action: Based on the permissions, members can further steer announcements by sending certain action communities to FreeIX. These actions cannot be sent by peers, but in certain cases they can be set by FreeIX operators on ingress. Similarly to the permission communities, all action communities will be stripped on egress.

Regular peers of AS50869 at exchange points and private network interconnects will not be able to set any communities, so all large BGP communities from them are stripped on ingress.
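The ingress stripping rules for the three types can be summarized in a short sketch. The helper names `community_type` and `strip_on_ingress` are hypothetical, and I'm assuming the XX in 10XX, 20XX and 30XX spans 00 to 99. What happens to large communities from other ASNs on member sessions isn't specified here, so dropping them in this sketch is an assumption as well.

```python
FREEIX_ASN = 50869

def community_type(community):
    """Classify an (asn, function, value) large community by its function."""
    asn, function, _value = community
    if asn != FREEIX_ASN:
        return None
    if 1000 <= function <= 1099:
        return "informational"
    if 2000 <= function <= 2099:
        return "permission"
    if 3000 <= function <= 3099:
        return "action"
    return None

def strip_on_ingress(communities, neighbor_type):
    """Return the large communities that survive ingress on a session."""
    if neighbor_type == "peer":
        return set()  # peers may not set anything
    # Members may only send action communities; informational and
    # permission communities are cleared. Assumption: anything that is
    # not a FreeIX action community is dropped too.
    return {c for c in communities if community_type(c) == "action"}
```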

Informational Communities

When FreeIX routers learn prefixes, they will annotate them with certain communities. For example, the router at Amsterdam NIKHEF (which is router #1, country #2), when learning a prefix at FrysIX (which is ixp #1152), will set the following BGP large communities:

  • (50869,1010,1): Informational (10XX), Router (1010), vpp0.nlams0.free-ix.net (1)
  • (50869,1020,2): Informational (10XX), Country (1020), Netherlands (2)
  • (50869,1030,1152): Informational (10XX), IXP (1030), PeeringDB IXP for FrysIX (1152)

When propagating these prefixes to neighbors (both members and peers), these informational communities can be used to determine local policy, for example by setting a different localpref or dropping prefixes from a certain location. Informational communities can be read, but they can’t be set by peers or members – they are always cleared by FreeIX routers when learning prefixes, and as such the only routers which will set them are the FreeIX ones.
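As a small illustration of the tagging described above, the three informational communities can be derived from the router, country and IXP identifiers. The function names are hypothetical; the numbers come from the NIKHEF / FrysIX example.

```python
FREEIX_ASN = 50869

def informational_communities(router_id, country_id, ixp_id):
    """Build the informational large communities for a learned prefix."""
    return [
        (FREEIX_ASN, 1010, router_id),   # which FreeIX router learned it
        (FREEIX_ASN, 1020, country_id),  # in which country
        (FREEIX_ASN, 1030, ixp_id),      # at which IXP (PeeringDB IXP id)
    ]

def learned_in_country(communities, country_id):
    """Member-side helper: was this prefix learned in the given country?"""
    return (FREEIX_ASN, 1020, country_id) in communities

# Router #1 in country #2 (Netherlands), learning at FrysIX (ixp #1152)
tags = informational_communities(1, 2, 1152)
```

A member could then use something like `learned_in_country(tags, 2)` in their own import policy to set a different localpref for prefixes learned in the Netherlands.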

Permission Communities

FreeIX maintains a list of permissions per member. When members announce their prefixes to FreeIX routers, these permission communities are set. They determine what the member is allowed to do with FreeIX propagation, notably which routers, countries, internet exchanges and PNIs the member will be allowed to propagate to.

Usually, member prefixes are allowed to propagate everywhere, so the following communities might be set by the FreeIX router on ingress:

  • (50869,2010,0): Permission (20XX), Router (2010), everywhere (0)
  • (50869,2020,0): Permission (20XX), Country (2020), everywhere (0)
  • (50869,2030,0): Permission (20XX), IXP (2030), everywhere (0)
  • (50869,2031,0): Permission (20XX), PNI (2031), everywhere (0)

If the member prefixes are allowed to propagate only to certain places, the ‘everywhere’ communities will not be set, and instead lists of communities with finer grained permissions can be used, for example:

  • (50869,2010,2): Permission (20XX), Router (2010), vpp0.grskg0.free-ix.net (2)
  • (50869,2020,3): Permission (20XX), Country (2020), Greece (3)
  • (50869,2030,60): Permission (20XX), IXP (2030), PeeringDB IXP for SwissIX (60)
  • (50869,2031,8298): Permission (20XX), PNI (2031), IPng Networks GmbH (AS8298)

Permission communities can’t be set by peers, nor by members – they are always cleared by FreeIX routers when learning prefixes, and are configured explicitly by FreeIX operators.
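Here is a sketch of how per-member permissions could be turned into permission communities, and how an egress point might check them. The shape of `member_permissions` is hypothetical (the real input format is not described here), as are the function names. Treating a missing key as 'everywhere' is my assumption for the sake of the example.

```python
FREEIX_ASN = 50869
EVERYWHERE = 0
PERMISSION_FUNCTIONS = {"router": 2010, "country": 2020, "ixp": 2030, "pni": 2031}

def permission_communities(member_permissions):
    """Expand a member's permissions into large communities.

    member_permissions maps "router", "country", "ixp" and "pni" to a
    list of IDs. Assumption: a missing key means 'everywhere' (0).
    """
    out = []
    for kind, function in PERMISSION_FUNCTIONS.items():
        ids = member_permissions.get(kind)
        if ids is None:
            out.append((FREEIX_ASN, function, EVERYWHERE))
        else:
            out.extend((FREEIX_ASN, function, i) for i in ids)
    return out

def permits(communities, kind, value):
    """True if the permission communities allow this router/country/ixp/pni."""
    function = PERMISSION_FUNCTIONS[kind]
    return ((FREEIX_ASN, function, EVERYWHERE) in communities
            or (FREEIX_ASN, function, value) in communities)
```

For a member restricted with `{"country": [3]}`, `permits(..., "country", 3)` holds for Greece but not for Switzerland (1), while routers, IXPs and PNIs remain unrestricted.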

Action Communities

Based on the permission communities, zero or more egress routers, countries and internet exchanges are eligible to propagate member prefixes from AS50869 to its peers. Members can define very fine grained action communities to further tweak which prefixes propagate on which routers, in which countries and towards which internet exchanges and private network interconnects:

  • (50869,3010,3): Inhibit Action (30XX), Router (3010), vpp0.gratt0.free-ix.net (3)
  • (50869,3020,1): Inhibit Action (30XX), Country (3020), Switzerland (1)
  • (50869,3030,1308): Inhibit Action (30XX), IXP (3030), PeeringDB IXP for LS-IX (1308)
  • (50869,3031,8298): Inhibit Action (30XX), PNI (3031), IPng Networks GmbH (AS8298)

Further actions can be placed on a per-remote-neighbor basis:

  • (50869,3040,13030): Inhibit Action (30XX), AS (3040), Init7 (AS13030)
  • (50869,3041,6939): Prepend Action (30XX), Prepend Once (3041), Hurricane Electric (AS6939)
  • (50869,3042,12859): Prepend Action (30XX), Prepend Twice (3042), BIT BV (AS12859)
  • (50869,3043,8283): Prepend Action (30XX), Prepend Three Times (3043), Coloclue (AS8283)

Peers cannot set these actions, as all action communities will be stripped on ingress. Members can set these action communities on their sessions with FreeIX routers, however in some cases they may also be set by FreeIX operators when learning prefixes.
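Putting the action communities together, an egress decision for one BGP session might look like the sketch below. The `session` dictionary and the function name `egress_decision` are hypothetical, it assumes the permission check has already passed, and taking the largest prepend when several match is my assumption.

```python
FREEIX_ASN = 50869
INHIBIT = {"router": 3010, "country": 3020, "ixp": 3030, "pni": 3031, "asn": 3040}
PREPEND = {3041: 1, 3042: 2, 3043: 3}

def egress_decision(actions, session):
    """Return None if the announcement is inhibited, else the prepend count.

    actions: set of (asn, function, value) action communities on the prefix
    session: describes the egress session, e.g.
             {"router": 1, "country": 2, "ixp": 1152, "asn": 6939}
    """
    # Any matching inhibit community suppresses the announcement.
    for kind, function in INHIBIT.items():
        if kind in session and (FREEIX_ASN, function, session[kind]) in actions:
            return None
    # Otherwise look for per-neighbor prepend actions.
    prepend = 0
    for function, count in PREPEND.items():
        if (FREEIX_ASN, function, session["asn"]) in actions:
            prepend = max(prepend, count)
    return prepend
```

With `(50869,3020,1)` and `(50869,3041,6939)` set, the prefix is not announced on any session in Switzerland, and is prepended once towards Hurricane Electric elsewhere.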

What’s next

Bird

Perhaps this interaction between informational, permission and action BGP communities gives you an idea on how such a network may operate. It’s somewhat different to a classic Transit provider, in that AS50869 will not carry a full table. It’ll merely provide a form of partial transit from member A at IXP #1, to and from all peers that can be found at IXPs #2-#N. Makes the mind boggle? Don’t worry, we’ll figure it out together :)

In an upcoming article I’ll detail the programming work that goes into implementing this complex peering policy in Bird2, driving VPP routers (duh), with an IGP that is IPv4-less, because at this point, I [may as well] put my money where my mouth is.

If you’re interested in this kind of stuff, take a look at the IPng Networks AS8298 [Routing Policy]. Similar to that one, this one will use a combination of functional programming, templates, and clever expansions to make a customized per-member and per-peer configuration based on a YAML input file which dictates which member and which prefix is allowed to go where.

VPP

First, I need to get a replacement router for the Thessaloniki router, which will run VPP of course. My buddy Antonis noticed that there are CPU and/or DDR errors on that chassis, so it may need to be RMAd. But once it’s operational, I will start by deploying one instance in Amsterdam NIKHEF, and another in Thessaloniki Balkan Gate, with a 100G connection between them, graciously provided by [LANCOM]. Just look at that FD.io hound runnnnn!!1