Dublin PTG 2018: The Edge Sessions

I recently got snowed in at a hotel in Dublin with a couple hundred people I consider friends, a well-stocked bar/restaurant, a fantastic event staff, and a lot of energy.

I know…poor me, right? =)

Before Storm Emma and the Beast from the East settled in on the OpenStack Project Team Gathering and forced the Croke Park Stadium to close (thereby giving birth to the hashtag “snowpenstack”), we got a solid start to the week. I spent most of my Monday at a meeting with the Board of Directors and most of my Tuesday at a meeting with the FEMDC SIG talking about Edge computing use cases. As with any long technical discussion, there was a lot to take in and try to keep straight in my head later, so I thought it’d be useful to dump some of my notes here.

First and foremost, I should note that my ramblings here are a supplement to the official etherpads we used during the event for collaborative note-taking:

If you’re not familiar with Edge use cases, it’s probably worth a brief pause here to check out Cloud Edge Computing: Beyond the Data Center, a whitepaper the group published earlier this year. One important factor in the use cases we’re talking about here is that “the cloud” will be distributed to a lot of different sites in geographically dispersed areas. This is a big change in thinking if you’ve been working in “classical cloud computing”, where the general notion is to pool a (potentially tremendous) amount of compute/storage/etc. resources into one or a few large datacenters. Imagine, for example, that you want a small amount of computing capacity to be available at every cell tower you own in China. Or that you want to run virtualized applications on customer premises equipment: like every set-top box in New York City. Also, consider that your compute capacity might not even stay in one place: you might have some capacity in a train or a connected bus barreling down the Autobahn. Not only might these pools of capacity move, they might also disappear for a time and then reappear later.

Keeping capacity close to the end user is one of the many factors that makes edge computing use cases “squeeze” OpenStack at different points than classical large-pools-of-capacity-in-big-datacenters use cases might. If you have hundreds of thousands of pools of capacity scattered across a large geographic area, is it actually practical to centralize the control plane? Do you need tiers as opposed to a flat or two-level hierarchy? Latency certainly seems like a challenge in these environments, but also resiliency: you certainly don’t want the capacity at all those cell towers to be rendered unusable if a centralized component goes down.

The morning started off with some thinking about how we wanted to approach the discussion for the rest of the day. Based on previous discussions, two basic strategies were in play: a “top-down” approach where there are multiple OpenStack instances managed by some higher-level entity, or a “bottom-up” approach where a cluster of OpenStack instances have some shared components that provide management (for example: they might all share a Keystone instance). After some discussion, it was decided to focus most of the rest of the day’s discussion on an architecture that features multiple (potentially thousands of) regions, with some top-level orchestration to help push common components like images or users to multiple regions.

If you’re not familiar with the concept of regions in OpenStack, here’s a quick primer: generally speaking, regions in OpenStack are thought of as geographically distinct OpenStack entities under common control. For example, in a “classical cloud” use case, I might have a private cloud with a “us-west” and “us-east” region for users within my company based in San Francisco and New York. In most cases, regions are separate API endpoints and each region has its own control plane; that is, its own instances of things like Nova, Neutron, etc. In some deployments they may share some components (typically Keystone or Horizon) but in many cases those, too, are unique to each site.
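
To make that concrete, here’s a minimal sketch using openstacksdk, assuming a clouds.yaml entry named “mycloud” that defines “us-west” and “us-east” regions (all three names are placeholders). Each region answers through its own endpoints, so the resources you see are scoped to that region’s own services.

# A minimal sketch, assuming "mycloud", "us-west", and "us-east" are
# defined in your clouds.yaml.  Each region is reached through its own
# API endpoints and has its own control plane.
import openstack

for region in ("us-west", "us-east"):
    conn = openstack.connect(cloud="mycloud", region_name=region)
    servers = list(conn.compute.servers())  # this region's Nova
    images = list(conn.image.images())      # this region's Glance
    print(f"{region}: {len(servers)} servers, {len(images)} images")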

If we expand the use of regions to Edge use cases, you might think of a few servers at every cell tower constituting a region, or an entire OpenStack region consisting of a lightweight deployment housed entirely on a set-top box. As the number of regions piles up in the Edge world, it quickly becomes evident that some way of managing all those regions is needed.

It’s worth noting before we go on that this is just one deployment scenario, not necessarily “the” deployment scenario that fits all use cases. Depending on your use case, it’s easy to imagine variants: for example, instead of making each set-top box in an apartment building its own region, I might put a control plane in an electrical closet somewhere and have each set-top box serve simply as a compute host.

Having chosen an architecture to hone in on for the day, Alan began relating some of the challenges he’d experienced with such an approach and with the tooling currently in place. That prompted Jonathan Bryce to whip up a new etherpad for Alan’s Problems so we could dig in and work out how to solve them. An advantage of this architectural approach is that it’s closer at hand: we already have capabilities in the OpenStack software to build a substantial portion of such an architecture (for example: the concept of regions has been a thing in OpenStack since the very early days).

I’ll spare you most of the details of the discussion since much of it is captured in the etherpad, but essentially we started boiling things down into something that looks like this:

-------------  ----------- -------
| "Golden   |  | "Golden | | NKB |      < Centralized
|  Keystone"|  |  Glance"| |     |        (or tiered)       
-------------  ----------- -------
         \        |          /
          \       |         /
           \      |        /
            \     |       /
---------- ----------       ----------
|region 1| |region 2| |...| |region n|      < Geographically dispersed
---------- ----------       ----------

The general idea here is that in a (massively) multi-region architecture, there are probably still a great number of things that you’d want to be propagated to all or just some regions. For example: when I onboard a new user or create a new project, I might want that new user to appear in all regions, or I might want that project to exist only in certain regions (perhaps “all regions in a certain geography” or “all regions that are actually trains”, or even “all regions that have a policy entitlement to run certain workloads”). Glance is conceptually similar: I might have a certain VNF image that I want propagated to all or many regions. However, I as a human being would probably be really frustrated if I had to upload an image or create a user manually on hundreds or thousands of endpoints. It’s worth calling out that “create” is just one action: I might also need to update things, delete them, see what their state is, etc. Moreover, I as a human don’t want to have to keep tabs on all of that over time and make sure that if I asked for certain objects to be in certain places, they get there and stay there.
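
To see why doing this by hand doesn’t scale, here’s a rough sketch of the “manual” approach, pretending every region’s Keystone is reachable by region name through a single clouds.yaml entry (all names here are placeholders). A real tool would also need retries, auditing, and ongoing reconciliation, which is exactly the bookkeeping you don’t want to own yourself.

# A rough sketch of the manual approach: create the same project in
# every region, one API call at a time.  "mycloud" and the region names
# are placeholders, and there is no error handling or reconciliation.
import openstack

regions = ["region-%d" % i for i in range(1, 1001)]  # hypothetical region list

for region in regions:
    conn = openstack.connect(cloud="mycloud", region_name=region)
    conn.identity.create_project(
        name="edge-video-cache",
        description="Project that should exist at every cell tower",
    )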

This is where the concept of “intent” comes in (and with it, the mysterious “NKB” block in the diagram above). Imagine if I only had to push an image to a single instance of Glance (a “golden” Glance if you will) and in doing so I could also issue some statement of intent. That statement might be something like:

  • “Here is my image, it should be in all regions tagged ‘set-top-box’.”
  • “Here is my image, it should be available to download in all regions marked ‘cell-towers-within-1-mile-of-denver’ and should actually be present in all regions marked ‘high-use-towers-within-1-mile-of-denver’.”
  • “Here is my image, nuke it from orbit in all regions.”
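
None of this tooling exists yet, but as a thought experiment, statements like the ones above might boil down to a small structured record: an object reference, a selector that picks the target regions, and a desired state. Everything in this sketch is hypothetical and only meant to make the shape of the idea concrete.

# Purely hypothetical: one way the intent statements above might be
# expressed as data.  The field names and states are invented for
# illustration and are not an existing OpenStack API.
from dataclasses import dataclass

@dataclass
class ImageIntent:
    image_ref: str        # image in the "golden Glance"
    region_selector: str  # tag/label query that picks the target regions
    desired_state: str    # "present", "available", or "absent"

intents = [
    ImageIntent("vnf-fw-1.2", "tag:set-top-box", "present"),
    ImageIntent("vnf-fw-1.2", "tag:cell-towers-within-1-mile-of-denver", "available"),
    ImageIntent("vnf-fw-1.2", "tag:high-use-towers-within-1-mile-of-denver", "present"),
    ImageIntent("vnf-fw-1.1", "tag:*", "absent"),  # "nuke it from orbit"
]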

Getting an image into Glance is pretty standard stuff: I can simply deploy a centralized Glance server (or more likely a horizontally-scaled set of them). The “intent” bit could possibly be accomplished using existing metadata primitives available in various OpenStack services: things like image metadata, Neutron tags, extra specs, etc. Again: it might be useful to make use of existing primitives because they already exist, so we can get a head start on getting a solution deployed. We could alternatively have a single service that provides a more descriptive and unified language for this. We then also need something that carries out the intent that we’ve provided, and makes sure it stays carried out. For example, if an image that I said should be in a certain region gets deleted somehow, that abnormal state should be detected and resolved by having the image synchronized from the “golden Glance” again.
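
As a sketch of the “reuse existing primitives” idea, intent could ride along as image properties when the image is pushed to the golden Glance. The property names below are invented for illustration; Glance will happily store arbitrary properties, but nothing today acts on these, and the “golden” cloud name is a placeholder.

# Sketch of expressing intent with existing Glance image properties.
# The "replicate_to" and "sync_policy" properties are hypothetical;
# something like the "New Kingbird" service discussed below would have
# to interpret and enforce them.
import openstack

golden = openstack.connect(cloud="golden")  # placeholder clouds.yaml entry

golden.create_image(
    name="vnf-fw-1.2",
    filename="vnf-fw-1.2.qcow2",
    disk_format="qcow2",
    container_format="bare",
    meta={
        "replicate_to": "set-top-box",  # hypothetical intent property
        "sync_policy": "present",       # hypothetical intent property
    },
    wait=True,
)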

That last bit involves carrying out actions across multiple regions (among other things). Orchestrating an action over multiple regions sounds a bit like what the Kingbird project does, though it also has other elements to it (one might imagine, for example, using something like Congress to define policy or “intent”). For the sake of simplicity in conversation, we referred to the synchronization/intent/policy service as simply “New Kingbird” (or “NKB”) for lack of a better name. Elements of what we need for such a service already exist in parts of other services, but it’s likely we need to build either some glue or a new component for this part of the architecture.

One point was reiterated several times that bears repeating here as well: any time you’re dealing with an intent-based or policy-based system with a potentially very large number of endpoints to manage, you also need to be able to query the system to see what its progress is toward fulfilling your stated intent. If we imagine a “New Kingbird” component, that means I should be able to ask it things like “Hey, I asked you to push this image to a bunch of regions …is it there yet?” and have it answer “Your image now exists in region A, I’m pushing it to C, D, and F right now, E and G will be next, and oh by the way I got a connection timeout from B when I tried 13 seconds ago so I’ll try it again in 47 seconds (which will mark my fourth attempt).”
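
Since “New Kingbird” doesn’t exist, its interface is anyone’s guess, but the kind of answer described above might boil down to a per-region status report along these lines (the schema is entirely made up):

# Hypothetical only: what a sync-status report from a "New Kingbird"
# style service might contain.  Neither the service nor this schema
# exists today.
status = {
    "intent": "image vnf-fw-1.2 -> all regions tagged 'set-top-box'",
    "regions": {
        "region-a": {"state": "synced"},
        "region-b": {"state": "retrying", "attempts": 3,
                     "last_error": "connection timeout",
                     "next_attempt_in_seconds": 47},
        "region-c": {"state": "in-progress"},
        "region-e": {"state": "queued"},
    },
}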

I think that provides a basic architectural overview…that said, a ton of details were also discussed. I’ll note a few here, but once again I’ll invite you to check out the etherpad for more.

  • We should not assume that a given user has administrative rights either to the “golden” components or to all regions. For example, a given user might state his intent to push a new project to region A, but his access rights might only allow him to push it to B, C, and D.
  • There is a need for some sort of portal (I for one dare not say “single pane of glass”!) to get a general overview of what’s going on and what state things are in. Here again though, privilege levels and roles should be considered.
  • The diagram I’ve drawn above shows a simple two-level hierarchy. One should not assume that’s the way it would actually work for a substantially large and dispersed use case: there might actually be deeper hierarchies here (for example, a “golden Keystone” that serves all regions in a metropolitan area rather than an entire country…but that can pull the users/projects it wants to propagate to its child regions from a central “golden Keystone” higher up).
  • It should also not be assumed that all regions need to pull directly from a single “golden” instance, though that’s perhaps the easiest thing to diagram to kickstart a conversation. Instead, imagine a world where Region A might receive an instruction that it needs to download a particular image. It could get it from a central “golden Glance”, but it might also know that it has high bandwidth/low latency connections to Regions B, C, and D available. Upon querying, it determines that C has already downloaded the image, so A fetches it from C rather than going to a central source. This sort of peer-to-peer structure bears some resemblance to other things that already have code and protocols established (for example: BitTorrent or perhaps even blockchains for bits about intent and assertion). A rough sketch of this peer-selection idea follows this list.
  • One advantage of this architecture is that it provides a degree of separation of concerns. Imagine a simple two-level hierarchy with a “golden Glance” and a hundred regions beneath it. It’s simple for the admin, since he can push an image to one place and have it propagate to many. But it also provides some resiliency: if that “golden Glance” goes down, all the regions can keep operating (they simply won’t receive new images or commands to delete existing ones, etc. until the golden Glance returns). If something truly catastrophic happens, one would also theoretically still be able to use the API endpoints of the individual regions to re-exert control in the absence of the centralized components (would it be tedious and require a lot of foreach loops? Yes. Would it be possible to do things? Also yes.). Add in the torrent-like capability mentioned in the previous bullet point, and you get an even more resilient architecture.
  • There are some discernible differences in the types of things we’d want to propagate down to regions. For example: a “user” or “project” amounts to a few rows in a database (which is small and fast to push even on relatively low speed/high latency links). A VM image, on the other hand, might be several gigabytes and take considerably longer. Those sorts of differences should be taken into account, and they point to the need for things like the ability to query current state versus intended state and the progress toward synchronizing the two. It also points to the potential use case for “lazy loading” of some object types. I might know that a given image is going to be needed in a handful of regions and declare my intent that it should be pushed down to all of them. I might also know that the same image might be needed in some other regions, but not necessarily, and that it might be OK if it took a while to start up there. For those regions, I might merely want to declare intent that the image be available to them (e.g. it resides in another region they know how to talk to, or in a central/regional place that they can download from) and let them fetch it when and if it’s needed.
  • Another service that might factor into this design is the placement service from Nova (which might eventually be extracted into its own standalone repository/service). It already offers things like traits and resource pools that would be potentially useful.
  • It’s also entirely possible that not all regions will be the same: some might be configured differently than others or might be running a different version of the OpenStack software (some might not even run all the services…I might have Manila in some but not all regions). In fact, it could be argued that this becomes a virtual certainty in massive edge deployments with a large number of edge sites that may temporarily drop off the map. Resolving differences is something that needs to be addressed, and again there are some existing tools like Oaktree and Shade that might be useful for doing so.
  • One could also imagine completely different ways to solve these problems. This architecture isn’t meant to be a final end-all-be-all for all users and use cases; rather, it was meant as something concrete that could be used for discussion to flesh out which problems exist, which could be solved, and which need more consideration.
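
And, as promised in the peer-to-peer bullet above, here’s a rough sketch of how a region might pick where to fetch an image from. The data model and the idea of asking peers who already holds the image are assumptions for illustration, not an existing mechanism.

# A rough sketch of peer selection for image fetching.  The peer list,
# its fields, and the latency threshold are all hypothetical.
def pick_image_source(peers, central="golden-glance"):
    """Prefer a nearby peer that already holds the image; otherwise
    fall back to the central golden Glance."""
    nearby = [p for p in peers
              if p["has_image"] and p["latency_ms"] < 20]  # "nearby" cutoff is arbitrary
    if nearby:
        return min(nearby, key=lambda p: p["latency_ms"])["name"]
    return central

# Region A asks its well-connected neighbours; only C and D have the image.
peers = [
    {"name": "region-b", "has_image": False, "latency_ms": 4},
    {"name": "region-c", "has_image": True, "latency_ms": 6},
    {"name": "region-d", "has_image": True, "latency_ms": 55},
]
print(pick_image_source(peers))  # -> "region-c"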

If any of this sounded interesting to you, feel free to join the next FEMDC Meeting and/or the edge-computing@lists.openstack.org mailing list.
