People sometimes describe DevOpsas a factory. It’s a good analogy. Like a factory, code goes go in one end of the DevOps line. Finished software comes out the other.
I’d take the idea one step further. In its highest form, DevOps is not just any factory, but a ‘lights-out’ factory.
Also called a “dark factory,” a lights-out factory is one so automated it can perform most tasks in the dark, needing only a small team of supervisors to keep an eye on things in the control room. That’s the level of automation DevOps should strive for.
In a lights-out DevOps factory, submitted code is automatically reviewed for adherence to coding standards, static analysis, security vulnerabilities and automated test coverage. After making it through the first pass, the code gets put through its paces with automated integration, performance, load and end-to-end tests. Only then, after completing all those tests, is it ready for deployment to an approved environment.
As for those environments, the lights-out DevOps factory automatically sets them up, provisions them, deploys to them and tears them down as needed. All software configuration, secrets, certificates, networks and so forth spring into being at deploy time, requiring no manual fidgeting with the settings. Application health is monitored down to a fine-grained level, and the actual production runtime performance is visible through intuitive dashboards and queryable operator consoles (the DevOps version of the factory control room). When needed, the system can self-heal as issues are detected.
This might sound like something out of science fiction, but it’s as real as an actual, full-fledged lights-out factory. Which is to say, “real, but rare.” Many automated factories approach lights-out status, but few go all the way. The same could be said of DevOps.
The good news is that you can design a basic factory line that delivers most of the benefits of a “lights-out” operation and isn’t too hard to create. You’ll get most of the ROI just by creating a DevOps dark factory between production and test.
Here is a checklist for putting together your own “almost lights-out” DevOps solution. Don’t worry. None of these decisions are irreversible. You can always change your mind. It will just take some rework.
1. IaaS or PaaS or containers – I recommend PaaS or Containers. Infrastructure as a Service has its place, but it has its downsides on price-point and configuration management. When you’re running a VM, it’s always on, so your spend for the resource is 100 percent though your utilization isn’t maxed out, so you’re paying to keep the VM running even while it’s not in use. The setup and configuration are also more complex, as you have to deploy the bare-metal instance and then deploy a configuration. Lastly, running IaaS, it’s all too easy to just run bespoke VMs and fall back into old habits. I’m a big fan of PaaS because you get a nice price point and just the right amount of configurability, without the added complexity of full specification. Containers are a nice middle ground. The spend for a container cluster is still there, but if you’re managing a large ecosystem, the orchestration capabilities of containers could become the deciding factor.
2. Public cloud or on-premises cloud – I recommend public cloud. Going back to our factory analogy, a hundred years ago factories generated their own power, but that meant they also had to own the power infrastructure and keep people on staff to manage it. Eventually centralized power production became the norm. Utility companies specialized in generating and distributing power, and companies went back to focusing on manufacturing. The same thing is happening with compute infrastructure and the cloud providers. The likes of Google, Amazon and Microsoft have taken the place of the power companies, having developed the specialized services and skills needed to run large data centers. I say let them own the problem while you pay for the service.
There are situations where a private cloud can make sense, but it’s largely a function of organizational size. If you’re already running a lot of large data centers, you may have enough core infrastructure and competency in place to make the shift to private cloud. If you decide to go that route, you absolutely must commit to a true DevOps approach. I’ve seen several organizations say they’re doing “private cloud” when in reality they’re doing business as usual and don’t understand why they’re not getting any of the temporal or financial benefits of DevOps. If you find yourself in this situation, do a quick value-stream analysis of your development process, compare it to a lights-out process, and you’ll see nothing’s changed from your old Ops model.
3. Durable storage for databases, queues, etc. – I recommend using a DB service from the cloud provider. Similar to the decision between IaaS and PaaS, I’d rather pay someone else to own the problem. Making any service resilient means having to worry about redundancy and disk management. With a database, queue, or messaging service, you’ll need a durable store for the runtime service. Then, over time, you’ll not only have to patch the service but take down and reattach the storage to the runtime system. This is largely a solved problem from a technological standpoint, but it’s just more complexity to manage. Add in the need for service and storage redundancy and backup and disaster recovery, and the equation gets even more complex. Again, the cloud providers are more than willing to own those problems, and offer cost-effective, scalable solutions for common distributed services that need high durability.
4. SQL vs. NoSQL – Many organizations are still relational database-centric, as they were in the 90’s and 00’s, with the RDBMS the center of the enterprise universe. Relational still has its place, but cloud-native storage options like table, document, and blob provide super-cheap high-performance options. I’ve seen many organizations that basically applied their old standards to the cloud, and said, “Well, you can’t use blob storage because it’s not an approved technology,” or “You can’t use serverless because it’s an ‘unbounded’ resource.” That’s the wrong way to do it. You need to re-examine your application strategy to use the best approach for the price point.
I once had a client whose data changed fairly slowly (every few weeks) but had to be accessed much more frequently. First they tried querying the same static data with the same queries, over and over. The performance was OK, but execution time went down significantly when the DB cache was primed. Then there was a push to give the DB instance more RAM so it could hold more data in the cache. We offered an alternative where we just precomputed static read models and dumped them in blob storage. The cost of the additional storage was a couple dollars a month, where increasing the specs of the DB would have cost more than a hundred a month. We achieved faster performance for less cost, but it required re-evaluating our approach.
5. Mobile – Mobile builds are one of the things that can throw you for a loop. Android is easy, Mac is a little more complicated. You’ll either need a physical Mac for builds, or if you go with Azure DevOps, you can have it run on a Microsoft Mac instance in Azure. Some organizations still haven’t figured out that they need a Mac compute strategy. I once had a team so hamstrung by corporate policy, they were literally trying to figure out how to build a “hack-intosh” because the business wanted to build an iOS app but corporate IT shot down buying any Macs. Once we informed them we couldn’t legally develop on a “hack-intosh,” they just killed the project instead of trying to convince IT to use Mac infrastructure. Yes, they abandoned a project, with a real business case and positive ROI because IT was too rigid.
6. DB versioning – Use a tool like Liquibase or Flyway. Your process can only run as fast as your rate-limiting step, and if you’re still versioning your database by hand, you’ll never go faster than your DBAs can execute scripts. Besides, they have more important things to do.
7. Artifact management, security scanning, log aggregation, monitoring – Don’t get hung up on this stuff. You can figure it out as you go. Get items in your backlog for each of these activities and have a more junior DevOps resource ripple each extension through to the process as its developed.
8. Code promotion – Lay out your strategy to go from Dev to Test to Stage to Prod, and replace any manual setup like networking, certificates and gateways with automated scripts.
9. Secrets – Decide on a basic toolchain for secrets management, even if it’s really basic. There’s just no excuse for storing secrets with the source control. There are even tools like git-secret, black-box, and git-crypt that provide simple tooling and patterns for storing secrets encrypted.
10. CI – Set up and configure your CI tool, including a backup / restore process. When you get more sophisticated, you’ll actually want to apply DevOps to your DevOps, but for now just make sure you can stand up your CI tool in a reasonable amount of time, repeatedly, with backup.
Now that you’ve made some initial technology decisions and established your baseline infrastructure, make sure you have at least one solid reference project. This is a project you keep evergreen and use to develop new extensions and capabilities to your pipelines. You should have an example for each type of application in your ecosystem. This is the project people should refer to when they want to know how to do something. As you evolve your pipelines, updated this project with the latest and greatest features and steps.
For each type of deployment — database, API, front end and mobile — you’ll want to start with a basic assembly line. The key elements to your line will be Build, Unit Testing, Reporting, Artifact Creation. Once you have those, you’ll need to design a process for deploying an artifact into an environment (i.e. deploying to Test, Stage, Prod) with its runtime configuration.
From there, keep adding components to your factory. Choose projects in the order that gets you the most ROI, either by eliminating a constraint or reducing wait time. At each stage, try to make “everything as code.” Always create both a deployment and rollback and exercise the heck out of it all the time.
When it comes to tooling, there are more than enough good open-source options to get you started.
To sum up, going lights-out means committing to making everything code, automated, and tested. You may not get there with every part of your production line, but just by tackling the basics, you’ll be surprised how much you can get done in the dark.