Matrix Of Services

Matrix of Services, which I will call MAXOS, is a way to organize continuous delivery of large systems. To start, big applications are divided into smaller services. For example, one service might render a Web page and call a different service to get information about a product. Each development team maintains a small number of services and releases changes as soon as they are ready. Amazon, which has thousands of services, releases a change about once every 11 seconds, which adds up to about 8,000 changes per day.
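The split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names ProductService, PageService, get_product, and render_product_page are hypothetical, and the two "services" are plain in-process classes rather than separately deployed programs talking over a network.

```python
from dataclasses import dataclass


@dataclass
class Product:
    product_id: str
    name: str
    price_cents: int


class ProductService:
    """Hypothetical service that owns product information."""

    def __init__(self, catalog):
        self._catalog = catalog

    def get_product(self, product_id):
        return self._catalog[product_id]


class PageService:
    """Hypothetical service that renders a page by calling ProductService."""

    def __init__(self, product_service):
        self._products = product_service

    def render_product_page(self, product_id):
        # The page service does not store product data itself;
        # it asks the service that owns that data.
        product = self._products.get_product(product_id)
        dollars, cents = divmod(product.price_cents, 100)
        return f"<h1>{product.name}</h1><p>${dollars}.{cents:02d}</p>"


catalog = {"sku-1": Product("sku-1", "Kettle", 2499)}
page = PageService(ProductService(catalog)).render_product_page("sku-1")
print(page)  # <h1>Kettle</h1><p>$24.99</p>
```

Because PageService depends only on the get_product call, the team that owns ProductService can change how it stores data and release on its own schedule, as long as that call keeps behaving the same way.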

Service changes: Branch review, then centralized CI

1) Change a service

Each team makes changes to the services it is responsible for. Often, teams use a branch review process, in which changes are reviewed on a branch before they are promoted to the staging or integration version.

The MAXOS system emphasizes developer responsibility. Programmers lead each team, and the team takes complete responsibility for the design, programming, testing, and release of some set of services. There are often so many different services that the operations team cannot track and configure all of them. To compensate, developers take on some operations responsibilities: monitoring performance, responding to alerts, and pushing fixes.

2) Test services together

The most recent version of each service runs in a centralized continuous integration system. Automated tests exercise the calls between services to make sure that the integrated system works correctly. This system is often radically centralized, with one test system for thousands of services. Having only one test system makes it possible to maintain a complex and complete test environment.
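As a rough sketch of this idea, the Python below models a single integration environment that always runs the most recently deployed version of each service, plus one automated test that exercises a call between two services. The class IntegrationEnvironment, its methods, the service names, and the version strings are all assumptions made for illustration; a real system would run separate processes and a much larger test suite.

```python
class IntegrationEnvironment:
    """Hypothetical registry holding the most recent version of each service."""

    def __init__(self):
        self._services = {}

    def deploy(self, name, version, handler):
        # A new deploy replaces the old one, so only the latest version runs.
        self._services[name] = (version, handler)

    def version_of(self, name):
        return self._services[name][0]

    def call(self, name, *args):
        _version, handler = self._services[name]
        return handler(*args)


env = IntegrationEnvironment()
env.deploy("product-info", "1.4.0", lambda pid: {"id": pid, "name": "Kettle"})
env.deploy(
    "web-page",
    "2.0.1",
    lambda pid: f"<h1>{env.call('product-info', pid)['name']}</h1>",
)


def page_test(environment):
    """Exercise the call from web-page to product-info."""
    try:
        return environment.call("web-page", "sku-1") == "<h1>Kettle</h1>"
    except Exception:
        # Any error in the call chain counts as a failed integration test.
        return False


before = page_test(env)

# The product-info team ships a change that renames a field.
env.deploy("product-info", "1.5.0", lambda pid: {"id": pid, "title": "Kettle"})
after = page_test(env)

print(before, after)  # True False
```

The second run fails even though the web-page service did not change, which is the kind of dependency problem a shared test environment is meant to surface before production.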

If a service passes all of its tests in the integration environment, its team can promote it into production. If it doesn't pass the centralized tests, then the team can track down the calls that failed and work with other teams to fix any integration problems. The centralized continuous integration system uncovers dependency problems, and greatly simplifies the task of organizing a big project with interrelated parts.
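The promotion rule can be sketched as a small check over integration test results. The sketch assumes that each test result records one call between two services, and that a service's tests are all the results in which it appears as caller or callee. The names CallTest, failing_calls, can_promote, and teams_to_contact, along with the services and team names, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallTest:
    caller: str
    callee: str
    passed: bool


def failing_calls(service, results):
    """Return the failed tests that involve the given service."""
    return [
        r for r in results
        if not r.passed and service in (r.caller, r.callee)
    ]


def can_promote(service, results):
    """A service may go to production only if none of its tests failed."""
    return not failing_calls(service, results)


def teams_to_contact(service, results, owners):
    """List the teams that own the other end of each failed call."""
    teams = set()
    for r in failing_calls(service, results):
        other = r.callee if r.caller == service else r.caller
        teams.add(owners[other])
    return sorted(teams)


owners = {
    "web-page": "storefront team",
    "product-info": "catalog team",
    "cart": "checkout team",
}
results = [
    CallTest("web-page", "product-info", True),
    CallTest("web-page", "cart", False),
    CallTest("cart", "product-info", True),
]

print(can_promote("product-info", results))           # True
print(can_promote("web-page", results))               # False
print(teams_to_contact("web-page", results, owners))  # ['checkout team']
```

In this sketch a failed call blocks both services involved, since the test alone does not say which side introduced the problem. That is one possible policy, not the only one.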

MAXOS allows mere humans to build computer systems at a new level of scale, dubbed "Web-scale IT." I will write more about the implications of MAXOS in the chapter on "Continuous Enterprise."