oDesk is the world's largest online marketplace for contract talent. Their Web site allows businesses to find, hire, manage, and pay talented freelancers from around the world. oDesk posts over 1.5 million jobs a year in categories like web development, software development, networking and information systems, and design and multimedia.
oDesk provides an example for organizations that have Scrum-style, multifunctional teams and want to Unblock! for more capacity and more frequent releases.
oDesk has been able to dramatically expand their development capacity by building a geographically distributed organization. By recruiting and managing geographically distributed teams, they bypass the intense competition for talent around their home base in Silicon Valley. The oDesk development team now has about 200 people, including about 150 programmers, 15 project managers, 5 designers, 30 test engineers, 10 DevOps engineers, and 10 database specialists. Seventy-five percent of these people work from remote locations.
It's not easy for remote engineers to understand a system as big as oDesk's. They need great problem-solving skills, because they are physically isolated and working alone. They need great communication skills, because communicating through online chats and hangouts is harder than working at a whiteboard. They also need to be technically very independent.
oDesk's recruiting process reflects these requirements. They hire only the top developers available on oDesk, and only after extensive testing. Candidates spend 1-2 weeks in a "bootcamp," where they work on simple changes and learn coding practices and technical architecture. They don't get access to the entire code base until they are experienced.
To release more frequently, oDesk made a few specific changes.
Organize by feature: Previously, teams were organized by layer, working on the front end, services, or the database. This required a lot of communication and (online) meetings to coordinate a complete change across all layers. The new teams own relatively well-defined, self-contained areas of the product, such as freelancer search, client onboarding, or the payment platform.
oDesk currently has 25 teams. Each team has a PM (product manager or product owner), and many PMs work on two or three teams at a time. Each team also has an architect with a fairly rich role, acting simultaneously as scrum master, engineering manager, and lead developer. An architect manages between three and ten developers. A typical team includes three PHP developers (the front end), one HTML/CSS/JavaScript developer, one Java developer (the middle layer), and one test engineer. The bigger teams tend to informally split into multiple smaller teams.
Continuous integration: Every commit triggers a CI run that completes all unit tests in about 10 minutes.
Teams are responsible for testing and operations: oDesk has kept a small centralized QA organization for overall regression testing and test framework development, but most of QA is in the teams. Development teams are also taking more responsibility for operational roles: DevOps and SiteOps.
Feature flags: Developers are encouraged to commit to the master branch very frequently, and to use feature flags to control the visibility of their features.
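A minimal sketch of the flag idea in Java (the class and flag names are hypothetical, not oDesk's actual system): code committed to master checks a flag at runtime, so unfinished work can sit on master without being visible to users.

```java
// Minimal feature-flag sketch (hypothetical names; not oDesk's implementation).
// A flag store decides at runtime whether a code path is visible, so unfinished
// work can live on master without being exposed to users.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FeatureFlags {
    // In practice flag values would come from a configuration service or
    // database so they can be flipped without a deploy; an in-memory map
    // keeps the example self-contained.
    private final Map<String, Boolean> flags = new ConcurrentHashMap<>();

    public void set(String name, boolean enabled) {
        flags.put(name, enabled);
    }

    public boolean isEnabled(String name) {
        return flags.getOrDefault(name, false); // unknown flags default to hidden
    }

    public static void main(String[] args) {
        FeatureFlags flags = new FeatureFlags();
        flags.set("new-freelancer-search", false); // committed but not yet visible

        if (flags.isEnabled("new-freelancer-search")) {
            System.out.println("Rendering the new search experience");
        } else {
            System.out.println("Rendering the existing search experience");
        }
    }
}
```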
I interviewed the oDesk VP of Products, Stephane Kasriel, to find out how oDesk adapted to more frequent releases.
How do you feel about working with a geographically distributed team?
We "eat our own dog food." Seventy-five percent of the team is remote (oDesk freelancers), and mostly they are remote from each other too, working from home. We have three small offices, but most of the teams don't work from there.
This is awesome in many ways. We have access to amazing talent that other Bay Area-based startups are ignoring, and we are able to attract and retain people for a long time. We hired 40 developers in the last six months; I don't think that many startups are able to do that. Our developers are heavy users of our web site, which allows everyone to think about ways to improve the product constantly.
There are downsides. Obviously it's just harder to collaborate remotely. While remote teams are aware that they need to adapt their schedules to maximize overlap in time zones, it's still not close to 100%. All in all, we think that our productivity is about 20% less than it would be if we had the same developers locally. But of course that's a very theoretical exercise, because we don't think we could have that number of talented people here given the incredible level of competition in our neighborhood.
Did you change testing? Did you add test layers?
Yes, this was a huge part of it. We have always had a unit testing system and a functional testing system. However, they weren't essential. Unit tests had low coverage. They could be broken on master for days at a time, even weeks in some cases.
Functional tests were owned by a test automation team that was completely isolated from the dev team. The architects would tell the manual QA team what the features were about. The manual QA team would write test case documents and execute them manually. Then they would tell the test automation team to automate a subset of those tests. As a result, coverage was extremely low, and the tests broke all the time. When the product changed, it took days for the automation team to know how to change their tests. Everyone was dissatisfied with how the interaction worked.
Now, developers are fully accountable for unit tests. Many teams have switched to test driven development. But even for the ones that haven't, there's a strict requirement that you cannot merge code that doesn't pass all unit tests. We've moved most of the test automation engineers into the Agile teams, under the direct supervision of the team lead. We've also radically rebalanced the QA organization from having a large manual team and a small automation team, to having a small manual team and a large automation team.
We have also dramatically improved the speed of execution of the tests. We profiled the testing code and improved it. We switched from browsers to PhantomJS, parallelized, and added hardware. We can run all of the tests in a few hours instead of holding code for a full week before release. We've become comfortable releasing a lot more often because we are much more confident that our unit and functional tests cover the important use cases and that nothing dramatic will happen when we release.
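As an illustration of the parallelization idea only (a sketch under simple assumptions, not oDesk's actual harness), independent test suites can be spread across a thread pool so that total wall-clock time approaches that of the longest single suite. The suite classes below are trivial stand-ins.

```java
// Sketch of parallel test execution: independent JUnit 4 suites run
// concurrently on a fixed thread pool. Suite names are placeholders.
import org.junit.Test;
import org.junit.runner.JUnitCore;
import org.junit.runner.Result;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import static org.junit.Assert.assertTrue;

public class ParallelSuiteRunner {

    // Stand-ins for real suites; each must be independent of the others
    // (no shared mutable state) for parallel execution to be safe.
    public static class SearchTests {
        @Test public void findsFreelancers() { assertTrue(true); }
    }

    public static class PaymentsTests {
        @Test public void chargesClient() { assertTrue(true); }
    }

    public static void main(String[] args) throws Exception {
        List<Class<?>> suites = List.of(SearchTests.class, PaymentsTests.class);

        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());

        // Submit each suite as its own task and collect the results.
        List<Future<Result>> results = new ArrayList<>();
        for (Class<?> suite : suites) {
            Callable<Result> task = () -> JUnitCore.runClasses(suite);
            results.add(pool.submit(task));
        }

        boolean allPassed = true;
        for (Future<Result> future : results) {
            allPassed &= future.get().wasSuccessful();
        }
        pool.shutdown();
        System.out.println(allPassed ? "All suites passed" : "Failures detected");
    }
}
```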
Did you change the code review process?
We moved from posting code on a review board to active pull requests. But probably more importantly, we've moved from big commits and long running branches, which are a complete nightmare, to frequent commits and frequent merges.
Did you add to your feature flag system?
We created the feature flag system for the purpose of switching to Continuous Delivery. We never needed it before, because developers would primarily merge when features were complete. Feature flags are not perfect; for example, nobody is excited about having to clean up the dead code after feature flags are permanently turned on. But feature flags have enabled us to move to a "branch by abstraction" model, which really helps reduce the risk in each release.
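Branch by abstraction means putting an interface in front of the component being replaced, so the old and new implementations can coexist on master while a flag selects between them. A minimal Java sketch with hypothetical names (not oDesk's code):

```java
// Sketch of "branch by abstraction" driven by a flag. Names are illustrative.
import java.util.List;

public class BranchByAbstractionExample {

    // Step 1: put an abstraction in front of the component being replaced.
    interface SearchService {
        List<String> findFreelancers(String skill);
    }

    // The existing implementation keeps working while the rewrite proceeds.
    static class LegacySearchService implements SearchService {
        public List<String> findFreelancers(String skill) {
            return List.of("legacy result for " + skill);
        }
    }

    // Step 2: build the replacement behind the same abstraction.
    static class NewSearchService implements SearchService {
        public List<String> findFreelancers(String skill) {
            return List.of("new result for " + skill);
        }
    }

    // Step 3: a flag (here just a boolean) picks the implementation at runtime.
    static SearchService searchService(boolean newSearchEnabled) {
        return newSearchEnabled ? new NewSearchService() : new LegacySearchService();
    }

    public static void main(String[] args) {
        SearchService search = searchService(false); // flag off: legacy path
        System.out.println(search.findFreelancers("java"));

        search = searchService(true);                // flag on: new path
        System.out.println(search.findFreelancers("java"));
        // Step 4: once the new path is proven, delete the legacy class and the flag.
    }
}
```

The dead-code cleanup mentioned above corresponds to step 4: after the flag is permanently on, the legacy implementation and the flag check are removed.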
Did you change any roles? For example, did you move any responsibility for approval from testers to developers? We find that this almost always happens in a continuous process. How did you change the role of developers, test, and devops?
Yes. Responsibility moved from a centralized QA organization, which still exists but is 10% of its former size, to each team lead. We are not yet at the point where every team has its own independently deployable assembly. At least half of the teams still contribute to a single deployment artifact, so there is still overall management oversight, but it's relatively lightweight. Team leads sign off in a permanent Skype chat, and then we release the code.
It's a big group. How do you manage dependencies and communication between teams?
At the beginning of the year we publish a 10-page product and technical strategy document, which guides us throughout the year. Every six weeks, each Agile team presents its progress and plans to our engineering directors. The directors can help them identify and solve dependencies. We have a group of eight directors that meets weekly to make significant architectural decisions.
We're trying to remove dependencies by verticalizing the stack. It's an ongoing process, and one that takes months to complete. Of course, even in a verticalized environment, there are always features that cross team boundaries. When the overlap is small and the team knows the other code well, the team changes the other team's code and gets it reviewed and approved through pull requests. When the overlap is larger, the teams synchronize during their sprint planning meetings, and dependent tasks are prioritized by the corresponding teams. When prioritization issues occur, they get escalated and I make a decision with our directors. Communication happens mostly between the architects and team leads, a relatively contained group of about 40 engineers. We have a weekly 90-minute meeting, which happens over Google Hangout and a phone line.
Are you dividing your code into separate services?
We're still in the process. The idea is to go from three fairly entangled horizontal tiers (front, middle, and database) to verticalized stacks. One of the main challenges in doing so is to isolate reusable classes and functions into packages, and then rewrite the non-reusable code into independent repositories that leverage those shared packages.
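As a rough sketch of that split, using hypothetical names: reusable logic is isolated into a shared package published as its own artifact, and each verticalized stack depends on that artifact from its own repository. The two "modules" below are nested classes only to keep the example in one file; in practice they would live in separate packages and repositories.

```java
// Illustrative sketch of extracting shared code for verticalized stacks.
public class VerticalizationSketch {

    // Would live in a shared library, e.g. a package published as its own artifact.
    static final class CurrencyFormat {
        private CurrencyFormat() {}

        // Reusable logic: formats an amount in cents as a display string.
        static String usd(long cents) {
            return String.format("$%d.%02d", cents / 100, cents % 100);
        }
    }

    // Would live in the payments vertical's repository, depending only on the
    // shared artifact rather than on another vertical's internals.
    static final class InvoiceView {
        String renderTotal(long totalCents) {
            return "Total due: " + CurrencyFormat.usd(totalCents);
        }
    }

    public static void main(String[] args) {
        System.out.println(new InvoiceView().renderTotal(12550)); // "Total due: $125.50"
    }
}
```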