Guest blog post
Author: Ross Momtahan, writing on behalf of DCIM supplier Geist (www.geistglobal.com)
This year at Data Center World 2015 I was lucky enough to see a presentation from Paul Slater, Lead Architect of the Modern Data Center Initiative at Microsoft.
Paul spoke to us about the world that he works in – the world of hyperscale data centers.
To put into perspective the sort of scale we are talking about, Microsoft adds 3600 cores to their data centers every day.
If you’re struggling to visualise that then don’t worry because you’re in the same boat as me!
Working on this sort of scale has presented Microsoft with some difficult problems and they have had to change their approach over the years – learning a few lessons on the way.
Here are my top three takeaways from Paul’s presentation:
Be flexible and use standardised equipment
Start with the assumption that you are wrong! The fact that things very rarely go to plan is why Paul values flexibility so much.
Microsoft can fill a data center within a few months, ensuring that all equipment is of the same age, the same generation, and built to the same standards.
If you want to emulate this but don't have the same resources, consider cordoning off your data center so that each section is filled with standardised equipment.
Paul also stated that they have multiple designs that depend on the location of the data center – although they are not looking to move to Scandinavia as Facebook and Google have done.
Automation helps increase predictability
Warren Bennis once said “The factory of the future will have only two employees, a man and a dog. The man will be there to feed the dog. The dog will be there to keep the man from touching the equipment.”
Whether or not you believe that quote to be true – I personally think that you could automate the feeding of the dog too – it’s certainly true that machines are more predictable than humans.
Microsoft strip people out of data centers as much as they (ahem) humanly can. One thing Paul said that struck me was that on their data center tours, the number of people on the tour usually outnumbers the total number of people working in the data center. Their largest data center has just five full-time staff.
Build redundancies at hardware level and resilience at service level
This one is simple maths. If each of two data centers offers 99.9% availability, each is down 0.1% of the time; assuming their failures are independent, both are down simultaneously only 0.1% × 0.1% = 0.0001% of the time, so you can estimate the overall availability of the service to be 99.9999%. Surely this is much easier than trying to get a single data center up to 99.9999% availability? This principle can be applied on many levels.
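The redundancy arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not anything from Paul's talk, and it assumes the instances fail independently (the function name is my own):

```python
def combined_availability(availability: float, instances: int) -> float:
    """Probability that at least one of `instances` identical,
    independently failing instances is up at any given moment."""
    downtime = 1.0 - availability      # e.g. 0.001 for 99.9%
    all_down = downtime ** instances   # all instances down at once
    return 1.0 - all_down

# Two data centers at 99.9% each -> roughly 99.9999% ("six nines")
print(combined_availability(0.999, 2))
```

In practice failures are rarely fully independent (shared networks, correlated weather, common software bugs), so treat the result as an upper bound rather than a guarantee.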
Paul also discussed the choice other businesses might have to make between building a new data center, colocation, managed hosting and public cloud services. All are valid choices, with cloud services growing massively in popularity due to the financial flexibility that they offer.
On the flip side, that financial flexibility comes with very little solution control.
Unfortunately for those who are more concerned with solution control, the CFO is generally the person who wins these arguments – and the CFO is more interested in financial control.
One other downside of using a public cloud solution is its effect on usage patterns. If it's extremely easy to consume cloud services and increase costs incrementally, that is very likely to happen! This makes it harder to understand the total impact on the IT budget, even if the cloud solution seems cheaper at face value.
Overall I found Paul's talk very engaging and insightful – if he's speaking at a data center conference near you, I'd recommend you take the time to check out what he has to say.