The 4th prediction from XebiaLabs about DevOps in 2017 relates to the growth of Big Data projects and the challenges this will bring for testing, Continuous Integration (CI) and Continuous Delivery (CD). Certainly, Big Data is big news, and this article from Forbes magazine summarises the almost universal view from analysts that the market is set for significant growth.
However, if you scratch below the surface, a somewhat different picture emerges. A Gartner note from October 2016 highlights that getting Big Data projects to production is a challenge, with only 15% of surveyed respondents having moved their pilot projects into production. Gartner attributed this largely to a lack of well-defined ROI in projects. This may well be true, but we see a number of cultural and technical challenges that need to be overcome to ensure the success of Big Data initiatives.
In many cases, it appears that organisations are not using DevOps approaches in their Big Data projects. The field of Data Science is quite foreign to most IT departments. Even where DevOps is practised, Data Scientists have tended to form their own development teams, completely divorced from operations. In building their algorithms and developing their models, they are often unaware of the performance implications of what they are designing.
Virtualisation has come late to the Big Data party. Operations teams, divorced from the development teams, have sourced and configured the infrastructure in a very traditional way, only to find that the developed system does not perform well. In this scenario, getting the data scientists to rework their algorithms or the operations team to provision new hardware is not a fast or cheap solution.
Using public cloud is an option, but scaling up Big Data applications can quickly drive costs beyond initial estimates, and load balancing across multiple different types of deployment soon becomes cheaper for in-house IT teams to manage. There may also be concerns about data security and privacy that preclude the use of public cloud services. This has led more organisations to look to private cloud implementations for their Big Data projects.
Big Data projects in general, and Hadoop deployments in particular, throw up technical and operational challenges not seen in more traditional application environments. Companies are using DevOps tools like Puppet and Chef to provide manageable solutions, but are having to add an array of specific capabilities and tools on top, to the extent that it feels as if they are crafting their own PaaS solutions.
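To give a flavour of what this looks like in practice, here is a minimal sketch of the kind of Puppet manifest teams write to manage a Hadoop worker node. The class name, package name, config path and service name are illustrative assumptions, they will vary by distribution, not a definitive deployment recipe:

```puppet
# Hypothetical sketch of a Puppet class for a Hadoop worker node.
# Names and paths are assumptions for illustration; real Hadoop
# distributions (Cloudera, Hortonworks, vanilla Apache) differ.
class hadoop_worker {
  # Install the Hadoop packages from the configured repository.
  package { 'hadoop':
    ensure => installed,
  }

  # Manage the core cluster configuration from a template.
  file { '/etc/hadoop/conf/core-site.xml':
    ensure  => file,
    content => template('hadoop/core-site.xml.erb'),
    require => Package['hadoop'],
  }

  # Keep the DataNode service running; restart it when the
  # configuration file changes.
  service { 'hadoop-hdfs-datanode':
    ensure    => running,
    enable    => true,
    subscribe => File['/etc/hadoop/conf/core-site.xml'],
  }
}
```

Even this simple example hints at the gap: Puppet handles packages, files and services well, but cluster-wide concerns such as topology, data rebalancing and rolling restarts are exactly the "extra capabilities" teams end up building themselves.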
This is still a relatively new market, and vendors are working to deliver more robust, integrated automation tools to help manage the development and deployment of Big Data systems. This is not a reason to delay your Big Data projects. Many of the issues we have discussed are cultural and organisational. Others are about finding, integrating and deploying the right toolsets. These are the same issues early adopters of DevOps encountered. They are the challenges that we at Percipience cut our teeth on. They are the challenges we can help you navigate, ensuring you get both the ROI and the competitive advantage that effective use of Big Data can bring.