Thinking about more than 200 technologies can be complex, especially if you want to start your next big data challenge with a capable technology.
So, we started to build a big picture with the goal of providing an overview of the passionate open source community around big data technologies.
We collected data about:
The data is based on the following git command, which produced a log file for each repository:
git log --pretty=format:"\"%H\";\"%an\";\"%ae\";\"%cn\";\"%ce\";\"%s\";\"%cI\"" > ../technology.csv
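The command writes one line per commit with seven quoted, semicolon-separated fields: hash, author name, author e-mail, committer name, committer e-mail, subject, and the committer date in strict ISO 8601. Git inserts these values verbatim without escaping, so a commit subject that contains quotes or semicolons can confuse a generic CSV reader. The following is a minimal Python sketch of a parser, assuming exactly this format; `Commit`, `parse_log_line` and `read_log` are hypothetical names, not part of git or any library.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path

# Separator produced by the format string: closing quote, semicolon, opening quote.
SEP = '";"'


@dataclass(frozen=True)
class Commit:
    sha: str
    author_name: str
    author_email: str
    committer_name: str
    committer_email: str
    subject: str
    committed_at: datetime


def parse_log_line(line: str) -> Commit:
    """Parse one line written by the git log command; raise ValueError if malformed."""
    line = line.strip()
    if len(line) < 2 or not (line.startswith('"') and line.endswith('"')):
        raise ValueError(f"unexpected log line: {line!r}")
    inner = line[1:-1]  # drop the outer quotes
    # Split the first five fields (hash, names, e-mails) from the left ...
    sha, a_name, a_mail, c_name, c_mail, rest = inner.split(SEP, 5)
    # ... and the date from the right, so the subject may contain quotes or ';'.
    subject, date = rest.rsplit(SEP, 1)
    return Commit(sha, a_name, a_mail, c_name, c_mail, subject,
                  datetime.fromisoformat(date))


def read_log(path: Path) -> list[Commit]:
    """Read every non-empty line of one repository's log file."""
    with path.open(encoding="utf-8", errors="replace") as fh:
        return [parse_log_line(line) for line in fh if line.strip()]
```

Splitting the fixed fields from both ends, instead of using a CSV reader, is a design choice: only a subject containing the exact sequence `";"` would still be ambiguous, and even then the date is recovered correctly.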
Let's take a closer look at the collected data and find out which technologies have made the most progress, measured by their contributions.
MySQL is by far the technology with the highest number of commits and contributors.
But technologies like Spark, Elasticsearch, or scikit-learn for machine learning are attracting an increasing number of contributors. We think that technologies with a high number of contributors will continue to progress, with many new developments over the coming months and years.
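A ranking like this one can be computed by counting commits and distinct authors per technology. The sketch below is a hedged illustration, assuming the log files have already been parsed into one record per commit, labelled with the technology (for example the repository name) and the author e-mail; `CommitRecord` and `rank_technologies` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple


class CommitRecord(NamedTuple):
    technology: str    # hypothetical label, e.g. the repository / log file name
    author_email: str  # author e-mail, used as a rough contributor identity


def rank_technologies(records: Iterable[CommitRecord]) -> list[tuple[str, int, int]]:
    """Return (technology, commits, distinct contributors), most commits first."""
    commits = Counter()
    contributors = defaultdict(set)
    for rec in records:
        commits[rec.technology] += 1
        # Lower-case the e-mail so "Jane@x.org" and "jane@x.org" count once.
        contributors[rec.technology].add(rec.author_email.lower())
    rows = [(tech, n, len(contributors[tech])) for tech, n in commits.items()]
    # Sort by commits, then contributors (both descending), then by name.
    rows.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return rows
```

E-mail addresses are only an approximation of people: one person may commit under several addresses. Using `%aN` and `%aE` instead of `%an` and `%ae` in the git command would apply a repository's `.mailmap`, if it has one, to merge such identities.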
The question is: how can we keep track of these developments amid a rising variety of upcoming technologies, and, more importantly, how do we even learn about all these technologies?
How about initiating a community that collects and structures all these technologies to solve this overview issue? Oh yes, this would be awesome ...
Good point: there already is an awesome community that curates big data technologies and summarizes them by domain.
But, why is it necessary to put big data technologies into domains?
Can we really handle the variety of big data technologies through abstract domains? Yes, but each domain is different ...
We can see that certain domains include a wide variety of specialized technologies, such as "Distributed Programming" with flink, storm, tez, etc. Other domains are covered by fewer, more established technologies that serve a wider range of applications. One example is the "Search engine and framework" domain with lucene-solr or elasticsearch.
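One way to make this contrast measurable is to aggregate commits per domain and check how much of a domain's activity falls on its largest technology. The sketch below assumes per-technology commit counts are already available; the domain mapping only contains the examples named above and, like `domain_summary`, is hypothetical.

```python
# Hypothetical domain mapping, using only the examples named in the text.
DOMAINS = {
    "Distributed Programming": ["flink", "storm", "tez"],
    "Search engine and framework": ["lucene-solr", "elasticsearch"],
}


def domain_summary(commits_per_tech: dict[str, int],
                   domains: dict[str, list[str]]) -> dict[str, tuple[int, int, float]]:
    """Return domain -> (technologies with data, total commits, share of the largest one)."""
    summary = {}
    for domain, techs in domains.items():
        # Ignore technologies for which no log data was collected.
        counts = [commits_per_tech[t] for t in techs if t in commits_per_tech]
        total = sum(counts)
        top_share = max(counts) / total if total else 0.0
        summary[domain] = (len(counts), total, top_share)
    return summary
```

A domain with many technologies and a low top share looks "specialized"; a high top share suggests a domain dominated by a few technologies.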
But some technologies, like MySQL, have a much longer history than others, like Spark. Is it really useful to collect data on all these technologies over a period of time?
Yes, because we can see the ups and downs.
The timeline shows technologies with a significant number of commits, such as MySQL (server), Kiji, or Impala, and the ups and downs of their feature development. In general, the community is organic and people can quickly join their favourite project. Contributions increase year by year, especially from 2014 onwards.
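A timeline of this kind boils down to counting commits per technology and calendar year. The following minimal sketch assumes the input is a sequence of `(technology, commit timestamp)` pairs taken from the parsed log files; the function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Iterable


def commits_per_year(records: Iterable[tuple[str, datetime]]) -> dict[str, dict[int, int]]:
    """Return technology -> {year: number of commits}, years in ascending order."""
    per_tech = defaultdict(Counter)
    for tech, committed_at in records:
        per_tech[tech][committed_at.year] += 1
    return {tech: dict(sorted(years.items())) for tech, years in per_tech.items()}


def total_per_year(timeline: dict[str, dict[int, int]]) -> dict[int, int]:
    """Sum the per-technology timelines into one commit count per year."""
    totals = Counter()
    for years in timeline.values():
        totals.update(years)  # Counter.update adds counts rather than replacing them
    return dict(sorted(totals.items()))
```

Note that `%cI` in the git command is the committer date, which is reset when commits are rebased or cherry-picked; the author date (`%aI`) is closer to when a change was actually written.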
These insights raise further questions: who are the contributors, and do they contribute to one or more projects?
Every big data technology needs passionate contributors within an embedded core team to sustain it: Tom Lane and Bruce Momjian for PostgreSQL, Mike Bostock for D3.js, or Jonathan Ellis for Cassandra, to name just a few with a really high number of contributions.
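Both questions, who the most active contributors of a project are and who contributes to more than one project, can be answered from `(technology, author)` pairs. The sketch below is illustrative only; it assumes the author identity (name or e-mail) is already normalized, and `top_contributors` and `projects_per_contributor` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Iterable


def top_contributors(records: Iterable[tuple[str, str]],
                     n: int = 3) -> dict[str, list[tuple[str, int]]]:
    """Return technology -> the n authors with the most commits, as (author, commits)."""
    per_tech = defaultdict(Counter)
    for tech, author in records:
        per_tech[tech][author] += 1
    return {tech: counts.most_common(n) for tech, counts in per_tech.items()}


def projects_per_contributor(records: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Return author -> set of technologies the author has committed to."""
    projects = defaultdict(set)
    for tech, author in records:
        projects[author].add(tech)
    return dict(projects)
```

Contributors whose project set has more than one entry are the ones who connect otherwise separate communities.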
Our last question in this discovery is: do we really have an organic community that joins, adopts, or turns to specific big data technologies?
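A simple proxy for joining and turning away is each contributor's first and last commit year per technology. The following is a minimal sketch under that assumption (it ignores contributors who pause and return); the input is `(technology, author, year)` triples and `joins_and_leaves` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Iterable


def joins_and_leaves(records: Iterable[tuple[str, str, int]]) -> dict[str, dict[int, tuple[int, int]]]:
    """Return technology -> {year: (contributors whose first commit is in that year,
    contributors whose last commit is in that year)}."""
    first: dict[tuple[str, str], int] = {}
    last: dict[tuple[str, str], int] = {}
    for tech, author, year in records:
        key = (tech, author)
        first[key] = min(first.get(key, year), year)
        last[key] = max(last.get(key, year), year)
    joined = defaultdict(Counter)
    left = defaultdict(Counter)
    for (tech, _), year in first.items():
        joined[tech][year] += 1
    for (tech, _), year in last.items():
        left[tech][year] += 1
    result = {}
    for tech in joined:
        years = sorted(set(joined[tech]) | set(left[tech]))
        # Counter returns 0 for years without joins or leaves.
        result[tech] = {y: (joined[tech][y], left[tech][y]) for y in years}
    return result
```

The "leaves" in the most recent year mostly reflect the end of the data window rather than contributors actually turning away, so that year should be read with care.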
Big data technologies are awesome, and insights from this community show why:
Deloitte refers to one or more of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee (“DTTL”), its network of member firms, and their related entities. DTTL and each of its member firms are legally separate and independent entities. DTTL (also referred to as “Deloitte Global”) does not provide services to clients. Please see www.deloitte.com/de/UeberUns for a more detailed description of DTTL and its member firms. Deloitte provides audit, tax, financial advisory and consulting services to public and private clients spanning multiple industries; legal advisory services in Germany are provided by Deloitte Legal. With a globally connected network of member firms in more than 150 countries, Deloitte brings world-class capabilities and high-quality service to clients, delivering the insights they need to address their most complex business challenges. Deloitte’s more than 225,000 professionals are committed to making an impact that matters. This communication is for internal distribution and use only among personnel of Deloitte Touche Tohmatsu Limited, its member firms and their related entities (collectively, the “Deloitte network”). None of the Deloitte network shall be responsible for any loss whatsoever sustained by any person who relies on this communication
Oliver Bieh-Zimmert