Cloud Computing, Data Science and ML Trends in 2020–2022: The battle of giants

Kaggle’s survey of ‘State of Data Science and Machine Learning 2020’ covers a lot of diverse topics. In this post, we are going to look at the popularity of cloud computing platforms and products among the data science and ML professionals who participated in the survey.



By George Vyshnya, Co-Founder/CTO at SBC


Every time you see a giant, you must know that that giant might be just a dwarf somewhere else! ― Mehmet Murat ildan

 

Introduction

 
I am going to dedicate a series of articles to the insights from the data collected in Kaggle’s survey of ‘State of Data Science and Machine Learning 2020’ (https://www.kaggle.com/c/kaggle-survey-2020).

The survey covered a lot of diverse topics, and each of them deserves a separate post to discuss the respective trends.

Notes:

  • Kaggle (www.kaggle.com) is a global community of data scientists and machine learners from all over the world with a variety of skills and backgrounds. The community has around 3 million active members. Although it is not a rigorously representative sample of all Data Science and ML professionals across the globe from the sociological perspective, it still constitutes a significant fraction of the practitioners and professionals in the field. Therefore, the results of the survey give a reasonable indication of where the Data Science and AI/ML industry is likely to evolve in the next couple of years.
  • You can check the repo at https://github.com/gvyshnya/state-of-data-science-and-ml-2020 to see how every insight discussed in this post was derived.

 

Battle of Giants

 
In this post, we are going to look at the popularity of cloud computing platforms and products among the data science and ML professionals who participated in the survey. In particular, it will cover

  • Cloud Platforms usage
  • Cloud Computing products usage
  • Cloud ML products usage
  • BigData platforms
  • BI tools (mostly, the cloud-based ones)

The narrative in this chapter will often focus on the good news and opportunities for the top three cloud service providers in the market, as follows

  • Amazon Web Services (AWS)
  • Google Cloud Platform (GCP)
  • Microsoft Azure Cloud (MS Azure)

Note:

  • Survey organizers defined non-professionals as students, unemployed respondents, and respondents who have never spent any money in the cloud. Everybody else is considered to be a professional.
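The rule above can be sketched as a simple filter. This is a minimal illustration only: the field names (`role`, `cloud_spend`) and the answer labels are hypothetical and may not match the actual survey export or the code in the author's repo.

```python
# Hypothetical field names and answer labels; the real survey export may differ.
NON_PROFESSIONAL_ROLES = {"Student", "Currently not employed"}
NO_SPEND = "$0 ($USD)"


def is_professional(response: dict) -> bool:
    """Return False for students, unemployed respondents, and zero cloud spenders."""
    if response.get("role") in NON_PROFESSIONAL_ROLES:
        return False
    # Assumption: a missing spend answer is NOT treated as "never spent money".
    if response.get("cloud_spend") == NO_SPEND:
        return False
    return True


# Toy rows for illustration only (not survey data).
responses = [
    {"role": "Data Scientist", "cloud_spend": "$100-$999"},
    {"role": "Student", "cloud_spend": "$1-$99"},
    {"role": "Software Engineer", "cloud_spend": "$0 ($USD)"},
]

professionals = [r for r in responses if is_professional(r)]
print(len(professionals), "professional respondent(s) in the toy sample")
```

How respondents with a missing spend answer are treated is a judgment call; the sketch keeps them as professionals, and a stricter filter would drop them.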

 

Usage of Cloud Service Providers

 

We find that the top three cloud service providers among the Kaggle survey respondents are

  • AWS
  • GCP
  • MS Azure

The rest of the cloud service providers seem to be losing competitive ground to the top three at the moment.


Also, it is notable that the ‘None’ category slightly exceeds the size of the MS Azure bar, which suggests the market is not yet saturated with cloud service provider offerings.
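Platform usage is a multi-select question, so one respondent can count towards several bars at once, and ‘None’ is just another option. A minimal sketch of such a tally is shown below; the helper name `tally_multi_select` and the toy answers are assumptions made for illustration, not the author's code or real survey counts.

```python
from collections import Counter


def tally_multi_select(answers):
    """Count how many respondents picked each option of a multi-select question."""
    counts = Counter()
    for selected in answers:
        # A set guards against counting one respondent twice for the same option.
        counts.update(set(selected))
    return counts


# Toy answers for illustration only (not survey data).
answers = [
    ["AWS", "GCP"],
    ["None"],
    ["AWS"],
    ["MS Azure", "AWS"],
    ["None"],
]

counts = tally_multi_select(answers)
for option, n in counts.most_common():
    print(f"{option}: {n} of {len(answers)} respondents")
```

Because respondents can pick several options, the per-option counts can add up to more than the number of respondents.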

We also see that professionals with 3–5 and 5–10 years of programming experience are the largest groups of users of the top three cloud service providers. More senior professionals (with 10+ years of experience) are less represented among the cloud service users on each of the top three platforms (depending on the marketing priorities, special actions to educate such seniors could help broaden the adoption of the cloud services).

As we can see, the majority of users of the top three cloud service providers hold the roles below

  • Data Scientists
  • Software Engineers

The third position is held by

  • ML Engineers (AWS, GCP)
  • Data Analysts (MS Azure)

As noted earlier, the ‘Other’ occupation group is too big in itself, and it might be worth breaking it into more granular categories in future surveys. As we see, the ‘Other’ group accounts for a significant fraction of users on each cloud service platform (although it never makes the top three list for any of the platforms).

In terms of the user occupation and programming experience, all of the top three Cloud Service Providers share the same trends below

  • Data Scientists with 3–5 and 5–10 years of programming experience are the top user groups for AWS within the survey respondents
  • In Software Engineer, ML Engineer, and Data Analyst groups, professionals with 3–5 and 5–10 years of experience predominate
  • In Research Scientist, Data Engineer, DBA, Statistician and Other groups, professionals with 10+ years of experience are the biggest fraction of the users
  • In Product/Project Management group, professionals with 5+ years of experience are the biggest fraction of the users
  • In Business Analyst group, users with 1–2 years of experience dominate

In terms of organizational environments, most Data Science and ML professionals using Cloud Services can be found in

  • Organizations with 0–49 employees, having 1–2 workers dedicated to Data Science workloads
  • Organizations with 10000+ employees, having 20+ workers dedicated to Data Science workloads

So we can conclude that AWS, GCP, and MS Azure compete closely for the same types of organizations and users.

 

Usage of Cloud Computing Products

 

We find that

  • in the segment of cloud computing engines, Amazon EC2 is more popular than its rivals from Google (Google Cloud Compute Engine) and MS Azure (Azure Cloud Services)
  • in the segment of cloud functions, AWS Lambda is more popular than its rivals from Google (Google Cloud Functions) and MS Azure (Azure Functions)
  • in the segment of cloud container runners, Amazon Elastic Container Service is more popular than its rivals from Google (Google Cloud Run) and MS Azure (MS Azure Container Instances)
  • Google holds second place in the cloud computing engine and cloud function segments, and it is in third place in the cloud container runner segment
  • there is a huge pool of ‘None’ responses, which most likely indicates that the entire market of cloud computing applications is not saturated yet

In terms of user roles, most users of every cloud computing product above hold the roles below (top to bottom)

  • Data Scientists
  • Software Engineers
  • ML Engineers
  • Data Analysts

 

Usage of Cloud Computing Products by Programming Experience

 

In addition to the insights above, we see that the largest groups of cloud computing product users fall into the following clusters in terms of their programming experience

  • 5–10 years of experience
  • 3–5 years of experience
  • 10–20 years of experience

Juniors and super-seniors (20+ years of programming experience) seem to be less equipped with the respective knowledge and skills.

 

Usage of Cloud ML Products

 

We find that

  • Google Cloud AI Platform / Google Cloud ML Engine leads the cloud ML products usage ‘nomination’
  • the second and third places go to Amazon SageMaker and Azure Machine Learning Studio, respectively
  • Data Scientists are the top users of cloud ML products (for every product investigated)
  • There is a huge chunk of respondents who indicated they do not use cloud ML products at all. It indicates the market is under-saturated, and there is good growth potential, subject to resolving the marketing and end-user barriers on the way

 

Usage of Cloud ML Products by Programming Experience

 
In addition to the insights above, we can see that the cloud ML products are mostly used by the respondents who have programming experience of

  • 3–5 years
  • 5–10 years

 

Usage of Cloud ML Products by Organization Size and DS Capacity

 

We find that the majority of organizations in every size category do not use any Cloud ML Products at the moment.

For the tiny fraction of those who use them, there are interesting insights as follows

  • in small organizations (0–49 employees), Google Cloud AI Platform / Google Cloud ML Engine dominates
  • in the middle-sized organizations (50–249 employees), Google Cloud AI Platform / Google Cloud ML Engine and Amazon SageMaker compete tightly
  • for companies of bigger size (250+ employees), the size of the Data Science team is often correlated with the preferred Cloud ML Product (smaller teams stick to Google Cloud AI Platform / Google Cloud ML Engine more, and Data Science teams with 20+ headcount are more inclined to use Amazon SageMaker)
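A breakdown like the one above amounts to finding the most-selected product within each combination of organization size and Data Science team size. The sketch below shows one way to do that; the function name, the tuple layout, and the placeholder product names are all hypothetical, and the rows are made up purely to demonstrate the mechanics.

```python
from collections import Counter, defaultdict


def top_product_by_segment(rows):
    """Map each (org_size, team_size) segment to its most-selected product."""
    per_segment = defaultdict(Counter)
    for org_size, team_size, product in rows:
        per_segment[(org_size, team_size)][product] += 1
    # most_common(1) returns a one-element list of (product, count) pairs.
    return {seg: c.most_common(1)[0][0] for seg, c in per_segment.items()}


# Toy rows for illustration only (placeholder products, not survey data).
rows = [
    ("0-49", "1-2", "Product A"),
    ("0-49", "1-2", "Product A"),
    ("0-49", "1-2", "Product B"),
    ("10000+", "20+", "Product B"),
    ("10000+", "20+", "Product B"),
    ("10000+", "20+", "Product A"),
]

leaders = top_product_by_segment(rows)
for segment, product in leaders.items():
    print(segment, "->", product)
```

Ties within a segment are resolved arbitrarily by this sketch, so a real analysis would want to report close races explicitly rather than name a single leader.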

 

Usage of Big Data Products By Occupation

 

We find that

  • The overall top three list consists of three relational DBMS platforms (MySQL, PostgreSQL, MS SQL Server)
  • MongoDB, a non-relational database platform, takes position 4 in the list
  • Other relational DBMS platforms in the list (Oracle, IBM DB2, SQLite) are behind MongoDB
  • In the segment of truly cloud-based Big Data products, Google BigQuery outpaces its Amazon and MS Azure competitors (Amazon Redshift, Amazon Athena, Amazon DynamoDB, and Microsoft Azure Data Lake Storage)
  • Google Cloud SQL instances are still less popular than ‘native’ relational database instances for MySQL and PostgreSQL
  • MS Access is still in use in the industry
  • Data Scientists are the top users of each product in this list

 

Big Data Product Usage Patterns by User Occupation and Programming Experience

 
We find that

  • MySQL and PostgreSQL are the most popular database management platforms within each occupation
  • MongoDB is quite popular with Software Engineers (although less popular than MySQL and PostgreSQL)

 

Big Data Product Usage Patterns by Organization Size and Data Science Capacity

 
We find that

  • Almost all organizations except the extra-large ones address their data management needs mostly with MySQL, PostgreSQL, and MongoDB
  • Extra-large organizations (with 10000+ employees) prefer to work with MySQL, MS SQL Server, Oracle, and PostgreSQL

 

Usage of BI Tools

 

We find that

  • Tableau and MS Power BI outperform their rivals significantly
  • Google Data Studio becomes a challenger to the leading BI products above, occupying the third place in the list
  • Data Scientists, Data Analysts, Research Scientists, and ML Engineers are the most frequent users of BI tools
  • A huge fraction of the survey respondents indicated they do not use BI tools at all

 

AWS Professional Users Across the Globe

 

We find that

  • AWS is most popular among survey respondents in India and the USA
  • Brazil, Japan and the UK form tier 2 in terms of the number of respondents from these countries who use AWS

 

GCP Professional Users Across the Globe

 

We find that

  • India is the top country where GCP is used
  • USA takes the second place in the rank but it is significantly below India (unlike AWS, where India and USA were relatively on a par)
  • Japan and Brazil are in tier 2 in terms of the number of respondents from these countries who use GCP
  • GCP is less popular in UK, Canada and Australia vs. AWS
  • GCP outperforms AWS in Turkey, Indonesia, and Russia
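Country-level comparisons like these come down to comparing per-country user counts between two platforms. A minimal sketch is shown below; the function name `countries_where_ahead` and the country labels and counts are invented for illustration and say nothing about the actual survey numbers.

```python
def countries_where_ahead(counts_a, counts_b):
    """Return countries, sorted by name, where platform A has more users than B."""
    countries = set(counts_a) | set(counts_b)
    # A country missing from one dictionary is treated as having zero users there.
    return sorted(
        c for c in countries if counts_a.get(c, 0) > counts_b.get(c, 0)
    )


# Toy counts for illustration only (not survey data).
platform_a = {"Country X": 12, "Country Y": 5, "Country Z": 7}
platform_b = {"Country X": 9, "Country Y": 8}

ahead = countries_where_ahead(platform_a, platform_b)
print("Platform A leads in:", ahead)
```

Raw counts depend heavily on how many respondents each country has, so comparing shares within a country may be more informative than comparing absolute numbers across countries.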

 

MS Azure Professional Users Across the Globe

 

We find that

  • Top country in terms of the number of MS Azure users is India (although MS Azure is well behind AWS and GCP there)
  • USA holds the second place in the rank, and the number of MS Azure users is on a par with the number of GCP users in the US
  • Brazil belongs to tier 2 in terms of the number of MS Azure users
  • In the majority of the countries (except the US), the number of MS Azure users is smaller than the number of GCP and AWS users

 

Summary

 
In this post, we reviewed the state of the art in the use of Cloud computing platforms, products, and tools by professionals in the Data Science and ML industry. These are not just their preferences as of the end of 2020. These are cornerstones which are most likely to determine the trends for 2021–2022 as well.

The next couple of years will be crucial in the battle of Cloud Computing giants for minds, arms, and budgets in the Data Science and ML industry. Although AWS’s position still looks stronger than that of its top rivals, the challenge from GCP could be the intriguing part of the market reshaping in the years to come. At the same time, MS Azure seems to keep its strong position in North America (while having little chance to penetrate other continents significantly vs. AWS and GCP).

However, we entered the age of global turbulence. 2021, the year under the Star of Kings, may expose us to surprises in every aspect of our lives.

Note: you can check the repo at https://github.com/gvyshnya/state-of-data-science-and-ml-2020 to see how every insight above was derived.

 
Bio: George Vyshnya is Co-Founder/CTO at SBC, helping CEOs and CTOs to grow revenue via implementing smart AI, BI and Web solutions.

Original. Reposted with permission.
