5 tips to help data scientists evolve in an expanding field

June 5th, 2020

This article is by Geoff Soon and originally appeared on the Digicon Asia Blog here: https://www.digiconasia.net/tips/5-tips-to-help-data-scientists-evolve-in-an-expanding-field


Data scientists need to pull more weight in establishing a pervasive data-first corporate culture. Here is how …

Data scientist roles are among the hottest tech jobs in the labor market and demand looks set to increase. The global shortage of talent in this area is particularly acute in developed areas of the Asia Pacific where Big Data jobs exist, such as Australia and Singapore.

Data scientist positions are also among the 10 fastest growing digital positions in the Guangdong-Hong Kong-Macao Greater Bay Area.

Indeed, data science professionals have a unique blend of skills that combines computer science, statistics, modeling, mathematics, and business acumen, a tough skill set to find. Organizations also want more out of the talent they have, as they seek to inject AI and machine learning (ML) into critical business processes to remain competitive.

Technology advancements are providing exciting ways to automate ML and implement better data management techniques, but data scientists must stay on top of these trends and act as champions for increased efficiency and faster results in the organization.

Adopting automation-first technology and incorporating ML models into operations can ease the challenges data scientists face. Here are five practical tips that can help data scientists to stay up-to-date and excel in a rapidly changing workplace.

  1. Use AutoML frameworks to boost productivity
    Automated Machine Learning (AutoML) represents an opportunity to transform how data science and ML work together. AutoML platforms automate the tasks associated with developing and deploying ML models. These platforms standardize and democratize data science best practices, train and optimize models across a multitude of algorithms, and accelerate tasks that are extremely time-consuming and require vast amounts of knowledge.

    By automating data science processes and deploying the most powerful ML models, data scientists can save a significant amount of time and effort during data modeling and achieve insights faster.

    Once automatic modeling is complete, data scientists can use their business intuition to tune the hyperparameters of models as they see fit. Additionally, they can transform or combine model features or find third-party data to supplement the training data sets, an area where machines currently fall short.

    As new algorithms with better capabilities and performance are created and refined, AutoML platforms provide data scientists with quick access to cutting-edge algorithms without requiring them to study and master each new one. The most up-to-date best practices are automatically built into the AutoML platforms, with guardrails in place to prevent novice or citizen data scientists from skipping a critical step in the process.
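To make the idea concrete, here is a minimal sketch of the loop an AutoML platform automates: fit several candidate algorithms, score each with cross-validation, and keep the best. It uses only the Python standard library and a toy one-feature data set. The candidate list and every name in it (`fit_mean`, `fit_linear`, `make_knn`, `cv_mse`, `auto_select`) are assumptions made for illustration and do not come from any particular AutoML product, which would search far more algorithms and settings.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Candidate: ordinary least squares on a single feature."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return lambda x: my + slope * (x - mx)


def make_knn(k):
    """Candidate family: k-nearest-neighbour regression for a given k."""
    def fit(xs, ys):
        pairs = list(zip(xs, ys))

        def predict(x):
            nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
            return mean([y for _, y in nearest])

        return predict
    return fit


def cv_mse(fit, xs, ys, folds=4):
    """Mean squared error over k-fold cross-validation (folds chosen by index)."""
    errors = []
    for f in range(folds):
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % folds != f]
        test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % folds == f]
        model = fit([x for x, _ in train], [y for _, y in train])
        errors.extend((model(x) - y) ** 2 for x, y in test)
    return mean(errors)


def auto_select(xs, ys, candidates):
    """Score every candidate and return the name of the best one plus all scores."""
    scores = {name: cv_mse(fit, xs, ys) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical search space: two algorithms plus two settings of a third.
CANDIDATES = {
    "mean": fit_mean,
    "linear": fit_linear,
    "knn_1": make_knn(1),
    "knn_3": make_knn(3),
}

# Toy data with an exactly linear relationship.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
best_name, scores = auto_select(xs, ys, CANDIDATES)
print(best_name, scores)
```

Adding a newly published algorithm to such a search only means adding one more entry to the candidate table, which is roughly why a platform can offer new methods without each user having to master them first.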

  2. Adopt MLOps to embed AI into operations
    Last year, IDC estimated that spending on AI systems would grow at a compound annual growth rate (CAGR) of 50% from 2018 to 2022, reaching a total of $15.06 billion in 2022. However, this accelerated spending will be for naught if organizations cannot operationalize these technologies.

    According to Gartner analysts, only 47% of ML models actually go into production, which represents a huge gap between data science resources and actual business value and impact. AI requires more than data scientists to be effective. Operations teams must be involved to help with critical operational and production tasks such as monitoring, alerting, upkeep, and compliance. Referred to as machine learning operations (MLOps), this new practice promotes the joint management of the ML data pipeline by bridging the gap between data scientists and operations teams.

    A collaborative partnership means that those in the appropriate roles can manage tasks that fall naturally within their realm. This type of support frees data scientists to focus on business issues and deal with data preparation and ML models to provide faster insights.
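As one illustration of the monitoring and alerting tasks an operations team might take on, the sketch below compares a feature's live values against a baseline recorded at training time and returns an alert message when the mean has shifted too far. The feature name `basket_size`, the sample numbers, and the threshold of three standard deviations are all made up for the example, and real MLOps tooling typically uses richer drift tests than this.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional, Sequence


@dataclass
class Baseline:
    """Summary of one feature as it looked in the training data."""
    feature: str
    mean: float
    std: float


def build_baseline(feature: str, training_values: Sequence[float]) -> Baseline:
    # Assumes the training values are not all identical, so std is non-zero.
    return Baseline(feature, mean(training_values), stdev(training_values))


def check_drift(baseline: Baseline, live_values: Sequence[float],
                threshold: float = 3.0) -> Optional[str]:
    """Return an alert string if the live mean has moved more than
    `threshold` training standard deviations from the baseline, else None."""
    shift = abs(mean(live_values) - baseline.mean) / baseline.std
    if shift > threshold:
        return (f"ALERT: {baseline.feature} mean moved {shift:.1f} "
                f"std devs from the training baseline")
    return None


# Made-up numbers for illustration only.
baseline = build_baseline("basket_size", [10, 12, 11, 9, 10, 11, 12, 10])
healthy_alert = check_drift(baseline, [10, 11, 12, 10])
drifted_alert = check_drift(baseline, [18, 19, 20, 21])
print(healthy_alert)
print(drifted_alert)
```

A check like this can run on a schedule owned by operations, while the data scientist decides which features to watch and what threshold makes sense for the business.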

  3. Promote data consolidation for workflow efficiencies
    Obtaining data is the first thing data scientists do, but this fundamental requirement leads to two of the biggest data challenges faced today. The first is that data generated by disparate sources is often stored in separate silos. The second is that data exists in formats that are challenging to combine.

    As a result, data scientists must undertake the time-intensive task of discovering and gaining access to data before they can take the other steps required to create a consistent and manageable data set. This becomes an ongoing issue when more data must be retrieved to complete a model. Having felt this pain first-hand, data scientists are the perfect spokespeople for data consolidation. Encouraging organizations to invest in a data platform where all data is consolidated and made accessible from a single source will lead to workflow efficiencies.

    A single data location also protects organizations against the use of incomplete data sets or bad information.
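The two challenges above, separate silos and mismatched formats, can be seen in miniature in the sketch below. It reads one source exported as CSV and another as JSON, normalizes the differing key names, and joins everything on a shared customer ID. The source names, field names, and sample records are invented for illustration, and a real data platform would handle this at far larger scale.

```python
import csv
import io
import json

# Two hypothetical silos holding related data in different formats.
CRM_CSV = (
    "customer_id,name,signup_date\n"
    "101,Customer A,2019-03-14\n"
    "102,Customer B,2020-01-02\n"
)
BILLING_JSON = (
    '[{"custId": "101", "total_spend": 250.0},'
    ' {"custId": "103", "total_spend": 90.5}]'
)


def load_crm(text):
    """Parse the CSV export into {customer_id: fields}."""
    return {
        row["customer_id"]: {"name": row["name"], "signup_date": row["signup_date"]}
        for row in csv.DictReader(io.StringIO(text))
    }


def load_billing(text):
    """Parse the JSON export, mapping its 'custId' key onto the same ID space."""
    return {
        str(item["custId"]): {"total_spend": float(item["total_spend"])}
        for item in json.loads(text)
    }


def consolidate(*sources):
    """Outer-join all sources on customer ID; fields a source lacks become None."""
    fields = sorted({f for src in sources for rec in src.values() for f in rec})
    merged = {}
    for cid in sorted({cid for src in sources for cid in src}):
        record = {f: None for f in fields}
        for src in sources:
            record.update(src.get(cid, {}))
        merged[cid] = record
    return merged


dataset = consolidate(load_crm(CRM_CSV), load_billing(BILLING_JSON))
for cid, record in dataset.items():
    print(cid, record)
```

Because missing fields are kept as explicit `None` values rather than silently dropped, gaps in the combined data set stay visible to whoever builds a model on it.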

  4. Unlock your data for greater insights
    Many organizations have the equivalent of an iceberg of data. Only a small subset is visible, while the rest exists “underwater”, unexamined and unused. It is important for data scientists to ensure their organization processes as much of that data as possible.

    Invisible data is especially prevalent in organizations that use data lakes for long-term storage. The best way to ensure light is shed on “dark data” is to make data easily discoverable, accessible, and usable.

    Companies can benefit by setting up self-service data refinery platforms where employees can refine data and share it with each other, further refining and sharing their joint insights. Sharing internal data with trusted partners, suppliers, customers, and vendors is another way to derive value, and organizations interested in monetizing data can create marketplaces for other companies to license and pay for anonymized, aggregate data.

  5. Collect more data and encourage more ideas
    The other side of sharing data is bringing new data in. Organizations rarely collect all the data they need for analysis, and data scientists should not feel constrained to use only what they have internally. Data gathering should be part of every organization’s data strategy, and data scientists are in the perfect position to open new doors.

    Another idea gaining momentum is the use of a center of excellence (COE) for data, analytics, and data science. By centralizing resources, organizations create efficiencies and discover new ways for teams to collaborate around data, analytics, and business questions. This brings cognitive diversity to the conversation, helping to push boundaries and unveil new areas for exploration.

The future for data science is ripe with opportunity. Data scientists represent some of the brightest minds at work across dozens of industries, which helps explain why demand is high and why many business-critical challenges fall under their purview.

With this momentum, now is the time for data scientists to take the lead around AutoML, MLOps, and best practices for data management and usage.