Thomas Legrand

I am a Data Scientist at Mention interested in programming and mathematics. Focused on Python, Kotlin and Go.

Big Data World 2019 in short

20 Nov 2019

Day 1

Keynote | Length (min)
DataOps & MLOps - feedback & HPE approach: from concepts to testing to production | 25
Successful transformation Big data: recommendations and feedback | 25
Come discover how companies can easily and quickly implement Data Science in the cloud with Microsoft Azure and Dataiku. | 25
What if we used the power of algorithms to solve social and environmental problems? | 25
AI, a full member of your team? | 40
What are the challenges of the CDO? | 45
Immersive analytics | 25
“Streaming data at the edge”: process and extract value from data flows as close as possible to the source | 25
AI for Business - Insert Machine Learning in Business | 25
AI Is Not Software: Successful Machine Learning Projects | 25

Day 2

Keynote | Length (min)
DataOps, or how to renew Analytics and Machine Learning-oriented data consumption modes | 25
Towards the data-centric enterprise: how to rethink its organization? | 25
The challenges of data culture | 25
Data Driven Culture Construction at M6: The Bet for Data Self Service | 25
How to rhyme data governance and project agility? | 40
How Data visualization Speeds Machine Learning Projects | 25

Key takeaways

Data Culture

Data culture is the set of values, practices and methods shared within a company that improve awareness of its information assets and their value.

Several talks emphasized the importance of building a genuine data culture within the enterprise in order to carry projects through.

Many projects (Gartner estimates around 85%) fail to reach production. Companies that develop a deep understanding of data across the whole organization are more likely to succeed.

Knowing this, some companies, such as M6, a national TV channel, take the path of self-service data. They want data access to be fast, iterative, collaborative and simple for everyone. To this end, they followed 3 steps:

  • Deploy Superset, an open source data query tool
  • Teach SQL basics to everyone
  • Have the BI team act as a coach
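To give an idea of what "SQL basics" can mean in a self-service setting, here is a minimal sketch of the kind of aggregation a non-specialist could write. The `views` table, its columns and its values are entirely made up for illustration; this is not M6's schema, and it uses an in-memory SQLite database rather than Superset.

```python
import sqlite3

# Hypothetical in-memory database standing in for a self-service data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE views (program TEXT, day TEXT, viewers INTEGER)")
conn.executemany(
    "INSERT INTO views VALUES (?, ?, ?)",
    [
        ("news", "2019-11-18", 120),   # made-up figures
        ("news", "2019-11-19", 150),
        ("movie", "2019-11-18", 300),
    ],
)

# A "SQL basics" query: filter nothing, group by one column, aggregate, sort.
query = """
SELECT program, SUM(viewers) AS total_viewers
FROM views
GROUP BY program
ORDER BY total_viewers DESC
"""
rows = conn.execute(query).fetchall()

for program, total_viewers in rows:
    print(program, total_viewers)
```

SELECT, GROUP BY and ORDER BY already cover a large share of everyday dashboard questions, which is presumably why teaching them to everyone pays off.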

It’s also important to develop a common vocabulary in order to communicate properly, demystify data and remove the fear of the unknown.

Organization & Project Management

Data becomes information when it reduces indecision

The Happn CDO gave some insight into how they developed their team. One should not promise the impossible but rather emphasize the long-term goal, improving backend tracking and providing business-specific dashboards for every team.

She backs the idea of embedding data team members within business-oriented squads and stresses the need for the data team to be involved in strategic decisions: the data team’s core roadmap should be part of the enterprise roadmap.

To succeed in a data project, one needs to ensure the quality of the data, because it’s the key to good results, and to specify precisely the business use case to be solved.

Data Governance & Infrastructure

Data governance is the overall management of the availability, usability, integrity and security of data used in an enterprise.

Before addressing infrastructure matters, it is important to spend time on Data Governance: one should not store all the data available, but rather focus on use cases and refine data quality. One should ask: what is the data, where does it come from, where does it go, and who should have access to it? It’s important to build a dictionary to understand each field of each database.
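As an illustration, such a dictionary could start as something as small as the sketch below. It is only one possible shape, not something presented at the conference: the `FieldEntry` and `DataDictionary` names, their attributes and the sample entry are all hypothetical, chosen to mirror the governance questions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldEntry:
    """One dictionary entry; each attribute answers a governance question."""
    database: str
    table: str
    name: str
    description: str           # what is the data?
    source: str                # where does it come from?
    consumers: tuple = ()      # where does it go?
    allowed_roles: tuple = ()  # who should have access to it?


class DataDictionary:
    """Hypothetical in-memory data dictionary keyed by (database, table, field)."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[(entry.database, entry.table, entry.name)] = entry

    def describe(self, database, table, name):
        return self._entries[(database, table, name)]

    def can_access(self, role, database, table, name):
        return role in self.describe(database, table, name).allowed_roles

    def undocumented(self, schema):
        """Return the (database, table, field) triples that have no entry yet."""
        return [key for key in schema if key not in self._entries]


dictionary = DataDictionary()
dictionary.register(
    FieldEntry(
        database="analytics",            # made-up names throughout
        table="users",
        name="signup_date",
        description="Date the account was created",
        source="backend tracking events",
        consumers=("growth dashboard",),
        allowed_roles=("analyst", "data_scientist"),
    )
)

# Fields present in the schema but missing from the dictionary.
missing = dictionary.undocumented(
    [("analytics", "users", "signup_date"), ("analytics", "users", "email")]
)
print(missing)
```

Listing undocumented fields is a cheap way to track how much of each database is actually understood before investing in infrastructure.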

In terms of infrastructure, I’ve identified similarities across different companies:

  • a Data Lake: in the cloud, containing any internal and external data
  • a Data Hub: a processed and indexed subset containing the golden source data
  • a Data Lab: a separate environment built to allow data scientists to run studies and create prototypes
  • a Database: used for analytics to produce dashboards and answer basic questions.

As an example, a consulting firm presented several stacks:

  • Azure, Cloudera
  • Hortonworks, Oracle, Dataiku
  • OneFS, PostgreSQL, GCP