Month: January 2020

Cloud Architect of the Year awards 2020

Massive congratulations to Rob Reus, Dennis Vijlbrief and René Pingen! They are the well-deserved winners of the Cloud Engineer, Cloud Security Architect and Cloud Architect of the Year awards, respectively.

Well done and special thanks to our sponsors NetApp, Microsoft, Vamp.io and of course Global Knowledge.

We will return with a new edition of the Cloud Architect of the Year awards next year!

Cloud Engineer of the Year 2020

Rob Reus

Cloud Security Architect of the Year 2020

Dennis Vijlbrief

Cloud Architect of the Year 2020

René Pingen

How to avoid getting stuck in a data swamp

In the run-up to the much-anticipated Cloud Architect Alliance dinner and award show on January 30th, our speakers give a sneak peek into what they have to say that night. Data Solution Architect René Bremer and Senior Cloud Solution Architect Sarath Sasidharan, both currently at Microsoft, share their take on architecting secure data lakes that add business value.

The challenge of scattered data

Many enterprises are considering setting up an enterprise data lake. They want (and often need) to get more business value out of the data that is available within the organization. But even though the potential value of data is recognized, actually using data in this way can be a challenge.

According to René Bremer and Sarath Sasidharan, enterprises need to centralize data first before they can use it to create business value. “Large enterprises face a problem when data needs to be accessible to different teams, departments and offices,” Bremer says. “The last thing you want is teams exchanging data bilaterally, as this data will be ‘marginalized’ and most likely will not be accessible to the rest of the organization. It’s difficult to control data that does not have a single source that is used by the whole organization. Centralizing data is therefore the first step to creating a data lake that adds business value.”

Controlling accessible and useful data in a data lake

Centralizing data is a first step, but it certainly should not be the only one. For data in a data lake to become useful, it should be clear who owns it and who is allowed to access it. A pub-sub service should be in place to ensure asynchronous use of data. “Both metadata and a clearly defined pub-sub paradigm are key to creating a functional data lake,” Sasidharan explains. “Any data that is put into a data lake needs to be stored according to a proper methodology, with proper metadata and schema validation for pub-sub to work as it is intended to.”
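To make the idea of schema validation in a pub-sub flow more concrete, here is a minimal in-memory sketch. It is not taken from the speakers' material: the `Broker` class, the `ORDER_SCHEMA` fields and the `SchemaError` exception are all hypothetical names chosen for illustration, and delivery is synchronous for brevity, whereas a real pub-sub service would queue messages and deliver them asynchronously.

```python
from collections import defaultdict

# Hypothetical schema for an "orders" topic: field name -> expected type.
ORDER_SCHEMA = {"order_id": str, "amount": float}


class SchemaError(ValueError):
    """Raised when a published record does not match the topic schema."""


class Broker:
    """Toy in-memory broker that validates every record before delivery."""

    def __init__(self):
        self._schemas = {}
        self._subscribers = defaultdict(list)

    def register_topic(self, topic, schema):
        # Each topic gets exactly one agreed schema.
        self._schemas[topic] = schema

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, record):
        # Reject the record if a field is missing or has the wrong type.
        for field, expected_type in self._schemas[topic].items():
            if field not in record:
                raise SchemaError(f"missing field: {field}")
            if not isinstance(record[field], expected_type):
                raise SchemaError(f"wrong type for field: {field}")
        # Only validated records reach the subscribers.
        for handler in self._subscribers[topic]:
            handler(record)
```

Because validation happens at publish time, a malformed record is stopped before any consumer sees it, which is one way to keep the lake from silently filling up with unusable data.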

Adding metadata may seem a fairly straightforward affair, but when terabytes or petabytes of data are involved, a strict policy for assigning metadata fields and labels makes the difference between a functional data lake and a dysfunctional data swamp, Bremer says. “You need to think long and hard about the way metadata is added to data. Business metadata, technical metadata and operational metadata need to be compulsory additions to data that is put in a data lake. Without them, it will be difficult to determine the source and value of data. And even worse, it will be difficult to have tools use the data at all. Ideally, metadata can be interpreted by a variety of tools, for example Databricks or SQL.”
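A strict metadata policy like the one Bremer describes could be enforced with a simple check at ingestion time. The sketch below is only an illustration: the three categories come from the quote above, but the individual field names in `REQUIRED_METADATA` and the `missing_metadata` function are assumptions made up for this example.

```python
# The categories follow the article; the field names are hypothetical.
REQUIRED_METADATA = {
    "business": {"owner", "description"},
    "technical": {"format", "schema_version"},
    "operational": {"source_system", "ingested_at"},
}


def missing_metadata(metadata):
    """Return 'category.field' names that are absent or empty."""
    missing = []
    for category, fields in REQUIRED_METADATA.items():
        provided = metadata.get(category, {})
        for field in sorted(fields):
            # Treat empty strings and None the same as a missing field.
            if not provided.get(field):
                missing.append(f"{category}.{field}")
    return missing
```

An ingestion pipeline could refuse any dataset for which this function returns a non-empty list, so that incomplete metadata is caught before the data lands in the lake.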

And that’s not all, according to Sasidharan. “You will also need to think about the data lake from the consumer’s perspective. You need to adhere to certain standards for APIs, streaming and other ways data in a data lake can be published. It should be easy for a data consumer to access the desired data, while at the same time controls should be in place to ensure only authorized users can access certain data for a predefined period of time. A single pane of glass to manage consumption of data in a data lake is indispensable to a secure and efficient enterprise data lake.”
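The requirement that only authorized users can access certain data for a predefined period of time can be sketched as a time-bound access grant. This is a simplified illustration under stated assumptions: `AccessGrant` and `is_allowed` are hypothetical names, and a production data lake would rely on the access-control features of its platform rather than on hand-written checks like this one.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessGrant:
    """Hypothetical record: one user may read one dataset until expires_at."""
    user: str
    dataset: str
    expires_at: datetime


def is_allowed(grants, user, dataset, now):
    """Return True if a matching grant exists that has not yet expired."""
    return any(
        grant.user == user
        and grant.dataset == dataset
        and now < grant.expires_at
        for grant in grants
    )
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the check easy to test and makes the expiry behavior explicit.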

Learnings from real customer cases

To eliminate data silos between application and business departments, Microsoft has created an open-source data model (called Common Data Model) in collaboration with SAP and Adobe. During the upcoming Cloud Architect Alliance event on January 30th, Bremer and Sasidharan will discuss two customer cases and will elaborate on pub-sub flows, metadata and the Common Data Model, and general principles that need to be considered when implementing a data lake. To get guests started, they will even share some generic patterns on metadata and pub-sub. Make sure you don’t miss out and claim your free ticket right now! Don’t wait too long: there are only a few spots left.

The sense and nonsense of cloud-native data lakes

In the run-up to the much-anticipated Cloud Architect Alliance dinner and award show on January 30th, our speakers give a sneak peek into what they have to say that night. First up is Maurits van der Drift, mission critical cloud engineer at Schuberg Philis. Van der Drift will talk about the sense and nonsense of cloud-native data lakes and how data can help to enhance business models.

Nothing more than a database?

Gone are the days when tens or hundreds of on-premises or colocated servers were clustered with Hadoop to store and process unstructured data. Cloud-native data lakes offer a far more flexible way to create and maintain databases without the need to manage the physical hardware. Another major advantage is that cloud-native data lakes are not bound to a specific location, which makes planning and executing migrations (from data center to data center, for example) a thing of the past. Even though a cloud-native data lake may at its core function like a data warehouse or database, its cloud-native nature offers far more possibilities.

“One might think the term ‘cloud-native’ is just marketing lingo for what is essentially a database,” Van der Drift says. “And even though that would be a legitimate take on the cloud-native data lake from a purely conceptual point of view, recent technological developments have really changed how data can be leveraged to add business value. The cloud is indispensable in this sense, because it offers an efficient, tailor-made solution for data analysis and value extraction.”

Know what to look for before you take a dive

That being said, the promise of data lakes has not yet been fulfilled, Van der Drift observes. “When the idea of data lakes started to become more popular, there seemed to be no limit to the business value unstructured data could generate. Combined with AI and machine learning, many organizations were told they were sitting on massive amounts of useful data and they needed to tap into this new possible revenue stream. The truth, however, is that in many cases the unstructured data organizations have is tough to leverage from a business perspective. In addition, even if value can be extracted from unstructured data, it can only be successfully leveraged if you know what you are looking for. In other words: you need to have made the business case for a cloud-native data lake before you actually start creating one.”

Even though Van der Drift has some reservations when it comes to ‘the promises of Big Data’, he notes there are many ways to create business value with data lakes. And it has become easier to do so. “A major advantage of cloud-native data lakes is the razor-sharp efficiency they bring. You don’t have to invest in hardware anymore, you don’t have to keep developers on your payroll to maintain your clusters and you only pay for what you use. Infrastructure costs for cloud-native solutions are usually only 10% of the total costs of a project. This means that a small increase in efficiency can justify a large increase in infrastructure costs. Needless to say, the threshold to use data analysis is lowered, but it remains important to know why you want to set up a cloud-native data lake and what the desired results are.”
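The arithmetic behind that last point can be worked through in a few lines. The 10% infrastructure share is the figure from the quote; the 5% efficiency gain and the assumption that this gain applies to the remaining, non-infrastructure 90% of project costs are ours, purely for illustration, and `break_even_infra_increase` is a hypothetical helper name.

```python
def break_even_infra_increase(infra_share, efficiency_gain):
    """Relative rise in infrastructure cost that an efficiency gain pays for.

    infra_share: infrastructure cost as a fraction of total project cost.
    efficiency_gain: fractional saving on all non-infrastructure costs
    (an assumption of this sketch, not a claim from the article).
    """
    other_share = 1.0 - infra_share
    savings = other_share * efficiency_gain
    # Express the savings as a fraction of the infrastructure budget.
    return savings / infra_share


# With infrastructure at 10% of total cost, an assumed 5% efficiency gain
# on the other 90% saves 4.5% of the total project cost. That is enough to
# cover an infrastructure bill that grows by roughly 45%.
increase = break_even_infra_increase(infra_share=0.10, efficiency_gain=0.05)
print(f"Break-even infrastructure increase: {increase:.0%}")
```

Under these assumptions, the smaller the infrastructure share of a project, the more infrastructure spend a given efficiency gain can cover.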

Claim your free ticket for our upcoming event

Do you want to see Maurits explain when and how a cloud-native data lake can add business value? And would you like to enjoy the company of your peers and a free three-course dinner while doing so? Join us on January 30th, 2020 during our next event: ‘How to build a cloud-native data lake & annual Election Night’. During this dinner show, you will learn everything there is to know about cloud-native data lakes and can also enjoy the annual Cloud Architect of the Year election. The number of available tickets is limited, so make sure to claim yours right now!

