A data lake is a centralized repository that, like a lake, holds a large amount of raw data in its native format, structured and unstructured, at any scale. You can store your data as-is, without first having to structure or define it; that step can wait until the data is needed. The data can then be used for reporting dashboards and visualizations, real-time analytics, and machine learning to guide better programmatic advertising decisions.
As data grows and diversifies, many marketing teams, and digital strategy teams in particular, are finding that traditional methods of collecting data are becoming outdated, and they are pushing for something more centralized, like a data lake. According to Aberdeen research done in September 2017, the average company is seeing the volume of its data grow at a rate that exceeds 50% per year. The same study found that these companies manage an average of 33 unique data sources. With data split into silos by team, such as search, social, or direct marketing, CMOs are challenged to manage the analysis of their media campaigns efficiently. If they don’t consolidate their data, they risk targeting the same consumer more than once or even exposing them to the wrong message.
Why Do You Need a Data Lake?
Most data platforms will only store data if it has been formatted to fit a particular structure, like rows and columns. So unstructured data like log files, click-stream data, social media, and data from internet-connected devices typically can’t be uploaded into a data platform until the data has been defined. A Data Lake allows you to import all marketing data in real time, from multiple sources and in its original format. It also scales to data of any size. Then you can figure out how to use the data in an automatic yet personalized way to attract and retain customers through digital advertising. Companies like Digilant can help you set up a Data Lake and use it for media activation.
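To make the "store now, define later" idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular data lake product works: the sample events, the in-memory list standing in for the lake, and the helper names ingest and read_with_schema are all hypothetical.

```python
import json

# Hypothetical raw events from three sources, kept exactly as they arrived.
RAW_EVENTS = [
    '{"source": "search", "user": "u1", "clicks": 3}',
    '{"source": "social", "user": "u2", "likes": 10, "shares": 2}',
    '{"source": "device", "device_id": "d9", "user": "u1", "temp_c": 21.5}',
]


def ingest(lake, raw_line):
    """Append a record to the lake without parsing or validating it."""
    lake.append(raw_line)


def read_with_schema(lake, fields):
    """Apply a structure only at read time.

    Yields one dict per record that contains every requested field;
    records that lack a field are skipped, not rejected at ingest.
    """
    for raw_line in lake:
        record = json.loads(raw_line)
        if all(field in record for field in fields):
            yield {field: record[field] for field in fields}


# The "lake" here is just a list; a real one would be file or object storage.
lake = []
for line in RAW_EVENTS:
    ingest(lake, line)

# Two different questions, two different structures, same raw data.
clicks_by_user = list(read_with_schema(lake, ["user", "clicks"]))
users_seen = sorted({row["user"] for row in read_with_schema(lake, ["user"])})
```

Nothing is rejected at ingest, even though the three records share almost no fields. The structure is chosen per question at read time, which is the property that separates a lake from a platform that requires rows and columns up front.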
What is the difference between a data lake and a Data Management Platform (DMP)?
For digital marketers, one key difference is that a Data Lake allows companies to collect PII (Personally Identifiable Information), which DMPs do not. A DMP’s main function is the collection of cookie data for media audience activation, whereas a Data Lake is often the first step used by data scientists to expand the knowledge of the DMP. The DMP often connects directly to the media activation tool, which for programmatic is most likely a DSP (Demand Side Platform). A DMP will establish connections between several external data providers, and the data lake then supplements it with new internal data like social media feeds or connected device data.
Four Main Advantages to Having a Data Lake
Data Lakes allow you to store relational data (a collection of data items organized as a set of formally described tables, from which data can be accessed or reassembled in many different ways without reorganizing the tables), such as operational databases (data collected in real time) and data from line-of-business applications, as well as non-relational data from mobile apps, connected devices, and social media. They also give you the ability to understand what data is in the lake through crawling, cataloging, and indexing of data.
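The crawling, cataloging, and indexing step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the lake is modeled as a local directory, the demo files and folder names are invented, and crawl, index_by_format, and build_demo_lake are hypothetical helpers, not part of any data lake product.

```python
import os
import tempfile


def crawl(root):
    """Walk the lake directory and return one catalog entry per file."""
    catalog = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            catalog.append({
                "path": os.path.relpath(path, root),
                # Use the file extension as a rough stand-in for the format.
                "format": os.path.splitext(name)[1].lstrip(".") or "unknown",
                "bytes": os.path.getsize(path),
            })
    return catalog


def index_by_format(catalog):
    """Group catalog paths by format so data can be found without opening it."""
    index = {}
    for entry in catalog:
        index.setdefault(entry["format"], []).append(entry["path"])
    return index


def build_demo_lake(root):
    """Create a few invented files standing in for relational and raw data."""
    files = {
        "crm/customers.csv": "id,name\n1,Ada\n",
        "social/posts.json": '{"post": "hello"}\n',
        "web/clicks.log": "GET /home\n",
    }
    for rel, text in files.items():
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text)


with tempfile.TemporaryDirectory() as root:
    build_demo_lake(root)
    catalog = crawl(root)
    index = index_by_format(catalog)
```

The catalog records where each file lives and what kind of data it holds, and the index answers questions such as "which log files do we have?" without reading the files themselves. A production catalog would typically also record schemas, owners, and update times.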
2. ANALYTICS
Data Lakes allow data scientists, data developers, and operations analysts to access data with their choice of analytic tools and frameworks. This includes open source frameworks such as Apache Hadoop, Presto, and Apache Spark, as well as commercial offerings from data warehouse and business intelligence vendors. Data Lakes allow you to run analytics without the need to move your data from one system to another.
3. MACHINE LEARNING
4. IMPROVED CUSTOMER INTERACTIONS

In Summary
Marketers and Media Buyers would want to implement a data lake for three main reasons. First, they want to take advantage of more advanced and sophisticated analytical tools and dashboards, using a more complex and diverse foundation of information. Second, they want to make traditional activities, like data access and retrieval, faster and easier to accomplish. Third, they want to bring all the data from the different parts of the organization into one place, saving both time and cost. While not every company succeeds at achieving all three objectives simultaneously, the most effective ones will be able to see positive results in their ability to make better programmatic media buying decisions.