Data Lake vs Data Warehouse
What's the difference anyway?
So I was recently helping a buddy of mine prepare for an interview. He hasn't had a ton of cloud experience, so I was showing him what I know. I made an offhand reference to a data lake and he asked something along the lines of, "Is that like a data warehouse?"
I got to break out my favorite analogy:
A data lake is where you go fishing. A data warehouse is where you can find exactly the thing you're looking for.
That is to say, a data lake is where we dump all of the raw, unrefined data points, into something like a NoSQL repository. Maybe a data point (sales data for a particular product on a particular day, for example) exists in the lake and maybe it doesn't. Then some clever person in the Business Intelligence department fishes through the data and transforms it into something more usable for other purposes, most likely a report of some kind. We don't know what we might find, but if it exists, it's probably in there, at least for however many days, months, or years we keep the raw data around.
If those sales records exist, they can be found in the lake. Grab your rod and let's see what we can pull in.
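To make the fishing part a bit more concrete, here's a rough sketch in Python. Everything in it is made up for illustration: the `RAW_LAKE` records, the field names, and the `fish_for_sales` helper are assumptions, not anything from a real system. The point is just that the lake holds whatever got dumped in, so the reader has to cope with junk, unrelated records, and inconsistent field names at read time.

```python
import json

# Hypothetical raw records, dumped into the lake exactly as they arrived.
# Note the mixed shapes: a click event, a line that isn't JSON at all,
# and a sale that spells its amount field differently.
RAW_LAKE = [
    '{"type": "sale", "product": "widget", "day": "2016-08-01", "amount": 19.99}',
    '{"type": "click", "page": "/home"}',
    '{"type": "sale", "product": "widget", "day": "2016-08-01", "amt": "5.01"}',
    'not even json',
    '{"type": "sale", "product": "gadget", "day": "2016-08-02", "amount": 42.0}',
]


def fish_for_sales(raw_lines, product, day):
    """Scan every raw record and total up whatever looks like a matching sale.

    Returns (number of matching records, total amount rounded to cents).
    """
    found = 0
    total = 0.0
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # junk in the lake; throw it back
        if not isinstance(record, dict):
            continue
        if record.get("type") != "sale":
            continue
        if record.get("product") != product or record.get("day") != day:
            continue
        # The amount field isn't named consistently, so try both spellings.
        amount = record.get("amount", record.get("amt"))
        if amount is None:
            continue
        found += 1
        total += float(amount)
    return found, round(total, 2)
```

With this sample data, `fish_for_sales(RAW_LAKE, "widget", "2016-08-01")` pulls in two records totaling 25.0, while asking for a product that was never sold just comes back with `(0, 0.0)`. Maybe it's in the lake and maybe it isn't.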
Conversely, the data warehouse is where we keep our vetted information that is good to go. We've shaved the yak and situated everything in an easy-to-find, easy-to-consume manner. This is the Final Word(tm) on how much we made in August of 2016. We expect data in a certain place in a certain way in a data warehouse. Like finding the Kallax at Ikea.
That sales report is on Aisle 12, Section 4, Shelf 2.
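Here's a rough sketch of the warehouse side in Python, using the standard library's `sqlite3` as a stand-in for a real warehouse. The `monthly_sales` table, its columns, the revenue figures, and the `revenue_for_month` helper are all invented for illustration. What matters is that the shape of the data is decided up front, so finding the answer is a lookup, not a fishing trip.

```python
import sqlite3

# An in-memory SQLite database standing in for the warehouse.
conn = sqlite3.connect(":memory:")

# The schema is fixed before any data goes in: every row has a month,
# a product, and a revenue figure, and nothing else is allowed.
conn.execute(
    """
    CREATE TABLE monthly_sales (
        month   TEXT NOT NULL,
        product TEXT NOT NULL,
        revenue REAL NOT NULL,
        PRIMARY KEY (month, product)
    )
    """
)

# Hypothetical, already-vetted figures.
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?, ?)",
    [
        ("2016-08", "widget", 1200.0),
        ("2016-08", "gadget", 800.0),
        ("2016-09", "widget", 950.0),
    ],
)


def revenue_for_month(conn, month):
    """Return total revenue for a month, or 0.0 if the month has no rows."""
    row = conn.execute(
        "SELECT COALESCE(SUM(revenue), 0.0) FROM monthly_sales WHERE month = ?",
        (month,),
    ).fetchone()
    return row[0]
```

With these made-up rows, `revenue_for_month(conn, "2016-08")` returns 2000.0. No scanning, no guessing at field names: we know exactly which aisle, section, and shelf to check.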
- My buddy landed the job, so huzzah for him!
- He recently sent me "Data lake and data warehouse – know the difference" (sas.com), which is a slightly wordier high-level description of what I've written here.
Software Development Nerd