Amazon
Defining Amazon S3 Bucket and Path Names for Data Lakes
Pages: 67
Time to read: 78 mins
Language: English
This guide provides a framework for defining a consistent naming standard for Amazon Simple Storage Service (Amazon S3) buckets and paths in data lakes hosted on the AWS Cloud. A naming standard improves governance and observability of the data lake and helps identify costs by data layer and by AWS account. The guide recommends at least three data layers: a raw layer for initial data ingestion, a stage layer for processed data optimized for consumption, and an analytics layer for aggregated data in a ready-to-use format. It also discusses the need for a landing zone for sensitive data and gives guidance on mapping S3 buckets to AWS Identity and Access Management (IAM) policies. The intended audience is data architects, data engineers, and solutions architects who set up data lakes on AWS. Readers should adapt the recommendations to their organization's policies.
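As a rough illustration of a per-layer naming standard and of mapping a bucket to an IAM policy, the following Python sketch builds bucket names for the three layers and a read-only policy document for one bucket. The name pattern (`prefix-layer-region-account-env`), the function names, and the `examplecorp` prefix are assumptions made for this example, not the pattern the guide prescribes. The validation covers only the general S3 bucket naming constraints (3 to 63 characters, lowercase letters, digits, and hyphens, beginning and ending with a letter or digit).

```python
import re

# Layer names taken from the three data layers described above.
LAYERS = ("raw", "stage", "analytics")

# General S3 bucket naming constraints: 3-63 characters, lowercase letters,
# digits, and hyphens, beginning and ending with a letter or digit.
# S3 also permits dots; this sketch deliberately excludes them.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def bucket_name(prefix, layer, region, account_id, env):
    """Build a bucket name from a hypothetical prefix-layer-region-account-env pattern."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    name = "-".join([prefix, layer, region, account_id, env]).lower()
    if not _BUCKET_RE.fullmatch(name):
        raise ValueError(f"invalid S3 bucket name: {name!r}")
    return name


def read_only_policy(bucket):
    """Return an IAM policy document (as a dict) granting read-only access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    for layer in LAYERS:
        print(bucket_name("examplecorp", layer, "us-east-1", "123456789012", "dev"))
```

Because the layer, account, and environment are all encoded in the name, a policy or cost report can be scoped to a layer by matching on the bucket name alone. An organization would substitute its own pattern and permissions.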