Blog by Divebell


Protecting Your S3 Data: Is Amazon Macie Really Your Best Option?

Vikram Shrowty
Jul 6, 2022

Blob stores like S3 tend to be a convenient dumping ground for all kinds of data. Unsurprisingly, vast quantities of data accumulate in them very quickly. The data typically includes log files, backup archives, documents, images, and big-data files like Parquet and Avro. Knowing what kind of sensitive data resides amidst all this, and whether it is being used in a compliant manner, can be daunting.

To add to this, it isn’t easy to ascertain whether all the content in an S3 bucket is adequately secured. Unlike most repositories, which rely on ACLs and permissions, S3 access can be controlled by free-form policies that are not amenable to standard entitlements-and-permissions audit processes.
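To make the point about free-form policies concrete, here is a minimal sketch of one narrow check: flagging bucket policy statements that allow access to any principal without a condition. The function names and the example policy are hypothetical, and this heuristic covers only a small slice of what a real policy review needs (it ignores `NotPrincipal`, explicit `Deny` statements, ACLs, and account-level public access settings).

```python
import json


def _as_list(value):
    """Policy fields may hold a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def is_wildcard_principal(principal):
    """True if the principal is "*" or a mapping such as {"AWS": "*"}."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        return any("*" in _as_list(v) for v in principal.values())
    return False


def find_open_statements(policy):
    """Return Allow statements open to any principal with no Condition.

    Heuristic only: an assumption-laden sketch, not a complete audit.
    """
    findings = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        if not is_wildcard_principal(stmt.get("Principal")):
            continue
        if stmt.get("Condition"):
            continue
        findings.append({
            "sid": stmt.get("Sid", "(no Sid)"),
            "actions": _as_list(stmt.get("Action")),
            "resources": _as_list(stmt.get("Resource")),
        })
    return findings


# Hypothetical policy for illustration; bucket, account, and role are made up.
EXAMPLE_POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*"
    },
    {
      "Sid": "TeamAccess",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::123456789012:role/example-role"},
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}
""")

if __name__ == "__main__":
    for finding in find_open_statements(EXAMPLE_POLICY):
        print(finding["sid"], finding["actions"], finding["resources"])
```

In practice the policy document would be fetched from S3 rather than hard-coded. Even then, a check like this only shows how much interpretation a single statement needs, which is why such policies resist a standard entitlements audit.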

To tackle these challenges, Amazon launched Macie a few years ago as a data protection system for S3 buckets. As with several security products launched by cloud vendors, my experience has been that while it may allow you to check “yes” on the compliance assessment form, it falls short when it comes to actually protecting your data.

I don’t make this assertion lightly. Here are my key reasons:

1. The Heavy Lifting is Left to You

Yes, Amazon Macie detects sensitive data — but that’s it. It has no concept of policies, consent, legitimate and illegitimate uses of data, or remediation workflows. Users are handed a huge pile of findings and left to fend for themselves. The result: Most users do nothing.

2. Limited File Formats Supported

Amazon Macie supports only about a dozen file formats. This is woefully inadequate considering there are hundreds of file formats that can house sensitive enterprise data.

3. Zero Visibility into Image Files

Many companies have sensitive data such as passports, driving licenses, and scans of paper documents stashed away in their S3 buckets. Amazon Macie does not support image files and provides no visibility into them.

4. Not a Comprehensive Solution

It only works with S3. This may feel like an unfair criticism, considering that Amazon Macie was built for S3. But in the real world, S3 is just one of the places where a company's data needs to be monitored. Amazon Macie users therefore need to buy, configure, maintain, and monitor additional products for other data stores such as Google Drive, Slack, Azure Blob Store, Google Cloud Storage, and RDS, to name a few.

5. The ‘Big Data’ Issue

It has limited support for big data file formats like Parquet and Avro. Data-savvy companies often build data lakes on S3 that contain terabytes of such data. Protecting this data entails a scanning strategy that incorporates data lake topology and partitions. Again, this is an area where Amazon Macie can be found wanting, leading to an even bigger pile of “findings.”
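As a rough illustration of what a partition-aware scanning strategy could look like, the sketch below groups object keys by their partition directory and picks a small, deterministic sample from each one, instead of treating every file as an independent scan target. The function names and key layout are hypothetical, and the approach assumes files within a partition share a schema; it does not describe how Amazon Macie or any other product works internally.

```python
from collections import defaultdict


def partition_of(key):
    """Return the directory part of a key, e.g. 'lake/events/dt=2022-07-01'."""
    return key.rsplit("/", 1)[0] if "/" in key else ""


def plan_scan(keys, per_partition=1):
    """Pick up to `per_partition` files from each partition directory.

    Assumption: files in one partition have the same schema, so a sample
    is representative. Sampling can miss sensitive values in unscanned files.
    """
    groups = defaultdict(list)
    for key in keys:
        groups[partition_of(key)].append(key)
    return {
        part: sorted(files)[:per_partition]
        for part, files in sorted(groups.items())
    }


if __name__ == "__main__":
    # Hypothetical Hive-style layout for illustration.
    example_keys = [
        "lake/events/dt=2022-07-01/part-0.parquet",
        "lake/events/dt=2022-07-01/part-1.parquet",
        "lake/events/dt=2022-07-02/part-0.parquet",
    ]
    for partition, sample in plan_scan(example_keys).items():
        print(partition, sample)
```

Reporting at the partition level rather than per file is one way to keep the number of findings manageable, at the cost of the coverage trade-off noted in the comments.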

6. DSAR Support

Many new privacy regulations like the EU’s GDPR or California’s CPRA mandate that you respond to “Data Subject Access Requests” (DSARs): customers or employees can request that you locate, delete, or rectify data about them. These time-consuming DSARs need to be completed within a specific time period. To be compliant with the law, you’ll need to be able to fulfill this requirement across the data in your S3 buckets. Amazon Macie is of little help here.

7. Ballooning Costs

It gets very expensive, very quickly. If the lackluster capabilities don't turn you off Amazon Macie, your AWS bill certainly will!

However, I’ve got good news: We’ve built a world-class platform to protect your data that will not only handle everything in your S3 buckets, but also data stored just about anywhere else. So, if you are looking to protect your data, you will be better off with a solution like Divebell. Meanwhile, if you have a differing viewpoint about Amazon Macie and my conclusions — I would love to hear from you! My intent here is to foster a discussion about privacy and technology related to data protection. Good data protection is good for everyone. Let’s talk!

Any opinions expressed and statements made here are not legal advice, nor representations or warranties, and are intended to promote discussion around technology and data protection.
