How would you decrease the amount of storage needed for storing e-mail messages for Gmail?

  Google

Clarify question

Before brainstorming solutions to reduce Gmail's storage, may I ask some clarifying questions?

  • Are there any constraints on the solutions? (Assuming the answer is no.)

 

Cause

I think there are several reasons why Gmail's storage has grown significantly over time.

  • The first reason is that emails in Gmail are kept forever and never deleted. As far as I know, Google markets Gmail on the promise that emails will never be deleted, for any reason.

  • The second reason is the huge number of spam emails sent every hour, far more than the number of important emails. Here, spam includes promotions, social network notifications, and so on. Every day, people receive a myriad of spam emails from different sources: social networks (Facebook, Twitter, LinkedIn, …), websites and apps they registered for with their Gmail account, promotions from various brands and products, …

  • Many Gmail users rarely use their accounts. Many people create a Gmail account only to register for some service, or abandon accounts they created as children.

  • A large amount of media (photos, videos, …) is uploaded directly to Gmail storage, and in many cases the same media is stored in several different emails, for example when a message is forwarded or sent to many recipients. Because media files are much larger than text, these duplicates waste a huge amount of space.

 

Solution

Therefore, the solutions that come to mind are listed below, each with a description, its pros, and its cons:

Limit the storage time of spam

  • Description: The system will automatically delete spam emails that have stayed unread for 2 years. I assume that, after reviewing the data with the data science (DS) team, we find that most people never go back to spam older than 2 years.

  • Pros: Saves storage.

  • Cons: Legitimate emails that were mistakenly classified as spam would also be deleted.
    => Mitigation: improve the accuracy of the spam detector.
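To make the retention rule concrete, here is a minimal Python sketch of the selection step. The `Email` record, its fields, and `expired_spam` are hypothetical names chosen for illustration, and "2 years" is approximated as 730 days; this is not how Gmail actually stores or purges mail.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical retention window: roughly 2 years, as proposed above.
SPAM_RETENTION = timedelta(days=730)


@dataclass
class Email:
    # Hypothetical, minimal email record (not Gmail's real data model).
    email_id: str
    is_spam: bool
    is_read: bool
    received_at: datetime


def expired_spam(emails, now, retention=SPAM_RETENTION):
    """Return the IDs of unread spam emails older than the retention window."""
    return [
        e.email_id
        for e in emails
        if e.is_spam and not e.is_read and now - e.received_at > retention
    ]


now = datetime(2024, 1, 1)
emails = [
    Email("a", is_spam=True, is_read=False, received_at=datetime(2021, 6, 1)),
    Email("b", is_spam=True, is_read=True, received_at=datetime(2021, 6, 1)),
    Email("c", is_spam=False, is_read=False, received_at=datetime(2020, 1, 1)),
    Email("d", is_spam=True, is_read=False, received_at=datetime(2023, 6, 1)),
]
print(expired_spam(emails, now))  # ['a']
```

Only email "a" qualifies: "b" was read, "c" is not spam, and "d" is younger than the window.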

Handle the overlapping media

  • Description: We will detect and remove duplicate media in our database. For example, if our detection module finds that 5 pictures are exactly the same, the system keeps only one copy and deletes the others. Every email that contained one of the deleted pictures then points to the remaining copy.

  • Pros: Saves storage.

  • Cons: The detection module must be precise, or it may remove photos that are actually different.
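One common way to implement "keep one copy and point every email to it" is content-addressed storage with reference counting. The sketch below assumes exact duplicates only (byte-identical files, matching "exactly the same" above); `MediaStore` and its methods are hypothetical names, and SHA-256 is my choice of hash, not something stated in the answer.

```python
import hashlib


class MediaStore:
    """Hypothetical content-addressed store: identical media is kept once."""

    def __init__(self):
        self._blobs = {}     # digest -> media bytes (the single stored copy)
        self._refcount = {}  # digest -> number of emails referencing it

    def put(self, data: bytes) -> str:
        """Store media (or reuse an existing copy) and return its reference."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self._blobs.get(digest)
        if existing is None:
            self._blobs[digest] = data
            self._refcount[digest] = 1
        elif existing == data:
            # Byte-for-byte check: never merge files that merely share a hash.
            self._refcount[digest] += 1
        else:
            raise ValueError("hash collision: refusing to merge different media")
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def release(self, digest: str) -> None:
        """Drop one reference, e.g. when an email is deleted."""
        self._refcount[digest] -= 1
        if self._refcount[digest] == 0:
            del self._blobs[digest]
            del self._refcount[digest]

    def stored_copies(self) -> int:
        return len(self._blobs)


store = MediaStore()
photo = b"same picture bytes"
refs = [store.put(photo) for _ in range(5)]  # five emails attach the same photo
print(store.stored_copies())  # 1
```

The byte comparison on a hash hit addresses the "don't remove different photos" concern. Note that this only catches identical files; resized or re-encoded copies of the same picture would need a perceptual-similarity module, which is not sketched here.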

Compress all emails of inactive users

  • Description: Our system can save storage by applying compression algorithms to the emails of users who haven't used Gmail for a specific period, for example 2 years. Compression can substantially reduce the space this data occupies.

  • Pros: Saves storage.

  • Cons: Compressed data takes longer to extract and query.
    => Mitigation: start decompressing a user's emails as soon as they return to Gmail after a long absence and have to verify their identity, so the work happens while they sign in.
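A minimal Python sketch of this idea, assuming a whole mailbox is packed into one compressed blob with the standard-library `zlib` module. The 730-day threshold and the names `is_inactive`, `compress_mailbox`, `decompress_mailbox`, and `on_login` are hypothetical; a real system would compress per message or per block so single emails can still be fetched.

```python
import json
import zlib
from datetime import datetime, timedelta

# Hypothetical inactivity threshold (the "2 years, for example" above).
INACTIVITY_THRESHOLD = timedelta(days=730)


def is_inactive(last_login: datetime, now: datetime,
                threshold: timedelta = INACTIVITY_THRESHOLD) -> bool:
    """True if the user has not signed in for at least the threshold."""
    return now - last_login >= threshold


def compress_mailbox(messages: list[str]) -> bytes:
    """Serialize all messages into one blob and compress it at max level."""
    raw = json.dumps(messages).encode("utf-8")
    return zlib.compress(raw, 9)


def decompress_mailbox(blob: bytes) -> list[str]:
    """Restore the original list of messages."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def on_login(user_verified: bool, blob: bytes) -> list[str]:
    """Decompress only after the returning user passes verification."""
    if not user_verified:
        raise PermissionError("user must verify their identity first")
    return decompress_mailbox(blob)


messages = ["Hello, your order has shipped."] * 100
blob = compress_mailbox(messages)
print(len(json.dumps(messages).encode("utf-8")), len(blob))
```

Repetitive text like this compresses very well; already-compressed media such as JPEG photos or MP4 videos would shrink much less, so the savings depend heavily on what the inactive mailboxes contain.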

From my point of view, we can combine all three methods to reduce redundant data in Gmail storage.

The first solution we should apply is limiting the storage time of spam, because its drawback is minor. We can test it with soft deletion and watch customer feedback; if anything goes wrong, we can restore the emails we deleted.
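The soft-deletion test can be sketched as follows: deleting only marks an email, a restore undoes the mark, and a later purge permanently removes emails whose grace period has passed. `SoftDeleteStore`, its methods, and the 30-day grace period are hypothetical choices for illustration, not anything Gmail is known to use for this purpose.

```python
from datetime import datetime, timedelta

# Hypothetical grace period during which soft-deleted mail can be restored.
GRACE_PERIOD = timedelta(days=30)


class SoftDeleteStore:
    """Hypothetical store: deletion only marks an email; purging happens later."""

    def __init__(self):
        self.emails = {}      # email_id -> content
        self.deleted_at = {}  # email_id -> time it was soft-deleted

    def soft_delete(self, email_id: str, now: datetime) -> None:
        if email_id in self.emails:
            self.deleted_at[email_id] = now

    def restore(self, email_id: str) -> None:
        # Undo a deletion, e.g. after negative customer feedback.
        self.deleted_at.pop(email_id, None)

    def is_visible(self, email_id: str) -> bool:
        return email_id in self.emails and email_id not in self.deleted_at

    def purge(self, now: datetime, grace: timedelta = GRACE_PERIOD) -> list[str]:
        """Permanently remove emails soft-deleted longer ago than the grace period."""
        expired = [eid for eid, t in self.deleted_at.items() if now - t > grace]
        for eid in expired:
            del self.emails[eid]
            del self.deleted_at[eid]
        return expired


store = SoftDeleteStore()
store.emails = {"s1": "spam 1", "s2": "spam 2"}
t0 = datetime(2024, 1, 1)
store.soft_delete("s1", t0)
store.soft_delete("s2", t0)
store.restore("s2")  # a user reported that s2 was actually important
print(store.purge(t0 + timedelta(days=31)))  # ['s1']
```

Storage is only reclaimed at purge time, which is the price paid for being able to reverse mistakes during the test.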

Next, handling overlapping media would help Google save a lot of space when storing email data. However, we must make sure that our duplicate-detection module performs correctly.

Finally, for compressing the emails of inactive users, we can run an A/B test to observe how these users behave before applying it at large scale.
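An A/B test needs a stable way to split inactive users into a treatment group (mail compressed) and a control group (mail untouched). A common approach, sketched here under my own assumptions, is to hash the user ID together with an experiment name; `assign_group`, the experiment name, and the 10% treatment share are hypothetical.

```python
import hashlib


def assign_group(user_id: str, experiment: str, treatment_share: float = 0.1) -> str:
    """Deterministically bucket a user into 'treatment' or 'control'.

    Hashing (experiment, user_id) gives each user a stable bucket, so the
    same user always lands in the same group for a given experiment.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    # First 8 bytes of the digest as an integer, roughly uniform in [0, 2**64).
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return "treatment" if bucket < treatment_share * 2**64 else "control"


groups = [assign_group(f"user{i}", "compress-inactive") for i in range(10_000)]
print(groups.count("treatment") / len(groups))  # close to 0.1
```

We could then compare the two groups on metrics such as login latency or how often returning users abandon their session.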