Using Google Cloud Storage in your Django Project

Danny Brown, Developer

Article Categories: #Code, #Back-end Engineering

A few tips on some simple and not-so-simple ways to use Google Cloud Storage in a Django Project

Earlier this year, we were tasked with implementing Google Cloud Storage (GCS) on a Django project. GCS would be used to store uploaded images and static assets.

We also had an additional complication: we needed two separate storage buckets, one for public assets, and one for private ones. For context, a storage bucket in GCS is just the container for the data you are saving there (read more about buckets).

Let's take a look at a way to set up GCS in a Django project, along with a way to go about implementing multiple bucket storage.

My use case

I needed two buckets: one public, one private. The public bucket would be used for most things in the app: static images, uploaded files, etc. The private bucket would store the files that drive some data visualizations. These are large (1-2 GB) CSV files, not something users would need to see or download.

The way I solved this is with django-storages. This is a package meant to handle cloud storage for a variety of cloud providers, like Google, Amazon, and Microsoft. We'll be looking at some GCS-specific scenarios, but the ideas are fairly translatable between those three cloud providers.

Basic GCS Django setup

Before tackling multiple buckets, here is how you could set up baseline GCS storage with django-storages.

  • Install the package.
pip install "django-storages[google]"
  • Add the necessary and/or helpful settings to your settings file (YMMV with what settings you need).
from datetime import timedelta

from google.oauth2 import service_account

# ...

GS_BUCKET_NAME = "YOUR BUCKET NAME"

DEFAULT_FILE_STORAGE = "storages.backends.gcloud.GoogleCloudStorage"

MEDIA_URL = "URL.to.GCS"

GS_CREDENTIALS = service_account.Credentials.from_service_account_file(
    "path/to/credentials.json"
)
GS_EXPIRATION = timedelta(minutes=5)  # signed URLs expire after five minutes

GS_BLOB_CHUNK_SIZE = 1024 * 256 * 40  # Needed for uploading large streams, entirely optional otherwise
  • To break down these settings
    • GS_BUCKET_NAME is the name of your primary bucket in GCS.
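    • DEFAULT_FILE_STORAGE is the storage class Django uses by default for uploaded files, i.e. any FileField or ImageField that does not specify its own storage. Static files are configured separately through STATICFILES_STORAGE.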
    • MEDIA_URL allows Django to know where to look for stored files.
    • GS_CREDENTIALS holds the credentials object that django-storages uses to authenticate, loaded here from the credentials.json service account key file that you get from GCS.
    • GS_EXPIRATION is how long a generated signed URL stays valid. The default is a day; however, we were tasked with shortening it to five minutes, so that URLs to uploaded PDFs or similar could not be passed around to anyone.
      • This setting only matters for non-public buckets that serve files through signed URLs. I mentioned that we had a "public" and a "private" bucket, but in terms of GCS settings, neither bucket is actually public. Only users who are authenticated in our system can generate a signed URL to view files in the "public" bucket, whereas no non-admin user can access files in the truly private bucket at all.
    • GS_BLOB_CHUNK_SIZE is needed when uploading large files. See the docs for both django-storages and GCS for more information on chunk size.
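As a quick sanity check on those last two values, here is a minimal sketch in plain Python. The check_gcs_settings helper and the GCS_CHUNK_UNIT name are hypothetical and not part of django-storages; the sketch assumes the documented rule that the chunk size must be a multiple of 256 KB (1024 * 256).

```python
from datetime import timedelta

# Assumption: resumable-upload chunk sizes must be a multiple of 256 KB.
GCS_CHUNK_UNIT = 1024 * 256


def check_gcs_settings(chunk_size, expiration):
    """Hypothetical helper: return a list of problems (empty means OK)."""
    problems = []
    if chunk_size <= 0 or chunk_size % GCS_CHUNK_UNIT != 0:
        problems.append(
            f"chunk size {chunk_size} is not a positive multiple of {GCS_CHUNK_UNIT}"
        )
    if expiration <= timedelta(0):
        problems.append("expiration must be a positive duration")
    return problems


chunk_size = 1024 * 256 * 40  # same value as GS_BLOB_CHUNK_SIZE above
print(chunk_size // (1024 * 1024), "MiB")  # 40 chunks of 256 KB = 10 MiB
print(check_gcs_settings(chunk_size, timedelta(minutes=5)))  # []
```

In other words, the value used in the settings works out to 10 MiB per chunk, and the five-minute expiration is just a positive timedelta.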

Once the package is installed and these settings are in place, you should be ready to use GCS.

# the following code is from the django-storages docs
>>> from django.db import models
>>> class Resume(models.Model):
...     pdf = models.FileField(upload_to='pdfs')
...     photos = models.ImageField(upload_to='photos')
...
>>> resume = Resume()
>>> print(resume.pdf.storage)
<storages.backends.gcloud.GoogleCloudStorage object at ...>
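A note if you are on a newer Django: version 4.2 introduced the STORAGES setting and deprecated DEFAULT_FILE_STORAGE, which was removed in 5.1. Below is a sketch of what I'd expect the equivalent configuration to look like, assuming a django-storages release that accepts these lowercase option names under OPTIONS. Check the docs for your versions before copying it, and note that credentials are left out here.

```python
from datetime import timedelta

# Sketch of a Django 4.2+ equivalent of the DEFAULT_FILE_STORAGE-based settings.
STORAGES = {
    "default": {
        "BACKEND": "storages.backends.gcloud.GoogleCloudStorage",
        "OPTIONS": {
            "bucket_name": "YOUR BUCKET NAME",
            "expiration": timedelta(minutes=5),
            "blob_chunk_size": 1024 * 256 * 40,
        },
    },
    # Defining STORAGES replaces Django's default dict wholesale,
    # so a "staticfiles" entry should be kept as well.
    "staticfiles": {
        "BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage",
    },
}
```

The "staticfiles" backend shown is Django's own default; swap it for the GCS backend if you also want collectstatic to push static assets to a bucket.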

Back to the problem at hand: multiple buckets

Routing different fields to different buckets is not something django-storages handles out of the box through settings alone. To give a little more context, we needed the FileField on a single model to store its files in a different bucket. Every other FileField instance should go to the default bucket.

  • Add another setting to your settings file.
PRIVATE_GS_BUCKET_NAME = "other-bucket-name"
  • Create a class that subclasses the django-storages GCS storage class.
from django.utils.deconstruct import deconstructible
from django.conf import settings
from storages.backends.gcloud import GoogleCloudStorage
    
@deconstructible
class PrivateGCSMediaStorage(GoogleCloudStorage):
    def __init__(self, *args, **kwargs):
        kwargs["bucket_name"] = getattr(settings, "PRIVATE_GS_BUCKET_NAME")
        super(PrivateGCSMediaStorage, self).__init__(*args, **kwargs)
  • Go to your model file, and add a new argument to the FileField constructor.
from django.db import models

from myfile import PrivateGCSMediaStorage

class Upload(models.Model):
    csv_file = models.FileField(
        storage=PrivateGCSMediaStorage,
        # any other settings...
    )
  • Then, just run the commands to make and apply migrations and you should be set!
python manage.py makemigrations
python manage.py migrate
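If you want to see the mechanics of that subclass in isolation, here is a minimal sketch that runs without Django or GCS. FakeCloudStorage and FakePrivateStorage are hypothetical stand-ins for GoogleCloudStorage and PrivateGCSMediaStorage, and the class attributes play the role of the two settings. It only illustrates the keyword-argument override, not the real storage backend.

```python
class FakeCloudStorage:
    """Stand-in for GoogleCloudStorage (hypothetical, for illustration only)."""

    DEFAULT_BUCKET = "default-bucket"  # plays the role of GS_BUCKET_NAME

    def __init__(self, **kwargs):
        # Fall back to the default bucket when no bucket_name is given.
        self.bucket_name = kwargs.get("bucket_name", self.DEFAULT_BUCKET)


class FakePrivateStorage(FakeCloudStorage):
    """Stand-in for PrivateGCSMediaStorage."""

    PRIVATE_BUCKET = "other-bucket-name"  # plays the role of PRIVATE_GS_BUCKET_NAME

    def __init__(self, **kwargs):
        # Force the private bucket, even if the caller passed something else.
        kwargs["bucket_name"] = self.PRIVATE_BUCKET
        super().__init__(**kwargs)


print(FakeCloudStorage().bucket_name)    # default-bucket
print(FakePrivateStorage().bucket_name)  # other-bucket-name
print(FakePrivateStorage(bucket_name="ignored").bucket_name)  # other-bucket-name
```

The last line shows the design choice: the subclass always wins, so nothing that instantiates it can accidentally point it back at the default bucket.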

Breaking it down

  • But hold on, what did that all accomplish? Let's break down the custom class.
@deconstructible
  • This line is a decorator that adds a deconstruct method to the class, allowing it to be serialized and used in Django migrations. Read more about this process.
class PrivateGCSMediaStorage(GoogleCloudStorage):
  • We are subclassing the storage class provided to us by django-storages.
def __init__(self, *args, **kwargs):
    kwargs["bucket_name"] = getattr(settings, "PRIVATE_GS_BUCKET_NAME")
    super(PrivateGCSMediaStorage, self).__init__(*args, **kwargs)
  • This is overriding GoogleCloudStorage's initialization method. All we are doing is forcing the "bucket_name" keyword argument to our private bucket's name. That way, when a file is stored through this storage class, it will be stored in that separate bucket.

    • Note that, when no bucket name is passed in, GoogleCloudStorage gets its bucket name from the required django-storages settings variable, GS_BUCKET_NAME.
  • Then, we can pass the class itself to the storage argument on the FileField. The docs tell us that storage accepts either a storage object or (since Django 3.1) a callable that returns a storage object, and a class is such a callable. This will be very useful if, locally, you don't want files split across two separate buckets and would rather store everything the default way, all in one bucket.

Here is how you'd go about that

  • Set a flag in your settings
USE_PRIVATE_STORAGE = False # will be set to True in production or wherever
  • Then, add this function under your PrivateGCSMediaStorage class (or wherever you want).
from django.core.files.storage import default_storage

def select_storage():
    return PrivateGCSMediaStorage() if settings.USE_PRIVATE_STORAGE else default_storage
  • This way, the model Upload will use default_storage if that setting is set to False. default_storage is what FileField defaults to when you do not provide a storage keyword argument, so all models will use the same storage type.
  • Change your FileField like so
class Upload(models.Model):
    csv_file = models.FileField(
        storage=select_storage,
        # any other settings...
    )
  • Rerun migrations just like above and you will see something similar to this in the generated migration.
operations = [
    migrations.AlterField(
        model_name='upload',
        name='csv_file',
        field=models.FileField(storage=your.project.path.select_storage),
    ),
]
  • With USE_PRIVATE_STORAGE set to True, every file uploaded through the Upload model is stored in our secondary GCS bucket, while every other file field goes to the default GCS bucket.
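To make the callable behavior concrete, here is a rough sketch in plain Python. Everything in it is a hypothetical stand-in: Settings, DefaultStorage, PrivateStorage, and SketchFileField only approximate, as I understand it, how FileField resolves a callable storage when the field is created while keeping a reference to the callable for migrations.

```python
class Settings:
    USE_PRIVATE_STORAGE = False  # stand-in for the flag in settings.py


settings = Settings()


class DefaultStorage:
    bucket_name = "default-bucket"


class PrivateStorage:
    bucket_name = "other-bucket-name"


default_storage = DefaultStorage()


def select_storage():
    return PrivateStorage() if settings.USE_PRIVATE_STORAGE else default_storage


class SketchFileField:
    """Simplified stand-in for FileField's handling of the storage argument."""

    def __init__(self, storage):
        # Keep the callable so a migration can reference it by path,
        # then resolve it once to get the actual storage object.
        self._storage_callable = storage if callable(storage) else None
        self.storage = storage() if callable(storage) else storage

    def deconstruct_storage(self):
        # The migration records the callable, not whichever storage it returned.
        return self._storage_callable or self.storage


local_field = SketchFileField(storage=select_storage)
settings.USE_PRIVATE_STORAGE = True  # e.g. production settings
prod_field = SketchFileField(storage=select_storage)

print(local_field.storage.bucket_name)  # default-bucket
print(prod_field.storage.bucket_name)   # other-bucket-name
print(prod_field.deconstruct_storage() is select_storage)  # True
```

This is why the generated migration points at select_storage itself: the same migration file works in every environment, and the flag decides which bucket is used when the app loads.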

That's it!

This could naturally be extended to allow for any number of additional buckets. The flexibility of having multiple buckets with differing levels of security can be incredibly helpful for hiding certain information from users at the cloud-storage level. Additionally, a similar approach is possible with the AWS and Microsoft Azure implementations in django-storages, where you can have multiple S3 buckets or Azure Blobs with similar security constraints. Best of luck with your Django projects!

Danny Brown

Danny is a developer in the Falls Church, VA, office. He loves learning new technology and finding the right tool for each job.
