Using Google Cloud Storage in your Django Project
A few tips on some simple and not-so-simple ways to use Google Cloud Storage in a Django Project
We also had an additional complication: we needed two separate storage buckets, one for public assets, and one for private ones. For context, a storage bucket in GCS is just the container for the data you are saving there (read more about buckets).
Let's take a look at a way to set up GCS in a Django project, along with a way to go about implementing multiple bucket storage.
My use case #
I needed two buckets: one public, one private. The public bucket would be used for most things in the app: static images, uploaded files, etc. The private bucket would be used to store files that drive some data visualizations. These are large (1-2 GB) CSV files, not something users would need to see or download.
The way I solved this is with django-storages. This is a package meant to handle cloud storage for a variety of cloud providers, like Google, Amazon, and Microsoft. We'll be looking at some GCS-specific scenarios, but the ideas are fairly translatable between those three cloud providers.
Basic GCS Django setup #
Before tackling multiple buckets, here is how you could set up baseline GCS storage with django-storages.
- Install the package.
pip install django-storages[google]
- Add the necessary and/or helpful settings to your settings file (YMMV with what settings you need).
    from datetime import timedelta

    from google.oauth2 import service_account

    # ...

    GS_BUCKET_NAME = "YOUR BUCKET NAME"
    DEFAULT_FILE_STORAGE = "storages.backends.gcloud.GoogleCloudStorage"
    MEDIA_URL = "URL.to.GCS"
    GS_CREDENTIALS = service_account.Credentials.from_service_account_file(
        "path/to/credentials.json"
    )
    GS_EXPIRATION = timedelta(minutes=5)
    GS_BLOB_CHUNK_SIZE = 1024 * 256 * 40  # Needed for uploading large streams, entirely optional otherwise
- To break down these settings:
GS_BUCKET_NAME is the name of your primary bucket in GCS.
DEFAULT_FILE_STORAGE is the storage class Django uses by default when storing uploaded files.
MEDIA_URL allows Django to know where to look for stored files.
GS_CREDENTIALS is a variable for django-storages to allow it to access your credentials.json file that you get from GCS.
GS_EXPIRATION is a time value for how long a generated URL stays valid. The default is one day; however, we were tasked with shortening it to five minutes, so that URLs to uploaded PDFs or similar could not be sent around to anyone.
- This setting is needed only for non-public buckets in GCS that serve files through signed URLs. I mentioned that we had a "public" and a "private" bucket, but in terms of GCS settings, neither bucket is actually public. That way, only users who are authenticated in our system can generate a signed URL to see uploaded files. This contrasts with our truly "private" bucket, where no non-admin user can access files at all.
GS_BLOB_CHUNK_SIZE is needed when uploading large files. See the docs for both django-storages and GCS for more information on chunk size.
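As a quick sanity check on the chunk size, here is a small sketch of the arithmetic behind 1024 * 256 * 40. GCS resumable uploads expect chunk sizes in multiples of 256 KiB. The helper names is_valid_chunk_size and chunk_size_for are made up for illustration and are not part of django-storages.

```python
# GCS resumable uploads expect chunk sizes in multiples of 256 KiB.
CHUNK_UNIT = 256 * 1024  # 262,144 bytes


def is_valid_chunk_size(size: int) -> bool:
    """Return True if size is a positive multiple of 256 KiB."""
    return size > 0 and size % CHUNK_UNIT == 0


def chunk_size_for(mebibytes: int) -> int:
    """Hypothetical helper: express a chunk size in whole MiB."""
    return mebibytes * 1024 * 1024


# The value used in the settings above: 40 units of 256 KiB, i.e. 10 MiB.
GS_BLOB_CHUNK_SIZE = 1024 * 256 * 40

print(GS_BLOB_CHUNK_SIZE)                        # 10485760
print(GS_BLOB_CHUNK_SIZE == chunk_size_for(10))  # True
print(is_valid_chunk_size(GS_BLOB_CHUNK_SIZE))   # True
print(is_valid_chunk_size(1_000_000))            # False
```

Larger chunks mean fewer requests per upload at the cost of more memory per request, so the right value depends on your files and your hosting.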
Once the package is installed and these settings are in place, you should be ready to use GCS.
    # the following code is from the django-storages docs
    >>> from django.db import models
    >>> class Resume(models.Model):
    ...     pdf = models.FileField(upload_to='pdfs')
    ...     photos = models.ImageField(upload_to='photos')
    ...
    >>> resume = Resume()
    >>> print(resume.pdf.storage)
    <storages.backends.gcloud.GoogleCloudStorage object at ...>
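One caveat if you are on a newer Django: the DEFAULT_FILE_STORAGE setting was deprecated in Django 4.2 in favor of the STORAGES dictionary, and removed in Django 5.1. The fragment below is a rough sketch of what the equivalent default configuration might look like there. The bucket name is the same placeholder as above, and the "staticfiles" entry simply restates Django's stock backend, so adjust it to whatever your project actually uses.

```python
# settings.py fragment for Django 4.2+ (a sketch, not a drop-in file)
STORAGES = {
    # Used by FileField/ImageField when no storage argument is given.
    "default": {
        "BACKEND": "storages.backends.gcloud.GoogleCloudStorage",
        "OPTIONS": {
            "bucket_name": "YOUR BUCKET NAME",  # placeholder, as above
        },
    },
    # Django's stock static files backend; defining STORAGES replaces the
    # whole default dictionary, so this entry is restated explicitly.
    "staticfiles": {
        "BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage",
    },
}
```

The other GS_* settings (credentials, expiration, chunk size) can stay as module-level settings alongside this dictionary.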
Back to the problem at hand: multiple buckets #
Support for multiple buckets is not something that is necessarily built into django-storages. To give a little more context, we needed a FileField on only one model to go to a different bucket. Every other FileField instance should go to the default bucket.
- Add another setting to your settings file.
PRIVATE_GS_BUCKET_NAME = "other-bucket-name"
- Create a class that subclasses GoogleCloudStorage.
    from django.conf import settings
    from django.utils.deconstruct import deconstructible
    from storages.backends.gcloud import GoogleCloudStorage


    @deconstructible
    class PrivateGCSMediaStorage(GoogleCloudStorage):
        def __init__(self, *args, **kwargs):
            kwargs["bucket_name"] = getattr(settings, "PRIVATE_GS_BUCKET_NAME")
            super(PrivateGCSMediaStorage, self).__init__(*args, **kwargs)
- Go to your model file, and add a new argument to the FileField.
    from django.db import models

    from myfile import PrivateGCSMediaStorage


    class Upload(models.Model):
        csv_file = models.FileField(
            storage=PrivateGCSMediaStorage,
            # any other settings...
        )
- Then, just run the commands to make and apply migrations and you should be set!
    python manage.py makemigrations
    python manage.py migrate
Breaking it down #
- But hold on, what did that all accomplish? Let's break down the custom class.
- @deconstructible is a decorator that adds a deconstruct method to the class, allowing it to be serialized and used in Django migrations. Read more about this process.
- We are subclassing GoogleCloudStorage, the class provided to us by django-storages.
    def __init__(self, *args, **kwargs):
        kwargs["bucket_name"] = getattr(settings, "PRIVATE_GS_BUCKET_NAME")
        super(PrivateGCSMediaStorage, self).__init__(*args, **kwargs)
This overrides GoogleCloudStorage's initialization method. All we are doing is providing it a custom "bucket_name" keyword argument, which we set to our private bucket's name. That way, when a file is stored through this storage class, it will be stored in that separate bucket.
- Note that GoogleCloudStorage otherwise gets its bucket name from the required GS_BUCKET_NAME setting.
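To see the override pattern in isolation, here is a minimal sketch that runs without Django or django-storages. FakeSettings and FakeCloudStorage are hypothetical stand-ins that only mimic the one behavior that matters here (a keyword argument taking precedence over the settings value); they are not the real classes.

```python
class FakeSettings:
    """Stand-in for django.conf.settings (hypothetical values)."""
    GS_BUCKET_NAME = "public-bucket"
    PRIVATE_GS_BUCKET_NAME = "other-bucket-name"


settings = FakeSettings()


class FakeCloudStorage:
    """Stand-in for GoogleCloudStorage: keyword arguments win over settings."""

    def __init__(self, **kwargs):
        self.bucket_name = kwargs.get("bucket_name", settings.GS_BUCKET_NAME)


class PrivateStorage(FakeCloudStorage):
    def __init__(self, *args, **kwargs):
        # Force the bucket before the parent class reads its configuration.
        kwargs["bucket_name"] = settings.PRIVATE_GS_BUCKET_NAME
        super().__init__(*args, **kwargs)


print(FakeCloudStorage().bucket_name)                     # public-bucket
print(PrivateStorage().bucket_name)                       # other-bucket-name
print(PrivateStorage(bucket_name="ignored").bucket_name)  # other-bucket-name
```

Because the subclass overwrites the keyword argument unconditionally, a caller cannot accidentally point it back at the default bucket.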
Then, we can pass the class reference itself to the storage argument on the FileField. The docs on that tell us that we can use a storage object, or a callable which returns a storage object. This will be very useful if, locally, you don't want files to be stored in two separate buckets and would rather have them stored the default way, all in one bucket.
Here is how you'd go about that #
- Set a flag in your settings
USE_PRIVATE_STORAGE = False # will be set to True in production or wherever
- Then, add this function under your PrivateGCSMediaStorage class (or wherever you want).
    from django.core.files.storage import default_storage


    def select_storage():
        return PrivateGCSMediaStorage() if settings.USE_PRIVATE_STORAGE else default_storage
- This way, the model will use default_storage if that setting is set to False. default_storage is what a FileField defaults to when you do not provide a storage keyword argument, so all models will use the same storage type.
- Change your model's FileField to use the new function.
    class Upload(models.Model):
        csv_file = models.FileField(
            storage=select_storage,
            # any other settings...
        )
- Rerun migrations just like above and you will see something similar to this in the generated migration.
    operations = [
        migrations.AlterField(
            model_name='upload',
            name='csv_file',
            field=models.FileField(storage=your.project.path.select_storage),
        ),
    ]
- This therefore allows all files uploaded to the Upload model to be stored in our secondary GCS bucket, while every other file field will go to the default GCS bucket.
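The callable-based selection can also be sketched without Django to show how the flag drives the choice. Everything here is a stand-in: Storage, FakeFileField, and the bucket names are hypothetical, and FakeFileField only imitates the way a field accepts either a storage object or a callable returning one (real FileField supports callables from Django 3.1 onward).

```python
from types import SimpleNamespace

# Stand-in for django.conf.settings; False mirrors the local default above.
settings = SimpleNamespace(USE_PRIVATE_STORAGE=False)


class Storage:
    """Hypothetical storage object that only remembers its bucket."""

    def __init__(self, bucket_name):
        self.bucket_name = bucket_name


default_storage = Storage("default-bucket")


def select_storage():
    # Same shape as the function above, using the stand-in classes.
    return Storage("private-bucket") if settings.USE_PRIVATE_STORAGE else default_storage


class FakeFileField:
    """Stand-in for FileField: accepts a storage instance or a callable."""

    def __init__(self, storage=None):
        if callable(storage):
            storage = storage()  # the choice is made when the field is built
        self.storage = storage or default_storage


local_field = FakeFileField(storage=select_storage)

settings.USE_PRIVATE_STORAGE = True  # e.g. in production
prod_field = FakeFileField(storage=select_storage)

print(local_field.storage.bucket_name)  # default-bucket
print(prod_field.storage.bucket_name)   # private-bucket
```

Since the migration only records a reference to the function, flipping the flag between environments should not require a new migration.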
That's it! #
This could naturally be extended to allow for any number of additional buckets. The flexibility of allowing for multiple buckets with differing levels of security could be incredibly helpful for hiding certain information from users at the cloud-storage level. Additionally, a similar version of this is possible with the AWS and Microsoft Azure implementations of django-storages, where you can have multiple S3 buckets or Azure Blob Storage containers with similar security constraints. Best of luck with your Django projects!