Make Remote Files Local with Ruby Tempfile

Lawson Kurtz, Former Senior Developer

Article Category: #Code


We live in the age of remote resources. It's pretty rare to store uploaded files on the same machine as your server process. File storage these days is almost completely remote, and for very good reasons.

Using file storage services like S3 is awesome, but not having your files accessible locally can complicate file-oriented operations. In these cases, use Ruby's Tempfile to create local files that live just long enough to satisfy your processing needs.

Anatomy of Tempfile#new

require 'tempfile'

# What you want the tempfile filename to start with
name_start = 'my_special_file'

# What you want the tempfile filename to end with (probably including an extension).
# By default, tempfile names carry no extension.
name_end = '.gif'

# Where you want the tempfile to live
location = '/path/to/some/dir'

# Options for the tempfile (e.g. which encoding to use)
options = { encoding: Encoding::UTF_8 }

# Will create a tempfile
# at /path/to/some/dir/my_special_file20140224-1234-abcd123.gif
# (where '20140224-1234-abcd123' is the date, the process ID, and a random token)
# with a UTF-8 encoding
# with the contents 'Hello, tempfile!'
#
# Tempfile.new ignores a block, so we use Tempfile.open here. It passes the same
# arguments to Tempfile.new, yields the file, and closes (but does not delete)
# it when the block returns. On Ruby 3+, options must be passed as keywords,
# hence the double splat.
Tempfile.open([name_start, name_end], location, **options) do |file|
  file.write('Hello, tempfile!')
end
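To see the naming scheme in action, here is a minimal sketch that creates a tempfile and inspects its generated name. It assumes the system temp directory (Dir.tmpdir) stands in for '/path/to/some/dir'. The variable names generated_name and generated_ext are just for illustration.

```ruby
require 'tempfile'
require 'tmpdir'

# Dir.tmpdir is an assumption for this sketch; any writable directory works.
file = Tempfile.new(['my_special_file', '.gif'], Dir.tmpdir)

# Capture the details before unlinking, because #path returns nil afterwards.
generated_name = File.basename(file.path)
generated_ext  = File.extname(file.path)

puts generated_name # starts with the prefix and ends with the extension
puts generated_ext

file.close
file.unlink
```

The unique part in the middle changes on every run, so only the prefix and extension are predictable.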

Example Application

URL to Tempfile: Remote File Processing

We have a service that takes a URL and processes the file it represents using a Java command-line utility. Our command-line utility expects a filepath argument, so we must create a local file from the remote resource before processing.

require 'open-uri'
require 'pathname'
require 'tempfile'

class LocalResource
  attr_reader :uri

  def initialize(uri)
    @uri = uri
  end

  def file
    @file ||= Tempfile.new(tmp_filename, tmp_folder, encoding: encoding).tap do |f|
      io.rewind
      f.write(io.read)
      f.close
    end
  end

  def io
    @io ||= uri.open
  end

  def encoding
    io.rewind
    io.read.encoding
  end

  def tmp_filename
    path = Pathname.new(uri.path)
    # Tempfile expects strings: [name without its extension, extension]
    [path.basename(path.extname).to_s, path.extname]
  end

  def tmp_folder
    # If we're using Rails:
    Rails.root.join('tmp').to_s
    # Otherwise:
    # '/wherever/you/want'
  end
end

def local_resource_from_url(url)
  LocalResource.new(URI.parse(url))
end

# URL is provided as input
url = 'https://s3.amazonaws.com/your-bucket/file.gif'

begin
  # We create a local representation of the remote resource
  local_resource = local_resource_from_url(url)

  # We have a copy of the remote file for processing
  local_copy_of_remote_file = local_resource.file

  # Do your processing with the local file
  `some-command-line-utility #{local_copy_of_remote_file.path}`
ensure
  # It's a good idea to explicitly close and delete your tempfiles.
  # (&. guards against a failure before the tempfile was created.)
  local_copy_of_remote_file&.close
  local_copy_of_remote_file&.unlink
end
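The class above reads the whole remote file into memory before writing it out. For large files, one alternative is to stream the bytes with IO.copy_stream. The sketch below is an assumption-laden variation, not part of the original service: copy_to_tempfile is a hypothetical helper, and a StringIO stands in for the remote IO so the example runs without network access.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper: copy any readable IO into a closed tempfile
# without building one big string in memory first.
def copy_to_tempfile(source_io, name_parts)
  tempfile = Tempfile.new(name_parts)
  tempfile.binmode                    # treat the payload as raw bytes
  source_io.rewind                    # start from the beginning of the source
  IO.copy_stream(source_io, tempfile) # stream in chunks
  tempfile.close                      # flush to disk so other tools can read it
  tempfile
end

# A StringIO stands in for the object returned by uri.open.
source = StringIO.new('pretend these are GIF bytes')
copy = copy_to_tempfile(source, ['file', '.gif'])

begin
  copied_contents = File.binread(copy.path)
  puts copied_contents
ensure
  copy.unlink
end
```

The tempfile is closed before its path is handed to anything else, which mirrors the f.close call in LocalResource#file.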

Tempfiles vs Files

Ruby Tempfile objects act almost identically to regular File objects, but have a couple of advantages for transient processing or uploading tasks:

  • Tempfiles' filenames are unique, so you can put them in a shared tmp directory without worrying about name collision.
  • Tempfiles' files are deleted when the Tempfile object is garbage collected. This prevents a bunch of extra files from accidentally accumulating on your machine. (But you should of course still explicitly close Tempfiles after working with them.)
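Both points can be checked with a short sketch. The 'report' prefix and the variable names are made up for illustration. The last part uses Tempfile.create, which, when given a block, yields a plain File and removes it once the block returns.

```ruby
require 'tempfile'

# Two tempfiles with the same prefix still get distinct paths.
first  = Tempfile.new('report')
second = Tempfile.new('report')
names_differ = first.path != second.path

# Explicit cleanup: close, then unlink (or close! to do both at once).
first_path = first.path
first.close
first.unlink
removed_after_unlink = !File.exist?(first_path)
second.close!

# Block form: the file is removed as soon as the block finishes.
created_path = nil
Tempfile.create('report') do |file|
  created_path = file.path
  file.write('scratch data')
end
removed_after_block = !File.exist?(created_path)

puts names_differ, removed_after_unlink, removed_after_block
```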

Common Snags

Rewind

Certain IO operations (like reading contents to determine an encoding) move the file pointer away from the start of the IO object. In these cases, you will run into trouble when you attempt to perform subsequent operations (like reading the contents to write to a tempfile). Move the pointer back to the beginning of the IO object using #rewind.

require 'stringio'

io_object = StringIO.new("I'm an IO!")
encoding = io_object.read.encoding

# The pointer is now at the end of 'io_object'.
# When we read it again, the return is an empty string.
io_object.read
# => ""

# But if we rewind first, we can then read the contents.
io_object.rewind
io_object.read
# => "I'm an IO!"
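The same snag applies to the tempfile itself: after writing, the pointer sits at the end of the file, so reading it back needs a rewind first. A minimal sketch, with a made-up 'rewind-demo' prefix:

```ruby
require 'tempfile'

file = Tempfile.new('rewind-demo')
begin
  file.write('Hello, tempfile!')

  # The pointer is at the end of the file, so there is nothing left to read.
  read_without_rewind = file.read

  # After rewinding, the full contents are available again.
  file.rewind
  read_after_rewind = file.read
ensure
  file.close
  file.unlink
end

puts read_without_rewind.inspect
puts read_after_rewind.inspect
```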

Encoding

Often you'll need to ensure the proper encoding of your tempfiles. You can provide your desired encoding during Tempfile initialization as demonstrated below.

encoding = Encoding::UTF_8

Tempfile.new('some-filename', '/some/tmp/dir', encoding: encoding).tap do |file|
  # Your code here...
end

Obviously your desired encoding won't always be the same for every file. You can find your desired encoding on the fly by sending #encoding to your file contents string. Or if you're using an IO object, you can call io_object.read.encoding.

encoding = file_contents_string.encoding
# or
# encoding = io_object.read.encoding

Tempfile.new('some-filename', '/some/tmp/dir', encoding: encoding).tap do |file|
  # Your code here...
end
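As a rough check that the encoding carries through, the sketch below writes a non-UTF-8 string to a tempfile created with that string's encoding and reads it back. The ISO-8859-1 sample string and the 'encoding-demo' prefix are assumptions chosen for illustration, and the temp directory argument is omitted so the system default is used.

```ruby
require 'tempfile'

# A sample string in an encoding other than the usual UTF-8 default.
file_contents_string = 'café'.encode(Encoding::ISO_8859_1)

file = Tempfile.new('encoding-demo', encoding: file_contents_string.encoding)
begin
  file.write(file_contents_string)
  file.rewind

  # Strings read from the tempfile are tagged with the encoding we asked for.
  read_back = file.read
ensure
  file.close
  file.unlink
end

puts read_back.encoding
puts read_back == file_contents_string
```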

Read more about Ruby encoding.

Extensions

By default, files created with Tempfile.new will not carry an extension. This can pose problems for applications or tools (like CarrierWave and soffice) that rely on a file's extension to perform their operations.

In these cases, you can pass an extension to the Tempfile initialization as demonstrated above in Anatomy of Tempfile#new.

# A quick refresher
Tempfile.new(['file_name_prefix', '.extension'], '/tmp')

If you need to dynamically determine your file's extension, you can usually grab it from the URL or file path you are reading into your Tempfile:

require 'pathname'
require 'uri'

uri = URI.parse('https://example.com/some/path/to/file.gif')
path = '/some/path/to/file.gif'

Pathname.new(uri.path).extname
# => '.gif'

Pathname.new(path).extname
# => '.gif'
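Putting the two pieces together, a small helper can build the [prefix, extension] pair that Tempfile.new expects from either a URL or a path. tempfile_name_parts is a hypothetical name, and the sketch assumes the input parses as a URI (a path containing spaces, for example, would need escaping first).

```ruby
require 'pathname'
require 'uri'

# Hypothetical helper: derive [prefix, extension] from a URL or file path.
def tempfile_name_parts(location)
  # URI#path drops the scheme, host, and query string, leaving the file path.
  path = Pathname.new(URI.parse(location).path)
  extension = path.extname
  # Tempfile.new wants plain strings, so convert the Pathname explicitly.
  [path.basename(extension).to_s, extension]
end

url_parts  = tempfile_name_parts('https://example.com/some/path/to/file.gif?version=2')
path_parts = tempfile_name_parts('/some/path/to/file.gif')

puts url_parts.inspect
puts path_parts.inspect
```

Stripping the extension from the prefix keeps the generated name from repeating it (file...gif rather than file.gif...gif).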

Local Development (Paths vs URLs)

Many developers use local file storage for their development environment. In these cases, local file paths often appear in methods that are expecting URLs. Not fun.

OpenURI to the Rescue

If you need to write code that supports reading files from both file paths and URLs, OpenURI is your saviour.

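OpenURI provides a file-like API for both local and remote resources. Older Rubies exposed it through the Kernel function open, but since Ruby 3.0 open no longer handles URLs, so call URI.open instead. URI.open falls back to Kernel#open for anything that doesn't look like a URL, so local paths still work.

require 'open-uri'

URI.open('/path/to/your/local/file.gif') do |file|
  # Your code here...
end

URI.open('https://s3.amazonaws.com/your-bucket/file.gif') do |file|
  # Your code here...
end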
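A quick way to see the local-path half of this without touching the network is to point URI.open (open-uri's explicit entry point, available since Ruby 2.5) at a throwaway file. The file name and contents below are made up for the sketch.

```ruby
require 'open-uri'
require 'tempfile'

# Create a throwaway local file to stand in for development storage.
sample = Tempfile.new(['sample', '.txt'])
sample.write('local contents')
sample.close

# A plain path does not look like a URL, so URI.open opens it as a regular file.
local_contents = URI.open(sample.path) { |file| file.read }
puts local_contents

sample.unlink
```

Swapping sample.path for an https:// URL would go through the same call, which is what makes it handy for code that has to accept both.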

We like Ruby Tempfiles for performing file-oriented operations on remote resources. What do you use?

Thanks to Ryan Foster for his contributions to the sample code.
