The Imputation Service accepts input files stored either locally on the user's machine or in the cloud (specifically, Google Cloud Storage). If a pipeline accepts more than one input file, the files must be either all local or all cloud-based.
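
The all-local-or-all-cloud rule can be checked on your side before submitting. The following is a minimal sketch, assuming that a path counts as cloud-based when it starts with gs://; the function name classify_inputs is hypothetical and is not part of the Service.

```python
def classify_inputs(paths):
    """Return "cloud" or "local" when every path is of the same kind.

    Assumption: a path is cloud-based if and only if it starts with "gs://".
    Raises ValueError for an empty list or a mix of local and cloud paths.
    """
    if not paths:
        raise ValueError("no input files given")
    kinds = {"cloud" if p.startswith("gs://") else "local" for p in paths}
    if len(kinds) > 1:
        raise ValueError("inputs mix local and cloud paths; use one kind only")
    return kinds.pop()
```

Calling classify_inputs with a list that mixes a local path and a gs:// path raises ValueError, which flags the problem before any job is submitted.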

Cloud Input File Requirements

Users must grant the Service read access to their cloud-based input files. To do so, grant the following entities read access (the Storage Legacy Object Reader role is sufficient) on the bucket containing those files:

broad-data-science-services@firecloud.org - this grants the Service access to your data
your Terra proxy group, which looks like PROXY_{user-specific-id}@firecloud.org and can be found on your Terra profile - this allows the Service to confirm that you have access to the data you're submitting, which protects you against others attempting to gain access to your data

Before starting the job, the Service checks that both it and you have read access to each file or its parent bucket.

When submitting your job, the path provided for each file input should be the gsutil URI, that is, in the format "gs://bucket-name/optional/sub/path/file-name".
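
The URI format above can be validated locally before submission. The following is a minimal sketch; the helper name parse_gs_uri is hypothetical, and the pattern only checks the overall shape (a bucket name followed by a non-empty object path), not Google Cloud Storage's full bucket-naming rules.

```python
import re

# gs://, then a bucket name with no slashes, then a non-empty object path.
_GS_URI = re.compile(r"gs://([^/]+)/(.+)")


def parse_gs_uri(uri):
    """Split a gs:// file URI into (bucket_name, object_path).

    Raises ValueError if the string does not have the expected shape.
    """
    match = _GS_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a gs:// file URI: {uri!r}")
    return match.group(1), match.group(2)
```

For example, parse_gs_uri("gs://bucket-name/optional/sub/path/file-name") returns the bucket "bucket-name" and the object path "optional/sub/path/file-name", while a bare bucket such as "gs://bucket-name" is rejected because it does not name a file.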

Helpful hints

The read access described above can be revoked after the job is complete. The access must persist for the duration of the job, because the Service does not copy your files into its own storage but reads them directly from your bucket.
Because the Service reads data out of your bucket, you will incur network data transfer fees if your bucket is not located in us-central1. To minimize these fees, move your data to a bucket in us-central1.
If your input data is in a Terra workspace bucket, you only need to grant broad-data-science-services@firecloud.org read access to your workspace; you can skip the proxy group step.
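
The location hint above can be turned into a quick check. The following is a minimal sketch, assuming you already have the bucket's location string (for instance from the bucket metadata; the google-cloud-storage Python client exposes it as the location attribute of a Bucket). The function name is hypothetical, and the comparison is case-insensitive because Cloud Storage typically reports locations in upper case.

```python
def is_outside_us_central1(bucket_location):
    """Return True if the given bucket location is anything other than us-central1.

    Assumption: bucket_location is the location string from the bucket's
    metadata, e.g. "US-CENTRAL1", "US", or "EUROPE-WEST1".
    """
    return bucket_location.strip().lower() != "us-central1"
```

A result of True means the hint about network data transfer fees applies to that bucket.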
