Because Coiled runs in the cloud, the easiest way to get large datasets into Coiled is to store your data in the cloud, for example in S3 on AWS or Google Cloud Storage (GCS) on GCP. If you run your Coiled clusters in your own cloud account, you can also minimize data transfer costs, typically by keeping the data and the cluster in the same region.
NOTE: Don't forget to add the appropriate Python storage library to your environment: s3fs for S3, gcsfs for GCS, etc.
When you run computations on Dask clusters managed by Coiled, you can access many different file formats using the typical approaches used by Dask, Python, and related libraries.
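As a rough illustration of the points above, the sketch below shows one way to check which storage library a given URL needs and then load a Parquet dataset lazily with Dask. The helper names (`required_storage_library`, `read_parquet_from_cloud`) and the bucket path are hypothetical and not part of Coiled or Dask. The mapping covers only the two libraries named in the note, and the sketch assumes `dask` plus the matching storage library are installed in your environment.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to the storage library that handles it.
# Only the two libraries mentioned in the note above are listed here.
STORAGE_LIBRARIES = {
    "s3": "s3fs",
    "gs": "gcsfs",
    "gcs": "gcsfs",
}


def required_storage_library(url):
    """Return the storage package a URL needs, or None if the scheme is not listed."""
    scheme = urlparse(url).scheme
    return STORAGE_LIBRARIES.get(scheme)


def read_parquet_from_cloud(url, storage_options=None):
    """Lazily load a Parquet dataset into a Dask DataFrame.

    Dask is imported inside the function so the helper above can be used
    even in an environment where Dask is not installed.
    """
    import dask.dataframe as dd

    # storage_options is passed through to the underlying storage library.
    return dd.read_parquet(url, storage_options=storage_options)


# Example usage (the bucket name is a placeholder, not a real dataset):
# ddf = read_parquet_from_cloud("s3://my-bucket/data/")
# print(ddf.head())
```

The same pattern applies to other readers such as `dask.dataframe.read_csv`, which also accepts a `storage_options` argument.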