While Coiled works hard to take the challenges out of moving from local to distributed Cloud computing for data science with Python, that transition is still not entirely trivial, and it’s not uncommon to encounter issues as you learn to use Coiled.
Below are some ideas for self-help with common issues, and some best practices for seeking help from Support Engineers at Coiled.
The Basics
Our Docs
It may be obvious, but it has to be said: the first place to start is our docs. We try to include the key information you need there, so please review them before asking Support for help.
Slack Etiquette
Please use Slack responsibly. Use #coiled-cloud for most support questions, and try to keep one thread per topic: make a post, then follow up with replies in that thread rather than making many short, separate posts on the same subject. Start a new thread only for a new topic, or as a follow-up when an existing thread is stale or very large. Please also stay on topic and be cordial and polite: do not post on topics unrelated to Coiled, avoid discussions of politics or religion, avoid hate speech, be respectful to all, and keep your language PG (safe for work). We reserve the right to delete, at our sole discretion, posts that unnecessarily clutter Slack channels or that are inappropriate.
Software Environments
Python, as you likely know if you’ve reached the point where you want to do distributed computing, relies upon hundreds, if not thousands, of supplementary packages; notable among these are pandas and NumPy. As explained in our docs, it’s important for the software environments on your local machine (the Python client) and on the Coiled Cluster (the Dask scheduler and workers) to be aligned. And, of course, many Python packages are updated frequently.
Possibly the most common issues reported by users are clusters that fail to start, or workers that are killed, because of version mismatches. For instance, in January 2022 pandas was updated from version 1.3.5 to 1.4.0. If you were using a Coiled software environment created before the update, and then created a new local environment after it, you would have seen killed workers due to differences between 1.3.5 and 1.4.0.
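As a rough illustration of the kind of check that catches such a mismatch early, here is a minimal sketch that compares two sets of package versions. The helper names (installed_versions, find_mismatches) and the version dictionaries are hypothetical and only mirror the pandas example above; how you obtain the versions installed on your cluster is left out.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} for the current (local) environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed locally
    return versions


def find_mismatches(local, remote):
    """Return {package: (local_version, remote_version)} wherever they differ."""
    return {
        name: (local.get(name), remote.get(name))
        for name in sorted(set(local) | set(remote))
        if local.get(name) != remote.get(name)
    }


# Hypothetical versions: a fresh local environment vs. an older Coiled one.
local = {"pandas": "1.4.0", "dask": "2022.1.0"}
cluster = {"pandas": "1.3.5", "dask": "2022.1.0"}

print(find_mismatches(local, cluster))
# prints {'pandas': ('1.4.0', '1.3.5')}
```

In practice you would pass the result of installed_versions([...]) as the local side and fill the cluster side from whatever your software environment reports.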
Another common error is creating a software environment on Coiled but omitting dask as a dependency. We do not include dask in our base containers, so it needs to be specified in your Coiled software environments. You don’t have to list Dask’s own dependencies (be sure to include the conda-forge channel when using Conda), but if you need to pin one of them to a specific version, you can do so by explicitly including it in your dependencies. It is possible to rely upon other packages you install to pull in dask as a dependency, but you can run into issues if, for instance, one of them pins an obsolete version in its dependency list. For examples of Coiled software environments, you may want to visit the Coiled software page or the Coiled Examples software page.
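The following sketch shows one way to build a Conda specification that names dask explicitly and uses the conda-forge channel. The helpers build_conda_spec, package_names and create_environment are hypothetical, the pinned versions are only examples, and the call to coiled.create_software_environment assumes the name= and conda= arguments described in our docs; please check the docs for the version of coiled you have installed.

```python
import re


def build_conda_spec(dependencies, channels=("conda-forge",)):
    """Build a Conda-style spec dict; conda-forge is the default channel."""
    return {"channels": list(channels), "dependencies": list(dependencies)}


def package_names(spec):
    """Return the bare package names in a spec, with version pins stripped."""
    return {
        re.split(r"[=<>!~\s]", dep, maxsplit=1)[0].lower()
        for dep in spec["dependencies"]
    }


# Example pins only; choose versions that match your local environment.
spec = build_conda_spec(["python=3.9", "dask=2022.1.0", "pandas=1.3.5"])

# Guard against the common mistake of leaving dask out.
if "dask" not in package_names(spec):
    raise ValueError("dask must be listed explicitly in the environment")


def create_environment(name, conda_spec):
    """Create the environment on Coiled (assumed API; needs a Coiled account)."""
    import coiled  # imported lazily so the rest of the sketch runs without it

    return coiled.create_software_environment(name=name, conda=conda_spec)
```

Calling create_environment("my-env", spec) would then submit the specification; the environment name here is a placeholder.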
As noted above, in all of these cases stay alert for updates to dependencies, and for changes in what the packages you specify pull in. For instance, as of this writing (January 2022), dask is updated on an approximately two-week cadence, normally overnight on Fridays.
Other software-related ‘gotchas’ include the need for the s3fs dependency if you want to read from and write to S3 buckets (or gcsfs for GCP). Other packages to be alert for include pyarrow and fastparquet, one of which you need if you want to work with Parquet files.
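As a small sketch of how to spot these gaps before a job fails, the helper below maps a data path to the optional packages it is likely to need and reports which are missing from the current environment. The function names and the mapping are illustrative assumptions, and the Parquet check is a simple filename test rather than anything Dask itself does.

```python
from importlib.util import find_spec
from urllib.parse import urlparse

# URL scheme -> filesystem package typically needed to access it.
FILESYSTEM_PACKAGES = {"s3": "s3fs", "gs": "gcsfs", "gcs": "gcsfs"}
# Either of these engines can read and write Parquet files.
PARQUET_ENGINES = ("pyarrow", "fastparquet")


def required_packages(path):
    """Return a list of groups; one package from each group is needed."""
    groups = []
    scheme = urlparse(path).scheme
    if scheme in FILESYSTEM_PACKAGES:
        groups.append((FILESYSTEM_PACKAGES[scheme],))
    if path.lower().endswith(".parquet"):  # simplistic filename check
        groups.append(PARQUET_ENGINES)
    return groups


def missing_packages(path):
    """Return the groups for which no alternative is importable here."""
    return [
        group
        for group in required_packages(path)
        if not any(find_spec(name) is not None for name in group)
    ]


# Hypothetical bucket and file name.
print(required_packages("s3://my-bucket/data/part-0.parquet"))
# prints [('s3fs',), ('pyarrow', 'fastparquet')]
```

Remember that the same packages must be present in the Coiled software environment as well, not only on your local machine.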
Local/Distributed Data
Another thing that folks often overlook is that data available on their local machine is not automatically available to a Coiled Cluster. Generally, things work best if you keep the data you’re working with in cloud storage such as AWS S3 buckets (see our FAQ on this topic). There are ways to upload files from your local machine, but this is a more advanced topic and, particularly for large files, can be slow compared to accessing an S3 bucket (or a git repo, for custom packages).
Getting Support Help
When you do need to ask for help, Coiled does have a team of Support Engineers ready to help.
Read the Docs
As noted above, before coming to us for help, please take time to read our docs, and to review the docs for any packages you’re using for information on distributed computing and their interaction with Dask. We’re here to help with Coiled, and we try to help on other topics, but if you’re attempting a complex job, like pulling in images from an external library and then doing machine learning on them via a relevant package, all while using an orchestration or automation package, it may be up to you to sort out all the complexities (though note that we do offer Enterprise packages that include in-depth Dask programming support; please contact our sales folks for more information).
Create an MRE
If at all possible, please consider creating a minimal reproducible example (MRE) of your problem. That saves a significant amount of time in understanding what you’re trying to do, and lets us work to reproduce your problem locally with our full suite of diagnostic tools available.
Run Local Diagnostics & Provide Details
We can help you best when we have complete information about your issue and your local and Coiled configuration. Please consider sharing complete information on things like:
- Your local software environment.
- Any Coiled and client software environments you’re using.
  - The output of conda list from your client software environment, and of coiled.get_software_info("<your-env-name>"), can be helpful.
- Any custom or specialized packages you’re attempting to use.
- Code snippets for:
  - Software environment creation (local and Coiled).
  - Cluster instantiation.
  - Where your error occurs.
- Error tracebacks, and the times the errors occurred (so we can look in our logs).
- Cloud provider and repository information (but not your credentials!!).
- Output from coiled.get_notifications() and coiled.diagnostics().
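To gather some of the local details listed above in one go, a short script along these lines may help. It is a sketch using only the Python standard library: collect_report and the default package list are hypothetical choices, and it does not call any Coiled functions, so the output of coiled.get_notifications() and coiled.diagnostics() would still need to be added separately.

```python
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata


def collect_report(packages=("coiled", "dask", "distributed", "pandas", "numpy")):
    """Collect basic details about the local (client) environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        # A UTC timestamp makes it easier to line up with server-side logs.
        "collected_at_utc": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


# Paste this output into your support request (it contains no credentials).
print(json.dumps(collect_report(), indent=2))
```

Extend the package list with any custom or specialized packages involved in your issue.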
Wrap-up
Thanks for reading! Following these practices helps us help you better.