This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository

Introduction to Conda for (Data) Scientists: Glossary

Key Points

Getting Started with Conda
  • Conda is a platform agnostic, open source package and environment management system.

  • Using a package and environment management tool facilitates portability and reproducibility of (data) science workflows.

  • Conda solves both the package and environment management problems and targets multiple programming languages. Other open source tools solve either one or the other, or target only a particular programming language.

  • Anaconda is not only for Python

Working with Environments
  • A Conda environment is a directory that contains a specific collection of Conda packages that you have installed.

  • You create (remove) a new environment using the conda create (conda remove) commands.

  • You activate (deactivate) an environment using the conda activate (conda deactivate) commands.

  • You install packages into environments using conda install.

  • You should install each environment as a sub-directory inside its corresponding project directory

  • Use the conda env list command to list existing environments and their respective locations.

  • Use the conda list command to list all of the packages installed in an environment.

Using Packages and Channels
  • A package is a tarball containing system-level libraries, Python or other modules, executable programs and other components, and associated metadata.

  • A Conda channel is a URL to a directory containing a Conda package(s).

  • pip can be used to install Python packages into a Conda environment that are not packaged with Conda

Sharing Environments
  • Sharing Conda environments with other researchers facilitates the reprodicibility of your research.

  • Environment files explicitly describe your project’s software environment.

Managing GPU dependencies
  • Conda can be used to manage your key GPU dependencies.

  • Use conda search to identify which version of CUDA libraries are available.

  • For most projects you will not need NVCC and can use the cudatoolkit package from default channels.

  • If your project does need NVCC, try cudatoolkit-dev package or nvcc_linux-64 meta-package (requires separate NVIDIA CUDA Toolkit install).

Glossary

FIXME