Introduction

The batch_context plugin relies on two types of configuration files to correctly handle the job batch.

  1. A Context Config File. This is a single config text file that specifies local and remote directories and system communication. It is unique to the user and the project, so a user needs to create a new one for every project. :so: Name the file to include the project, user and execution type.
    (:b: contextconfig_local.cfg and contextconfig_remote.cfg)
  2. Batch Config Files. These are config text files that are unique to each pipeline script but do not need to be changed according to the user or project. These handle the data file identification, pipeline order, software and computing resources of each of the scripts. :so: It is recommended that these are named to mirror their corresponding scripts.
    (:b: c1_scalpart.cfg is the config file for s1_scalpart.htb)

:so: Save your config files as text files; this will make them easy to view, edit and save outside of the batch_context user interface.
:b: All the configuration files are saved as .cfg and located in the analysis/support/config directory.
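For example, the config directory of a project with a single pipeline script might contain the following (a sketch; the contextconfig names follow the naming suggestion above, and c1_scalpart.cfg pairs with s1_scalpart.htb):

analysis/support/config/
  contextconfig_local.cfg
  contextconfig_remote.cfg
  c1_scalpart.cfg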

Context Configuration Files (*.cfg)

You will need a unique context config file for each project. To get started go to:

  • File -> Batch -> Context Configuration
    [screenshot: newbatchclipped]

Editing

You can create a new context config or edit an existing one in this user interface. To load an existing config file, click on the | Load Context Config | button. You will need to fill out a number of fields depending on whether you will be running the job locally or remotely. For all jobs you must fill out the local dependency fields at the top of the block.

[screenshot: contextedit]

A detailed description of each of the fields can be found below:
:so: Set the Matlab working directory to your project_name directory so your path names can be shorter.


Required for All Jobs


1. Log Path

  • This is the directory location of your log folder. The log folder is used to collect the intermediate scripts created and used for each file, as well as any error messages.
    :b: Default set to the analysis/log directory.

2. Project Root Directory

  • This is the directory location of your project folder.
    :b: Default left as ' ' as you will have set the Matlab working directory to your project_name directory.

3. Dependency Root Directory

  • This is the directory location of your project dependencies. These include EEGLAB and any plugins.
    :b: Default set to the analysis/support/dependencies directory.
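For a local-only job, the three required fields might therefore look like this (a sketch using the default values described above; the labels are the user interface field names, not necessarily the keys stored in the saved .cfg file):

Log Path:                   analysis/log
Project Root Directory:     ' '
Dependency Root Directory:  analysis/support/dependencies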

Only Required for Remote Jobs


4. Username for the Remote Host

  • This will be your username for your remote host.
    :b: :warning: The default user_name must be changed to your cluster login user name.

5. Host for Compute Execution

  • This is the computational system where jobs are submitted to the scheduler. :b: Default set to redfin.sharcnet.ca

6. Project Root Archive Address

  • This is the address to the archive copy of the project folder. Typically this is on a long term storage system that is not used for compute access but rather for backup of persistent files. On SHARCNET systems the /freezer system is appropriate for this kind of storage. :b: :warning: Default of dtn.sharcnet.ca:/freezer/user_name/project_name needs to be adjusted for your project.

7. Project Root Work Address

  • This is the address to the working copy of the project folder. Typically this is on a short term storage system that has good performance with compute nodes. On SHARCNET systems this is typically the global access /work file system or the cluster specific /scratch systems. :b: :warning: Default of dtn.sharcnet.ca:/work/user_name/project_name needs to be adjusted for your job.

8. Dependency Address

  • This is the address to the path where necessary files (e.g. the EEGLAB file distribution) are located for access by the compute nodes. Typically this folder is included within each project's file distribution but it can also be a central location shared among several projects.
    :b: :warning: Default of dtn.sharcnet.ca:/work/user_name/project_name/analysis/support/dependencies needs to be adjusted for your job.
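As an example, a SHARCNET user with the hypothetical login name jsmith and project name faceproj would adjust the remote fields along these lines:

Username for the Remote Host:  jsmith
Host for Compute Execution:    redfin.sharcnet.ca
Project Root Archive Address:  dtn.sharcnet.ca:/freezer/jsmith/faceproj
Project Root Work Address:     dtn.sharcnet.ca:/work/jsmith/faceproj
Dependency Address:            dtn.sharcnet.ca:/work/jsmith/faceproj/analysis/support/dependencies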

9. Archive Mount Directory

  • This is the local path where the remote archive directory can be mounted (e.g. using sshfs). Relative paths are prefixed with [local_project], or with the current working directory (pwd) if [local_project] is empty. :b: Default set to remote_archive

10. Work Mount Directory

  • This is the local path where the remote work directory can be mounted (e.g. using sshfs). Relative paths are prefixed with [local_project], or with the current working directory (pwd) if [local_project] is empty. :b: Default set to remote_work

11. Miscellaneous Locations

  • This field allows the user to add new location names and addresses. The inputs are expected to be key strings and addresses/paths of other places (e.g. '[external_backup],/media/user/external/backup/project_name'). :b: Default left as ' '.
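For example, to register two extra locations you might enter (hypothetical paths, one [key],address pair per line):

[external_backup],/media/user/external/backup/project_name
[lab_server],/mnt/lab_server/project_name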

12. System Variables

  • System variables are custom system commands you can quickly build using the other context configuration variables. These are strings that will have key strings swapped and then be made available for editing before being passed to the "system" command for execution in the system terminal.

For more information visit the Mounting the Project section on the Managing Remote Projects page.

:b: Default set as:
sshfs [remote_user_name]@[remote_project_archive] [cd]/[mount_archive]
sshfs [remote_user_name]@[remote_project_work] [cd]/[mount_work]
meld [local_project]/analysis [cd]/[mount_work]/analysis &
meld [local_project]/analysis [cd]/[mount_archive]/analysis [cd]/[mount_work]/analysis &
fusermount -u [cd]/[mount_archive]
fusermount -u [cd]/[mount_work]
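To illustrate the key-string swapping, using the hypothetical values from the earlier examples ([remote_user_name] = jsmith, [remote_project_work] = dtn.sharcnet.ca:/work/jsmith/faceproj, [mount_work] = remote_work, and [cd] resolving to a current directory of /home/jsmith/faceproj), the second default would expand to:

sshfs jsmith@dtn.sharcnet.ca:/work/jsmith/faceproj /home/jsmith/faceproj/remote_work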

Saving

Once you have filled out all of the fields, click on the | Save As | button to save the file. :so: Name this file to include the project and the user
(:b: contextconfig_local.cfg and contextconfig_remote.cfg can be found in the analysis/support/config directory). Now when running a job you can simply load this file and it will populate the required fields for you. You can also use this text file as a template to quickly edit and create new context config files in a text editor.

Batch Configuration Files (*.cfg)

There needs to be a unique batch config file created for each script in the pipeline. Batch config files are not dependent on the project or the user.
:so: Only create one local and one remote file for each script. To get started go to:

  • File -> Batch -> Batch Configuration
    [screenshot: newbatchclipped]

Editing

You can create a new batch config or edit an old one in this user interface. To load existing batch files, click on the | Get Batch Config File Names | button. This interface allows you to look at, and edit, multiple config files at once. This helps you visualize the pipeline order and ensure that every script you are using has a designated configuration file. When loading, you can adjust the order of the files by changing their names in the load window. If you do not change anything they will be loaded alphabetically. You will need to fill out a number of fields depending on whether you will be running the job locally or remotely.

[screenshot: editconfigbuttons]

Batch Config Fields

  • Execution Function (exec_func)
    • Specifies the execution function. The list of options is generated from the installed execution function/scheduler plugins. See scheduler plugins for more details.
  • Replace String (replace_string)
    • User supplied string-swap variables. Uses the format of [key],value with each set separated by new lines. Some defaults are supplied and can be seen in builtin string-swaps
  • Order (order)
    • Specifies the running order of the scripts for each data file. Takes the form of [a [b ...]], where a is the current order number and b and any following numbers are the jobs that a is dependent on (see the sketch after this list).
  • Session Initialization (session_init)
    • Session Initialization is run once per history file in the submit script. This contains shell code that sets up the environment on the remote host for the current job. We use this on the cluster to load modules (programs) and take care of any other environment business.
  • Job Name (job_name)
    • Sets the name that is submitted to the scheduler. This can be used for reporting and monitoring
    • This field accepts [batch_dfn] and [batch_hfn] string swaps
  • Job Initialization (job_init)
    • Is inserted before each data file in a specific submit.sh script. We use this to add LD_RUN_PATH to LD_LIBRARY_PATH before running Amica in this reference pipeline.
  • M file name (mfile_name)
    • Used for the Octave/Matlab script file name, and also used as the log name. This allows us to make log and mfile names that are as specific as we need.
    • This field accepts [batch_dfn] and [batch_hfn] string swaps
  • Script Prefix / M File Init (m_init)
    • Is run in each m file that batch_context generates; used to set up environment specific settings and imports in Octave/Matlab.
  • Submit Options / Custom scheduler options (submit_options)
    • Used for generic options to the scheduler. Use this if you need to pass scheduler or environment specific flags.
  • Memory (memory)
    • Computed value of the memory allowance for the job. Uses the variables c and s for calculation. The variables c and s and the whole expression are evaluated and assumed to be in 'g' or 'm' based on the last character in the field.
    • e.g. 100 + 30*c + s*0.01m. Note that s*0.01 is written in this reversed order to make it clear that the trailing m is a unit suffix rather than part of the expression.
  • Time Limit (time_limit)
    • Similar to memory, this is a computed value of the time allowance. Uses the variables c and s for calculation. The variables c and s and the whole expression are evaluated and assumed to be in 's', 'm' or 'h' based on the last character in the field. e.g. 100 + 30*c + s*0.1m
  • MPI (mpi)
    • Specifies whether we need to insert the special flags or strings to use MPI.
  • Number of tasks (num_tasks)
    • Specifies the number of processes to use. This field does not specify that the processes will be packed onto one machine.
  • Threads Per Task (threads_per_task)
    • Specifies the number of threads per task. This will request additional cpus for each task/process.
  • Software (software)
    • Specifies if we are using Octave/Matlab or a script. Use Octave to set flags for Octave; leave as none if you want to use another executable file.
  • Program Options (program_options)
    • Arbitrary options to pass after the software field; often used for --traditional in Octave. We moved away from --traditional in our reference pipeline and instead added the specific options to the octave.minit due to bugs that exist in the current stable version of Octave.
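To make the computed fields and string swaps concrete, here is a hypothetical fragment of a batch config for a first pipeline script (the values are illustrative and the field: value layout is only a sketch, not the exact syntax of a saved .cfg file):

job_name:   [batch_dfn]_scalpart
mfile_name: [batch_dfn]_scalpart.m
order:      [1]
memory:     2000 + 4*c + s*0.0001m
time_limit: 10 + 2*c + s*0.001h

If, say, c = 128 and s = 1000, the memory expression evaluates to 2000 + 512 + 0.1, read as roughly 2512m since the field ends in m. A later script configured with order: [2 1] would run second and be dependent on job 1.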

Saving

Once you have filled out all of the fields, click on the | Save As | button to save. If you have multiple files open, Ctrl-click to select as many as you would like to save. These will not be combined into one file. It is recommended that the files are named to mirror their corresponding scripts, as they are used hand in hand (e.g. S1_Script1.htb and C1_Script1.cfg). Now when running a job you can simply load this file and it will populate the required fields for you. You can also use the text files created as templates to quickly edit and create new batch config files in a text editor.


Updated/Verified by Brad Kennedy on 2017-08-08

:house: Return to Main Page