API details

Overview

Here's what fastgpu does:

  1. poll to_run
  2. find first file
  3. check there's an available worker id
  4. move it to running
  5. handle the script
    1. create lock file
    2. redirect stdout/stderr to out
    3. run it
    4. when done, move it to complete or fail
    5. unlock
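
The steps above can be sketched with the standard library alone. This is a simplified, single-threaded illustration and not fastgpu's actual implementation: the function name `run_queue_once`, the `<id>.lock` file naming, and running scripts through `sh` are all assumptions made for the sketch.

```python
import shutil
import subprocess
import time
from pathlib import Path

def run_queue_once(root, worker_ids, poll_interval=0.1):
    """Drain root/to_run one script at a time (hypothetical sketch)."""
    root = Path(root)
    dirs = {n: root / n for n in ("to_run", "running", "complete", "fail", "out")}
    for d in dirs.values():
        d.mkdir(parents=True, exist_ok=True)
    while True:
        time.sleep(poll_interval)                                  # 1. poll to_run
        scripts = sorted(p for p in dirs["to_run"].iterdir() if p.is_file())
        if not scripts:
            return                                                 # queue is empty
        script = scripts[0]                                        # 2. first file
        free = [i for i in worker_ids if not (root / f"{i}.lock").exists()]
        if not free:
            continue                                               # 3. no worker id free
        lock = root / f"{free[0]}.lock"                            # assumed lock naming
        running = Path(shutil.move(str(script), str(dirs["running"] / script.name)))  # 4.
        lock.write_text("locked")                                  # 5.1 create lock file
        try:
            with (dirs["out"] / f"{running.name}.stdout").open("w") as out, \
                 (dirs["out"] / f"{running.name}.stderr").open("w") as err:
                # 5.2 and 5.3: run the script with its output redirected to out
                code = subprocess.call(["sh", str(running)], stdout=out, stderr=err)
            dest = dirs["complete"] if code == 0 else dirs["fail"]
            shutil.move(str(running), str(dest / running.name))    # 5.4 file the result
        finally:
            lock.unlink()                                          # 5.5 unlock
```

Unlike this sketch, the real poller runs scripts in parallel on whatever resources are available, which is why the worker-id check in step 3 matters there.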

To demonstrate how to use fastgpu, we first create a directory to store our scripts and outputs:

path = Path('data')

setup_dirs

setup_dirs(path)

Create and return the following subdirs of path: to_run running complete fail out

These are all the subdirectories that are created for us. Your scripts go in to_run.

path_run,path_running,path_complete,path_fail,path_out = setup_dirs(path)
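
For readers who want to see what such a helper might look like, here is a minimal sketch using only pathlib. The name `make_queue_dirs` is hypothetical and the body is an assumption about the behaviour described above, not the library's source.

```python
from pathlib import Path

# Subdirectory names taken from the setup_dirs description above
SUBDIRS = ("to_run", "running", "complete", "fail", "out")

def make_queue_dirs(path):
    """Create the five queue subdirectories under path and return them in order."""
    path = Path(path)
    dirs = [path / name for name in SUBDIRS]
    for d in dirs:
        # parents=True also creates path itself; exist_ok makes repeated calls safe
        d.mkdir(parents=True, exist_ok=True)
    return dirs
```

Returning the paths in a fixed order is what allows the tuple unpacking shown above.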

Let's populate to_run with a couple of "scripts" (actually symlinks for this demo) and an empty subdirectory.

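import os, shutil

def _setup_test_env():
    # Start from a clean slate; ignore_errors avoids a crash if 'data' does not exist yet
    shutil.rmtree('data', ignore_errors=True)
    setup_dirs(path)
    # Symlink the two demo scripts into to_run
    os.symlink(Path('test_scripts/script_succ.sh').absolute(), path_run/'script_succ.sh')
    os.symlink(Path('test_scripts/script_fail.sh').absolute(), path_run/'script_fail.sh')
    # Also add a subdirectory, which is not a script
    (path_run/'test_dir').mkdir(exist_ok=True)
_setup_test_env()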

Helper functions for scripts

These functions are used to find and run scripts, and move scripts to the appropriate subdirectory at the appropriate time.

find_next_script

find_next_script(p)

Get the first script from p (in sorted order)

test_eq(find_next_script(path_run).name, 'script_fail.sh')
assert not find_next_script(path_complete)
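
As an illustration of "first script in sorted order", the following sketch picks the first regular file by name and returns None for an empty directory. The name `first_script` is hypothetical, and skipping subdirectories is an assumption, not a description of the library's code.

```python
from pathlib import Path

def first_script(directory):
    """Return the first regular file in directory by sorted name, or None."""
    # is_file() follows symlinks, so symlinked scripts count; subdirectories do not
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    return files[0] if files else None
```

With the demo layout above, 'script_fail.sh' sorts before 'script_succ.sh', which matches the first test.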

safe_rename

safe_rename(file, dest)

Move file to dest, prefixing a random uuid if there's a name conflict
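
A conflict-safe move of this kind could look roughly like the sketch below. The function name `move_without_clobbering` and the `<uuid>-<name>` format are assumptions for illustration; the library may format the prefix differently.

```python
import uuid
from pathlib import Path

def move_without_clobbering(file, dest_dir):
    """Move file into dest_dir, adding a random uuid prefix on a name conflict."""
    file, dest_dir = Path(file), Path(dest_dir)
    target = dest_dir / file.name
    if target.exists():
        # A file with this name is already there: pick a name that cannot collide
        target = dest_dir / f"{uuid.uuid4()}-{file.name}"
    file.rename(target)
    return target
```

Returning the final path matters because the caller cannot otherwise know whether the file was renamed on the way.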

class ResourcePoolBase

ResourcePoolBase(path)

Base class for locked access to list of idents

This abstract class locks and unlocks resources using lockfiles. Override all_ids to make the list of resources available. See FixedWorkerPool for a simple example and details on each method.

class FixedWorkerPool

FixedWorkerPool(worker_ids, path) :: ResourcePoolBase

Vends locked access to fixed list of idents

The simplest possible ResourcePoolBase subclass - the resources are just a list of ids. For instance:

_setup_test_env()
wp = FixedWorkerPool(L.range(4), path)

ResourcePoolBase.unlock

ResourcePoolBase.unlock(ident)

Remove lockfile for ident

If there are no locks, this does nothing:

wp.unlock(0)

ResourcePoolBase.find_next

ResourcePoolBase.find_next()

Finds next available resource, or None

Initially all resources are available (unlocked), so the first from the provided list will be returned:

test_eq(wp.find_next(), 0)

ResourcePoolBase.lock

ResourcePoolBase.lock(ident, txt='locked')

Create lockfile for ident

After locking the first resource, it is no longer returned next:

wp.lock(0)
test_eq(wp.find_next(), 1)

ResourcePoolBase.lock_next

ResourcePoolBase.lock_next()

Locks an available resource and returns its ident, or None

This is the normal way to access a resource - it simply combines find_next and lock:

wp.lock_next()
test_eq(wp.find_next(), 2)
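
To make the lockfile mechanics concrete, here is a self-contained sketch of a pool with the same four operations. The class name `LockfilePool` and the `<id>.lock` naming are assumptions made for the example; it mirrors the behaviour shown in the tests above and is not the library's implementation.

```python
from pathlib import Path

class LockfilePool:
    """Hypothetical pool: a resource is locked while <path>/<id>.lock exists."""
    def __init__(self, ids, path):
        self.ids, self.path = list(ids), Path(path)
        self.path.mkdir(parents=True, exist_ok=True)

    def _lockpath(self, ident):
        return self.path / f"{ident}.lock"

    def lock(self, ident, txt="locked"):
        self._lockpath(ident).write_text(str(txt))

    def unlock(self, ident):
        # Removing a lock that does not exist is a no-op
        if self._lockpath(ident).exists():
            self._lockpath(ident).unlink()

    def find_next(self):
        # First id without a lockfile, or None when everything is locked
        return next((i for i in self.ids if not self._lockpath(i).exists()), None)

    def lock_next(self):
        ident = self.find_next()
        if ident is not None:
            self.lock(ident)
        return ident
```

Note that in this sketch find_next followed by lock is not atomic, so it is only safe when a single poller process hands out resources.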

ResourcePoolBase.run

ResourcePoolBase.run(*args, **kwargs)

Run script using resource ident

_setup_test_env()
wp = FixedWorkerPool(L.range(4), path)
f = find_next_script(path_run)
# Call _run directly so the script has finished before the checks below
wp._run(f, 0)

test_eq(find_next_script(path_run), path_run/'script_succ.sh')
test_eq((path_out/'script_fail.sh.exitcode').read_text(), '1')
assert (path_fail/'script_fail.sh').exists()
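
The checks above rely on per-script output files such as `script_fail.sh.exitcode`. A rough sketch of how a runner could produce them is shown below. The function name `run_and_record`, the use of `sh`, and the environment variable `DEMO_WORKER_ID` are all hypothetical choices for this example, not names used by fastgpu.

```python
import os
import subprocess
from pathlib import Path

def run_and_record(script, ident, out_dir):
    """Run script for worker ident, saving stdout, stderr and the exit code."""
    script, out_dir = Path(script), Path(out_dir)
    # Pass the worker id to the script through a (hypothetical) environment variable
    env = dict(os.environ, DEMO_WORKER_ID=str(ident))
    with (out_dir / f"{script.name}.stdout").open("w") as out, \
         (out_dir / f"{script.name}.stderr").open("w") as err:
        code = subprocess.call(["sh", str(script)], env=env, stdout=out, stderr=err)
    # Store the exit code as text next to the captured output
    (out_dir / f"{script.name}.exitcode").write_text(str(code))
    return code
```

A caller can then use the returned code to decide whether the script belongs in complete or fail.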

ResourcePoolBase.poll_scripts

ResourcePoolBase.poll_scripts(poll_interval=0.1, exit_when_empty=True)

Poll to_run for scripts and run in parallel on available resources

_setup_test_env()
wp.poll_scripts()

assert not find_next_script(path_run), find_next_script(path_run)
test_eq((path_out/'script_fail.sh.exitcode').read_text(), '1')
test_eq((path_out/'script_succ.sh.exitcode').read_text(), '0')
assert not (path_run/'script_fail.sh').exists()
assert (path_fail/'script_fail.sh').exists()
assert (path_complete/'script_succ.sh').exists()
test_eq((path_out/'script_succ.sh.stdout').read_text(), '0\n')

GPU

class ResourcePoolGPU

ResourcePoolGPU(path) :: ResourcePoolBase

Vends locked access to NVIDIA GPUs

# wp = ResourcePoolGPU('data')
# wp.find_next()

This is a resource pool that uses pynvml to find GPUs that aren't being used: a GPU counts as unused when it has less than 1G of memory in use and no running processes. It is implemented by overriding two methods from ResourcePoolBase, is_available and all_ids. Usage is identical to FixedWorkerPool, except that you don't pass in worker_ids, since the GPUs themselves form the resource pool.

ResourcePoolGPU.is_available

ResourcePoolGPU.is_available(ident)

If a GPU's used_memory is less than 1G and is running no procs then it will be regarded as available

ResourcePoolGPU.all_ids

ResourcePoolGPU.all_ids()

All GPUs
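
The availability rule described above can be sketched with pynvml as follows. The helper names `looks_idle` and `idle_gpu_ids` are hypothetical, and treating "1G" as 1024**3 bytes is an assumption; the library may use a different threshold or count processes differently.

```python
ONE_GB = 1024 ** 3  # assumption: "1G" interpreted as 2**30 bytes

def looks_idle(used_bytes, n_procs, limit=ONE_GB):
    """A GPU is treated as available if it uses little memory and runs no processes."""
    return used_bytes < limit and n_procs == 0

def idle_gpu_ids():
    """Return indices of GPUs that look idle (requires pynvml and an NVIDIA driver)."""
    import pynvml  # imported lazily so the pure helper above works without it
    pynvml.nvmlInit()
    try:
        ids = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            used = pynvml.nvmlDeviceGetMemoryInfo(handle).used
            procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
            if looks_idle(used, len(procs)):
                ids.append(i)
        return ids
    finally:
        pynvml.nvmlShutdown()
```

Keeping the decision in a small pure function makes the rule easy to test on a machine without a GPU.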