Here's what fastgpu does:
- poll to_run
- find first file
- check there's an available worker id
- move it to running
- handle the script:
  - create lock file
  - redirect stdout/err to out
  - run it
  - when done, move it to complete or failed
  - unlock
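
The steps above can be sketched with the standard library alone. This is a minimal illustration, not fastgpu's implementation: the helper names (`setup_demo_dirs`, `first_script`, `run_one`) and the directory names are assumptions made for this sketch, scripts are assumed to be runnable with `sh`, and the worker-id check and lock file are left out.

```python
import subprocess
from pathlib import Path

def setup_demo_dirs(root):
    "Create the working directories (the names are assumptions for this sketch)."
    dirs = {name: Path(root)/name for name in ('to_run', 'running', 'complete', 'fail', 'out')}
    for d in dirs.values(): d.mkdir(parents=True, exist_ok=True)
    return dirs

def first_script(folder):
    "Return the first regular file in `folder` by sorted name, or None if there is none."
    files = sorted(p for p in folder.iterdir() if p.is_file())
    return files[0] if files else None

def run_one(dirs):
    "Run the next queued script once; return its exit code, or None if the queue is empty."
    script = first_script(dirs['to_run'])
    if script is None: return None
    # Move the script out of the queue before running it
    running = dirs['running']/script.name
    script.rename(running)
    # Redirect stdout and stderr to files in the output directory
    out_file = dirs['out']/f'{script.name}.stdout'
    err_file = dirs['out']/f'{script.name}.stderr'
    with open(out_file, 'w') as out, open(err_file, 'w') as err:
        code = subprocess.run(['sh', str(running)], stdout=out, stderr=err).returncode
    # Record the exit code, then file the script by its result
    (dirs['out']/f'{script.name}.exitcode').write_text(str(code))
    running.rename(dirs['complete' if code == 0 else 'fail']/script.name)
    return code
```

Calling `run_one` repeatedly until it returns None gives a simple polling loop; directories inside the queue are skipped because only regular files are considered.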
 
To demonstrate how to use fastgpu, we first create a directory to store our scripts and outputs:
path = Path('data')
setup_dirs creates all the subdirectories for us and returns their paths. Your scripts go in to_run.
path_run,path_running,path_complete,path_fail,path_out = setup_dirs(path)
Let's create a scripts directory with a couple of "scripts" (actually symlinks for this demo) in it.
def _setup_test_env():
    shutil.rmtree('data', ignore_errors=True)  # don't fail if 'data' doesn't exist yet
    res = setup_dirs(path)
    os.symlink(Path('test_scripts/script_succ.sh').absolute(), path_run/'script_succ.sh')
    os.symlink(Path('test_scripts/script_fail.sh').absolute(), path_run/'script_fail.sh')
    (path_run/'test_dir').mkdir(exist_ok=True)
_setup_test_env()
These functions are used to find and run scripts, and move scripts to the appropriate subdirectory at the appropriate time.
test_eq(find_next_script(path_run).name, 'script_fail.sh')
assert not find_next_script(path_complete)
ResourcePoolBase is an abstract class that locks and unlocks resources using lockfiles. Override all_ids to provide the list of available resources. See FixedWorkerPool for a simple example and details on each method.
FixedWorkerPool is the simplest possible ResourcePoolBase subclass: the resources are just a list of ids. For instance:
_setup_test_env()
wp = FixedWorkerPool(L.range(4), path)
If there are no locks, this does nothing:
wp.unlock(0)
Initially all resources are available (unlocked), so the first from the provided list will be returned:
test_eq(wp.find_next(), 0)
After locking the first resource, it is no longer returned next:
wp.lock(0)
test_eq(wp.find_next(), 1)
lock_next is the normal way to access a resource; it simply combines find_next and lock:
wp.lock_next()
test_eq(wp.find_next(), 2)
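
To make the lockfile idea concrete, here is a minimal stand-alone sketch of a pool that behaves like the calls above. It is not fastgpu's code: the class name `LockfilePool`, the `<id>.lock` naming, and the `is_locked` helper are assumptions for illustration, and `touch` is not an atomic test-and-set, so a real multi-process pool would need exclusive file creation instead.

```python
from pathlib import Path

class LockfilePool:
    "Hypothetical sketch: one `<id>.lock` file per locked resource in `lock_dir`."
    def __init__(self, ids, lock_dir):
        self.ids = list(ids)
        self.lock_dir = Path(lock_dir)
        self.lock_dir.mkdir(parents=True, exist_ok=True)

    def _lockfile(self, ident): return self.lock_dir/f'{ident}.lock'

    def is_locked(self, ident): return self._lockfile(ident).exists()

    def lock(self, ident): self._lockfile(ident).touch()

    def unlock(self, ident):
        # Does nothing if the resource is not locked (missing_ok needs Python 3.8+)
        self._lockfile(ident).unlink(missing_ok=True)

    def find_next(self):
        "First unlocked id in the order given, or None if all are locked."
        return next((i for i in self.ids if not self.is_locked(i)), None)

    def lock_next(self):
        "Find the next free id and lock it in one call."
        ident = self.find_next()
        if ident is not None: self.lock(ident)
        return ident
```

Because the lock state lives on disk rather than in memory, a second pool object pointed at the same directory sees the same locks.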
_setup_test_env()
wp = FixedWorkerPool(L.range(4), path)
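# Reset the demo directories, then run the first queued script on worker id 0.
_setup_test_env()
f = find_next_script(path_run)
wp._run(f, 0)
# script_fail.sh has left to_run, so script_succ.sh is now first in the queue.
test_eq(find_next_script(path_run), path_run/'script_succ.sh')
# The exit code is written to out, and the failed script is moved to the fail directory.
test_eq((path_out/'script_fail.sh.exitcode').read_text(), '1')
assert (path_fail/'script_fail.sh').exists()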
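# Reset again, then let the pool work through everything in to_run.
_setup_test_env()
wp.poll_scripts()
# Nothing is left in the queue, and each script's exit code has been recorded.
assert not find_next_script(path_run), find_next_script(path_run)
test_eq((path_out/'script_fail.sh.exitcode').read_text(), '1')
test_eq((path_out/'script_succ.sh.exitcode').read_text(), '0')
# Each script has moved to the directory matching its result.
assert not (path_run/'script_fail.sh').exists()
assert (path_fail/'script_fail.sh').exists()
assert (path_complete/'script_succ.sh').exists()
# The script's standard output is captured in out.
test_eq((path_out/'script_succ.sh.stdout').read_text(), '0\n')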
# wp = ResourcePoolGPU('data')
# wp.find_next()
ResourcePoolGPU is a resource pool that uses pynvml to find GPUs that aren't being used (based on whether they have memory allocated). It is implemented by overriding two methods from ResourcePoolBase. Usage is identical to FixedWorkerPool, except that you don't need to pass in worker_ids, since the available GPUs are the resource pool.
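
As a rough illustration of memory-based GPU detection, the sketch below separates the pynvml query from the selection rule. It is not the library's implementation: the function names and the `threshold` parameter are assumptions for this example. The threshold exists because an idle GPU may still report a small amount of used memory. `query_used_memory` needs pynvml and an NVIDIA driver, while `free_gpu_ids` is plain Python and can be tried anywhere.

```python
def free_gpu_ids(used_bytes, threshold=0):
    "Indices of GPUs whose used memory is at or below `threshold` bytes."
    return [i for i, used in enumerate(used_bytes) if used <= threshold]

def query_used_memory():
    "Used memory in bytes for each visible GPU (requires pynvml and an NVIDIA driver)."
    import pynvml  # imported here so the selection rule works without pynvml installed
    pynvml.nvmlInit()
    try:
        count = pynvml.nvmlDeviceGetCount()
        handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in range(count)]
        return [pynvml.nvmlDeviceGetMemoryInfo(h).used for h in handles]
    finally:
        pynvml.nvmlShutdown()
```

On a machine with GPUs, `free_gpu_ids(query_used_memory(), threshold=...)` would give the candidate ids; a lockfile pool like FixedWorkerPool can then hand them out one at a time.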