Clean output from step folders

Clean workflow steps’ output.

This module removes unnecessary files and compresses or archives files that share the same extension. This reduces disk usage and the cost of listing files in the modules’ step folders.

The two main functions of this module are:

  • clean_output()

  • unpack_compressed_and_archived_files()

See also the command-line clients haddock3-clean and haddock3-unpack.

haddock.gear.clean_steps.clean_output(path: str | Path, ncores: int = 1) -> None

Clean the output of step folders.

This function performs file archiving and file compression operations. Files with the extensions .seed, .inp, .out, and .con are compressed and archived into .tgz files. The original files are deleted.

Files with the .pdb and .psf extensions are compressed to .gz files.

Parameters:
  • path (str or pathlib.Path) – The path to clean. Should point to a folder from a workflow step.

  • ncores (int) – The number of cores to use.
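
The archiving and compression behaviour described above can be illustrated with the standard library alone. The sketch below is not the haddock3 implementation: the helper names archive_by_extension and gzip_files are hypothetical, the archive name (<ext>.tgz) is an assumption made for the example, and so is the removal of the original .pdb files after compression.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def archive_by_extension(folder, ext):
    """Bundle all ``*.<ext>`` files of ``folder`` into one .tgz file.

    Hypothetical helper. The archive name ``<ext>.tgz`` is an assumption.
    Returns the archive path, or None if no file has that extension.
    """
    folder = Path(folder)
    files = sorted(folder.glob(f"*.{ext}"))
    if not files:
        return None
    archive = folder / f"{ext}.tgz"
    with tarfile.open(archive, "w:gz") as tar:
        for file_ in files:
            # Store only the file name, not the full path.
            tar.add(file_, arcname=file_.name)
    # Delete the originals only after the archive was written.
    for file_ in files:
        file_.unlink()
    return archive


def gzip_files(folder, ext):
    """Compress each ``*.<ext>`` file of ``folder`` to ``*.<ext>.gz``.

    Hypothetical helper. Deleting the originals is an assumption.
    """
    compressed = []
    for file_ in sorted(Path(folder).glob(f"*.{ext}")):
        target = file_.with_name(file_.name + ".gz")
        with open(file_, "rb") as fin, gzip.open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        file_.unlink()
        compressed.append(target)
    return compressed
```

Grouping many small files into a single archive is what reduces the number of directory entries, while the per-file .gz compression keeps each structure file individually addressable.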

haddock.gear.clean_steps.unpack_compressed_and_archived_files(folders: Iterable[FilePathT], ncores: int = 1, dec_all: bool = False) -> None

Unpack compressed and archived files in the given folders.

Works on .gz and .tgz files.

Registers folders in UNPACK_FOLDERS where compressed and archived files were found.

Parameters:
  • folders (list) – List of folders to operate on.

  • ncores (int) – The number of cores to use.
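
As a rough illustration of the unpacking step, the sketch below extracts .tgz archives and decompresses .gz files using only the standard library. It is not the haddock3 implementation: unpack_folder and unpack_folders are hypothetical names, the returned list merely stands in for the UNPACK_FOLDERS registry, removing the packed files afterwards is an assumption, and the dec_all option is not modelled.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def unpack_folder(folder):
    """Unpack .tgz archives and .gz files found directly in ``folder``.

    Hypothetical helper. Returns True if any packed file was found.
    """
    folder = Path(folder)
    found = False

    for archive in sorted(folder.glob("*.tgz")):
        with tarfile.open(archive, "r:gz") as tar:
            for member in tar.getmembers():
                if not member.isfile():
                    continue
                # Write next to the archive, ignoring any stored directories.
                target = folder / Path(member.name).name
                with tar.extractfile(member) as fin, open(target, "wb") as fout:
                    shutil.copyfileobj(fin, fout)
        archive.unlink()  # assumption: the archive is removed afterwards
        found = True

    for gz_file in sorted(folder.glob("*.gz")):
        target = gz_file.with_suffix("")  # 'model.pdb.gz' -> 'model.pdb'
        with gzip.open(gz_file, "rb") as fin, open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        gz_file.unlink()  # assumption: the .gz file is removed afterwards
        found = True

    return found


def unpack_folders(folders):
    """Unpack each folder; return those where packed files were found."""
    return [Path(folder) for folder in folders if unpack_folder(folder)]
```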

haddock.gear.clean_steps.update_unpacked_names(prev: Iterable[str | Path], new: Iterable[str | Path], original: list[str | Path]) -> None

Update the unpacked path names.

Sometimes the step folders are renamed to adjust their index number. This renaming happens after the output data is unpacked. This module, haddock.gear.clean_steps, keeps a registry of the unpacked folders to ensure the correct functioning of the extend_run module.

Given the original names (prev) and the new names (new) of the step folders, this function updates the matching entries in the storing list (original).

Examples

>>> original = ['0_topoaa', '4_flexref']
>>> prev = ['0_topoaa', '4_flexref', '5_seletopclusts']
>>> new = ['0_topoaa', '1_flexref', '5_seletopclusts']
>>> update_unpacked_names(prev, new, original)
>>> original
['0_topoaa', '1_flexref']

This function only evaluates the name of the last folder in each path, and it maintains the type of each entry in the original list.

>>> original = ['0_topoaa', Path('4_flexref'), '5_seletopclusts']
>>> prev = ['0_topoaa', 'run_dir/4_flexref', '5_seletopclusts']
>>> new = ['run_dir/0_topoaa', '1_flexref', 'run_dir/2_seletopclusts']
>>> update_unpacked_names(prev, new, original)
>>> assert original == ['0_topoaa', Path('1_flexref'), '2_seletopclusts']

Parameters:
  • prev (list of str or pathlib.Path) – The list of the original names before they were changed.

  • new (list of str or pathlib.Path) – The list of the new folder names.

  • original (list of str or pathlib.Path) – The list containing the recorded names; matching entries are renamed.

Returns:

None – Edits original in place.
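
The documented behaviour (match on the last path component, keep each entry's type, edit in place) can be reproduced with a short sketch. This is a hypothetical reimplementation written to agree with the examples above, not the haddock3 source; the name update_names_sketch is an assumption.

```python
from pathlib import Path


def update_names_sketch(prev, new, original):
    """Rename entries of ``original`` in place.

    Hypothetical sketch: entries are matched on the last path component
    and rebuilt with the type they already had (str or Path).
    """
    # Map each previous folder name to its new folder name.
    renames = {Path(p).name: Path(n).name for p, n in zip(prev, new)}
    for i, entry in enumerate(original):
        name = Path(entry).name
        if name in renames:
            original[i] = type(entry)(renames[name])


original = ["0_topoaa", Path("4_flexref")]
update_names_sketch(
    prev=["0_topoaa", "run_dir/4_flexref"],
    new=["run_dir/0_topoaa", "1_flexref"],
    original=original,
)
assert original == ["0_topoaa", Path("1_flexref")]
```

Building the rename mapping first and applying it in a single pass avoids renaming an entry twice when a new name coincides with another folder's previous name.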