Auxiliary File System
Overview
To build a corpus dataset, each procedure stores its intermediate data in temporary files on disk rather than in memory. As corpora grow, these files become enormous. With small corpora, it is fine to keep every temporary file and remove them all after the build. With extremely large corpora, however, keeping them indiscriminately can exhaust the disk and cause a low-disk-space error.
Hence, we designed the Auxiliary File System, which manages temporary (or, literally, auxiliary) files simply and automatically. A manager keeps track of all the auxiliary files it creates. It records the auxiliary-scope level of each file to determine which files are no longer in use, and it removes those files to clean up. Every Builder uses AuxiliaryFiles during the build.
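The scope-level bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not langumo's actual implementation: ToyAuxManager, its scope() helper, and the depth counter are hypothetical names, and the sketch assumes that a "lower" scope level means a more deeply nested scope, modeled here as a larger depth number. Locking is left out to keep the sketch short.

```python
import os
import tempfile
from contextlib import contextmanager


class ToyAuxManager:
    """Hypothetical stand-in for an auxiliary file manager (not langumo's code)."""

    def __init__(self, parent):
        self.parent = parent
        self.depth = 0    # current scope depth; 0 is the outermost scope
        self.files = []   # list of (depth_at_creation, path)

    def create(self):
        # Create an empty file inside the parent directory and record the
        # depth of the scope that created it.
        fd, path = tempfile.mkstemp(dir=self.parent)
        os.close(fd)
        self.files.append((self.depth, path))
        return path

    def clear(self):
        # Delete files that were created in scopes deeper than the current one.
        kept = []
        for depth, path in self.files:
            if depth > self.depth:
                os.remove(path)
            else:
                kept.append((depth, path))
        self.files = kept

    @contextmanager
    def scope(self):
        # Enter a nested scope; on exit, step back out and clean up its files.
        self.depth += 1
        try:
            yield
        finally:
            self.depth -= 1
            self.clear()


with tempfile.TemporaryDirectory() as workspace:
    manager = ToyAuxManager(workspace)
    outer = manager.create()
    with manager.scope():
        inner = manager.create()
        inner_existed_in_scope = os.path.exists(inner)
    # Back in the outer scope: the nested file is gone, the outer one remains.
    outer_survived = os.path.exists(outer)
    inner_removed = not os.path.exists(inner)
```

In this sketch, a file lives exactly as long as the scope that created it, so disk usage is bounded by the files of the scopes that are still open rather than by everything produced during the build.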
Reference
class langumo.utils.auxiliary.AuxiliaryFile(name)
An auxiliary file object.
Note
Creating this class directly, without going through AuxiliaryFileManager, is not recommended.
Parameters
name (str) – auxiliary file name.
class langumo.utils.auxiliary.AuxiliaryFileManager(parent)
Auxiliary file manager.
Parameters
parent (str) – parent workspace directory that will contain the auxiliary files.
clear()
Remove unused auxiliary files.
AuxiliaryFileManager automatically tracks unused auxiliary files and removes them to conserve disk space. The manager treats an auxiliary file as unused, and therefore unnecessary, when it is not locked and has a lower auxiliary-scope level, that is, it was not created in the current scope. To preserve a file, use lock and synchronize.
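The "unused" rule can be written as a small predicate. The following is only a sketch under stated assumptions: FileRecord and is_unused are hypothetical names that do not come from langumo, and a "lower" auxiliary-scope level is assumed to mean a more deeply nested scope, represented as a larger depth value.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    # Hypothetical bookkeeping entry; not langumo's internal structure.
    name: str
    depth: int            # scope depth at which the file is currently tracked
    locked: bool = False  # a locked file is never treated as unused


def is_unused(record, current_depth):
    # Unused = not locked AND tracked at a deeper scope than the current one.
    return (not record.locked) and record.depth > current_depth


records = [
    FileRecord("a.tmp", depth=0),               # created in the current scope
    FileRecord("b.tmp", depth=1),               # left over from a nested scope
    FileRecord("c.tmp", depth=1, locked=True),  # nested, but locked
]
removable = [r.name for r in records if is_unused(r, current_depth=0)]
```

Only the unlocked file from the nested scope qualifies for removal here; the locked one is kept even though it was created at a deeper level.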
create()
Create a new auxiliary file.
The auxiliary file is usually used as a temporary file. It is created in the parent directory and assigned the current auxiliary level.
Returns
new auxiliary file object.
synchronize(files)
Synchronize auxiliary levels to the current one.
Some files created in a lower auxiliary_scope need to be handled as higher-scope ones. This method sets the auxiliary levels of the given files to the current scope level.
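The effect of synchronizing can be illustrated with an in-memory model. This is a hedged sketch rather than langumo's code: the two functions below, the dictionary of levels, and the file names are all hypothetical, and a "lower" scope is again assumed to be a more deeply nested one with a larger depth value.

```python
def synchronize(levels, names, current_depth):
    """Return a copy of `levels` with the given files moved to `current_depth`.

    `levels` maps a file name to the scope depth it is tracked at. This is a
    hypothetical in-memory model, not langumo's implementation.
    """
    updated = dict(levels)
    for name in names:
        if name in updated:
            updated[name] = current_depth
    return updated


def unused(levels, current_depth):
    # Files tracked deeper than the current scope would be removed by a clean-up.
    return sorted(n for n, d in levels.items() if d > current_depth)


levels = {"merged.tmp": 1, "scratch.tmp": 1}   # both created in a nested scope
before = unused(levels, current_depth=0)
levels = synchronize(levels, ["merged.tmp"], current_depth=0)
after = unused(levels, current_depth=0)
```

Before synchronizing, both files would be cleaned up in the outer scope. Afterwards, only the scratch file would be, because the merged file is now tracked at the current level.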