FIELD: computer engineering.
SUBSTANCE: the invention relates to computer engineering. The technical result is achieved by a method of hashing files for fast duplicate search, comprising the following stages: a protocol parser receives original files and their metadata from external sources; the metadata and original files are stored in a database; the database is scanned for files whose hash has not yet been calculated; each such file is hashed, wherein a file smaller than a given size is hashed in full, and for a file larger than the given size its first or last blocks of the given size are hashed; the file size is added to the resulting hash, and the hash is stored in the database with a reference to the file; specific files are then selected from the database according to their hashes, and duplicates are searched for among them.
EFFECT: faster and more accurate search for duplicate files and reduced load on the CPU and data storage system.
3 cl, 1 dwg
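The hashing and duplicate-search stages in SUBSTANCE can be illustrated with a minimal Python sketch. It relies on several assumptions not found in the abstract. An in-memory dict stands in for the database. SHA-256 stands in for the unspecified hash function. The 1 MiB threshold `BLOCK_SIZE` is hypothetical, and the names `partial_hash`, `hash_pending` and `find_duplicates` are illustrative, not taken from the patent. "File size is added to the obtained hash" is read here as appending a `:size` suffix to the hex digest, which is only one possible interpretation.

```python
import hashlib
from collections import defaultdict

# Hypothetical threshold; the abstract only says "a given size".
BLOCK_SIZE = 1024 * 1024  # 1 MiB, assumed


def partial_hash(data: bytes, block_size: int = BLOCK_SIZE, use_last: bool = False) -> str:
    """Hash a small file completely, or only the first (or last) block of a large one.

    SHA-256 is an assumption; the abstract does not name a hash function.
    """
    if len(data) <= block_size:
        chunk = data  # small file: hash it completely
    elif use_last:
        chunk = data[-block_size:]  # large file: hash only the last block
    else:
        chunk = data[:block_size]  # large file: hash only the first block
    digest = hashlib.sha256(chunk).hexdigest()
    # Append the file size so files sharing a block but differing in length get different keys.
    return f"{digest}:{len(data)}"


def hash_pending(db: dict) -> None:
    """Compute hashes only for records that do not have one yet (hypothetical schema)."""
    for record in db.values():
        if record.get("hash") is None:
            record["hash"] = partial_hash(record["content"])


def find_duplicates(db: dict) -> list:
    """Group file names by stored hash; groups of two or more are duplicates."""
    groups = defaultdict(list)
    for name, record in db.items():
        groups[record["hash"]].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    db = {
        "a.bin": {"content": b"x" * 10, "hash": None},
        "b.bin": {"content": b"x" * 10, "hash": None},
        "c.bin": {"content": b"x" * 11, "hash": None},
    }
    hash_pending(db)
    print(find_duplicates(db))  # [['a.bin', 'b.bin']]
```

For large files only one block is hashed. Two files with the same key may therefore still differ outside that block, so in this sketch a match is best treated as a duplicate candidate. The abstract does not say whether the method confirms matches with a full comparison.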
Dates
2024-08-27—Published
2024-02-29—Filed