What is the TrID database?
The TrID database contains file signatures for over 10,000 file types. A file signature can be used to determine the file format, for example whether a file is a .DOC file or a .TXT file.
The database is constantly updated and expanded. Because the database is so large, an unknown file can usually be identified very accurately. An 'Online TrID File Identifier' is also available for this purpose: it reads in the file to be analyzed and compares it with the database. The results are listed in descending order of probability.
TrID was launched in 2004 by Marco Pontello (Italy).
Identifying the file type by file signature is more accurate than by file extension. Please also read the next paragraph:
How to determine the file format
The format of a file, i.e. the file type, can be determined from the following three characteristics:
- File name (file extension)
Most often, the file format is determined by the file extension, which is the identifier after the last dot of the file name. Since file name extensions were limited to three characters in old operating systems, even today most file formats are identified by a one- to three-character extension, e.g. .H or .DOC.
Determining the file type from the file extension is not always accurate: different formats can share the same extension, a user can accidentally rename a file, and current versions of Windows hide known file extensions by default, so a virus named 'photo.jpg.exe' can appear as 'photo.jpg'.
- File content (file signature)
Files of a given format often start with the same string. If you open e.g. a PDF file with a text editor, the file starts with '%PDF-1'. The first characters in a PNG file, on the other hand, are '‰PNG'. Because such fixed character strings identify the file type much more reliably than the file extension, they are called file signatures or 'magic numbers'.
The TrID database contains file signatures and their associated file types.
- Metadata (MIME type)
Determining the file type from the MIME type (Multipurpose Internet Mail Extensions) is the most accurate method, but this metadata must be transmitted separately in the header.
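The difference between the first two characteristics can be sketched in a few lines of Python. This is only an illustration: the `SIGNATURES` table and both function names are assumptions made for this example, and the table holds just the two signatures mentioned above, not data from the TrID database.

```python
import os

# Hypothetical, deliberately tiny signature table (not taken from TrID).
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG": "png",
}


def type_by_extension(filename):
    """Return the lower-case text after the last dot, or '' if there is none."""
    return os.path.splitext(filename)[1].lstrip(".").lower()


def type_by_signature(data):
    """Return the type whose signature matches the start of data, or None."""
    for magic, file_type in SIGNATURES.items():
        if data.startswith(magic):
            return file_type
    return None


# Only the part after the last dot counts as the extension.
print(type_by_extension("photo.jpg.exe"))

# The content decides the type, whatever the file happens to be called.
print(type_by_signature(b"%PDF-1.4 example content"))
```

A real identifier would read the first bytes of the file from disk and compare them against a far larger set of signatures.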
How can file signatures be displayed?
File signatures are data used to identify or verify a file type. Such signatures are also known as magic numbers and can be represented in one of the following formats:
- HEX: each byte (8 bits) is written as a two-digit hexadecimal number, using the 16 digits 0, ..., 9, A, ..., F
If you open a PDF file in a HEX editor, it begins with the byte sequence 25 50 44 46 2D 31 2E.
- ISO 8859-1: text in 8-bit character encoding
If you open a PDF file in a normal text editor, it starts with the string '%PDF-1.' (including the trailing dot).
- ASCII: 7-bit character encoding whose printable characters (codes 32 to 126) map exactly to the lower range of ISO 8859-1
If you open a PDF file in an ASCII text editor, it likewise starts with the string '%PDF-1.' (including the trailing dot).
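The three representations describe the same bytes, which a short Python sketch can make visible. The helper name `views` is an assumption chosen for this example; the byte sequence is the PDF signature quoted above.

```python
def views(signature):
    """Return the HEX, ISO 8859-1 and ASCII representations of a byte string."""
    return {
        "HEX": signature.hex(" ").upper(),
        "ISO 8859-1": signature.decode("iso-8859-1"),
        # Bytes outside the 7-bit range cannot be shown in ASCII and are replaced.
        "ASCII": signature.decode("ascii", errors="replace"),
    }


pdf_magic = bytes.fromhex("25 50 44 46 2D 31 2E")

for name, text in views(pdf_magic).items():
    print(f"{name}: {text}")
# HEX: 25 50 44 46 2D 31 2E
# ISO 8859-1: %PDF-1.
# ASCII: %PDF-1.
```

For the PDF signature the two text views are identical because every byte lies in the printable ASCII range. A signature containing bytes above 127 would only be fully readable in the HEX view.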