ImgArchive Wiki : Metadata Types

Metadata types describe how the data is generated, and are divided into levels. Each level adds greater descriptive value at the expense of additional effort, which may be offset by the ease of finding images in the archive. Automatically generated metadata is the lowest level; it lays a base for all higher levels and contains the boilerplate information on the image. Hand-edited information is the highest level and takes the most time to generate.

This information can come from five sources:

Base template information
The image file information
Exif standard information
Tool generated information
Hand edited information

Base template information

This is the default information held in the base template, which is stored in the templates folder. It covers values that almost all images in the archive will share. For example, if you are the only photographer submitting to the archive, your name and copyright will be in the base template.

The image file information

This information is generated from the attributes of the file. For example the file will have the following information:

Creation date and time
Modified date and time
A file size
A file name
A file path in the archive
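The attributes listed above can all be read from the file system. The following is a minimal Python sketch, not ImgArchive's own code; the function name `file_attributes` and the `archive_root` parameter are hypothetical names chosen for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path


def file_attributes(image_path, archive_root):
    """Collect the basic file attributes for one image (illustrative only)."""
    path = Path(image_path)
    info = path.stat()
    return {
        # st_ctime is the creation time on Windows, but the time of the
        # last metadata change on most Unix systems.
        "created": datetime.fromtimestamp(info.st_ctime, tz=timezone.utc),
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "size_bytes": info.st_size,
        "file_name": path.name,
        # Path of the image relative to an assumed archive root folder.
        "archive_path": path.resolve()
        .relative_to(Path(archive_root).resolve())
        .as_posix(),
    }
```

Note that a portable "creation date" is not available on every platform, which is one reason the date stored by the camera in the Exif data is often more reliable.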

In addition, ImgArchive uses the image file to generate the CRC and MD5 checksums. These checksums are used to verify that the image has not changed since the checksum was made. For example, if a checksum is created from an image and the image is then changed (even by one byte), a new checksum taken from the changed image will not match the old one, indicating that the image has changed.
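The effect of a small change on both checksums can be seen in a short Python sketch. It assumes CRC-32 (via the standard `zlib` module) as a stand-in, since the exact CRC variant ImgArchive uses is not stated here; the function name `fingerprint` is hypothetical.

```python
import hashlib
import zlib


def fingerprint(data: bytes) -> tuple[int, str]:
    """Return (CRC-32, MD5 hex digest) for the given image bytes."""
    return zlib.crc32(data), hashlib.md5(data).hexdigest()


original = bytes(range(256)) * 4   # stand-in for the bytes of an image file
altered = bytearray(original)
altered[10] ^= 0x01                # change a single bit in one byte

crc_a, md5_a = fingerprint(original)
crc_b, md5_b = fingerprint(bytes(altered))
print(crc_a != crc_b, md5_a != md5_b)   # prints: True True
```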

The checksums serve a number of purposes in the archive, including checking for accidental modification of images and testing whether one image differs from another. One important use is preventing duplicates of an image being added to the archive, by maintaining a list of checksums of images already in the archive. It is very unlikely that two different images will have the same checksum (1 in 14 million in the case of CRC). The MD5 checksum is even less likely to produce the same checksum for different images.
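Duplicate prevention by a list of checksums might be sketched as follows. This is an illustration only: the class `ChecksumIndex` is hypothetical, it keeps its list in memory rather than in the archive, and it again assumes CRC-32 for the CRC.

```python
import hashlib
import zlib


class ChecksumIndex:
    """Hypothetical in-memory list of checksums of images already archived."""

    def __init__(self):
        self._seen = {}  # (crc32, md5 hex digest) -> name of first image stored

    def add(self, name: str, data: bytes) -> bool:
        """Register an image; return False if the same content is already known."""
        key = (zlib.crc32(data), hashlib.md5(data).hexdigest())
        if key in self._seen:
            return False
        self._seen[key] = name
        return True


index = ChecksumIndex()
print(index.add("IMG_0001.jpg", b"first image bytes"))       # True
print(index.add("copy_of_0001.jpg", b"first image bytes"))   # False
print(index.add("IMG_0002.jpg", b"second image bytes"))      # True
```

Using both checksums together as the key means two different images would have to collide on both values before one was wrongly rejected as a duplicate.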

A form of UUID (Universally Unique Identifier) is also generated for each image. This is a unique identifier generated from the time and date in seconds plus other random data found on the computer on which the identifier is generated.

All this information is automatically saved as part of the image's asset properties.

Exif standard information

Exif is a photographic standard that most, if not all, camera manufacturers use to store information about both the image and the camera that captured it within the image file. An Exif reader is required to read this information. Exif is normally stored in JPG images; however, other formats can also carry Exif information.

ImgArchive can read JPG images and generate the basic Exif information. To read the complete Exif information from formats other than JPG, an external reader will need to be used. ImgArchive can be integrated with a number of external readers such as exiftool. When ImgArchive fails to read the Exif in an image file, it will use the external reader (if installed). Optionally, it can always use the external Exif tool.
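The fallback behaviour described above can be sketched in Python. This is not ImgArchive's implementation: the function names, the `builtin_reader` callable and the `always_external` flag are hypothetical. The only real external interface used is the exiftool command line with its `-json` option, which prints a JSON array with one object per file.

```python
import json
import shutil
import subprocess


def read_with_exiftool(path):
    """Run the exiftool program, if installed, and return its tags as a dict."""
    if shutil.which("exiftool") is None:
        return None                      # external reader not installed
    try:
        result = subprocess.run(
            ["exiftool", "-json", str(path)],
            capture_output=True, text=True, check=True,
        )
    except subprocess.CalledProcessError:
        return None                      # exiftool could not read the file
    return json.loads(result.stdout)[0]


def read_exif(path, builtin_reader, external_reader=read_with_exiftool,
              always_external=False):
    """Use the built-in reader first, unless always_external is set."""
    if not always_external:
        try:
            data = builtin_reader(path)
        except Exception:
            data = None                  # any built-in failure triggers the fallback
        if data:
            return data
    return external_reader(path) if external_reader else None
```

Passing the readers in as callables keeps the fallback rule separate from any particular Exif library, so either reader can be replaced without touching the decision logic.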

Tool generated information

ImgArchive will automatically generate some metadata for identification purposes. This includes the following:

The CRC checksum of the image
The MD5 checksum of the image
A Unique Identifier of the image
A sequence identifier of the image

The first two items fingerprint the image. The CRC is a number generated by reading each byte of data in the image and computing a value that depends on the contents of the image. If one byte is changed in the image then the checksum will not match. There are about 14 million combinations. The second fingerprint is an MD5 digest. This kind of digest has historically been used by security products to validate media, although, as described below, it should now be relied on only for detecting accidental changes.

The MD5 algorithm is a widely used hash function producing a 128-bit hash value. Although MD5 was initially designed to be used as a cryptographic hash function, it has been found to suffer from extensive vulnerabilities. It can still be used as a checksum to verify data integrity, but only against unintentional corruption.

Like most hash functions, MD5 is neither encryption nor encoding. It can be reversed by brute-force attack, but such an attack takes a large amount of time and effort, so for the purpose of fingerprinting images in the archive MD5 is more than adequate.

The last two items identify the image. The first is a unique number based on the time and the machine on which the number was generated. Because the time is unique and the machine's network adapter contains a unique number, combining these to generate a long number sequence almost guarantees a unique result. These numbers are normally called UUIDs and are used to uniquely identify digital media. The second is a sequence number used within the archive. If the database is used then this number is generated by the database; otherwise an increasing number held in a configuration file is used. This number will be unique within the archive.
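Both identifiers can be illustrated with a short Python sketch. It assumes the unique identifier resembles a version 1 UUID (built from a timestamp and the host's node id, normally the network adapter address), and it stands in for the configuration file with a plain text file holding a single number. The function names are hypothetical.

```python
import uuid
from pathlib import Path


def new_image_uuid() -> str:
    """Version 1 UUID: derived from the current time and the host's node id."""
    return str(uuid.uuid1())


def next_sequence_number(counter_file) -> int:
    """Read, increment and store a counter kept in a plain text file."""
    path = Path(counter_file)
    current = int(path.read_text().strip()) if path.exists() else 0
    path.write_text(str(current + 1))
    return current + 1
```

This counter is not safe if two imports run at the same time; a database-generated sequence avoids that problem, which may be why the database is preferred when it is available.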

Bulk entry metadata

This is second-level template data entry, applied to a set of images. Most imports into the archive will be a number of images taken in the same context, such as a photo shoot of a subject or location, so most of the images will share the same information. ImgArchive uses a second-level template process to apply this metadata. Before the images are imported, all related images need to be placed in the same folder. When that folder is imported, the template to be associated with the images in that folder can be specified. All the images will then contain the information in the template at the end of the import process.
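As a rough illustration of the idea, the following Python sketch gives every image in a folder the same set of fields. It is not ImgArchive's import code: the function name, the list of image suffixes and the use of plain dictionaries for template fields are all assumptions.

```python
from pathlib import Path

# Assumed set of image file suffixes, for illustration only.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def apply_bulk_template(folder, base_fields, bulk_fields):
    """Give each image in the folder the base fields overlaid with the bulk fields."""
    merged = {**base_fields, **bulk_fields}   # bulk values replace base values
    return {
        p.name: dict(merged)
        for p in sorted(Path(folder).iterdir())
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    }
```

Each image receives its own copy of the merged fields, so later per-image edits do not affect the other images from the same import.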

See Metadata Template

Unique image metadata

This is the most time-consuming level, as it is applied to individual images, normally by you, the user of the archive. This sort of information may uniquely identify the image to you. For example, in an image description you may put “James blowing out candles at his birthday at 5 years old.” A description like this makes the image quite easy to find, but it takes time to write. The importance of the image may make it worth doing.

How ImgArchive handles Metadata

ImgArchive uses a number of different methods to generate and manipulate metadata, from the base template and bulk metadata up to the top-level image-specific metadata. Starting at the bottom, the template-based data is entered using cascading templates. These templates are contained in the templates folder. The default information is placed in the first, base template. The sort of information that fits here is your name in the Author field and your copyright information. This makes sure your name and copyright are on all images in the archive.

To create this file, find the file called template.dat. This contains all the fields ImgArchive will read; however, each field is commented out. Copy this file to a file called base.dat, open it, and uncomment the options you need. For example, the author field will look like the following:

Author = Your Name

To update it with your correct find The next level may be the general information about

Camera Generated Metadata

When the camera captures an image it also stores information about that image, which can be quite detailed. However, most of the important details fit in a short set of data: the image size, its orientation, the date and time the image was taken, the ISO, aperture, exposure, whether a flash was used and, in some circumstances, the GPS location. This information is stored in a standard called Exif. ImgArchive will read the basic Exif information. However, it can also work with other tools to provide a more detailed set of Exif data if required.

Metadata Templates

Metadata templates are used to create basic information to be attached to images. For example, you will be the author of the images you take and modify, and you will most probably also be the copyright owner. However, a friend may have asked you to archive some images for them, in which case the author and copyright will be theirs. To apply the correct keywords, a new template carrying the friend's name can be added that is read after the one applying your name; their name will then be the last keyword substitution and so the one that is kept.

This information is normally cascaded, i.e. base data may be replaced with more targeted data for the images to which the templates relate. The templates are text files that contain metadata keywords and the values to be placed in those keywords. One template may include other templates. Templates are read from top to bottom, and any included template is read when its include statement is encountered.

A primary template will be associated with an image or images. This is the first template to be read, and processing continues until it has been read from top to bottom. An in-memory template is created at the end of the session, to be used in further metadata processing, namely the addition of Exif metadata and user-generated metadata.
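The cascading behaviour can be sketched in Python. The file syntax used here is an assumption made for illustration: `Keyword = value` lines (as in the Author example), `#` as the comment marker, and `include <file>` as the include statement. ImgArchive's real template syntax may differ, and `load_template` is a hypothetical name.

```python
from pathlib import Path


def load_template(path, fields=None, _active=None):
    """Read one template top to bottom, following includes where they occur."""
    fields = {} if fields is None else fields
    _active = set() if _active is None else _active
    path = Path(path).resolve()
    if path in _active:
        raise ValueError(f"circular include: {path}")
    _active.add(path)
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):        # assumed comment marker
            continue
        if line.lower().startswith("include "):     # assumed include syntax
            # Included templates are looked up next to the including file.
            load_template(path.parent / line[8:].strip(), fields, _active)
        elif "=" in line:
            key, value = line.split("=", 1)
            fields[key.strip()] = value.strip()     # a later value replaces an earlier one
    _active.discard(path)
    return fields
```

For example, if a hypothetical friend.dat begins with `include base.dat` and then sets `Author = Friend Name`, the result keeps the copyright from base.dat but carries the friend's name as author, because that line is read last.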

The in-memory template is a class that contains the metadata to be substituted during the template process. As each template is read, its contents are placed into this class. The class is then used for further substitutions by the Exif reading and the user-defined keyword substitutions.