ImgArchive Wiki : Importing Images into the Archive

To add images to the archive you must import them. Image Archive lets you import images in several ways, to support the growing number of options cameras offer for shooting and manipulating images while they are still on the camera's memory card.

Most of today’s cameras can shoot a number of formats, but these are grouped into two types, RAW and Picture formats. A growing number of cameras can now shoot a mixture.

RAW images come directly from the camera's image array with little or no processing by the camera. They are normally in a proprietary format belonging to the camera maker. Picture formats are those that applications can view directly because they follow a widely recognized industry standard, for example JPG, TIFF, PNG and BMP.

Supporting RAW and Picture formats.

As digital photography evolves, new file formats will be introduced. To support them, Image Archive keeps all the file format extensions in a configuration file, which it loads when it starts. New formats can be added with the Allow command; see Image Types allowed into the archive.

Considerations to be made when importing images

When you add an image or images, doing some activities now will make things simpler later. For example, if a batch of images is from a holiday and all taken at the same location, now is the time to tag them with the same information rather than having to tag them individually later.

Adding Metadata

As part of the import process, bulk metadata information can be added. This can take the form:

  • Process information

  • File information

  • EXIF information

  • User added information

Process information

This data is added as part of the import process. An example is the image sequence number, which is given to each image to make it indexable within the archive. It is generated at import time.

  • Sequence ID

  • UUID
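
As a rough illustration of the two process fields, the sketch below hands out a sequence ID and a UUID for each imported image. The function name, the dictionary keys and the choice to start counting at 1 are assumptions made for the example; how ImgArchive actually stores and continues its sequence is not described here.

```python
import itertools
import uuid


def make_process_info(counter):
    """Return the process information for one newly imported image.

    `counter` is any iterator of increasing integers (hypothetical; a real
    archive would persist the last value between imports).
    """
    return {
        "sequence_id": next(counter),   # makes the image indexable
        "uuid": str(uuid.uuid4()),      # random, globally unique identifier
    }


# Example: a counter starting at 1 (an assumption for this sketch).
sequence = itertools.count(1)
first = make_process_info(sequence)
second = make_process_info(sequence)
```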

File information

The image is contained within a file. This file has important properties such as the file name, size etc. These file attributes are held as part of the metadata. In addition, the contents of the file provide the uniqueness of the image. The attributes are as follows:

  • File Name

  • Original Name

  • File Path

  • Media Type

  • MD5

  • CRC

  • File Size

  • Date Create

  • Date Modified

  • Date Added

These file attributes are added to the image's metadata. Note that the MD5 and CRC are generated from the contents of the file by the import process.
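
To make the file attributes concrete, here is a minimal sketch that collects some of them for one file and derives the MD5 and CRC from the file contents. It assumes the CRC is a CRC-32 and uses hypothetical key names; the field layout ImgArchive really uses may differ.

```python
import hashlib
import zlib
from datetime import datetime, timezone
from pathlib import Path


def file_information(path):
    """Collect file attributes and content checksums for one image file."""
    p = Path(path)
    md5 = hashlib.md5()
    crc = 0
    # Read in chunks so that large RAW files are not loaded into memory at once.
    with p.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            md5.update(chunk)
            crc = zlib.crc32(chunk, crc)
    stat = p.stat()
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "file_name": p.name,
        "file_path": str(p.parent),
        "file_size": stat.st_size,
        "md5": md5.hexdigest(),
        "crc": format(crc & 0xFFFFFFFF, "08x"),  # assumed to be CRC-32
        "date_modified": modified.isoformat(),
        "date_added": datetime.now(timezone.utc).isoformat(),
    }
```

Because both checksums depend only on the bytes in the file, two files with different names but identical contents produce the same values.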

EXIF information

This data is generated when the image is captured in the camera. The EXIF data contains a lot of information about the image.

User added information

This information is added by the person importing the image or images. This takes the form of a template of information that can be applied to all the images as a batch. This is normally referred to as Bulk Metadata.
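
The idea of a template applied to a whole batch can be sketched as follows. The field names are invented for the example, and the rule that a value already set on an individual image wins over the template is an assumption, not documented ImgArchive behaviour.

```python
def apply_bulk_metadata(images, template):
    """Return new metadata records with the template applied to every image.

    Values already present on an image take precedence over the template
    (an assumption made for this sketch). The inputs are not modified.
    """
    return [{**template, **image} for image in images]


# Example batch: every image from the same holiday gets the same tags.
batch = [{"file_name": "DSC_0001.jpg"}, {"file_name": "DSC_0002.jpg"}]
tagged = apply_bulk_metadata(batch, {"event": "Holiday", "location": "Paris"})
```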

File Naming

The Import Work flow

The Import Work flow is how you put images into the archive. It is a multi-stage process that follows a standard sequence of steps.

It can be split into two main parts:

  • Identifying images

  • Importing images into the archive

Most of today’s cameras can shoot a number of formats, but these are grouped into two types, RAW and Picture formats.

RAW images come directly from the camera's image array with little or no processing by the camera. They are normally in a proprietary format belonging to the camera maker. Picture formats are those that applications can view directly because they follow a widely recognized industry standard, for example JPG, TIFF, PNG and BMP.

A growing number of cameras can now shoot both RAW and Picture formats. In addition, a camera can save the same image to its memory card twice: as a RAW image from the camera's image array and as a Picture format image, normally a JPG.

Supporting RAW and Picture formats.

During the scan process ImgArchive must identify images and reject non-images. It does this by reading each file's extension.

As digital photography evolves, new file formats will be introduced. To support them, Image Archive keeps all the file format extensions in a configuration file. When Image Archive starts it loads this list. As each file is scanned, it is included in the import if its extension matches an image format; otherwise it is rejected.

A more detailed description can be found in Image Types allowed into the archive.
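
The extension check described above might look something like the sketch below. The one-entry-per-line "extension,type" layout is purely an assumption for illustration; the real configuration file format is covered in Image Types allowed into the archive.

```python
from pathlib import Path
from typing import Optional


def parse_extension_list(text):
    """Parse lines such as 'nef,RAW' into {extension: type} (assumed format)."""
    allowed = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        ext, _, kind = line.partition(",")
        ext = ext.strip().lower().lstrip(".")
        if ext:
            allowed[ext] = kind.strip() or "Picture"
    return allowed


def classify(path, allowed) -> Optional[str]:
    """Return the image type for a file, or None if it should be rejected."""
    ext = Path(path).suffix.lower().lstrip(".")
    return allowed.get(ext)


SAMPLE_CONFIG = "jpg,Picture\ntiff,Picture\nnef,RAW\n"
allowed = parse_extension_list(SAMPLE_CONFIG)
```

With this list, `classify("DSC_675455.NEF", allowed)` reports a RAW image, while a file such as `notes.txt` returns `None` and would be left out of the import.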

Image groupings

When a camera generates both a RAW and a Picture image, it is useful to group the two together under the same number. ImgArchive will try to do this when scanning the folders containing the images. However, at the moment ImgArchive needs both images to be in the same folder. For example, if a folder contains files called DSC_675455.nef and DSC_675455.jpg, ImgArchive will pair these images together.
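
One way to picture the pairing rule is to group files by folder and by file name without the extension. The sketch below does that; the two extension sets are small hypothetical samples, and the matching rules ImgArchive really applies may be stricter or looser.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical samples; the real lists come from the configuration file.
RAW_EXTENSIONS = {".nef", ".cr2", ".arw"}
PICTURE_EXTENSIONS = {".jpg", ".jpeg", ".tif", ".tiff", ".png", ".bmp"}


def pair_images(paths):
    """Return (raw, picture) pairs that share a folder and a base name."""
    groups = defaultdict(list)
    for path in map(Path, paths):
        groups[(path.parent, path.stem)].append(path)

    pairs = []
    for members in groups.values():
        raws = [m for m in members if m.suffix.lower() in RAW_EXTENSIONS]
        pictures = [m for m in members if m.suffix.lower() in PICTURE_EXTENSIONS]
        if raws and pictures:
            pairs.append((raws[0], pictures[0]))
    return pairs
```

Because the folder is part of the grouping key, a JPG with the same name in a different folder is not paired, which mirrors the current same-folder limitation.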

Part 1. - Identifying images

The common steps of this work flow are as follows:

  1. Check that the file is an image.

  2. If the file is an image, make an entry for it.

Part 2. - Importing images into the archive

Once the images are identified they can be imported. The images to import can be taken from a Journal file or from the output of Part 1.

Importing images into the Archive

This is the most common and most time-consuming work flow of the set. Its common steps are as follows:

  1. Check for duplication.

  2. Rename the image files.

  3. Copy from the camera to the master archive.

  4. Copy to backup one.

  5. Copy to backup two.

  6. Check that the copies of the image on the camera, in the master archive and on the two backups are identical.

  7. Erase the images from the camera (normally by re-formatting the camera card).
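
The copy and integrity steps in the list above can be sketched roughly as follows. The function and parameter names are hypothetical, MD5 is used for the comparison as an assumption, and the sketch deliberately stops before erasing anything from the camera card.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path):
    """Return the MD5 hex digest of a file's contents."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def import_image(source, new_name, destinations):
    """Copy one image to every destination folder and verify each copy.

    `destinations` would be the master archive and the two backups.
    Raises OSError if any copy differs from the file on the camera card.
    """
    source = Path(source)
    expected = md5_of(source)
    copies = []
    for root in map(Path, destinations):
        root.mkdir(parents=True, exist_ok=True)
        target = root / new_name
        shutil.copy2(source, target)  # copy2 also keeps file timestamps
        copies.append(target)
    for copy in copies:
        if md5_of(copy) != expected:
            raise OSError(f"integrity check failed for {copy}")
    return copies
```

Only once every copy has passed the check is it safe to carry out the last step and clear the card.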

Deduplication

Deduplication is the process by which images are tested for uniqueness. If an image is not unique, it is a duplicate of an image already in the archive. For example, some images may have already been imported as part of an earlier set.

We do not want the same image duplicated in the archive. To prevent this, a CRC and an MD5 checksum are generated for every image. When an image is a candidate for inclusion in the archive, its checksums are compared with those of the images already there. If they match, the image is a duplicate.
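
A minimal sketch of this checksum comparison is shown below. It treats the pair (CRC-32, MD5) as the fingerprint of an image and keeps the fingerprints of archived images in a set; both choices are assumptions for illustration rather than a description of ImgArchive's internal index.

```python
import hashlib
import zlib


def fingerprint(data: bytes):
    """Return the (CRC-32, MD5) pair computed from an image's contents."""
    return (zlib.crc32(data) & 0xFFFFFFFF, hashlib.md5(data).hexdigest())


def split_duplicates(candidates, archive_index):
    """Separate candidate images into unique ones and duplicates.

    candidates:    dict mapping file name to file contents (bytes)
    archive_index: set of fingerprints already in the archive; it is
                   updated as unique images are accepted.
    """
    unique, duplicates = [], []
    for name, data in candidates.items():
        key = fingerprint(data)
        if key in archive_index:
            duplicates.append(name)
        else:
            archive_index.add(key)
            unique.append(name)
    return unique, duplicates
```

Note that the file name plays no part in the test: two files with different names but identical contents are still reported as duplicates.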

The probability of two different images accidentally having the same checksums is approximately 1.47 × 10^-29.

That is roughly 1.47 chances in 10^29, where 10^29 is a 1 followed by 29 zeros. For comparison, a billion is a 1 followed by only 9 zeros.

Extracting EXIF Information

Historically, digital cameras shot JPG images and embedded information about the image within the image file. However, as cameras progressed, manufacturers started using proprietary RAW image formats. This means that a reader which understands the proprietary format is needed to read the EXIF information out of RAW images.

ImgArchive contains a basic EXIF reader for picture format images such as JPG. However, it cannot read most RAW formatted images.

To address this problem ImgArchive provides the capability of installing an external EXIF reader; see External EXIF Reader.

External readers such as exiftool can read these proprietary RAW image formats.
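
To show what calling an external reader can involve, the sketch below runs exiftool with its `-j` option, which prints the tags as JSON, and turns the result into a dictionary. The helper names are hypothetical and this is not how ImgArchive itself is configured to call the reader; for that, see External EXIF Reader.

```python
import json
import subprocess


def exiftool_command(image_path, executable="exiftool"):
    """Build the command line that asks exiftool for JSON output."""
    return [executable, "-j", str(image_path)]


def parse_exiftool_output(output):
    """exiftool -j prints a JSON list with one object per file."""
    records = json.loads(output)
    return records[0] if records else {}


def read_exif(image_path, executable="exiftool"):
    """Run exiftool on one image (requires exiftool to be installed)."""
    result = subprocess.run(
        exiftool_command(image_path, executable),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_exiftool_output(result.stdout)
```

Keeping the parsing separate from the call makes it possible to check the parsing on a machine where exiftool is not installed.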

Image file name clash problem

When a camera creates images on its memory card, the file names will normally conform to the DCIM (Digital Camera IMages) standard. As part of the standard, the camera numbers the image files in sequence, which makes the file names on the memory card unique. However, each time a new empty memory card is inserted the number sequence restarts, so when images are imported into the archive their file names may be the same as earlier ones. If an image file had the same name as a previously imported image taken on the same day, it would over-write the previous one.

For example, each time the memory card is used the first image file name will be DSC0000001.jpg. So there will always be a file name of DSC0000001.jpg each time the card is re-used.

ImgArchive prevents this from happening by spotting the clash and re-naming the second instance of the same name.
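
The clash handling can be pictured with the following sketch, which looks for a free name in the target folder before anything is copied. Appending "_1", "_2" and so on is an invented naming scheme used only for illustration; the names ImgArchive actually generates may look different.

```python
from pathlib import Path


def resolve_name_clash(folder, file_name):
    """Return a path in `folder` that does not clash with an existing file.

    If `file_name` is already taken, a numeric suffix is added before the
    extension (hypothetical scheme) until a free name is found.
    """
    folder = Path(folder)
    candidate = folder / file_name
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while candidate.exists():
        candidate = folder / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate
```

With this scheme, a second DSC0000001.jpg arriving in the same folder would be stored as DSC0000001_1.jpg instead of over-writing the first.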

The Import Work flow - This is the work flow for getting images into the archive.