To add images to the archive you must Import them. Image Archive offers several ways to import images, to support the growing number of options cameras provide for shooting images and manipulating them while they are still on the camera’s memory card.
Most of today’s cameras can shoot in a number of formats, which fall into two groups: RAW and Picture formats. A growing number of cameras can now shoot a mixture of both.
RAW images come directly from the camera’s image array with little or no processing by the camera. They are normally in a proprietary format belonging to the camera maker. Picture formats are those that applications can view directly because they follow widely recognized industry standards, for example JPG, TIFF, PNG and BMP.
Supporting RAW and Picture formats.
As digital photography evolves, new file formats will be introduced. To support them, Image Archive keeps all the supported file format extensions in a configuration file, which it loads when it starts. New formats can be added to this list with the Allow command; see Image Types allowed into the archive.
Considerations to be made when importing images
Some work done while you add images will make things simpler later. For example, if a batch of images was taken on holiday, all at the same location, now is the time to tag them with the same information rather than having to tag each one individually later.
Adding Metadata
As part of the import process, metadata can be added in bulk. It takes the following forms:
Process information
File information
EXIF information
User added information
Process information
This data is added as part of the import process. An example is the image sequence number: each image is given a sequence number, generated at import time, which makes it indexable within the archive.
Sequence ID
UUID
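As a rough illustration of how process information could be generated at import time, the sketch below hands out consecutive sequence IDs and attaches a UUID to each image. It assumes the archive remembers the last sequence number it used and that a random (version 4) UUID is acceptable; `SequenceAllocator`, `process_info` and the dictionary keys are hypothetical names, not ImgArchive's own.

```python
import uuid
from dataclasses import dataclass


@dataclass
class SequenceAllocator:
    """Hypothetical allocator that hands out consecutive sequence IDs."""

    last_id: int = 0  # last sequence number already used in the archive

    def next_id(self) -> int:
        self.last_id += 1
        return self.last_id


def process_info(allocator: SequenceAllocator) -> dict:
    """Build the process information for one image at import time."""
    return {
        "sequence_id": allocator.next_id(),  # makes the image indexable
        "uuid": str(uuid.uuid4()),           # random (version 4) UUID
    }


# Example: two images imported into an archive whose last ID was 41
allocator = SequenceAllocator(last_id=41)
for name in ["DSC_0001.jpg", "DSC_0002.jpg"]:
    print(name, process_info(allocator))
```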
File information
The image is contained within a file. This file has important properties such as its name and size, and these file attributes are held as part of the metadata. In addition, the contents of the file establish the uniqueness of the image. The attributes are as follows:
File Name
Original Name
File Path
Media Type
MD5
CRC
File Size
Date Create
Date Modified
Date Added
These file attributes are added to the image’s metadata. Note that the MD5 and CRC are generated from the contents of the file by the import process. See
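The sketch below shows one way the file attributes listed above could be collected, including the MD5 and CRC computed from the file contents. It assumes "CRC" means CRC-32 (as provided by `zlib.crc32`) and treats the lower-case file extension as the media type; the function name and keys are hypothetical and do not describe ImgArchive's internal code.

```python
import hashlib
import zlib
from datetime import datetime, timezone
from pathlib import Path


def file_attributes(path: str, original_name: str = "") -> dict:
    """Collect file metadata and content checksums for one image file."""
    p = Path(path)
    st = p.stat()
    md5 = hashlib.md5()
    crc = 0
    with p.open("rb") as f:
        # Read in 1 MiB chunks so large RAW files are not held in memory at once
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
            crc = zlib.crc32(chunk, crc)  # running CRC-32 over the contents
    return {
        "file_name": p.name,
        "original_name": original_name or p.name,
        "file_path": str(p.parent),
        "media_type": p.suffix.lstrip(".").lower(),  # assumption: extension
        "md5": md5.hexdigest(),
        "crc": f"{crc:08x}",
        "file_size": st.st_size,
        # st_ctime is creation time on Windows but metadata-change time on Unix
        "date_create": datetime.fromtimestamp(st.st_ctime, timezone.utc),
        "date_modified": datetime.fromtimestamp(st.st_mtime, timezone.utc),
        "date_added": datetime.now(timezone.utc),
    }
```

Because both checksums depend only on the bytes in the file, two files with identical contents get identical MD5 and CRC values even if their names differ.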
EXIF information
This data is generated when the image is captured in the camera. The EXIF data contains a lot of information about the image.
User added information
This information is added by the person importing the images. It takes the form of a template that can be applied to all the images in a batch, and is normally referred to as Bulk Metadata.
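To make the idea of a Bulk Metadata template concrete, here is a minimal sketch that applies one template to every image record in a batch. The merge policy (values already set on an individual image win over the template) is an assumption made for illustration, and `apply_bulk_metadata` and the field names are hypothetical.

```python
from copy import deepcopy


def apply_bulk_metadata(images: list[dict], template: dict) -> list[dict]:
    """Return copies of the image records with the template applied.

    Assumed policy: the template fills in fields, but a value already
    set on an individual image is kept.
    """
    result = []
    for image in images:
        merged = deepcopy(template)  # do not share mutable template values
        merged.update(image)         # per-image values override the template
        result.append(merged)
    return result


# Example: tag a holiday batch with the same location and event
batch = [
    {"file_name": "DSC_0001.jpg"},
    {"file_name": "DSC_0002.jpg", "location": "Harbour"},
]
holiday = {"location": "Beach", "event": "Summer holiday"}
for record in apply_bulk_metadata(batch, holiday):
    print(record)
```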
File Naming
The Import Work flow
The Import Work flow is how you put images into the archive. It is a multi-stage process that follows a standard sequence of steps.
It can be split into two main parts:
Identifying images
Importing images into the archive
A growing number of cameras can now shoot a mixture of RAW and Picture formats. In addition, cameras can save both a RAW image from the camera’s image array and a Picture format image, normally a JPG, of the same shot to the camera’s memory card.
Supporting RAW and Picture formats.
During the scan process, ImgArchive must identify images and reject non-images. It does this by reading each file’s extension.
As digital photography evolves, new file formats will be introduced. To support them, Image Archive keeps all the supported file format extensions in a configuration file, which it loads when it starts. As each file is scanned, if its extension matches an image format the file is included in the import; otherwise it is rejected.
A more detailed description can be found in Image Types allowed into the archive.
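The extension check described above can be sketched as follows. The configuration layout used here (one extension per line, `#` starting a comment) is an assumption for illustration only; the real file format is covered in Image Types allowed into the archive. The function names are hypothetical.

```python
from pathlib import Path


def load_allowed_extensions(config_text: str) -> set[str]:
    """Parse an assumed config layout: one extension per line, '#' comments."""
    allowed = set()
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            allowed.add(line.lstrip(".").lower())
    return allowed


def is_image(path: str, allowed: set[str]) -> bool:
    """Include a file only if its extension is in the allowed list."""
    return Path(path).suffix.lstrip(".").lower() in allowed


CONFIG = """
# Picture formats
jpg
tif
png
# RAW formats
nef
cr2
"""

allowed = load_allowed_extensions(CONFIG)
print(is_image("DSC_0001.NEF", allowed))  # True
print(is_image("notes.txt", allowed))     # False
```

Matching is case-insensitive here because camera cards commonly use upper-case extensions such as `.JPG`.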
Image groupings
When a camera generates both a RAW and a Picture format image of the same shot, it is useful to group the images together under the same number. ImgArchive tries to do this when scanning the folders containing the images. However, at present ImgArchive needs both images to be in the same folder. For example, if a folder contains files called DSC_675455.nef and DSC_675455.jpg, ImgArchive will pair these images together.
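A minimal sketch of this pairing rule: files are grouped by folder and base name, so DSC_675455.nef and DSC_675455.jpg in the same folder end up together, while a file with the same name in another folder does not. The RAW and Picture extension sets are assumptions, and `group_images` is a hypothetical name.

```python
from collections import defaultdict
from pathlib import Path

# Assumed split of extensions into RAW and Picture formats (illustrative only)
RAW_EXTS = {"nef", "cr2", "arw", "dng"}
PICTURE_EXTS = {"jpg", "jpeg", "tif", "tiff", "png"}


def group_images(paths: list[str]) -> dict[tuple[str, str], list[str]]:
    """Group image files that share a folder and a base name."""
    groups = defaultdict(list)
    for path in paths:
        p = Path(path)
        ext = p.suffix.lstrip(".").lower()
        if ext in RAW_EXTS or ext in PICTURE_EXTS:
            # Key on (folder, stem): pairs are only found within one folder
            groups[(str(p.parent), p.stem)].append(path)
    return dict(groups)


files = ["card/DSC_675455.nef", "card/DSC_675455.jpg", "other/DSC_675455.jpg"]
for key, members in group_images(files).items():
    print(key, members)
```

Keying on the folder as well as the name reproduces the current limitation that both images must be in the same folder.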
Part 1. - Identifying images
The work flow’s common steps are as follows:
Check that the file is an image.
If the file is an image, make an entry for it.
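The two Part 1 steps can be sketched as a folder walk that checks each file's extension and records an entry for every image. The extension set and the entry fields (`path`, `size`) are hypothetical; how ImgArchive stores its entries, for example in a Journal file, is not described here and is not modelled.

```python
import os
from pathlib import Path

# Assumed extension list; ImgArchive reads its own from a configuration file
ALLOWED = {"jpg", "jpeg", "tif", "tiff", "png", "bmp", "nef", "cr2"}


def identify_images(root: str, allowed: set[str] = ALLOWED) -> list[dict]:
    """Part 1 sketch: walk a folder tree and make an entry for each image."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit sub-folders in a stable order
        for name in sorted(filenames):
            path = Path(dirpath) / name
            # Step 1: check the file is an image (by its extension)
            if path.suffix.lstrip(".").lower() not in allowed:
                continue  # not an image: rejected
            # Step 2: make an entry for the image (hypothetical fields)
            entries.append({"path": str(path), "size": path.stat().st_size})
    return entries
```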
Part 2. - Importing images into the archive
Once the images have been identified, they can be imported. The images to import can come either from a Journal file or from the output of Part 1.
Importing images into the Archive
This is the most common and most time-consuming work flow of the set. Its common steps are as follows:
Check for duplicates.
Rename the image files.
Copy the images from the camera to the master archive.
Copy them to backup one.
Copy them to backup two.
Check the integrity of the images: the copies on the camera, in the master archive and in the two backups must be identical.
Erase the images from the camera (normally by re-formatting the camera card).
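The copy and integrity-check steps above can be sketched as follows: one source file is copied to the master archive and two backups, and every copy is compared with the source by MD5. This covers only the copy and verify steps (not deduplication or renaming), and the function names are hypothetical, not ImgArchive's implementation.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path) -> str:
    """MD5 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(src: Path, destinations: list[Path]) -> bool:
    """Copy one image to each destination folder, then verify every copy.

    Returns True only if all copies have the same MD5 as the source,
    which is the condition for it being safe to erase the camera card.
    """
    expected = md5_of(src)
    ok = True
    for dest_dir in destinations:
        dest_dir.mkdir(parents=True, exist_ok=True)
        copied = Path(shutil.copy2(src, dest_dir / src.name))  # keeps timestamps
        if md5_of(copied) != expected:
            ok = False  # integrity check failed for this copy
    return ok
```

For example, calling `copy_and_verify` with a card file such as `E:/DCIM/100NIKON/DSC_0001.JPG` and three hypothetical folders (master, backup one, backup two) copies the file three times; the card should only be erased once this returns True for every image.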
Deduplication
Deduplication is the process by which images are tested for uniqueness. If an image is not unique, it is a duplicate of an image already in the archive. For example, some images may already have been imported as part of an earlier set.
We do not want the same image duplicated in the archive. To prevent this, a CRC and an MD5 checksum are calculated for every image. When an image is a candidate for inclusion in the archive, its checksums are compared with those of the images already archived; if they match, the image is a duplicate.
The probability of both checksums accidentally matching for two different images is approximately $1.47 \times 10^{-29}$.
That is a chance of 1.47 in a number written as 1 followed by 29 zeros. For comparison, a billion is written with only 9 zeros.
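A minimal sketch of the duplicate check: each image's contents are reduced to a (CRC-32, MD5) pair, and a candidate is a duplicate if that pair is already known. The in-memory set stands in for whatever store the archive really uses (an assumption), CRC is assumed to be CRC-32, and `DuplicateIndex` is a hypothetical name. For large files the checksums would normally be computed in chunks rather than from a single `bytes` value.

```python
import hashlib
import zlib


def checksums(data: bytes) -> tuple[str, str]:
    """Return the (CRC-32, MD5) checksums of the contents as hex strings."""
    return f"{zlib.crc32(data):08x}", hashlib.md5(data).hexdigest()


class DuplicateIndex:
    """Hypothetical in-memory index of checksums already in the archive."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str]] = set()

    def is_duplicate(self, data: bytes) -> bool:
        # A duplicate must match on both checksums
        return checksums(data) in self._seen

    def add(self, data: bytes) -> None:
        self._seen.add(checksums(data))


index = DuplicateIndex()
index.add(b"first image")
print(index.is_duplicate(b"first image"))   # True
print(index.is_duplicate(b"second image"))  # False
```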
Extracting EXIF Information
Historically, digital cameras shot JPG format images and embedded information about the image within the image file. However, as cameras progressed, manufacturers started using proprietary RAW image formats. This meant that a proprietary reader was needed to read the EXIF information out of RAW images.
ImgArchive contains a basic EXIF reader for Picture format images such as JPG. However, it cannot read most RAW format images.
To address this problem, ImgArchive provides the ability to install an external EXIF reader; see External EXIF Reader.
External readers such as exiftool can read these proprietary RAW image formats.
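As an illustration of using an external reader, the sketch below runs exiftool with its `-j` option, which prints the tags as a JSON array with one object per file, and returns the tags for a single image. It assumes exiftool is installed and on the PATH; this shows how a script could call exiftool, not how ImgArchive itself invokes its external reader, and the function names are hypothetical.

```python
import json
import shutil
import subprocess


def parse_exiftool_json(output: str) -> dict:
    """exiftool -j prints a JSON array with one object per file."""
    records = json.loads(output)
    return records[0] if records else {}


def read_exif(path: str, exiftool: str = "exiftool") -> dict:
    """Read EXIF and maker-specific tags from a RAW or JPG file via exiftool."""
    if shutil.which(exiftool) is None:
        raise FileNotFoundError(f"{exiftool} not found on PATH")
    result = subprocess.run(
        [exiftool, "-j", path],
        capture_output=True,
        text=True,
        check=True,  # raise if exiftool reports an error
    )
    return parse_exiftool_json(result.stdout)
```

For example, `read_exif("DSC_675455.nef").get("Model")` would return the camera model recorded in a Nikon RAW file, provided exiftool is available.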
Image file name clash problem
When a camera creates images on its memory card, the file names normally follow the DCIM (Digital Camera IMages) conventions, which give the image files a number sequence. This makes the file names on the memory card unique. However, each time a new, empty memory card is inserted the number sequence may restart, so when images are imported into the archive, file names may be repeated. If an image file has the same name as a previously imported image taken on the same day, it would overwrite the previous one.
For example, if the first image file on the card is named DSC0000001.jpg each time the card is used, then every re-use of the card produces another file called DSC0000001.jpg.
ImgArchive prevents this by detecting the clash and renaming the second instance of the same name.
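A minimal sketch of clash detection and renaming: if the target folder already holds a file with the same name, a numeric suffix is added until the name is free. The `_1`, `_2`, ... naming scheme is an assumption for illustration; ImgArchive's actual renaming rule is not specified here, and `unique_name` is a hypothetical name.

```python
from pathlib import Path


def unique_name(dest_dir: Path, file_name: str) -> str:
    """Return file_name, or a renamed variant if it already exists in dest_dir.

    The '_1', '_2', ... suffix scheme is an assumption for illustration.
    """
    candidate = Path(file_name)
    stem, suffix = candidate.stem, candidate.suffix
    n = 0
    while (dest_dir / candidate).exists():
        n += 1
        candidate = Path(f"{stem}_{n}{suffix}")  # e.g. DSC0000001_1.jpg
    return str(candidate)
```

Because the work flow checks for duplicates before renaming, a clash at this stage should mean two different images that happen to share a name, so renaming rather than skipping is the appropriate response.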