Introduction

All major computer systems have some form of mass storage, normally one or more hard drives. These drives can now have a capacity of 18 TB or more. Coupled with the fact that digital cameras can shoot a large number of images in one photo shoot, this means that having some formal method of storing images on disk becomes much more important. An automated method is more important still, as there is less chance of images being stored in the wrong place and possibly lost. This is one of the key uses of DAM (digital asset management) software such as ImgArchive.

The storage strategy

The storage strategy must be simple but robust, and independent of content. Using the image content is a popular way of organising an archive. However, in the majority of cases this makes folder management unwieldy. For example, with this approach do you organise the folders by keyword, client, project, etc.? How do you handle images that belong to a number of groups, each of which translates to a folder? The image will end up being duplicated in a lot of folders. It can also result in a lot of work deciding which folders an image belongs in at the time the images are added to the archive. In addition, backing up will be needlessly complicated. Enter the partition structure.

The partition structure

Some also call this the bucket system, as each folder contains a bucket of images of unknown content. Data partitions are a common way of organising data within a database, for manageability, performance and possibly availability reasons. Partitioning is used here for the same reasons, i.e. to divide the data, in this case images, into manageable units. The manageability and performance issue is that locating one image file among a large number of other image files is time consuming, and a large number of images in one folder is more difficult to manage.

In databases, partitions are organised by a primary index. ImgArchive uses the image capture date as the primary index for storing images in the archive. The archive uses folders to partition the stored images into manageable subsets. These subsets are first partitioned into years: all images with a date that falls in a particular year are stored under the folder for that year. Within a year folder, the year is partitioned further into days. As there are at most 366 days in a year, this is a manageable number of partitions within a year’s main partition. So for any image with an archive date, a year/day folder can be found in which to place it. This scheme is also useful in that most photographers will have a rough idea of when an image was taken, so finding images in the archive folders is not too difficult. Flickr uses a similar system and calls it a camera roll.

The folders are named in a simple but consistent way. The year folder is simply the year, e.g. “2014”. The day folder is then named using the year, month and day separated by underscores, for example “2014_08_12” is 12 August 2014. Having the sequence year, month, day allows the folders to be sorted automatically in date order when viewed in a file browser such as File Explorer on Windows or Finder on the Mac. The location of images can be found quickly and unambiguously. Time also normally moves in one direction, so as time moves on new partitions are added and older partitions are normally used less, which makes archiving them to more permanent media easier.

With the partition structure above, every image has a unique address in the archive, made up of the partition address and the image file name.
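The naming scheme above can be sketched in a few lines of Python. This is only an illustration of the year / day layout described in the text, not ImgArchive's own code; the function name partition_address and the file name DSC_0001.jpg are made up for the example.

```python
from datetime import date
from pathlib import PurePosixPath


def partition_address(archive_date: date, file_name: str) -> PurePosixPath:
    """Build the year / day / file address for an image (hypothetical helper)."""
    # Year folder, e.g. "2014".
    year_folder = f"{archive_date.year:04d}"
    # Day folder in year_month_day order, e.g. "2014_08_12".
    day_folder = (
        f"{archive_date.year:04d}_{archive_date.month:02d}_{archive_date.day:02d}"
    )
    return PurePosixPath(year_folder) / day_folder / file_name


print(partition_address(date(2014, 8, 12), "DSC_0001.jpg"))
# prints 2014/2014_08_12/DSC_0001.jpg
```

Because the day folder name starts with the year and month, a plain alphabetical sort of the folder names is also a date sort.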

Metadata

Metadata is one of the most important and powerful tools with which to manage the archive. Metadata is a term that generally refers to data about data. In this case the information (data) is about an image or images (data); put simply, it means information about your photographs.

Many photographers think that metadata is the IPTC, XMP or EXIF information. However, it is more general than that: it encompasses everything that is known about an image, so it can be the IPTC, XMP or EXIF information but can also be more than that.

Master and Derivative Repositories

When you take images on a digital camera, the images are normally stored on something like an SD card. These images are then imported into ImgArchive. As far as ImgArchive is concerned, the imported images are the original masters from the camera. One of the key concepts is that ImgArchive will not damage or modify an original image; only a copy is ever modified. So ImgArchive maintains a Master Repository that contains only original masters from the camera. When a modification to a master is to be made, a copy of the master is made and put into the Workspace. If modifications are made to this copy, then the copy is a derivative of the master and will be put into the Derivative Repository.

The two Repositories share a common structure but store different image properties. For example, only the Master Repository will store how the image was created in the digital camera: details such as ISO, aperture, shutter speed, etc. The new version of the original will inherit all the properties of the original plus details of the changes made to the original.

Master Repository

Derivative Repository

The Archive Date

As explained above, SIA uses a date to place images in the archive. The most significant dates are:
1. The date the image was captured in the camera
2. The file date on the image file
3. The archived date

The date the image was captured in the camera

Of the three dates, the capture date is probably the most useful, as it places the image in the archive at the time it was taken. This date may not be available from the file itself, because when the image file is taken from the camera it is given a new create date: the date the image was moved from the camera to the computer storing the image. However, there is a way of recovering the capture date, although only for images that contain EXIF data. EXIF data is used by camera manufacturers to store information about an image within the image file. One of the values stored is the capture date. By reading this value from the image file, the capture date can be recovered. This relies on the EXIF data being present; if it is not, the capture date is lost.
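As a rough illustration of recovering the capture date, the sketch below assumes the EXIF tags have already been read into a dictionary by some EXIF reader (that step is not shown, and is not SIA's own reader). EXIF stores the capture time in the DateTimeOriginal tag as text in the form "YYYY:MM:DD HH:MM:SS". The function name capture_date_from_exif is hypothetical.

```python
from datetime import datetime
from typing import Optional

# Text layout used by the EXIF DateTimeOriginal tag.
EXIF_DATE_FORMAT = "%Y:%m:%d %H:%M:%S"


def capture_date_from_exif(exif: dict) -> Optional[datetime]:
    """Return the capture date, or None if it is absent or unreadable.

    `exif` is assumed to map tag names to values, as produced by
    whichever EXIF reader is in use (hypothetical input shape).
    """
    raw = exif.get("DateTimeOriginal")
    if not isinstance(raw, str) or not raw.strip():
        return None  # No EXIF capture date: the capture date is lost.
    try:
        return datetime.strptime(raw.strip(), EXIF_DATE_FORMAT)
    except ValueError:
        # Malformed or blank values (e.g. all zeros) are treated as missing.
        return None
```

Returning None rather than raising lets the caller fall back to the file date, as described in the following sections.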

The file date on the image file

The image file creation date will always be present, as it is a property of the image file that must always exist. This date may be a long time after the capture date, as it is the date you transferred the captured image from the camera. If you do this frequently, then the date will not be too far from the capture date.

The archived date

The date on which the image was placed in the archive is the last significant date. This date can be even further away from the capture date. However, if you add images to the archive at the same time as transferring them from the camera, then again you will not be too far out from the capture time.

Default archive date

By default SIA will try to find the earliest date, or the date that is most likely to be the capture date. This is carried out as follows:
1. Read EXIF information from all images in the set (in the case of RAW and JPG pairs).
2. Find the create date from all image files.
3. Use today’s date.
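The three-step fallback above could be sketched as follows. This is a minimal illustration, not SIA's implementation: the function name default_archive_date is hypothetical, and taking the earliest date within each step is an assumption made for the example.

```python
from datetime import datetime
from typing import Iterable, Optional


def default_archive_date(
    exif_dates: Iterable[Optional[datetime]],
    file_dates: Iterable[Optional[datetime]],
    today: datetime,
) -> datetime:
    """Pick an archive date for an image set (hypothetical helper)."""
    # Step 1: EXIF capture dates read from every image in the set.
    found = [d for d in exif_dates if d is not None]
    if found:
        return min(found)  # Assumption: the earliest EXIF date wins.
    # Step 2: file create dates of the image files.
    found = [d for d in file_dates if d is not None]
    if found:
        return min(found)  # Assumption: the earliest file date wins.
    # Step 3: last resort.
    return today
```

For a RAW/JPG pair where only the JPG carries EXIF data, the call would pass one None and one real date as exif_dates, and the JPG's capture date would be returned.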

Reading EXIF information

SIA will read EXIF information in two ways: using an internal EXIF reader and (if configured) using an external reader. SIA will try both methods; the internal reader is tried first, then the external reader. All images in the set are read for any EXIF data and the results stored. This EXIF data is also used to populate the metadata for each image and the set metadata. The set metadata is metadata that is shared across all images in the set, crucially including the shared archive date. If one of the images is missing data, by default the information will be filled in from the other images, if available. By default, in an image set with both RAW and JPG images, the information in the RAW will be the primary information source. However, the JPG will most likely contain EXIF data.
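The "RAW is primary, fill gaps from the other image" rule can be illustrated with plain dictionaries. This sketch assumes each image's metadata has already been read into a dict and that missing values appear as None or an empty string; the function name merge_set_metadata and the key names are hypothetical, not SIA's own.

```python
def merge_set_metadata(raw_meta: dict, jpg_meta: dict) -> dict:
    """Combine metadata for a RAW/JPG pair, preferring the RAW values."""
    # Start from the JPG values that are actually present...
    merged = {k: v for k, v in jpg_meta.items() if v not in (None, "")}
    # ...then let any value present in the RAW override them.
    merged.update({k: v for k, v in raw_meta.items() if v not in (None, "")})
    return merged


raw = {"ISO": 200, "CaptureDate": None}
jpg = {"ISO": 100, "CaptureDate": "2014:08:12 10:30:00"}
print(merge_set_metadata(raw, jpg))
# prints {'ISO': 200, 'CaptureDate': '2014:08:12 10:30:00'}
```

Here the ISO comes from the RAW because it is the primary source, while the capture date, missing from the RAW, is filled in from the JPG.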

Finding the create date from an image pair

If the capture date cannot be found, then the create date will need to be used. In a RAW/JPG image pair, the RAW file create time will be used if the capture date cannot be found in the EXIF data of the JPG image in the pair.

Use today’s date

As a fallback, if no other date can be found, today’s date will be used. This is normally a last resort.

Date checks

SIA will do a sanity test on the dates. If the dates do not make sense, the image will not be added to the archive and an error is raised to flag a possible problem with the dates. The capture date, the image file date and today’s date must be in chronological order, i.e. the capture date must be the earliest date, the image file date must be the same as or later than the capture date, and finally today’s date must be the same as or later than the image file date. Anomalies in the dates can be due to the time and date being incorrectly set on the camera or the archive computer. The default archive date can be overridden in these cases. One common problem is that the camera date may never have been changed from the manufacturer’s default.
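The ordering rule (capture date, then file date, then today's date) can be expressed as a small check. The sketch below is an illustration only; the function name date_problems and the message wording are hypothetical, and how SIA itself reports the error is not shown here.

```python
from datetime import datetime
from typing import List, Optional


def date_problems(
    capture: Optional[datetime], file_date: datetime, today: datetime
) -> List[str]:
    """Return a list of ordering problems; an empty list means the dates are sane."""
    problems = []
    # The capture date, when known, must not be later than the file date.
    if capture is not None and capture > file_date:
        problems.append("capture date is later than the image file date")
    # The file date must not be later than today's date.
    if file_date > today:
        problems.append("image file date is later than today's date")
    return problems
```

A non-empty result would correspond to the case where the image is not added and the archive date needs to be overridden, for example because the camera clock was never set.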

Specifying and controlling the archive date.

SIA has a number of options to specify and control the archive date. These can be used to override the default archive date selection, for example if the default dates are somehow incorrect and a new date needs to be used. The use date options are as follows:
1. Use today’s date
2. Use file date
3. Use date

Use today’s date

This option forces the archive date to be today’s date. All other dates are ignored.
--use_date_today

Use file date

Use file date forces the archive date to be the image file date. In the case of a RAW/JPG pair, the RAW image file date will be used.
--use_file_date

Use date

This option forces the archive date to be the date specified in the date argument. This can be used in the case where the capture date in the camera may be incorrect and the correct date needs to be specified explicitly.
--use_date=
For example:
--use_date=2014.07.12
The above will set the archive date to the 12th day of July 2014.
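To show how the three override options relate to each other, here is a small Python sketch that accepts the same option names using the standard argparse module. It is not SIA's command-line code. Treating the options as mutually exclusive and the helper names (parse_use_date, build_parser, forced_archive_date) are assumptions made for the example; the year.month.day layout is taken from the example above.

```python
import argparse
from datetime import date, datetime


def parse_use_date(text: str) -> date:
    """Parse a date written as year.month.day, e.g. 2014.07.12."""
    return datetime.strptime(text, "%Y.%m.%d").date()


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(allow_abbrev=False)
    # Assumption: only one override may be given at a time.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--use_date_today", action="store_true")
    group.add_argument("--use_file_date", action="store_true")
    group.add_argument("--use_date", type=parse_use_date, metavar="YYYY.MM.DD")
    return parser


def forced_archive_date(args, file_date: date, today: date):
    """Return the overriding archive date, or None to keep the default."""
    if args.use_date_today:
        return today
    if args.use_file_date:
        return file_date
    return args.use_date  # None when no override option was given.


args = build_parser().parse_args(["--use_date=2014.07.12"])
print(forced_archive_date(args, date(2014, 7, 20), date(2014, 8, 1)))
# prints 2014-07-12
```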

Image Extensions

In the SIA configuration folder is the image extensions file. This file contains the file extensions that SIA uses to identify the media encoding type and to determine which category an image fits. There are three categories:
1. Displayable images (Picture types).
2. Raw images (RAW types).
3. Images that may be displayable but are not raw images (Image types). These may have been generated by a camera or by some third-party image editor. Most image editors will save the edited image with a known extension. For example, Photoshop uses the extension “.psd”.

Some images may fit in more than one category. The files a camera most often generates are JPG files, which normally have the extension “.jpg”. The other files a camera commonly generates are RAW images, and these are more tricky. Each camera manufacturer uses its own file extension to denote a RAW image, and in some cases more than one extension. For example, Nikon uses “.NEF” to denote a raw image file, whereas Canon uses “CRF”.
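The idea of mapping a file extension to a category can be sketched as a simple lookup. This is an illustration only: the category labels "picture" and "image" and the function name image_category are hypothetical, matching extensions without regard to case is an assumption, and the sketch gives each extension a single category even though some images may fit more than one.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical lookup table built from the extensions mentioned in the text.
CATEGORY_BY_EXTENSION = {
    ".jpg": "picture",  # Displayable images (Picture types)
    ".nef": "raw",      # Nikon RAW (RAW types)
    ".dng": "raw",      # Adobe Digital Negative (RAW types)
    ".psd": "image",    # Photoshop files (Image types)
}


def image_category(file_name: str) -> Optional[str]:
    """Return the category for a file name, or None for an unknown extension."""
    # Assumption: extensions are compared case-insensitively.
    extension = PurePath(file_name).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(extension)


print(image_category("DSC_0001.NEF"))
# prints raw
```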

The image extensions file is split into three sections, one for each category type. An example file is shown below.

<!-- Raw image (RAW Types) -->
Dng=raw,Adobe Digital Negative
NEF=raw,Nikon RAW
CRF=raw,Canon RAW